
We asked a question in Lao, and the response came back in English — that was the first wall we hit when building our first AI chatbot in Laos. For the world's major LLMs, Lao is essentially an "almost unknown language," and the same approach used for English simply doesn't reach a practical level. In this article, based on the RAG (Retrieval-Augmented Generation) + LLM architecture we actually employ in our Laos-focused projects, we walk through step by step: how to select an LLM for Lao, how to build the system, and how to automate quality evaluation. This is a practical guide for anyone who wants to build a chatbot that truly works in Lao.

An English chatbot works by calling an API and writing a prompt. However, the same approach does not work for Lao. The reason lies in the characteristics of the language itself and the bias in LLM training data.
The multilingual performance of LLMs is roughly proportional to the amount of text included in the training corpus. Looking at the data composition of Common Crawl, English accounts for approximately 46%, whereas Lao represents less than 0.01%. This disparity of more than 4,000 times manifests directly as a difference in response quality.
When our company launched its first AI project in Laos, we asked a general-purpose LLM to summarize operational manuals in Lao, only to encounter frequent mistranslations of proper nouns and grammatical breakdowns. There were even cases where "ທະນາຄານແຫ່ງ ສປປ ລາວ" (Bank of the Lao PDR) was reduced to simply "a bank in Laos," making it impossible to identify which institution was being referred to. Prompts that work without issue in English simply do not function in Lao — this is the reality of low-resource languages.
Lao has no spaces between words and no clear sentence-ending markers. While "I love cats" in English requires only 3 tokens, the Lao equivalent "ຂ້ອຍຮັກແມວ" can be split into 10 or more tokens by a BPE (Byte Pair Encoding) tokenizer.
This has two practical implications: API costs for the same content run 2–3 times higher than for English, and the effective context window shrinks, because Lao text consumes far more of the token budget.
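The inflation is easy to see even without a tokenizer. UTF-8 byte length is only a rough proxy for token count, but it illustrates the gap: every Lao script character takes 3 bytes in UTF-8, and BPE tokenizers trained mostly on English tend to fall back to small byte-level pieces for rarely seen scripts. A minimal sketch:

```typescript
// Rough proxy for token inflation: Lao characters occupy 3 bytes each
// in UTF-8, and BPE tokenizers fall back toward byte-level pieces for
// scripts they rarely saw in training. Exact token counts depend on
// the specific tokenizer.
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(utf8Bytes("I love cats")); // 11 bytes
console.log(utf8Bytes("ຂ້ອຍຮັກແມວ"));   // 10 code points x 3 bytes = 30 bytes
```

The roughly 3x byte inflation compounds with the tokenizer's unfamiliarity with the script, which is how a 3-token English sentence becomes 10+ tokens in Lao.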
Many people think: if LLM performance in Lao is poor, why not "translate the question into English first → search and reason in English → translate the result back into Lao"? Our company tried this approach in the early stages.
The results were dismal.
From this experience, our company shifted its approach to processing Lao text as Lao: knowledge augmentation via RAG.

The quality of a chatbot is largely determined by the choice of LLM. However, benchmarks that claim "Lao language support" are virtually nonexistent. There is no choice but to verify it yourself.
We evaluated LLM performance in Lao across the following axes.
For the evaluation, we prepared a business FAQ in Lao (50 questions) and had each model respond under identical conditions.
The following summarizes the validation results for major models as of 2025.
| Evaluation Axis | Claude Sonnet (Bedrock) | GPT-4o (OpenAI) | Gemini 2.5 Pro (Google) |
|---|---|---|---|
| Everyday Conversation | ○ Grammar is generally accurate. Appropriate use of polite forms is possible. | ○ Comparable performance. Word order can become disordered in longer sentences. | △ Short sentences are manageable, but complex syntax tends to break down. |
| Technical Terminology | △ RAG supplementation is essential for financial terminology. Inaccurate on its own. | △ Comparable performance. Hallucinations are noticeable with administrative terminology. | × The majority of technical terms are replaced with English. |
| Instruction Following | ◎ Consistently adheres to constraints such as "respond in Lao only." | ○ Generally adheres, but tends to be pulled toward English when the context is in English. | △ Sporadic cases of ignoring instructions and switching from Lao to English. |
※ Pricing is subject to change over time. Due to token inflation with Lao, the effective cost is 2–3 times that of English.
What all models have in common is that they cannot accurately answer questions in specialized domains using Lao alone. Regardless of which model is chosen, knowledge supplementation via RAG is essential.
Our chatbot infrastructure uses Claude Sonnet via AWS Bedrock. There were three deciding factors.
1. Stability of instruction-following. In RAG, search results are injected into the system prompt. If the LLM fails to comply with the instruction to "answer only based on the provided context," hallucinations (responses that differ from the facts) occur. Claude has the highest adherence to this constraint, with minimal deviation even when working with Lao-language contexts.
2. Integration with the AWS ecosystem. Using Bedrock enables access control via IAM, log monitoring via CloudWatch, and private connectivity from within a VPC. Many of our clients are financial institutions, and it was a mandatory requirement that data not leave the region.
3. Flexibility for multi-model switching. Bedrock allows calling not only Claude, but also Mistral, Llama, and Amazon Nova through the same API. When a model with stronger Lao-language capabilities emerges in the future, we can switch to it without any code changes.

As confirmed in the previous chapter, no LLM possesses specialized knowledge in Lao on its own. Combining the LLM with RAG addresses this fundamental limitation.
RAG (Retrieval-Augmented Generation) is a technique that retrieves documents related to a user's question via vector search, injects their content into an LLM prompt, and generates a response.
The difference between a standalone LLM and RAG becomes especially pronounced in the Lao language.
| Question Type | Standalone LLM | RAG |
|---|---|---|
| Lao regulations | Almost unable to answer. Falls back on general information in English | Can answer accurately if legal documents are included in the knowledge base |
| Internal business workflows | Has no knowledge of them, naturally | Explains procedures by referencing operation manuals |
| Hallucinations | Particularly frequent in Lao | Can respond with "I don't know" when there is no source to reference |
| Latest information | Not possible beyond the training cutoff | Reflected immediately upon updating the knowledge base |
Questions that a standalone LLM can answer reasonably well in English may be completely beyond its ability in Lao. That is precisely why RAG has its greatest impact when used with the Lao language.
The pipeline we employ has the following configuration.
```
User's question (Lao)
  ↓ [Embedding] Vectorize the text
  ↓ [Vector Search] Retrieve 50 similar chunks with Supabase pgvector
  ↓ [Reranking] Filter by similarity score → narrow down to top 5
  ↓ [LLM] Send context + question to AWS Bedrock Claude
  ↓ [Streaming] Deliver the response incrementally via SSE
  ↓ [Auto Scoring] Automatically measure quality scores with Mastra Evaluations
```
The key point is that different models are used for embedding and the LLM. A multilingual, low-cost model is used for embedding, while Claude — which excels at instruction-following — handles response generation. This combination strikes an excellent balance between cost and quality.
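The flow above can be sketched as a single orchestration function. The dependency interface below stands in for the real clients (embedding model, Supabase pgvector, Bedrock Claude); the names and the 0.5 similarity cutoff are illustrative assumptions, not our production code.

```typescript
type Chunk = { text: string; score: number };

// Injected dependencies stand in for the real clients (embedding model,
// Supabase pgvector search, Bedrock Claude). Names are illustrative.
interface Deps {
  embed(text: string): Promise<number[]>;
  vectorSearch(vec: number[], k: number): Promise<Chunk[]>;
  generate(system: string, question: string): AsyncIterable<string>;
}

async function* answer(
  question: string,
  deps: Deps,
  minScore = 0.5, // similarity cutoff: an illustrative value
): AsyncIterable<string> {
  const vec = await deps.embed(question);
  // Cast a wide net (50) to compensate for weaker Lao embeddings,
  const candidates = await deps.vectorSearch(vec, 50);
  // then filter by score and keep only the top 5 for the prompt.
  const context = candidates
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map((c) => c.text)
    .join("\n---\n");
  const system = `Answer only from the following context:\n${context}`;
  yield* deps.generate(system, question); // streamed to the client via SSE
}
```

Keeping the dependencies behind an interface like this is also what makes model switching (point 3 above) a configuration change rather than a code change.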
Embedding (text vectorization) is the most critical component that determines the retrieval accuracy of RAG.
By using a multilingual embedding model, Lao text can also be vectorized. Models that support 100 or more languages include support for Lao as well.
However, there are limitations. Because Lao has limited training data, the vector distances between synonyms and paraphrases are not as accurate as in English. Specifically, there are cases where "ສິນເຊື່ອ" (credit) and "ເງິນກູ້" (loan) are not placed close together as the same concept. This issue is mitigated through a pipeline design that casts a wide initial net (50 results) and then narrows it down through reranking (5 results).

From here, we will explain the specific setup steps. The technology stack assumes TypeScript + Next.js + Supabase + Mastra, but the architectural concepts can be applied to other stacks as well.
Prepare the knowledge base that the chatbot will reference (operations manuals, FAQs, regulatory documents, etc.) and split it into searchable units.
Text splitting for Lao is the biggest challenge. Words are not separated by spaces, and sentence-ending punctuation (an equivalent of the full stop) is rarely used, which means sentence splitters designed for English do not work. Depending on the use case, our company uses Mastra RAG's splitting strategies as follows.
| Splitting Strategy | Suitable Content | Performance with Lao |
|---|---|---|
| recursive | General documents | ◎ Most stable, as splitting is based on paragraphs and line breaks |
| semantic-markdown | Markdown-formatted documents | ○ High accuracy when heading structure is clearly defined |
| token | Long-form reports | ○ Mechanically splits at token limit. Works regardless of language |
| sentence | FAQs / short text collections | × Cannot detect sentence boundaries in Lao; not usable |
Recommended settings: For Lao-language documents, use recursive as the default (chunk size: 512 tokens, overlap: 50 tokens). The reason for including overlap is to ensure that even if a split point falls in the middle of a Lao-language context, the content is supplemented by the surrounding chunks.
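The recursive strategy can be illustrated with a simplified splitter. This is a sketch of the idea, not Mastra's actual implementation: it splits on blank lines, packs paragraphs into chunks, and measures characters rather than tokens for brevity.

```typescript
// Simplified recursive splitter: split on blank lines (paragraphs),
// pack paragraphs into chunks up to maxLen characters, and carry
// `overlap` trailing characters into the next chunk.
function chunkRecursive(text: string, maxLen = 512, overlap = 50): string[] {
  const paragraphs = text.split(/\n{2,}/);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if (current === "") {
      current = para; // an oversized single paragraph is kept whole in this sketch
    } else if ((current + "\n\n" + para).length <= maxLen) {
      current = current + "\n\n" + para;
    } else {
      chunks.push(current);
      // Overlap: repeat the tail of the previous chunk so a split that
      // falls mid-context in Lao is still covered by adjacent chunks.
      current = current.slice(-overlap) + "\n\n" + para;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Because Lao documents typically do include paragraph breaks, splitting on blank lines is the one boundary signal that survives the lack of spaces and sentence punctuation.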
The vectorized text chunks are stored in a searchable state. We use Supabase's pgvector extension.
There are three reasons we chose Supabase pgvector.
The key point in table design is including the language code in the metadata. When searching only Lao-language knowledge, filter with language = 'lo'; when searching across all languages, remove the filter — this switching can be achieved with a single line in a SQL WHERE clause.
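A sketch of what that one-line switch looks like. The table and column names (`documents`, `content`, `embedding`, `metadata`) are illustrative assumptions; `<=>` is pgvector's cosine-distance operator.

```typescript
// Illustrative schema: documents(id, content, embedding vector, metadata jsonb).
// "<=>" is pgvector's cosine-distance operator; smaller means more similar.
function knowledgeSearchSql(laoOnly: boolean): string {
  // Lao-only vs. cross-language search differs by a single WHERE clause.
  const where = laoOnly ? "WHERE metadata->>'language' = 'lo'" : "";
  return `
    SELECT id, content, 1 - (embedding <=> $1) AS similarity
    FROM documents
    ${where}
    ORDER BY embedding <=> $1
    LIMIT 50
  `.trim();
}
```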
Vector search alone—simply "returning results in order of similarity"—is not accurate enough. In Lao, embedding quality is not as high as in English, making it easy for low-relevance chunks to appear near the top of results.
Our pipeline filters in two stages:
Why retrieve as many as 50 results? With Lao embeddings, a chunk that should rank first can end up buried at position 20. The root cause is a synonym problem: searching for "ສິນເຊື່ອ" may fail to surface chunks containing "ເງິນກູ້" in the top results. Casting a wide net and then correcting the order through reranking results in fewer missed retrievals.
The search results are passed to an LLM to generate responses in Lao. To enhance the user experience, streaming delivery via SSE (Server-Sent Events) is adopted.
Streaming reduces TTFT (Time to First Token) to approximately 0.8 seconds. When input tokens increase due to context injection in RAG, waiting for the full response to generate takes 5–10 seconds; with streaming, the first characters begin appearing within 1 second.
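The SSE wire format itself is simple: each message is a `data:` line followed by a blank line. A minimal encoder might look like the following (the `[DONE]` sentinel is a common convention, not a requirement of the spec):

```typescript
// Frame one LLM text delta as a Server-Sent Events message.
// SSE messages are "data: <payload>\n\n"; JSON-encoding the payload
// keeps embedded newlines in Lao text from breaking the framing.
function sseEvent(delta: string): string {
  return `data: ${JSON.stringify({ delta })}\n\n`;
}

// Pipe an async stream of LLM deltas into SSE frames.
async function* toSse(deltas: AsyncIterable<string>): AsyncIterable<string> {
  for await (const delta of deltas) yield sseEvent(delta);
  yield "data: [DONE]\n\n"; // end-of-stream sentinel (a common convention)
}
```

JSON-encoding the payload matters more than it looks: Lao responses often contain literal newlines, and an unescaped newline inside a `data:` payload would terminate the SSE event early.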
Lao-Specific System Prompt Design:
For a Lao chatbot, the following instructions must be explicitly included in the system prompt.
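Our production prompt is not reproduced here, but a hedged sketch of the kind of instructions described in this article (language lock, context-only answering, an explicit "I don't know" path, preserving official names) might look like this. The wording is illustrative.

```typescript
// A sketch of a Lao-focused RAG system prompt (illustrative wording).
function buildSystemPrompt(context: string): string {
  return [
    "You are a support assistant for users in Laos.",
    "Always respond in Lao (ພາສາລາວ) only, even if the retrieved context is in English.",
    "Answer strictly based on the context below; do not add outside knowledge.",
    "If the context does not contain the answer, say you do not know.",
    "Keep official names such as ທະນາຄານແຫ່ງ ສປປ ລາວ (Bank of the Lao PDR) exactly as written.",
    "",
    "Context:",
    context,
  ].join("\n");
}
```

The first instruction counters the English pull described earlier; the last one addresses the proper-noun degradation we saw with institution names.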
A chatbot that merely "works" is not enough. Response quality constantly fluctuates as knowledge is added and LLMs are updated.
We use Mastra Evaluations to automatically measure the following 3 metrics in real time.
| Metric | What It Measures | Passing Threshold |
|---|---|---|
| Answer Relevancy | Whether the response accurately answers the user's question | 0.7 or above |
| Faithfulness | Whether the response is faithful to the retrieved content (no hallucinations) | 0.8 or above |
| Retrieval Precision | Whether the chunks retrieved by search are relevant to the question | 0.6 or above |
A separate LLM from the main response generation model is used for evaluation — to avoid the self-scoring bias of "grading your own answers."
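The thresholds in the table translate directly into a quality gate. A small sketch (metric key names are our own; Mastra's scorers return numeric scores that can be checked like this):

```typescript
// Passing thresholds from the table above.
const THRESHOLDS = {
  answerRelevancy: 0.7,
  faithfulness: 0.8,
  retrievalPrecision: 0.6,
} as const;

type Scores = Record<keyof typeof THRESHOLDS, number>;

// Return the metrics that fall below their threshold (empty = pass).
function failingMetrics(scores: Scores): string[] {
  return (Object.keys(THRESHOLDS) as (keyof typeof THRESHOLDS)[]).filter(
    (m) => scores[m] < THRESHOLDS[m],
  );
}
```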

Here are the failure patterns we experienced in projects for Laos that we would like to share. All of them could have been avoided if we had known about them in advance.
What happened: When splitting a Lao-language operations manual using the sentence strategy, the results were polarized: either the entire document became a single chunk, or it was fragmented byte by byte. The cause was straightforward. Lao text rarely uses sentence-ending punctuation equivalent to a full stop, and since the sentence splitter relies on terminal punctuation as delimiters, it cannot find any split points in Lao text.
How it was fixed: We switched to the recursive strategy. Chunking was based on line breaks and paragraph separators, with a chunk size of 512 tokens and an overlap of 50 tokens. Because Lao documents typically include line breaks between paragraphs, this approach enables practical splitting.
What happened: Because the knowledge base contained documents in both English and Japanese, English chunks frequently matched queries written in Lao, causing the LLM to respond in English. The root cause was that no response-language instruction had been included in the system prompt.
This issue was concentrated in the transitional period when we were making our internal knowledge (English-only until two years ago) multilingual. In sections where Lao knowledge was not yet available, only English chunks would be retrieved, pulling the LLM toward responding in English.
How it was fixed: Two countermeasures were implemented simultaneously. (1) An explicit instruction was added to the system prompt: "Always respond in the same language as the user." (2) Language codes were added to the knowledge base metadata, and a filter was implemented to prioritize Lao chunks when the query is in Lao.
What happened: In a multi-turn Lao conversation, retaining 20 messages worth of context caused input tokens to exceed 15,000, resulting in a noticeable degradation in response quality. Lao consumes 2–3 times more tokens than English. While 20 messages is manageable in English, in Lao it consumes the majority of the context window.
This left no room to inject RAG context (5 chunks, approximately 3,000–5,000 tokens), causing retrieved results to be truncated and leading to an increase in responses that "ignored the knowledge base."
How we fixed it: We changed the conversation history retention to be controlled by a token count limit rather than a message count. In our system, we cap recent conversation history at 8,000 tokens to ensure sufficient headroom for RAG context. For Lao, this effectively corresponds to approximately 8–10 messages.
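The fix can be sketched as a token-budget trimmer. The token estimate below is an assumption for illustration (real code should use the model's tokenizer); the logic of walking backward from the most recent message is the point.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Rough token estimate: an assumption for this sketch. With BPE
// tokenizers, Lao often yields one token per character or worse,
// so a real implementation should use the model's own tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(new TextEncoder().encode(text).length / 3);
}

// Keep the most recent messages whose combined estimate fits the budget.
function trimHistory(history: Message[], maxTokens = 8000): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break; // budget exhausted: drop older turns
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

With the 8,000-token cap, the same function keeps roughly 8–10 Lao messages but far more English ones, which is exactly the asymmetry a fixed message count cannot express.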

A chatbot is not something you build once and call done. The addition of knowledge, LLM version upgrades, and shifts in user question patterns all cause response quality to fluctuate constantly.
At our company, we use Mastra's Live Evaluations feature to score production chat responses in real time. Since the scoring runs asynchronously from response generation, it has no impact on the user-perceived latency.
The three metrics we measure — Answer Relevancy, Faithfulness, and Retrieval Precision — are stored in a database, allowing us to track trends over time. A sudden drop in scores serves as a signal indicating knowledge gaps or changes in model behavior.
Scoring all requests inflates the cost of the evaluation LLM. We vary the sampling rate by environment.
| Environment | Sampling Rate | Reason |
|---|---|---|
| Development | 100% | Evaluate all responses and use them for prompt tuning |
| Staging | 30–50% | Quality gate before release |
| Production | 10% | Keep costs down while capturing trends |
Even at 10% in production, 1,000 requests per day yields 100 scored data points per day. Reviewing the weekly averages is more than sufficient to understand quality trends.
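The per-environment sampling decision is a one-line gate. A sketch with the rates from the table (the 40% staging value picks a midpoint of the 30–50% band; the injectable `rng` parameter is there purely to make the gate testable):

```typescript
// Sampling rates per environment, following the table above.
const SAMPLING_RATE: Record<string, number> = {
  development: 1.0,
  staging: 0.4, // midpoint of the 30-50% band
  production: 0.1,
};

// Decide whether to score this response. Unknown environments score nothing.
function shouldScore(env: string, rng: () => number = Math.random): boolean {
  return rng() < (SAMPLING_RATE[env] ?? 0);
}
```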
The improvement approach differs depending on the metric.
Retrieval Precision is low (< 0.6): The search is returning irrelevant chunks. Consider adjusting the chunk size (reducing from 512 → 256 tokens), adding Lao language knowledge, and reviewing metadata filters. For Lao, reducing chunk size often leads to improvement.
Faithfulness is low (< 0.8): The LLM is supplementing information not found in the search results. Address this by strengthening the constraints in the system prompt or lowering the temperature (0.3 → 0.1). Note that hallucinations are more likely to occur in Lao than in English, as the LLM has less training data for the Lao language.
Answer Relevancy is low (< 0.7): The response is misaligned with the user's question. First, check Retrieval Precision. If there are no issues on the retrieval side, work on improving the prompt (specifying the answer format, instructing rephrasing of questions).

In fact, the less knowledge there is, the easier it is to feel the effect. Since an LLM on its own has almost no expertise in the Lao language, simply adding 50 FAQ entries worth of knowledge can dramatically change the accuracy of responses. The effect of "adding 50 entries to a zero-knowledge state" is far greater than that of "adding 50 entries to a state with 1,000 entries."
This can be handled by storing documents in each language in the knowledge base and filtering by language code in the metadata. Our system processes four languages—Japanese, English, Thai, and Lao—within a single RAG pipeline. Since Thai and Lao have closely related writing systems and grammar, they can be handled with the same chunking strategy and embedding model.
FAQ response level (100 knowledge items or fewer) can be built in 2–4 weeks. Monthly infrastructure and API costs are approximately $200–500. A full-scale chatbot including business system integration takes 2–3 months, with a monthly cost estimate of $500–2,000. The majority of costs come from LLM API usage fees, and it is important to factor into the budget that Lao language incurs 2–3 times the cost of English due to token inflation.

Lao is a language that major LLMs around the world are "almost completely unfamiliar with." That is precisely why knowledge augmentation via RAG makes a decisive difference. By combining the right chunking strategy, improved retrieval accuracy through reranking, and automated quality monitoring, it is possible to achieve reliable responses even in low-resource languages.
Let's recap the five steps covered in this article.
1. Select an LLM by testing it on Lao-language questions yourself (we settled on Claude Sonnet via AWS Bedrock).
2. Split the knowledge base with a Lao-appropriate chunking strategy (recursive, 512-token chunks, 50-token overlap).
3. Store the vectors in Supabase pgvector, with language codes in the metadata.
4. Retrieve wide (50 chunks), then rerank down to the top 5.
5. Generate streaming responses via SSE and monitor quality automatically with Mastra Evaluations.
AI security measures for chatbots are covered in detail in "AI Security Checklist for Lao Businesses." For those looking to understand the full picture of AI adoption, "AI Adoption Guide for Lao Businesses" is also recommended.
Yusuke Ishihara
Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).