
We asked a question in Lao, and the response came back in English — that was the first wall we hit when building our first AI chatbot in Laos. For the world's major LLMs, Lao is essentially an "almost unknown language," and the same approach used for English simply doesn't reach a practical level. In this article, based on the RAG (Retrieval-Augmented Generation) + LLM architecture we actually employ in our Laos-focused projects, we walk through step by step: how to select an LLM for Lao, how to build the system, and how to automate quality evaluation. This is a practical guide for anyone who wants to build a chatbot that truly works in Lao.

An English chatbot works by calling an API and writing a prompt. However, the same approach does not work for Lao. The reason lies in the characteristics of the language itself and the bias in LLM training data.
The multilingual performance of LLMs is roughly proportional to the amount of text included in the training corpus. Looking at the data composition of Common Crawl, English accounts for approximately 46%, whereas Lao represents less than 0.01%. This disparity of more than 4,000 times manifests directly as a difference in response quality.
When our company launched its first AI project in Laos, we asked a general-purpose LLM to summarize operational manuals in Lao, only to encounter frequent mistranslations of proper nouns and grammatical breakdowns. There were even cases where "ທະນາຄານແຫ່ງ ສປປ ລາວ" (Bank of the Lao PDR) was reduced to simply "a bank in Laos," making it impossible to identify which institution was being referred to. Prompts that work without issue in English simply do not function in Lao — this is the reality of low-resource languages.
Lao has no spaces between words and no clear sentence-ending markers. While "I love cats" in English requires only 3 tokens, the Lao equivalent "ຂ້ອຍຮັກແມວ" can be split into 10 or more tokens by a BPE (Byte Pair Encoding) tokenizer.
This has two practical implications: API costs for the same content run 2–3 times higher than for English, and the effective context window shrinks, because Lao text consumes far more of the token budget.
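The inflation is easy to see even without a tokenizer. UTF-8 byte length is only a rough proxy for token count, but it illustrates the gap: every Lao script character takes 3 bytes in UTF-8, and BPE tokenizers trained mostly on English tend to fall back to small byte-level pieces for rarely seen scripts. A minimal sketch:

```typescript
// Rough proxy for token inflation: Lao characters occupy 3 bytes each
// in UTF-8, and BPE tokenizers fall back toward byte-level pieces for
// scripts they rarely saw in training. Exact token counts depend on
// the specific tokenizer.
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(utf8Bytes("I love cats")); // 11 bytes
console.log(utf8Bytes("ຂ້ອຍຮັກແມວ"));   // 10 code points x 3 bytes = 30 bytes
```

The roughly 3x byte inflation compounds with the tokenizer's unfamiliarity with the script, which is how a 3-token English sentence becomes 10+ tokens in Lao.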
Many people think: if LLM performance in Lao is poor, why not "translate the question into English first → search and reason in English → translate the result back into Lao"? Our company tried this approach in the early stages.
The results were dismal.
From this experience, our company shifted its approach to processing Lao text as Lao: knowledge augmentation via RAG.

The quality of a chatbot is largely determined by the choice of LLM. However, benchmarks that claim "Lao language support" are virtually nonexistent. There is no choice but to verify it yourself.
We evaluated LLM performance in Lao across the following axes.
For the evaluation, we prepared a business FAQ in Lao (50 questions) and had each model respond under identical conditions.
The following summarizes the validation results for major models as of 2025.
| Evaluation Axis | Claude Sonnet (Bedrock) | GPT-4o (OpenAI) | Gemini 2.5 Pro (Google) |
|---|---|---|---|
| Everyday Conversation | ○ Grammar is generally accurate. Appropriate use of polite forms is possible. | ○ Comparable performance. Word order can become disordered in longer sentences. | △ Short sentences are manageable, but complex syntax tends to break down. |
| Technical Terminology | △ RAG supplementation is essential for financial terminology. Inaccurate on its own. | △ Comparable performance. Hallucinations are noticeable with administrative terminology. | × The majority of technical terms are replaced with English. |
| Instruction Following | ◎ Consistently adheres to constraints such as "respond in Lao only." | ○ Generally adheres, but tends to be pulled toward English when the context is in English. | △ Sporadic cases of ignoring instructions and switching from Lao to English. |
※ Pricing is subject to change over time. Due to token inflation with Lao, the effective cost is 2–3 times that of English.
What all models have in common is that they cannot accurately answer questions in specialized domains using Lao alone. Regardless of which model is chosen, knowledge supplementation via RAG is essential.
Our chatbot infrastructure uses Claude Sonnet via AWS Bedrock. There were three deciding factors.
1. Stability of instruction-following. In RAG, search results are injected into the system prompt. If the LLM fails to comply with the instruction to "answer only based on the provided context," hallucinations (responses that differ from the facts) occur. Claude has the highest adherence to this constraint, with minimal deviation even when working with Lao-language contexts.
2. Integration with the AWS ecosystem. Using Bedrock enables access control via IAM, log monitoring via CloudWatch, and private connectivity from within a VPC. Many of our clients are financial institutions, and it was a mandatory requirement that data not leave the region.
3. Flexibility for multi-model switching. Bedrock allows calling not only Claude, but also Mistral, Llama, and Amazon Nova through the same API. When a model with stronger Lao-language capabilities emerges in the future, we can switch to it without any code changes.

As confirmed in the previous chapter, no LLM possesses specialized knowledge in Lao on its own. Combining the LLM with RAG addresses this fundamental limitation.
RAG (Retrieval-Augmented Generation) is a technique that retrieves documents related to a user's question via vector search, injects their content into an LLM prompt, and generates a response.
The difference between a standalone LLM and RAG becomes especially pronounced in the Lao language.
| Question Type | Standalone LLM | RAG |
|---|---|---|
| Lao regulations | Almost unable to answer. Falls back on general information in English | Can answer accurately if legal documents are included in the knowledge base |
| Internal business workflows | Has no knowledge of them, naturally | Explains procedures by referencing operation manuals |
| Hallucinations | Particularly frequent in Lao | Can respond with "I don't know" when there is no source to reference |
| Latest information | Not possible beyond the training cutoff | Reflected immediately upon updating the knowledge base |
Questions that a standalone LLM can answer reasonably well in English may be completely beyond its ability in Lao. That is precisely why RAG has its greatest impact when used with the Lao language.
The pipeline we employ has the following configuration.
```
User's question (Lao)
  ↓ [Embedding] Vectorize the text
  ↓ [Vector Search] Retrieve 50 similar chunks with Supabase pgvector
  ↓ [Reranking] Filter by similarity score → narrow down to top 5
  ↓ [LLM] Send context + question to AWS Bedrock Claude
  ↓ [Streaming] Deliver the response incrementally via SSE
  ↓ [Auto Scoring] Automatically measure quality scores with Mastra Evaluations
```
The key point is that different models are used for embedding and the LLM. A multilingual, low-cost model is used for embedding, while Claude — which excels at instruction-following — handles response generation. This combination strikes an excellent balance between cost and quality.
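The flow above can be sketched as a single orchestration function. The dependency interface below stands in for the real clients (embedding model, Supabase pgvector, Bedrock Claude); the names and the 0.5 similarity cutoff are illustrative assumptions, not our production code.

```typescript
type Chunk = { text: string; score: number };

// Injected dependencies stand in for the real clients (embedding model,
// Supabase pgvector search, Bedrock Claude). Names are illustrative.
interface Deps {
  embed(text: string): Promise<number[]>;
  vectorSearch(vec: number[], k: number): Promise<Chunk[]>;
  generate(system: string, question: string): AsyncIterable<string>;
}

async function* answer(
  question: string,
  deps: Deps,
  minScore = 0.5, // similarity cutoff: an illustrative value
): AsyncIterable<string> {
  const vec = await deps.embed(question);
  // Cast a wide net (50) to compensate for weaker Lao embeddings,
  const candidates = await deps.vectorSearch(vec, 50);
  // then filter by score and keep only the top 5 for the prompt.
  const context = candidates
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map((c) => c.text)
    .join("\n---\n");
  const system = `Answer only from the following context:\n${context}`;
  yield* deps.generate(system, question); // streamed to the client via SSE
}
```

Keeping the dependencies behind an interface like this is also what makes model switching (point 3 above) a configuration change rather than a code change.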
Embedding (text vectorization) is the most critical component that determines the retrieval accuracy of RAG.
By using a multilingual embedding model, Lao text can also be vectorized. Models that support 100 or more languages include support for Lao as well.
However, there are limitations. Because Lao has limited training data, the vector distances between synonyms and paraphrases are not as accurate as in English. Specifically, there are cases where "ສິນເຊື່ອ" (credit) and "ເງິນກູ້" (loan) are not placed close together as the same concept. This issue is mitigated through a pipeline design that casts a wide initial net (50 results) and then narrows it down through reranking (5 results).

From here, we will explain the specific setup steps. The technology stack assumes TypeScript + Next.js + Supabase + Mastra, but the architectural concepts can be applied to other stacks as well.
Prepare the knowledge base that the chatbot will reference (operations manuals, FAQs, regulatory documents, etc.) and split it into searchable units.
Text splitting for Lao is the biggest challenge. Words are not separated by spaces, and sentence-ending punctuation (an equivalent of the full stop) is rarely used, which means sentence splitters designed for English do not work. Depending on the use case, our company uses Mastra RAG's splitting strategies as follows.
| Splitting Strategy | Suitable Content | Performance with Lao |
|---|---|---|
| recursive | General documents | ◎ Most stable, as splitting is based on paragraphs and line breaks |
| semantic-markdown | Markdown-formatted documents | ○ High accuracy when heading structure is clearly defined |
| token | Long-form reports | ○ Mechanically splits at token limit. Works regardless of language |
| sentence | FAQs / short text collections | × Cannot detect sentence boundaries in Lao; not usable |
Recommended settings: For Lao-language documents, use recursive as the default (chunk size: 512 tokens, overlap: 50 tokens). The reason for including overlap is to ensure that even if a split point falls in the middle of a Lao-language context, the content is supplemented by the surrounding chunks.
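The recursive strategy can be illustrated with a simplified splitter. This is a sketch of the idea, not Mastra's actual implementation: it splits on blank lines, packs paragraphs into chunks, and measures characters rather than tokens for brevity.

```typescript
// Simplified recursive splitter: split on blank lines (paragraphs),
// pack paragraphs into chunks up to maxLen characters, and carry
// `overlap` trailing characters into the next chunk.
function chunkRecursive(text: string, maxLen = 512, overlap = 50): string[] {
  const paragraphs = text.split(/\n{2,}/);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if (current === "") {
      current = para; // an oversized single paragraph is kept whole in this sketch
    } else if ((current + "\n\n" + para).length <= maxLen) {
      current = current + "\n\n" + para;
    } else {
      chunks.push(current);
      // Overlap: repeat the tail of the previous chunk so a split that
      // falls mid-context in Lao is still covered by adjacent chunks.
      current = current.slice(-overlap) + "\n\n" + para;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Because Lao documents typically do include paragraph breaks, splitting on blank lines is the one boundary signal that survives the lack of spaces and sentence punctuation.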
The vectorized text chunks are stored in a searchable state. We use Supabase's pgvector extension.
There are three reasons we chose Supabase pgvector.
The key point in table design is including the language code in the metadata. When searching only Lao-language knowledge, filter with language = 'lo'; when searching across all languages, remove the filter — this switching can be achieved with a single line in a SQL WHERE clause.
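A sketch of what that one-line switch looks like. The table and column names (`documents`, `content`, `embedding`, `metadata`) are illustrative assumptions; `<=>` is pgvector's cosine-distance operator.

```typescript
// Illustrative schema: documents(id, content, embedding vector, metadata jsonb).
// "<=>" is pgvector's cosine-distance operator; smaller means more similar.
function knowledgeSearchSql(laoOnly: boolean): string {
  // Lao-only vs. cross-language search differs by a single WHERE clause.
  const where = laoOnly ? "WHERE metadata->>'language' = 'lo'" : "";
  return `
    SELECT id, content, 1 - (embedding <=> $1) AS similarity
    FROM documents
    ${where}
    ORDER BY embedding <=> $1
    LIMIT 50
  `.trim();
}
```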
Vector search alone—simply "returning results in order of similarity"—is not accurate enough. In Lao, embedding quality is not as high as in English, making it easy for low-relevance chunks to appear near the top of results.
Our pipeline filters in two stages:
Why retrieve as many as 50 results? With Lao embeddings, a chunk that should rank first can end up buried at position 20. The root cause is a synonym problem: searching for "ສິນເຊື່ອ" may fail to surface chunks containing "ເງິນກູ້" in the top results. Casting a wide net and then correcting the order through reranking results in fewer missed retrievals.
The search results are passed to an LLM to generate responses in Lao. To enhance the user experience, streaming delivery via SSE (Server-Sent Events) is adopted.
Streaming reduces TTFT (Time to First Token) to approximately 0.8 seconds. When input tokens increase due to context injection in RAG, waiting for the full response to generate takes 5–10 seconds; with streaming, the first characters begin appearing within 1 second.
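The SSE wire format itself is simple: each message is a `data:` line followed by a blank line. A minimal encoder might look like the following (the `[DONE]` sentinel is a common convention, not a requirement of the spec):

```typescript
// Frame one LLM text delta as a Server-Sent Events message.
// SSE messages are "data: <payload>\n\n"; JSON-encoding the payload
// keeps embedded newlines in Lao text from breaking the framing.
function sseEvent(delta: string): string {
  return `data: ${JSON.stringify({ delta })}\n\n`;
}

// Pipe an async stream of LLM deltas into SSE frames.
async function* toSse(deltas: AsyncIterable<string>): AsyncIterable<string> {
  for await (const delta of deltas) yield sseEvent(delta);
  yield "data: [DONE]\n\n"; // end-of-stream sentinel (a common convention)
}
```

JSON-encoding the payload matters more than it looks: Lao responses often contain literal newlines, and an unescaped newline inside a `data:` payload would terminate the SSE event early.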
Lao-Specific System Prompt Design:
For a Lao chatbot, the following instructions must be explicitly included in the system prompt.
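Our production prompt is not reproduced here, but a hedged sketch of the kind of instructions described in this article (language lock, context-only answering, an explicit "I don't know" path, preserving official names) might look like this. The wording is illustrative.

```typescript
// A sketch of a Lao-focused RAG system prompt (illustrative wording).
function buildSystemPrompt(context: string): string {
  return [
    "You are a support assistant for users in Laos.",
    "Always respond in Lao (ພາສາລາວ) only, even if the retrieved context is in English.",
    "Answer strictly based on the context below; do not add outside knowledge.",
    "If the context does not contain the answer, say you do not know.",
    "Keep official names such as ທະນາຄານແຫ່ງ ສປປ ລາວ (Bank of the Lao PDR) exactly as written.",
    "",
    "Context:",
    context,
  ].join("\n");
}
```

The first instruction counters the English pull described earlier; the last one addresses the proper-noun degradation we saw with institution names.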
A chatbot that merely "works" is not enough. Response quality constantly fluctuates as knowledge is added and LLMs are updated.
We use Mastra Evaluations to automatically measure the following 3 metrics in real time.
| Metric | What It Measures | Passing Threshold |
|---|---|---|
| Answer Relevancy | Whether the response accurately answers the user's question | 0.7 or above |
| Faithfulness | Whether the response is faithful to the retrieved content (no hallucinations) | 0.8 or above |
| Retrieval Precision | Whether the chunks retrieved by search are relevant to the question | 0.6 or above |
A separate LLM from the main response generation model is used for evaluation — to avoid the self-scoring bias of "grading your own answers."
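The thresholds in the table translate directly into a quality gate. A small sketch (metric key names are our own; Mastra's scorers return numeric scores that can be checked like this):

```typescript
// Passing thresholds from the table above.
const THRESHOLDS = {
  answerRelevancy: 0.7,
  faithfulness: 0.8,
  retrievalPrecision: 0.6,
} as const;

type Scores = Record<keyof typeof THRESHOLDS, number>;

// Return the metrics that fall below their threshold (empty = pass).
function failingMetrics(scores: Scores): string[] {
  return (Object.keys(THRESHOLDS) as (keyof typeof THRESHOLDS)[]).filter(
    (m) => scores[m] < THRESHOLDS[m],
  );
}
```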

Here are the failure patterns we experienced in projects for Laos that we would like to share. All of them could have been avoided if we had known about them in advance.
What happened: When splitting a Lao-language operations manual using the sentence strategy, the results were polarized: either the entire document became a single chunk, or it was fragmented byte by byte. The cause was straightforward. Lao text rarely uses sentence-ending punctuation equivalent to a full stop, and since the sentence splitter relies on terminal punctuation as delimiters, it cannot find any split points in Lao text.
How it was fixed: We switched to the recursive strategy. Chunking was based on line breaks and paragraph separators, with a chunk size of 512 tokens and an overlap of 50 tokens. Because Lao documents typically include line breaks between paragraphs, this approach enables practical splitting.
What happened: Because the knowledge base contained documents in both English and Japanese, English chunks frequently matched queries written in Lao, causing the LLM to respond in English. The root cause was that no response-language instruction had been included in the system prompt.
This issue was concentrated in the transitional period when we were making our internal knowledge (English-only until two years ago) multilingual. In sections where Lao knowledge was not yet available, only English chunks would be retrieved, pulling the LLM toward responding in English.
How it was fixed: Two countermeasures were implemented simultaneously. (1) An explicit instruction was added to the system prompt: "Always respond in the same language as the user." (2) Language codes were added to the knowledge base metadata, and a filter was implemented to prioritize Lao chunks when the query is in Lao.
What happened: In a multi-turn Lao conversation, retaining 20 messages worth of context caused input tokens to exceed 15,000, resulting in a noticeable degradation in response quality. Lao consumes 2–3 times more tokens than English. While 20 messages is manageable in English, in Lao it consumes the majority of the context window.
This left no room to inject RAG context (5 chunks, approximately 3,000–5,000 tokens), causing retrieved results to be truncated and leading to an increase in responses that "ignored the knowledge base."
How we fixed it: We changed the conversation history retention to be controlled by a token count limit rather than a message count. In our system, we cap recent conversation history at 8,000 tokens to ensure sufficient headroom for RAG context. For Lao, this effectively corresponds to approximately 8–10 messages.
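The fix can be sketched as a token-budget trimmer. The token estimate below is an assumption for illustration (real code should use the model's tokenizer); the logic of walking backward from the most recent message is the point.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Rough token estimate: an assumption for this sketch. With BPE
// tokenizers, Lao often yields one token per character or worse,
// so a real implementation should use the model's own tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(new TextEncoder().encode(text).length / 3);
}

// Keep the most recent messages whose combined estimate fits the budget.
function trimHistory(history: Message[], maxTokens = 8000): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break; // budget exhausted: drop older turns
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

With the 8,000-token cap, the same function keeps roughly 8–10 Lao messages but far more English ones, which is exactly the asymmetry a fixed message count cannot express.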

A chatbot is not something you build once and call done. The addition of knowledge, LLM version upgrades, and shifts in user question patterns all cause response quality to fluctuate constantly.
At our company, we use Mastra's Live Evaluations feature to score production chat responses in real time. Since the scoring runs asynchronously from response generation, it has no impact on the user-perceived latency.
The three metrics we measure — Answer Relevancy, Faithfulness, and Retrieval Precision — are stored in a database, allowing us to track trends over time. A sudden drop in scores serves as a signal indicating knowledge gaps or changes in model behavior.
Scoring all requests inflates the cost of the evaluation LLM. We vary the sampling rate by environment.
| Environment | Sampling Rate | Reason |
|---|---|---|
| Development | 100% | Evaluate all responses and use them for prompt tuning |
| Staging | 30–50% | Quality gate before release |
| Production | 10% | Keep costs down while capturing trends |
Even at 10% in production, 1,000 requests per day yields 100 scored data points per day. Reviewing the weekly averages is more than sufficient to understand quality trends.
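The per-environment sampling decision is a one-line gate. A sketch with the rates from the table (the 40% staging value picks a midpoint of the 30–50% band; the injectable `rng` parameter is there purely to make the gate testable):

```typescript
// Sampling rates per environment, following the table above.
const SAMPLING_RATE: Record<string, number> = {
  development: 1.0,
  staging: 0.4, // midpoint of the 30-50% band
  production: 0.1,
};

// Decide whether to score this response. Unknown environments score nothing.
function shouldScore(env: string, rng: () => number = Math.random): boolean {
  return rng() < (SAMPLING_RATE[env] ?? 0);
}
```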
The improvement approach differs depending on the metric.
Retrieval Precision is low (< 0.6): The search is returning irrelevant chunks. Consider adjusting the chunk size (reducing from 512 → 256 tokens), adding Lao language knowledge, and reviewing metadata filters. For Lao, reducing chunk size often leads to improvement.
Faithfulness is low (< 0.8): The LLM is supplementing information not found in the search results. Address this by strengthening the constraints in the system prompt or lowering the temperature (0.3 → 0.1). Note that hallucinations are more likely to occur in Lao than in English, as the LLM has less training data for the Lao language.
Answer Relevancy is low (< 0.7): The response is misaligned with the user's question. First, check Retrieval Precision. If there are no issues on the retrieval side, work on improving the prompt (specifying the answer format, instructing rephrasing of questions).

In fact, the less knowledge there is, the easier it is to feel the effect. Since an LLM on its own has almost no expertise in the Lao language, simply adding 50 FAQ entries worth of knowledge can dramatically change the accuracy of responses. The effect of "adding 50 entries to a zero-knowledge state" is far greater than that of "adding 50 entries to a state with 1,000 entries."
This can be handled by storing documents in each language in the knowledge base and filtering by language code in the metadata. Our system processes four languages—Japanese, English, Thai, and Lao—within a single RAG pipeline. Since Thai and Lao have closely related writing systems and grammar, they can be handled with the same chunking strategy and embedding model.
FAQ response level (100 knowledge items or fewer) can be built in 2–4 weeks. Monthly infrastructure and API costs are approximately $200–500. A full-scale chatbot including business system integration takes 2–3 months, with a monthly cost estimate of $500–2,000. The majority of costs come from LLM API usage fees, and it is important to factor into the budget that Lao language incurs 2–3 times the cost of English due to token inflation.

Lao is a language that major LLMs around the world are "almost completely unfamiliar with." That is precisely why knowledge augmentation via RAG makes a decisive difference. By combining the right chunking strategy, improved retrieval accuracy through reranking, and automated quality monitoring, it is possible to achieve reliable responses even in low-resource languages.
Let's recap the five steps covered in this article.
1. Select an LLM by testing it on Lao-language questions yourself (we settled on Claude Sonnet via AWS Bedrock).
2. Split the knowledge base with a Lao-appropriate chunking strategy (recursive, 512-token chunks, 50-token overlap).
3. Store the vectors in Supabase pgvector, with language codes in the metadata.
4. Retrieve wide (50 chunks), then rerank down to the top 5.
5. Generate streaming responses via SSE and monitor quality automatically with Mastra Evaluations.
AI security measures for chatbots are covered in detail in "AI Security Checklist for Lao Businesses." For those looking to understand the full picture of AI adoption, "AI Adoption Guide for Lao Businesses" is also recommended.
Yusuke Ishihara
Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).