
Model Editing is a technique for rewriting only specific knowledge in a large language model (LLM) without retraining the entire model. Its greatest feature is the ability to correct inaccurate facts, outdated information, or personal data that needs to be removed—without rebuilding the model from scratch. This article is aimed at corporate practitioners and engineers considering AI governance and personal data protection compliance. It explains, step by step, the fundamental concepts of model editing, representative methods, differences from Machine Unlearning, and practical applications related to PDPA and the "right to be forgotten." By the end, readers should have a clear framework for deciding which approach to use and in which situations within their own organizations.
The essence of model editing lies not in retraining billions of parameters in their entirety, but in targeting and rewriting only the "small subset of weights" responsible for the knowledge in question. First, let us cover the differences from retraining, the relationship with fine-tuning, and the background behind the growing interest in this technology.
"The model keeps returning the old company name"—when faced with a problem like this, the first solution that comes to mind is likely to fix the training data and retrain the model. However, retraining a large-scale model requires enormous computational resources, time, and specialized personnel, making it impractical to run just to correct a single fact. Model editing overturns this assumption. By identifying the weights that correspond to the knowledge to be updated and rewriting only that portion, corrections can sometimes be completed in a matter of seconds to minutes. If retraining is like "reprinting an entire textbook," model editing is closer to "correcting only the relevant passage on the relevant page." That said, the two approaches are not mutually exclusive: a practical division of labor is to use retraining when errors are widespread, and editing when the correction is limited to a specific fact.
Fine-tuning and LoRA are similar to model editing in that they both "adjust a model after the fact," but they differ in purpose and granularity. Fine-tuning adjusts the weights of the entire model (or broad layers) using a certain volume of data, and is a method for changing the model's "overall behavior"—including writing style, domain adaptation, and task specialization. LoRA inserts small low-rank matrices to achieve similar adaptation while drastically reducing the number of parameters that need to be adjusted. While these approaches "change the model's tendencies across a broad surface," model editing is closer to the image of "changing a single point—a specific fact." For example, the goal is to rewrite individual pieces of knowledge, such as a person's job title, while minimizing the impact on other knowledge. For this reason, unlike fine-tuning, which involves preparing a dataset and running gradient descent, model editing also encompasses methods that directly compute and update the target weights. A useful starting point is: use model editing when you need to correct a small number of facts precisely, and fine-tuning when you want to improve the model's overall behavior.
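To make the claim about LoRA's parameter reduction concrete, the following is a small arithmetic sketch. The layer size (4096 by 4096) and the rank (8) are illustrative assumptions, not figures taken from any particular model, and the function names are hypothetical.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a low-rank pair: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8  # assumed, illustrative sizes
    full = full_param_count(d_in, d_out)
    lora = lora_param_count(d_in, d_out, rank)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Under these assumptions, the low-rank pair trains 65,536 parameters for a layer that holds 16,777,216, which is why LoRA is described as adjusting tendencies cheaply rather than rewriting a single fact.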
Initially, errors in models were often thought of as something that could simply be "fixed in the next version's retraining cycle." However, once deployment began, situations arose one after another where that approach could not keep up. Corporate names and regulations change frequently, misinformation and hallucinations are discovered on a case-by-case basis, and requests to delete personal data create a need to remove "that specific piece of information, right now." Waiting for retraining delays the response and makes the cost hard to justify. It is from these on-the-ground demands that interest has grown in model editing, which enables specific knowledge to be corrected quickly and at low cost. Regulatory changes also form part of the backdrop. The EU AI Act entered into force in August 2024, with its obligations applying in stages over the following years, and there is a broader trend requiring companies to be accountable for AI outputs and to manage their data. As a result, the ability to "control the knowledge a model holds" is itself becoming a governance requirement. Driven by both technical convenience and regulatory necessity, model editing is becoming a practical, operational concern.
If model editing is "rewriting knowledge," then Machine Unlearning is a technique focused primarily on "erasing learned information." The two overlap in some respects, but differ in purpose, evaluation approach, and regulatory connection. Here we clarify the definitions, criteria for choosing between them, and their relationship to the "right to be forgotten."
Machine Unlearning refers to a technique that removes the influence of data previously used in training from a model, bringing it closer to the state it would be in had that data never been learned in the first place. The goal is not so much the correction of knowledge itself as the removal of traces left by specific data. For example, it addresses requests to retroactively remove only the influence of personal information belonging to a particular user, or of text that poses rights-related issues, when such data was included in the training set. There are broadly two approaches to achieving this. One is an "exact" approach, in which data is partitioned during training and only the portion containing the target data is retrained. The other is an "approximate" approach, in which the influence of the target data is counteracted, for example by taking gradient steps that increase the loss on that data (gradient ascent). The exact method offers higher certainty but is costly, while the approximate method is lightweight but cannot easily guarantee complete erasure. The deciding factor is how much certainty is required.
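The "exact" approach can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "model" for each partition is just the mean of its values, records are assigned to partitions by `record_id % num_shards`, and the class and function names are hypothetical. The point is only that forgetting a record retrains one partition, and the result matches training from scratch without that record.

```python
from statistics import mean


def shard_of(record_id: int, num_shards: int) -> int:
    """Deterministic partition assignment (an assumed, simplistic rule)."""
    return record_id % num_shards


def train_shard(records: list) -> float:
    """Toy stand-in for training: the 'model' is the mean of the shard's values."""
    values = [value for _, value in records]
    return mean(values) if values else 0.0


class ShardedEnsemble:
    def __init__(self, data: list, num_shards: int):
        self.num_shards = num_shards
        self.shards = [[] for _ in range(num_shards)]
        for record_id, value in data:
            self.shards[shard_of(record_id, num_shards)].append((record_id, value))
        self.models = [train_shard(shard) for shard in self.shards]
        self.retrain_count = 0

    def predict(self) -> float:
        """Aggregate the per-shard models (here: a plain average)."""
        return mean(self.models)

    def forget(self, record_id: int) -> None:
        """Drop one record and retrain only the shard that contained it."""
        idx = shard_of(record_id, self.num_shards)
        self.shards[idx] = [r for r in self.shards[idx] if r[0] != record_id]
        self.models[idx] = train_shard(self.shards[idx])
        self.retrain_count += 1
```

In a real system each shard model would be a trained network and retraining a shard would still be expensive, which is the cost side of the trade-off described above.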
The choice between the two is determined by "what you want to achieve." If you want to replace a fact with the correct value, model editing is appropriate; if you want to erase the influence of specific data, Machine Unlearning is the better fit.
| Perspective | Model Editing | Machine Unlearning |
|---|---|---|
| Primary purpose | Rewrite knowledge to the correct value | Remove the influence of training data |
| Typical use case | Correcting misinformation or outdated facts | Deleting personal information; removing rights-infringing data |
| Evaluation focus | Success of the rewrite and side effects | Completeness of erasure and residual risk |
| Regulatory connection | Output accuracy and accountability | Right to be forgotten; data protection |
In practice, the two are often used in combination. For instance, when handling personal information, a realistic staged approach might involve first replacing the relevant knowledge with a harmless value via editing, and then considering the removal of training traces. Rather than assuming either approach alone is sufficient, it is advisable to use them complementarily according to the objective.
The "right to be forgotten" is codified as the right to have personal data erased upon the individual's request, most notably in the GDPR's right to erasure (Article 17). Similar provisions are spreading through personal data protection laws in various countries, including Thailand's PDPA (differences across countries are summarized in ASEAN Data Protection Laws: A Comprehensive Comparison of 4 Countries). The challenge is that while deleting a record from a database is straightforward, removing information that has been absorbed into a model's weights through training is technically difficult. This is precisely why Machine Unlearning is expected to serve as a technical vehicle for addressing such requests. Furthermore, frameworks such as the EU AI Act, which entered into force in 2024 and whose obligations apply in stages, require companies to ensure transparency and risk management in AI systems, making it increasingly important to be able to explain "what was included in the training data and how it can be controlled." Aligning in advance between legal/compliance and technical teams on the feasibility of responding to deletion requests is the most direct way to reduce regulatory risk.
Model editing techniques can be broadly divided into "local editing," which directly computes and rewrites target weights, and "meta-learning-based" approaches, which train the model on how to perform edits. Since both manipulate model weights, they can only be applied to open-weight models that can be obtained and run independently. The choice of technique varies depending on the volume of knowledge to be edited and the level of precision required. We will examine representative techniques and the key dimensions for comparing them.
Local editing methods take the approach of identifying "which weights in the model store a given fact" and rewriting precisely those locations. The representative technique, ROME (Rank-One Model Editing), treats the MLP in an intermediate layer of a Transformer as a form of associative memory, where the input (subject) serves as the key and the output (fact) serves as the value. It uses causal analysis to identify which layer is responsible for the target fact, then applies a minimal update to that layer's weights so that "inserting the key produces the new value," thereby rewriting the knowledge. MEMIT (Mass-Editing Memory in a Transformer) extends this concept to support simultaneous editing across multiple layers and thousands of entries. The approach is analogous to replacing specific entries in an associative memory, and its advantage is that it avoids disturbing the whole model the way retraining would. On the other hand, if the target fact cannot be correctly localized, unintended areas may be affected, making the precision of edit localization a critical factor in quality.
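The idea that "inserting the key produces the new value" can be sketched as a rank-one update to a weight matrix. This is a deliberately simplified illustration, not the ROME implementation: it assumes the key and target value vectors are already known and treats the key statistics as an identity matrix, whereas ROME itself uses a covariance estimate of keys and obtains the target value through optimization. The function names are hypothetical.

```python
def matvec(W: list, x: list) -> list:
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W: list, key: list, new_value: list) -> list:
    """Return W' = W + (new_value - W @ key) key^T / (key^T key).

    After the update, W' @ key equals new_value, and any input orthogonal
    to key produces the same output as before.
    """
    current = matvec(W, key)
    residual = [target - cur for target, cur in zip(new_value, current)]
    norm_sq = sum(k * k for k in key)
    if norm_sq == 0:
        raise ValueError("key must be a non-zero vector")
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
    key = [1.0, 0.0, 0.0]          # assumed representation of the subject
    new_value = [5.0, -1.0]        # assumed representation of the new fact
    W_edited = rank_one_edit(W, key, new_value)
    print(matvec(W_edited, key))               # the edited association
    print(matvec(W_edited, [0.0, 1.0, 0.0]))   # an orthogonal key is unchanged
```

The sketch also shows why localization matters: inputs that are not orthogonal to the edited key will see their outputs shift, which is the side effect the paragraph above warns about.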
Meta-learning-based methods train an auxiliary network to learn the rule itself—"how should weights be rewritten to fix knowledge without causing adverse effects elsewhere?" The representative example, MEND, pre-trains a small network (an editor) that converts gradients obtained from the cases to be edited into appropriate weight updates for the base model. During actual editing, this editor instantly outputs the update, eliminating the need to redo heavy computation for each individual edit. MALMEN extends this idea for large-scale editing, enabling many edits to be applied together efficiently. The essential difference from local editing is that while local editing involves "manually designing how to rewrite," meta-learning-based methods "learn how to rewrite from data." Although upfront training costs are required, once prepared, edits can be applied iteratively with ease, and the approach tends to be more effective as the scale of editing grows.
When selecting a method, the key is to evaluate not only editing accuracy, but also side effects (spillover to unrelated knowledge) and operational costs together.
| Perspective | Local Editing (ROME / MEMIT) | Meta-Learning-Based (MEND / MALMEN) |
|---|---|---|
| Upfront preparation | None to minimal | Pre-training of an editor is required |
| Small-scale editing | Well-suited | Feasible, but preparation cost is relatively high |
| Large-scale editing | Handled by MEMIT | Handled by MALMEN |
| Side effect control | Depends on localization accuracy | Depends on training quality |
A common challenge across all methods is whether "while the edited knowledge is corrected, side effects on surrounding knowledge or text generation capability are avoided." For this reason, before deploying in production, it is essential to include an evaluation step that checks not only the edited knowledge, but also whether "related knowledge that was not edited" has been preserved. Since numerical performance varies depending on public benchmarks and model configurations, validation on your own target model is indispensable.
Model editing and Machine Unlearning can serve as practical implementation tools for addressing personal data deletion requests and AI governance requirements at a realistic cost, without relying on retraining. However, these techniques alone are not sufficient—they must be designed in conjunction with operational workflows and evaluation frameworks. The following examines key points for practical use from three perspectives.
To prepare for personal data deletion requests, it is important to establish a consistent end-to-end workflow that covers the steps before and after the technical operation.
Since the optimal method varies depending on the nature of the request, embedding decision criteria for "which method to use in which case" into the workflow helps reduce inconsistency in responses.
"Can your company's AI correct or delete the knowledge it holds when necessary?" From an AI governance perspective, the ability to answer this question is itself becoming a requirement. The EU AI Act, which entered into force in 2024 with obligations phased in over several years, requires transparency, data management, and risk mitigation mechanisms for high-risk AI applications, making it increasingly difficult to allow models to continue producing incorrect or inappropriate information. Model editing and Machine Unlearning can serve as concrete means of demonstrating that "outputs can be controlled" and that "deletion requests can be addressed technically" in response to these requirements. What matters is positioning these not as ad hoc correction tools, but as part of a governance framework. Only when the design encompasses the full operational rules (who decides on edits and by what criteria, how they are recorded, and how they are verified) can the technology function as a substantive basis for regulatory compliance. For an overview of building such a framework, the AI Governance Framework Guide for Companies Expanding into ASEAN is also a useful reference.
When it comes to hallucination countermeasures, Retrieval-Augmented Generation (RAG) — which supplies the model with accurate information — is typically the first approach mentioned (implementation best practices are covered in detail in How to Improve RAG Accuracy). However, when a model has strongly memorized an incorrect fact internally, providing correct information from the outside may still cause it to be pulled back toward that outdated knowledge. This is where model editing plays a complementary role. By rewriting a specific incorrect fact to the correct value at the level of the model's internal knowledge, it becomes easier to prevent the recurrence of errors without relying on retrieval. That said, attempting to eliminate all hallucinations through editing alone is not realistic. An effective division of responsibilities would be: use RAG to reference up-to-date external information in domains where facts change frequently, and use model editing to correct fixed facts that rarely change but are repeatedly stated incorrectly. Editing is a tool that works specifically on "fixed errors in internal knowledge," and rather than being mutually exclusive with RAG, combining the two can raise the overall effectiveness of hallucination countermeasures.
Model editing and Machine Unlearning are not cure-alls, and it is a misconception to believe that "once an operation is performed, information is completely erased or fixed." Deploying these techniques without accurately estimating residual risks and side effects creates the most dangerous situation — one where you merely believe you have addressed the problem. Two common misconceptions are worth highlighting.
Assuming that "deleted information will never surface again" is precarious. In approximate Machine Unlearning in particular, even if the influence of the target data is weakened, there is no guarantee that all traces are completely eliminated. It has been noted that cleverly phrased questions or inferences drawn from related information can partially reproduce content that was supposed to have been removed. This stems from the fact that knowledge is not stored in a single location but is distributed throughout the model. In practice, therefore, a probabilistic perspective — not "was it erased?" but "to what degree has the risk been reduced?" — is indispensable. During verification, it is not sufficient to simply confirm that the content does not reappear in response to straightforward questions; paraphrased queries and checks using peripheral information should also be used to assess the degree of residual presence. When certainty is the top priority, it may be appropriate to choose a more rigorous method that includes retraining, even at greater cost.
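The verification idea above, probing with paraphrases and peripheral questions rather than only the direct question, can be sketched as a simple leak-rate measurement. The `model` argument is a hypothetical callable that maps a prompt to generated text, and substring matching is an assumed, crude detection rule; a real check would likely need fuzzier matching and many more probes.

```python
from typing import Callable, Sequence


def residual_leak_rate(
    model: Callable[[str], str],
    probes: Sequence[str],
    sensitive_terms: Sequence[str],
) -> float:
    """Fraction of probes whose output still contains any sensitive term."""
    if not probes:
        return 0.0
    terms = [term.lower() for term in sensitive_terms]
    leaks = 0
    for probe in probes:
        output = model(probe).lower()
        if any(term in output for term in terms):
            leaks += 1
    return leaks / len(probes)
```

Reporting this as a rate rather than a pass/fail flag matches the probabilistic framing: the question is how far the risk has been reduced, measured against a probe set that includes indirect phrasings.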
Edits do not always remain confined to the targeted fact; their effects can extend to related knowledge. For example, rewriting a person's organizational affiliation may destabilize answers to other questions about that same person. This occurs because knowledge within the model is interconnected. It is easy to initially assume that "fixing just one point is sufficient," but in practice, confirming that nothing around the corrected area has broken should be considered part of the same task. To limit cascading effects, it is effective to keep the scope of edits to the necessary minimum and to verify behavior after editing using test cases that include related knowledge. When applying a large number of edits at once, the edits themselves can interfere with one another and degrade quality, so it is safer to apply them in batches and insert an evaluation step at each stage. The right mindset is to accept that side effects cannot be reduced to zero, and instead manage them by defining an acceptable range.
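Applying edits in batches with an evaluation step between them can be sketched as a small control loop. The `apply_batch` and `evaluate` callables are hypothetical placeholders for whatever editing method and evaluation set are actually in use, and the threshold is an assumed value representing the acceptable range of side effects.

```python
from typing import Any, Callable, Sequence, Tuple


def apply_in_batches(
    model: Any,
    edits: Sequence[Any],
    apply_batch: Callable[[Any, Sequence[Any]], Any],
    evaluate: Callable[[Any], float],
    batch_size: int,
    min_score: float,
) -> Tuple[Any, int]:
    """Apply edits batch by batch, stopping before quality drops below min_score.

    Returns the last model that passed evaluation and the number of edits
    it contains.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    applied = 0
    for start in range(0, len(edits), batch_size):
        batch = edits[start:start + batch_size]
        candidate = apply_batch(model, batch)
        if evaluate(candidate) < min_score:
            break  # keep the last model that was within the acceptable range
        model = candidate
        applied += len(batch)
    return model, applied
```

Because the loop keeps the last passing model, a failed batch can be inspected or split further without having to undo earlier, already validated edits.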
The first step is not selecting an advanced technique, but clarifying your use case: "Which knowledge do I want to correct, why, and with what level of certainty?" Once the objective is defined, the appropriate methods and evaluation approaches will naturally narrow down. Here is a way to get started while validating incrementally.
The first thing to tackle is identifying the situations within your organization where you want to "correct or remove knowledge," and prioritizing them. For example, correcting frequently changing facts, responding to personal data deletion requests, fixing recurring fixed hallucinations — the right approach varies depending on the objective. The decision criteria are simple: three questions — "Do I want to correct it to the right value, or eliminate its influence entirely?", "Is this a small number of cases or a large volume?", and "How much certainty is required?" If it is a limited factual correction where fixing the output is sufficient, start with small-scale model editing; if it involves personal data where certainty is critical for regulatory compliance, begin by examining more rigorous methods. Trying to address everything at once will cause evaluation to break down, so the practical approach is to select one use case with significant impact that is easy to validate, assess its effectiveness and side effects there, and then expand horizontally. The choice of your first target becomes the foundation for subsequent operational design.
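The three questions can be written down as a small decision helper so that the criteria are applied consistently. This is only a sketch of the rules of thumb in this article; the function name, the allowed argument values, and the returned labels are all hypothetical, and a real policy would likely include more inputs such as model access and legal review.

```python
def choose_approach(goal: str, volume: str, certainty: str) -> str:
    """Map the three questions to a starting point.

    goal:      "correct" (replace with the right value) or "erase" (remove influence)
    volume:    "small" or "large"
    certainty: "normal" or "high"
    """
    if goal not in {"correct", "erase"}:
        raise ValueError("goal must be 'correct' or 'erase'")
    if volume not in {"small", "large"}:
        raise ValueError("volume must be 'small' or 'large'")
    if certainty not in {"normal", "high"}:
        raise ValueError("certainty must be 'normal' or 'high'")

    if goal == "erase":
        if certainty == "high":
            return "rigorous unlearning (retrain the affected data partitions)"
        return "approximate unlearning, verified with paraphrased probes"
    if volume == "large":
        return "large-scale editing (e.g. MEMIT or MALMEN)"
    return "small-scale local editing (e.g. ROME)"
```

For example, `choose_approach("correct", "small", "normal")` points to small-scale local editing, which matches the recommendation to start with one limited, easily validated use case.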
As a prerequisite, these techniques can only be applied to open-weight models whose weights can be downloaded and run locally — such as Llama, Mistral, Qwen, and Gemma (OpenAI has also released weights for some models). Models with non-public weights, such as Claude or the APIs for GPT and Gemini, cannot be targeted for editing because their weights are not externally accessible. For such closed models, knowledge correction takes the form of RAG, system prompts, or fine-tuning functionality provided by the vendor. In terms of tooling, open-source frameworks such as EasyEdit — which allow you to experiment with multiple editing methods in one place — are well-suited as a starting point for small-scale validation, as they enable comparison of methods such as ROME, MEMIT, and MEND within a common framework. More important than the tools, however, is the design of evaluation metrics. The quality of model editing is fundamentally measured across three dimensions: whether the targeted knowledge was successfully rewritten (reliability), whether the new answer is consistently produced even with paraphrased queries (generalization), and whether knowledge and generation capabilities unrelated to the edit are preserved (locality). Preparing an evaluation set tailored to your target model and use case, and measuring these dimensions every time, is a prerequisite for safely moving toward production use.
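The three evaluation dimensions can be sketched as a small scoring function. The interface is an assumption: `base` and `edited` are hypothetical callables that map a prompt to generated text, answers are matched by case-insensitive substring, and locality is approximated as "the edited model gives the same output as the base model on unrelated prompts." This is not the API of EasyEdit or any other framework.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def hit_rate(model: Model, prompts: Sequence[str], answer: str) -> float:
    """Fraction of prompts whose output contains the expected answer."""
    if not prompts:
        return 0.0
    hits = sum(answer.lower() in model(p).lower() for p in prompts)
    return hits / len(prompts)


def evaluate_edit(
    base: Model,
    edited: Model,
    edit_prompt: str,
    new_answer: str,
    paraphrases: Sequence[str],
    unrelated_prompts: Sequence[str],
) -> Dict[str, float]:
    """Score one edit on reliability, generalization, and locality."""
    unchanged = sum(edited(p) == base(p) for p in unrelated_prompts)
    locality = unchanged / len(unrelated_prompts) if unrelated_prompts else 0.0
    return {
        "reliability": hit_rate(edited, [edit_prompt], new_answer),
        "generalization": hit_rate(edited, paraphrases, new_answer),
        "locality": locality,
    }
```

Running the same function after every edit, against an evaluation set built for the target model and use case, gives a comparable record over time instead of one-off spot checks.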
Q. Should I use model editing or fine-tuning?
Model editing is suited for precisely correcting a small number of facts, while fine-tuning is better when you want to change the overall behavior of the model, such as writing style or domain adaptation. The two are not mutually exclusive; the basic approach is to use each according to your objective.
Q. Can Machine Unlearning completely delete training data?
Complete deletion cannot be guaranteed. Approximate methods in particular may leave traces, so it is more appropriate to evaluate outcomes in terms of "how much the risk has been reduced" rather than "whether it has been erased." If certainty is the top priority, it will be necessary to consider rigorous methods that include retraining.
Q. Is it ready to use for PDPA compliance or the right to be forgotten?
The technology is entering a practical stage, but it does not fulfill regulatory requirements on its own. It only functions as substantiation for regulatory compliance when combined with an operational workflow covering everything from receiving deletion requests to verification and record-keeping, along with an evaluation framework for measuring side effects.
Yusuke Ishihara
Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).