
An LLM internal guideline is a system of usage rules and operational procedures established by an organization to leverage generative AI in a safe and controlled manner.
"Shadow AI" — the use of generative AI in the workplace without authorization — carries the risk of unintentionally transmitting confidential information and personal data to external parties. In June 2023, the Personal Information Protection Commission published an advisory regarding the use of generative AI services, making an organizational response an urgent priority.
This article is intended for IT systems personnel and DX promotion officers, and walks through guideline development step by step.
Conclusion: As the unauthorized use of generative AI tools in the workplace continues to grow, organizations without guidelines are left completely exposed to the risk of information leaks.
With the widespread adoption of generative AI, "shadow AI" (employees using LLMs for work at their own discretion) is rapidly increasing. The following sections take a deeper look at the actual risks involved and the reasons why guidelines are necessary.
It is easy to think, "It's just individuals using it on their own, so it's not a significant risk" — but in reality, shadow AI is the least visible entry point for information leaks.
Shadow AI is a collective term for AI tools used by employees in their work without the approval of the IT department or management. Typical cases include pasting internal customer data or contract text directly into a free-plan LLM service, or entering highly confidential meeting notes into a personally contracted cloud AI.
There are four main channels through which leaks can occur. First, pasting internal documents directly into a prompt transmits their contents to the service provider's servers. Second, some services have terms of use that allow input data to be used for model improvement under free or low-cost plans. Third, because these tools are used through personal accounts, access to the accounts and the business data entered into them may remain even after an employee leaves the organization. Fourth, since the organization has no visibility into who entered what, investigating the cause of a leak after the fact becomes extremely difficult.
The advisory published by the Personal Information Protection Commission in June 2023 also pointed out that entering personal information into generative AI services may constitute a third-party provision under the Act on the Protection of Personal Information.
The deeper problem lies in how difficult it is for the damage to surface. Employees use AI out of a desire for convenience, with no malicious intent, and it is not uncommon for a leak to go undetected for months.
When the use of generative AI spreads without established guidelines, an organization is exposed to three major risks.
① Unintentional external transmission of confidential information: Cases have been reported in which employees paste confidential business data directly into prompts. Some cloud-based LLM services, particularly on free or low-cost plans, may use input data for model improvement by default, creating a risk that confidential information is inadvertently shared externally. If the input contains personal information, legal liability may arise, as indicated by the GDPR and by the advisory that Japan's Personal Information Protection Commission published in June 2023.
② Compliance violations and lack of audit trails: When the tools used and the content entered are not recorded, tracing the root cause after an incident becomes extremely difficult. In regulated industries such as healthcare, finance, and legal services, failure to meet the record-keeping obligations stipulated by HIPAA or GDPR can lead to audit findings and penalties. Even in industries with relatively lenient regulations, the absence of an audit trail poses a management risk from an internal control perspective.
③ Decision-making errors caused by over-reliance on generated outputs: LLMs can produce information that appears factual but is incorrect (hallucinations). Without guidelines, no verification procedures for generated outputs are established, increasing the risk that misinformation is used directly in external documents or decision-making.
These three risks do not occur in isolation — they tend to compound one another, amplifying the overall impact.
Conclusion: Developing guidelines begins with understanding the current situation and establishing a promotion structure. Drafting guidelines without this foundation leads to policies that exist only on paper.
Before beginning the drafting process, it is necessary to identify which AI tools are being used across which departments within the organization, and to clarify who is responsible and who will lead the initiative. Skipping this preparatory stage makes it easy to produce policies that are disconnected from reality.
Taking an inventory of shadow AI is like counting stock you cannot see. Tools you cannot track cannot be governed by policy, so the starting point is to make visible what is actually being used.
Key Methods for Conducting an Inventory
How to Organize Inventory Results
Information collected through the investigation should be compiled into a list using the following items.
| Item | Example of Recorded Content |
|---|---|
| Tool name / Provider | Service name, vendor country / location |
| Department / Number of users | Department name, estimated number of users |
| Primary use case | Document creation, code generation, translation, etc. |
| Type of input data | Publicly available information only / includes internal materials, etc. |
| Contract / Approval status | Formally contracted / Personal use (free tier), etc. |
It is not uncommon for an inventory to reveal multiple tools that are "not officially approved but widely used on the ground."
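As a minimal sketch of how the inventory table above could be held in a structured form, the following Python snippet models one row per tool and lists the tools that are in use without a formal contract. The field names, the status strings, and the sample tools (`ExampleChat` and so on) are hypothetical placeholders, not real products or a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    """One row of the inventory table (field names are illustrative)."""
    tool_name: str
    provider: str
    department: str
    estimated_users: int
    primary_use: str
    input_data_type: str
    approval_status: str


def unapproved_in_use(records):
    """Return tools without a formal contract, largest user base first."""
    flagged = [r for r in records if r.approval_status != "formally contracted"]
    return sorted(flagged, key=lambda r: r.estimated_users, reverse=True)


# Hypothetical sample data for illustration only.
inventory = [
    ToolRecord("ExampleChat", "Example Vendor A", "Sales", 12,
               "document creation", "includes internal materials",
               "personal use (free tier)"),
    ToolRecord("ExampleCode", "Example Vendor B", "Engineering", 30,
               "code generation", "includes internal materials",
               "formally contracted"),
    ToolRecord("ExampleTranslate", "Example Vendor C", "Support", 25,
               "translation", "public information only",
               "personal use (free tier)"),
]

for record in unapproved_in_use(inventory):
    print(record.tool_name, record.department, record.estimated_users)
```

Sorting by estimated user count is one possible way to decide which unapproved tools to evaluate first; an organization might equally prioritize by the type of input data.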
Guideline development tends to be viewed as "a job for the IT department alone," but in practice it is prone to becoming a formality unless legal, HR, and business departments are also involved. When responsibilities remain ambiguous, even a completed policy often fails to take root on the ground — and many cases have been reported where shadow AI could not be suppressed as a result.
The first step in building a governance structure is to clearly define the stakeholders who should be involved, organized by role. In general, the following four roles are required:
For each role, clarifying "who decides what, who executes what, and who reports what" using a RACI matrix (Responsible / Accountable / Consulted / Informed) helps reduce misalignment in later stages.
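A RACI matrix can also be checked mechanically. The sketch below assumes the common convention that each task has exactly one Accountable party and at least one Responsible party; the task names and the department labels (IT, Legal, HR, Business) are illustrative assumptions rather than a required structure.

```python
RACI_CODES = {"R", "A", "C", "I"}


def validate_raci(matrix):
    """Return a list of problems found in a {task: {role: code}} matrix."""
    problems = []
    for task, assignments in matrix.items():
        codes = list(assignments.values())
        unknown = sorted(set(codes) - RACI_CODES)
        if unknown:
            problems.append(f"{task}: unknown code(s) {unknown}")
        # Convention assumed here: exactly one Accountable per task.
        if codes.count("A") != 1:
            problems.append(
                f"{task}: expected exactly one Accountable, found {codes.count('A')}"
            )
        # Convention assumed here: at least one Responsible per task.
        if "R" not in codes:
            problems.append(f"{task}: no Responsible party")
    return problems


# Hypothetical matrix; the second task deliberately lacks an Accountable role.
matrix = {
    "Approve new AI tool": {"IT": "R", "Legal": "C", "HR": "I", "Business": "A"},
    "Handle policy violation": {"IT": "C", "Legal": "C", "HR": "R", "Business": "I"},
}

for problem in validate_raci(matrix):
    print(problem)
```

Running a check like this whenever the matrix is edited is one way to catch tasks that nobody is accountable for before they cause misalignment.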
It is also important to set deadlines and schedule regular meetings for the guideline development project. If "we'll get to it when things slow down" keeps getting repeated, AI adoption on the ground will outpace the policy. Set up a monthly review meeting on the calendar from the outset, and establish a mechanism for sharing progress and issues.
"Which AI tools are we actually allowed to use?" — Creating a state where frontline staff can answer this question on their own is the starting point for guideline development.
By simultaneously defining which tools are permitted and what data may be entered into them, inconsistent judgment on the ground can be prevented. Specifically, AI tools used within the organization are classified into three tiers — "Approved," "Conditional," and "Prohibited" — and rules are clarified by combining these tiers with the confidentiality level of input data. The following sections examine the classification criteria and how to define confidentiality levels in detail.
"Can I use this tool or not — and who am I even supposed to ask?" — The moment a frontline employee feels this way, trust in the guidelines is lost. Ambiguous boundaries are the single greatest breeding ground for shadow AI.
An effective solution is to manage AI tools using a three-tier classification system.
① Approved: Tools for which the IT department has completed a security evaluation and that all employees may use by following the prescribed procedures. Prerequisites include the ability to capture usage logs and confirmation of the data processing region.
② Conditional: A category in which use is permitted only for specific departments, use cases, or data types. Restrictions such as "do not enter confidential information" or "business text only, containing no personal information" must be explicitly stated. In many cases, manager approval is required before use, and the process is operated in conjunction with a request form.
③ Prohibited: Tools where data may be used for model training, or where the processing region is unknown, are placed in the prohibited category. Publishing a list of reasons for prohibition makes it easier for frontline staff to understand and accept why a tool cannot be used.
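The three tiers described above can be expressed as a small decision function. This is a sketch under stated assumptions: the `ToolAssessment` fields are hypothetical names for the criteria mentioned in the text, and treating a tool that has not yet been reviewed as prohibited is a conservative default chosen for illustration, not something the tiers themselves dictate.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    APPROVED = "Approved"
    CONDITIONAL = "Conditional"
    PROHIBITED = "Prohibited"


@dataclass
class ToolAssessment:
    """Evaluation results for one tool (field names are illustrative)."""
    security_review_done: bool
    usage_logs_available: bool
    processing_region_known: bool
    may_train_on_input: bool
    restrictions: tuple = ()  # e.g. ("no personal information",)


def classify(tool):
    """Map an assessment onto the three tiers."""
    # Training on input data or an unknown processing region means prohibited.
    if tool.may_train_on_input or not tool.processing_region_known:
        return Tier.PROHIBITED
    # Assumption: tools that have not passed review stay prohibited for now.
    if not (tool.security_review_done and tool.usage_logs_available):
        return Tier.PROHIBITED
    # Reviewed tools with explicit restrictions are conditional.
    if tool.restrictions:
        return Tier.CONDITIONAL
    return Tier.APPROVED


print(classify(ToolAssessment(True, True, True, False)).value)
print(classify(ToolAssessment(True, True, True, False,
                              ("no personal information",))).value)
```

Keeping the prohibition checks first mirrors the idea that a published reason for prohibition should be easy to point to.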
Three key points are required to make the classification work effectively:
Classifying data by confidentiality level is, in essence, the task of "determining how securely something should be locked based on what's inside." Treating everything the same way creates the risk of sensitive information being passed to AI without adequate protection.
Setting confidentiality levels in three tiers leads to easier day-to-day operation on the ground:
The Personal Information Protection Commission's advisory notice published in June 2023 also calls for caution regarding the input of personal data into generative AI services. Where GDPR applies — such as in transactions with European counterparts — even stricter restrictions are required.
When establishing handling rules, the following points should be explicitly documented:
Confidentiality level classifications should not be treated as a one-time exercise — they must be reviewed whenever new types of data emerge. Combining them with the approval workflow design covered in the next step ensures that the classification rules remain functional rather than becoming a mere formality.
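Combining tool tiers with confidentiality levels amounts to a lookup table. The sketch below is illustrative only: the level names (`public`, `internal`, `confidential`) and the permitted combinations are assumptions made for the example, and each organization would substitute its own definitions.

```python
# Hypothetical policy table: which data levels may be entered per tool tier.
ALLOWED_LEVELS = {
    "Approved": {"public", "internal"},
    "Conditional": {"public"},
    "Prohibited": set(),
}


def may_enter(tool_tier, data_level):
    """Return True if data of this level may be entered into a tool of this tier."""
    # Unknown tiers are treated as allowing nothing.
    return data_level in ALLOWED_LEVELS.get(tool_tier, set())


for tier in ("Approved", "Conditional", "Prohibited"):
    for level in ("public", "internal", "confidential"):
        print(f"{tier:12} {level:13} {may_enter(tier, level)}")
```

Because the rules live in a single table, adding a newly identified data type or adjusting a tier during a review means changing one entry rather than rewriting logic.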
Even after tool classification is complete, if the process from application to approval remains unclear, frontline staff will be left uncertain about how to proceed — ultimately inviting shadow AI use under the logic of "let's just try it and see." Designing an approval flow that explicitly defines who applies for what and how becomes the next challenge. This section walks through a risk assessment checklist and an approval flow template, in that order.
By going through a risk assessment step before introducing a new AI tool into your organization, the subsequent approval flow design becomes significantly smoother.
At a minimum, your assessment checklist should cover the following areas:
Data Handling
Security & Compliance
Vendor Reliability
Business Fit
If the purpose of introducing a tool is limited to "assisting individual work," a simplified assessment (i.e., confirming the checklist above) is sufficient. However, if the tool is to be integrated into business processes that handle customer data or confidential information, a joint review by the IT department and the legal/compliance department should be considered mandatory.
Checklist outcomes should be recorded as one of three verdicts — "Approved," "Conditionally Approved (with specified restrictions)," or "Rejected" — and logged together with the rationale for the decision.
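One way to derive the three verdicts and log them with their rationale is sketched below. It assumes each checklist item is scored as `pass`, `mitigable` (acceptable with restrictions), or `fail`; those labels, the item names, and the tool name are hypothetical choices for illustration.

```python
from datetime import date


def decide(results):
    """Derive a verdict from {checklist item: 'pass' | 'mitigable' | 'fail'}."""
    statuses = set(results.values())
    if "fail" in statuses:
        return "Rejected"
    if "mitigable" in statuses:
        return "Conditionally Approved"
    return "Approved"


def log_entry(tool, results, rationale, decided_on):
    """Build a record that keeps the verdict together with its rationale."""
    return {
        "tool": tool,
        "verdict": decide(results),
        "rationale": rationale,
        "decided_on": decided_on.isoformat(),
        "results": dict(results),
    }


# Hypothetical assessment of a hypothetical tool.
entry = log_entry(
    "ExampleChat",
    {"data handling": "pass", "security": "mitigable", "vendor": "pass"},
    "Usage logs are only available on the enterprise plan.",
    date(2025, 1, 15),
)
print(entry["verdict"], "-", entry["rationale"])
```

Storing the per-item results alongside the verdict makes it easier to revisit a decision later when the vendor's terms change.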
"I submitted an approval request, but it stalled because no one knew who the decision-maker was" — this is an experience many frontline staff will recognize. Unless decision-makers, alternate approvers, and deadlines are explicitly defined from the design stage, an approval flow risks becoming a mere formality.
Items to include in the template are as follows:
Three operational points are especially important to keep in mind:
Conclusion: Guidelines only become effective when paired with initial response procedures and log management for when incidents occur.
When an information breach caused by AI occurs, an unclear response procedure tends to allow the damage to spread. This section explains how to design an initial response flow and how to manage the audit log lifecycle.
Immediately after an incident occurs, the instinct is often to think "let's identify the cause before reporting" — but in practice, immediate reporting conducted in parallel with containment is more effective at preventing the damage from spreading. While time is spent investigating the root cause, the risk of leaked information propagating externally continues to grow.
The following four-step process is recommended for initial response:
The response flow should be documented in advance so that the same procedures can be followed regardless of personnel changes. Regularly conducting incident response drills (tabletop exercises) tends to improve decision-making speed when an actual emergency occurs.
Audit log design is easier to organize when approached along three axes: "what to store, where to store it, and how long to store it."
The minimum required elements for logs to be captured are as follows:
Retention periods vary depending on the confidentiality level of the data involved. For operation logs related to personal information or trade secrets, a retention period of at least one year is recommended, while three to six months is often sufficient for logs of general business use. When GDPR or the Act on the Protection of Personal Information applies, logs that contain personal data should also not be kept longer than their purpose requires, so an upper limit on retention needs to be defined as well.
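The retention rule above can be sketched as a simple date calculation. The category names and the exact day counts (365 days and 180 days) are assumptions picked from within the ranges mentioned in the text; actual values depend on each organization's legal and contractual requirements.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, per log category.
RETENTION_DAYS = {
    "personal_or_trade_secret": 365,  # at least one year
    "general": 180,                   # upper end of three to six months
}


def purge_date(logged_on, category):
    """Return the first date on which a log entry may be deleted."""
    return logged_on + timedelta(days=RETENTION_DAYS[category])


def is_expired(logged_on, category, today):
    """Return True once the retention period for this entry has elapsed."""
    return today >= purge_date(logged_on, category)


print(purge_date(date(2025, 1, 1), "general"))
print(is_expired(date(2025, 1, 1), "personal_or_trade_secret", date(2025, 7, 1)))
```

An unknown category raises `KeyError` here on purpose, so that logs are never silently purged under an undefined rule.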
For review cycles, a two-tier structure consisting of routine reviews (monthly) and trigger-based reviews (upon incident occurrence) is the most operationally manageable. Monthly reviews should track trends in the number of anomaly detection flags, and if a threshold is exceeded, the cause should be identified. For incident response, establishing procedures in advance to preserve and analyze the relevant session logs within 72 hours will help prevent gaps in the response.
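Both review tiers lend themselves to small helper functions. The sketch below assumes monthly flag counts are already aggregated into a dictionary and that the threshold is set by the organization; the sample numbers and function names are hypothetical.

```python
from datetime import datetime, timedelta


def months_over_threshold(monthly_flags, threshold):
    """Return the months whose anomaly flag count exceeds the threshold."""
    return [month for month, count in monthly_flags.items() if count > threshold]


def preservation_deadline(detected_at, hours=72):
    """Return the time by which the relevant session logs should be preserved."""
    return detected_at + timedelta(hours=hours)


# Hypothetical monthly anomaly flag counts.
flags = {"2025-01": 3, "2025-02": 11, "2025-03": 4}
print(months_over_threshold(flags, threshold=10))
print(preservation_deadline(datetime(2025, 3, 3, 9, 0)))
```

The routine review only surfaces months that need a closer look; identifying the cause still requires someone to examine the underlying logs.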
The choice of log storage location—whether on internal servers or external cloud—also affects the design of access controls. When using cloud-based LLM services, it is essential to verify the vendor's log retention policy before signing a contract and to ensure alignment with your organization's own policy.
Conclusion: Guidelines only function once they have been established and all employees understand and can put them into practice.
Even if tool classification and approval workflows are in place, shadow AI is likely to recur if frontline staff lack a proper understanding. It is essential to design training content tailored to each role and job type, and to have a mechanism in place for continuously measuring how well the content has been retained.
Requiring all employees to undergo the same uniform training is akin to mandating the same surgical training for every doctor—content that does not match a person's role is unlikely to stick, and is equally unlikely to translate into practical application. Training design should begin by categorizing employees around the axis of "who uses AI, and for what purpose."
The following are key design points to address for each role and job type:
For general employees (end users)
For managers and team leaders
For IT and information security personnel
For executives and decision-makers
In terms of training format, short e-learning modules (approximately 15–20 minutes) are suitable for general employees, while hands-on workshops are appropriate for IT personnel.
Post-training comprehension tests tend to become focused on "raising pass rates," but to genuinely measure retention, question designs that can confirm behavioral change are more effective. It is important to shift from testing memorized knowledge to assessing judgment in practical work situations.
When designing comprehension checks, use the following three perspectives as your framework:
Timing of implementation also matters. By re-administering the same questions three months after the initial training—not just immediately afterward—you can continuously monitor knowledge retention. If scores have declined, this is a signal that periodic reminder measures (such as emails or team notifications) are needed.
Additionally, make it clear to employees that the results of comprehension checks will not be used for individual performance evaluations, but rather as indicators for policy improvement. If employees feel the results will be used to evaluate them, there is a risk they will not answer honestly.
Keeping the number of questions to ten or fewer, with a target response time of around ten minutes, reduces the burden on frontline staff and makes it easier to administer the checks on an ongoing basis.
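Tracking retention between the initial check and the follow-up three months later can be done on aggregate scores only, which fits the principle of not evaluating individuals. The sketch below is an assumption-laden example: the one-point tolerance (`max_drop`) and the ten-point scale are illustrative values, not recommendations from the text.

```python
from statistics import mean


def needs_reminder(initial_scores, followup_scores, max_drop=1.0):
    """Return True if the average score fell by more than max_drop points."""
    drop = mean(initial_scores) - mean(followup_scores)
    return drop > max_drop


# Hypothetical team-level scores out of ten; no individual is identified.
initial = [8, 9, 7]
after_three_months = [6, 7, 6]
print(needs_reminder(initial, after_three_months))
```

Comparing averages rather than individual results keeps the check focused on whether the policy and training are working.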
Conclusion: The most common failures in guideline development fall into two patterns—"overly strict restrictions" and "set it and forget it." Both invite non-compliant behavior in the field, making it essential to build countermeasures into the design from the outset.
Each of these is discussed in detail in the sections below.
It's easy to assume that "stricter policies mean greater security," but the reality on the ground often moves in the opposite direction.
When approval processes are overly cumbersome or tools necessary for day-to-day work are banned across the board, employees tend to conclude that it's easier to simply use those tools without going through official channels. This becomes a breeding ground for shadow AI — generative AI tools used outside of organizational oversight.
Anyone working on the front lines has likely felt at some point: "If my request is just going to get rejected anyway, it's faster to use it quietly."
The following are typical patterns in which overly strict policies backfire:
In these situations, unvetted tools that carry greater risk end up being used in place of officially managed ones, which paradoxically increases the likelihood of information leaks.
An effective countermeasure is to provide a "safe escape route" rather than defaulting to outright prohibition. Expanding the lineup of approved tools and streamlining the application process creates an environment where employees are more likely to choose the official route. The strictness of a policy only works when balanced against the practical needs of those on the ground.
Guidelines are not something you create once and consider finished — they are living documents that must be continuously updated. This is because the environment is constantly changing: laws and regulations are revised, new AI tools emerge, and internal incidents occur.
Much like a periodic vehicle inspection, which checks whether a car is "safe to operate in its current state," guidelines need an institutionalized, regular review cycle.
Key points for designing a revision cycle are as follows:
After each revision, it is also essential to conduct a briefing and training session to ensure that all employees have a clear understanding of the updated content.
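A review cycle that combines a fixed interval with event-based triggers can be sketched as follows. The trigger names correspond to the kinds of change mentioned above (revised laws, new tools, internal incidents), while the identifiers themselves and the 180-day default interval are assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Hypothetical event labels that should prompt an out-of-cycle review.
TRIGGERS = {"law_revised", "new_tool", "incident"}


def review_due(last_review, today, events, interval_days=180):
    """Return True if a trigger event occurred or the interval has elapsed."""
    triggered = bool(TRIGGERS & set(events))
    overdue = today - last_review >= timedelta(days=interval_days)
    return triggered or overdue


print(review_due(date(2025, 1, 10), date(2025, 3, 1), []))
print(review_due(date(2025, 1, 10), date(2025, 3, 1), ["incident"]))
print(review_due(date(2025, 1, 10), date(2025, 8, 1), []))
```

Treating the interval as a parameter lets the same check serve organizations that review every six months and those that review annually.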
Q1. Do small and medium-sized businesses need guidelines too?
Even with a small number of employees, a policy is necessary as soon as generative AI tools are being used for business purposes. The smaller the organization, the greater the relative impact when an information leak occurs. Starting by summarizing just two points — "which tools are permitted" and "what data must not be entered" — on a single page allows you to achieve a minimum level of risk management while keeping the burden low.
Q2. How should guidelines be integrated with existing information security policies?
Appending an "addendum on AI usage" to your existing policy minimizes the cost of review and approval. Reusing existing data classification standards and data handling rules as-is, then adding AI-specific items as incremental additions — such as restrictions on prompt input and prohibitions on secondary use of outputs — tends to be more readily accepted by staff on the ground.
Q3. How should the use of free AI tools on personal devices be handled?
Using personal devices and personal accounts for business purposes is a textbook example of shadow AI. It is recommended that guidelines explicitly state: "Do not enter business data into personally owned accounts or unapproved services," and that procedures for handling violations be defined alongside this rule. Since leading with prohibition alone tends to increase workaround usage, it is important to present approved alternative tools at the same time.
Looking back at what has been covered in this article, it becomes clear that developing guidelines is not an exercise in "imposing restrictions," but rather the work of building a foundation on which employees can confidently and fully leverage AI.
The five steps — starting with understanding the current situation and establishing a governance structure, followed by the three-tier tool classification, approval workflow design, incident response and log management, and AI literacy training — do not function as independent measures, but as a connected, end-to-end process. If any one element is missing, the whole breaks down: having a classification system without a workflow leads to it becoming a formality, and conducting training without incident response procedures in place means the organization will be unable to function when something actually goes wrong.
Frameworks such as the "AI Guidelines for Business (Version 1.0)" published by Japan's Ministry of Economy, Trade and Industry and Ministry of Internal Affairs and Communications in April 2024, and the NIST AI RMF 1.0, also place the balance between risk management and the promotion of AI adoption at their core. When developing your own organization's guidelines, a practical approach is to use these as reference points while tailoring the content to fit your industry, organizational size, and the confidentiality level of your data.
Finally, a completed set of guidelines is not something to be "stored away" — it is something to be "grown." Given the pace at which generative AI technology is evolving and the rate at which the regulatory landscape is shifting, building a review cycle of every six months to a year into the design from the very beginning is what leads to stable, long-term operation.
Chi
Majored in Information Science at the National University of Laos, where he contributed to the development of statistical software, building a practical foundation in data analysis and programming. He began his career in web and application development in 2021, and from 2023 onward gained extensive hands-on experience across both frontend and backend domains. At our company, he is responsible for the design and development of AI-powered web services, and is involved in projects that integrate natural language processing (NLP), machine learning, and generative AI and large language models (LLMs) into business systems. He has a voracious appetite for keeping up with the latest technologies and places great value on moving swiftly from technical validation to production implementation.