LLM Internal Guideline Development Guide: How to Create Operational Policies to Prevent Shadow AI Risks

June 16, 2026

What Are LLM Internal Guidelines? A Step-by-Step Guide to Drafting and Embedding Company AI Policy from Scratch

An LLM internal guideline is a system of usage rules and operational procedures established by an organization to leverage generative AI in a safe and controlled manner.

"Shadow AI," the use of generative AI in the workplace without authorization, carries the risk of unintentionally transmitting confidential information and personal data to external parties. In June 2023, Japan's Personal Information Protection Commission published an advisory on the use of generative AI services, making an organizational response an urgent priority.

This article is intended for IT systems personnel and DX promotion officers, and explains step by step how to draft such guidelines and embed them in day-to-day operations.

Why Do Organizations Need LLM Internal Guidelines Now?

Conclusion: As the unauthorized use of generative AI tools in the workplace continues to grow, organizations without guidelines are left completely exposed to the risk of information leaks.

With the widespread adoption of generative AI, "shadow AI," where employees use LLMs at their own discretion for work purposes, is rapidly increasing. The following sections take a deeper look at the actual risks involved and the reasons why guidelines are necessary.

The Reality of Information Leaks Caused by Shadow AI

It is easy to think, "It's just individuals using it on their own, so it's not a significant risk" — but in reality, shadow AI is the least visible entry point for information leaks.

Shadow AI is a collective term for AI tools used by employees in their work without the approval of the IT department or management. Typical cases include pasting internal customer data or contract text directly into a free-plan LLM service, or entering highly confidential meeting notes into a personally contracted cloud AI.

There are four main channels through which leaks can occur. First, pasting internal documents directly into a prompt transmits the unredacted data to the service provider's servers. Second, the terms of use of some services allow input data to be used for model improvement under free or low-cost plans. Third, because these tools are used through personal accounts, access to the data entered may remain with an employee even after they leave the organization. Fourth, since the organization has no visibility into who entered what, investigating the cause of a leak after the fact becomes extremely difficult.

The advisory published by the Personal Information Protection Commission in June 2023 also pointed out that entering personal information into generative AI services may constitute a third-party provision under the Act on the Protection of Personal Information.

The deeper problem lies in how difficult it is for the damage to surface. Employees use AI out of a desire for convenience, with no malicious intent, and it is not uncommon for a leak to go undetected for months.

Three Risks of Deploying AI Without Guidelines

When the use of generative AI spreads without established guidelines, an organization is exposed to three major risks.

① Unintentional external transmission of confidential information: Cases have been reported in which employees paste confidential business data directly into prompts. Many cloud-based LLM services may default to using input data for model improvement, creating a risk that confidential information is inadvertently shared externally. If the input contains personal information, legal liability may arise, as indicated by the GDPR and the advisory published by Japan's Personal Information Protection Commission in June 2023.

② Compliance violations and lack of audit trails: When the tools used and the content entered are not recorded, tracing the root cause after an incident becomes extremely difficult. In regulated industries such as healthcare, finance, and legal services, failure to meet the record-keeping obligations stipulated by HIPAA or GDPR can lead to audit findings and penalties. Even in industries with relatively lenient regulations, the absence of an audit trail poses a management risk from an internal control perspective.

③ Decision-making errors caused by over-reliance on generated outputs: LLMs can produce information that appears factual but is incorrect (hallucinations). Without guidelines, no verification procedures for generated outputs are established, increasing the risk that misinformation is used directly in external documents or decision-making.

These three risks do not occur in isolation — they tend to compound one another, amplifying the overall impact.

Prerequisites for Drafting Guidelines: Assessing the Current State and Building a Framework

Conclusion: Developing guidelines begins with understanding the current situation and establishing a promotion structure. Drafting guidelines without this foundation leads to policies that exist only on paper.

Before beginning the drafting process, it is necessary to identify which AI tools are being used across which departments within the organization, and to clarify who is responsible and who will lead the initiative. Skipping this preparatory stage makes it easy to produce policies that are disconnected from reality.

How to Audit Internal AI Tool Usage

Auditing shadow AI is like taking stock of inventory you cannot see. Tools you cannot track cannot be governed by policy, so the starting point is to make visible "what is actually being used."

Key Methods for Conducting an Inventory

Reviewing network logs: Extract access history to generative AI services from proxy and firewall logs. Focusing on the domains of major services such as ChatGPT, Copilot, and Gemini improves efficiency.
Survey-based investigation: Ask each department to report the AI tools they use for work, along with their purpose and frequency of use. Using an anonymous format tends to increase the reporting rate of unofficial usage.
Leveraging SaaS management tools: If a SaaS management solution is already in place, AI categories can be extracted from the application inventory.
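The log review described above can be sketched in a few lines of Python. This is a minimal illustration, not a finished tool: the log format (timestamp, user, URL separated by spaces) and the domain watch list are assumptions, and real proxy or firewall logs will need their own parsing.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative watch list (an assumption); extend it for your organization.
AI_DOMAINS = {"chatgpt.com", "copilot.microsoft.com", "gemini.google.com"}

def count_ai_access(log_lines):
    """Count requests per (user, domain) for hosts on the watch list.

    Assumes each line looks like: "<ISO timestamp> <user> <url>".
    """
    hits = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        user, url = parts[1], parts[2]
        host = urlparse(url).hostname or ""
        # Match the domain itself or any subdomain of it.
        for domain in AI_DOMAINS:
            if host == domain or host.endswith("." + domain):
                hits[(user, domain)] += 1
                break
    return hits

sample = [
    "2026-06-01T09:00:00 alice https://chatgpt.com/c/abc",
    "2026-06-01T09:05:00 bob https://example.com/",
    "2026-06-01T09:10:00 alice https://gemini.google.com/app",
]
for (user, domain), count in sorted(count_ai_access(sample).items()):
    print(user, domain, count)
```

Aggregating by user and domain rather than listing raw requests keeps the result small enough to cross-check against the survey responses.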

How to Organize Inventory Results

Information collected through the investigation should be compiled into a list using the following items.

Item	Example of Recorded Content
Tool name / Provider	Service name, vendor country / location
Department / Number of users	Department name, estimated number of users
Primary use case	Document creation, code generation, translation, etc.
Type of input data	Publicly available information only / includes internal materials, etc.
Contract / Approval status	Formally contracted / Personal use (free tier), etc.

It is not uncommon for an inventory to reveal multiple tools that are "not officially approved but widely used on the ground."
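If the inventory is kept in a structured form, the "widely used but not approved" tools can be surfaced automatically. The sketch below mirrors the table items as a record type; the field names, status strings, and sample rows are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AIToolRecord:
    tool_name: str
    provider: str
    department: str
    estimated_users: int
    primary_use: str
    input_data_type: str   # e.g. "public only", "includes internal materials"
    approval_status: str   # e.g. "formally contracted", "personal use (free tier)"

def needs_review(records):
    """Return tools in use without a formal contract, largest user base first."""
    unapproved = [r for r in records if r.approval_status != "formally contracted"]
    return sorted(unapproved, key=lambda r: r.estimated_users, reverse=True)

# Hypothetical sample data.
inventory = [
    AIToolRecord("Tool A", "Vendor X", "Sales", 12, "document creation",
                 "includes internal materials", "personal use (free tier)"),
    AIToolRecord("Tool B", "Vendor Y", "Engineering", 30, "code generation",
                 "public only", "formally contracted"),
    AIToolRecord("Tool C", "Vendor Z", "HR", 40, "translation",
                 "includes internal materials", "personal use (free tier)"),
]

for record in needs_review(inventory):
    print(record.tool_name, record.department, record.estimated_users)
```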

Assigning Stakeholders and Defining Responsibilities

Guideline development tends to be viewed as "a job for the IT department alone," but in practice it is prone to becoming a formality unless legal, HR, and business departments are also involved. When responsibilities remain ambiguous, even a completed policy often fails to take root on the ground — and many cases have been reported where shadow AI could not be suppressed as a result.

The first step in building a governance structure is to clearly define the stakeholders who should be involved, organized by role. In general, the following four roles are required:

Owner (CISO or DX Promotion Lead): Holds decision-making authority and final approval over the entire policy.
Legal / Compliance: Reviews compliance with GDPR, the Act on the Protection of Personal Information, and other applicable regulations.
IT / Information Systems: Responsible for technical evaluation of tools, log management, and access control.
Business Unit Representative: Reflects frontline needs and supports the design of practical, effective rules.

For each role, clarifying "who decides what, who executes what, and who reports what" using a RACI matrix (Responsible / Accountable / Consulted / Informed) helps reduce misalignment in later stages.
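A RACI matrix can also be checked mechanically. The sketch below assumes two common conventions (exactly one Accountable and at least one Responsible per task); the task names and assignments are hypothetical examples, not a recommended allocation.

```python
# Hypothetical tasks and assignments, for illustration only.
RACI = {
    "Draft the policy":        {"Owner": "A", "Legal": "C", "IT": "R", "Business": "C"},
    "Review legal compliance": {"Owner": "A", "Legal": "R", "IT": "I", "Business": "I"},
    "Manage logs and access":  {"Owner": "A", "Legal": "I", "IT": "R", "Business": "I"},
}

def raci_problems(matrix):
    """Return descriptions of tasks that break the two conventions above."""
    problems = []
    for task, roles in matrix.items():
        codes = list(roles.values())
        if codes.count("A") != 1:
            problems.append(f"{task}: expected exactly one Accountable, found {codes.count('A')}")
        if "R" not in codes:
            problems.append(f"{task}: no Responsible role assigned")
    return problems

print(raci_problems(RACI))
```

An empty list means every task has a single decision-maker and at least one executor, which is the misalignment the matrix is meant to prevent.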

It is also important to set deadlines and schedule regular meetings for the guideline development project. If "we'll get to it when things slow down" keeps getting repeated, AI adoption on the ground will outpace the policy. Set up a monthly review meeting on the calendar from the outset, and establish a mechanism for sharing progress and issues.

Step 1: Classify Permitted AI Tools and Prohibited Actions

"Which AI tools are we actually allowed to use?" — Creating a state where frontline staff can answer this question on their own is the starting point for guideline development.

By simultaneously defining which tools are permitted and what data may be entered into them, inconsistent judgment on the ground can be prevented. Specifically, AI tools used within the organization are classified into three tiers — "Approved," "Conditional," and "Prohibited" — and rules are clarified by combining these tiers with the confidentiality level of input data. The following sections examine the classification criteria and how to define confidentiality levels in detail.

Three-Tier Classification: Approved, Conditional, and Prohibited

"Can I use this tool or not — and who am I even supposed to ask?" — The moment a frontline employee feels this way, trust in the guidelines is lost. Ambiguous boundaries are the single greatest breeding ground for shadow AI.

An effective solution is to manage AI tools using a three-tier classification system.

① Approved: Tools for which the IT department has completed a security evaluation and that all employees may use by following the prescribed procedures. Prerequisites include the ability to capture usage logs and confirmation of the data processing region.

② Conditional: A category in which use is permitted only for specific departments, use cases, or data types. Restrictions such as "do not enter confidential information" or "business text only, containing no personal information" must be explicitly stated. In many cases, manager approval is required before use, and the process is operated in conjunction with a request form.

③ Prohibited: Tools where data may be used for model training, or where the processing region is unknown, are placed in the prohibited category. Publishing a list of reasons for prohibition makes it easier for frontline staff to understand and accept why a tool cannot be used.

Three key points are required to make the classification work effectively:

Setting Data Confidentiality Levels and Handling Rules for Inputs

Classifying data by confidentiality level is, in essence, the task of "determining how securely something should be locked based on what's inside." Treating everything the same way creates the risk of sensitive information being passed to AI without adequate protection.

Setting confidentiality levels in three tiers leads to easier day-to-day operation on the ground:

Level 1 (Public Information): Marketing copy and product specifications already published externally. Input into external cloud-based LLMs is permitted.
Level 2 (Internal Use Only): Internal manuals, meeting minutes, unpublished financial data, etc. May only be used with an approved enterprise plan or in an on-premises environment.
Level 3 (Confidential / Personal Information): Customer personal data, contracts, personnel evaluation data, etc. Input into AI is prohibited in principle; anonymization and masking are mandatory.
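Combining the tool tier with the confidentiality level amounts to a small lookup. The sketch below is one possible encoding under stated assumptions: approved tools accept up to Level 2, conditional tools accept Level 1 only, and Level 3 data is never accepted as-is (masked data would be reclassified before the check). Adjust the mapping to your own policy.

```python
from enum import Enum, IntEnum

class Tier(Enum):
    APPROVED = "approved"
    CONDITIONAL = "conditional"
    PROHIBITED = "prohibited"

class Level(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3

# Highest level each tier may receive (an assumption for illustration).
# Prohibited tools have no entry, so nothing may be entered into them.
MAX_LEVEL = {
    Tier.APPROVED: Level.INTERNAL,
    Tier.CONDITIONAL: Level.PUBLIC,
}

def may_input(tier, level):
    """Return True if data of this level may be entered into a tool of this tier."""
    max_level = MAX_LEVEL.get(tier)
    return max_level is not None and level <= max_level

print(may_input(Tier.APPROVED, Level.INTERNAL))
print(may_input(Tier.CONDITIONAL, Level.INTERNAL))
```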

The Personal Information Protection Commission's advisory notice published in June 2023 also calls for caution regarding the input of personal data into generative AI services. Where GDPR applies — such as in transactions with European counterparts — even stricter restrictions are required.

When establishing handling rules, the following points should be explicitly documented:

A checklist procedure for verifying the confidentiality level before inputting data
Specific methods for anonymization and masking (e.g., replacing names with "Mr./Ms. ○○")
Whether output results may be shared externally, and the applicable retention period
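As an illustration of the masking step, the following sketch replaces email addresses, hyphenated phone numbers, and a supplied list of names before text is entered into a tool. The patterns are deliberately simple assumptions and will miss many real-world formats, so they should be treated as a starting point rather than a complete anonymizer.

```python
import re

# Simplified patterns (assumptions); real data needs broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{2,4}-\d{2,4}-\d{3,4}\b")

def mask(text, known_names=()):
    """Mask emails, phone numbers, and any names passed in by the caller."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in known_names:
        text = text.replace(name, "Mr./Ms. ○○")
    return text

original = "Contact Taro Yamada at taro@example.com or 03-1234-5678."
print(mask(original, ["Taro Yamada"]))
# prints: Contact Mr./Ms. ○○ at [EMAIL] or [PHONE].
```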

Confidentiality level classifications should not be treated as a one-time exercise — they must be reviewed whenever new types of data emerge. Combining them with the approval workflow design covered in the next step ensures that the classification rules remain functional rather than becoming a mere formality.

Step 2: Design the AI Tool Request and Approval Workflow

Even after tool classification is complete, if the process from application to approval remains unclear, frontline staff will be left uncertain about how to proceed — ultimately inviting shadow AI use under the logic of "let's just try it and see." Designing an approval flow that explicitly defines who applies for what and how becomes the next challenge. This section walks through a risk assessment checklist and an approval flow template, in that order.

Risk Assessment Checklist for Introducing New AI Tools

Running a risk assessment before introducing a new AI tool into your organization makes the subsequent approval flow design significantly smoother.

At a minimum, your assessment checklist should cover the following areas:

Data Handling

Is input data transmitted to the service provider's servers?
Is it possible to opt out of having data used for training?
Does the use case involve inputting personal or confidential information?

Security & Compliance

Does the tool hold third-party certifications such as SOC 2 Type II or ISO 27001?
Is the tool compatible with GDPR and the Personal Information Protection Commission's "Cautionary Notice Regarding the Use of Generative AI Services" (June 2023)?
Can a contractual Data Processing Agreement (DPA) be established?

Vendor Reliability

Are there risks related to the vendor's company size and service continuity?
Are incident notification obligations and SLAs in place?

Business Fit

Is the tool compatible with existing IT assets and authentication infrastructure (e.g., SSO)?
What is the scale of the intended department and the estimated number of users?

If the purpose of introducing a tool is limited to "assisting individual work," a simplified assessment (i.e., confirming the checklist above) is sufficient. However, if the tool is to be integrated into business processes that handle customer data or confidential information, a joint review by the IT department and the legal/compliance department should be considered mandatory.

Checklist outcomes should be recorded as one of three verdicts — "Approved," "Conditionally Approved (with specified restrictions)," or "Rejected" — and logged together with the rationale for the decision.
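One way to keep verdicts consistent is to derive them from the checklist answers and store the rationale alongside. The sketch below assumes a hypothetical split into "blocking" items (failure means rejection) and "advisory" items (failure means conditional approval); which items fall into which group is a policy decision, not something this article prescribes.

```python
# Hypothetical item keys and grouping, for illustration only.
BLOCKING = ["training_opt_out", "dpa_available"]
ADVISORY = ["third_party_cert", "sso_supported", "incident_sla"]

def assess(answers):
    """Map checklist answers (item -> bool) to a (verdict, rationale) pair.

    Missing answers are treated as failed, so gaps never pass silently.
    """
    failed_blocking = [k for k in BLOCKING if not answers.get(k, False)]
    if failed_blocking:
        return "Rejected", "Failed: " + ", ".join(failed_blocking)
    failed_advisory = [k for k in ADVISORY if not answers.get(k, False)]
    if failed_advisory:
        return "Conditionally Approved", "Restrictions needed for: " + ", ".join(failed_advisory)
    return "Approved", "All checklist items satisfied"

verdict, rationale = assess({
    "training_opt_out": True,
    "dpa_available": True,
    "third_party_cert": True,
    "sso_supported": False,
    "incident_sla": True,
})
print(verdict, "|", rationale)
```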

Approval Workflow Template and Operational Considerations

"I submitted an approval request, but it stalled because no one knew who the decision-maker was" — this is an experience many frontline staff will recognize. Unless decision-makers, alternate approvers, and deadlines are explicitly defined from the design stage, an approval flow risks becoming a mere formality.

Items to include in the template are as follows:

Applicant information: Name, department, and intended start date
Tool overview: Service name, provider, purpose of use, and target business process
Confidentiality level of input data: The classification defined in the previous section (Public / Internal Use Only / Confidential)
Risk assessment results: Checklist score and determination category
Approvers: A two-stage structure is recommended as the baseline — first-stage approval (direct line manager) → second-stage approval (IT department or security officer)

Three operational points are especially important to keep in mind:

Set approval deadlines: By specifying deadlines — such as first-stage approval within 5 business days and final approval within 10 business days of submission — you can prevent situations where staff begin using a tool informally while waiting for approval.
Conduct periodic continuation reviews: Even tools that have already been approved should be re-evaluated every six months to a year to check for changes in service specifications and any security incidents that may have occurred.
Close off emergency-use workarounds: To prevent "I just tried it out" situations, prohibit trial use before formal approval as a general rule. If testing is genuinely necessary, establish a policy that limits it strictly to a sandbox environment managed by the IT department.
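The approval deadlines above are easy to compute at submission time so that they can be shown on the request itself. This sketch counts Monday to Friday as business days and ignores public holidays, which is a simplifying assumption.

```python
from datetime import date, timedelta

def add_business_days(start, days):
    """Return the date that is `days` business days after `start` (weekends skipped)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current

submitted = date(2026, 6, 16)  # a Tuesday
first_stage_due = add_business_days(submitted, 5)
final_due = add_business_days(submitted, 10)
print(first_stage_due, final_due)
# prints: 2026-06-23 2026-06-30
```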

Step 3: Establish Incident Response Procedures and Log Management

Conclusion: Guidelines only become effective when paired with initial response procedures and log management for when incidents occur.

When an information breach caused by AI occurs, an unclear response procedure tends to allow the damage to spread. This section explains how to design an initial response flow and how to manage the audit log lifecycle.

Initial Response Flow for AI-Related Information Leaks

Immediately after an incident occurs, the instinct is often to think "let's identify the cause before reporting" — but in practice, immediate reporting conducted in parallel with containment is more effective at preventing the damage from spreading. While time is spent investigating the root cause, the risk of leaked information propagating externally continues to grow.

The following four-step process is recommended for initial response:

Detection & Primary Confirmation (0–30 minutes): Confirm any suspicious outputs, transmission logs, or access anomalies, and temporarily suspend access to the AI tools involved. The person in charge should not make decisions alone — notify their direct supervisor and the security officer immediately.
Scope of Impact Assessment (30 minutes–2 hours): Document what data was input, when, and into which tool. The Personal Information Protection Commission's "Cautionary Notice Regarding the Use of Generative AI Services" (June 2023) also recommends understanding the status of personal data inputs.
Containment & Evidence Preservation (2–4 hours): Preserve logs from the relevant sessions and disable the tool's API integrations and accounts. Save logs that serve as evidence to a separate medium to prevent deletion or overwriting.
Reporting & Notification (4 hours onward): Report to senior management, legal, and, where applicable, regulatory authorities, in accordance with internal policies. Note that where GDPR applies, a personal data breach must in principle be notified to the supervisory authority within 72 hours of becoming aware of it.
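The time windows in the four steps can be turned into concrete deadlines the moment an incident is detected. The sketch below treats the detection time as the moment of "becoming aware" for the 72-hour window, which is an assumption that should be confirmed with legal counsel.

```python
from datetime import datetime, timedelta

# End of each phase, measured from detection (taken from the steps above).
PHASES = [
    ("Detection & primary confirmation", timedelta(minutes=30)),
    ("Scope of impact assessment", timedelta(hours=2)),
    ("Containment & evidence preservation", timedelta(hours=4)),
]
GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)

def response_deadlines(detected_at):
    """Return a dict of phase name -> latest completion time."""
    deadlines = {name: detected_at + offset for name, offset in PHASES}
    deadlines["GDPR notification (where applicable)"] = detected_at + GDPR_NOTIFICATION_WINDOW
    return deadlines

detected = datetime(2026, 6, 16, 9, 0)
for name, due in response_deadlines(detected).items():
    print(f"{due:%Y-%m-%d %H:%M}  {name}")
```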

The response flow should be documented in advance so that the same procedures can be followed regardless of personnel changes. Regularly conducting incident response drills (tabletop exercises) tends to improve decision-making speed when an actual emergency occurs.

Designing the Audit Log Collection, Retention, and Review Cycle

Audit log design is easier to organize when approached along three axes: "what to store, where to store it, and how long to store it."

The minimum required elements for logs to be captured are as follows:

Operation logs: Tool name, user ID, date and time of use, session ID
Input logs: Prompt category classification (with confidentiality level tags)
Output logs: Hash value or summary of generated text (whether to store full text should be determined based on data volume considerations)
Anomaly detection logs: Details of operations that triggered a policy violation flag
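The log elements above can be captured in a single record. The sketch below stores a SHA-256 hash of the generated text instead of the text itself, so the log can later prove what was produced without holding the content. The field names and the `make_entry` helper are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditLogEntry:
    tool_name: str
    user_id: str
    session_id: str
    timestamp: str              # ISO 8601, UTC
    confidentiality_level: int  # 1-3, as defined in Step 1
    prompt_category: str
    output_sha256: str          # hash of the generated text, not the text itself
    policy_violation: bool = False

def make_entry(tool_name, user_id, session_id, level, category, output_text, violation=False):
    """Build a log entry, hashing the output so the full text is not stored."""
    digest = hashlib.sha256(output_text.encode("utf-8")).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    return AuditLogEntry(tool_name, user_id, session_id, now, level, category, digest, violation)

entry = make_entry("Tool A", "u123", "s-001", 2, "meeting summary", "Generated summary text")
print(json.dumps(asdict(entry), indent=2))
```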

Retention periods vary depending on the confidentiality level of the data involved. For operation logs related to personal information or trade secrets, a retention period of at least one year is recommended, while three to six months is often sufficient for logs of general business use. When GDPR or the Act on the Protection of Personal Information applies, logs that contain personal data should also not be kept longer than necessary for their purpose, so define a maximum retention period as well as a minimum.
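A retention rule of this kind reduces to a per-level lookup and a date comparison. The day counts below are assumptions chosen to match the ranges mentioned above (about six months for general logs, one year for logs involving Level 3 data); they are not legal guidance.

```python
from datetime import date, timedelta

# Assumed retention in days by confidentiality level (illustrative values).
RETENTION_DAYS = {1: 180, 2: 180, 3: 365}

def is_expired(logged_on, level, today):
    """Return True once a log entry is older than the retention period for its level."""
    return today - logged_on > timedelta(days=RETENTION_DAYS[level])

today = date(2026, 6, 16)
print(is_expired(date(2025, 10, 1), 1, today))  # general log, older than 180 days
print(is_expired(date(2025, 10, 1), 3, today))  # Level 3 log, still within one year
```

Running such a check on a schedule gives the purge step a clear, auditable rule instead of ad hoc deletion.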

For review cycles, a two-tier structure consisting of routine reviews (monthly) and trigger-based reviews (upon incident occurrence) is the most operationally manageable. Monthly reviews should track trends in the number of anomaly detection flags, and if a threshold is exceeded, the cause should be identified. For incident response, establishing procedures in advance to preserve and analyze the relevant session logs within 72 hours will help prevent gaps in the response.
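The monthly review can start from a simple count of anomaly flags per month compared against a threshold. In this sketch the flag dates are plain "YYYY-MM-DD" strings and the threshold value is a hypothetical example.

```python
from collections import Counter

def months_over_threshold(flag_dates, threshold):
    """Return {"YYYY-MM": count} for months whose flag count exceeds the threshold."""
    per_month = Counter(d[:7] for d in flag_dates)  # first 7 characters are "YYYY-MM"
    return {month: n for month, n in sorted(per_month.items()) if n > threshold}

# Hypothetical flag dates taken from anomaly detection logs.
flags = [
    "2026-04-03", "2026-04-18",
    "2026-05-02", "2026-05-09", "2026-05-15", "2026-05-28",
    "2026-06-01",
]
print(months_over_threshold(flags, threshold=3))
# prints: {'2026-05': 4}
```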

The choice of log storage location—whether on internal servers or external cloud—also affects the design of access controls. When using cloud-based LLM services, it is essential to verify the vendor's log retention policy before signing a contract and to ensure alignment with your organization's own policy.

Step 4: Design Company-Wide AI Literacy Training

Conclusion: Guidelines function only when all employees understand them and can put them into practice, not merely when they have been written.

Even if tool classification and approval workflows are in place, shadow AI is likely to recur if frontline staff lack a proper understanding. It is essential to design training content tailored to each role and job type, and to have a mechanism in place for continuously measuring how well the content has been retained.

Key Points for Designing Training Content by Role and Job Type

Requiring all employees to undergo the same uniform training is akin to mandating the same surgical training for every doctor—content that does not match a person's role is unlikely to stick, and is equally unlikely to translate into practical application. Training design should begin by categorizing employees around the axis of "who uses AI, and for what purpose."

The following are key design points to address for each role and job type:

For general employees (end users)

Communicating an overview of the guidelines and prohibited actions (such as the prohibition on entering confidential information)
How to operate approved tools and the reporting route when a problem occurs
Role-play formats using actual business scenarios are effective

For managers and team leaders

Perspectives for understanding and supervising subordinates' AI usage
Escalation procedures when a guideline violation is discovered
Criteria for risk judgment (what can be decided at the frontline level, and what must be escalated)

For IT and information security personnel

Evaluating tool technical specifications and verifying data flows during API integration
Practical procedures for log capture and audit response
How to use incident response checklists for initial action when an incident occurs

For executives and decision-makers

The company-wide scope of AI risk and where management responsibility lies
An overview of governance based on international frameworks such as NIST AI RMF 1.0
Roles in the approval process for guideline revisions

In terms of training format, short e-learning modules (approximately 15–20 minutes) are suitable for general employees, while hands-on workshops are appropriate for IT personnel.

How to Create Comprehension Checks That Measure Guideline Adoption

Post-training comprehension tests tend to become focused on "raising pass rates," but to genuinely measure retention, question designs that can confirm behavioral change are more effective. It is important to shift from testing memorized knowledge to assessing judgment in practical work situations.

When designing comprehension checks, use the following three perspectives as your framework:

Center the assessment on scenario-based questions: Prepare multiple-choice questions set in actual work situations, such as "You are about to enter an email containing a customer's personal information into an AI tool. How do you respond?"
Vary the questions by role and job type: Tailor the content to each person's responsibilities—for example, give managers questions on approval flow judgment, and give general employees questions on classifying input data
Analyze incorrect answer patterns: If errors are concentrated on a particular question, treat this as a signal that the explanation of that topic was insufficient, and use it to improve the training content
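Analyzing incorrect answer patterns can be as simple as computing an error rate per question. The sketch below assumes each response is a mapping from a question ID to whether the answer was correct; the IDs and the 50% cutoff are hypothetical.

```python
def error_rates(responses):
    """Return {question_id: share of incorrect answers} across all responses."""
    totals, wrong = {}, {}
    for response in responses:
        for question, correct in response.items():
            totals[question] = totals.get(question, 0) + 1
            if not correct:
                wrong[question] = wrong.get(question, 0) + 1
    return {q: wrong.get(q, 0) / totals[q] for q in totals}

def weak_topics(responses, cutoff=0.5):
    """Return question IDs whose error rate is at or above the cutoff."""
    return sorted(q for q, rate in error_rates(responses).items() if rate >= cutoff)

# Hypothetical results from four employees.
results = [
    {"Q1": True,  "Q2": False, "Q3": True},
    {"Q1": True,  "Q2": False, "Q3": False},
    {"Q1": True,  "Q2": True,  "Q3": True},
    {"Q1": False, "Q2": False, "Q3": True},
]
print(weak_topics(results))
# prints: ['Q2']
```

Because only aggregate rates are computed, the analysis supports improving the training content without ranking individual employees.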

Timing of implementation also matters. By re-administering the same questions three months after the initial training—not just immediately afterward—you can continuously monitor knowledge retention. If scores have declined, this is a signal that periodic reminder measures (such as emails or team notifications) are needed.

Additionally, make it clear to employees that the results of comprehension checks will not be used for individual performance evaluations, but rather as indicators for policy improvement. If employees feel the results will be used to evaluate them, there is a risk they will not answer honestly.

Keeping the number of questions to ten or fewer, with a target response time of around ten minutes, reduces the burden on frontline staff and makes it easier to administer the checks on an ongoing basis.

Common Failure Patterns and How to Avoid Them

Conclusion: The most common failures in guideline development fall into two patterns—"overly strict restrictions" and "set it and forget it." Both invite non-compliant behavior in the field, making it essential to build countermeasures into the design from the outset.

Each of these is discussed in detail in the sections below.

The Paradox: Overly Strict Policies Accelerate Shadow AI in the Workplace

It's easy to assume that "stricter policies mean greater security," but the reality on the ground often moves in the opposite direction.

When approval processes are overly cumbersome or tools necessary for day-to-day work are banned across the board, employees tend to conclude that it's easier to simply use those tools without going through official channels. This becomes a breeding ground for shadow AI — generative AI tools used outside of organizational oversight.

Anyone working on the front lines has likely felt at some point: "If my request is just going to get rejected anyway, it's faster to use it quietly."

The following are typical patterns in which overly strict policies backfire:

Blanket bans with no exceptions: Because no alternatives are offered, usage via personal devices and personal accounts increases
Excessively long approval lead times: Because approvals don't come in time for urgent tasks, tools are used without waiting for authorization
Unclear reasons for prohibition: When employees don't understand "why it's not allowed," trust in the rules themselves erodes

In these situations, unvetted tools that carry greater risk end up being used in place of officially managed ones, which paradoxically increases the likelihood of information leaks.

An effective countermeasure is to provide a "safe escape route" rather than defaulting to outright prohibition. Expanding the lineup of approved tools and streamlining the application process creates an environment where employees are more likely to choose the official route. The strictness of a policy only works when balanced against the practical needs of those on the ground.

Establishing a Revision Cycle to Prevent Guidelines from Becoming Shelf Documents

Guidelines are not something you create once and consider finished — they are living documents that must be continuously updated. This is because the environment is constantly changing: laws and regulations are revised, new AI tools emerge, and internal incidents occur.

Much like a periodic vehicle inspection, guidelines need a mechanism that regularly checks whether they are still "safe to operate in their current state," so it is important to institutionalize a regular review cycle.

Key points for designing a revision cycle are as follows:

Set a schedule for regular reviews: Establish review opportunities at least once a year, and ideally every six months. Since documents such as the "AI Guidelines for Business (Version 1.0)" published by the Ministry of Economy, Trade and Industry and the Ministry of Internal Affairs and Communications (April 2024) and the NIST AI RMF 1.0 are themselves revised and updated, monitoring the latest developments should be part of each regular review
Establish trigger-based ad hoc revisions: When an incident occurs, when a new tool is rolled out company-wide, or when laws or regulations change, consider revising immediately rather than waiting for the next scheduled review
Make revision history visible: Record version numbers and reasons for changes, and maintain a state in which all employees can check the differences at any time
Collect feedback from the field: Set up a channel each quarter to receive improvement suggestions from front-line staff, and continuously verify the effectiveness of the guidelines

After each revision, it is also essential to conduct a briefing and training session to ensure that all employees have a clear understanding of the updated content.

Frequently Asked Questions (FAQ)

Q1. Do small and medium-sized businesses need guidelines too?

Even with a small number of employees, a policy is necessary as soon as generative AI tools are being used for business purposes. The smaller the organization, the greater the relative impact when an information leak occurs. Starting by summarizing just two points — "which tools are permitted" and "what data must not be entered" — on a single page allows you to achieve a minimum level of risk management while keeping the burden low.

Q2. How should guidelines be integrated with existing information security policies?

Appending an "addendum on AI usage" to your existing policy minimizes the cost of review and approval. Reusing existing data classification standards and data handling rules as-is, then adding AI-specific items as incremental additions — such as restrictions on prompt input and prohibitions on secondary use of outputs — tends to be more readily accepted by staff on the ground.

Q3. How should the use of free AI tools on personal devices be handled?

Using personal devices and personal accounts for business purposes is a textbook example of shadow AI. It is recommended that guidelines explicitly state: "Do not enter business data into personally owned accounts or unapproved services," and that procedures for handling violations be defined alongside this rule. Since leading with prohibition alone tends to increase workaround usage, it is important to present approved alternative tools at the same time.

Conclusion: Guideline Development as the Foundation for Accelerating AI Adoption

Looking back at what has been covered in this article, it becomes clear that developing guidelines is not an exercise in "imposing restrictions," but rather the work of building a foundation on which employees can confidently and fully leverage AI.

The five steps — starting with understanding the current situation and establishing a governance structure, followed by the three-tier tool classification, approval workflow design, incident response and log management, and AI literacy training — do not function as independent measures, but as a connected, end-to-end process. If any one element is missing, the whole breaks down: having a classification system without a workflow leads to it becoming a formality, and conducting training without incident response procedures in place means the organization will be unable to function when something actually goes wrong.

Frameworks such as the "AI Guidelines for Business (Version 1.0)" published by Japan's Ministry of Economy, Trade and Industry and Ministry of Internal Affairs and Communications in April 2024, and the NIST AI RMF 1.0 from the United States, also place the balance between risk management and the promotion of AI adoption at their core. When developing your own organization's guidelines, a practical approach is to use these as reference points while tailoring the content to fit your industry, organizational size, and the confidentiality level of your data.

Finally, a completed set of guidelines is not something to be "stored away" — it is something to be "grown." Given the pace at which generative AI technology is evolving and the rate at which the regulatory landscape is shifting, building a review cycle of every six months to a year into the design from the very beginning is what leads to stable, long-term operation.

Author & Supervisor

Chi

Majored in Information Science at the National University of Laos, where he contributed to the development of statistical software, building a practical foundation in data analysis and programming. He began his career in web and application development in 2021, and from 2023 onward gained extensive hands-on experience across both frontend and backend domains. At our company, he is responsible for the design and development of AI-powered web services, and is involved in projects that integrate natural language processing (NLP), machine learning, and generative AI and large language models (LLMs) into business systems. He has a voracious appetite for keeping up with the latest technologies and places great value on moving swiftly from technical validation to production implementation.