AI Implementation and Data Security: How Mid-Market Companies Give Employees Access Without Losing Control
Mid-market companies can give employees secure AI access to internal knowledge — email archives, contracts, product documentation, CRM records, operational data — by deploying AI on top of a role-governed, encrypted retrieval architecture rather than public tools. The right architecture depends on whether your internal data problem is primarily about documents or structured data at scale. Both paths exist. Choosing the wrong one costs more time than choosing nothing.
Author: Roy Gatling, RMG Associates — linkedin.com/in/roygatling
Published: 2026-05-14
Last updated: 2026-05-15
Freshness note: Data reflects reports and incidents from October 2025 through May 2026. Shadow AI statistics updated with IBM, Gartner, IDC, Netskope, and Databricks 2025–2026 findings.
What is the actual problem companies are trying to solve?
Employees need information to do their jobs. They need answers from last quarter's contracts, the product spec their team shipped six months ago, the compliance policy legal updated last week, the customer history their sales rep entered into Salesforce. Getting that information used to mean searching three systems, pinging three people, and waiting.
AI makes it instant. So employees are using AI — regardless of whether IT approved it.
According to IDC's 2025 survey, 56% of employees use unauthorized AI tools at work, while only 23% use tools their organization provides and governs. That gap is the security problem. Employees aren't trying to cause harm. They're trying to get their jobs done. The risk is that they're pasting sensitive data — contracts, financial models, customer records, source code — into public tools that may retain that data, use it for model training, or expose it through breaches.
The IBM 2025 Cost of a Data Breach Report found that breaches involving shadow AI cost organizations an average of $670,000 more than other incidents. Gartner, in a November 2025 survey of 302 cybersecurity leaders, found that 69% of organizations already suspect or have confirmed that employees are using prohibited public generative AI tools.
The goal is not to block AI. The goal is to give employees a version of AI that works — without the exposure.
Why traditional security controls fail for AI workloads
Traditional enterprise security was designed for a different threat model. Firewalls and DLP tools watch for file transfers. They catch the employee who emails a spreadsheet to a personal account. They do not catch the employee who copies three paragraphs of a contract into a ChatGPT prompt.
Check Point's 2025 AI Security Report found that 1 in 80 generative AI prompts carries a high risk of sensitive data exposure, and 7.5% of prompts include sensitive or private details. LayerX's Enterprise AI and SaaS Data Security Report found that employees make an average of 14 copy-paste actions per day using non-corporate accounts, with at least 3 containing sensitive data. The same report found that generative AI accounts for 32% of all corporate-to-personal data exfiltration, making it the single largest vector for sensitive data movement, ahead of email and USB.
Traditional DLP doesn't read prompts. It doesn't understand context. It can't tell the difference between an employee asking an AI to help format a memo and an employee pasting a client's unredacted financials into the same session.
The browser has become the new security perimeter. Microsoft announced at RSAC 2026 that it was extending Microsoft Purview protections into Edge for Business specifically to intercept prompt-level data.
But browser controls are one layer. The more durable answer is giving employees a sanctioned AI that doesn't require them to paste sensitive data into a public system at all.
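Prompt-level inspection, the control that file-oriented DLP lacks, can be sketched in a few lines. This is a minimal illustration, not how Purview or any commercial product works: the pattern names and the two regular expressions are assumptions chosen for the example, and a real detector would need far broader coverage and context awareness.

```python
import re

# Hypothetical detectors for illustration only; real coverage is much wider.
SENSITIVE_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of the sensitive patterns found in the prompt text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)]

def allow_prompt(prompt: str) -> bool:
    """Allow the prompt only if no sensitive pattern was detected."""
    return not scan_prompt(prompt)
```

A memo-formatting request passes, while a prompt containing a pasted Social Security number or email address is flagged before it leaves the browser or gateway.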
What does a secure enterprise AI architecture look like?
A governed enterprise AI architecture for employee knowledge access has three layers working together. The architecture is the same whether the platform is a turnkey SaaS product or an enterprise data lakehouse — the implementation complexity is where the paths diverge.
Layer 1: A private knowledge layer (RAG)
Retrieval-Augmented Generation (RAG) is the dominant architecture for enterprise AI knowledge systems. Rather than feeding documents directly to an AI model or relying on public training data, RAG indexes your internal content into an encrypted vector database. When an employee asks a question, the system retrieves relevant passages from your approved knowledge base — not from the open internet — and uses those passages to construct a response.
The AI never touches your raw documents. It reads knowledge that has been processed, encrypted, and stored within a controlled environment. RAG now powers over 70% of internal AI knowledge tools in production enterprise environments (Hakia, 2025).
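The retrieval step described above can be sketched as follows. This is a toy under stated assumptions: the bag-of-words `embed` function stands in for a learned embedding model, the `PASSAGES` list stands in for an encrypted vector database, and all names are hypothetical. Encryption and the model call itself are omitted.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. Production systems use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical approved knowledge base, already split into passages.
PASSAGES = [
    "Refunds are issued within 30 days of a return request.",
    "The VPN client must be updated every quarter.",
    "Enterprise contracts renew annually unless cancelled in writing.",
]
INDEX = [(passage, embed(passage)) for passage in PASSAGES]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return up to k passages ranked by similarity, dropping zero-similarity ones."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, vec in ranked[:k] if cosine(q, vec) > 0]

def build_prompt(question: str) -> str:
    """Assemble the model prompt from retrieved passages only."""
    context = "\n".join(f"- {p}" for p in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The model only ever sees what `retrieve` returns, which is why the quality and governance of the indexed content matter more than the model itself.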
Layer 2: Role-based access control at the retrieval layer
Access governance must be enforced where data is retrieved, not just where it is stored. If a marketing employee asks the AI "what are our Q4 revenue projections," the system should return an answer only if that employee has permission to see Q4 revenue data. If they don't, the retrieval layer returns nothing — the AI has nothing to work with and cannot speculate or synthesize from restricted content. This is the distinction between a real control and a checkbox. Role-based access enforced at the retrieval layer applies the principle of least privilege to AI interactions the same way it applies to system permissions.
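A minimal sketch of retrieval-layer enforcement, assuming each indexed chunk carries the set of roles allowed to see it. The `Chunk` type, the sample data, and the keyword matching are all hypothetical simplifications; the point is only the ordering, with the permission filter applied before any relevance matching.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_roles: frozenset[str]

# Hypothetical chunks and role names for illustration.
CHUNKS = [
    Chunk("Q4 revenue projection: see finance forecast v3.", frozenset({"finance"})),
    Chunk("Q4 campaign calendar and launch dates.", frozenset({"marketing", "finance"})),
]

def retrieve_for_user(query: str, user_roles: set[str]) -> list[str]:
    """Return matching chunk texts, considering only chunks the user may see."""
    terms = set(query.lower().split())
    # Permission filter first: restricted chunks never reach the matching step.
    visible = [c for c in CHUNKS if c.allowed_roles & user_roles]
    return [c.text for c in visible if terms & set(c.text.lower().split())]
```

With this ordering, a marketing user asking about revenue projections gets an empty result, so the model has no restricted text to summarize or paraphrase.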
Layer 3: Audit logging and compliance evidence
Every interaction — who asked what, what was retrieved, what was returned — should produce an immutable log. This is not optional for regulated industries. SOC 2 and ISO 27001 require documented access controls and audit trails. GDPR requires records of data processing. An AI system that does not log interactions is a compliance liability regardless of how well-encrypted the underlying data is.
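One common way to make a log tamper-evident is to chain entries by hash, so that altering any earlier record breaks verification. The sketch below assumes an in-memory list and hypothetical field names; it illustrates the idea and is not a substitute for write-once storage or a managed audit service.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

def append_entry(log: list[dict], user: str, query: str,
                 retrieved_ids: list[str], response: str) -> dict:
    """Append a record of who asked what, what was retrieved, and what was returned."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "retrieved_ids": retrieved_ids,
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    # The entry hash covers every field above, including the previous hash.
    entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash and check that each entry points at its predecessor."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Storing a hash of the response rather than the response text is a design choice made here to keep sensitive content out of the log; a deployment that needs the full text for review would store it in a separately protected location.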
What are the two primary architectural paths for mid-market companies?
Path A: Turnkey SaaS knowledge layer. Purpose-built platforms (Medullar, MindStudio, Glean) that sit on top of existing document repositories, apply role-governed RAG, and deliver governed AI access without requiring data engineering staff. Designed for organizations whose internal knowledge problem is primarily document-based — SharePoint, email, Confluence, product docs.
Path B: Enterprise data lakehouse as the AI knowledge foundation. Platforms like Databricks Lakehouse that unify structured data (CRM, ERP, operational databases), unstructured data (documents, logs), and AI infrastructure under a single governed environment. Designed for organizations with complex, multi-source data estates, existing cloud infrastructure, and technical capacity to operate the platform.
Most mid-market companies at Level 1 or Level 2 maturity (see table below) should start with Path A. The Databricks path is the right answer for a different problem — one where the internal knowledge gap is as much about structured operational data as it is about documents.
What tools are available and how do they differ?
Several platforms address this space with meaningfully different approaches and fit profiles.
Medullar
Medullar is built specifically for organizations that need AI productivity without moving data to public infrastructure. The platform runs entirely within Microsoft Azure's AI Foundry — no data is stored, shared with, or used to train external models. When a file is uploaded, the original is never retained: the system extracts knowledge, encrypts it at the chunk level using AES-256, and deletes the source. Each workspace uses a unique encryption key, so the same document in two different workspaces produces different encrypted representations with no cross-contamination. Medullar's architecture is zero-knowledge by design: even Medullar administrators cannot read workspace content. The platform connects to over 60 enterprise applications for federated search. For mid-market companies that want enterprise-grade security without enterprise-grade IT overhead, Medullar deploys quickly without custom infrastructure build-outs. Best fit: 50–500 employees, document-centric internal knowledge, limited data engineering capacity.
Microsoft 365 Copilot with Purview
For organizations already running Microsoft 365, Copilot with Purview integration enforces sensitivity labels through AI interactions. Purview can detect and block prompts containing PII, financial data, or classified content before the interaction completes. The access governance model inherits from Active Directory. The limitation is cost and configuration complexity. Microsoft's enterprise Copilot licensing starts at $30/user/month, requires tenant-level configuration, and delivers the most value when an organization's data governance within Microsoft 365 is already mature. For companies with inconsistent SharePoint tagging or disorganized document structures, Copilot will surface what employees were already struggling to find — and miss what they need most.
Glean
Glean is an enterprise search and knowledge discovery platform that connects to internal data sources (Google Drive, Confluence, Salesforce, Jira, Slack, and others) and enforces source-level access controls at retrieval time. An employee only sees results they already have permission to access in the underlying system. Glean's architecture is designed for FedRAMP, GDPR, and HIPAA-aligned environments. Glean skews toward larger organizations — pricing and deployment complexity are better suited to companies with 500+ employees and mature IT functions.
MindStudio
MindStudio provides an enterprise AI agent platform with built-in SSO, role-based access control, and audit logging. It is designed for organizations building multiple AI agents — customer-facing, internal ops, HR, legal — where each agent needs different permissions and governed access to different data. The platform supports SOC 2, ISO 27001, and GDPR compliance, with self-hosted deployment for organizations with data residency requirements. Requires more technical setup than Medullar but provides more control over agent behavior.
Databricks Lakehouse (Path B)
Databricks is a unified data and AI platform that combines data engineering, analytics, and AI under a single governed environment. For mid-market companies with significant structured data (CRM records, ERP tables, operational databases, financial data), Databricks represents a meaningfully different architectural choice — not a fancier document search tool.
Unity Catalog is the governance layer. It implements role-based access control (RBAC) and attribute-based access control (ABAC) at the catalog, schema, table, and column level. Privileges are hierarchical — permissions granted at the catalog level cascade to schemas and tables, and fine-grained controls like row-level filtering and column masking are available for sensitive fields. Every data access, query, and AI interaction is logged with full lineage. The Unity Catalog RBAC model maps naturally to business domains: prod_finance, prod_sales, prod_hr catalogs with role groups like finance_admins and finance_readonly enforce the principle of least privilege at the retrieval layer.
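The cascading privilege model can be illustrated with a toy resolver. This is not the Unity Catalog API (in Databricks, privileges are granted through SQL `GRANT` statements or the UI); it is a simplified Python model under assumed names. The group and catalog names mirror the examples in the text, while the schema and table names, the `GRANTS` set, and the masking rule are hypothetical.

```python
# Hypothetical grants as (group, securable path) pairs.
GRANTS = {
    ("finance_admins", "prod_finance"),              # catalog-level grant
    ("finance_readonly", "prod_finance.reporting"),  # schema-level grant
    ("sales_readonly", "prod_sales.crm.accounts"),   # table-level grant
}

def can_select(groups: set[str], table: str) -> bool:
    """True if any group holds a grant on the table or on one of its parents."""
    parts = table.split(".")  # expected form: catalog.schema.table
    ancestors = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
    return any((group, path) in GRANTS for group in groups for path in ancestors)

def mask_column(value: str, groups: set[str]) -> str:
    """Toy column mask: only the hypothetical finance_admins group sees the raw value."""
    return value if "finance_admins" in groups else "****"
```

A grant on `prod_finance` reaches every schema and table beneath it, while a grant on `prod_finance.reporting` stops at that schema. That asymmetry is what lets broad administrative roles and narrow read-only roles coexist in one hierarchy.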
Vector Search is fully integrated with the Lakehouse and works directly with Delta tables. Vector indexes are automatically generated and updated when new data is added — no separate synchronization pipeline is required. This means the AI system always queries fresh, governed enterprise data. The architecture keeps retrieval inside the governance boundary rather than exporting data to a separate vector database service.
Lakeflow Connect provides native connectors for ingesting data from enterprise applications — including Microsoft SharePoint, relational databases, and third-party SaaS systems — directly into Delta tables governed by Unity Catalog. SQL sources can be federated without ETL: queries are pushed down to the source system, and Unity Catalog governs the access.
Mosaic AI and Agent Bricks add a governed control plane for LLM usage. A single API abstracts multiple model providers, enables routing, monitoring, guardrails, and cost optimization, and supports building production-ready AI agents that operate within the governance boundary. Organizations using Databricks can serve role-governed answers from structured and unstructured data in a single query — a capability that requires complex integration work on any other platform.
Best fit: Organizations with $75M–$500M revenue, existing cloud data infrastructure, complex multi-source data estates, and technical capacity to deploy and operate the platform. Companies already using Databricks for analytics have a natural extension path rather than a net-new implementation.
How does Databricks Lakehouse compare to purpose-built AI knowledge platforms?
| Dimension | Medullar | Microsoft 365 Copilot + Purview | Glean | Databricks Lakehouse |
|---|---|---|---|---|
| Primary data type | Documents, files | Microsoft 365 content | Multi-SaaS documents | Structured + unstructured |
| Structured data (CRM, ERP) | Limited | Limited | Via connectors | Native — core strength |
| Governance model | Workspace-level encryption + RBAC | Active Directory inheritance | Source-level permissions | Unity Catalog: column-level RBAC + ABAC |
| Audit logging | Yes — encrypted | Yes — Purview | Yes | Yes — full lineage |
| RAG architecture | Built-in | Built-in (SharePoint-based) | Built-in | Native Vector Search on Delta tables |
| Data residency | Azure boundary | Microsoft tenant | Vendor SaaS | Your cloud account |
| Setup complexity | Low — no data engineering required | Medium — requires M365 governance maturity | High — enterprise IT function | High — requires data engineering capacity |
| Best fit (employee count) | 50–500 | 200–2,000 | 500+ | 75–500 with data team |
| Indicative cost | Accessible for SMB | $30/user/month + config overhead | Enterprise pricing | DBU consumption-based; variable |
| Compliance certifications | SOC 2, GDPR (hosted in Azure AI Foundry) | SOC 2, HIPAA, GDPR, FedRAMP | FedRAMP, HIPAA, GDPR | SOC 2, HIPAA, GDPR, FedRAMP |
| Time to governed AI | Weeks | 1–3 months | 2–4 months | 3–6 months (with data classification) |
What maturity level are most mid-market companies at today?
Most mid-market companies are at one of three maturity levels when it comes to AI and data security. The architecture choice should follow from maturity level, not precede it.
| Maturity Level | What's Happening | Security Posture | Compliance Exposure |
|---|---|---|---|
| Level 1: Unmanaged | Employees use consumer tools freely (ChatGPT, Gemini, Claude.ai personal accounts) | None — no visibility, no controls | High — data exits the org boundary with every prompt |
| Level 2: Policy-Only | IT has issued a policy; approved tools exist but are limited or poorly adopted | Partial — policy without tooling creates compliance theater | Moderate — policy without enforcement doesn't reduce breach risk |
| Level 3: Governed AI Layer | Private knowledge layer with role-based retrieval, audit logging, SSO integration | High — data stays inside the org boundary; interactions are logged | Low — demonstrable controls aligned to SOC 2, GDPR, HIPAA as applicable |
Only 37% of organizations have an AI governance policy in place at all (IBM, 2025). Deloitte's 2026 State of AI in the Enterprise found that only one in five companies has a mature governance model, even though worker access to AI rose 50% in 2025 alone.
Gartner forecasts AI governance spending will reach $492 million in 2026 and surpass $1 billion by 2030. The market has accepted this is not a problem that resolves itself.
What is the governance problem that no platform automatically solves?
The most important constraint in deploying any governed AI knowledge system — whether Medullar or Databricks — is that the data governance pre-work is not optional and not provided by the platform.
A January 2026 Netskope Cloud and Threat Report found that GenAI data violations have more than doubled year-over-year. The root cause, according to Sentra's 2026 analysis of lakehouse environments, is that sensitive data accumulates in enterprise data stores faster than classification and access controls keep up — and AI systems inherit whatever access their underlying service accounts allow.
Deploying a RAG system on top of an unclassified SharePoint or an ungoverned Databricks lakehouse does not reduce exposure. It makes access faster. The same data governance problem your employees had becomes the same data governance problem your AI has — at higher speed.
In our work with mid-market firms, the barrier to deploying a governed AI knowledge layer is rarely the platform. It is organizational: who owns data classification, how are access decisions made, and which repositories have been audited for sensitive content. These questions have to be resolved before an AI layer is added on top of them, not after.
What does implementation actually look like for a mid-market company?
The sequence that works, regardless of platform:
Step 1: Audit current AI usage before writing policy. Most companies discover that 40–60% of employees are already using AI tools. The audit creates the baseline and surfaces which use cases are driving adoption — those use cases are what your governed system needs to serve well.
Step 2: Classify your data before connecting it to AI. Not all internal content carries the same risk. Product documentation is low-risk. Customer contracts with PII are high-risk. Health information is regulated. Apply a three-tier classification (public / internal / restricted) to your major document and data repositories before connecting them to any AI retrieval system.
Step 3: Match the platform to your data estate and technical capacity. Organizations with document-centric knowledge problems and limited data engineering resources should evaluate Medullar or MindStudio. Organizations with complex structured data estates (CRM, ERP, operational databases), existing cloud infrastructure, and data engineering capacity should evaluate Databricks Lakehouse or Microsoft Fabric. The wrong move is adopting a platform that requires rearchitecting your data storage — the AI layer should sit on top of where your data already lives.
Step 4: Adopt, don't mandate. The Healthcare Brew 2026 survey found that providing approved AI tools drove an 89% reduction in unauthorized use. The goal is to make the governed option better than the ungoverned one — faster, more accurate on internal knowledge, and integrated into existing workflows. Mandating a tool employees find inferior produces the same shadow AI problem you started with.
Step 5: Log everything and review quarterly. Audit logs are only useful if someone reviews them. A quarterly review of AI interaction logs — what queries were most common, what knowledge was most retrieved, what access errors occurred — serves as both a security control and a signal for improving the system.
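The quarterly review in Step 5 can start as a simple aggregation over the interaction log. The sketch below assumes each log record is a dictionary with `user`, `query`, `retrieved`, and `denied` fields; those field names and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical log records for one review period.
LOGS = [
    {"user": "alice", "query": "renewal terms", "retrieved": ["contract-12"], "denied": False},
    {"user": "bob", "query": "renewal terms", "retrieved": ["contract-12", "contract-9"], "denied": False},
    {"user": "carol", "query": "q4 revenue", "retrieved": [], "denied": True},
    {"user": "carol", "query": "salary bands", "retrieved": [], "denied": True},
]

def summarize(logs: list[dict], top_n: int = 3) -> dict:
    """Summarize common queries, most-retrieved documents, and access denials per user."""
    return {
        "top_queries": Counter(e["query"] for e in logs).most_common(top_n),
        "top_documents": Counter(d for e in logs for d in e["retrieved"]).most_common(top_n),
        "denied_by_user": Counter(e["user"] for e in logs if e["denied"]),
    }

summary = summarize(LOGS)
```

Repeated denials for one user can mean either a probing attempt or a permission that is too narrow for the job, which is why the review is both a security control and a product signal.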
What happens when your AI knowledge problem is actually a data infrastructure problem?
This is the question most mid-market AI deployments surface within the first 90 days: employees want AI answers from data that lives in CRM, ERP, financial systems, and operational databases — not just documents. Governed document search gets them 40% of what they need. The rest is locked in structured systems that weren't designed for natural language access.
This is where the Databricks path becomes architecturally meaningful rather than overcomplicated. Databricks Lakehouse allows a single governed environment to serve structured queries (operational data), unstructured retrieval (documents, emails), and AI generation — with Unity Catalog enforcing the same access controls across all three. A sales rep asking "what's our average deal size for manufacturing clients in the Midwest over the last 18 months?" gets an answer that combines CRM records and product data, governed by their role permissions, with every interaction logged.
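The sales-rep question above is, underneath, a governed aggregate query. The sketch below uses an in-memory SQLite table rather than Databricks, so it is only an analogy: the `deals` table, the sample rows, and the role-to-region mapping are hypothetical, and the 18-month date filter is omitted for brevity. It shows the row-level restriction being applied inside the query rather than after the answer is generated.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (client TEXT, industry TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO deals VALUES (?, ?, ?, ?)",
    [
        ("Acme", "manufacturing", "Midwest", 100000.0),
        ("Borealis", "manufacturing", "Midwest", 140000.0),
        ("Cobalt", "manufacturing", "West", 500000.0),
        ("Dune", "retail", "Midwest", 60000.0),
    ],
)

# Hypothetical entitlements: which regions each role may query.
REGIONS_BY_ROLE = {
    "midwest_sales": ["Midwest"],
    "sales_director": ["Midwest", "West"],
}

def average_deal_size(role: str, industry: str) -> Optional[float]:
    """Average deal amount for an industry, limited to regions the role may see."""
    regions = REGIONS_BY_ROLE.get(role, [])
    if not regions:
        return None  # no entitlement, so no query is run
    placeholders = ", ".join("?" for _ in regions)
    row = conn.execute(
        f"SELECT AVG(amount) FROM deals WHERE industry = ? AND region IN ({placeholders})",
        [industry, *regions],
    ).fetchone()
    return row[0]
```

Two users asking the same question can legitimately receive different numbers, because each answer is computed only over the rows that user is entitled to see.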
The Databricks 2026 State of AI Agents report found that organizations using unified governance tools are deploying 12 times more AI projects to production than those with fragmented data silos. The governance infrastructure becomes the platform — not just for the initial RAG use case, but for every AI agent and workflow built on top of it.
This is also the scenario where platform choice compounds. A company that deploys Medullar for document access and later wants governed AI access to its ERP data is making a second architectural decision — not extending the first. A company that deploys Databricks Lakehouse is making one architectural decision that extends.
What compliance frameworks apply to AI knowledge systems?
Several frameworks now have AI-specific guidance. The answer depends on your industry.
NIST AI RMF (AI Risk Management Framework) — The voluntary U.S. framework for AI governance. It is organized around four functions: Govern, Map, Measure, Manage. Mid-market companies seeking to document AI controls for enterprise customers or SOC 2 audits should align to NIST AI RMF first.
ISO 42001 — The international standard for AI management systems, published in 2023. Becoming a requirement in enterprise vendor procurement, especially in EU markets.
SOC 2 Type 2 — The baseline enterprise compliance certification. AI systems that process customer data or are used in customer-facing workflows need to be included in SOC 2 scope. Key controls: access management, change management, monitoring and alerting, and incident response.
GDPR / CCPA — Regulated data entering AI systems creates processing records obligations. This applies to any AI system that ingests employee communications, customer records, or other PII — even for internal use only.
EU AI Act — In force since August 2024, with obligations phasing in from February 2025, the EU AI Act introduces risk-based obligations for AI systems. Most internal enterprise knowledge tools fall into limited or minimal risk categories, but systems used for HR decisions, financial assessments, or anything affecting individuals carry additional documentation and human oversight requirements.
Gartner projects that over 60% of enterprises will require formal AI governance frameworks by 2026 to satisfy security, risk, and compliance requirements from enterprise customers. This procurement pressure is reaching mid-market companies through supply chain requirements.
What are the most common failure modes?
- Governing the tool, not the behavior. Companies approve a specific AI platform and consider the job done. Employees continue using other tools because the approved one doesn't serve their actual use cases. Access policy must follow the data and the use case, not just the software list.
- Connecting AI to ungoverned data. Deploying a RAG system on a SharePoint instance where everyone has access to everything doesn't reduce exposure — it just makes the access faster. The data governance problem has to be solved before the AI layer is added, not after. This applies equally to Databricks lakehouses where service accounts have been granted broad permissions.
- Choosing the wrong platform for the data estate. A turnkey SaaS knowledge tool deployed on top of a structured-data-heavy organization leaves 50–70% of the internal knowledge problem unsolved. An enterprise data lakehouse deployed at a 75-person professional services firm without a data engineer creates a six-month implementation project with no one to maintain it.
- Logging without reviewing. Audit logs that no one reads are compliance theater. The log review process and the person responsible for it need to be defined at deployment.
- Blanket bans without governed alternatives. When approved alternatives are absent or inadequate, employees use unapproved tools. Banning AI without providing a governed substitute doesn't reduce risk — it drives risk underground.
What should a CEO or COO do in the next 60 days?
If you have between 50 and 500 employees and you haven't formally addressed AI and data security, the minimum viable action plan:
- Audit current AI tool usage across the organization — use an anonymous survey if needed
- Identify the top 3–5 internal knowledge use cases employees are solving with AI today
- Determine whether those use cases are primarily document-based or structured-data-based — this is the architectural fork
- Classify your data: which repositories contain PII, financial data, or regulated content?
- Evaluate one or two governed AI platforms based on your fork: Medullar or MindStudio for document-centric; Databricks Lakehouse or Microsoft Fabric for structured-data-heavy environments
- Define a three-tier AI tool policy: approved / limited use / prohibited
- Assign ownership — who is responsible for AI governance? IT alone is insufficient; legal and operations need seats at the table
The window where "we haven't gotten to AI governance yet" is a defensible position is closing. Enterprise customers, insurance underwriters, and regulators are beginning to treat AI governance as a baseline expectation, not a differentiator. Mid-market companies that build the infrastructure now will spend less on remediation later and can put that budget toward capability instead of containment.
Executive FAQ
Frequently asked questions about AI data security and governed access.
What is shadow AI and why does it matter for compliance?
Shadow AI refers to AI tools employees use without IT approval — consumer ChatGPT accounts, personal Claude subscriptions, or integrated AI features in unapproved SaaS tools. It matters for compliance because regulated data entering these systems creates documentation obligations under GDPR, HIPAA, and SOC 2 that organizations cannot fulfill if they lack visibility into the interactions.
How does RAG keep sensitive data inside the organization?
RAG (Retrieval-Augmented Generation) indexes internal documents into an encrypted vector database within your infrastructure. When an employee queries the system, the AI retrieves passages from that encrypted store — not from public training data — and generates a response. The original documents and the interaction logs stay inside your environment. No raw data is sent to a public AI model.
What is the difference between role-based access in a regular system and in an AI system?
In a traditional system, access controls determine what files a user can open. In an AI knowledge system, access controls must be enforced at the retrieval layer — what content the AI is allowed to pull before generating an answer. Without retrieval-layer enforcement, a user could ask a question that causes the AI to synthesize and return information they are not permitted to see, even if they could not access the underlying documents directly.
When does it make sense to use Databricks Lakehouse instead of a purpose-built AI knowledge platform?
Databricks Lakehouse is the stronger choice when the internal knowledge problem is primarily about structured data — CRM records, ERP tables, operational databases, financial data — not just documents. It is also appropriate when the organization already has cloud data infrastructure (Azure, AWS, GCP), has data engineering capacity, and wants a single governed environment for analytics and AI rather than a separate layer for each. For document-centric problems without a data engineering function, purpose-built platforms like Medullar deploy faster with lower operational overhead.
What does Unity Catalog do and why does it matter for AI security?
Unity Catalog is Databricks' unified governance layer for data and AI. It implements role-based access control (RBAC) and attribute-based access control (ABAC) at the catalog, schema, table, and column level. For AI workloads, Unity Catalog enforces who the AI is allowed to retrieve data on behalf of — preventing the system from synthesizing answers from data the querying user is not authorized to access. It also produces full audit logs with data lineage across every query and AI interaction.
What data governance work must be done before deploying any governed AI knowledge system?
Data classification is required before any governed AI deployment, regardless of platform. At minimum, apply a three-tier classification (public / internal / restricted) to all major document repositories and data sources. This determines what content can connect to a self-service AI and what requires additional access controls. Skipping this step results in an AI system that inherits the same governance gaps as the underlying data — and surfaces them at higher speed.
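The three-tier classification can be expressed as a simple gate in front of the AI connector. This is a sketch under assumptions: the repository names, the inventory dictionary, and the rule that only public and internal content connects to self-service AI are illustrative choices, not a prescribed policy.

```python
from enum import Enum

class Tier(Enum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3

# Hypothetical repository inventory; None marks a repository not yet classified.
REPOSITORIES = {
    "product-docs": Tier.PUBLIC,
    "engineering-wiki": Tier.INTERNAL,
    "customer-contracts": Tier.RESTRICTED,
    "legacy-file-share": None,
}

def connectable(max_tier: Tier = Tier.INTERNAL) -> list[str]:
    """Repositories that are classified and at or below the allowed tier."""
    return sorted(
        name for name, tier in REPOSITORIES.items()
        if tier is not None and tier.value <= max_tier.value
    )

def needs_review() -> list[str]:
    """Repositories that are unclassified or restricted and need extra controls first."""
    return sorted(
        name for name, tier in REPOSITORIES.items()
        if tier is None or tier is Tier.RESTRICTED
    )
```

Treating an unclassified repository the same as a restricted one is the conservative default: nothing connects to the AI layer until someone has looked at it.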
Is there a compliance framework specifically for AI?
Yes. NIST AI RMF and ISO 42001 are the primary frameworks. The EU AI Act adds binding legal obligations for organizations operating in EU markets. Most mid-market companies should start with NIST AI RMF because it maps clearly to existing SOC 2 control language and is the framework most enterprise customers are beginning to require in vendor questionnaires.
How much does it cost to implement a governed AI knowledge layer?
Costs vary significantly by platform and scale. Medullar targets SMBs and professional services firms with pricing designed for accessible deployment without enterprise IT overhead. Microsoft Copilot starts at $30/user/month, best suited for companies deeply invested in Microsoft 365. Databricks costs are consumption-based (Databricks Units / DBUs) and scale with compute — variable, but predictable with a data engineering function managing it. The more relevant cost calculation: IBM found shadow AI breaches cost an average of $670,000 more per incident than standard breaches. For most mid-market companies, the platform cost is a fraction of a single incident.
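The comparison in the last two sentences can be made concrete with the two figures the article cites. The sketch below uses the $30/user/month Copilot list price and the IBM $670,000 figure; the 200-user headcount is an assumption for illustration. It excludes configuration and governance overhead, and the IBM number is the added cost of a shadow AI breach, not the total cost of an incident.

```python
COPILOT_PER_USER_MONTH = 30          # list price cited in the article, USD
SHADOW_AI_BREACH_PREMIUM = 670_000   # IBM 2025 figure cited in the article, USD

def annual_license_cost(users: int, per_user_month: float = COPILOT_PER_USER_MONTH) -> float:
    """License cost for one year, excluding configuration and governance overhead."""
    return users * per_user_month * 12

def share_of_breach_premium(users: int) -> float:
    """Annual license cost as a fraction of the cited shadow AI breach premium."""
    return annual_license_cost(users) / SHADOW_AI_BREACH_PREMIUM

# Hypothetical 200-person company.
cost_200 = annual_license_cost(200)
share_200 = share_of_breach_premium(200)
```

For the assumed 200 users, licensing comes to $72,000 per year, or roughly 11% of the cited premium. The comparison does not account for how likely a breach is in any given year, so it is a sense of scale rather than an ROI calculation.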
About the author
Roy Gatling is the founder of RMG Associates, an AI strategy and implementation agency serving mid-market companies. RMG advises C-suite leaders on AI adoption, builds custom AI-powered tools for client operations, and designs data governance frameworks that hold up under compliance scrutiny. RMG Associates is a Databricks and Snowflake reseller partner.