Enterprise RAG in Switzerland — providers and criteria
What RAG means in an enterprise context
Retrieval-Augmented Generation (RAG) combines a vector-based document retriever with a language model that uses the retrieved passages as context and generates an answer. The demo version of this pattern can be built in an afternoon. The enterprise version cannot.
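The retrieve-then-generate pattern can be sketched in a few lines. This is a deliberately minimal illustration, not the SKH implementation: the index, the two-dimensional toy embeddings, and the prompt wording are all hypothetical, and a real system would call an embedding model and an LLM instead.

```python
from math import sqrt

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=2):
    # index: list of (chunk_text, embedding) pairs; return top-k chunks.
    ranked = sorted(index, key=lambda item: cosine_sim(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, passages):
    # Retrieved passages become the model's context; numbered markers
    # let the model cite its sources as [n].
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The enterprise version differs precisely in what this sketch leaves out: permissions filtering before retrieval, audit logging around every call, and evaluation of the whole loop.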
Enterprise RAG, in a productive Swiss environment, has to deliver several properties at the same time: data sovereignty (residency, DPA, subprocessors), permission accuracy aligned with the source systems (SharePoint ACLs are preserved), auditability for every query, reproducible evaluation, upgrade paths for the vector store and LLM, and provider flexibility without lock-in.
Provider selection criteria
- Data residency. Documents, database, vector store, and ideally inference too, all in Switzerland. See data residency.
- LLM provider flexibility. Switching between several models without re-implementation. BYOK for OpenAI, Anthropic, Gemini, and Azure OpenAI; OpenAI-compatible custom endpoints for self-hosted deployments.
- Source fidelity. SharePoint, OneDrive, Google Drive, and Dalux permissions are preserved; no flat index structure.
- Precise citations. Source chunk, page number, and a direct jump into the PDF — no generic "according to document X" citations.
- Audit and observability. Every query is logged with user, timestamp, retrieval results, and final answer.
- Evaluation process. Custom gold sets, reproducible retrieval and answer metrics, regression tests when a model is swapped.
- Deployment models. Multi-tenant SaaS, dedicated subscription, or on-premises — scaling with the maturity of the organization.
Key facts: enterprise RAG building blocks at Swiss Knowledge Hub
- Sources: SharePoint, OneDrive, Google Drive, Dalux, Zipro, URL import, bulk upload, plus automatic transcription of audio and video files. Integrations with custom tools via the Model Context Protocol (MCP) are in beta.
- Vector stores: pgvector as default; Pinecone, LanceDB, and ChromaDB optionally configurable. See vector store.
- LLM providers (default, Switzerland): DeepSeek V3, Kimi K2.5, Mistral Medium 2505 — all via Azure AI Foundry in the Swiss region.
- LLM providers (BYOK): OpenAI, Azure OpenAI, Anthropic, Google Gemini, Mistral, DeepSeek, Azure DeepSeek, and OpenAI-compatible custom endpoints.
- Permissions: Workspace-, page-, and file-level permissions, custom roles, LLM scope gates per organization; an audit log on every query.
- Citations: Source chunk, page number, direct jump into the PDF.
Technical architecture building blocks
A productive RAG system breaks down into six layers that must be evaluated independently: ingestion, chunking, embedding, retrieval, generation, and evaluation. Each layer has its own metrics and failure modes.
- Ingestion. Connectors to SharePoint, OneDrive, Google Drive, Dalux, Zipro; OCR for scanned PDFs; transcription for audio and video.
- Chunking. Document-structure aware, with configurable overlap, not naively token-based. See chunking.
- Embedding. Quality and language coverage drive retrieval recall. See embedding.
- Retrieval. A hybrid of semantic (vector) and lexical (BM25) search often outperforms purely semantic search on typical corpora — but it is not the right fit for every use case.
- Generation. Prompt templates with mandatory citations and hallucination guardrails; the LLM is swappable.
- Evaluation. A custom gold set, automated retrieval and answer metrics, manual sampling reviews.
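The chunking layer above can be illustrated with a minimal token-window splitter. This is a hypothetical sketch of the overlap idea only; a document-structure-aware chunker, as described in the list, would split on headings and paragraphs before falling back to fixed windows.

```python
def chunk_tokens(tokens, size=256, overlap=32):
    """Split a token list into windows of `size`, with `overlap` tokens
    shared between consecutive chunks, so a sentence cut at a boundary
    still appears intact in one of the two neighbouring chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step) if tokens[i:i + size]]
```

The overlap parameter trades index size against the risk of losing answers that straddle a chunk boundary.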
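The hybrid retrieval point can likewise be made concrete. One common way to merge a semantic (vector) ranking with a lexical (BM25) ranking is reciprocal rank fusion (RRF); this sketch assumes both searches return ranked lists of document ids, and k=60 is the constant from the original RRF paper.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc ids.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by both searches rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalisation between the two searches, which is why it is a popular default for hybrid setups; whether hybrid beats purely semantic retrieval still has to be confirmed on the actual corpus.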
Vector store options: pgvector, LanceDB, Pinecone, Chroma
| Option | Residency | Operations | Use case |
|---|---|---|---|
| pgvector | Azure CH (SKH default) | Integrated in Postgres | Default, up to mid-millions of chunks |
| LanceDB | Local / Azure CH | Columnar, embedded | Large indexes, batch analytics |
| Pinecone | US / EU (managed) | SaaS, low latency | Very large indexes, multi-region |
| ChromaDB | Free choice (self-hosted) | Open source | Prototypes, small teams |
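For intuition on what these stores compute: pgvector's `<=>` operator returns cosine distance, i.e. 1 minus cosine similarity. The pure-Python equivalent below mirrors that definition; in SQL the nearest chunks are typically fetched with something like `ORDER BY embedding <=> query LIMIT k`.

```python
from math import sqrt

def cosine_distance(a, b):
    # Equivalent of pgvector's `<=>` operator: 1 - cosine similarity.
    # 0.0 means identical direction, 1.0 means orthogonal vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

The definition is identical across the four stores; they differ in residency, index types, and operations, which is why the table compares those dimensions rather than retrieval math.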
How does a RAG rollout in Switzerland actually work?
- Source inventory. Which systems (SharePoint, Dalux, Google Drive) hold which knowledge? Which ACLs apply?
- Residency decision. Default Swiss models or BYOK to an external provider? Which data is allowed to leave Switzerland?
- Pilot workspace. 10–50 users, 1–2 departments, a real document corpus. An evaluation set of 50–100 typical questions.
- Evaluation. Retrieval metrics (Recall@k, nDCG) and answer metrics (Faithfulness, Answer Relevance) measured against the gold set.
- Scale up. Roll out to additional departments, custom roles, LLM scope gates per organization. Upgrade to the Enterprise or Custom plan.
- Operate. Monthly review of audit logs, quarterly re-evaluation when the model or embedding provider changes.
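The evaluation step above can be sketched for Recall@k, the most basic of the listed retrieval metrics. The gold-set shape (query mapped to the ids of its relevant chunks) is a hypothetical convention for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of gold-relevant chunk ids found in the top-k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mean_recall_at_k(results, gold, k=5):
    # results: query -> ranked chunk ids from the system under test.
    # gold: query -> chunk ids a human marked as relevant.
    vals = [recall_at_k(results[q], gold[q], k) for q in gold]
    return sum(vals) / len(vals)
```

Running this over the 50-100 question gold set before and after a model or embedding swap is what makes the quarterly re-evaluation a regression test rather than a gut check.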
How Swiss Knowledge Hub addresses these criteria
Swiss Knowledge Hub addresses the seven provider criteria as follows:
- Residency: core data (DB, vector store, storage, Service Bus) by default in Azure Switzerland North; frontend assets via West Europe CDN.
- Provider flexibility: nine configurable LLM providers, three of them in the SKH default setup and the rest via Bring Your Own Key, plus OpenAI-compatible custom endpoints.
- Source fidelity: SharePoint ACLs are preserved, alongside workspace-, page-, and file-level permissions and custom roles.
- Citations: source chunk, page number, and a direct jump into the PDF.
- Audit: a chronological audit log per tenant.
- Vector store: pgvector as the default, with LanceDB, Pinecone, and ChromaDB additionally configurable.
- Deployment: multi-tenant SaaS as the default, a dedicated subscription on the Custom plan, and on-premises as a Custom option on request.

Concrete quality metrics (recall, faithfulness, latency) vary by corpus and pilot and are measured jointly; SKH deliberately does not publish blanket percentage figures.
Related pages
- Glossary: Retrieval-Augmented Generation
- Glossary: Vector store
- Glossary: Chunking
- Glossary: Embedding
- All comparisons
- Swiss ChatGPT alternative
- FADP-compliant AI
Frequently asked questions
- What sets enterprise RAG apart from a retrieval demo?
- Enterprise RAG must deliver three things on top of a retrieval demo: permission accuracy at the document and file level, auditability for every query, and an evaluation pipeline that makes answer quality measurable and regression-safe. Without these three building blocks, any RAG solution remains a prototype.
- Which vector stores make sense in a Swiss enterprise setup?
- pgvector is the default at Swiss Knowledge Hub — it runs in the same Postgres instance as the business data and stays in Azure Switzerland North. For larger indexes, LanceDB (local, columnar), Pinecone (managed, US/EU), and ChromaDB (open source) can be configured. The choice depends on residency, scaling, and team-skills criteria.
- How do I measure RAG quality seriously?
- On three layers: retrieval metrics (Recall@k, MRR, nDCG) on a custom evaluation set, answer metrics (Faithfulness, Answer Relevance, Context Precision), and business metrics (time-to-answer, deflection rate, user satisfaction). Without your own gold set, the numbers are theater.
- What does BYOK mean in a RAG context?
- Bring Your Own Key lets the customer organization provide its own API keys for external LLM providers (e.g. OpenAI, Anthropic). Requests go directly to the chosen provider; the contractual relationship is between customer and provider. The same pattern can be used for embeddings if a specific embedding provider is contractually preferred.
- Which deployment models does Swiss Knowledge Hub offer?
- Default: multi-tenant SaaS on Azure Switzerland North. Custom plan: a dedicated Swiss subscription with customer-managed keys; on-premises is available as a Custom option on request (with additional cost).
- How do permissions work in a RAG system that pulls from SharePoint?
- SKH respects source permissions: a file that is only visible to group X in SharePoint also only appears in answers for members of group X. On top of that, there are workspace-, page-, and file-level permissions and custom roles per organization. A pure 'everyone sees everything' RAG is not viable in an enterprise context.
- What does enterprise RAG cost at SKH?
- Enterprise from CHF 1'050 per month (30 users, 120 GB). Custom plan on request, with individually negotiable SLAs, dedicated onboarding, and optionally your own infrastructure. Starter (CHF 250/month) and Business (CHF 650/month) are available for pilot projects.
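The "OpenAI-compatible" part of the BYOK answer above has a concrete technical meaning: the endpoint accepts the Chat Completions request shape. This stdlib-only sketch builds such a request without sending it; the base URL, key, and model name are placeholders, not SKH values.

```python
import json
from urllib.request import Request

def chat_request(base_url, api_key, model, question):
    """Build a Chat Completions request for any OpenAI-compatible endpoint.
    Swapping providers means changing base_url, api_key, and model;
    the payload shape stays the same."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because the request shape is provider-independent, a self-hosted model behind such an endpoint is configured exactly like an external BYOK provider.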
Architecture review with the SKH team.
30 minutes, your use case, concrete recommendations on vector store, LLM provider, and deployment model.
Swiss Knowledge Hub GmbH, Liebefeld.