Embeddings and Vector DBs: Refreshing SME Document Search

TL;DR: Embeddings and vector databases — moving SME document search to semantic retrieval, RAG architecture, and an implementation guide.
Summary: An embedding (vectorisation) turns text into a numeric vector; the question "are these two pieces of text semantically similar?" is then answered by vector distance maths. Vector databases (Qdrant, Chroma, Weaviate, Pinecone) store and query these vectors at scale. The most practical use case for SMEs is refreshing document search. Instead of the old "Ctrl+F keyword match", you get a "find me documents that are topically similar to this" approach. Combine a local LLM with a vector DB through a RAG (Retrieval-Augmented Generation) architecture and you can offer a "ChatGPT" experience over your company's own documents — without any data leaving the organisation.
In SMEs, documents are scattered across SharePoint, file servers, Notion, and email archives. When information is needed — "where is that contract?", "what did we do for this customer last year?", "what does our company policy say?" — manual search takes hours. Classic keyword search returns 50 documents that contain the word "backup" and you can't tell which one is relevant. Embedding-based semantic search will surface documents about backup, business continuity (BCP), and KVKK (Turkey's data protection law) compliance even when the query "what should we do to protect our data?" doesn't share any keywords with them.
In this article we cover the embedding concept at SME scale, vector DB options, and refreshing document search with a RAG architecture. The target audience: IT managers, teams that want more efficient document management, and decision makers who want to take advantage of modern AI.
What is an Embedding?
An embedding converts text (a word, sentence, paragraph, or document) into a fixed-size numeric vector.
A Typical Embedding
Sentence: "A backup strategy is required for KVKK compliance"
Embedding (example, 384 dimensions):
[0.12, -0.45, 0.67, 0.23, -0.11, ...]
This vector represents the "meaning" of the sentence. Sentences with similar meanings have similar vectors.
Vector Similarity
The "distance" between two vectors is measured with cosine similarity:
- 1.0 = same meaning
- 0.5 = related but different topic
- 0.0 = unrelated
- -1.0 = opposite
SME rule of thumb: 0.7+ similarity = "relevant document".
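The similarity scale above can be made concrete in a few lines of plain Python. The 4-dimensional vectors below are toy values chosen for illustration; real embeddings have 384+ dimensions, but the maths is identical.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); ranges from -1.0 (opposite) to 1.0 (identical)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for three documents
backup_doc = [0.9, 0.1, 0.0, 0.2]   # backup strategy
dr_doc     = [0.8, 0.2, 0.1, 0.3]   # disaster recovery: related topic
invoice    = [0.0, 0.9, 0.8, 0.1]   # unrelated topic

print(round(cosine_similarity(backup_doc, dr_doc), 2))   # high: "relevant"
print(round(cosine_similarity(backup_doc, invoice), 2))  # low: "unrelated"
```

The related documents score well above the 0.7 rule-of-thumb threshold, the unrelated one well below it.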
Embedding Models
| Model | Dimensions | Footprint | SME fit |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Small | Light, fast |
| mxbai-embed-large | 1024 | Medium | General purpose |
| OpenAI text-embedding-3-small | 1536 | Cloud API | High quality |
| OpenAI text-embedding-3-large | 3072 | Cloud API | Premium |
| Cohere embed-multilingual | 1024 | Cloud API | Good Turkish support |
| BGE-large | 1024 | Local | Solid open source |
Pragmatic SME starting point: all-MiniLM-L6-v2 (local, fast, free) or mxbai-embed-large (better quality + local).
What is a Vector DB?
A vector DB stores millions or billions of embeddings at scale and answers the question "which 10 vectors are closest to this one?" quickly.
Vector DB Options
| DB | Type | SME fit |
|---|---|---|
| Qdrant | Open source, self-host | Recommended |
| Chroma | Open source, lightweight | Small SMEs |
| Weaviate | Open source, feature rich | Complex needs |
| Milvus | Open source, scalable | Larger SMEs |
| Pinecone | Cloud SaaS | Quick start |
| PostgreSQL + pgvector | Add-on to existing Postgres | SMEs already on Postgres |
| Elasticsearch | Built-in vector support | Existing Elastic users |
Typical SME pick: Qdrant (feature rich, scalable) or Chroma (easiest to set up).
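What a vector DB does can be sketched as a brute-force nearest-neighbour scan; engines like Qdrant answer the same question with approximate indexes (HNSW) instead of scanning every vector, but the contract is identical. The document IDs and 3-dimensional vectors below are made up for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def top_k(query, index, k=3):
    # index: list of (doc_id, vector) pairs; a real vector DB replaces this
    # full scan with an ANN structure so it stays fast at millions of vectors
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

index = [
    ("backup-strategy",  [0.9, 0.1, 0.1]),
    ("dr-plan",          [0.8, 0.3, 0.2]),
    ("marketing-plan",   [0.1, 0.9, 0.2]),
    ("retention-policy", [0.7, 0.2, 0.5]),
]
print(top_k([0.85, 0.15, 0.2], index, k=2))  # backup-strategy, then dr-plan
```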
A Practical SME Scenario — Document Search
The difference between classic and embedding-based search:
Classic Search (Ctrl+F)
Question: "How do we back up customer data?"
Result: every document containing the word "backup".
Problems:
- A document titled "Data protection" gets missed: it covers the topic but never contains the keyword
- Documents that discuss backups without using the exact queried word aren't found at all
- 100 results show up, 90% of them irrelevant
Embedding-Based Search
Question: "How do we back up customer data?"
System: question → embedding → top 10 nearest documents in the vector DB.
Result:
- "Backup strategy" (0.92 similarity)
- "Data protection under KVKK" (0.85)
- "Disaster recovery plan" (0.81)
- "Customer data retention guidelines" (0.78)
The results are semantically related — independent of keyword matching.
RAG Architecture
Embedding + Vector DB + LLM combined: RAG (Retrieval-Augmented Generation).
RAG Flow
1. Preparation (Indexing):
[All documents] → [Embedding model] → [Vector DB]
2. Query:
[User question] → [Embedding] → [Find relevant docs in vector DB]
↓
[Question + Docs] → [LLM] → [Context-grounded answer]
SME Advantage
- Local LLM + local vector DB = data never leaves the organisation (KVKK)
- A "ChatGPT" experience over your own documents
- Adding a new document only requires embedding and indexing that one document, not rebuilding the whole index
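The query step of the flow above can be sketched with stubbed components. Here `embed`, `search_vector_db`, and `ask_llm` are placeholders for the real embedding model, vector DB client, and local LLM call; only the prompt assembly is the actual RAG logic.

```python
def embed(text):
    # stub: a real embedding model returns a 384+ dimensional vector
    return [float(len(w)) for w in text.split()[:3]]

def search_vector_db(vector, k=3):
    # stub: a real vector DB returns the k nearest chunks
    return ["Backup strategy: nightly full backup, offsite copy...",
            "KVKK data protection: customer data must be encrypted..."]

def ask_llm(prompt):
    # stub: a real call goes to a local LLM such as Ollama
    return "Answer grounded in the retrieved documents."

def build_prompt(question, docs):
    # the LLM is instructed to answer only from the retrieved context
    context = "\n\n".join(docs)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

def rag_answer(question):
    docs = search_vector_db(embed(question))
    return ask_llm(build_prompt(question, docs))

print(rag_answer("How do we back up customer data?"))
```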
A Typical SME RAG Stack
An open-source stack example:
| Component | Choice |
|---|---|
| LLM runtime | Ollama (llama3.1:8b) |
| Embedding model | mxbai-embed-large |
| Vector DB | Qdrant |
| Orchestration | LangChain or LlamaIndex |
| UI | Open WebUI or a custom web app |
| Document parsing | unstructured.io, Apache Tika |
Setup (Docker Compose)
```yaml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui_data:/app/backend/data
volumes:
  ollama_data:
  qdrant_data:
  openwebui_data:
```
Bring these three services up and the SME's self-hosted "company knowledge ChatGPT" is ready.
Document Types
Typical documents that an SME might bring into RAG scope:
- Company policies (PDF, Word)
- Customer contracts
- Internal procedures and handbooks
- HR documents
- Technical documentation
- Wiki / Notion pages
- Email archives (when appropriate)
- Past projects
- Marketing materials
Sensitive-Data Filter
Not every document should go into RAG:
- Confidential business data (financial reports, M&A material): keep it out of the index entirely
- Sensitive employee data (personnel files): restrict or exclude
- Customer personal data: subject to KVKK, index only behind access controls
Access control inside RAG should be role-based: who can query which documents.
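Role-based filtering can be as simple as matching role tags stored in each chunk's metadata against the querying user's roles. The field name `allowed_roles` and the role values below are illustrative, not a fixed schema.

```python
def filter_by_role(results, user_roles):
    # each result carries an "allowed_roles" metadata field set at index time;
    # a result is visible if it shares at least one role with the user
    return [r for r in results if set(r["allowed_roles"]) & set(user_roles)]

results = [
    {"doc": "IT backup policy",  "allowed_roles": ["all"]},
    {"doc": "Personnel file X",  "allowed_roles": ["hr"]},
    {"doc": "M&A due diligence", "allowed_roles": ["board"]},
]
print(filter_by_role(results, ["all", "it"]))  # only the IT backup policy
```

Vector DBs like Qdrant can also apply such filters server-side during the similarity search, so restricted chunks never leave the database.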
Document Preparation (Chunking)
A 100-page document can't be a single embedding; it has to be split into chunks.
Chunking Strategies
- Fixed size: e.g. 500 tokens with 50-token overlap
- Sentence-based: each sentence separately, simple
- Paragraph-based: preserves semantic cohesion
- Section-based: driven by Markdown headings
- Recursive: advanced — paragraph → sentence → word
Practical SME pick: 500 tokens + 50-token overlap, paragraph-aware.
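The 500-token / 50-token-overlap strategy can be sketched in a few lines. This version splits on whitespace for illustration; a production pipeline would count tokens with the embedding model's own tokenizer and respect paragraph boundaries.

```python
def chunk(text, size=500, overlap=50):
    # slide a window of `size` tokens, stepping by size - overlap so each
    # chunk shares `overlap` tokens with its predecessor
    tokens = text.split()
    step = size - overlap
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), step)]

doc = " ".join(f"word{i}" for i in range(1200))
chunks = chunk(doc)
print(len(chunks))             # 3 chunks for a 1200-token document
print(len(chunks[1].split()))  # 500 tokens, the first 50 shared with chunk 0
```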
Metadata
Add metadata to every chunk:
- Source document name
- Page number
- Section title
- Author, date
- Access permissions
Query results then show the user the source.
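Attaching that metadata can look like the sketch below. The payload shape is illustrative; Qdrant, for example, stores an arbitrary JSON payload next to each vector, so the field names are up to you.

```python
def make_point(chunk_id, vector, text, source, page, section, roles):
    # one indexable record: the embedding plus the metadata listed above
    return {
        "id": chunk_id,
        "vector": vector,
        "payload": {
            "text": text,
            "source": source,        # shown to the user as the citation
            "page": page,
            "section": section,
            "allowed_roles": roles,  # consumed by access-control filtering
        },
    }

point = make_point(1, [0.1, 0.2], "Nightly backups run at 02:00...",
                   source="backup-policy.pdf", page=4,
                   section="Schedule", roles=["it"])
print(point["payload"]["source"])  # backup-policy.pdf
```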
Turkish and Embeddings
SME environments are mostly Turkish-language; the embedding model must support Turkish well.
Models with Good Turkish Performance
- Cohere embed-multilingual-v3 (cloud API, best)
- mxbai-embed-large (local, good)
- BGE-M3 (multilingual, good)
- all-MiniLM-L6-v2 (mid-tier, fast)
- OpenAI text-embedding-3 (cloud, high quality)
For SMEs that need data privacy + Turkish: mxbai-embed-large or BGE-M3 run locally.
Performance Expectations
RAG performance at SME scale:
| Metric | Typical value |
|---|---|
| Document count | 1,000 – 100,000 |
| Total chunks | 10,000 – 1,000,000 |
| Embedding index build time | Hours (one-time) |
| Query response time | <1 second (vector search) |
| LLM response time | 3–15 seconds |
| Hardware | RTX 4090 / A100 ideal |
Common Mistakes
Typical pitfalls in SME RAG deployments:
- Indexing every document without filtering: sensitive-data leak risk
- No access control: anyone can query any document
- Chunks too large: the relevant section becomes diluted
- Chunks too small: context is lost
- Outdated Turkish model: poor answer quality
- No audit log: no record of who asked what
- No backups: if the vector DB is lost, you start from scratch
What Yamanlar Bilişim Offers
Our RAG/embedding support areas at SME scale:
- "Is RAG right for us?" assessment
- Document inventory and sensitivity classification
- Qdrant/Chroma vector DB installation
- Embedding model selection (Turkish-focused)
- LangChain / LlamaIndex orchestration
- Open WebUI or custom UI
- Access control and audit logging
- Annual model refresh
Conclusion
Embeddings and vector databases move SME document search from the "Ctrl+F keyword match" era into the "semantic similarity" age. Combined with a local LLM through a RAG architecture, you can offer a "ChatGPT" experience over your company's own documents — without any data leaving the organisation, KVKK-friendly, at sustainable cost. A Qdrant + Ollama + mxbai-embed combo can be set up at SME scale in 1–2 weeks and pays off for years.
Yamanlar Bilişim offers RAG architecture design, deployment, and training services sized to your needs; we turn your company's documents from a forgotten archive into a queried, used knowledge asset.
Frequently Asked Questions
As an SME, do I need embeddings?
If your SME has more than 100 documents and time wasted searching is a real problem — yes. Manual search may suffice below ~50 documents; in environments with 500+ documents, embedding-based search makes a dramatic difference.
Cloud embedding API or local?
If KVKK is the priority: local (Ollama + mxbai-embed-large). If speed and quality come first, go cloud: OpenAI text-embedding-3-small is economical, Cohere embed-multilingual is strong in Turkish. A hybrid — sensitive data local, general R&D in the cloud — is also viable.
How much disk/RAM does the vector DB need?
Typical maths: 100 documents × 100 chunks × 1,024 dims × 4 bytes = ~40 MB. 10,000 documents: ~4 GB. A vector DB server with 8–16 GB RAM and 100 GB disk is a sufficient starting point for an SME.
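Spelled out, the arithmetic assumes float32 embeddings (4 bytes per dimension) at 1,024 dimensions:

```python
def index_size_bytes(docs, chunks_per_doc, dims=1024, bytes_per_dim=4):
    # raw vector storage only; payload metadata and index overhead add more
    return docs * chunks_per_doc * dims * bytes_per_dim

print(index_size_bytes(100, 100) / 1e6)     # ~41 MB for 100 documents
print(index_size_bytes(10_000, 100) / 1e9)  # ~4.1 GB for 10,000 documents
```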
Isn't SharePoint search enough?
SharePoint search is keyword-based, with limited semantic capability in the latest versions (Microsoft Search). Embedding-based search is much stronger at semantic similarity. The SME need decides: as SharePoint grows, Microsoft 365 Copilot (a RAG-like offering) is an option; for tight data-privacy control, local RAG wins.
Doesn't Microsoft Copilot already do RAG?
Yes — M365 Copilot performs RAG over company data (M365 files, Teams, email). For SMEs already on M365, Copilot is the natural choice. The catch: documents outside M365 (old file servers, PDF archives, web content) aren't in Copilot's scope — they need a dedicated RAG solution.
Best local embedding model for Turkish?
mxbai-embed-large or BGE-M3 deliver acceptable Turkish performance. Cohere embed-multilingual on the cloud side is the strongest for Turkish (but the data leaves the building). For in-house local needs, mxbai-embed-large is the common pick.
Author
Serdar
Yamanlar Bilişim Expert
Writes content on IT infrastructure, cybersecurity, and digital transformation at Yamanlar Bilişim. Get in touch for any questions.
Professional Support
Get help on this topic
Let's design the Enterprise AI and Data Intelligence solution you need together. Our experts get back to you within 1 business day.
support@yamanlarbilisim.com.tr · Response time: 1 business day
Keep Reading
Related Articles

AI Policy: Rules for Using ChatGPT and Copilot in an SME
A corporate AI-policy guide for SMEs — using ChatGPT, Microsoft Copilot, and Claude responsibly, KVKK alignment, and employee rules.

Excel Automation: Killing Manual Work with Power Automate
Automating Excel workflows with Microsoft Power Automate — practical SME scenarios, connectors, and productivity gains.

Local LLM Deployment: Data-Private AI in an SME
Self-hosted LLM deployment at SME scale — running Ollama, LM Studio, and vLLM for data-privacy-first AI.