Enterprise AI and Data Intelligence · May 11, 2026 · Serdar · 8 min read

Embeddings and Vector DBs: Refreshing SME Document Search

TL;DR: Embeddings and vector databases — moving SME document search to semantic retrieval, RAG architecture, and an implementation guide.

Summary: An embedding (vectorisation) turns text into a numeric vector; the question "are these two pieces of text semantically similar?" is then answered by vector distance maths. Vector databases (Qdrant, Chroma, Weaviate, Pinecone) store and query these vectors at scale. The most practical use case for SMEs is refreshing document search. Instead of the old "Ctrl+F keyword match", you get a "find me documents that are topically similar to this" approach. Combine a local LLM with a vector DB through a RAG (Retrieval-Augmented Generation) architecture and you can offer a "ChatGPT" experience over your company's own documents — without any data leaving the organisation.

In SMEs, documents are scattered across SharePoint, file servers, Notion, and email archives. When information is needed — "where is that contract?", "what did we do for this customer last year?", "what does our company policy say?" — manual search takes hours. Classic keyword search returns 50 documents that contain the word "backup" and you can't tell which one is relevant. Embedding-based semantic search will surface documents about backup, BCP, and KVKK compliance even when the query "what should we do to protect our data?" doesn't share any keywords with them.

In this article we cover the embedding concept at SME scale, vector DB options, and refreshing document search with a RAG architecture. The target audience: IT managers, teams that want more efficient document management, and decision makers who want to take advantage of modern AI.

What is an Embedding?

An embedding converts text (a word, sentence, paragraph, or document) into a fixed-size numeric vector.

A Typical Embedding

Sentence: "A backup strategy is required for KVKK compliance"

Embedding (example, 384 dimensions):

[0.12, -0.45, 0.67, 0.23, -0.11, ...]

This vector represents the "meaning" of the sentence. Sentences with similar meanings have similar vectors.
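
As a concrete sketch: with the open-source sentence-transformers library and the all-MiniLM-L6-v2 model from the table below, producing an embedding takes a few lines (output values are illustrative):

# Minimal embedding sketch (pip install sentence-transformers)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim vectors

vector = model.encode("A backup strategy is required for KVKK compliance")
print(vector.shape)  # (384,)
print(vector[:5])    # first five values of the 384-dim vector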

Vector Similarity

The "distance" between two vectors is measured with cosine similarity:

  • 1.0 = same meaning
  • 0.5 = related but different topic
  • 0.0 = unrelated
  • -1.0 = opposite

SME rule of thumb: 0.7+ similarity = "relevant document".
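
Cosine similarity itself is a one-line formula: the dot product of the two vectors divided by the product of their lengths. A minimal numpy sketch with toy 3-dimensional vectors:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(angle) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.12, -0.45, 0.67])
b = np.array([0.10, -0.40, 0.70])
print(cosine_similarity(a, b))  # close to 1.0: very similar direction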

Embedding Models

Model                         | Dimensions | Deployment     | SME fit
all-MiniLM-L6-v2              | 384        | Local (small)  | Light, fast
mxbai-embed-large             | 1024       | Local (medium) | General purpose
OpenAI text-embedding-3-small | 1536       | Cloud API      | High quality
OpenAI text-embedding-3-large | 3072       | Cloud API      | Premium
Cohere embed-multilingual     | 1024       | Cloud API      | Good Turkish support
BGE-large                     | 1024       | Local          | Solid open source

Pragmatic SME starting point: all-MiniLM-L6-v2 (local, fast, free) or mxbai-embed-large (better quality + local).

What is a Vector DB?

A vector DB stores millions or billions of embeddings at scale and answers the question "which 10 vectors are closest to this one?" quickly.

Vector DB Options

DB                    | Type                        | SME fit
Qdrant                | Open source, self-host      | Recommended
Chroma                | Open source, lightweight    | Small SMEs
Weaviate              | Open source, feature rich   | Complex needs
Milvus                | Open source, scalable       | Larger SMEs
Pinecone              | Cloud SaaS                  | Quick start
PostgreSQL + pgvector | Add-on to existing Postgres | SMEs already on Postgres
Elasticsearch         | Built-in vector support     | Existing Elastic users

Typical SME pick: Qdrant (feature rich, scalable) or Chroma (easiest to set up).
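
An indexing sketch with the qdrant-client Python package (the company_docs collection name and payload fields are illustrative; assumes Qdrant is running locally on its default port):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

# One-time: create a collection sized for 384-dim MiniLM vectors
client.create_collection(
    collection_name="company_docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Index one chunk: id, embedding, and a metadata payload
text = "A backup strategy is required for KVKK compliance"
client.upsert(
    collection_name="company_docs",
    points=[PointStruct(id=1, vector=model.encode(text).tolist(),
                        payload={"source": "backup-policy.pdf", "page": 3})],
)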

The difference between classic and embedding-based search:

Classic Search (Ctrl+F)

Question: "How do we back up customer data?"

Result: every document containing the word "backup".

Problems:

  • A document titled "Data protection" gets missed (no keyword overlap)
  • A relevant document that says "archiving" or "replication" instead of "backup" isn't found
  • 100 results show up, 90% of them irrelevant

Question: "How do we back up customer data?"

System: question → embedding → top 10 nearest documents in the vector DB.

Result:

  • "Backup strategy" (0.92 similarity)
  • "Data protection under KVKK" (0.85)
  • "Disaster recovery plan" (0.81)
  • "Customer data retention guidelines" (0.78)

The results are semantically related — independent of keyword matching.
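
The query side of the same sketch: embed the question with the same model, then ask Qdrant for the nearest chunks (continuing the illustrative company_docs collection from above):

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

hits = client.search(
    collection_name="company_docs",
    query_vector=model.encode("How do we back up customer data?").tolist(),
    limit=10,  # top 10 nearest chunks
)
for hit in hits:
    print(f"{hit.score:.2f}  {hit.payload.get('source')}")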

RAG Architecture

Embedding + Vector DB + LLM combined: RAG (Retrieval-Augmented Generation).

RAG Flow

1. Preparation (Indexing):
[All documents] → [Embedding model] → [Vector DB]

2. Query:
[User question] → [Embedding] → [Find relevant docs in vector DB]
                                ↓
         [Question + Docs] → [LLM] → [Context-grounded answer]
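
A minimal end-to-end sketch of step 2, assuming the Ollama and Qdrant services from the stack below and that each chunk's payload carries a text field (an assumption, not a Qdrant default):

import requests
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

question = "How do we back up customer data?"

# Retrieve: nearest chunks from the vector DB
hits = client.search(collection_name="company_docs",
                     query_vector=model.encode(question).tolist(), limit=5)
context = "\n\n".join(hit.payload["text"] for hit in hits)  # assumes a "text" payload field

# Generate: question + retrieved context go to the local LLM
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
resp = requests.post("http://localhost:11434/api/generate",
                     json={"model": "llama3.1:8b", "prompt": prompt, "stream": False})
print(resp.json()["response"])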

SME Advantage

  • Local LLM + local vector DB = data never leaves the organisation (KVKK)
  • A "ChatGPT" experience over your own documents
  • Adding a new document only requires embedding and indexing that document; no retraining or full rebuild

A Typical SME RAG Stack

An open-source stack example:

Component        | Choice
LLM runtime      | Ollama (llama3.1:8b)
Embedding model  | mxbai-embed-large
Vector DB        | Qdrant
Orchestration    | LangChain or LlamaIndex
UI               | Open WebUI or a custom web app
Document parsing | unstructured.io, Apache Tika

Setup (Docker Compose)

services:
  ollama:
    # Local LLM runtime; also serves embedding models
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  qdrant:
    # Vector database for document embeddings
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

  open-webui:
    # Chat UI on top of Ollama
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui_data:/app/backend/data

volumes:
  ollama_data:
  qdrant_data:
  openwebui_data:

Pull these three services together and the SME's self-hosted "company knowledge ChatGPT" is ready.

Document Types

Typical documents that an SME might bring into RAG scope:

  • Company policies (PDF, Word)
  • Customer contracts
  • Internal procedures and handbooks
  • HR documents
  • Technical documentation
  • Wiki / Notion pages
  • Email archives (when appropriate)
  • Past projects
  • Marketing materials

Sensitive-Data Filter

Not every document should go into RAG:

  • Confidential business material (financial reports, M&A documents): keep out
  • Sensitive employee data (personnel files): keep out
  • Customer-specific personal data covered by KVKK: keep out or tightly restrict

Access control inside RAG should be role-based: who can query which documents.
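
One way to enforce this in Qdrant is a payload filter applied at query time; a sketch assuming each chunk's payload carries an allowed_roles list (an illustrative field, not a built-in):

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

hits = client.search(
    collection_name="company_docs",
    query_vector=model.encode("What is our bonus policy?").tolist(),
    # Only chunks whose allowed_roles payload contains the caller's role
    query_filter=Filter(must=[FieldCondition(key="allowed_roles",
                                             match=MatchValue(value="hr"))]),
    limit=10,
)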

Document Preparation (Chunking)

A 100-page document can't be a single embedding; it has to be split into chunks.

Chunking Strategies

  • Fixed size: e.g. 500 tokens with 50-token overlap
  • Sentence-based: each sentence separately, simple
  • Paragraph-based: preserves semantic cohesion
  • Section-based: driven by Markdown headings
  • Recursive: advanced — paragraph → sentence → word

Practical SME pick: 500 tokens + 50-token overlap, paragraph-aware.
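
A chunking sketch with LangChain's recursive splitter, using its token-based constructor (assumes the langchain-text-splitters and tiktoken packages; exact module paths vary between LangChain versions):

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500,    # tokens per chunk
    chunk_overlap=50,  # tokens shared between neighbouring chunks
)

# Stand-in for text extracted from a real PDF/Word document
document_text = "A backup strategy is required for KVKK compliance. " * 500
chunks = splitter.split_text(document_text)
print(len(chunks), "chunks")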

Metadata

Add metadata to every chunk:

  • Source document name
  • Page number
  • Section title
  • Author, date
  • Access permissions

Query results then show the user the source.
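
In practice the metadata rides along as the payload of each indexed point; a sketch with illustrative field names:

from qdrant_client.models import PointStruct

chunk_vector = [0.0] * 384  # placeholder; use the chunk's real embedding

point = PointStruct(
    id=42,
    vector=chunk_vector,
    payload={
        "source": "backup-policy.pdf",
        "page": 3,
        "section": "Off-site backups",
        "author": "IT",
        "date": "2026-01-15",
        "allowed_roles": ["it", "management"],  # feeds the access-control filter above
    },
)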

Turkish and Embeddings

SME environments are mostly Turkish-language; the embedding model must support Turkish well.

Models with Good Turkish Performance

  • Cohere embed-multilingual-v3 (cloud API, best)
  • mxbai-embed-large (local, good)
  • BGE-M3 (multilingual, good)
  • all-MiniLM-L6-v2 (mid-tier, fast)
  • OpenAI text-embedding-3 (cloud, high quality)

For SMEs that need data privacy + Turkish: mxbai-embed-large or BGE-M3 run locally.
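
mxbai-embed-large can be served through the Ollama runtime from the stack above; a sketch against Ollama's embeddings endpoint (assumes ollama pull mxbai-embed-large has been run):

import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "mxbai-embed-large",
          "prompt": "KVKK uyumu için yedekleme stratejisi gereklidir"},  # Turkish input
)
vector = resp.json()["embedding"]
print(len(vector))  # 1024 dimensions for mxbai-embed-large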

Performance Expectations

RAG performance at SME scale:

Metric                     | Typical value
Document count             | 1,000 – 100,000
Total chunks               | 10,000 – 1,000,000
Embedding index build time | Hours (one-time)
Query response time        | <1 second (vector search)
LLM response time          | 3–15 seconds
Hardware                   | RTX 4090 / A100 ideal

Common Mistakes

Typical pitfalls in SME RAG deployments:

  • Indexing every document without filtering: sensitive-data leak risk
  • No access control: anyone can query any document
  • Chunks too large: the relevant section becomes diluted
  • Chunks too small: context is lost
  • A model with weak Turkish support: poor answer quality
  • No audit log: no record of who asked what
  • No backups: if the vector DB is lost, you start from scratch

What Yamanlar Bilişim Offers

Our RAG/embedding support areas at SME scale:

  • "Is RAG right for us?" assessment
  • Document inventory and sensitivity classification
  • Qdrant/Chroma vector DB installation
  • Embedding model selection (Turkish-focused)
  • LangChain / LlamaIndex orchestration
  • Open WebUI or custom UI
  • Access control and audit logging
  • Annual model refresh

Conclusion

Embeddings and vector databases move SME document search from the "Ctrl+F keyword match" era into the "semantic similarity" age. Combined with a local LLM through a RAG architecture, you can offer a "ChatGPT" experience over your company's own documents — without any data leaving the organisation, KVKK-friendly, at sustainable cost. A Qdrant + Ollama + mxbai-embed combo can be set up at SME scale in 1–2 weeks and pays off for years.

Yamanlar Bilişim offers RAG architecture design, deployment, and training services sized to your needs; we turn your company's documents from a forgotten archive into a queried, used knowledge asset.

Frequently Asked Questions

As an SME, do I need embeddings?

If your SME has more than 100 documents and time wasted searching is a real problem — yes. Manual search may suffice below ~50 documents; in environments with 500+ documents, embedding-based search makes a dramatic difference.

Cloud embedding API or local?

If KVKK is the priority: local (Ollama + mxbai-embed-large). If speed and quality come first, go cloud: OpenAI text-embedding-3-small is economical, Cohere embed-multilingual is strong in Turkish. A hybrid (sensitive data local, general R&D in the cloud) is also viable.

How much disk/RAM does the vector DB need?

Typical maths: 100 documents × 100 chunks × 1,024 dims × 4 bytes = ~40 MB. 10,000 documents: ~4 GB. A vector DB server with 8–16 GB RAM and 100 GB disk is a sufficient starting point for an SME.

Isn't SharePoint search enough?

SharePoint search is keyword-based, with limited semantic capability in the latest versions (Microsoft Search). Embedding-based search is much stronger at semantic similarity. The SME need decides: as SharePoint grows, Microsoft 365 Copilot (a RAG-like offering) is an option; for tight data-privacy control, local RAG wins.

Doesn't Microsoft Copilot already do RAG?

Yes — M365 Copilot performs RAG over company data (M365 files, Teams, email). For SMEs already on M365, Copilot is the natural choice. The catch: documents outside M365 (old file servers, PDF archives, web content) aren't in Copilot's scope — they need a dedicated RAG solution.

Best local embedding model for Turkish?

mxbai-embed-large or BGE-M3 deliver acceptable Turkish performance. Cohere embed-multilingual on the cloud side is the strongest for Turkish (but the data leaves the building). For in-house local needs, mxbai-embed-large is the common pick.

Last updated: May 11, 2026

Author

Serdar

Yamanlar Bilişim Expert

Writes content on IT infrastructure, cybersecurity, and digital transformation at Yamanlar Bilişim. Get in touch for any questions.

Professional Support

Get help on this topic

Let's design the Enterprise AI and Data Intelligence solution you need together. Our experts get back to you within 1 business day.

support@yamanlarbilisim.com.tr · Response time: 1 business day