Embeddings and Vector DBs: Refreshing SME Document Search

TL;DR: Embeddings and vector databases — moving SME document search to semantic retrieval, RAG architecture, and an implementation guide.
Summary: An embedding (vectorisation) turns text into a numeric vector; the question "are these two pieces of text semantically similar?" is then answered by vector distance maths. Vector databases (Qdrant, Chroma, Weaviate, Pinecone) store and query these vectors at scale. The most practical use case for SMEs is refreshing document search. Instead of the old "Ctrl+F keyword match", you get a "find me documents that are topically similar to this" approach. Combine a local LLM with a vector DB through a RAG (Retrieval-Augmented Generation) architecture and you can offer a "ChatGPT" experience over your company's own documents — without any data leaving the organisation.
In SMEs, documents are scattered across SharePoint, file servers, Notion, and email archives. When information is needed — "where is that contract?", "what did we do for this customer last year?", "what does our company policy say?" — manual search takes hours. Classic keyword search returns 50 documents that contain the word "backup" and you can't tell which one is relevant. Embedding-based semantic search will surface documents about backup, business continuity (BCP), and KVKK (Turkey's data protection law) compliance even when the query "what should we do to protect our data?" doesn't share any keywords with them.
In this article we cover the embedding concept at SME scale, vector DB options, and refreshing document search with a RAG architecture. The target audience: IT managers, teams that want more efficient document management, and decision makers who want to take advantage of modern AI.
What is an Embedding?
An embedding converts text (a word, sentence, paragraph, or document) into a fixed-size numeric vector.
A Typical Embedding
Sentence: "A backup strategy is required for KVKK compliance"
Embedding (example, 384 dimensions):
[0.12, -0.45, 0.67, 0.23, -0.11, ...]
This vector represents the "meaning" of the sentence. Sentences with similar meanings have similar vectors.
Vector Similarity
The "distance" between two vectors is measured with cosine similarity:
- 1.0 = same meaning
- 0.5 = related but different topic
- 0.0 = unrelated
- -1.0 = opposite
SME rule of thumb: 0.7+ similarity = "relevant document".
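The similarity scale above can be made concrete in a few lines of plain Python. The 4-dimensional vectors below are toy values chosen for illustration; real embeddings have 384+ dimensions, but the maths is identical.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); ranges from -1.0 (opposite) to 1.0 (identical)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for three documents
backup_doc = [0.9, 0.1, 0.0, 0.2]   # backup strategy
dr_doc     = [0.8, 0.2, 0.1, 0.3]   # disaster recovery: related topic
invoice    = [0.0, 0.9, 0.8, 0.1]   # unrelated topic

print(round(cosine_similarity(backup_doc, dr_doc), 2))   # high: "relevant"
print(round(cosine_similarity(backup_doc, invoice), 2))  # low: "unrelated"
```

The related documents score well above the 0.7 rule-of-thumb threshold, the unrelated one well below it.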
Embedding Models
| Model | Dimensions | Footprint | SME fit |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Small | Light, fast |
| mxbai-embed-large | 1024 | Medium | General purpose |
| OpenAI text-embedding-3-small | 1536 | Cloud API | High quality |
| OpenAI text-embedding-3-large | 3072 | Cloud API | Premium |
| Cohere embed-multilingual | 1024 | Cloud API | Good Turkish support |
| BGE-large | 1024 | Local | Solid open source |
Pragmatic SME starting point: all-MiniLM-L6-v2 (local, fast, free) or mxbai-embed-large (better quality + local).
What is a Vector DB?
A vector DB stores millions or billions of embeddings at scale and answers the question "which 10 vectors are closest to this one?" quickly.
Vector DB Options
| DB | Type | SME fit |
|---|---|---|
| Qdrant | Open source, self-host | Recommended |
| Chroma | Open source, lightweight | Small SMEs |
| Weaviate | Open source, feature rich | Complex needs |
| Milvus | Open source, scalable | Larger SMEs |
| Pinecone | Cloud SaaS | Quick start |
| PostgreSQL + pgvector | Add-on to existing Postgres | SMEs already on Postgres |
| Elasticsearch | Built-in vector support | Existing Elastic users |
Typical SME pick: Qdrant (feature rich, scalable) or Chroma (easiest to set up).
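What a vector DB does can be sketched as a brute-force nearest-neighbour scan; engines like Qdrant answer the same question with approximate indexes (HNSW) instead of scanning every vector, but the contract is identical. The document IDs and 3-dimensional vectors below are made up for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def top_k(query, index, k=3):
    # index: list of (doc_id, vector) pairs; a real vector DB replaces this
    # full scan with an ANN structure so it stays fast at millions of vectors
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

index = [
    ("backup-strategy",  [0.9, 0.1, 0.1]),
    ("dr-plan",          [0.8, 0.3, 0.2]),
    ("marketing-plan",   [0.1, 0.9, 0.2]),
    ("retention-policy", [0.7, 0.2, 0.5]),
]
print(top_k([0.85, 0.15, 0.2], index, k=2))  # backup-strategy, then dr-plan
```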
A Practical SME Scenario — Document Search
The difference between classic and embedding-based search:
Classic Search (Ctrl+F)
Question: "How do we back up customer data?"
Result: every document containing the word "backup".
Problems:
- A document titled "Data protection" gets missed: it covers the topic but never contains the keyword
- Documents that discuss backups without using the exact queried word aren't found at all
- 100 results show up, 90% of them irrelevant
Embedding-Based Search
Question: "How do we back up customer data?"
System: question → embedding → top 10 nearest documents in the vector DB.
Result:
- "Backup strategy" (0.92 similarity)
- "Data protection under KVKK" (0.85)
- "Disaster recovery plan" (0.81)
- "Customer data retention guidelines" (0.78)
The results are semantically related — independent of keyword matching.
RAG Architecture
Embedding + Vector DB + LLM combined: RAG (Retrieval-Augmented Generation).
RAG Flow
1. Preparation (Indexing):
[All documents] → [Embedding model] → [Vector DB]
2. Query:
[User question] → [Embedding] → [Find relevant docs in vector DB]
↓
[Question + Docs] → [LLM] → [Context-grounded answer]
SME Advantage
- Local LLM + local vector DB = data never leaves the organisation (KVKK)
- A "ChatGPT" experience over your own documents
- Adding a new document only requires embedding and indexing that one document, not rebuilding the whole index
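The query step of the flow above can be sketched with stubbed components. Here `embed`, `search_vector_db`, and `ask_llm` are placeholders for the real embedding model, vector DB client, and local LLM call; only the prompt assembly is the actual RAG logic.

```python
def embed(text):
    # stub: a real embedding model returns a 384+ dimensional vector
    return [float(len(w)) for w in text.split()[:3]]

def search_vector_db(vector, k=3):
    # stub: a real vector DB returns the k nearest chunks
    return ["Backup strategy: nightly full backup, offsite copy...",
            "KVKK data protection: customer data must be encrypted..."]

def ask_llm(prompt):
    # stub: a real call goes to a local LLM such as Ollama
    return "Answer grounded in the retrieved documents."

def build_prompt(question, docs):
    # the LLM is instructed to answer only from the retrieved context
    context = "\n\n".join(docs)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

def rag_answer(question):
    docs = search_vector_db(embed(question))
    return ask_llm(build_prompt(question, docs))

print(rag_answer("How do we back up customer data?"))
```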
A Typical SME RAG Stack
An open-source stack example:
| Component | Choice |
|---|---|
| LLM runtime | Ollama (llama3.1:8b) |
| Embedding model | mxbai-embed-large |
| Vector DB | Qdrant |
| Orchestration | LangChain or LlamaIndex |
| UI | Open WebUI or a custom web app |
| Document parsing | unstructured.io, Apache Tika |
Setup (Docker Compose)
```yaml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui_data:/app/backend/data
volumes:
  ollama_data:
  qdrant_data:
  openwebui_data:
```
Bring these three services up and the SME's self-hosted "company knowledge ChatGPT" is ready.
Document Types
Typical documents that an SME might bring into RAG scope:
- Company policies (PDF, Word)
- Customer contracts
- Internal procedures and handbooks
- HR documents
- Technical documentation
- Wiki / Notion pages
- Email archives (when appropriate)
- Past projects
- Marketing materials
Sensitive-Data Filter
Not every document should go into RAG:
- Confidential business data (financial reports, M&A material): keep it out of the index entirely
- Sensitive employee data (personnel files): restrict or exclude
- Customer personal data: subject to KVKK, index only behind access controls
Access control inside RAG should be role-based: who can query which documents.
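Role-based filtering can be as simple as matching role tags stored in each chunk's metadata against the querying user's roles. The field name `allowed_roles` and the role values below are illustrative, not a fixed schema.

```python
def filter_by_role(results, user_roles):
    # each result carries an "allowed_roles" metadata field set at index time;
    # a result is visible if it shares at least one role with the user
    return [r for r in results if set(r["allowed_roles"]) & set(user_roles)]

results = [
    {"doc": "IT backup policy",  "allowed_roles": ["all"]},
    {"doc": "Personnel file X",  "allowed_roles": ["hr"]},
    {"doc": "M&A due diligence", "allowed_roles": ["board"]},
]
print(filter_by_role(results, ["all", "it"]))  # only the IT backup policy
```

Vector DBs like Qdrant can also apply such filters server-side during the similarity search, so restricted chunks never leave the database.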
Document Preparation (Chunking)
A 100-page document can't be a single embedding; it has to be split into chunks.
Chunking Strategies
- Fixed size: e.g. 500 tokens with 50-token overlap
- Sentence-based: each sentence separately, simple
- Paragraph-based: preserves semantic cohesion
- Section-based: driven by Markdown headings
- Recursive: advanced — paragraph → sentence → word
Practical SME pick: 500 tokens + 50-token overlap, paragraph-aware.
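The 500-token / 50-token-overlap strategy can be sketched in a few lines. This version splits on whitespace for illustration; a production pipeline would count tokens with the embedding model's own tokenizer and respect paragraph boundaries.

```python
def chunk(text, size=500, overlap=50):
    # slide a window of `size` tokens, stepping by size - overlap so each
    # chunk shares `overlap` tokens with its predecessor
    tokens = text.split()
    step = size - overlap
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), step)]

doc = " ".join(f"word{i}" for i in range(1200))
chunks = chunk(doc)
print(len(chunks))             # 3 chunks for a 1200-token document
print(len(chunks[1].split()))  # 500 tokens, the first 50 shared with chunk 0
```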
Metadata
Add metadata to every chunk:
- Source document name
- Page number
- Section title
- Author, date
- Access permissions
Query results then show the user the source.
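Attaching that metadata can look like the sketch below. The payload shape is illustrative; Qdrant, for example, stores an arbitrary JSON payload next to each vector, so the field names are up to you.

```python
def make_point(chunk_id, vector, text, source, page, section, roles):
    # one indexable record: the embedding plus the metadata listed above
    return {
        "id": chunk_id,
        "vector": vector,
        "payload": {
            "text": text,
            "source": source,        # shown to the user as the citation
            "page": page,
            "section": section,
            "allowed_roles": roles,  # consumed by access-control filtering
        },
    }

point = make_point(1, [0.1, 0.2], "Nightly backups run at 02:00...",
                   source="backup-policy.pdf", page=4,
                   section="Schedule", roles=["it"])
print(point["payload"]["source"])  # backup-policy.pdf
```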
Turkish and Embeddings
SME environments are mostly Turkish-language; the embedding model must support Turkish well.
Models with Good Turkish Performance
- Cohere embed-multilingual-v3 (cloud API, best)
- mxbai-embed-large (local, good)
- BGE-M3 (multilingual, good)
- all-MiniLM-L6-v2 (mid-tier, fast)
- OpenAI text-embedding-3 (cloud, high quality)
For SMEs that need data privacy + Turkish: mxbai-embed-large or BGE-M3 run locally.
Performance Expectations
RAG performance at SME scale:
| Metric | Typical value |
|---|---|
| Document count | 1,000 – 100,000 |
| Total chunks | 10,000 – 1,000,000 |
| Embedding index build time | Hours (one-time) |
| Query response time | <1 second (vector search) |
| LLM response time | 3–15 seconds |
| Hardware | RTX 4090 / A100 ideal |
Common Mistakes
Typical pitfalls in SME RAG deployments:
- Indexing every document without filtering: sensitive-data leak risk
- No access control: anyone can query any document
- Chunks too large: the relevant section becomes diluted
- Chunks too small: context is lost
- Outdated Turkish model: poor answer quality
- No audit log: no record of who asked what
- No backups: if the vector DB is lost, you start from scratch
What Yamanlar Bilişim Offers
Our RAG/embedding support areas at SME scale:
- "Is RAG right for us?" assessment
- Document inventory and sensitivity classification
- Qdrant/Chroma vector DB installation
- Embedding model selection (Turkish-focused)
- LangChain / LlamaIndex orchestration
- Open WebUI or custom UI
- Access control and audit logging
- Annual model refresh
Conclusion
Embeddings and vector databases move SME document search from the "Ctrl+F keyword match" era into the "semantic similarity" age. Combined with a local LLM through a RAG architecture, you can offer a "ChatGPT" experience over your company's own documents — without any data leaving the organisation, KVKK-friendly, at sustainable cost. A Qdrant + Ollama + mxbai-embed combo can be set up at SME scale in 1–2 weeks and pays off for years.
Yamanlar Bilişim offers RAG architecture design, deployment, and training services sized to your needs; we turn your company's documents from a forgotten archive into a queried, used knowledge asset.
Frequently Asked Questions
As an SME, do I need embeddings?
If your SME has more than 100 documents and time wasted searching is a real problem — yes. Manual search may suffice below ~50 documents; in environments with 500+ documents, embedding-based search makes a dramatic difference.
Cloud embedding API or local?
If KVKK is the priority: local (Ollama + mxbai-embed-large). If speed and quality come first, go cloud: OpenAI text-embedding-3-small is economical, Cohere embed-multilingual is strong in Turkish. A hybrid — sensitive data local, general R&D in the cloud — is also viable.
How much disk/RAM does the vector DB need?
Typical maths: 100 documents × 100 chunks × 1,024 dims × 4 bytes = ~40 MB. 10,000 documents: ~4 GB. A vector DB server with 8–16 GB RAM and 100 GB disk is a sufficient starting point for an SME.
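Spelled out, the arithmetic assumes float32 embeddings (4 bytes per dimension) at 1,024 dimensions:

```python
def index_size_bytes(docs, chunks_per_doc, dims=1024, bytes_per_dim=4):
    # raw vector storage only; payload metadata and index overhead add more
    return docs * chunks_per_doc * dims * bytes_per_dim

print(index_size_bytes(100, 100) / 1e6)     # ~41 MB for 100 documents
print(index_size_bytes(10_000, 100) / 1e9)  # ~4.1 GB for 10,000 documents
```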
Isn't SharePoint search enough?
SharePoint search is keyword-based, with limited semantic capability in the latest versions (Microsoft Search). Embedding-based search is much stronger at semantic similarity. The SME need decides: as SharePoint grows, Microsoft 365 Copilot (a RAG-like offering) is an option; for tight data-privacy control, local RAG wins.
Doesn't Microsoft Copilot already do RAG?
Yes — M365 Copilot performs RAG over company data (M365 files, Teams, email). For SMEs already on M365, Copilot is the natural choice. The catch: documents outside M365 (old file servers, PDF archives, web content) aren't in Copilot's scope — they need a dedicated RAG solution.
Best local embedding model for Turkish?
mxbai-embed-large or BGE-M3 deliver acceptable Turkish performance. Cohere embed-multilingual on the cloud side is the strongest for Turkish (but the data leaves the building). For in-house local needs, mxbai-embed-large is the common pick.
Author
Serdar
Yamanlar Bilişim Expert
Writes content on IT infrastructure, cybersecurity, and digital transformation at Yamanlar Bilişim. Get in touch for any questions.
Professional Support
Get help on this topic
Let's design the Enterprise AI and Data Intelligence solution you need together. Our experts get back to you within 1 business day.
support@yamanlarbilisim.com.tr · Response time: 1 business day
Keep Reading
Related Articles

AI Policy: Rules for Using ChatGPT and Copilot in an SME
A corporate AI-policy guide for SMEs — using ChatGPT, Microsoft Copilot, and Claude responsibly, KVKK alignment, and employee rules.

Excel Automation: Killing Manual Work with Power Automate
Automating Excel workflows with Microsoft Power Automate — practical SME scenarios, connectors, and productivity gains.

Local LLM Deployment: Data-Private AI in an SME
Self-hosted LLM deployment at SME scale — running Ollama, LM Studio, and vLLM for data-privacy-first AI.