
Level 4 — Expert

You don't just want to use AI — you want to build your own AI systems. This level covers RAG, MCP servers, custom knowledge bases, and advanced prompt engineering. Welcome to the workshop.

What Is RAG (Retrieval Augmented Generation)?

Imagine asking an AI about your company — and it makes up the answer because it doesn't know your internal documents. That's exactly the problem RAG (Retrieval Augmented Generation) solves.

The principle is elegant: before the AI answers, it first searches your data — documents, FAQs, product descriptions, internal wikis — and uses the relevant passages as context for its response.

Why does this matter?

  • No hallucinations about your content: The AI only answers based on real documents
  • Always up to date: New documents are automatically included, no retraining needed
  • Traceable: You can see which sources the AI used for its answer
  • Cost-effective: No expensive fine-tuning required — just provide your data

In practice, RAG works in three steps: your documents are split into small chunks and stored as mathematical vectors (called embeddings). When a question comes in, it's also converted to a vector, and the most similar document chunks are found. These chunks are then sent to the AI along with the question as context.
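These three steps can be sketched in miniature. This is a toy illustration: a bag-of-words count vector stands in for a real embedding model, and the chunks and question are invented examples — production systems use learned embeddings and a vector database instead.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector.
    # Real systems use a learned embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: how closely two vectors point in the same direction.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: documents split into chunks (here: three invented chunks).
chunks = [
    "Our support hotline is open Monday to Friday, 9am to 5pm.",
    "The premium plan includes unlimited projects and priority support.",
    "Invoices are sent by email at the start of each month.",
]

# Step 2: the question is embedded and compared against each chunk.
question = "When is the support hotline open?"
q_vec = embed(question)
best = max(chunks, key=lambda c: cosine(q_vec, embed(c)))

# Step 3: the most similar chunk is sent to the AI as context.
prompt = f"Answer using only this context:\n{best}\n\nQuestion: {question}"
```

The retrieved chunk is the one about the support hotline — exactly the passage the AI needs to answer truthfully instead of guessing.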

Building Custom Knowledge Bases

The key to a good RAG system is the knowledge base. This is where you store everything the AI should know: company documents, product catalogues, FAQs, manuals, meeting notes, or customer feedback.

For this you need a vector database — a specialised database that doesn't search for exact words but for meaning. The concept behind it is called embeddings: every text is converted into a mathematical vector (a list of numbers) that represents its meaning. Similar texts have similar vectors.

Supabase Vector (pgvector)

If you already use Supabase: pgvector is a PostgreSQL extension that enables vector search directly in your existing database. Ideal for getting started — no additional service needed.

Pinecone

A specialised cloud service for vector search. Scales well, has a generous free tier, and is set up in minutes. Great for projects that need to grow fast.

LanceDB

A local, open-source solution. Runs directly on your machine or server, no cloud needed. Perfect for privacy-sensitive applications or prototypes.

Don't want to build it all yourself?

Ready-made RAG chatbot solutions like SerahrChat offer AI chatbots as a managed service: custom knowledge base, GDPR-oriented, no technical setup required. Ideal for SMBs looking to add an FAQ bot to their website.

The typical workflow: upload documents, split them into chunks, generate embeddings (e.g. with OpenAI's embedding models or Voyage AI), and store them in the vector database. When a query comes in, the most relevant chunks are retrieved and passed to the AI as context.
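The chunking step of that workflow can be sketched as a small function. The character-based splitting and the size/overlap values below are illustrative defaults; production pipelines often split on sentence or paragraph boundaries instead.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks.

    The overlap ensures that a sentence straddling a chunk boundary
    remains retrievable from at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "word " * 100  # a 500-character stand-in document
pieces = chunk_text(doc, size=200, overlap=50)
```

Each chunk would then be passed to an embedding model and stored, together with its vector, in the vector database.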

Understanding MCP (Model Context Protocol)

MCP (Model Context Protocol) is an open standard developed by Anthropic that defines how AI models communicate with external tools and data sources. Think of MCP as “USB for AI”: a universal connector that lets AI access any tool.

Before MCP, every tool had to build its own integration. With MCP, there's a standard — implement it once, and it works with all compatible AI systems. Since 2025, MCP has established itself as the defining standard, and an increasing number of tools and platforms support it.

What can AI do with MCP?

  • Read and write files: The AI accesses your local file system
  • Query databases: SQL queries directly from the chat
  • Control browsers: Open websites, fill forms, take screenshots
  • Call APIs: Send emails, manage calendars, create tickets
  • Execute code: Run Python scripts, shell commands, launch tests

What makes it special: the AI decides which tool to use and when. You say “Create a report on our latest sales figures,” and the AI knows it needs to first query the database, then analyse the numbers, and finally create a formatted report.
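Conceptually, what an MCP server exposes is a set of named tools with descriptions the model can choose from. The sketch below is a toy registry to illustrate that idea only — real MCP servers speak a JSON-RPC protocol over stdio or HTTP, and the tool names here are invented:

```python
from typing import Callable

# Toy tool registry: name -> callable, each with a description the
# model could read when deciding which tool to use.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str, description: str):
    def register(fn):
        fn.description = description  # exposed so the model can pick a tool
        TOOLS[name] = fn
        return fn
    return register

@tool("read_file", "Read a text file from disk")
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

@tool("list_tools", "List available tool names")
def list_tools() -> str:
    return ", ".join(sorted(TOOLS))

result = TOOLS["list_tools"]()
```

The AI sees the tool names and descriptions, picks one, and the server executes it — that selection step is exactly what happens in the “sales report” example above.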

Setting Up MCP Servers

An MCP server is a small programme that sits between the AI and an external service. It provides “tools” that the AI can use. Setting one up is easier than you might think:

1. Local MCP Servers

Run directly on your machine. Ideal for personal projects: file system access, local databases, Git integration. Installation is usually a single npm or pip command.

2. Cloud MCP Servers

Run on a server and are reachable via the internet. Useful for team setups or when the AI needs to access cloud services like Stripe, Supabase, or GitHub.

3. Marketplace & Community

There are already hundreds of ready-made MCP servers for common tools: Slack, Notion, Google Drive, PostgreSQL, Jira, and many more. Often, you just install an existing server and configure it.

MCP is most commonly used with Claude (the desktop app or Claude Code). In the configuration file, you simply list your MCP servers, and Claude can automatically use their tools. The configuration is a simple JSON file.
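A minimal version of that JSON file might look like this — the directory path and connection string are placeholders you would replace with your own, and the two servers shown are examples from the official reference server collection:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/projects"]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}
```

Each entry tells Claude how to launch the server; once listed, its tools become available in the chat automatically.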

Prompt Engineering for Experts

You know the basics of prompting from the previous levels. Now it's about techniques that deliver professional results — reproducible and consistent.

Chain-of-Thought

Ask the AI to think step by step: “Explain your reasoning before you answer.” This significantly improves quality on complex tasks because the AI can self-correct along the way.

Few-Shot Learning

Give the AI 2–3 examples of the desired format. “Here are examples of what I want: [Example 1] [Example 2]. Now create one for: [your topic].” The AI recognises the pattern and reproduces it.

Structured Output

Request outputs in a specific format: JSON, Markdown tables, CSV. “Respond exclusively as JSON with the fields: title, summary, tags.” This makes results machine-processable.
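Machine-processable also means the output should be validated, not trusted blindly — models occasionally return invalid JSON or omit fields. A minimal sketch, with an invented example reply (in practice the reply comes from the AI API):

```python
import json

REQUIRED = {"title", "summary", "tags"}

def parse_reply(reply: str) -> dict:
    """Parse a model reply that was instructed to be pure JSON,
    and check that all required fields are present."""
    data = json.loads(reply)  # raises ValueError on invalid JSON
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

# Simulated model reply:
reply = '{"title": "Q3 Report", "summary": "Sales rose 12%.", "tags": ["sales", "q3"]}'
data = parse_reply(reply)
```

On a parse failure, a common pattern is to send the error back to the model and ask it to correct its own output.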

System Prompts for Consistent Results

When using APIs, you can set a system prompt that defines the AI's fundamental behaviour: role, tone, constraints, output format. This is the key to professional, reproducible results.

Tip: combine these techniques. A system prompt with role definition plus few-shot examples plus structured output yields consistent, high-quality results — every time.
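Combining the techniques can be sketched as assembling a message list in the role/content shape used by OpenAI-style chat APIs. The system prompt, the two few-shot examples, and the final topic below are all illustrative placeholders:

```python
# System prompt: role, constraints, and structured-output format.
system_prompt = (
    "You are a product copywriter. Respond exclusively as JSON with "
    "the fields: title, summary, tags."
)

# Few-shot examples: (user request, desired assistant reply) pairs.
few_shot = [
    ("Describe: ergonomic office chair",
     '{"title": "Sit Better", "summary": "All-day comfort.", "tags": ["office"]}'),
    ("Describe: noise-cancelling headphones",
     '{"title": "Pure Focus", "summary": "Silence on demand.", "tags": ["audio"]}'),
]

messages = [{"role": "system", "content": system_prompt}]
for user_text, assistant_json in few_shot:
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_json})

# The actual request comes last.
messages.append({"role": "user", "content": "Describe: standing desk"})

# `messages` can now be sent to a chat completion endpoint.
```

The model sees the role definition, recognises the pattern from the examples, and returns the new description in the same JSON shape.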

Fine-Tuning vs. RAG vs. Prompting

When to use which approach? This is a crucial question, and the answer depends on your specific use case. Here's an overview:

Criterion   | Prompting                 | RAG                              | Fine-Tuning
Cost        | Low                       | Medium                           | High
Setup time  | Minutes                   | Hours–days                       | Days–weeks
Own data    | In the prompt             | External database                | In the model
Freshness   | Always current            | Current (when updated)           | Stale after training
Best for    | General tasks, prototypes | Company knowledge, support, docs | Specialised style, domain expertise

Rule of thumb: always start with prompting. If the context window isn't enough or you have large volumes of documents, use RAG. Reach for fine-tuning only when you need a very specific style or domain language and the other approaches fall short.

Security and Privacy for Custom AI Systems

When you build your own AI systems, you carry responsibility for the data you process. Especially in Germany and the EU, strict rules apply — and that's a good thing.

Self-Hosted vs. Cloud

Self-hosted: You run everything on your own servers. Full control over the data, but more maintenance and security effort.
Cloud: Services like OpenAI or Anthropic process your data on their servers. Easier, but you must carefully review the privacy policies.

Data Residency

Where is data stored? Many companies are required to keep data within the EU. Supabase, for example, offers servers in Frankfurt, and some AI providers have European data centres.

GDPR Compliance

If your AI system processes personal data, you need: a privacy policy, data processing agreements (DPAs) with all service providers, technical and organisational measures, and potentially a Data Protection Impact Assessment (DPIA).

Important: never send sensitive data (passwords, credit card numbers, health data) to AI APIs without first checking whether the provider uses that data for training. Most providers don't when using the API — but verify it.

This field is evolving rapidly

RAG, MCP, and the technologies described here are developing fast. What's best practice today may be complemented by better approaches tomorrow. Stay curious, experiment with different tools, and don't hesitate to question established solutions. The AI world rewards curiosity.