RAG Kit
mashin’s RAG kit lets you build knowledge bases that can run entirely on your machine. It uses LibSQL for vector storage, Ollama for embeddings, and Extractous for document extraction. With Ollama running locally, no external services are required.
The kit lives at @mashin/kits/rag and provides six machines that cover the full RAG lifecycle: ingesting documents, searching them semantically, and answering questions with source citations.
Quick Start
Three steps to a working knowledge base:
1. Ingest a document

```
ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/handbook.pdf"
  collection: "company_docs"
```

2. Search it

```
ask search, from: "@mashin/kits/rag/search"
  query: "vacation policy"
  collection: "company_docs"
```

3. Ask a question

```
ask answer, from: "@mashin/kits/rag/ask"
  question: "How many vacation days do new employees get?"
  collection: "company_docs"
```

The ask machine retrieves relevant documents, feeds them to an LLM, and returns an answer with source citations.
Machines
| Machine | Path | Purpose |
|---|---|---|
| ingest | @mashin/kits/rag/ingest | Ingest a single document (file or raw text) |
| search | @mashin/kits/rag/search | Semantic similarity search with threshold filtering |
| ask | @mashin/kits/rag/ask | Question answering with source citations |
| ingest_folder | @mashin/kits/rag/ingest_folder | Batch-ingest all documents in a directory |
| watch_folder | @mashin/kits/rag/watch_folder | Live sync: watch a folder and re-index on changes |
| knowledge_base | @mashin/kits/rag/knowledge_base | Unified interface for ingest, search, and ask via the action parameter |
Document Ingestion
The ingest machine takes a file path or raw text and runs it through a four-stage pipeline:
- Extract — Pull text from the source file using Extractous (handles PDF, DOCX, HTML, and more)
- Chunk — Split the text into overlapping segments
- Embed — Generate vector embeddings for each chunk
- Index — Store chunks and vectors in LibSQL for retrieval
Supported formats
PDF, DOCX, TXT, Markdown, HTML. Extractous handles the parsing; you pass a file path and get text back.
Inputs
```
ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/report.pdf"
  collection: "legal_docs"
  chunk_size: 1000
  chunk_overlap: 200
  chunking_strategy: "semantic"
  tags: ["legal", "2026"]
```

| Input | Type | Default | Description |
|---|---|---|---|
| file_path | text | — | Path to the document file |
| text | text | — | Raw text to ingest (alternative to file_path) |
| collection | text | "documents" | Target collection name |
| chunk_size | number | 1000 | Maximum characters per chunk |
| chunk_overlap | number | 200 | Characters shared between consecutive chunks |
| chunking_strategy | text | "semantic" | How to split text: "semantic", "paragraph", or "fixed" |
| embedding_model | text | — | Override the default embedding model |
| tags | list | [] | Metadata tags attached to each chunk |
You must provide either file_path or text. If both are given, file_path takes precedence.
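The precedence rule amounts to a small decision function. The TypeScript sketch below is only an illustration. `resolveSource` and the `IngestSource` type are hypothetical names and are not part of the kit.

```typescript
// Hypothetical shape of the two ingest inputs that choose the source.
interface IngestSourceInput {
  file_path?: string;
  text?: string;
}

type IngestSource =
  | { kind: "file"; path: string }
  | { kind: "text"; text: string };

// file_path wins when both are present; providing neither is an error.
function resolveSource(input: IngestSourceInput): IngestSource {
  if (input.file_path !== undefined) {
    return { kind: "file", path: input.file_path };
  }
  if (input.text !== undefined) {
    return { kind: "text", text: input.text };
  }
  throw new Error("ingest requires either file_path or text");
}
```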
Chunking strategies
- semantic (default): Splits on sentence boundaries and topic shifts. Produces chunks that preserve meaning. Best for most documents.
- paragraph: Splits on paragraph breaks. Good for well-structured documents like articles and reports.
- fixed: Splits at exact character boundaries. Fastest, but can cut mid-sentence. Use when predictable chunk sizes matter more than clean sentence boundaries.
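The following minimal TypeScript sketch of the fixed and paragraph strategies shows how chunk_size and chunk_overlap interact. It is not the kit's chunker. `fixedChunks` and `paragraphChunks` are hypothetical names, and sizes are counted in JavaScript string characters (UTF-16 code units). The semantic strategy's sentence and topic detection is not modeled.

```typescript
// "fixed": windows of at most chunkSize characters, each starting
// (chunkSize - chunkOverlap) characters after the previous one, so
// consecutive chunks share chunkOverlap characters.
function fixedChunks(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this window reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// "paragraph": split on blank lines and drop empty pieces.
function paragraphChunks(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}
```

With the defaults (1000 and 200), each new chunk starts 800 characters after the previous one. A 2,000-character text therefore yields three chunks, starting at offsets 0, 800, and 1600.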
Outputs
| Output | Type | Description |
|---|---|---|
| document_id | text | Unique identifier for the ingested document |
| chunks_indexed | number | How many chunks were created and indexed |
| collection | text | The collection the document was added to |
| elapsed_ms | number | Processing time in milliseconds |
Batch ingestion
Use ingest_folder to process an entire directory:
```
ask ingest_all, from: "@mashin/kits/rag/ingest_folder"
  folder: "/docs/legal"
  collection: "legal_docs"
  glob: "**/*.{pdf,md,docx}"
  chunk_size: 1000
```

It walks the folder, matches files against the glob pattern, and ingests each one. The default glob is **/*.{pdf,txt,md,docx,html}.
| Output | Type | Description |
|---|---|---|
| files_processed | number | Total files found matching the glob |
| files_indexed | number | Files successfully ingested |
| total_chunks | number | Total chunks across all files |
| results | list | Per-file status (file path, chunk count, errors) |
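The sketch below shows how per-file results could roll up into these outputs. It leaves out glob matching and takes an already-matched file list. `ingestAll`, `FileResult`, and the `ingestOne` callback are hypothetical. The sketch also assumes that an unreadable file is recorded in results rather than aborting the batch. The per-file errors column suggests this, but the page does not state it.

```typescript
// Hypothetical per-file record, modeled on the "results" output
// (file path, chunk count, error).
interface FileResult {
  file: string;
  chunks: number;
  error?: string;
}

// Hypothetical summary mirroring the ingest_folder outputs.
interface BatchSummary {
  files_processed: number;
  files_indexed: number;
  total_chunks: number;
  results: FileResult[];
}

// Ingest each already-matched file. A failure is recorded in its result
// instead of aborting the batch (an assumption, see above).
function ingestAll(files: string[], ingestOne: (file: string) => number): BatchSummary {
  const results: FileResult[] = files.map((file): FileResult => {
    try {
      return { file, chunks: ingestOne(file) };
    } catch (e) {
      return { file, chunks: 0, error: e instanceof Error ? e.message : String(e) };
    }
  });
  const indexed = results.filter((r) => r.error === undefined);
  return {
    files_processed: files.length,
    files_indexed: indexed.length,
    total_chunks: indexed.reduce((sum, r) => sum + r.chunks, 0),
    results,
  };
}
```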
Searching
The search machine embeds your query, runs vector similarity search against the collection, and returns ranked results.
```
ask find, from: "@mashin/kits/rag/search"
  query: "password reset process"
  collection: "help_docs"
  limit: 5
  threshold: 0.7
```

How it works
- The query text is embedded using the same model that embedded the documents
- Vector similarity (cosine distance) finds the closest chunks
- Results below the threshold score are filtered out
- Remaining results are ranked by score and returned
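These four steps map onto a short brute-force search. The kit runs the search inside LibSQL. The TypeScript sketch below does it in memory over hypothetical `Chunk` records, to show how scoring, threshold filtering, and ranking fit together. It uses raw cosine similarity as the score, which is an assumption about how the kit computes its 0.0 to 1.0 score.

```typescript
// Hypothetical stored chunk with its embedding vector.
interface Chunk {
  content: string;
  source: string;
  vector: number[];
}

interface SearchResult {
  content: string;
  source: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector dimensions differ");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, drop scores below the threshold,
// sort by descending score, and keep the top `limit`.
function search(queryVector: number[], chunks: Chunk[], limit = 5, threshold = 0.7): SearchResult[] {
  return chunks
    .map((c) => ({ content: c.content, source: c.source, score: cosineSimilarity(queryVector, c.vector) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```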
Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| query | text | required | The search query |
| collection | text | "documents" | Which collection to search |
| limit | number | 5 | Maximum results to return |
| threshold | number | 0.7 | Minimum similarity score (0.0 to 1.0) |
Outputs
| Output | Type | Description |
|---|---|---|
| results | list | Matching chunks, each with content, score, and source |
| count | number | Number of results returned |
| query | text | The original query (echoed back) |
Collection namespacing
Collections are isolated namespaces. Documents in "legal_docs" are never returned when searching "help_docs". Use collections to separate knowledge domains:
```
// Separate collections for different document types
ask find_legal, from: "@mashin/kits/rag/search"
  query: input.question
  collection: "contracts"

ask find_support, from: "@mashin/kits/rag/search"
  query: input.question
  collection: "support_articles"
```

Question Answering
The ask machine combines search and LLM synthesis. It retrieves relevant documents, formats them as context, and asks an LLM to answer the question with citations.
```
ask answer, from: "@mashin/kits/rag/ask"
  question: "What is our refund policy?"
  collection: "support_docs"
  limit: 5
  model: "fast"
```

How it works
- Search: Calls @mashin/kits/rag/search to find relevant chunks
- Format: Builds a numbered context string from the results ([1] source: content)
- Synthesize: Passes the context and question to an LLM with instructions to cite sources
- Return: The answer text plus the list of cited sources
The LLM is instructed to answer only from the provided documents. If the documents don’t contain enough information, it says so rather than fabricating an answer.
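The Format and Synthesize steps amount to plain string building. The sketch follows the [n] source: content layout described above. The function names and the exact instruction wording are hypothetical, because the kit's prompt does not appear on this page.

```typescript
// Hypothetical retrieved-chunk shape.
interface Hit {
  source: string;
  content: string;
}

// Number each chunk as "[n] source: content", starting at 1.
function formatContext(hits: Hit[]): string {
  return hits.map((h, i) => `[${i + 1}] ${h.source}: ${h.content}`).join("\n\n");
}

// Pair the numbered context with instructions to cite and to admit gaps.
// The wording is illustrative, not the kit's actual prompt.
function buildPrompt(question: string, hits: Hit[]): { role: string; task: string } {
  return {
    role:
      "Answer only from the provided documents and cite them as [1], [2], etc. " +
      "If the documents do not contain enough information, say so.",
    task: `Question: ${question}\n\nSources:\n${formatContext(hits)}`,
  };
}
```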
Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| question | text | required | The question to answer |
| collection | text | "documents" | Collection to search |
| limit | number | 5 | How many chunks to retrieve as context |
| threshold | number | 0.7 | Minimum similarity for retrieved chunks |
| model | text | "fast" | Which model tier to use for synthesis |
Outputs
| Output | Type | Description |
|---|---|---|
| answer | text | The synthesized answer with inline citations |
| sources | list | Source identifiers cited in the answer |
| source_count | number | Number of sources cited |
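The page does not say how sources is derived. One deterministic approach is to scan the answer for [n] markers and map each one back to the nth retrieved chunk, as in this hypothetical sketch. `citedSources` is an illustrative name, not a kit function.

```typescript
// Collect the sources behind every [n] marker in the answer.
// `retrieved` holds each context chunk's source identifier in numbering
// order, so [1] refers to retrieved[0].
function citedSources(answer: string, retrieved: string[]): { sources: string[]; source_count: number } {
  const sources: string[] = [];
  const marker = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(answer)) !== null) {
    const index = Number(match[1]) - 1;
    // Skip markers outside the retrieved context, and skip duplicates.
    if (index >= 0 && index < retrieved.length && sources.indexOf(retrieved[index]) === -1) {
      sources.push(retrieved[index]);
    }
  }
  return { sources, source_count: sources.length };
}
```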
Model selection
The model parameter accepts any mashin model specifier:
```
// Use the fast tier (default; picks a local model if available)
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "fast"

// Use a specific local model
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "ollama:qwen3:8b"

// Use a cloud model for higher quality
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "anthropic:claude-sonnet-4-6"
```

Folder Watching
The watch_folder machine monitors a directory and keeps the knowledge base in sync. When files are added or modified, they get re-indexed automatically.
```
ask watch, from: "@mashin/kits/rag/watch_folder"
  folder: "/docs/wiki"
  collection: "wiki"
  poll_interval: 30
  glob: "**/*.{md,txt}"
```

| Input | Type | Default | Description |
|---|---|---|---|
| folder | text | required | Directory to watch |
| collection | text | "documents" | Target collection |
| glob | text | "**/*.{pdf,txt,md,docx,html}" | File pattern to match |
| poll_interval | number | 30 | Seconds between scans |
| chunk_size | number | 1000 | Characters per chunk |
| embedding_model | text | — | Override embedding model |
The machine tracks sync state in memory, so it won’t re-index unchanged files. It also has agency, meaning it can operate autonomously over time.
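The page doesn't say what the sync state contains. This sketch assumes it maps each file path to the file's last-seen modification time. `SyncState` and `filesToReindex` are hypothetical names.

```typescript
// Hypothetical in-memory sync state: file path -> last-seen modification time (ms).
type SyncState = Record<string, number>;

// Paths that are new, or whose modification time differs from the previous
// scan, need (re-)indexing; unchanged files are skipped.
function filesToReindex(previous: SyncState, current: SyncState): string[] {
  return Object.keys(current).filter((path) => previous[path] !== current[path]);
}
```

A polling watcher would call this every poll_interval seconds (30 by default). It would ingest the returned paths and then keep the current scan as the new previous state.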
MCP Integration
The knowledge_base machine exposes itself as an MCP tool, so you can use it directly from Claude Desktop (or any MCP client).
The machine has an expresses > mcp section that registers it as a callable tool:
```
expresses
  mcp
    name: "knowledge_base"
    description: "A complete knowledge base. Actions: 'ask', 'search', 'ingest'."
```

Claude Desktop configuration
Add this to your Claude Desktop MCP config (~/Library/Application Support/Claude/claude_desktop_config.json):
```json
{
  "mcpServers": {
    "mashin": {
      "command": "mashin",
      "args": ["serve", "--mcp"],
      "env": {}
    }
  }
}
```

Once connected, Claude can call knowledge_base as a tool. You say “search my docs for deployment instructions” and Claude calls the machine with action: "search", gets results, and responds with what it found.
The knowledge_base machine routes by action parameter:
| Action | What it does |
|---|---|
| "ask" | Searches and synthesizes an answer with citations |
| "search" | Returns raw search results |
| "ingest" | Ingests a document from file_path, folder, or text |
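Here is a hedged TypeScript sketch of that dispatch. The request shape and the `route` function are hypothetical. Sending a folder to ingest_folder is also an assumption, because the table only says that ingest accepts file_path, folder, or text.

```typescript
type Action = "ask" | "search" | "ingest";

// Hypothetical request shape; field names follow the kit's machine inputs.
interface KnowledgeBaseRequest {
  action: Action;
  question?: string;
  query?: string;
  file_path?: string;
  folder?: string;
  text?: string;
  collection?: string;
}

// Map each action to the kit machine that would handle it.
function route(req: KnowledgeBaseRequest): string {
  switch (req.action) {
    case "ask":
      return "@mashin/kits/rag/ask";
    case "search":
      return "@mashin/kits/rag/search";
    case "ingest":
      // Assumption: a folder is batch-ingested; file_path or text is a single ingest.
      return req.folder !== undefined
        ? "@mashin/kits/rag/ingest_folder"
        : "@mashin/kits/rag/ingest";
    default:
      // Untyped input (for example, JSON from an MCP client) can still carry other values.
      throw new Error(`unknown action: ${String(req.action)}`);
  }
}
```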
Embedding Models
The default embedding model depends on your environment. If Ollama is running, it uses a local model. Otherwise, it falls back to a cloud embedding API.
| Provider | Model | Dimensions | Local |
|---|---|---|---|
| Ollama | nomic-embed-text | 768 | Yes |
| Ollama | mxbai-embed-large | 1024 | Yes |
| OpenAI | text-embedding-3-small | 1536 | No |
| OpenAI | text-embedding-3-large | 3072 | No |
| Voyage | voyage-3 | 1024 | No |
| Cohere | embed-english-v3.0 | 1024 | No |
To use a specific model, pass embedding_model to the ingest machine:
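```
ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/report.pdf"
  collection: "docs"
  embedding_model: "ollama:nomic-embed-text"
```

Use the same embedding model for every document in a collection. The search machine embeds queries with the model that embedded the documents. Vectors from different models are not comparable, so mixing models produces meaningless similarity scores.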
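To illustrate the risk, a store could record which model built each collection and reject mismatched queries. The `CollectionInfo` type and `checkQueryEmbedding` function are hypothetical, and the page does not say whether the kit performs such a check.

```typescript
// Hypothetical metadata a store could record for each collection at ingestion time.
interface CollectionInfo {
  embeddingModel: string;
  dimensions: number;
}

// Reject a query vector that did not come from the collection's embedding model.
function checkQueryEmbedding(info: CollectionInfo, queryModel: string, queryVector: number[]): void {
  if (queryModel !== info.embeddingModel) {
    throw new Error(`collection uses ${info.embeddingModel}, query used ${queryModel}`);
  }
  if (queryVector.length !== info.dimensions) {
    throw new Error(`expected ${info.dimensions} dimensions, got ${queryVector.length}`);
  }
}
```

Checking dimensions alone is not enough. The table shows that mxbai-embed-large, voyage-3, and embed-english-v3.0 all produce 1024-dimensional vectors, so the sketch compares the model name as well.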
Advanced Usage
Multiple collections
Organize knowledge by domain, then search across them:
```
machine multi_domain_qa

accepts
  question as text, is required
  domains as list, default: ["engineering", "support", "legal"]

responds with
  answer as text
  sources as list

implements
  ask eng_results, from: "@mashin/kits/rag/search"
    query: input.question
    collection: "engineering"
    limit: 3

  ask support_results, from: "@mashin/kits/rag/search"
    query: input.question
    collection: "support"
    limit: 3

  compute merge
    let all = (steps.eng_results.results ?? []).concat(steps.support_results.results ?? [])
    let sorted = all.sort((a, b) => b.score - a.score).slice(0, 5)
    {results: sorted, count: sorted.length}

  ask synthesize, using: "fast"
    with role "Answer using only the provided sources. Cite with [1], [2], etc."
    with task "Question: ${input.question}\n\nSources:\n${steps.merge.results.map((r, i) => '[' + (i+1) + '] ' + r.content).join('\n\n')}"
    returns
      answer as text
      sources as list
```

Custom metadata
Tag documents during ingestion for filtered retrieval later:
```
ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/q1-report.pdf"
  collection: "reports"
  tags: ["finance", "Q1", "2026"]
```

Governance
Every RAG kit machine declares its required permissions in its ensures section. The ingest machine requires memory_write and embedding_request. The search machine requires memory_read and embedding_request. The ask machine adds llm_inference on top of search permissions.
These permissions are checked at runtime by the governance interpreter. A machine that calls @mashin/kits/rag/ask must have at least the same permissions, or the call is denied.
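The rule in the last sentence is a subset check: the caller must hold every permission the callee declares. This TypeScript sketch uses the permission names listed above. `canCall` and `RAG_PERMISSIONS` are hypothetical and do not represent the governance interpreter's API.

```typescript
// Permissions each RAG kit machine declares, as listed in this section.
const RAG_PERMISSIONS = {
  ingest: ["memory_write", "embedding_request"],
  search: ["memory_read", "embedding_request"],
  ask: ["memory_read", "embedding_request", "llm_inference"],
};

// A call is allowed only if the caller holds every permission the callee declares.
function canCall(callerPermissions: string[], calleePermissions: string[]): boolean {
  return calleePermissions.every((p) => callerPermissions.indexOf(p) !== -1);
}
```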
Architecture
```
User machine
  |
  ask ... from: "@mashin/kits/rag/ask"
  |
  +-- search (embed query -> vector similarity -> ranked results)
  |     |
  |     +-- @mashin/actions/rag/query (LibSQL vector search)
  |
  +-- synthesize (format context -> LLM -> answer with citations)
        |
        +-- LLM provider (Ollama / cloud)

Ingestion path:
  ask ... from: "@mashin/kits/rag/ingest"
  |
  +-- @mashin/actions/rag/ingest
        |
        +-- Extractous (document extraction)
        +-- Chunker (semantic / paragraph / fixed)
        +-- Embedding API (Ollama / cloud)
        +-- LibSQL (vector storage)
```

Next steps
- Memory - The remember and recall primitives that RAG builds on
- Composition - How ask ... from works across machine boundaries
- Surfaces: MCP - Exposing machines as MCP tools
- Local Intelligence - Running embedding and inference models locally