Developer Preview — APIs and language features may change before 1.0

RAG Kit

mashin’s RAG kit lets you build knowledge bases that can run entirely on your machine: LibSQL for vector storage, Ollama for embeddings, and Extractous for document extraction. With Ollama running locally, no external services are required.

The kit lives at @mashin/kits/rag and provides six machines that cover the full RAG lifecycle: ingesting documents, searching them semantically, and answering questions with source citations.

Quick Start

Three steps to a working knowledge base:

1. Ingest a document

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/handbook.pdf"
  collection: "company_docs"

2. Search it

ask search, from: "@mashin/kits/rag/search"
  query: "vacation policy"
  collection: "company_docs"

3. Ask a question

ask answer, from: "@mashin/kits/rag/ask"
  question: "How many vacation days do new employees get?"
  collection: "company_docs"

The ask machine retrieves relevant documents, feeds them to an LLM, and returns an answer with source citations.

Machines

| Machine | Path | Purpose |
| --- | --- | --- |
| ingest | @mashin/kits/rag/ingest | Ingest a single document (file or raw text) |
| search | @mashin/kits/rag/search | Semantic similarity search with threshold filtering |
| ask | @mashin/kits/rag/ask | Question answering with source citations |
| ingest_folder | @mashin/kits/rag/ingest_folder | Batch-ingest all documents in a directory |
| watch_folder | @mashin/kits/rag/watch_folder | Live sync: watch a folder and re-index on changes |
| knowledge_base | @mashin/kits/rag/knowledge_base | Unified interface for ingest, search, and ask via action parameter |

Document Ingestion

The ingest machine takes a file path or raw text and runs it through a four-stage pipeline:

  1. Extract: Pull text from the source file using Extractous (handles PDF, DOCX, HTML, and more)
  2. Chunk: Split the text into overlapping segments
  3. Embed: Generate vector embeddings for each chunk
  4. Index: Store chunks and vectors in LibSQL for retrieval

Supported formats

PDF, DOCX, TXT, Markdown, HTML. Extractous handles the parsing; you pass a file path and get text back.

Inputs

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/report.pdf"
  collection: "legal_docs"
  chunk_size: 1000
  chunk_overlap: 200
  chunking_strategy: "semantic"
  tags: ["legal", "2026"]

| Input | Type | Default | Description |
| --- | --- | --- | --- |
| file_path | text | | Path to the document file |
| text | text | | Raw text to ingest (alternative to file_path) |
| collection | text | "documents" | Target collection name |
| chunk_size | number | 1000 | Maximum characters per chunk |
| chunk_overlap | number | 200 | Overlap between consecutive chunks |
| chunking_strategy | text | "semantic" | How to split text: "semantic", "paragraph", or "fixed" |
| embedding_model | text | | Override the default embedding model |
| tags | list | [] | Metadata tags attached to each chunk |

You must provide either file_path or text. If both are given, file_path takes precedence.

Chunking strategies

  • semantic (default): Splits on sentence boundaries and topic shifts. Produces chunks that preserve meaning. Best for most documents.
  • paragraph: Splits on paragraph breaks. Good for well-structured documents like articles and reports.
  • fixed: Splits at exact character boundaries. Fastest, but can cut mid-sentence. Use when chunk alignment matters more than readability.
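
To make chunk_size and chunk_overlap concrete, here is a minimal Python sketch of the fixed strategy. It is an illustration only, not the kit's actual chunker: the function name chunk_fixed and the exact window arithmetic are assumptions, and the semantic and paragraph strategies are not shown.

```python
def chunk_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: the defaults mirror the documented inputs,
    but the kit's real implementation may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window advances 800 characters, so the last 200 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence cut at a boundary retrievable from at least one chunk.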

Outputs

| Output | Type | Description |
| --- | --- | --- |
| document_id | text | Unique identifier for the ingested document |
| chunks_indexed | number | How many chunks were created and indexed |
| collection | text | The collection the document was added to |
| elapsed_ms | number | Processing time in milliseconds |

Batch ingestion

Use ingest_folder to process an entire directory:

ask ingest_all, from: "@mashin/kits/rag/ingest_folder"
  folder: "/docs/legal"
  collection: "legal_docs"
  glob: "**/*.{pdf,md,docx}"
  chunk_size: 1000

It walks the folder, matches files against the glob pattern, and ingests each one. The default glob is **/*.{pdf,txt,md,docx,html}.
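
The glob uses a brace group to list extensions. As a rough illustration of which files such a pattern selects, the Python sketch below expands the braces by hand (pathlib does not expand them itself) and walks a folder. The helper names expand_braces and match_files are hypothetical, and how ingest_folder resolves globs internally is not specified here.

```python
import re
from pathlib import Path


def expand_braces(pattern: str) -> list[str]:
    """Expand {a,b,c} groups into separate patterns (no nested groups)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def match_files(folder: str, glob: str = "**/*.{pdf,txt,md,docx,html}") -> list[Path]:
    """Return the files under folder that match the glob, sorted by path."""
    root = Path(folder)
    found = set()
    for pattern in expand_braces(glob):
        # "**" matches zero or more directories, so top-level files count too
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)
```

For example, "**/*.{pdf,md,docx}" expands to three patterns, one per extension, and a file such as notes.png under the folder is skipped.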

| Output | Type | Description |
| --- | --- | --- |
| files_processed | number | Total files found matching the glob |
| files_indexed | number | Files successfully ingested |
| total_chunks | number | Total chunks across all files |
| results | list | Per-file status (file path, chunk count, errors) |

Searching

The search machine embeds your query, runs vector similarity search against the collection, and returns ranked results.

ask find, from: "@mashin/kits/rag/search"
  query: "password reset process"
  collection: "help_docs"
  limit: 5
  threshold: 0.7

How it works

  1. The query text is embedded using the same model that embedded the documents
  2. Cosine similarity between the query vector and each stored chunk vector finds the closest chunks
  3. Results below the threshold score are filtered out
  4. Remaining results are ranked by score and returned
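
The scoring, filtering, and ranking steps above can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the kit's LibSQL-backed search: the function names, the chunk dictionary shape (content, source, vector), and the inclusive threshold comparison are all hypothetical, and the embedding step is left out, so vectors are passed in directly.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], chunks: list[dict],
           limit: int = 5, threshold: float = 0.7) -> list[dict]:
    """Score every chunk, drop those below threshold, return the top `limit`."""
    scored = [
        {
            "content": c["content"],
            "source": c["source"],
            "score": cosine_similarity(query_vec, c["vector"]),
        }
        for c in chunks
    ]
    kept = [r for r in scored if r["score"] >= threshold]  # assumed inclusive
    kept.sort(key=lambda r: r["score"], reverse=True)      # best match first
    return kept[:limit]
```

Raising threshold trades recall for precision: fewer chunks pass the filter, but those that do are closer to the query.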

Inputs

| Input | Type | Default | Description |
| --- | --- | --- | --- |
| query | text | required | The search query |
| collection | text | "documents" | Which collection to search |
| limit | number | 5 | Maximum results to return |
| threshold | number | 0.7 | Minimum similarity score (0.0 to 1.0) |

Outputs

| Output | Type | Description |
| --- | --- | --- |
| results | list | Matching chunks, each with content, score, and source |
| count | number | Number of results returned |
| query | text | The original query (echoed back) |

Collection namespacing

Collections are isolated namespaces. Documents in "legal_docs" are never returned when searching "help_docs". Use collections to separate knowledge domains:

// Separate collections for different document types
ask find_legal, from: "@mashin/kits/rag/search"
  query: input.question
  collection: "contracts"

ask find_support, from: "@mashin/kits/rag/search"
  query: input.question
  collection: "support_articles"

Question Answering

The ask machine combines search and LLM synthesis. It retrieves relevant documents, formats them as context, and asks an LLM to answer the question with citations.

ask answer, from: "@mashin/kits/rag/ask"
  question: "What is our refund policy?"
  collection: "support_docs"
  limit: 5
  model: "fast"

How it works

  1. Search: Calls @mashin/kits/rag/search to find relevant chunks
  2. Format: Builds a numbered context string from results ([1] source: content)
  3. Synthesize: Passes the context and question to an LLM with instructions to cite sources
  4. Return: The answer text plus the list of cited sources

The LLM is instructed to answer only from the provided documents. If the documents don’t contain enough information, it says so rather than fabricating an answer.
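
The Format step can be pictured with a short Python sketch that builds the numbered context string from search results. The function names are hypothetical, and the instruction text is borrowed from the multi-collection example later on this page; the prompt the ask machine actually sends is not documented here.

```python
# Assumed instruction text, taken from the multi-collection example on this page.
ROLE = "Answer using only the provided sources. Cite with [1], [2], etc."


def format_context(results: list[dict]) -> str:
    """Render results as '[n] source: content' entries, numbered from 1."""
    return "\n\n".join(
        f"[{i}] {r['source']}: {r['content']}"
        for i, r in enumerate(results, start=1)
    )


def build_prompt(question: str, results: list[dict]) -> str:
    """Combine the instructions, the question, and the numbered sources."""
    return f"{ROLE}\n\nQuestion: {question}\n\nSources:\n{format_context(results)}"
```

Numbering the sources in the prompt is what lets the model's [1], [2] citations be mapped back to entries in the sources output.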

Inputs

| Input | Type | Default | Description |
| --- | --- | --- | --- |
| question | text | required | The question to answer |
| collection | text | "documents" | Collection to search |
| limit | number | 5 | How many chunks to retrieve as context |
| threshold | number | 0.7 | Minimum similarity for retrieved chunks |
| model | text | "fast" | Which model tier to use for synthesis |

Outputs

| Output | Type | Description |
| --- | --- | --- |
| answer | text | The synthesized answer with inline citations |
| sources | list | Source identifiers cited in the answer |
| source_count | number | Number of sources cited |

Model selection

The model parameter accepts any mashin model specifier:

// Use the fast tier (default; picks local model if available)
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "fast"

// Use a specific local model
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "ollama:qwen3:8b"

// Use a cloud model for higher quality
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "anthropic:claude-sonnet-4-6"

Folder Watching

The watch_folder machine monitors a directory and keeps the knowledge base in sync. When files are added or modified, they get re-indexed automatically.

ask watch, from: "@mashin/kits/rag/watch_folder"
  folder: "/docs/wiki"
  collection: "wiki"
  poll_interval: 30
  glob: "**/*.{md,txt}"

| Input | Type | Default | Description |
| --- | --- | --- | --- |
| folder | text | required | Directory to watch |
| collection | text | "documents" | Target collection |
| glob | text | "**/*.{pdf,txt,md,docx,html}" | File pattern to match |
| poll_interval | number | 30 | Seconds between scans |
| chunk_size | number | 1000 | Characters per chunk |
| embedding_model | text | | Override embedding model |

The machine tracks sync state in memory, so it won’t re-index unchanged files. It has agency, meaning it can operate autonomously over time.
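
One way to picture the sync state is a map from file path to last-seen modification time, checked on every scan. The Python sketch below assumes that design purely for illustration: scan_changes is a hypothetical name, glob filtering is omitted, and the machine may detect changes differently (for example by content hash).

```python
from pathlib import Path


def scan_changes(folder: str, seen: dict[Path, int]) -> list[Path]:
    """Return files that are new or modified since the previous scan.

    `seen` maps each path to the modification time (in nanoseconds)
    recorded on the last scan, and is updated in place.
    """
    changed = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime_ns
        if seen.get(path) != mtime:  # never seen, or touched since last scan
            seen[path] = mtime
            changed.append(path)
    return changed
```

A watcher built on this would call scan_changes once per poll_interval seconds and ingest only the returned paths, which is why unchanged files are never re-indexed.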

MCP Integration

The knowledge_base machine exposes itself as an MCP tool, so you can use it directly from Claude Desktop (or any MCP client).

The machine has an expresses > mcp section that registers it as a callable tool:

expresses
  mcp
    name: "knowledge_base"
    description: "A complete knowledge base. Actions: 'ask', 'search', 'ingest'."

Claude Desktop configuration

Add this to your Claude Desktop MCP config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "mashin": {
      "command": "mashin",
      "args": ["serve", "--mcp"],
      "env": {}
    }
  }
}

Once connected, Claude can call knowledge_base as a tool. You say “search my docs for deployment instructions” and Claude calls the machine with action: "search", gets results, and responds with what it found.

The knowledge_base machine routes by action parameter:

| Action | What it does |
| --- | --- |
| "ask" | Searches and synthesizes an answer with citations |
| "search" | Returns raw search results |
| "ingest" | Ingests a document from file_path, folder, or text |

Embedding Models

The default embedding model depends on your environment. If Ollama is running, the kit uses a local model. Otherwise, it falls back to a cloud embedding API.

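| Provider | Model | Dimensions | Local |
| --- | --- | --- | --- |
| Ollama | nomic-embed-text | 768 | Yes |
| Ollama | mxbai-embed-large | 1024 | Yes |
| OpenAI | text-embedding-3-small | 1536 | No |
| OpenAI | text-embedding-3-large | 3072 | No |
| Voyage | voyage-3 | 1024 | No |
| Cohere | embed-english-v3.0 | 1024 | No |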

To use a specific model, pass embedding_model to the ingest machine:

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/report.pdf"
  collection: "docs"
  embedding_model: "ollama:nomic-embed-text"

Use the same embedding model for ingestion and search. Mixing models produces meaningless similarity scores.
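
A dimension check alone does not catch every mismatch. The Python sketch below uses the dimensions from the table above to show why: two different models can share a vector size, so their vectors compare without error even though the scores mean nothing. The function names are hypothetical, and whether the kit performs any such check is not stated here.

```python
# Dimensions copied from the table above.
EMBEDDING_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "voyage-3": 1024,
    "embed-english-v3.0": 1024,
}


def same_dimensions(model_a: str, model_b: str) -> bool:
    """True if the two models produce vectors of the same length."""
    return EMBEDDING_DIMENSIONS[model_a] == EMBEDDING_DIMENSIONS[model_b]


def check_same_model(index_model: str, query_model: str) -> None:
    """Raise if the query would be embedded with a different model than the index."""
    if index_model != query_model:
        raise ValueError(
            f"collection was embedded with {index_model!r}, "
            f"but the query uses {query_model!r}"
        )
```

Here mxbai-embed-large and voyage-3 both produce 1024-dimensional vectors, so only a comparison by model name would flag the mix-up.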

Advanced Usage

Multiple collections

Organize knowledge by domain, then search across them:

machine multi_domain_qa
  accepts
    question as text, is required
    // Note: domains is declared here, but this example hard-codes the
    // "engineering" and "support" collections below and does not read it.
    domains as list, default: ["engineering", "support", "legal"]
  responds with
    answer as text
    sources as list
  implements
    ask eng_results, from: "@mashin/kits/rag/search"
      query: input.question
      collection: "engineering"
      limit: 3
    ask support_results, from: "@mashin/kits/rag/search"
      query: input.question
      collection: "support"
      limit: 3
    // Merge both result lists and keep the five highest-scoring chunks
    compute merge
      let all = (steps.eng_results.results ?? []).concat(steps.support_results.results ?? [])
      let sorted = all.sort((a, b) => b.score - a.score).slice(0, 5)
      {results: sorted, count: sorted.length}
    ask synthesize, using: "fast"
      with role "Answer using only the provided sources. Cite with [1], [2], etc."
      with task "Question: ${input.question}\n\nSources:\n${steps.merge.results.map((r, i) => '[' + (i+1) + '] ' + r.content).join('\n\n')}"
      returns
        answer as text
        sources as list

Custom metadata

Tag documents during ingestion for filtered retrieval later:

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/q1-report.pdf"
  collection: "reports"
  tags: ["finance", "Q1", "2026"]

Governance

Every RAG kit machine declares its permissions in ensures. The ingest machine requires memory_write and embedding_request. The search machine requires memory_read and embedding_request. The ask machine adds llm_inference on top of search permissions.

These permissions are checked at runtime by the governance interpreter. A machine that calls @mashin/kits/rag/ask must have at least the same permissions, or the call is denied.
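
The "at least the same permissions" rule amounts to a superset check. The Python sketch below models it with sets, using the permission names listed above; the function names and the dictionary layout are assumptions for illustration and say nothing about how the governance interpreter is actually implemented.

```python
# Permission names as listed in this section; ask = search permissions + llm_inference.
REQUIRED = {
    "ingest": {"memory_write", "embedding_request"},
    "search": {"memory_read", "embedding_request"},
    "ask": {"memory_read", "embedding_request", "llm_inference"},
}


def missing_permissions(caller: set[str], machine: str) -> set[str]:
    """Permissions the callee needs that the caller has not declared."""
    return REQUIRED[machine] - caller


def can_call(caller: set[str], machine: str) -> bool:
    """A call is allowed only if the caller holds every required permission."""
    return not missing_permissions(caller, machine)
```

For instance, a caller that declares only memory_read and embedding_request could call search but would be denied ask, because llm_inference is missing.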

Architecture

User machine
  |
  ask ... from: "@mashin/kits/rag/ask"
  |
  +-- search (embed query -> vector similarity -> ranked results)
  |     |
  |     +-- @mashin/actions/rag/query (LibSQL vector search)
  |
  +-- synthesize (format context -> LLM -> answer with citations)
        |
        +-- LLM provider (Ollama / cloud)

Ingestion path:
  ask ... from: "@mashin/kits/rag/ingest"
  |
  +-- @mashin/actions/rag/ingest
        |
        +-- Extractous (document extraction)
        +-- Chunker (semantic / paragraph / fixed)
        +-- Embedding API (Ollama / cloud)
        +-- LibSQL (vector storage)

Next steps

  • Memory - The remember and recall primitives that RAG builds on
  • Composition - How ask ... from works across machine boundaries
  • Surfaces: MCP - Exposing machines as MCP tools
  • Local Intelligence - Running embedding and inference models locally