RAG Kit

mashin’s RAG kit lets you build knowledge bases that run entirely on your machine. It uses LibSQL for vector storage, Ollama for embeddings, and Extractous for document extraction, so no external services are required.

The kit lives at @mashin/kits/rag and provides six machines that cover the full RAG lifecycle: ingesting documents, searching them semantically, and answering questions with source citations.

Quick Start

Three steps to a working knowledge base:

1. Ingest a document

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/handbook.pdf"
  collection: "company_docs"

2. Search it

ask search, from: "@mashin/kits/rag/search"
  query: "vacation policy"
  collection: "company_docs"

3. Ask a question

ask answer, from: "@mashin/kits/rag/ask"
  question: "How many vacation days do new employees get?"
  collection: "company_docs"

The ask machine retrieves relevant documents, feeds them to an LLM, and returns an answer with source citations.

Machines

Machine	Path	Purpose
`ingest`	`@mashin/kits/rag/ingest`	Ingest a single document (file or raw text)
`search`	`@mashin/kits/rag/search`	Semantic similarity search with threshold filtering
`ask`	`@mashin/kits/rag/ask`	Question answering with source citations
`ingest_folder`	`@mashin/kits/rag/ingest_folder`	Batch-ingest all documents in a directory
`watch_folder`	`@mashin/kits/rag/watch_folder`	Live sync: watch a folder and re-index on changes
`knowledge_base`	`@mashin/kits/rag/knowledge_base`	Unified interface for ingest, search, and ask via `action` parameter

Document Ingestion

The ingest machine takes a file path or raw text and runs it through a four-stage pipeline:

Extract: Pull text from the source file using Extractous (handles PDF, DOCX, HTML, and more)
Chunk: Split the text into overlapping segments
Embed: Generate vector embeddings for each chunk
Index: Store chunks and vectors in LibSQL for retrieval

Supported formats

PDF, DOCX, TXT, Markdown, HTML. Extractous handles the parsing; you pass a file path and get text back.

Inputs

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/report.pdf"
  collection: "legal_docs"
  chunk_size: 1000
  chunk_overlap: 200
  chunking_strategy: "semantic"
  tags: ["legal", "2026"]

Input	Type	Default	Description
`file_path`	text	—	Path to the document file
`text`	text	—	Raw text to ingest (alternative to file_path)
`collection`	text	`"documents"`	Target collection name
`chunk_size`	number	`1000`	Maximum characters per chunk
`chunk_overlap`	number	`200`	Overlap between consecutive chunks
`chunking_strategy`	text	`"semantic"`	How to split text: `"semantic"`, `"paragraph"`, or `"fixed"`
`embedding_model`	text	—	Override the default embedding model
`tags`	list	`[]`	Metadata tags attached to each chunk

You must provide either file_path or text. If both are given, file_path takes precedence.

Chunking strategies

semantic (default): Splits on sentence boundaries and topic shifts. Produces chunks that preserve meaning. Best for most documents.
paragraph: Splits on paragraph breaks. Good for well-structured documents like articles and reports.
fixed: Splits at exact character boundaries. Fastest, but can cut mid-sentence. Use when chunk alignment matters more than readability.
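
To make the difference between strategies concrete, here is a minimal Python sketch of the fixed and paragraph strategies using the documented `chunk_size` and `chunk_overlap` defaults. It is an illustration only, not the kit's implementation: the function names `chunk_fixed` and `chunk_paragraphs` are hypothetical, and the real chunker may treat edge cases (such as oversized paragraphs) differently. The semantic strategy is omitted because it depends on topic-shift detection that the page does not describe.

```python
import re


def chunk_fixed(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text at exact character offsets; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def chunk_paragraphs(text: str, chunk_size: int = 1000) -> list[str]:
    """Split on blank lines, then pack whole paragraphs into chunks up to chunk_size.

    Assumption: a single paragraph longer than chunk_size is kept whole.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

With the defaults, a 2,500-character text yields three fixed chunks starting at offsets 0, 800, and 1600, so each chunk repeats the last 200 characters of the previous one.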

Outputs

Output	Type	Description
`document_id`	text	Unique identifier for the ingested document
`chunks_indexed`	number	How many chunks were created and indexed
`collection`	text	The collection the document was added to
`elapsed_ms`	number	Processing time in milliseconds

Batch ingestion

Use ingest_folder to process an entire directory:

ask ingest_all, from: "@mashin/kits/rag/ingest_folder"
  folder: "/docs/legal"
  collection: "legal_docs"
  glob: "**/*.{pdf,md,docx}"
  chunk_size: 1000

It walks the folder, matches files against the glob pattern, and ingests each one. The default glob is **/*.{pdf,txt,md,docx,html}.

Output	Type	Description
`files_processed`	number	Total files found matching the glob
`files_indexed`	number	Files successfully ingested
`total_chunks`	number	Total chunks across all files
`results`	list	Per-file status (file path, chunk count, errors)
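
The folder walk and the output fields above can be sketched in Python as follows. This is a hedged illustration, not the kit's code: `find_documents`, `ingest_folder`, and the `ingest_one` callback are hypothetical names, extension matching approximates the `**/*.{...}` glob (Python's own glob has no brace expansion), and continuing past a failed file is an assumption suggested by the per-file `errors` field.

```python
from pathlib import Path
from typing import Callable

# Mirrors the documented default glob **/*.{pdf,txt,md,docx,html}
DEFAULT_EXTENSIONS = ("pdf", "txt", "md", "docx", "html")


def find_documents(folder: Path, extensions=DEFAULT_EXTENSIONS) -> list[Path]:
    """Every file under folder, at any depth, whose extension is listed."""
    wanted = {f".{ext.lower()}" for ext in extensions}
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


def ingest_folder(
    folder: Path,
    ingest_one: Callable[[Path], int],  # hypothetical: returns chunks indexed for one file
    extensions=DEFAULT_EXTENSIONS,
) -> dict:
    files = find_documents(folder, extensions)
    results = []
    for path in files:
        try:
            chunks = ingest_one(path)
            results.append({"file": str(path), "chunks": chunks, "error": None})
        except Exception as exc:  # assumption: one bad file does not stop the batch
            results.append({"file": str(path), "chunks": 0, "error": str(exc)})
    return {
        "files_processed": len(files),
        "files_indexed": sum(1 for r in results if r["error"] is None),
        "total_chunks": sum(r["chunks"] for r in results),
        "results": results,
    }
```

Under this reading, `files_processed` minus `files_indexed` is the number of files that matched the glob but failed to ingest.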

Searching

The search machine embeds your query, runs vector similarity search against the collection, and returns ranked results.

ask find, from: "@mashin/kits/rag/search"
  query: "password reset process"
  collection: "help_docs"
  limit: 5
  threshold: 0.7

How it works

The query text is embedded using the same model that embedded the documents
Vector search scores each chunk by cosine similarity to the query and finds the closest chunks
Results below the threshold score are filtered out
Remaining results are ranked by score and returned
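
The scoring, filtering, and ranking steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the kit runs the search inside LibSQL rather than in a Python loop, the query is assumed to be embedded already, and the chunk layout (`content`, `source`, `vector` keys) is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], chunks: list[dict],
           limit: int = 5, threshold: float = 0.7) -> list[dict]:
    """Score every chunk, drop those below threshold, return the top `limit` by score."""
    scored = [
        {
            "content": c["content"],
            "source": c["source"],
            "score": cosine_similarity(query_vec, c["vector"]),
        }
        for c in chunks
    ]
    kept = [r for r in scored if r["score"] >= threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:limit]
```

Raising `threshold` trades recall for precision: fewer chunks come back, but the ones that do are closer to the query. If a search returns nothing, lowering the threshold is usually the first thing to try.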

Inputs

Input	Type	Default	Description
`query`	text	required	The search query
`collection`	text	`"documents"`	Which collection to search
`limit`	number	`5`	Maximum results to return
`threshold`	number	`0.7`	Minimum similarity score (0.0 to 1.0)

Outputs

Output	Type	Description
`results`	list	Matching chunks, each with `content`, `score`, and `source`
`count`	number	Number of results returned
`query`	text	The original query (echoed back)

Collection namespacing

Collections are isolated namespaces. Documents in "legal_docs" are never returned when searching "help_docs". Use collections to separate knowledge domains:

// Separate collections for different document types
ask find_legal, from: "@mashin/kits/rag/search"
  query: input.question
  collection: "contracts"

ask find_support, from: "@mashin/kits/rag/search"
  query: input.question
  collection: "support_articles"

Question Answering

The ask machine combines search and LLM synthesis. It retrieves relevant documents, formats them as context, and asks an LLM to answer the question with citations.

ask answer, from: "@mashin/kits/rag/ask"
  question: "What is our refund policy?"
  collection: "support_docs"
  limit: 5
  model: "fast"

How it works

Search: Calls @mashin/kits/rag/search to find relevant chunks
Format: Builds a numbered context string from results ([1] source: content)
Synthesize: Passes the context and question to an LLM with instructions to cite sources
Return: The answer text plus the list of cited sources

The LLM is instructed to answer only from the provided documents. If the documents don’t contain enough information, it says so rather than fabricating an answer.
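
As a rough picture of the Format and Return steps, the Python sketch below builds the numbered context string and maps `[n]` markers in an answer back to sources. The helper names and the exact prompt wording are assumptions for illustration; the page does not show the kit's actual prompt.

```python
import re


def build_context(results: list[dict]) -> str:
    """Number each retrieved chunk as '[n] source: content'."""
    return "\n\n".join(
        f"[{i}] {r['source']}: {r['content']}"
        for i, r in enumerate(results, start=1)
    )


def build_prompt(question: str, results: list[dict]) -> str:
    """Combine instructions, numbered documents, and the question (wording is hypothetical)."""
    instructions = (
        "Answer using only the numbered documents below. "
        "Cite sources as [1], [2], and so on. "
        "If the documents do not contain the answer, say so."
    )
    return f"{instructions}\n\nDocuments:\n{build_context(results)}\n\nQuestion: {question}"


def cited_sources(answer: str, results: list[dict]) -> list[str]:
    """Map [n] markers in the answer back to source identifiers, ignoring out-of-range numbers."""
    indices = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [results[i - 1]["source"] for i in indices if 1 <= i <= len(results)]
```

Numbering the chunks is what makes the citations checkable: each `[n]` in the answer points at exactly one retrieved chunk.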

Inputs

Input	Type	Default	Description
`question`	text	required	The question to answer
`collection`	text	`"documents"`	Collection to search
`limit`	number	`5`	How many chunks to retrieve as context
`threshold`	number	`0.7`	Minimum similarity for retrieved chunks
`model`	text	`"fast"`	Which model tier to use for synthesis

Outputs

Output	Type	Description
`answer`	text	The synthesized answer with inline citations
`sources`	list	Source identifiers cited in the answer
`source_count`	number	Number of sources cited

Model selection

The model parameter accepts any mashin model specifier:

// Use the fast tier (default; picks local model if available)
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "fast"

// Use a specific local model
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "ollama:qwen3:8b"

// Use a cloud model for higher quality
ask answer, from: "@mashin/kits/rag/ask"
  question: input.question
  model: "anthropic:claude-sonnet-4-6"

Folder Watching

The watch_folder machine monitors a directory and keeps the knowledge base in sync. When files are added or modified, they get re-indexed automatically.

ask watch, from: "@mashin/kits/rag/watch_folder"
  folder: "/docs/wiki"
  collection: "wiki"
  poll_interval: 30
  glob: "**/*.{md,txt}"

Input	Type	Default	Description
`folder`	text	required	Directory to watch
`collection`	text	`"documents"`	Target collection
`glob`	text	`"**/*.{pdf,txt,md,docx,html}"`	File pattern to match
`poll_interval`	number	`30`	Seconds between scans
`chunk_size`	number	`1000`	Characters per chunk
`embedding_model`	text	—	Override embedding model

The machine tracks sync state in memory, so it won’t re-index unchanged files. It has agency, meaning it can operate autonomously over time.
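
One way to picture the sync state is a map from file path to a fingerprint of its contents, checked on every scan. The Python sketch below is an assumption about how such tracking could work, not a description of the kit's internals: `scan_once` is a hypothetical name, and the kit may compare modification times rather than content hashes.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Content hash, so a touched-but-unchanged file is not re-indexed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_once(folder: Path, suffixes: set[str], state: dict[str, str]) -> list[Path]:
    """Return files that are new or changed since the last scan; updates state in place."""
    changed = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        fingerprint = file_fingerprint(path)
        if state.get(str(path)) != fingerprint:
            state[str(path)] = fingerprint
            changed.append(path)
    return changed


# A watcher would call scan_once every poll_interval seconds and
# re-ingest whatever it returns, keeping `state` alive between scans.
```

Because `state` lives only in memory in this sketch, a restarted watcher would see every file as new on its first scan.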

MCP Integration

The knowledge_base machine exposes itself as an MCP tool, so you can use it directly from Claude Desktop (or any MCP client).

The machine has an expresses > mcp section that registers it as a callable tool:

expresses
  mcp
    name: "knowledge_base"
    description: "A complete knowledge base. Actions: 'ask', 'search', 'ingest'."

Claude Desktop configuration

Add this to your Claude Desktop MCP config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "mashin": {
      "command": "mashin",
      "args": ["serve", "--mcp"],
      "env": {}
    }
  }
}

Once connected, Claude can call knowledge_base as a tool. You say “search my docs for deployment instructions” and Claude calls the machine with action: "search", gets results, and responds with what it found.

The knowledge_base machine routes by action parameter:

Action	What it does
`"ask"`	Searches and synthesizes an answer with citations
`"search"`	Returns raw search results
`"ingest"`	Ingests a document from `file_path`, `folder`, or `text`

Embedding Models

The default embedding model depends on your environment. If Ollama is running, it uses a local model. Otherwise, it falls back to a cloud embedding API.

Provider	Model	Dimensions	Local
Ollama	`nomic-embed-text`	768	Yes
Ollama	`mxbai-embed-large`	1024	Yes
OpenAI	`text-embedding-3-small`	1536	No
OpenAI	`text-embedding-3-large`	3072	No
Voyage	`voyage-3`	1024	No
Cohere	`embed-english-v3.0`	1024	No

To use a specific model, pass embedding_model to the ingest machine:

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/report.pdf"
  collection: "docs"
  embedding_model: "ollama:nomic-embed-text"

Use the same embedding model for ingestion and search. Mixing models produces meaningless similarity scores.
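
A dimension check alone does not catch every mix-up: `mxbai-embed-large` and `voyage-3` both produce 1024-dimensional vectors, yet their vector spaces are unrelated. The Python sketch below shows one possible guard that records the model per collection and rejects a different one. `CollectionModels` is a hypothetical helper for illustration; the page does not say whether the kit enforces this itself.

```python
# Dimensions as listed in the table above
EMBEDDING_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "voyage-3": 1024,
    "embed-english-v3.0": 1024,
}


class CollectionModels:
    """Remembers which embedding model each collection was built with."""

    def __init__(self) -> None:
        self._model_by_collection: dict[str, str] = {}

    def require(self, collection: str, model: str) -> int:
        """Record the model on first use, reject a different one later; returns the dimension."""
        if model not in EMBEDDING_DIMENSIONS:
            raise KeyError(f"unknown embedding model: {model}")
        existing = self._model_by_collection.setdefault(collection, model)
        if existing != model:
            raise ValueError(
                f"collection {collection!r} was embedded with {existing!r}, not {model!r}"
            )
        return EMBEDDING_DIMENSIONS[model]
```

Calling `require` before both ingestion and search would surface a mismatch as an error instead of as silently poor results.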

Advanced Usage

Multiple collections

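Organize knowledge by domain, then search across them. The example below runs one search per collection (engineering and support are hardcoded here; the `domains` input is declared but not wired into the searches), merges the results by score, and synthesizes a single answer: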

machine multi_domain_qa

  accepts
    question as text, is required
    domains as list, default: ["engineering", "support", "legal"]

  responds with
    answer as text
    sources as list

  implements
    ask eng_results, from: "@mashin/kits/rag/search"
      query: input.question
      collection: "engineering"
      limit: 3

    ask support_results, from: "@mashin/kits/rag/search"
      query: input.question
      collection: "support"
      limit: 3

    compute merge
      let all = (steps.eng_results.results ?? []).concat(steps.support_results.results ?? [])
      let sorted = all.sort((a, b) => b.score - a.score).slice(0, 5)
      {results: sorted, count: sorted.length}

    ask synthesize, using: "fast"
      with role "Answer using only the provided sources. Cite with [1], [2], etc."
      with task "Question: ${input.question}\n\nSources:\n${steps.merge.results.map((r, i) => '[' + (i+1) + '] ' + r.content).join('\n\n')}"
      returns
        answer as text
        sources as list

Custom metadata

Tag documents during ingestion for filtered retrieval later:

ask ingest, from: "@mashin/kits/rag/ingest"
  file_path: "/docs/q1-report.pdf"
  collection: "reports"
  tags: ["finance", "Q1", "2026"]

Governance

Every RAG kit machine declares its permissions in ensures. The ingest machine requires memory_write and embedding_request. The search machine requires memory_read and embedding_request. The ask machine adds llm_inference on top of search permissions.

These permissions are checked at runtime by the governance interpreter. A machine that calls @mashin/kits/rag/ask must have at least the same permissions, or the call is denied.
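
The rule amounts to a subset check: the callee's required permissions must all appear in the caller's set. The Python sketch below models it with the permissions named above. It is only a model of the rule as described; the function names are hypothetical and the governance interpreter's real logic is not shown on this page.

```python
# Required permissions per machine, as stated above
REQUIRED = {
    "ingest": {"memory_write", "embedding_request"},
    "search": {"memory_read", "embedding_request"},
    "ask": {"memory_read", "embedding_request", "llm_inference"},
}


def missing_permissions(caller: set[str], callee: str) -> set[str]:
    """Permissions the callee needs that the caller has not declared."""
    return REQUIRED[callee] - caller


def can_call(caller: set[str], callee: str) -> bool:
    """The call is allowed only when nothing is missing."""
    return not missing_permissions(caller, callee)
```

For example, a machine that declares only `memory_read` and `embedding_request` can call search but not ask, because `llm_inference` is missing.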

Architecture

User machine
  |
  ask ... from: "@mashin/kits/rag/ask"
  |
  +-- search (embed query -> vector similarity -> ranked results)
  |     |
  |     +-- @mashin/actions/rag/query (LibSQL vector search)
  |
  +-- synthesize (format context -> LLM -> answer with citations)
        |
        +-- LLM provider (Ollama / cloud)

Ingestion path:
  ask ... from: "@mashin/kits/rag/ingest"
  |
  +-- @mashin/actions/rag/ingest
        |
        +-- Extractous (document extraction)
        +-- Chunker (semantic / paragraph / fixed)
        +-- Embedding API (Ollama / cloud)
        +-- LibSQL (vector storage)

Next steps

Memory - The remember and recall primitives that RAG builds on
Composition - How ask ... from works across machine boundaries
Surfaces: MCP - Exposing machines as MCP tools
Local Intelligence - Running embedding and inference models locally