vectors

Vector search capability on a store resource. Adds semantic search to a resource by configuring an embedding model, chunking strategy, and vector backend. The resource’s fields automatically become payload indexes in the vector database, so you can combine semantic similarity search with structured metadata filtering.

When to use

Use vectors when you need to:

  • Search over unstructured text using semantic similarity (e.g., “find documents about contract termination clauses”)
  • Build retrieval-augmented generation (RAG) pipelines
  • Combine structured queries (filter by category, date) with semantic search (find similar content)
  • Enable recall steps to retrieve relevant context from a knowledge base

If you only need structured data queries (SQL-style), use read actions without vectors. If you only need vector search without any structured fields, consider a store with source: qdrant.

Syntax

resource <resource_name>
  vectors
    source: qdrant
    embedding_model: "<model_name>"
    chunking: <strategy>
    chunk_size: <number>
    chunk_overlap: <number>

Parameters

source (required)
    Vector backend. Currently qdrant is the only supported option.

embedding_model (required)
    The model used to generate embeddings. Examples: "nomic-embed-text"
    (local via Ollama), "text-embedding-3-small" (OpenAI), "voyage-3"
    (Voyage AI).

chunking (optional)
    How text is split into chunks before embedding. One of: semantic,
    paragraph, fixed. Default: semantic.

chunk_size (optional)
    Target size of each chunk in tokens. The default varies by strategy.

chunk_overlap (optional)
    Number of tokens shared between adjacent chunks, which preserves
    context across chunk boundaries. Default: 200.

Chunking strategies

semantic
    Splits on semantic boundaries (topic shifts, section breaks), using an
    LLM or heuristic to find natural break points. Best for long-form
    documents, articles, and reports.

paragraph
    Splits on paragraph boundaries (double newlines). Best for
    well-structured text with clear paragraphs.

fixed
    Splits at a fixed token count. Best for uniform chunk sizes, code,
    and logs.
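To make chunk_size and chunk_overlap concrete, here is a rough Python
sketch of the fixed strategy. Whitespace-separated words stand in for real
tokens; the actual runtime tokenizes according to the embedding model, so
this is an illustration, not Mashin's implementation.

import math

def fixed_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 200) -> list[str]:
    # Whitespace "tokens" stand in for the model's real tokenizer.
    tokens = text.split()
    step = chunk_size - chunk_overlap   # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):   # last window reached the end
            break
    return chunks

With chunk_size 512 and chunk_overlap 200, each chunk shares its last 200
tokens with the start of the next one, so a sentence that straddles a
boundary still appears whole in at least one chunk.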

Examples

machine knowledge_base
  stores
    store docs
      source: managed

      resource document
        id as uuid, is primary_key
        title as text, is required
        content as text, is required
        category as text
        timestamps

        create add_document
          accept: [title, content, category]

        read by_category
          argument category as text, is required
          filter: category == arg(category)

        vectors
          source: qdrant
          embedding_model: "nomic-embed-text"
          chunking: semantic
          chunk_size: 1000
          chunk_overlap: 200

When a document is created, its content field is automatically chunked, embedded, and stored in Qdrant. The title, category, and other fields become payload attributes in the vector index, enabling filtered similarity search.
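Under the hood, a filtered similarity search against Qdrant looks roughly
like the following qdrant-client sketch. The collection name and the
placeholder query vector are illustrative assumptions, not values Mashin
guarantees; nomic-embed-text happens to produce 768-dimensional vectors.

from qdrant_client import QdrantClient, models

client = QdrantClient(host="localhost", port=6333)

# Placeholder query vector; the real one comes from the configured
# embedding model ("nomic-embed-text" in the example above).
query_vector = [0.0] * 768

hits = client.search(
    collection_name="docs-document",   # hypothetical collection name
    query_vector=query_vector,
    query_filter=models.Filter(must=[
        # Resource fields such as category are available as payload indexes.
        models.FieldCondition(key="category", match=models.MatchValue(value="legal")),
    ]),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload.get("title"))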

RAG pipeline using vectors

machine contract_search
  stores
    store legal
      source: managed

      resource contract
        id as uuid, is primary_key
        department as text, is required
        classification as text, is required
        body as text, is required
        expiry_date as date
        timestamps

        create ingest
          accept: [department, classification, body, expiry_date]

        vectors
          source: qdrant
          embedding_model: "text-embedding-3-small"
          chunking: paragraph
          chunk_size: 500
          chunk_overlap: 100

  accepts
    query as text, is required
    department as text

  responds with
    answer as text
    sources as list

  implements
    recall find_relevant
      collection: "legal-contracts"
      query: input.query
      filter: {department: input.department}
      limit: 5

    ask answer, using: "anthropic:claude-sonnet-4-6"
      with task "Answer using ONLY these sources. Cite each claim. Sources: ${steps.find_relevant.results} Question: ${input.query}"

  returns
    answer as text
    sources as list
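For intuition, the implements block corresponds roughly to the Python
sketch below: embed the query, run a filtered search, then ask the model
to answer from the retrieved chunks only. The payload handling and prompt
assembly are assumptions for illustration; the real steps are executed and
governed by the interpreter.

from openai import OpenAI
from qdrant_client import QdrantClient, models
import anthropic

qdrant = QdrantClient(host="localhost", port=6333)
openai_client = OpenAI()

def embed(text: str) -> list[float]:
    # Same embedding model the resource is configured with.
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def answer_question(query: str, department: str | None = None) -> str:
    # recall find_relevant: filtered similarity search over contract chunks.
    conditions = []
    if department is not None:
        conditions.append(models.FieldCondition(
            key="department", match=models.MatchValue(value=department)))
    hits = qdrant.search(
        collection_name="legal-contracts",
        query_vector=embed(query),
        query_filter=models.Filter(must=conditions) if conditions else None,
        limit=5,
    )
    sources = "\n".join(str(h.payload) for h in hits)  # chunk payload shape is assumed

    # ask answer: grounded generation over the retrieved chunks.
    msg = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-6",   # model id as written in the example above
        max_tokens=1024,
        messages=[{"role": "user", "content":
            f"Answer using ONLY these sources. Cite each claim. "
            f"Sources: {sources} Question: {query}"}],
    )
    return msg.content[0].text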

Local embeddings for offline use

resource memo
  id as uuid, is primary_key
  subject as text, is required
  body as text, is required
  timestamps

  vectors
    source: qdrant
    embedding_model: "nomic-embed-text"
    chunking: fixed
    chunk_size: 512

Using nomic-embed-text with a local Ollama instance means embeddings are generated on-device. No API keys, no network calls, works offline. On desktop, the Mashin app bundles Ollama and Qdrant as sidecars.
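As a sketch of what on-device generation means here, this is how a local
embedding request to Ollama's standard REST API looks, assuming Ollama is
running on its default port:

import requests

# Generate an embedding locally via Ollama; no cloud keys, no external calls.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Quarterly planning memo"},
)
vector = resp.json()["embedding"]   # 768-dimensional for nomic-embed-text
print(len(vector))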

Governance

Vector operations are governed effects:

  • Embedding generation is a memory capability. If the embedding model is cloud-hosted (OpenAI, Voyage), it involves a network call governed by the interpreter.
  • Vector storage (inserting embeddings) is a memory capability recorded in the behavioral ledger.
  • Similarity search via recall steps is a governed memory capability with full provenance: every retrieval records which chunks were returned, their similarity scores, and which document they came from.
  • Provenance chain: document hash, chunk hash, embedding model, retrieval query, and answer are all linked in the behavioral ledger. This is the foundation for auditable RAG.

Local embedding models (via Ollama) do not require network governance but are still recorded in the ledger.
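The ledger's wire format is not documented here, but the hash-linking idea
behind the provenance chain can be illustrated in a few lines of Python.
Every field name and record kind below is hypothetical.

import hashlib, json

def link(record: dict, parent_hash: str | None = None) -> dict:
    # Chain a record to its parent by hashing the record plus the parent's
    # hash, so tampering with any upstream entry invalidates everything after it.
    body = {**record, "parent": parent_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

doc = link({"kind": "document", "content_sha256": "..."})
chunk = link({"kind": "chunk", "index": 0, "embedding_model": "nomic-embed-text"}, doc["hash"])
retrieval = link({"kind": "retrieval", "query": "termination clauses"}, chunk["hash"])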

Translations

Language    Keyword
English     vectors
Spanish     vectores
French      vecteurs
German      Vektoren
Japanese    ベクトル
Chinese     向量
Korean      벡터

See also

  • resource - Resource declarations
  • recall - Step type for semantic retrieval
  • remember - Step type for storing data in memory
  • store - Store block and source types