vectors

Vector search capability on a store resource. Adds semantic search to a resource by configuring an embedding model, chunking strategy, and vector backend. The resource’s fields automatically become payload indexes in the vector database, so you can combine semantic similarity search with structured metadata filtering.

When to use

Use vectors when you need to:

  • Search over unstructured text using semantic similarity (e.g., “find documents about contract termination clauses”)
  • Build retrieval-augmented generation (RAG) pipelines
  • Combine structured queries (filter by category, date) with semantic search (find similar content)
  • Enable recall steps to retrieve relevant context from a knowledge base

If you only need structured data queries (SQL-style), use read actions without vectors. If you only need vector search without any structured fields, consider a store with source: qdrant.

Syntax

resource <resource_name>
  vectors
    source: qdrant
    embedding_model: "<model_name>"
    chunking: <strategy>
    chunk_size: <number>
    chunk_overlap: <number>

Parameters

source (required)
    Vector backend. Currently qdrant is the only supported option.

embedding_model (required)
    The model used to generate embeddings. Examples: "nomic-embed-text"
    (local via Ollama), "text-embedding-3-small" (OpenAI), "voyage-3"
    (Voyage AI).

chunking (optional)
    How text is split into chunks before embedding. One of: semantic,
    paragraph, fixed. Default: semantic.

chunk_size (optional)
    Target size of each chunk in tokens. The default varies by strategy.

chunk_overlap (optional)
    Number of tokens shared between adjacent chunks, which preserves
    context across chunk boundaries. Default: 200.

Chunking strategies

semantic
    Splits on semantic boundaries (topic shifts, section breaks), using an
    LLM or heuristic to find natural break points. Best for long-form
    documents, articles, and reports.

paragraph
    Splits on paragraph boundaries (double newlines). Best for
    well-structured text with clear paragraphs.

fixed
    Splits at a fixed token count. Best for uniform chunk sizes, code,
    and logs.
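To make chunk_size and chunk_overlap concrete, here is a rough Python
sketch of the fixed strategy. Whitespace-separated words stand in for real
tokens; the actual runtime tokenizes according to the embedding model, so
this is an illustration, not Mashin's implementation.

import math

def fixed_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 200) -> list[str]:
    # Whitespace "tokens" stand in for the model's real tokenizer.
    tokens = text.split()
    step = chunk_size - chunk_overlap   # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):   # last window reached the end
            break
    return chunks

With chunk_size 512 and chunk_overlap 200, each chunk shares its last 200
tokens with the start of the next one, so a sentence that straddles a
boundary still appears whole in at least one chunk.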

Examples

machine knowledge_base
  stores
    store docs
      source: managed

      resource document
        id as uuid, is primary_key
        title as text, is required
        content as text, is required
        category as text
        timestamps

        create add_document
          accept: [title, content, category]

        read by_category
          argument category as text, is required
          filter: category == arg(category)

        vectors
          source: qdrant
          embedding_model: "nomic-embed-text"
          chunking: semantic
          chunk_size: 1000
          chunk_overlap: 200

When a document is created, its content field is automatically chunked, embedded, and stored in Qdrant. The title, category, and other fields become payload attributes in the vector index, enabling filtered similarity search.
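Under the hood, a filtered similarity search against Qdrant looks roughly
like the following qdrant-client sketch. The collection name and the
placeholder query vector are illustrative assumptions, not values Mashin
guarantees; nomic-embed-text happens to produce 768-dimensional vectors.

from qdrant_client import QdrantClient, models

client = QdrantClient(host="localhost", port=6333)

# Placeholder query vector; the real one comes from the configured
# embedding model ("nomic-embed-text" in the example above).
query_vector = [0.0] * 768

hits = client.search(
    collection_name="docs-document",   # hypothetical collection name
    query_vector=query_vector,
    query_filter=models.Filter(must=[
        # Resource fields such as category are available as payload indexes.
        models.FieldCondition(key="category", match=models.MatchValue(value="legal")),
    ]),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload.get("title"))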

RAG pipeline using vectors

machine contract_search
  stores
    store legal
      source: managed

      resource contract
        id as uuid, is primary_key
        department as text, is required
        classification as text, is required
        body as text, is required
        expiry_date as date
        timestamps

        create ingest
          accept: [department, classification, body, expiry_date]

        vectors
          source: qdrant
          embedding_model: "text-embedding-3-small"
          chunking: paragraph
          chunk_size: 500
          chunk_overlap: 100

  accepts
    query as text, is required
    department as text

  responds with
    answer as text
    sources as list

  implements
    recall find_relevant
      collection: "legal-contracts"
      query: input.query
      filter: {department: input.department}
      limit: 5

    ask answer, using: "anthropic:claude-sonnet-4-6"
      with task "Answer using ONLY these sources. Cite each claim. Sources: ${steps.find_relevant.results} Question: ${input.query}"

  returns
    answer as text
    sources as list
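For intuition, the implements block corresponds roughly to the Python
sketch below: embed the query, run a filtered search, then ask the model
to answer from the retrieved chunks only. The payload handling and prompt
assembly are assumptions for illustration; the real steps are executed and
governed by the interpreter.

from openai import OpenAI
from qdrant_client import QdrantClient, models
import anthropic

qdrant = QdrantClient(host="localhost", port=6333)
openai_client = OpenAI()

def embed(text: str) -> list[float]:
    # Same embedding model the resource is configured with.
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def answer_question(query: str, department: str | None = None) -> str:
    # recall find_relevant: filtered similarity search over contract chunks.
    conditions = []
    if department is not None:
        conditions.append(models.FieldCondition(
            key="department", match=models.MatchValue(value=department)))
    hits = qdrant.search(
        collection_name="legal-contracts",
        query_vector=embed(query),
        query_filter=models.Filter(must=conditions) if conditions else None,
        limit=5,
    )
    sources = "\n".join(str(h.payload) for h in hits)  # chunk payload shape is assumed

    # ask answer: grounded generation over the retrieved chunks.
    msg = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-6",   # model id as written in the example above
        max_tokens=1024,
        messages=[{"role": "user", "content":
            f"Answer using ONLY these sources. Cite each claim. "
            f"Sources: {sources} Question: {query}"}],
    )
    return msg.content[0].text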

Local embeddings for offline use

resource memo
  id as uuid, is primary_key
  subject as text, is required
  body as text, is required
  timestamps

  vectors
    source: qdrant
    embedding_model: "nomic-embed-text"
    chunking: fixed
    chunk_size: 512

Using nomic-embed-text with a local Ollama instance means embeddings are generated on-device. No API keys, no network calls, works offline. On desktop, the Mashin app bundles Ollama and Qdrant as sidecars.
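As a sketch of what on-device generation means here, this is how a local
embedding request to Ollama's standard REST API looks, assuming Ollama is
running on its default port:

import requests

# Generate an embedding locally via Ollama; no cloud keys, no external calls.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Quarterly planning memo"},
)
vector = resp.json()["embedding"]   # 768-dimensional for nomic-embed-text
print(len(vector))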

Governance

Vector operations are governed effects:

  • Embedding generation is a memory capability. If the embedding model is cloud-hosted (OpenAI, Voyage), it involves a network call governed by the interpreter.
  • Vector storage (inserting embeddings) is a memory capability recorded in the behavioral ledger.
  • Similarity search via recall steps is a governed memory capability with full provenance: every retrieval records which chunks were returned, their similarity scores, and which document they came from.
  • Provenance chain: document hash, chunk hash, embedding model, retrieval query, and answer are all linked in the behavioral ledger. This is the foundation for auditable RAG.

Local embedding models (via Ollama) do not require network governance but are still recorded in the ledger.
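The ledger's wire format is not documented here, but the hash-linking idea
behind the provenance chain can be illustrated in a few lines of Python.
Every field name and record kind below is hypothetical.

import hashlib, json

def link(record: dict, parent_hash: str | None = None) -> dict:
    # Chain a record to its parent by hashing the record plus the parent's
    # hash, so tampering with any upstream entry invalidates everything after it.
    body = {**record, "parent": parent_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

doc = link({"kind": "document", "content_sha256": "..."})
chunk = link({"kind": "chunk", "index": 0, "embedding_model": "nomic-embed-text"}, doc["hash"])
retrieval = link({"kind": "retrieval", "query": "termination clauses"}, chunk["hash"])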

Translations

Language    Keyword
English     vectors
Spanish     vectores
French      vecteurs
German      Vektoren
Japanese    ベクトル
Chinese     向量
Korean      벡터

See also

  • resource - Resource declarations
  • recall - Step type for semantic retrieval
  • remember - Step type for storing data in memory
  • store - Store block and source types