Shire

                       .,:lccc:,.
                  .,codxkkOOOOkkxdoc,.
              .;ldkkOOOOOOOOOOOOOOOkkdl;.
           .:oxOOkxdollccccccccllodxkOOkxo:.
         ,lkOOxl;..                ..,lxOOkl,
       .ckOOd:.                        .:dOOkc.
      ;xOOo,          .,clllc,.          ,oOOx;
     lOOk;         .:dkOOOOOOkd:.         ;kOOl
    oOOx,        .ckOOOOOOOOOOOOkc.        ,xOOo
   lOOk,        ;xOOOkdl:;;:ldkOOOx;        ,kOOl
  ;OOO;        lOOOd;.        .;dOOOl        ;OOO;
  dOOd        :OOOl              lOOO:        dOOd
  kOOl        oOOx      .;;.     xOOo        lOOk
  kOOl        oOOx     .xOOx.    xOOo        lOOk
  dOOd        :OOOl    .oOOo.   lOOO:        dOOd
  ;OOO;        lOOOd;.  .,,. .;dOOOl        ;OOO;
   lOOk,        ;xOOOkdl:,:ldkOOOx;        ,kOOl
    oOOx,        .ckOOOOOOOOOOOOkc.        ,xOOo
     lOOk;         .:dkOOOOOOkd:.         ;kOOl
      ;xOOo,          .,clllc,.          ,oOOx;
       .ckOOd:.                        .:dOOkc.
         ,lkOOxl;..                ..,lxOOkl,
           .:oxOOkxdollccccccccllodxkOOkxo:.
              .;ldkkOOOOOOOOOOOOOOOkkdl;.
                  .,codxkkOOOOkkxdoc,.
                       .,:lccc:,.

One index to rule them all.

Search, Hierarchy, Index, Repo Explorer — a monorepo package indexer that builds a dependency graph in SQLite and serves it over Model Context Protocol.

Point it at a monorepo. It discovers every package, maps their dependency relationships, extracts symbols from source code, and gives your AI tools structured access to the result.

Get started in 30 seconds

brew install justinjdev/shire/shire
shire init --global
shire build

That’s it. Claude Code can now search your packages, symbols, files, and dependency graph. See Setup for details.

Installation

Homebrew (macOS, Linux)

brew tap justinjdev/shire
brew install shire

From prebuilt binary

Download the latest release from GitHub Releases and add the binary to your PATH.

Nix

# Install into your profile
nix profile install github:justinjdev/shire

# Or run without installing
nix run github:justinjdev/shire

From source

Requires a Rust toolchain (cargo).

cargo install --path .

# With RAG vector search support (~30-50MB larger binary due to ONNX Runtime):
cargo install --path . --features rag

Setup

Claude Code

One command configures shire globally for all projects:

shire init --global

This creates:

  • ~/.claude/shire.toml — shared config with db_path = "~/.claude/shire/{repo}/{worktree}/index.db" (auto-namespaced per repo and worktree)
  • mcpServers.shire in ~/.claude.json — serves the index via shire serve
  • PostToolUse hook in ~/.claude/settings.json — auto-rebuilds the index after file edits (Edit, Write, NotebookEdit, Bash)
  • ~/.claude/rules/shire.md — rules file guiding Claude Code to prefer Shire tools

The {repo} placeholder is replaced with the repository directory name at runtime, and {worktree} with the worktree name (or _primary for the main checkout), so each repo and worktree gets its own index file automatically.
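The substitution described above can be pictured as a plain string replacement. This is a minimal sketch, not Shire's actual code: the function name `expand_db_path` is hypothetical, and `~` expansion is left out.

```rust
/// Expand the `{repo}` and `{worktree}` placeholders in a `db_path` template.
/// `worktree` is `None` for the main checkout, which maps to `_primary`.
/// (Hypothetical helper; `~` is not expanded in this sketch.)
fn expand_db_path(template: &str, repo: &str, worktree: Option<&str>) -> String {
    template
        .replace("{repo}", repo)
        .replace("{worktree}", worktree.unwrap_or("_primary"))
}
```

With the default template, a main checkout of `my-project` would resolve to `~/.claude/shire/my-project/_primary/index.db`.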

After running shire init --global, open any repo and run:

shire build

The index is ready. Claude Code will automatically use it via the MCP server.

Rules file

shire init creates ~/.claude/rules/shire.md with guidance on when to use Shire tools vs Grep/Glob. This helps Claude Code default to Shire for codebase searches, so you spend fewer tool calls on broad exploration.

The file is only written once — if it already exists, shire init leaves it untouched, so your customizations are preserved.

CLAUDE.md integration

During interactive setup, shire init prompts:

Add Shire search guidance to ~/.claude/CLAUDE.md?

If accepted, it appends a one-liner to ~/.claude/CLAUDE.md directing Claude Code to prefer Shire MCP tools over Grep/Glob for code search. The line is idempotent — running init again won’t duplicate it. If ~/.claude/CLAUDE.md doesn’t exist yet, it creates the file.

Terminal output

shire init uses styled terminal output to show what it does:

  • (green) — a file or config entry was created or updated
  • (dimmed) — a file or config entry already exists, skipped
  • Section headers appear in cyan

Most file writes (.gitignore, CLAUDE.md, settings.json, .mcp.json, ~/.claude.json) use atomic writes — content is written to a temporary file first, then renamed into place. This prevents partial writes if the process is interrupted.
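The write-then-rename pattern can be sketched with the standard library alone. This illustrates the general technique, not Shire's implementation; the `atomic_write` name and the `.tmp` suffix are assumptions.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Write `contents` to `path` by writing a sibling temp file, flushing it to
/// disk, and renaming it over the destination.
fn atomic_write(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    // Keep the temp file in the same directory so the rename stays on one
    // filesystem. The ".tmp" suffix is a hypothetical naming choice.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = fs::File::create(&tmp_path)?;
    file.write_all(contents)?;
    file.sync_all()?; // make sure the bytes reach disk before the rename
    drop(file);

    // Readers see either the old file or the new one, never a partial write.
    fs::rename(&tmp_path, path)
}
```

If the process dies before the rename, the destination still holds its previous contents and only the temp file is left behind.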

Project-level setup

To create a shire.toml in the current repo instead of globally:

shire init

This generates a local config file with commented-out defaults you can customize, and writes the MCP server config to .mcp.json. If the db_path points to a local directory (e.g., .shire/index.db), it offers to add that directory to .gitignore.

Manual setup

If you prefer manual configuration, add to ~/.claude.json (global) or .mcp.json (project-level):

{
  "mcpServers": {
    "shire": {
      "command": "shire",
      "args": ["serve"]
    }
  }
}

To keep the index fresh during a session, add a PostToolUse hook to ~/.claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|NotebookEdit|Bash",
        "hooks": [{ "type": "command", "command": "shire rebuild --stdin" }]
      }
    ]
  }
}

Claude Desktop

Add Shire to your claude_desktop_config.json:

{
  "mcpServers": {
    "shire": {
      "command": "shire",
      "args": ["serve", "--db", "/path/to/repo/.shire/index.db"]
    }
  }
}

Other MCP clients

Shire speaks standard MCP over stdio. Any client that supports MCP can connect:

shire serve --db /path/to/repo/.shire/index.db

Use --root to enable on-demand reindexing (the server checks .git/index mtime for staleness):

shire serve --root /path/to/repo
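A staleness check of this kind amounts to comparing a file's modification time with the time of the last build. The sketch below assumes a primary checkout (where `.git` is a directory); the function name and the "missing file means not stale" policy are hypothetical.

```rust
use std::fs;
use std::path::Path;
use std::time::SystemTime;

/// Returns true when `<root>/.git/index` was modified after `last_build`.
/// A missing or unreadable index file is treated as "not stale" here; that
/// policy is an assumption of this sketch.
fn index_is_stale(root: &Path, last_build: SystemTime) -> bool {
    fs::metadata(root.join(".git").join("index"))
        .and_then(|meta| meta.modified())
        .map(|mtime| mtime > last_build)
        .unwrap_or(false)
}
```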

CLI reference

Build an index

shire build --root /path/to/repo

Rebuild from scratch

Ignore cached hashes and re-parse everything:

shire build --root /path/to/repo --force

Custom database location

shire build --root /path/to/repo --db /tmp/my-index.db

The index defaults to .shire/index.db inside the repo root. Override with --db or db_path in shire.toml (see Configuration).

Clean up

Remove the index database, WAL/SHM files, the .shire directory, and stop the watch daemon:

shire clean

Incremental builds

Subsequent builds are incremental — only manifests whose content has changed (by SHA-256 hash) are re-parsed. Source files are tracked at per-file granularity: if individual source files change without a manifest change, only those files have their symbols re-extracted. An mtime pre-check skips hash computation entirely for packages whose source files haven’t been touched since the last build.

File indexing is also incremental — a file-tree hash detects structural changes, skipping the file indexing phase entirely when no files have been added, removed, or resized.
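The hash comparison behind incremental builds can be sketched as follows. Note the loud assumption: Shire uses SHA-256, while this example substitutes the standard library's `DefaultHasher` purely to avoid external crates. Its algorithm is unspecified, so it is not suitable for hashes persisted across builds. The function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. Shire uses SHA-256; `DefaultHasher` is used here
/// only to keep the sketch free of external crates.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Returns the paths whose content hash differs from the stored one (or that
/// have no stored hash yet), sorted for stable output.
fn changed_paths(stored: &HashMap<String, u64>, current: &[(&str, &[u8])]) -> Vec<String> {
    let mut changed: Vec<String> = current
        .iter()
        .filter(|(path, bytes)| stored.get(*path) != Some(&content_hash(bytes)))
        .map(|(path, _)| path.to_string())
        .collect();
    changed.sort();
    changed
}
```

Only the returned paths would need re-parsing; everything else keeps its previously indexed data.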

Symbol extraction and source hashing are parallelized across packages and within packages using rayon for multi-core throughput. Files are read once per build (single-pass hash + extraction). All database writes use batched multi-row INSERTs within explicit transactions, with FTS5 triggers temporarily disabled during bulk operations for maximum SQLite throughput.

Build progress

shire build shows real-time progress for each build phase:

  • Spinners for quick phases (discovering manifests, workspace context, recomputing internals, indexing files)
  • Progress bars with ETAs for longer phases (parsing manifests, extracting symbols)
  • When RAG is enabled, an embedding progress bar tracks file embedding in the background, displaying error messages on failure

Progress bars persist after completion so you can see the full build history in your terminal. Quiet mode (used internally by the MCP server for on-demand rebuilds) hides all progress output.

Configuration

Drop a shire.toml in the repo root to customize behavior:

# Custom database location (default: .shire/index.db)
db_path = "/path/to/custom/index.db"

[discovery]
manifests = ["package.json", "go.mod", "go.work", "Cargo.toml", "pyproject.toml", "pom.xml", "build.gradle", "build.gradle.kts", "settings.gradle", "settings.gradle.kts", "cpanfile", "Gemfile"]
exclude = ["node_modules", "vendor", "dist", ".build", "target", "third_party", ".shire", ".gradle", "build"]

# Symbol extraction
[symbols]
exclude_extensions = [".proto", ".pl"]
references_enabled = false  # EXPERIMENTAL, default false — see below

# Documentation indexing
[docs]
extensions = [".md", ".rst", ".txt", ".adoc"]
max_file_size = 262144  # 256 KB — files larger than this are truncated

# Override package descriptions
[[packages]]
name = "legacy-auth"
description = "Deprecated auth service — do not add new dependencies"

Watch daemon

[watch]
debounce_ms = 2000  # milliseconds to wait after last change before rebuilding

Logging

[log]
level = "warn"          # error, warn, info, debug, trace
dir = ".shire/logs"     # log directory (relative to repo root). Set to "" to disable file logging
max_days = 30           # automatically delete log files older than this

The SHIRE_LOG environment variable overrides the config level (e.g., SHIRE_LOG=debug shire build). Log files are daily-rotated with filenames like shire.log.2026-03-26. Each session includes a unique session ID for correlation across concurrent processes.

All fields are optional. Defaults are shown above. The --db CLI flag takes precedence over db_path in config.

Cross-reference index (experimental)

symbols.references_enabled (default false) populates the symbol_refs table so the symbol_references, symbol_callers, and symbol_callees MCP tools can answer “where is this used?” / “who calls this?” questions. Reference extraction is supported for 8 tier-1 languages: Go, Python, Java, TypeScript, JavaScript, Perl, Ruby, Scala.

Opt-in: shire init asks whether to enable this (prompt labelled experimental), and writes references_enabled = true to shire.toml when you say yes. You can also add it manually:

[symbols]
references_enabled = true

Cost: DB grows substantially — roughly +30% on TS/JS repos to +150% on Go-heavy repos (benchmarks on shire-bench: turborepo +29%, grafana +152%, kubernetes +104% vs main baseline). Build time grows ~5-7%.

Toggling the flag takes effect on the next build. Disabling wipes symbol_refs at the start of the build; re-enabling repopulates it on the next full rebuild (shire build --force).

This feature is marked experimental: its schema and coverage may change in minor versions as language support broadens and edge cases surface.

Custom package discovery

For codebases where packages aren’t defined by standard manifest files — Go single-module monorepos, repos that use ownership.yml + build files, or any non-standard convention — you can define custom discovery rules:

# Discover Go apps: directories containing both main.go and ownership.yml
[[discovery.custom]]
name = "go-apps"
kind = "go"
requires = ["main.go", "ownership.yml"]
paths = ["services/", "cmd/"]
exclude = ["testdata", "examples"]
max_depth = 3
name_prefix = "go:"

# Discover proto packages: directories containing *.proto and buf.yaml
[[discovery.custom]]
name = "proto-packages"
kind = "proto"
requires = ["*.proto", "buf.yaml"]
paths = ["proto/", "services/"]
max_depth = 4

| Field | Required | Description |
|---|---|---|
| name | yes | Rule identifier |
| kind | yes | Package kind for symbol extraction (go, proto, npm, etc.) |
| requires | yes | File patterns that must ALL exist in a directory (supports globs like *.proto) |
| paths | no | Limit search to specific subtrees (default: repo root) |
| exclude | no | Rule-specific directory exclusions (on top of global excludes) |
| max_depth | no | Maximum depth to search from each paths entry |
| name_prefix | no | Prefix prepended to directory-derived package name (e.g., go:services/auth) |
| extensions | no | Override which file extensions get symbol extraction |

Custom discovery runs alongside manifest-based discovery. Directories already found by manifest parsers are skipped. Subdirectories of matched directories are also skipped to prevent nested matches.
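The "all `requires` patterns must exist" rule can be illustrated with a small matcher. This is a sketch under stated assumptions: it handles only exact names and a single leading `*` (as in `*.proto`), and both function names are hypothetical rather than Shire's API.

```rust
/// Returns true if `file_name` matches `pattern`. Only exact names and a
/// single leading `*` (e.g. `*.proto`) are handled in this sketch.
fn matches_pattern(pattern: &str, file_name: &str) -> bool {
    match pattern.strip_prefix('*') {
        Some(suffix) => file_name.ends_with(suffix),
        None => file_name == pattern,
    }
}

/// A directory matches a rule when every `requires` pattern is satisfied by
/// at least one file in that directory.
fn directory_matches(requires: &[&str], files_in_dir: &[&str]) -> bool {
    requires
        .iter()
        .all(|pattern| files_in_dir.iter().any(|file| matches_pattern(pattern, file)))
}
```

For the `go-apps` rule above, a directory holding only `main.go` would not match, because `ownership.yml` is also required.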

RAG adds semantic vector search to search_symbols. It requires compiling with the rag feature flag and enabling it in config.

Build with RAG support:

cargo install --path . --features rag

Enable in shire.toml:

[rag]
enabled = true
# model = "BAAI/bge-small-en-v1.5"   # default, only supported model currently
# cache_dir = "~/.cache/shire-rag"    # optional, for model file storage

When enabled, shire build embeds all symbols after extraction. The first build downloads the model (~33MB) automatically. Subsequent builds are incremental — only changed packages get re-embedded.

RAG is non-fatal: if the model fails to load or embeddings fail, shire falls back to FTS-only search with a warning. If the rag feature is not compiled in, the [rag] config section is silently ignored.

MCP Tools & Prompts

Tools

Shire exposes the following tools over the Model Context Protocol:

| Tool | Description |
|---|---|
| search_packages | Search packages by name or description. Use instead of Grep for finding packages. |
| list_packages | List all indexed packages, optionally filtered by kind |
| package_dependencies | List a package’s dependencies. Set depth>1 for transitive graph (returns edge list with different schema). |
| package_dependents | Find all packages that depend on this package |
| search_symbols | Find functions, classes, types, methods by name or signature. Use instead of Grep for “where is function X?” or “what matches pattern Y?”. Omit query with a package filter to list all symbols in that package. Supports hybrid FTS + vector search when RAG is enabled. |
| get_file_symbols | List all symbols defined in a specific file. Use instead of reading the file to understand its exports. |
| search_files | Find files by path or name. Use instead of Glob/find for locating files. Useful for “middleware”, “proto files”, or files in a specific directory. |
| search_docs | Search documentation files by content, title, or path; returns matching docs with text snippets |
| list_package_files | List all files in a package, optionally filtered by extension. Use instead of Glob for listing package contents. |
| explore | Explore a concept across the codebase: searches packages, symbols, files, and documentation semantically. Use as the first tool when investigating unfamiliar code or broad topics like “authentication” or “error handling”. Returns a structured context map organized by package. |
| index_status | Index build metadata: timestamp, git commit, counts |
| symbol_references | Find all references to a symbol by name. Returns [{name, kind, file_path, line, package, enclosing_symbol}]. Accepts optional kind and package filters. Requires symbols.references_enabled = true (experimental, opt-in). Note: matching is name-based, so same-name symbols across different packages are merged. |
| symbol_callers | List all callers of a symbol (call-site references). Returns [{caller_name, caller_file, caller_line, call_sites}]. Accepts optional package filter. Requires symbols.references_enabled = true. Same name-based-match caveat as symbol_references. |
| symbol_callees | List what a function calls (outbound call graph). Returns [{callee_name, call_sites}]. Accepts optional package filter. Requires symbols.references_enabled = true. Same name-based-match caveat as symbol_references. |
| change_impact | Analyze the blast radius of changing a symbol. Combines cross-references with the dependency graph to return {direct_impact, cross_package_impact, transitive_impact, summary}. Use before renaming, changing a signature, or deleting a symbol. Accepts optional package (home package hint, for disambiguation), transitive_depth (default 2), and limit. Requires symbols.references_enabled = true. Same name-based-match caveat as symbol_references. |
| schema_consumers | Find all files generated from a schema file (e.g. .proto). Returns generated file paths and their packages. Use to understand the blast radius of a schema change. |
| generated_from | Find the source schema file that generated a given file. Use to trace a generated file (e.g. user.pb.go) back to its source proto. |
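The depth parameter of package_dependencies can be pictured as a depth-bounded breadth-first walk that collects edges. The sketch below works on an in-memory adjacency map rather than SQLite, and the function name and `(from, to)` edge shape are hypothetical, not the tool's actual response schema.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk of the dependency graph starting at `root`, returning
/// `(from, to)` edges reachable within `max_depth` hops.
fn transitive_edges(
    graph: &HashMap<&str, Vec<&str>>,
    root: &str,
    max_depth: usize,
) -> Vec<(String, String)> {
    let mut edges = Vec::new();
    let mut visited: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    visited.insert(root);
    queue.push_back((root, 0));

    while let Some((pkg, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand beyond the requested depth
        }
        for &dep in graph.get(pkg).into_iter().flatten() {
            edges.push((pkg.to_string(), dep.to_string()));
            // Each package is expanded once, which also guards against cycles.
            if visited.insert(dep) {
                queue.push_back((dep, depth + 1));
            }
        }
    }
    edges
}
```

With `max_depth = 1` the result is just the direct dependencies; larger values add the edges of each newly discovered package.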

When to use Shire vs Grep/Glob

| Task | Use | Not |
|---|---|---|
| Find a function, class, or type by name | search_symbols | Grep |
| Find a file by name or path | search_files | Glob / find |
| List files in a package | list_package_files | Glob |
| Find a package | search_packages | Grep |
| Explore an unfamiliar area | explore | multiple Grep calls |
| Search for a literal string or log message | Grep | Shire |
| Search inside function bodies | Grep | Shire |
| Pattern match on file contents | Grep | Shire |

Prompts

Prompts are pre-built templates that compose multiple queries into structured context. They give your AI a map of where concepts live in the codebase.

| Prompt | Args | Description |
|---|---|---|
| explore | query | Search packages, symbols, files, and documentation for a concept; returns a structured context map organized by package |
| reference_audit | name | Guides refactor-safety analysis for a symbol: classifies refs by kind, traces the call graph via symbol_callers, identifies cross-package impact, and assesses rename/change risk. Requires symbols.references_enabled = true (experimental). |

Watch Daemon

shire watch starts a background daemon that auto-rebuilds the index when files change. It uses Unix domain socket IPC with configurable debounce (default 2s).
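Debouncing means a burst of change events produces one rebuild, fired a fixed delay after the last event in the burst. The following is a pure-function sketch of that timing rule, not the daemon's event loop; the function name is hypothetical.

```rust
/// Given change-event timestamps in milliseconds (ascending), return the
/// times at which a debounced rebuild would fire: `debounce_ms` after the
/// last event of each burst.
fn rebuild_times(events_ms: &[u64], debounce_ms: u64) -> Vec<u64> {
    let mut fires = Vec::new();
    for (i, &t) in events_ms.iter().enumerate() {
        let deadline = t + debounce_ms;
        // The timer is reset if another event arrives before the deadline.
        let superseded = events_ms
            .get(i + 1)
            .map_or(false, |&next| next < deadline);
        if !superseded {
            fires.push(deadline);
        }
    }
    fires
}
```

With the default 2000 ms window, events at 0, 500, and 1200 ms collapse into a single rebuild at 3200 ms.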

Start the daemon

Idempotent — safe to call multiple times:

shire watch --root /path/to/repo

Signal a rebuild manually

shire rebuild --root /path/to/repo

Signal a rebuild from a Claude Code hook

Reads JSON from stdin, uses cwd as repo root:

shire rebuild --stdin

Stop the daemon

shire watch --root /path/to/repo --stop

Smart filtering

The watch daemon avoids unnecessary rebuilds:

  • Edit/Write tools — checks file extension relevance and repo boundary
  • Bash commands — filtered against a denylist of known read-only commands (ls, git status, cargo test, etc.) — unknown commands default to rebuild
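The denylist rule for Bash commands can be sketched as a prefix check that defaults to rebuilding. The list below is a hypothetical three-entry excerpt built from the examples above, and the matching strategy is an assumption; compound commands (pipes, `&&`) are not handled here.

```rust
/// Hypothetical denylist of read-only command prefixes; the real list in
/// Shire is longer and may match differently.
const READ_ONLY_PREFIXES: &[&str] = &["ls", "git status", "cargo test"];

/// Returns true when a Bash command should trigger a rebuild. Commands on the
/// denylist are skipped; anything unrecognized defaults to rebuilding.
fn should_rebuild(command: &str) -> bool {
    let command = command.trim();
    !READ_ONLY_PREFIXES.iter().any(|prefix| {
        // Match the prefix as whole words so "lsof" is not treated as "ls".
        command == *prefix
            || command
                .strip_prefix(*prefix)
                .map_or(false, |rest| rest.starts_with(' '))
    })
}
```

Defaulting to a rebuild for unknown commands trades a few unnecessary rebuilds for never missing a real change.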

Git Worktrees

Shire automatically detects git worktrees and maintains separate indexes for each one. This works out of the box with no configuration required.

How it works

When you run shire build or shire serve inside a linked worktree, shire:

  1. Detects the worktree by inspecting .git — a directory means primary working tree, a file means linked worktree
  2. Resolves a per-worktree DB path using the {worktree} placeholder in db_path
  3. Seeds from the primary worktree’s DB on first build, so you don’t start from scratch — only changed packages need reindexing

The primary worktree uses the reserved name _primary. Linked worktrees use Git’s stable worktree ID (the directory name under .git/worktrees/<id>).
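The detection step can be sketched with the standard library. In a linked worktree, `.git` is a file whose content is a `gitdir: <path>` line pointing under `.git/worktrees/<id>`. The enum and function names below are hypothetical, and this is not Shire's actual `git.rs`.

```rust
use std::fs;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum WorktreeKind {
    Primary,
    Linked(String), // worktree ID: directory name under .git/worktrees/
    NotARepo,
}

fn detect_worktree(root: &Path) -> WorktreeKind {
    let dot_git = root.join(".git");
    if dot_git.is_dir() {
        return WorktreeKind::Primary;
    }
    // In a linked worktree, `.git` is a file containing "gitdir: <path>".
    let contents = match fs::read_to_string(&dot_git) {
        Ok(c) => c,
        Err(_) => return WorktreeKind::NotARepo,
    };
    let gitdir = match contents.trim().strip_prefix("gitdir:") {
        Some(rest) => Path::new(rest.trim()),
        None => return WorktreeKind::NotARepo,
    };
    // Require a `.../worktrees/<id>` path so submodule `.git` files
    // (which point under `.git/modules/`) are not mistaken for worktrees.
    let in_worktrees = gitdir
        .parent()
        .and_then(|p| p.file_name())
        .map_or(false, |name| name == "worktrees");
    match gitdir.file_name().and_then(|n| n.to_str()) {
        Some(id) if in_worktrees => WorktreeKind::Linked(id.to_string()),
        _ => WorktreeKind::NotARepo,
    }
}
```

A `Primary` result would map to the reserved name `_primary`, and `Linked(id)` to the worktree ID used for the `{worktree}` placeholder.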

Configuration

The default global config (generated by shire init --global) already includes worktree support:

db_path = "~/.claude/shire/{repo}/{worktree}/index.db"

This produces separate databases like:

~/.claude/shire/my-project/_primary/index.db
~/.claude/shire/my-project/feat-auth/index.db
~/.claude/shire/my-project/bugfix-123/index.db

Placeholders

| Placeholder | Description |
|---|---|
| {repo} | Name of the main repository (directory basename) |
| {worktree} | Worktree identifier: _primary for the main working tree, or Git’s worktree ID for linked worktrees |

Shared vs separate databases

If your db_path does not include {worktree}, all worktrees share the same database. This is fine for read-only use but means concurrent builds from different worktrees will conflict.

Including {worktree} in the path gives each worktree its own database, which is the recommended setup.

DB seeding

When shire builds in a linked worktree for the first time and no database exists yet, it checks whether the primary worktree has an existing database. If so, it copies that database as a seed — giving you a fully populated index immediately. Only packages that differ between the worktrees need reindexing.

$ cd ~/worktrees/feat-auth
$ shire build
Seeded DB from /Users/you/.claude/shire/my-project/_primary/index.db
Building index...
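The seeding rule (copy only when the target is missing and the primary exists) can be sketched as below. The function name is hypothetical, and the sketch copies just the main database file, ignoring any WAL/SHM side files.

```rust
use std::fs;
use std::path::Path;

/// Copy the primary worktree's database to `target_db` if the target does not
/// exist yet and the primary does. Returns true when a seed copy was made.
fn seed_db(primary_db: &Path, target_db: &Path) -> std::io::Result<bool> {
    if target_db.exists() || !primary_db.exists() {
        return Ok(false);
    }
    // Create the per-worktree directory before copying into it.
    if let Some(parent) = target_db.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::copy(primary_db, target_db)?;
    Ok(true)
}
```

Because an existing target is never overwritten, running the build again leaves the worktree's own index in place.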

Local config

A local shire.toml at the repo root can use a relative db_path (e.g., .shire/index.db). Since each worktree has its own root directory, relative paths naturally resolve to separate databases per worktree. Seeding still applies in this case.

Supported Ecosystems

| Manifest | Kind | Workspace support |
|---|---|---|
| package.json | npm | workspace: protocol versions normalized |
| go.mod | go | go.work member metadata |
| go.work | go | use directives parsed for workspace context |
| Cargo.toml | cargo | workspace = true deps resolved from root |
| pyproject.toml | python | |
| pom.xml | maven | Parent POM inheritance (groupId, version) |
| build.gradle / build.gradle.kts | gradle | settings.gradle project inclusion |
| cpanfile | perl | requires / on 'test' blocks |
| Gemfile | ruby | gem / group :test blocks |
| flake.nix | nix | inputs attrset (dotted and block forms) |

Symbol extraction

Shire extracts public symbols (functions, classes, types, methods, interfaces) from source files using tree-sitter, with full signatures, parameters, and return types.

| Language | Extractor |
|---|---|
| TypeScript / JavaScript | tree-sitter |
| Go | tree-sitter |
| Rust | tree-sitter |
| Python | tree-sitter |
| Java | tree-sitter |
| Kotlin | tree-sitter |
| Dart | tree-sitter |
| Protobuf | tree-sitter |
| C | tree-sitter |
| C++ | tree-sitter |
| C# | tree-sitter |
| Swift | tree-sitter |
| PHP | tree-sitter |
| Scala | tree-sitter |
| Zig | tree-sitter |
| Bash / Shell | tree-sitter |
| R | tree-sitter |
| Haskell | tree-sitter |
| YAML | tree-sitter |
| SQL | tree-sitter |
| HCL / Terraform | tree-sitter |
| TOML | tree-sitter |
| Perl | tree-sitter |
| Ruby | tree-sitter |
| OCaml | tree-sitter |
| Lua | tree-sitter |
| Elixir | tree-sitter |
| Clojure | tree-sitter |
| Erlang | tree-sitter |
| Julia | tree-sitter |
| Gleam | tree-sitter |
| Odin | tree-sitter |
| Nix | tree-sitter |
| Nim | tree-sitter |
| COBOL | regex-based |

Reference extraction

Shire extracts cross-references (calls, type references, imports, and interface implementations) for a subset of languages. These are stored in the symbol_refs table and exposed via the symbol_references, symbol_callers, and symbol_callees MCP tools.

LanguageCallTypeImportImpl
Goyesyesyes— (implicit interfaces)
Pythonyesyesyesyes
Javayesyesyesyes
TypeScriptyesyesyesyes
JavaScriptyesyesyes
Perlyesyes
Rubyyesyesyesyes
Scalayesyesyesyes

All other languages: symbol definitions only; references are not extracted.

Performance

Shire is designed to index large monorepos in a few seconds and answer queries in a few milliseconds or less. This page documents benchmark methodology, results, and how to reproduce them.

Test repos

Benchmarks run against three real-world open-source monorepos covering a range of sizes and ecosystems:

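| Repo | Size | Packages | Symbols | Files | Primary languages |
|---|---|---|---|---|---|
| turborepo | small | 400 | 10,686 | 5,451 | Rust, TypeScript, Go |
| grafana | medium | 28 | 35,104 | 14,054 | Go, TypeScript |
| kubernetes | large | 34 | 78,458 | 18,275 | Go |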

Build performance

Full rebuild (no incremental cache), median of 4 iterations after a warmup run:

| Repo | Median | Min | P95 | Std dev |
|---|---|---|---|---|
| turborepo | 571ms | 525ms | 678ms | 58ms |
| grafana | 1,150ms | 1,025ms | 1,172ms | 58ms |
| kubernetes | 1,703ms | 1,607ms | 1,897ms | 108ms |

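Build time grows with symbol count, though less than proportionally in these runs: kubernetes has roughly 7x the symbols of turborepo but takes about 3x as long. The pipeline is parallelized with rayon across packages and files, with batched multi-row SQLite inserts within explicit transactions.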

Incremental builds are significantly faster – only packages with changed source files are re-extracted, and an mtime pre-check skips SHA-256 computation entirely for untouched packages.

Query performance

Median latency over 100 iterations per query:

| Query | Small | Medium | Large |
|---|---|---|---|
| search_symbols("parse") | 0.09ms | 0.09ms | 0.04ms |
| search_symbols("Config") | 0.20ms | 0.29ms | 1.01ms |
| search_files("mod") | 0.05ms | 0.03ms | 0.04ms |
| search_files("test") | 0.07ms | 0.60ms | 1.99ms |
| list_packages(None) | 0.11ms | 0.01ms | 0.01ms |

All queries use SQLite FTS5 full-text search with unicode61 tokenizer and prefix indexes. Query latency depends primarily on result set size, not total index size.

Reproducing benchmarks

Shire includes an autoresearch binary for reproducible benchmarking.

Setup

Run the benchmark repo setup script to clone and prepare the test repos:

scripts/setup-bench-repo.sh

This clones the three repos into ~/.cache/shire-bench/ and creates a shire.toml in each.

Running benchmarks

# Build the benchmark binary
cargo build --release --bin autoresearch

# Run build benchmarks (all repos)
cargo run --release --bin autoresearch -- --phase build

# Run query benchmarks (all repos)
cargo run --release --bin autoresearch -- --phase query

# Filter by repo size
cargo run --release --bin autoresearch -- --phase build --size small

# Point at a specific repo
cargo run --release --bin autoresearch -- --phase build --repo /path/to/repo

Build benchmarks run 5 iterations (1 warmup + 4 measured) per repo. Query benchmarks run 100 iterations per query. Results are printed as JSON to stdout.

Environment notes

  • Results vary by machine (CPU, disk speed, available memory)
  • Close other applications for more stable measurements
  • The warmup iteration primes filesystem caches and SQLite page cache
  • Numbers in this document were captured on an Apple M-series Mac

Architecture

src/
├── main.rs          # CLI (clap): build, serve, watch, rebuild, init, clean subcommands
├── lib.rs           # Library re-exports for embedding shire as a crate
├── config.rs        # shire.toml parsing
├── git.rs           # Git worktree detection and repo root resolution
├── init.rs          # `shire init` setup (config, MCP server, hooks, rules)
├── db/
│   ├── mod.rs       # SQLite schema, open/create
│   └── queries.rs   # FTS search, dependency graph BFS, listing
├── index/
│   ├── mod.rs       # Walk + incremental index orchestrator
│   ├── custom_discovery.rs # Config-driven custom package discovery
│   ├── manifest.rs  # ManifestParser trait
│   ├── hash.rs      # SHA-256 content hashing for incremental builds
│   ├── npm.rs       # package.json parser (workspace: protocol)
│   ├── go.rs        # go.mod parser
│   ├── go_work.rs   # go.work parser (workspace use directives)
│   ├── cargo.rs     # Cargo.toml parser (workspace dep resolution)
│   ├── python.rs    # pyproject.toml parser
│   ├── maven.rs     # pom.xml parser (parent POM inheritance)
│   ├── gradle.rs    # build.gradle / build.gradle.kts parser
│   ├── gradle_settings.rs # settings.gradle parser (project inclusion)
│   ├── perl.rs      # cpanfile parser (requires, on 'test')
│   └── ruby.rs      # Gemfile parser (gem, group blocks)
├── symbols/
│   ├── mod.rs       # Symbol types, kind-agnostic extraction orchestrator
│   ├── walker.rs    # Source file discovery (extension filtering, excludes)
│   ├── registry.rs  # Language registry: maps extensions to tree-sitter grammars + hooks
│   ├── query_extract.rs # Generic tree-sitter query executor with hook callbacks
│   ├── queries/     # Tree-sitter .scm query files (one per language)
│   ├── hooks/       # Language-specific hooks (visibility, signatures, params, post-processing)
│   ├── elixir.rs    # Elixir extractor (regex-based)
│   └── cobol.rs     # COBOL extractor (regex-based)
│                    # Cross-reference extraction (call, type, import, impl) is supported
│                    # for 8 tier-1 languages: Go, Python, Java, TypeScript, JavaScript,
│                    # Perl, Ruby, Scala. References are captured via @reference.* captures
│                    # in the language's .scm query and written to the symbol_refs table.
│                    # Coverage is asymmetric per language: JavaScript omits Type refs
│                    # (no type system), and Go/Perl omit Impl refs (no extends/implements).
├── rag/             # Optional RAG vector search (behind `rag` feature flag)
│   ├── mod.rs       # Feature-gated module root
│   ├── embedder.rs  # fastembed wrapper, file-level text formatting, batch embedding (64 files/batch)
│   └── storage.rs   # sqlite-vec extension, vec0 table, vector CRUD, KNN search
├── mcp/
│   ├── mod.rs       # MCP server setup (rmcp, stdio transport)
│   ├── tools.rs     # Tool handlers (+ hybrid search when RAG enabled)
│   └── prompts.rs   # explore prompt template for semantic codebase exploration
└── watch/
    ├── mod.rs       # Daemon event loop (UDS listener, debounce, rebuild)
    ├── daemon.rs    # Process management (start/stop/is_running via PID)
    └── protocol.rs  # Hook input parsing, Bash read-only denylist

symbol_refs table

The symbol_refs table stores cross-reference records extracted alongside symbol definitions. Each row captures a reference to a named symbol:

| Column | Type | Description |
|---|---|---|
| name | TEXT | The name being referenced (function, type, module, etc.) |
| kind | TEXT | One of: call, type, import, impl |
| file_path | TEXT | Source file containing the reference |
| line | INTEGER | Line number of the reference |
| package | TEXT | Package the referencing file belongs to (nullable) |
| enclosing_symbol | TEXT | Nearest enclosing function or method (nullable) |

B-tree indexes on name, file_path, and enclosing_symbol support the exact-match lookups used by the symbol_references, symbol_callers, and symbol_callees MCP tools. No FTS5 table — reference queries are exact-name only.

Incremental behavior mirrors symbol extraction: references for a file are dropped and re-extracted whenever the file’s SHA-256 hash changes. No separate pass is needed — references are extracted in the same tree-sitter walk as symbol definitions.

File embeddings

When RAG is enabled, the build produces file-level vector embeddings for hybrid search. Each file is represented as a FileForEmbedding containing its symbols via FileSymbol (name, kind, and optional signature).

The text representation (file_to_text) works as follows:

  • Symbols are sorted by kind then name
  • Signatures are preferred over kind name fallback when available
  • A total character budget of 1800 caps the output text — after the file path prefix is accounted for, remaining budget is filled with symbols until exhausted
  • Files with no symbols produce a minimal file <path> in <package> string
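The rules above can be sketched as a small function. The `FileSymbol` fields follow the description in this section, but the exact output layout (newline-separated entries after a `file <path> in <package>` prefix) and the `kind name` fallback format are assumptions of this sketch, not the real `file_to_text`.

```rust
/// Simplified stand-in for the symbol record described above.
struct FileSymbol {
    name: String,
    kind: String,
    signature: Option<String>,
}

const CHAR_BUDGET: usize = 1800;

/// Build the text to embed for one file: path prefix first, then symbols
/// sorted by kind and name until the character budget is used up.
fn file_to_text(path: &str, package: &str, symbols: &[FileSymbol]) -> String {
    let mut sorted: Vec<&FileSymbol> = symbols.iter().collect();
    sorted.sort_by(|a, b| (&a.kind, &a.name).cmp(&(&b.kind, &b.name)));

    let mut text = format!("file {} in {}", path, package);
    let mut used = text.chars().count();
    for sym in sorted {
        // Prefer the full signature; fall back to "<kind> <name>".
        let entry = match &sym.signature {
            Some(sig) => sig.clone(),
            None => format!("{} {}", sym.kind, sym.name),
        };
        let cost = entry.chars().count() + 1; // +1 for the newline separator
        if used + cost > CHAR_BUDGET {
            break; // budget exhausted
        }
        text.push('\n');
        text.push_str(&entry);
        used += cost;
    }
    text
}
```

A file with no symbols yields only the prefix line, matching the minimal string described in the last bullet.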

Embedding runs in a background thread spawned during the build, executing concurrently with post-build housekeeping:

  • Files are processed in batches of 64 to balance throughput and memory
  • A progress callback reports batch completion for progress bar updates
  • Errors (model init, embedding, DB write) are reported on the progress bar rather than failing the build