Shire

                       .,:lccc:,.
                  .,codxkkOOOOkkxdoc,.
              .;ldkkOOOOOOOOOOOOOOOkkdl;.
           .:oxOOkxdollccccccccllodxkOOkxo:.
         ,lkOOxl;..                ..,lxOOkl,
       .ckOOd:.                        .:dOOkc.
      ;xOOo,          .,clllc,.          ,oOOx;
     lOOk;         .:dkOOOOOOkd:.         ;kOOl
    oOOx,        .ckOOOOOOOOOOOOkc.        ,xOOo
   lOOk,        ;xOOOkdl:;;:ldkOOOx;        ,kOOl
  ;OOO;        lOOOd;.        .;dOOOl        ;OOO;
  dOOd        :OOOl              lOOO:        dOOd
  kOOl        oOOx      .;;.     xOOo        lOOk
  kOOl        oOOx     .xOOx.    xOOo        lOOk
  dOOd        :OOOl    .oOOo.   lOOO:        dOOd
  ;OOO;        lOOOd;.  .,,. .;dOOOl        ;OOO;
   lOOk,        ;xOOOkdl:,:ldkOOOx;        ,kOOl
    oOOx,        .ckOOOOOOOOOOOOkc.        ,xOOo
     lOOk;         .:dkOOOOOOkd:.         ;kOOl
      ;xOOo,          .,clllc,.          ,oOOx;
       .ckOOd:.                        .:dOOkc.
         ,lkOOxl;..                ..,lxOOkl,
           .:oxOOkxdollccccccccllodxkOOkxo:.
              .;ldkkOOOOOOOOOOOOOOOkkdl;.
                  .,codxkkOOOOkkxdoc,.
                       .,:lccc:,.

One index to rule them all.

Search, Hierarchy, Index, Repo Explorer — a monorepo package indexer that builds a dependency graph in SQLite and serves it over Model Context Protocol.

Point it at a monorepo. It discovers every package, maps their dependency relationships, extracts symbols from source code, and gives your AI tools structured access to the result.

Get started in 30 seconds

brew install justinjdev/shire/shire
shire init --global
shire build

That’s it. Claude Code can now search your packages, symbols, files, and dependency graph. See Setup for details.

Installation

Homebrew (macOS, Linux)

brew tap justinjdev/shire
brew install shire

From prebuilt binary

Download the latest release from GitHub Releases and add the binary to your PATH.

Nix

# Install into your profile
nix profile install github:justinjdev/shire

# Or run without installing
nix run github:justinjdev/shire

From source

Requires Rust toolchain.

cargo install --path .

# With RAG vector search support (~30-50MB larger binary due to ONNX Runtime):
cargo install --path . --features rag

Setup

Claude Code

One command configures shire globally for all projects:

shire init --global

This creates:

~/.claude/shire.toml — shared config with db_path = "~/.claude/shire/{repo}/{worktree}/index.db" (auto-namespaced per repo and worktree)
mcpServers.shire in ~/.claude.json — serves the index via shire serve
PostToolUse hook in ~/.claude/settings.json — auto-rebuilds the index after file edits (Edit, Write, NotebookEdit, Bash)
~/.claude/rules/shire.md — rules file guiding Claude Code to prefer Shire tools

The {repo} placeholder is replaced with the repository directory name at runtime, and {worktree} with the worktree name (or _primary for the main checkout), so each repo and worktree gets its own index file automatically.
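As a rough illustration of the placeholder expansion described above, the following Python sketch substitutes {repo} and {worktree} into a db_path template. The function name resolve_db_path is hypothetical and is not part of Shire; the actual resolution happens inside the shire binary and may differ in detail.

```python
import os


def resolve_db_path(template: str, repo: str, worktree: str = "_primary") -> str:
    """Expand {repo} and {worktree} in a db_path template, then a leading ~.

    Hypothetical helper for illustration only; not Shire's implementation.
    """
    path = template.replace("{repo}", repo).replace("{worktree}", worktree)
    return os.path.expanduser(path)


template = "~/.claude/shire/{repo}/{worktree}/index.db"
print(resolve_db_path(template, "my-project"))               # main checkout
print(resolve_db_path(template, "my-project", "feat-auth"))  # linked worktree
```

A template without {worktree} resolves to the same path for every worktree of a repo, which is why the default template includes both placeholders.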

After running shire init --global, open any repo and run:

shire build

The index is ready. Claude Code will automatically use it via the MCP server.

Rules file

shire init creates ~/.claude/rules/shire.md with guidance on when to use Shire tools vs Grep/Glob. This helps Claude Code default to Shire for codebase searches, so you spend fewer tool calls on broad exploration.

The file is only written once — if it already exists, shire init leaves it untouched, so your customizations are preserved.

CLAUDE.md integration

During interactive setup, shire init prompts:

Add Shire search guidance to ~/.claude/CLAUDE.md?

If accepted, it appends a one-liner to ~/.claude/CLAUDE.md directing Claude Code to prefer Shire MCP tools over Grep/Glob for code search. The line is idempotent — running init again won’t duplicate it. If ~/.claude/CLAUDE.md doesn’t exist yet, it creates the file.
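The idempotent append described above can be sketched in a few lines of Python. This is an illustration of the general pattern under assumptions, not Shire's code: the helper name ensure_line is hypothetical, and the real one-liner text is whatever shire init writes.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` unless it is already present.

    Creates the file (and parent directories) if needed.
    Returns True when the file was changed. Hypothetical helper.
    """
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in existing.splitlines():
        return False  # already there: running again changes nothing
    path.parent.mkdir(parents=True, exist_ok=True)
    # Make sure the appended line starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + line + "\n", encoding="utf-8")
    return True
```

Calling ensure_line twice with the same arguments changes the file only the first time.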

Terminal output

shire init uses styled terminal output to show what it does:

✓ (green) — a file or config entry was created or updated
– (dimmed) — a file or config entry already exists, skipped
Section headers appear in cyan

Most file writes (.gitignore, CLAUDE.md, settings.json, .mcp.json, ~/.claude.json) use atomic writes — content is written to a temporary file first, then renamed into place. This prevents partial writes if the process is interrupted.
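For readers unfamiliar with the write-then-rename pattern, here is a minimal Python sketch of it. It illustrates the general technique only; Shire is written in Rust, and the function name atomic_write is an assumption made for this example.

```python
import os
import tempfile


def atomic_write(path: str, content: str) -> None:
    """Write `content` to `path` via a temporary file and a rename.

    The temporary file lives in the same directory as the target so the
    final rename stays on one filesystem. Illustrative sketch only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```

If the process dies before os.replace runs, the original file is untouched and only a temporary file is left over.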

Project-level setup

To create a shire.toml in the current repo instead of globally:

shire init

This generates a local config file with commented-out defaults you can customize, and writes the MCP server config to .mcp.json. If the db_path points to a local directory (e.g., .shire/index.db), it offers to add that directory to .gitignore.

Manual setup

If you prefer manual configuration, add to ~/.claude.json (global) or .mcp.json (project-level):

{
  "mcpServers": {
    "shire": {
      "command": "shire",
      "args": ["serve"]
    }
  }
}

To keep the index fresh during a session, add a PostToolUse hook to ~/.claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|NotebookEdit|Bash",
        "hooks": [{ "type": "command", "command": "shire rebuild --stdin" }]
      }
    ]
  }
}

Claude Desktop

Add Shire to your claude_desktop_config.json:

{
  "mcpServers": {
    "shire": {
      "command": "shire",
      "args": ["serve", "--db", "/path/to/repo/.shire/index.db"]
    }
  }
}

Other MCP clients

Shire speaks standard MCP over stdio. Any client that supports MCP can connect:

shire serve --db /path/to/repo/.shire/index.db

Use --root to enable on-demand reindexing (the server checks .git/index mtime for staleness):

shire serve --root /path/to/repo
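The staleness check mentioned above can be pictured as a simple mtime comparison. The sketch below is an assumption about the general shape of such a check, not Shire's implementation: the function name, the way the last build time is stored, and the behavior when .git/index is missing are all hypothetical.

```python
import os


def index_is_stale(repo_root: str, last_build_time: float) -> bool:
    """Return True if .git/index was modified after the last build.

    `last_build_time` is a Unix timestamp. Hypothetical sketch.
    """
    git_index = os.path.join(repo_root, ".git", "index")
    try:
        return os.path.getmtime(git_index) > last_build_time
    except OSError:
        # Assumption: with no readable .git/index there is no signal,
        # so the index is reported as fresh.
        return False
```

Git rewrites .git/index on operations such as staging, committing, and checking out, so its mtime is a cheap proxy for the working tree having changed.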

CLI reference

Build an index

shire build --root /path/to/repo

Rebuild from scratch

Ignore cached hashes and re-parse everything:

shire build --root /path/to/repo --force

Custom database location

shire build --root /path/to/repo --db /tmp/my-index.db

The index defaults to .shire/index.db inside the repo root. Override with --db or db_path in shire.toml (see Configuration).

Clean up

Remove the index database, WAL/SHM files, the .shire directory, and stop the watch daemon:

shire clean

Incremental builds

Subsequent builds are incremental — only manifests whose content has changed (by SHA-256 hash) are re-parsed. Source files are tracked at per-file granularity: if individual source files change without a manifest change, only those files have their symbols re-extracted. An mtime pre-check skips hash computation entirely for packages whose source files haven’t been touched since the last build.
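The two-level check described above (mtime first, SHA-256 only when needed) can be sketched as follows. This is a simplified illustration: it applies the mtime pre-check per file rather than per package, and the function names and the in-memory cache are assumptions, not Shire's data structures.

```python
import hashlib
import os


def sha256_file(path):
    """Hash a file's contents in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_reextract(paths, cache, last_build_time):
    """Return the files whose content changed since the previous build.

    `cache` maps path -> hex digest from the previous build and is
    updated in place. Hypothetical sketch.
    """
    changed = []
    for path in paths:
        # mtime pre-check: a known file untouched since the last build
        # is skipped without computing its hash.
        if path in cache and os.path.getmtime(path) <= last_build_time:
            continue
        digest = sha256_file(path)
        if cache.get(path) != digest:
            cache[path] = digest
            changed.append(path)
    return changed
```

A file that was touched but whose bytes are identical is hashed but not re-extracted, because the digest still matches the cached one.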

File indexing is also incremental — a file-tree hash detects structural changes, skipping the file indexing phase entirely when no files have been added, removed, or resized.

Symbol extraction and source hashing are parallelized across packages and within packages using rayon for multi-core throughput. Files are read once per build (single-pass hash + extraction). All database writes use batched multi-row INSERTs within explicit transactions, with FTS5 triggers temporarily disabled during bulk operations for maximum SQLite throughput.

Build progress

shire build shows real-time progress for each build phase:

Spinners for quick phases (discovering manifests, workspace context, recomputing internals, indexing files)
Progress bars with ETAs for longer phases (parsing manifests, extracting symbols)
When RAG is enabled, an embedding progress bar tracks file embedding in the background, displaying error messages on failure

Progress bars persist after completion so you can see the full build history in your terminal. Quiet mode (used internally by the MCP server for on-demand rebuilds) hides all progress output.

Configuration

Drop a shire.toml in the repo root to customize behavior:

# Custom database location (default: .shire/index.db)
db_path = "/path/to/custom/index.db"

[discovery]
manifests = ["package.json", "go.mod", "go.work", "Cargo.toml", "pyproject.toml", "pom.xml", "build.gradle", "build.gradle.kts", "settings.gradle", "settings.gradle.kts", "cpanfile", "Gemfile"]
exclude = ["node_modules", "vendor", "dist", ".build", "target", "third_party", ".shire", ".gradle", "build"]

# Symbol extraction
[symbols]
exclude_extensions = [".proto", ".pl"]
references_enabled = false  # EXPERIMENTAL, default false — see below

# Documentation indexing
[docs]
extensions = [".md", ".rst", ".txt", ".adoc"]
max_file_size = 262144  # 256 KB — files larger than this are truncated

# Override package descriptions
[[packages]]
name = "legacy-auth"
description = "Deprecated auth service — do not add new dependencies"

Watch daemon

[watch]
debounce_ms = 2000  # milliseconds to wait after last change before rebuilding

Logging

[log]
level = "warn"          # error, warn, info, debug, trace
dir = ".shire/logs"     # log directory (relative to repo root). Set to "" to disable file logging
max_days = 30           # automatically delete log files older than this

The SHIRE_LOG environment variable overrides the config level (e.g., SHIRE_LOG=debug shire build). Log files are daily-rotated with filenames like shire.log.2026-03-26. Each session includes a unique session ID for correlation across concurrent processes.

All fields are optional. Defaults are shown above. The --db CLI flag takes precedence over db_path in config.

Cross-reference index (experimental)

symbols.references_enabled (default false) populates the symbol_refs table so the symbol_references, symbol_callers, and symbol_callees MCP tools can answer “where is this used?” / “who calls this?” questions. Reference extraction is supported for 8 tier-1 languages: Go, Python, Java, TypeScript, JavaScript, Perl, Ruby, Scala.

Opt-in: shire init asks whether to enable this (prompt labelled experimental), and writes references_enabled = true to shire.toml when you say yes. You can also add it manually:

[symbols]
references_enabled = true

Cost: DB grows substantially — roughly +30% on TS/JS repos to +150% on Go-heavy repos (benchmarks on shire-bench: turborepo +29%, grafana +152%, kubernetes +104% vs main baseline). Build time grows ~5-7%.

Toggling the flag takes effect on the next build. Disabling it wipes symbol_refs at the start of that build. After re-enabling it, run a full rebuild (shire build --force) to repopulate the table.

This feature is marked experimental: its schema and coverage may change in minor versions as language support broadens and edge cases surface.

Custom package discovery

For codebases where packages aren’t defined by standard manifest files — Go single-module monorepos, repos that use ownership.yml + build files, or any non-standard convention — you can define custom discovery rules:

# Discover Go apps: directories containing both main.go and ownership.yml
[[discovery.custom]]
name = "go-apps"
kind = "go"
requires = ["main.go", "ownership.yml"]
paths = ["services/", "cmd/"]
exclude = ["testdata", "examples"]
max_depth = 3
name_prefix = "go:"

# Discover proto packages: directories containing *.proto and buf.yaml
[[discovery.custom]]
name = "proto-packages"
kind = "proto"
requires = ["*.proto", "buf.yaml"]
paths = ["proto/", "services/"]
max_depth = 4

Field	Required	Description
`name`	yes	Rule identifier
`kind`	yes	Package kind for symbol extraction (`go`, `proto`, `npm`, etc.)
`requires`	yes	File patterns that must ALL exist in a directory (supports globs like `*.proto`)
`paths`	no	Limit search to specific subtrees (default: repo root)
`exclude`	no	Rule-specific directory exclusions (on top of global excludes)
`max_depth`	no	Maximum depth to search from each `paths` entry
`name_prefix`	no	Prefix prepended to directory-derived package name (e.g., `go:services/auth`)
`extensions`	no	Override which file extensions get symbol extraction

Custom discovery runs alongside manifest-based discovery. Directories already found by manifest parsers are skipped. Subdirectories of matched directories are also skipped to prevent nested matches.
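To make the matching rules concrete, here is a small Python sketch of a directory walk that applies requires, exclude, and max_depth, and that skips subdirectories of a match. It is an approximation under assumptions: the function name discover is hypothetical, depth is counted with the search root at 0, and the interaction with manifest-based discovery is left out.

```python
import fnmatch
import os


def discover(root, requires, exclude=(), max_depth=None):
    """Return directories (relative to `root`) that contain a file
    matching every pattern in `requires`. Hypothetical sketch."""
    matches = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        # Prune excluded directories before descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude)
        if all(any(fnmatch.fnmatch(f, pat) for f in filenames) for pat in requires):
            matches.append(rel)
            dirnames[:] = []  # skip subdirectories of a matched directory
        elif max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # depth limit reached: stop descending
    return matches
```

For the go-apps rule above, a call such as discover("services", ["main.go", "ownership.yml"], exclude=("testdata", "examples"), max_depth=3) would return each matching service directory once, ignoring any nested match inside it.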

RAG vector search

RAG adds semantic vector search to search_symbols. It requires compiling with the rag feature flag and enabling it in config.

Build with RAG support:

cargo install --path . --features rag

Enable in shire.toml:

[rag]
enabled = true
# model = "BAAI/bge-small-en-v1.5"   # default, only supported model currently
# cache_dir = "~/.cache/shire-rag"    # optional, for model file storage

When enabled, shire build embeds all symbols after extraction. The first build downloads the model (~33MB) automatically. Subsequent builds are incremental — only changed packages get re-embedded.

RAG is non-fatal: if the model fails to load or embeddings fail, shire falls back to FTS-only search with a warning. If the rag feature is not compiled in, the [rag] config section is silently ignored.

MCP Tools & Prompts

Tools

Shire exposes the following tools over the Model Context Protocol:

Tool	Description
`search_packages`	Search packages by name or description. Use instead of Grep for finding packages.
`list_packages`	List all indexed packages, optionally filtered by kind
`package_dependencies`	List a package’s dependencies. Set `depth>1` for transitive graph (returns edge list with different schema).
`package_dependents`	Find all packages that depend on this package
`search_symbols`	Find functions, classes, types, methods by name or signature. Use instead of Grep for “where is function X?” or “what matches pattern Y?”. Omit query with a package filter to list all symbols in that package. Supports hybrid FTS + vector search when RAG is enabled.
`get_file_symbols`	List all symbols defined in a specific file. Use instead of reading the file to understand its exports.
`search_files`	Find files by path or name. Use instead of Glob/find for locating files. Useful for “middleware”, “proto files”, or files in a specific directory.
`search_docs`	Search documentation files by content, title, or path — returns matching docs with text snippets
`list_package_files`	List all files in a package, optionally filtered by extension. Use instead of Glob for listing package contents.
`explore`	Explore a concept across the codebase — searches packages, symbols, files, and documentation semantically. Use as the first tool when investigating unfamiliar code or broad topics like “authentication” or “error handling”. Returns a structured context map organized by package.
`index_status`	Index build metadata: timestamp, git commit, counts
`symbol_references`	Find all references to a symbol by name. Returns `[{name, kind, file_path, line, package, enclosing_symbol}]`. Accepts optional `kind` and `package` filters. Requires `symbols.references_enabled = true` (experimental, opt-in). Note: matching is name-based — same-name symbols across different packages are merged.
`symbol_callers`	List all callers of a symbol (call-site references). Returns `[{caller_name, caller_file, caller_line, call_sites}]`. Accepts optional `package` filter. Requires `symbols.references_enabled = true`. Same name-based-match caveat as `symbol_references`.
`symbol_callees`	List what a function calls (outbound call graph). Returns `[{callee_name, call_sites}]`. Accepts optional `package` filter. Requires `symbols.references_enabled = true`. Same name-based-match caveat as `symbol_references`.
`change_impact`	Analyze the blast radius of changing a symbol. Combines cross-references with the dependency graph to return `{direct_impact, cross_package_impact, transitive_impact, summary}`. Use before renaming, changing a signature, or deleting a symbol. Accepts optional `package` (home package hint, for disambiguation), `transitive_depth` (default 2), and `limit`. Requires `symbols.references_enabled = true`. Same name-based-match caveat as `symbol_references`.
`schema_consumers`	Find all files generated from a schema file (e.g. `.proto`). Returns generated file paths and their packages. Use to understand the blast radius of a schema change.
`generated_from`	Find the source schema file that generated a given file. Use to trace a generated file (e.g. `user.pb.go`) back to its source proto.

When to use Shire vs Grep/Glob

Task	Use	Not
Find a function, class, or type by name	`search_symbols`	Grep
Find a file by name or path	`search_files`	Glob / find
List files in a package	`list_package_files`	Glob
Find a package	`search_packages`	Grep
Explore an unfamiliar area	`explore`	multiple Grep calls
Search for a literal string or log message	Grep	Shire
Search inside function bodies	Grep	Shire
Pattern match on file contents	Grep	Shire

Prompts

Prompts are pre-built templates that compose multiple queries into structured context. They give your AI a map of where concepts live in the codebase.

Prompt	Args	Description
`explore`	`query`	Search packages, symbols, files, and documentation for a concept — returns a structured context map organized by package
`reference_audit`	`name`	Guides refactor-safety analysis for a symbol: classifies refs by kind, traces the call graph via `symbol_callers`, identifies cross-package impact, and assesses rename/change risk. Requires `symbols.references_enabled = true` (experimental).

Watch Daemon

shire watch starts a background daemon that auto-rebuilds the index when files change. It uses Unix domain socket IPC with configurable debounce (default 2s).
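Debouncing means a burst of file changes results in one rebuild, fired once the changes have been quiet for the debounce interval. The following Python sketch models that behavior as a pure function over timestamps. It is a conceptual model only; the function name is hypothetical and the daemon's real event loop is not shown.

```python
def rebuild_times(event_times_ms, debounce_ms=2000):
    """Given sorted change timestamps (ms), return the times at which a
    trailing-edge debounce would trigger a rebuild. Hypothetical sketch."""
    fires = []
    for i, t in enumerate(event_times_ms):
        is_last = i + 1 == len(event_times_ms)
        # A rebuild fires only if no further change arrives within the window.
        if is_last or event_times_ms[i + 1] - t >= debounce_ms:
            fires.append(t + debounce_ms)
    return fires


# Three rapid edits followed by a later one: two rebuilds, not four.
print(rebuild_times([0, 500, 1000, 5000]))
```

A larger debounce_ms merges more changes into one rebuild at the cost of a longer wait before the index reflects them.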

Start the daemon

Idempotent — safe to call multiple times:

shire watch --root /path/to/repo

Signal a rebuild manually

shire rebuild --root /path/to/repo

Signal a rebuild from a Claude Code hook

Reads JSON from stdin, uses cwd as repo root:

shire rebuild --stdin

Stop the daemon

shire watch --root /path/to/repo --stop

Smart filtering

The watch daemon avoids unnecessary rebuilds:

Edit/Write tools: checks file extension relevance and repo boundary
Bash commands: filtered against a denylist of known read-only commands (ls, git status, cargo test, etc.); unknown commands trigger a rebuild
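The Bash filtering rule can be sketched as a prefix match against a list of read-only commands, defaulting to a rebuild for anything unrecognized. This Python example is an assumption-laden illustration: the list contains only the three commands named above, the function name is hypothetical, and treating any command with shell operators as unknown is a choice made for this sketch.

```python
import shlex

# Only the commands named in the text above; the real denylist is longer.
READ_ONLY_PREFIXES = [("ls",), ("git", "status"), ("cargo", "test")]


def should_rebuild(command: str) -> bool:
    """Return False for known read-only commands, True otherwise.

    Hypothetical sketch, not Shire's actual filter.
    """
    # Assumption: chained or redirected commands are treated as unknown.
    if any(op in command for op in (";", "&", "|", ">", "`", "$(")):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input counts as unknown
    if not tokens:
        return False  # nothing ran
    for prefix in READ_ONLY_PREFIXES:
        if tuple(tokens[: len(prefix)]) == prefix:
            return False
    return True  # unknown commands default to rebuild
```

Defaulting to a rebuild keeps the index correct when a new tool is used: a missed read-only command costs one unnecessary rebuild, whereas a missed write would leave the index stale.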

Git Worktrees

Shire automatically detects git worktrees and maintains separate indexes for each one. This works out of the box with no configuration required.

How it works

When you run shire build or shire serve inside a linked worktree, shire:

Detects the worktree by inspecting .git — a directory means primary working tree, a file means linked worktree
Resolves a per-worktree DB path using the {worktree} placeholder in db_path
Seeds from the primary worktree’s DB on first build, so you don’t start from scratch — only changed packages need reindexing

The primary worktree uses the reserved name _primary. Linked worktrees use Git’s stable worktree ID (the directory name under .git/worktrees/<id>).
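The detection step can be illustrated with a short Python sketch. In a linked worktree, Git writes a .git file containing a line of the form "gitdir: <main repo>/.git/worktrees/<id>", so the worktree ID is the last path component. The function name worktree_name is hypothetical, and this sketch does not handle submodules, which also use a .git file.

```python
from pathlib import Path


def worktree_name(root: Path) -> str:
    """Return "_primary" for the main working tree, or the worktree ID
    for a linked worktree. Hypothetical sketch."""
    dot_git = root / ".git"
    if dot_git.is_dir():
        return "_primary"
    if dot_git.is_file():
        # Linked worktree: the file holds "gitdir: <main>/.git/worktrees/<id>".
        text = dot_git.read_text(encoding="utf-8").strip()
        if text.startswith("gitdir:"):
            return Path(text[len("gitdir:"):].strip()).name
    raise ValueError(f"{root} is not a git working tree")
```

The returned value is what would be substituted for {worktree} in db_path.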

Configuration

The default global config (generated by shire init --global) already includes worktree support:

db_path = "~/.claude/shire/{repo}/{worktree}/index.db"

This produces separate databases like:

~/.claude/shire/my-project/_primary/index.db
~/.claude/shire/my-project/feat-auth/index.db
~/.claude/shire/my-project/bugfix-123/index.db

Placeholders

Placeholder	Description
`{repo}`	Name of the main repository (directory basename)
`{worktree}`	Worktree identifier — `_primary` for the main working tree, or Git’s worktree ID for linked worktrees

Shared vs separate databases

If your db_path does not include {worktree}, all worktrees share the same database. This is fine for read-only use but means concurrent builds from different worktrees will conflict.

Including {worktree} in the path gives each worktree its own database, which is the recommended setup.

DB seeding

When shire builds in a linked worktree for the first time and no database exists yet, it checks whether the primary worktree has an existing database. If so, it copies that database as a seed — giving you a fully populated index immediately. Only packages that differ between the worktrees need reindexing.

$ cd ~/worktrees/feat-auth
$ shire build
Seeded DB from /Users/you/.claude/shire/my-project/_primary/index.db
Building index...
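The seeding decision reduces to "copy the primary database if this worktree has none yet". A minimal Python sketch of that rule follows. The function name is hypothetical, and a plain file copy is an assumption made for brevity: it ignores SQLite WAL/SHM side files and any concurrent writer, which a real implementation would need to consider.

```python
import shutil
from pathlib import Path


def seed_if_missing(worktree_db: Path, primary_db: Path) -> bool:
    """Copy the primary worktree's database into place when the linked
    worktree has no database yet. Returns True if a seed was copied.

    Hypothetical sketch, not Shire's implementation.
    """
    if worktree_db.exists() or not primary_db.exists():
        return False  # already built once, or nothing to seed from
    worktree_db.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(primary_db, worktree_db)
    return True
```

After seeding, the regular incremental build runs against the copied database, so only packages that differ from the primary checkout are reindexed.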

Local config

A local shire.toml at the repo root can use a relative db_path (e.g., .shire/index.db). Since each worktree has its own root directory, relative paths naturally resolve to separate databases per worktree. Seeding still applies in this case.

Supported Ecosystems

Manifest	Kind	Workspace support
`package.json`	npm	`workspace:` protocol versions normalized
`go.mod`	go	`go.work` member metadata
`go.work`	go	`use` directives parsed for workspace context
`Cargo.toml`	cargo	`workspace = true` deps resolved from root
`pyproject.toml`	python	—
`pom.xml`	maven	Parent POM inheritance (groupId, version)
`build.gradle` / `build.gradle.kts`	gradle	`settings.gradle` project inclusion
`cpanfile`	perl	`requires` / `on 'test'` blocks
`Gemfile`	ruby	`gem` / `group :test` blocks
`flake.nix`	nix	`inputs` attrset (dotted and block forms)

Symbol extraction

Shire extracts public symbols (functions, classes, types, methods, interfaces) from source files using tree-sitter, with full signatures, parameters, and return types.

Language	Extractor
TypeScript / JavaScript	tree-sitter
Go	tree-sitter
Rust	tree-sitter
Python	tree-sitter
Java	tree-sitter
Kotlin	tree-sitter
Dart	tree-sitter
Protobuf	tree-sitter
C	tree-sitter
C++	tree-sitter
C#	tree-sitter
Swift	tree-sitter
PHP	tree-sitter
Scala	tree-sitter
Zig	tree-sitter
Bash / Shell	tree-sitter
R	tree-sitter
Haskell	tree-sitter
YAML	tree-sitter
SQL	tree-sitter
HCL / Terraform	tree-sitter
TOML	tree-sitter
Perl	tree-sitter
Ruby	tree-sitter
OCaml	tree-sitter
Lua	tree-sitter
Elixir	tree-sitter
Clojure	tree-sitter
Erlang	tree-sitter
Julia	tree-sitter
Gleam	tree-sitter
Odin	tree-sitter
Nix	tree-sitter
Nim	tree-sitter
COBOL	regex-based

Reference extraction

Shire extracts cross-references (calls, type references, imports, and interface implementations) for a subset of languages. These are stored in the symbol_refs table and exposed via the symbol_references, symbol_callers, and symbol_callees MCP tools.

Language	Call	Type	Import	Impl
Go	yes	yes	yes	— (implicit interfaces)
Python	yes	yes	yes	yes
Java	yes	yes	yes	yes
TypeScript	yes	yes	yes	yes
JavaScript	yes	—	yes	yes
Perl	yes	—	yes	—
Ruby	yes	yes	yes	yes
Scala	yes	yes	yes	yes

All other languages: symbol definitions only; references are not extracted.

Performance

Shire is designed to index large monorepos in seconds and answer queries in a few milliseconds or less. This page documents benchmark methodology, results, and how to reproduce them.

Test repos

Benchmarks run against three real-world open-source monorepos covering a range of sizes and ecosystems:

Repo	Size	Packages	Symbols	Files	Primary languages
turborepo	small	400	10,686	5,451	Rust, TypeScript, Go
grafana	medium	28	35,104	14,054	Go, TypeScript
kubernetes	large	34	78,458	18,275	Go

Build performance

Full rebuild (no incremental cache), median of 4 iterations after a warmup run:

Repo	Median	Min	P95	Std dev
turborepo	571ms	525ms	678ms	58ms
grafana	1,150ms	1,025ms	1,172ms	58ms
kubernetes	1,703ms	1,607ms	1,897ms	108ms

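Build time grows with symbol count, but sub-linearly on these repos: kubernetes has about 7x the symbols of turborepo and takes about 3x as long to build. The pipeline is parallelized with rayon across packages and files, and writes use batched multi-row SQLite inserts within explicit transactions.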

Incremental builds are significantly faster – only packages with changed source files are re-extracted, and an mtime pre-check skips SHA-256 computation entirely for untouched packages.

Query performance

Median latency over 100 iterations per query:

Query	Small	Medium	Large
`search_symbols("parse")`	0.09ms	0.09ms	0.04ms
`search_symbols("Config")`	0.20ms	0.29ms	1.01ms
`search_files("mod")`	0.05ms	0.03ms	0.04ms
`search_files("test")`	0.07ms	0.60ms	1.99ms
`list_packages(None)`	0.11ms	0.01ms	0.01ms

All queries use SQLite FTS5 full-text search with unicode61 tokenizer and prefix indexes. Query latency depends primarily on result set size, not total index size.

Reproducing benchmarks

Shire includes an autoresearch binary for reproducible benchmarking.

Setup

Run the benchmark repo setup script to clone and prepare the test repos:

scripts/setup-bench-repo.sh

This clones the three repos into ~/.cache/shire-bench/ and creates a shire.toml in each.

Running benchmarks

# Build the benchmark binary
cargo build --release --bin autoresearch

# Run build benchmarks (all repos)
cargo run --release --bin autoresearch -- --phase build

# Run query benchmarks (all repos)
cargo run --release --bin autoresearch -- --phase query

# Filter by repo size
cargo run --release --bin autoresearch -- --phase build --size small

# Point at a specific repo
cargo run --release --bin autoresearch -- --phase build --repo /path/to/repo

Build benchmarks run 5 iterations (1 warmup + 4 measured) per repo. Query benchmarks run 100 iterations per query. Results are printed as JSON to stdout.
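To show how the columns in the build table relate to raw timings, the Python sketch below drops a warmup run and summarizes the rest. The sample timings are invented for illustration, and the statistical definitions are assumptions: it uses the sample standard deviation and a nearest-rank P95, which may not match what the autoresearch binary computes.

```python
import math
import statistics


def summarize(samples_ms):
    """Summarize measured timings. Nearest-rank P95 and sample standard
    deviation are assumptions made for this sketch."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile
    return {
        "median": statistics.median(ordered),
        "min": ordered[0],
        "p95": ordered[rank - 1],
        "stddev": statistics.stdev(ordered),
    }


# Hypothetical timings in ms: the first run is the warmup and is discarded.
runs = [700.0, 540.0, 560.0, 580.0, 600.0]
print(summarize(runs[1:]))
```

With only four measured runs, the nearest-rank P95 is simply the slowest run, so it is best read as a worst-case figure rather than a true tail percentile.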

Environment notes

Results vary by machine (CPU, disk speed, available memory)
Close other applications for more stable measurements
The warmup iteration primes filesystem caches and SQLite page cache
Numbers in this document were captured on an Apple M-series Mac

Architecture

src/
├── main.rs          # CLI (clap): build, serve, watch, rebuild, init, clean subcommands
├── lib.rs           # Library re-exports for embedding shire as a crate
├── config.rs        # shire.toml parsing
├── git.rs           # Git worktree detection and repo root resolution
├── init.rs          # `shire init` setup (config, MCP server, hooks, rules)
├── db/
│   ├── mod.rs       # SQLite schema, open/create
│   └── queries.rs   # FTS search, dependency graph BFS, listing
├── index/
│   ├── mod.rs       # Walk + incremental index orchestrator
│   ├── custom_discovery.rs # Config-driven custom package discovery
│   ├── manifest.rs  # ManifestParser trait
│   ├── hash.rs      # SHA-256 content hashing for incremental builds
│   ├── npm.rs       # package.json parser (workspace: protocol)
│   ├── go.rs        # go.mod parser
│   ├── go_work.rs   # go.work parser (workspace use directives)
│   ├── cargo.rs     # Cargo.toml parser (workspace dep resolution)
│   ├── python.rs    # pyproject.toml parser
│   ├── maven.rs     # pom.xml parser (parent POM inheritance)
│   ├── gradle.rs    # build.gradle / build.gradle.kts parser
│   ├── gradle_settings.rs # settings.gradle parser (project inclusion)
│   ├── perl.rs      # cpanfile parser (requires, on 'test')
│   └── ruby.rs      # Gemfile parser (gem, group blocks)
├── symbols/
│   ├── mod.rs       # Symbol types, kind-agnostic extraction orchestrator
│   ├── walker.rs    # Source file discovery (extension filtering, excludes)
│   ├── registry.rs  # Language registry: maps extensions to tree-sitter grammars + hooks
│   ├── query_extract.rs # Generic tree-sitter query executor with hook callbacks
│   ├── queries/     # Tree-sitter .scm query files (one per language)
│   ├── hooks/       # Language-specific hooks (visibility, signatures, params, post-processing)
│   ├── elixir.rs    # Elixir extractor (regex-based)
│   └── cobol.rs     # COBOL extractor (regex-based)
│                    # Cross-reference extraction (call, type, import, impl) is supported
│                    # for 8 tier-1 languages: Go, Python, Java, TypeScript, JavaScript,
│                    # Perl, Ruby, Scala. References are captured via @reference.* captures
│                    # in the language's .scm query and written to the symbol_refs table.
│                    # Coverage is asymmetric per language: JavaScript omits Type refs
│                    # (no type system), and Go/Perl omit Impl refs (no extends/implements).
├── rag/             # Optional RAG vector search (behind `rag` feature flag)
│   ├── mod.rs       # Feature-gated module root
│   ├── embedder.rs  # fastembed wrapper, file-level text formatting, batch embedding (64 files/batch)
│   └── storage.rs   # sqlite-vec extension, vec0 table, vector CRUD, KNN search
├── mcp/
│   ├── mod.rs       # MCP server setup (rmcp, stdio transport)
│   ├── tools.rs     # Tool handlers (+ hybrid search when RAG enabled)
│   └── prompts.rs   # Prompt templates (explore, reference_audit)
└── watch/
    ├── mod.rs       # Daemon event loop (UDS listener, debounce, rebuild)
    ├── daemon.rs    # Process management (start/stop/is_running via PID)
    └── protocol.rs  # Hook input parsing, Bash read-only denylist

symbol_refs table

The symbol_refs table stores cross-reference records extracted alongside symbol definitions. Each row captures a reference to a named symbol:

Column	Type	Description
`name`	TEXT	The name being referenced (function, type, module, etc.)
`kind`	TEXT	One of: `call`, `type`, `import`, `impl`
`file_path`	TEXT	Source file containing the reference
`line`	INTEGER	Line number of the reference
`package`	TEXT	Package the referencing file belongs to (nullable)
`enclosing_symbol`	TEXT	Nearest enclosing function or method (nullable)

B-tree indexes on name, file_path, and enclosing_symbol support the exact-match lookups used by the symbol_references, symbol_callers, and symbol_callees MCP tools. No FTS5 table — reference queries are exact-name only.

Incremental behavior mirrors symbol extraction: references for a file are dropped and re-extracted whenever the file’s SHA-256 hash changes. No separate pass is needed — references are extracted in the same tree-sitter walk as symbol definitions.
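The table, its indexes, and the per-file replace step can be tried out with Python's built-in sqlite3 module. The column names and kinds below come from the table above; everything else (NOT NULL constraints, index names, the sample rows, and the callers query) is an assumption made for this sketch and may not match Shire's actual schema or SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbol_refs (
    name             TEXT NOT NULL,
    kind             TEXT NOT NULL CHECK (kind IN ('call', 'type', 'import', 'impl')),
    file_path        TEXT NOT NULL,
    line             INTEGER NOT NULL,
    package          TEXT,
    enclosing_symbol TEXT
);
-- Hypothetical index names; the text only says which columns are indexed.
CREATE INDEX idx_refs_name      ON symbol_refs(name);
CREATE INDEX idx_refs_file      ON symbol_refs(file_path);
CREATE INDEX idx_refs_enclosing ON symbol_refs(enclosing_symbol);
""")

# Invented sample rows for illustration.
conn.executemany("INSERT INTO symbol_refs VALUES (?, ?, ?, ?, ?, ?)", [
    ("ParseConfig", "call", "cmd/main.go", 12, "go:cmd", "main"),
    ("ParseConfig", "call", "cmd/main.go", 30, "go:cmd", "main"),
    ("ParseConfig", "call", "pkg/server/run.go", 8, "go:server", "Run"),
    ("Config", "type", "pkg/server/run.go", 5, "go:server", None),
])


def callers(conn, name):
    """Exact-name lookup of call sites, grouped by enclosing function."""
    return conn.execute(
        "SELECT enclosing_symbol, file_path, COUNT(*) FROM symbol_refs "
        "WHERE name = ? AND kind = 'call' "
        "GROUP BY enclosing_symbol, file_path "
        "ORDER BY enclosing_symbol, file_path",
        (name,),
    ).fetchall()


def drop_file_refs(conn, file_path):
    """Drop a changed file's references before re-extracting them."""
    conn.execute("DELETE FROM symbol_refs WHERE file_path = ?", (file_path,))


print(callers(conn, "ParseConfig"))
```

Because lookups filter on name with equality, the index on name serves them directly, which fits the statement that reference queries are exact-name only.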

File embeddings

When RAG is enabled, the build produces file-level vector embeddings for hybrid search. Each file is represented as a FileForEmbedding containing its symbols via FileSymbol (name, kind, and optional signature).

The text representation (file_to_text) works as follows:

Symbols are sorted by kind then name
Each symbol is rendered as its signature when one is available, falling back to its kind and name otherwise
A total character budget of 1800 caps the output text — after the file path prefix is accounted for, remaining budget is filled with symbols until exhausted
Files with no symbols produce a minimal file <path> in <package> string
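The rules above can be sketched in Python as follows. Only the sort order, the signature preference, the 1800-character budget, and the no-symbols string come from the text; the exact layout of the output when symbols are present (the colon and space separators) and the tuple representation of a symbol are assumptions made for this example.

```python
MAX_CHARS = 1800  # total character budget from the text above


def file_to_text(path, package, symbols, budget=MAX_CHARS):
    """Build the text that gets embedded for one file.

    `symbols` is a list of (name, kind, signature_or_None) tuples.
    Hypothetical sketch; the real output format may differ.
    """
    if not symbols:
        return f"file {path} in {package}"
    text = f"file {path} in {package}:"
    # Sort by kind, then name.
    for name, kind, signature in sorted(symbols, key=lambda s: (s[1], s[0])):
        # Prefer the signature; fall back to kind and name.
        entry = " " + (signature if signature else f"{kind} {name}")
        if len(text) + len(entry) > budget:
            break  # budget exhausted: remaining symbols are dropped
        text += entry
    return text


symbols = [("b", "function", "func b()"), ("A", "class", None)]
print(file_to_text("a.go", "pkg", symbols))
```

Capping the text keeps very large files from dominating the embedding input and keeps each input at a bounded size.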

Embedding runs in a background thread spawned during the build, executing concurrently with post-build housekeeping:

Files are processed in batches of 64 to balance throughput and memory
A progress callback reports batch completion for progress bar updates
Errors (model init, embedding, DB write) are reported on the progress bar rather than failing the build
