Shire
.,:lccc:,.
.,codxkkOOOOkkxdoc,.
.;ldkkOOOOOOOOOOOOOOOkkdl;.
.:oxOOkxdollccccccccllodxkOOkxo:.
,lkOOxl;.. ..,lxOOkl,
.ckOOd:. .:dOOkc.
;xOOo, .,clllc,. ,oOOx;
lOOk; .:dkOOOOOOkd:. ;kOOl
oOOx, .ckOOOOOOOOOOOOkc. ,xOOo
lOOk, ;xOOOkdl:;;:ldkOOOx; ,kOOl
;OOO; lOOOd;. .;dOOOl ;OOO;
dOOd :OOOl lOOO: dOOd
kOOl oOOx .;;. xOOo lOOk
kOOl oOOx .xOOx. xOOo lOOk
dOOd :OOOl .oOOo. lOOO: dOOd
;OOO; lOOOd;. .,,. .;dOOOl ;OOO;
lOOk, ;xOOOkdl:,:ldkOOOx; ,kOOl
oOOx, .ckOOOOOOOOOOOOkc. ,xOOo
lOOk; .:dkOOOOOOkd:. ;kOOl
;xOOo, .,clllc,. ,oOOx;
.ckOOd:. .:dOOkc.
,lkOOxl;.. ..,lxOOkl,
.:oxOOkxdollccccccccllodxkOOkxo:.
.;ldkkOOOOOOOOOOOOOOOkkdl;.
.,codxkkOOOOkkxdoc,.
.,:lccc:,.
One index to rule them all.
Search, Hierarchy, Index, Repo Explorer — a monorepo package indexer that builds a dependency graph in SQLite and serves it over Model Context Protocol.
Point it at a monorepo. It discovers every package, maps their dependency relationships, extracts symbols from source code, and gives your AI tools structured access to the result.
Get started in 30 seconds
brew install justinjdev/shire/shire
shire init --global
shire build
That’s it. Claude Code can now search your packages, symbols, files, and dependency graph. See Setup for details.
Installation
Homebrew (macOS, Linux)
brew tap justinjdev/shire
brew install shire
From prebuilt binary
Download the latest release from GitHub Releases and add the binary to your PATH.
Nix
# Install into your profile
nix profile install github:justinjdev/shire
# Or run without installing
nix run github:justinjdev/shire
From source
Requires Rust toolchain.
cargo install --path .
# With RAG vector search support (~30-50MB larger binary due to ONNX Runtime):
cargo install --path . --features rag
Setup
Claude Code
One command configures shire globally for all projects:
shire init --global
This creates:
- ~/.claude/shire.toml: shared config with db_path = "~/.claude/shire/{repo}/{worktree}/index.db" (auto-namespaced per repo and worktree)
- mcpServers.shire in ~/.claude.json: serves the index via shire serve
- PostToolUse hook in ~/.claude/settings.json: auto-rebuilds the index after file edits (Edit, Write, NotebookEdit, Bash)
- ~/.claude/rules/shire.md: rules file guiding Claude Code to prefer Shire tools
The {repo} placeholder is replaced with the repository directory name at runtime, and {worktree} with the worktree name (or _primary for the main checkout), so each repo and worktree gets its own index file automatically.
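As a rough illustration of the placeholder substitution described above, here is a minimal Python sketch. The function name resolve_db_path and its arguments are hypothetical and are not part of Shire; only the {repo} and {worktree} placeholders and the _primary name come from this document.

```python
from pathlib import Path
from typing import Optional


def resolve_db_path(template: str, repo_root: Path, worktree: Optional[str] = None) -> Path:
    """Expand {repo} and {worktree} in a db_path template (illustrative only)."""
    expanded = (
        template
        .replace("{repo}", repo_root.name)              # repository directory name
        .replace("{worktree}", worktree or "_primary")  # main checkout uses _primary
    )
    # "~" is expanded to the current user's home directory.
    return Path(expanded).expanduser()


print(resolve_db_path("~/.claude/shire/{repo}/{worktree}/index.db", Path("/src/my-project"), "feat-auth"))
```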
After running shire init --global, open any repo and run:
shire build
The index is ready. Claude Code will automatically use it via the MCP server.
Rules file
shire init creates ~/.claude/rules/shire.md with guidance on when to use Shire tools vs Grep/Glob. This helps Claude Code default to Shire for codebase searches, so you spend fewer tool calls on broad exploration.
The file is only written once — if it already exists, shire init leaves it untouched, so your customizations are preserved.
CLAUDE.md integration
During interactive setup, shire init prompts:
Add Shire search guidance to ~/.claude/CLAUDE.md?
If accepted, it appends a one-liner to ~/.claude/CLAUDE.md directing Claude Code to prefer Shire MCP tools over Grep/Glob for code search. The line is idempotent — running init again won’t duplicate it. If ~/.claude/CLAUDE.md doesn’t exist yet, it creates the file.
Terminal output
shire init uses styled terminal output to show what it does:
- ✓ (green) — a file or config entry was created or updated
- – (dimmed) — a file or config entry already exists, skipped
- Section headers appear in cyan
Most file writes (.gitignore, CLAUDE.md, settings.json, .mcp.json, ~/.claude.json) use atomic writes — content is written to a temporary file first, then renamed into place. This prevents partial writes if the process is interrupted.
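The write-then-rename pattern can be sketched in a few lines of Python. This is a generic illustration of the technique, not Shire's actual code; atomic_write is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, content: str) -> None:
    """Write content to a temp file in the target directory, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live on the same filesystem so the rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push data to disk before the rename
        os.replace(tmp_name, path)  # atomically replaces any existing file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader of the target path sees either the old content or the new content, never a half-written file.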
Project-level setup
To create a shire.toml in the current repo instead of globally:
shire init
This generates a local config file with commented-out defaults you can customize, and writes the MCP server config to .mcp.json. If the db_path points to a local directory (e.g., .shire/index.db), it offers to add that directory to .gitignore.
Manual setup
If you prefer manual configuration, add to ~/.claude.json (global) or .mcp.json (project-level):
{
"mcpServers": {
"shire": {
"command": "shire",
"args": ["serve"]
}
}
}
To keep the index fresh during a session, add a PostToolUse hook to ~/.claude/settings.json:
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write|NotebookEdit|Bash",
"hooks": [{ "type": "command", "command": "shire rebuild --stdin" }]
}
]
}
}
Claude Desktop
Add Shire to your claude_desktop_config.json:
{
"mcpServers": {
"shire": {
"command": "shire",
"args": ["serve", "--db", "/path/to/repo/.shire/index.db"]
}
}
}
Other MCP clients
Shire speaks standard MCP over stdio. Any client that supports MCP can connect:
shire serve --db /path/to/repo/.shire/index.db
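For clients that talk to the server directly, a tool call is a JSON-RPC 2.0 message with method tools/call, sent as one line of JSON on the server's stdin. The Python sketch below only builds such a message; the helper name is hypothetical, the query argument is taken from the search_symbols description in this document, and a real session must complete the MCP initialize handshake before sending it.

```python
import json


def tools_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Build a newline-terminated MCP tools/call request (illustrative only)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines.
    return json.dumps(message) + "\n"


print(tools_call_message(1, "search_symbols", {"query": "parse"}), end="")
```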
Use --root to enable on-demand reindexing (the server checks .git/index mtime for staleness):
shire serve --root /path/to/repo
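One way to picture the staleness check is a comparison between the modification time of .git/index and the time of the last build. The Python sketch below is an assumption about the general idea only: it uses the database file's own mtime as a stand-in for the last build time, which may differ from how Shire records it, and index_is_stale is a hypothetical name.

```python
from pathlib import Path


def index_is_stale(repo_root: Path, db_path: Path) -> bool:
    """Return True when .git/index changed after the index DB was last written."""
    git_index = repo_root / ".git" / "index"
    if not db_path.exists():
        return True   # no index yet, so a build is needed
    if not git_index.exists():
        return False  # nothing to compare against
    return git_index.stat().st_mtime > db_path.stat().st_mtime
```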
CLI reference
Build an index
shire build --root /path/to/repo
Rebuild from scratch
Ignore cached hashes and re-parse everything:
shire build --root /path/to/repo --force
Custom database location
shire build --root /path/to/repo --db /tmp/my-index.db
The index defaults to .shire/index.db inside the repo root. Override with --db or db_path in shire.toml (see Configuration).
Clean up
Remove the index database, WAL/SHM files, the .shire directory, and stop the watch daemon:
shire clean
Incremental builds
Subsequent builds are incremental — only manifests whose content has changed (by SHA-256 hash) are re-parsed. Source files are tracked at per-file granularity: if individual source files change without a manifest change, only those files have their symbols re-extracted. An mtime pre-check skips hash computation entirely for packages whose source files haven’t been touched since the last build.
File indexing is also incremental — a file-tree hash detects structural changes, skipping the file indexing phase entirely when no files have been added, removed, or resized.
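The mtime pre-check plus SHA-256 comparison can be sketched as follows. This is a simplified Python illustration that treats manifests and source files alike; the cache layout (path mapped to mtime and digest) and the function names are assumptions, not Shire's storage format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, cache):
    """Return paths whose content changed; cache maps str(path) -> (mtime_ns, sha256)."""
    changed = []
    for path in paths:
        key = str(path)
        mtime = path.stat().st_mtime_ns
        cached = cache.get(key)
        if cached is not None and cached[0] == mtime:
            continue  # mtime pre-check: untouched, skip hashing entirely
        digest = sha256_of(path)
        if cached is None or cached[1] != digest:
            changed.append(path)  # new file or different content: re-parse
        cache[key] = (mtime, digest)
    return changed
```

A file that was touched but not modified is hashed once and then skipped, since its digest still matches the cached one.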
Symbol extraction and source hashing are parallelized across packages and within packages using rayon for multi-core throughput. Files are read once per build (single-pass hash + extraction). All database writes use batched multi-row INSERTs within explicit transactions, with FTS5 triggers temporarily disabled during bulk operations for maximum SQLite throughput.
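To illustrate what batched multi-row INSERTs inside an explicit transaction look like, here is a small Python sketch using the standard sqlite3 module. The symbols table, its columns, and the batch size are placeholders chosen for the example; Shire's real schema and its FTS5 trigger handling are not shown.

```python
import sqlite3


def insert_symbols(conn: sqlite3.Connection, rows, batch_size: int = 200) -> None:
    """Insert (name, kind, file_path) rows using multi-row INSERTs in one transaction."""
    conn.execute("BEGIN")
    try:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            # One statement per batch: VALUES (?, ?, ?), (?, ?, ?), ...
            placeholders = ", ".join(["(?, ?, ?)"] * len(batch))
            flat = [value for row in batch for value in row]
            conn.execute(
                f"INSERT INTO symbols (name, kind, file_path) VALUES {placeholders}",
                flat,
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE symbols (name TEXT, kind TEXT, file_path TEXT)")
insert_symbols(conn, [(f"fn_{i}", "function", "src/lib.rs") for i in range(450)])
```

Batches of 200 rows with three columns stay below the bound-parameter limit of older SQLite builds (999).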
Build progress
shire build shows real-time progress for each build phase:
- Spinners for quick phases (discovering manifests, workspace context, recomputing internals, indexing files)
- Progress bars with ETAs for longer phases (parsing manifests, extracting symbols)
- When RAG is enabled, an embedding progress bar tracks file embedding in the background, displaying error messages on failure
Progress bars persist after completion so you can see the full build history in your terminal. Quiet mode (used internally by the MCP server for on-demand rebuilds) hides all progress output.
Configuration
Drop a shire.toml in the repo root to customize behavior:
# Custom database location (default: .shire/index.db)
db_path = "/path/to/custom/index.db"
[discovery]
manifests = ["package.json", "go.mod", "go.work", "Cargo.toml", "pyproject.toml", "pom.xml", "build.gradle", "build.gradle.kts", "settings.gradle", "settings.gradle.kts", "cpanfile", "Gemfile"]
exclude = ["node_modules", "vendor", "dist", ".build", "target", "third_party", ".shire", ".gradle", "build"]
# Symbol extraction
[symbols]
exclude_extensions = [".proto", ".pl"]
references_enabled = false # EXPERIMENTAL, default false — see below
# Documentation indexing
[docs]
extensions = [".md", ".rst", ".txt", ".adoc"]
max_file_size = 262144 # 256 KB — files larger than this are truncated
# Override package descriptions
[[packages]]
name = "legacy-auth"
description = "Deprecated auth service — do not add new dependencies"
Watch daemon
[watch]
debounce_ms = 2000 # milliseconds to wait after last change before rebuilding
Logging
[log]
level = "warn" # error, warn, info, debug, trace
dir = ".shire/logs" # log directory (relative to repo root). Set to "" to disable file logging
max_days = 30 # automatically delete log files older than this
The SHIRE_LOG environment variable overrides the config level (e.g., SHIRE_LOG=debug shire build). Log files are daily-rotated with filenames like shire.log.2026-03-26. Each session includes a unique session ID for correlation across concurrent processes.
All fields are optional. Defaults are shown above. The --db CLI flag takes precedence over db_path in config.
Cross-reference index (experimental)
symbols.references_enabled (default false) populates the symbol_refs
table so the symbol_references, symbol_callers, and symbol_callees
MCP tools can answer “where is this used?” / “who calls this?” questions.
Reference extraction is supported for 8 tier-1 languages: Go, Python,
Java, TypeScript, JavaScript, Perl, Ruby, Scala.
Opt-in: shire init asks whether to enable this (prompt labelled
experimental), and writes references_enabled = true to shire.toml
when you say yes. You can also add it manually:
[symbols]
references_enabled = true
Cost: DB grows substantially — roughly +30% on TS/JS repos to +150% on Go-heavy repos (benchmarks on shire-bench: turborepo +29%, grafana +152%, kubernetes +104% vs main baseline). Build time grows ~5-7%.
Toggling the flag takes effect on the next build. Disabling wipes
symbol_refs at the start of the build; re-enabling repopulates it on
the next full rebuild (shire build --force).
This feature is marked experimental: its schema and coverage may change in minor versions as language support broadens and edge cases surface.
Custom package discovery
For codebases where packages aren’t defined by standard manifest files — Go single-module monorepos, repos that use ownership.yml + build files, or any non-standard convention — you can define custom discovery rules:
# Discover Go apps: directories containing both main.go and ownership.yml
[[discovery.custom]]
name = "go-apps"
kind = "go"
requires = ["main.go", "ownership.yml"]
paths = ["services/", "cmd/"]
exclude = ["testdata", "examples"]
max_depth = 3
name_prefix = "go:"
# Discover proto packages: directories containing *.proto and buf.yaml
[[discovery.custom]]
name = "proto-packages"
kind = "proto"
requires = ["*.proto", "buf.yaml"]
paths = ["proto/", "services/"]
max_depth = 4
| Field | Required | Description |
|---|---|---|
| name | yes | Rule identifier |
| kind | yes | Package kind for symbol extraction (go, proto, npm, etc.) |
| requires | yes | File patterns that must ALL exist in a directory (supports globs like *.proto) |
| paths | no | Limit search to specific subtrees (default: repo root) |
| exclude | no | Rule-specific directory exclusions (on top of global excludes) |
| max_depth | no | Maximum depth to search from each paths entry |
| name_prefix | no | Prefix prepended to directory-derived package name (e.g., go:services/auth) |
| extensions | no | Override which file extensions get symbol extraction |
Custom discovery runs alongside manifest-based discovery. Directories already found by manifest parsers are skipped. Subdirectories of matched directories are also skipped to prevent nested matches.
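The matching rules above (all requires patterns present, rule-specific excludes, a depth limit, no nested matches) can be approximated with a short directory walk. The Python sketch below is a simplified model: the discover function is hypothetical, the depth counting (the paths entry itself is depth 0) is an assumed interpretation, and the interaction with manifest-based discovery is left out.

```python
import fnmatch
import os
from pathlib import Path


def discover(root, requires, paths=(".",), exclude=(), max_depth=None):
    """Return repo-relative directories that contain a match for every pattern in requires."""
    root = Path(root)
    matches = []
    for sub in paths:
        base = root / sub
        for dirpath, dirnames, filenames in os.walk(base):
            depth = len(Path(dirpath).relative_to(base).parts)
            # Prune excluded directories before descending.
            dirnames[:] = sorted(d for d in dirnames if d not in exclude)
            if all(any(fnmatch.fnmatch(f, pat) for f in filenames) for pat in requires):
                matches.append(Path(dirpath).relative_to(root).as_posix())
                dirnames.clear()  # skip subdirectories of a matched directory
                continue
            if max_depth is not None and depth >= max_depth:
                dirnames.clear()  # do not search deeper than max_depth
    return matches
```

For the go-apps rule shown earlier, a call might look like discover(repo, ["main.go", "ownership.yml"], paths=["services/", "cmd/"], exclude=["testdata", "examples"], max_depth=3).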
RAG vector search
RAG adds semantic vector search to search_symbols. It requires compiling with the rag feature flag and enabling it in config.
Build with RAG support:
cargo install --path . --features rag
Enable in shire.toml:
[rag]
enabled = true
# model = "BAAI/bge-small-en-v1.5" # default, only supported model currently
# cache_dir = "~/.cache/shire-rag" # optional, for model file storage
When enabled, shire build embeds all symbols after extraction. The first build downloads the model (~33MB) automatically. Subsequent builds are incremental — only changed packages get re-embedded.
RAG is non-fatal: if the model fails to load or embeddings fail, shire falls back to FTS-only search with a warning. If the rag feature is not compiled in, the [rag] config section is silently ignored.
MCP Tools & Prompts
Tools
Shire exposes the following tools over the Model Context Protocol:
| Tool | Description |
|---|---|
| search_packages | Search packages by name or description. Use instead of Grep for finding packages. |
| list_packages | List all indexed packages, optionally filtered by kind |
| package_dependencies | List a package’s dependencies. Set depth>1 for transitive graph (returns edge list with different schema). |
| package_dependents | Find all packages that depend on this package |
| search_symbols | Find functions, classes, types, methods by name or signature. Use instead of Grep for “where is function X?” or “what matches pattern Y?”. Omit query with a package filter to list all symbols in that package. Supports hybrid FTS + vector search when RAG is enabled. |
| get_file_symbols | List all symbols defined in a specific file. Use instead of reading the file to understand its exports. |
| search_files | Find files by path or name. Use instead of Glob/find for locating files. Useful for “middleware”, “proto files”, or files in a specific directory. |
| search_docs | Search documentation files by content, title, or path; returns matching docs with text snippets |
| list_package_files | List all files in a package, optionally filtered by extension. Use instead of Glob for listing package contents. |
| explore | Explore a concept across the codebase: searches packages, symbols, files, and documentation semantically. Use as the first tool when investigating unfamiliar code or broad topics like “authentication” or “error handling”. Returns a structured context map organized by package. |
| index_status | Index build metadata: timestamp, git commit, counts |
| symbol_references | Find all references to a symbol by name. Returns [{name, kind, file_path, line, package, enclosing_symbol}]. Accepts optional kind and package filters. Requires symbols.references_enabled = true (experimental, opt-in). Note: matching is name-based, so same-name symbols across different packages are merged. |
| symbol_callers | List all callers of a symbol (call-site references). Returns [{caller_name, caller_file, caller_line, call_sites}]. Accepts optional package filter. Requires symbols.references_enabled = true. Same name-based-match caveat as symbol_references. |
| symbol_callees | List what a function calls (outbound call graph). Returns [{callee_name, call_sites}]. Accepts optional package filter. Requires symbols.references_enabled = true. Same name-based-match caveat as symbol_references. |
| change_impact | Analyze the blast radius of changing a symbol. Combines cross-references with the dependency graph to return {direct_impact, cross_package_impact, transitive_impact, summary}. Use before renaming, changing a signature, or deleting a symbol. Accepts optional package (home package hint, for disambiguation), transitive_depth (default 2), and limit. Requires symbols.references_enabled = true. Same name-based-match caveat as symbol_references. |
| schema_consumers | Find all files generated from a schema file (e.g. .proto). Returns generated file paths and their packages. Use to understand the blast radius of a schema change. |
| generated_from | Find the source schema file that generated a given file. Use to trace a generated file (e.g. user.pb.go) back to its source proto. |
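For package_dependencies with depth>1, the transitive graph comes back as an edge list. A depth-limited breadth-first traversal of that kind can be sketched in Python as below; the in-memory deps mapping and the (from, to) tuple shape are assumptions for illustration, since the real tool reads from SQLite and its result schema is not specified here.

```python
from collections import deque


def transitive_edges(deps, start, depth):
    """Breadth-first walk of a dependency map, returning (package, dependency) edges."""
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        pkg, level = queue.popleft()
        if level >= depth:
            continue  # do not expand beyond the requested depth
        for dep in deps.get(pkg, []):
            edges.append((pkg, dep))
            if dep not in seen:  # visit each package once, even in cyclic graphs
                seen.add(dep)
                queue.append((dep, level + 1))
    return edges


deps = {"web": ["ui", "api"], "api": ["db"], "db": ["core"]}
print(transitive_edges(deps, "web", 2))
```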
When to use Shire vs Grep/Glob
| Task | Use | Not |
|---|---|---|
| Find a function, class, or type by name | search_symbols | Grep |
| Find a file by name or path | search_files | Glob / find |
| List files in a package | list_package_files | Glob |
| Find a package | search_packages | Grep |
| Explore an unfamiliar area | explore | multiple Grep calls |
| Search for a literal string or log message | Grep | Shire |
| Search inside function bodies | Grep | Shire |
| Pattern match on file contents | Grep | Shire |
Prompts
Prompts are pre-built templates that compose multiple queries into structured context. They give your AI a map of where concepts live in the codebase.
| Prompt | Args | Description |
|---|---|---|
| explore | query | Search packages, symbols, files, and documentation for a concept; returns a structured context map organized by package |
| reference_audit | name | Guides refactor-safety analysis for a symbol: classifies refs by kind, traces the call graph via symbol_callers, identifies cross-package impact, and assesses rename/change risk. Requires symbols.references_enabled = true (experimental). |
Watch Daemon
shire watch starts a background daemon that auto-rebuilds the index when files change. It uses Unix domain socket IPC with configurable debounce (default 2s).
Start the daemon
Idempotent — safe to call multiple times:
shire watch --root /path/to/repo
Signal a rebuild manually
shire rebuild --root /path/to/repo
Signal a rebuild from a Claude Code hook
Reads JSON from stdin, uses cwd as repo root:
shire rebuild --stdin
Stop the daemon
shire watch --root /path/to/repo --stop
Smart filtering
The watch daemon avoids unnecessary rebuilds:
- Edit/Write tools: checks file extension relevance and repo boundary
- Bash commands: filtered against a denylist of known read-only commands (ls, git status, cargo test, etc.); unknown commands default to rebuild
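The filtering rules above might look roughly like the following Python sketch. The tool_name, tool_input, file_path, and command field names are assumed from the Claude Code hook payload; the extension set and denylist are illustrative placeholders, not Shire's real lists, and should_rebuild is a hypothetical name.

```python
from pathlib import Path

# Illustrative values only; the real denylist and extension set are not documented here.
READ_ONLY_COMMANDS = ("ls", "cat", "git status", "git diff", "git log", "cargo test")
RELEVANT_EXTENSIONS = {".rs", ".go", ".py", ".ts", ".js", ".toml", ".json"}


def should_rebuild(hook: dict, repo_root: Path) -> bool:
    """Decide whether a hook event should trigger an index rebuild."""
    tool = hook.get("tool_name", "")
    tool_input = hook.get("tool_input", {})
    if tool in ("Edit", "Write"):
        path = Path(tool_input.get("file_path", ""))
        if not path.is_absolute():
            path = repo_root / path
        try:
            path.relative_to(repo_root)  # repo boundary check
        except ValueError:
            return False
        return path.suffix in RELEVANT_EXTENSIONS
    if tool == "Bash":
        command = tool_input.get("command", "").strip()
        # Naive prefix match; a real filter would also need to handle compound commands.
        read_only = any(
            command == ro or command.startswith(ro + " ") for ro in READ_ONLY_COMMANDS
        )
        return not read_only  # unknown commands default to rebuild
    return True
```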
Git Worktrees
Shire automatically detects git worktrees and maintains separate indexes for each one. This works out of the box with no configuration required.
How it works
When you run shire build or shire serve inside a linked worktree, shire:
- Detects the worktree by inspecting .git: a directory means primary working tree, a file means linked worktree
- Resolves a per-worktree DB path using the {worktree} placeholder in db_path
- Seeds from the primary worktree’s DB on first build, so you don’t start from scratch; only changed packages need reindexing
The primary worktree uses the reserved name _primary. Linked worktrees use Git’s stable worktree ID (the directory name under .git/worktrees/<id>).
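The detection step relies on a standard Git convention: in a linked worktree, .git is a file containing a line of the form gitdir: followed by a path ending in .git/worktrees/<id>. The Python sketch below illustrates that idea; worktree_name is a hypothetical helper, not Shire's code.

```python
from pathlib import Path
from typing import Optional


def worktree_name(root: Path) -> Optional[str]:
    """Return "_primary", a linked worktree ID, or None if root is not a checkout."""
    dot_git = root / ".git"
    if dot_git.is_dir():
        return "_primary"  # primary working tree
    if dot_git.is_file():
        text = dot_git.read_text().strip()
        if text.startswith("gitdir:"):
            gitdir = Path(text[len("gitdir:"):].strip())
            # Linked worktrees point at <main repo>/.git/worktrees/<id>.
            if gitdir.parent.name == "worktrees":
                return gitdir.name
    return None
```

Checking that the parent directory is named worktrees keeps submodules, which also use a .git file, from being mistaken for linked worktrees.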
Configuration
The default global config (generated by shire init --global) already includes worktree support:
db_path = "~/.claude/shire/{repo}/{worktree}/index.db"
This produces separate databases like:
~/.claude/shire/my-project/_primary/index.db
~/.claude/shire/my-project/feat-auth/index.db
~/.claude/shire/my-project/bugfix-123/index.db
Placeholders
| Placeholder | Description |
|---|---|
| {repo} | Name of the main repository (directory basename) |
| {worktree} | Worktree identifier: _primary for the main working tree, or Git’s worktree ID for linked worktrees |
Shared vs separate databases
If your db_path does not include {worktree}, all worktrees share the same database. This is fine for read-only use but means concurrent builds from different worktrees will conflict.
Including {worktree} in the path gives each worktree its own database, which is the recommended setup.
DB seeding
When shire builds in a linked worktree for the first time and no database exists yet, it checks whether the primary worktree has an existing database. If so, it copies that database as a seed — giving you a fully populated index immediately. Only packages that differ between the worktrees need reindexing.
$ cd ~/worktrees/feat-auth
$ shire build
Seeded DB from /Users/you/.claude/shire/my-project/_primary/index.db
Building index...
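The seeding decision reduces to "copy the primary database only when the worktree has none". A minimal Python sketch under that assumption follows; seed_db is a hypothetical name, and a plain file copy ignores WAL/SHM sidecar files, which a real implementation would need to account for.

```python
import shutil
from pathlib import Path


def seed_db(worktree_db: Path, primary_db: Path) -> bool:
    """Copy the primary worktree's DB as a starting point; return True if seeded."""
    if worktree_db.exists() or not primary_db.exists():
        return False  # already built, or nothing to seed from
    worktree_db.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(primary_db, worktree_db)
    return True
```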
Local config
A local shire.toml at the repo root can use a relative db_path (e.g., .shire/index.db). Since each worktree has its own root directory, relative paths naturally resolve to separate databases per worktree. Seeding still applies in this case.
Supported Ecosystems
| Manifest | Kind | Workspace support |
|---|---|---|
| package.json | npm | workspace: protocol versions normalized |
| go.mod | go | go.work member metadata |
| go.work | go | use directives parsed for workspace context |
| Cargo.toml | cargo | workspace = true deps resolved from root |
| pyproject.toml | python | — |
| pom.xml | maven | Parent POM inheritance (groupId, version) |
| build.gradle / build.gradle.kts | gradle | settings.gradle project inclusion |
| cpanfile | perl | requires / on 'test' blocks |
| Gemfile | ruby | gem / group :test blocks |
| flake.nix | nix | inputs attrset (dotted and block forms) |
Symbol extraction
Shire extracts public symbols (functions, classes, types, methods, interfaces) from source files using tree-sitter, with full signatures, parameters, and return types.
| Language | Extractor |
|---|---|
| TypeScript / JavaScript | tree-sitter |
| Go | tree-sitter |
| Rust | tree-sitter |
| Python | tree-sitter |
| Java | tree-sitter |
| Kotlin | tree-sitter |
| Dart | tree-sitter |
| Protobuf | tree-sitter |
| C | tree-sitter |
| C++ | tree-sitter |
| C# | tree-sitter |
| Swift | tree-sitter |
| PHP | tree-sitter |
| Scala | tree-sitter |
| Zig | tree-sitter |
| Bash / Shell | tree-sitter |
| R | tree-sitter |
| Haskell | tree-sitter |
| YAML | tree-sitter |
| SQL | tree-sitter |
| HCL / Terraform | tree-sitter |
| TOML | tree-sitter |
| Perl | tree-sitter |
| Ruby | tree-sitter |
| OCaml | tree-sitter |
| Lua | tree-sitter |
| Elixir | tree-sitter |
| Clojure | tree-sitter |
| Erlang | tree-sitter |
| Julia | tree-sitter |
| Gleam | tree-sitter |
| Odin | tree-sitter |
| Nix | tree-sitter |
| Nim | tree-sitter |
| COBOL | regex-based |
Reference extraction
Shire extracts cross-references (calls, type references, imports, and interface implementations) for a subset of languages. These are stored in the symbol_refs table and exposed via the symbol_references, symbol_callers, and symbol_callees MCP tools.
| Language | Call | Type | Import | Impl |
|---|---|---|---|---|
| Go | yes | yes | yes | — (implicit interfaces) |
| Python | yes | yes | yes | yes |
| Java | yes | yes | yes | yes |
| TypeScript | yes | yes | yes | yes |
| JavaScript | yes | — | yes | yes |
| Perl | yes | — | yes | — |
| Ruby | yes | yes | yes | yes |
| Scala | yes | yes | yes | yes |
All other languages: symbol definitions only; references are not extracted.
Performance
Shire is designed to index large monorepos quickly and answer queries with sub-millisecond to low-millisecond latency. This section documents benchmark methodology, results, and how to reproduce them.
Test repos
Benchmarks run against three real-world open-source monorepos covering a range of sizes and ecosystems:
| Repo | Size | Packages | Symbols | Files | Primary languages |
|---|---|---|---|---|---|
| turborepo | small | 400 | 10,686 | 5,451 | Rust, TypeScript, Go |
| grafana | medium | 28 | 35,104 | 14,054 | Go, TypeScript |
| kubernetes | large | 34 | 78,458 | 18,275 | Go |
Build performance
Full rebuild (no incremental cache), median of 4 iterations after a warmup run:
| Repo | Median | Min | P95 | Std dev |
|---|---|---|---|---|
| turborepo | 571ms | 525ms | 678ms | 58ms |
| grafana | 1,150ms | 1,025ms | 1,172ms | 58ms |
| kubernetes | 1,703ms | 1,607ms | 1,897ms | 108ms |
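Build time grows with symbol count, though less than proportionally in these runs: kubernetes has about 7x the symbols of turborepo but takes about 3x as long to build. The pipeline is parallelized with rayon across packages and files, with batched multi-row SQLite inserts within explicit transactions.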
Incremental builds are significantly faster – only packages with changed source files are re-extracted, and an mtime pre-check skips SHA-256 computation entirely for untouched packages.
Query performance
Median latency over 100 iterations per query:
| Query | Small | Medium | Large |
|---|---|---|---|
| search_symbols("parse") | 0.09ms | 0.09ms | 0.04ms |
| search_symbols("Config") | 0.20ms | 0.29ms | 1.01ms |
| search_files("mod") | 0.05ms | 0.03ms | 0.04ms |
| search_files("test") | 0.07ms | 0.60ms | 1.99ms |
| list_packages(None) | 0.11ms | 0.01ms | 0.01ms |
All queries use SQLite FTS5 full-text search with unicode61 tokenizer and prefix indexes. Query latency depends primarily on result set size, not total index size.
Reproducing benchmarks
Shire includes an autoresearch binary for reproducible benchmarking.
Setup
Run the benchmark repo setup script to clone and prepare the test repos:
scripts/setup-bench-repo.sh
This clones the three repos into ~/.cache/shire-bench/ and creates a shire.toml in each.
Running benchmarks
# Build the benchmark binary
cargo build --release --bin autoresearch
# Run build benchmarks (all repos)
cargo run --release --bin autoresearch -- --phase build
# Run query benchmarks (all repos)
cargo run --release --bin autoresearch -- --phase query
# Filter by repo size
cargo run --release --bin autoresearch -- --phase build --size small
# Point at a specific repo
cargo run --release --bin autoresearch -- --phase build --repo /path/to/repo
Build benchmarks run 5 iterations (1 warmup + 4 measured) per repo. Query benchmarks run 100 iterations per query. Results are printed as JSON to stdout.
Environment notes
- Results vary by machine (CPU, disk speed, available memory)
- Close other applications for more stable measurements
- The warmup iteration primes filesystem caches and SQLite page cache
- Numbers in this document were captured on an Apple M-series Mac
Architecture
src/
├── main.rs # CLI (clap): build, serve, watch, rebuild, init, clean subcommands
├── lib.rs # Library re-exports for embedding shire as a crate
├── config.rs # shire.toml parsing
├── git.rs # Git worktree detection and repo root resolution
├── init.rs # `shire init` setup (config, MCP server, hooks, rules)
├── db/
│ ├── mod.rs # SQLite schema, open/create
│ └── queries.rs # FTS search, dependency graph BFS, listing
├── index/
│ ├── mod.rs # Walk + incremental index orchestrator
│ ├── custom_discovery.rs # Config-driven custom package discovery
│ ├── manifest.rs # ManifestParser trait
│ ├── hash.rs # SHA-256 content hashing for incremental builds
│ ├── npm.rs # package.json parser (workspace: protocol)
│ ├── go.rs # go.mod parser
│ ├── go_work.rs # go.work parser (workspace use directives)
│ ├── cargo.rs # Cargo.toml parser (workspace dep resolution)
│ ├── python.rs # pyproject.toml parser
│ ├── maven.rs # pom.xml parser (parent POM inheritance)
│ ├── gradle.rs # build.gradle / build.gradle.kts parser
│ ├── gradle_settings.rs # settings.gradle parser (project inclusion)
│ ├── perl.rs # cpanfile parser (requires, on 'test')
│ └── ruby.rs # Gemfile parser (gem, group blocks)
├── symbols/
│ ├── mod.rs # Symbol types, kind-agnostic extraction orchestrator
│ ├── walker.rs # Source file discovery (extension filtering, excludes)
│ ├── registry.rs # Language registry: maps extensions to tree-sitter grammars + hooks
│ ├── query_extract.rs # Generic tree-sitter query executor with hook callbacks
│ ├── queries/ # Tree-sitter .scm query files (one per language)
│ ├── hooks/ # Language-specific hooks (visibility, signatures, params, post-processing)
│ ├── elixir.rs # Elixir extractor (regex-based)
│ └── cobol.rs # COBOL extractor (regex-based)
│ # Cross-reference extraction (call, type, import, impl) is supported
│ # for 8 tier-1 languages: Go, Python, Java, TypeScript, JavaScript,
│ # Perl, Ruby, Scala. References are captured via @reference.* captures
│ # in the language's .scm query and written to the symbol_refs table.
│ # Coverage is asymmetric per language: JavaScript omits Type refs
│ # (no type system), and Go/Perl omit Impl refs (no extends/implements).
├── rag/ # Optional RAG vector search (behind `rag` feature flag)
│ ├── mod.rs # Feature-gated module root
│ ├── embedder.rs # fastembed wrapper, file-level text formatting, batch embedding (64 files/batch)
│ └── storage.rs # sqlite-vec extension, vec0 table, vector CRUD, KNN search
├── mcp/
│ ├── mod.rs # MCP server setup (rmcp, stdio transport)
│ ├── tools.rs # Tool handlers (+ hybrid search when RAG enabled)
│ └── prompts.rs # Prompt templates (explore, reference_audit)
└── watch/
├── mod.rs # Daemon event loop (UDS listener, debounce, rebuild)
├── daemon.rs # Process management (start/stop/is_running via PID)
└── protocol.rs # Hook input parsing, Bash read-only denylist
symbol_refs table
The symbol_refs table stores cross-reference records extracted alongside symbol definitions. Each row captures a reference to a named symbol:
| Column | Type | Description |
|---|---|---|
| name | TEXT | The name being referenced (function, type, module, etc.) |
| kind | TEXT | One of: call, type, import, impl |
| file_path | TEXT | Source file containing the reference |
| line | INTEGER | Line number of the reference |
| package | TEXT | Package the referencing file belongs to (nullable) |
| enclosing_symbol | TEXT | Nearest enclosing function or method (nullable) |
B-tree indexes on name, file_path, and enclosing_symbol support the exact-match lookups used by the symbol_references, symbol_callers, and symbol_callees MCP tools. No FTS5 table — reference queries are exact-name only.
Incremental behavior mirrors symbol extraction: references for a file are dropped and re-extracted whenever the file’s SHA-256 hash changes. No separate pass is needed — references are extracted in the same tree-sitter walk as symbol definitions.
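A table with the columns and indexes listed above, plus an exact-name caller lookup, could be modeled as in the Python sketch below. The column names come from the table; the index names, NOT NULL constraints, and the callers query are assumptions and may differ from Shire's actual schema and SQL.

```python
import sqlite3


def open_refs_db() -> sqlite3.Connection:
    """Create an in-memory symbol_refs table with B-tree indexes (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE symbol_refs (
            name TEXT NOT NULL,
            kind TEXT NOT NULL,          -- call, type, import, or impl
            file_path TEXT NOT NULL,
            line INTEGER NOT NULL,
            package TEXT,                -- nullable
            enclosing_symbol TEXT        -- nullable
        );
        CREATE INDEX idx_refs_name ON symbol_refs(name);
        CREATE INDEX idx_refs_file ON symbol_refs(file_path);
        CREATE INDEX idx_refs_enclosing ON symbol_refs(enclosing_symbol);
        """
    )
    return conn


def callers(conn: sqlite3.Connection, name: str):
    """Exact-name lookup of call sites, grouped by enclosing symbol and file."""
    return conn.execute(
        """
        SELECT enclosing_symbol, file_path, COUNT(*) AS call_sites
        FROM symbol_refs
        WHERE name = ? AND kind = 'call'
        GROUP BY enclosing_symbol, file_path
        ORDER BY enclosing_symbol, file_path
        """,
        (name,),
    ).fetchall()
```

Because the lookup is by name only, two unrelated symbols called parse_config in different packages would show up in the same result, which is the name-based-match caveat noted for the MCP tools.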
File embeddings
When RAG is enabled, the build produces file-level vector embeddings for hybrid search. Each file is represented as a FileForEmbedding containing its symbols via FileSymbol (name, kind, and optional signature).
The text representation (file_to_text) works as follows:
- Symbols are sorted by kind then name
- Signatures are preferred over the kind name fallback when available
- A total character budget of 1800 caps the output text: after the file path prefix is accounted for, the remaining budget is filled with symbols until exhausted
- Files with no symbols produce a minimal file <path> in <package> string
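The rules in the list above can be condensed into a short Python sketch. It is a hedged approximation: the tuple layout for symbols, the newline separator, and the reuse of the file <path> in <package> string as the prefix for files that do have symbols are assumptions; only the sort order, the signature preference, the 1800-character budget, and the no-symbols string come from this document.

```python
CHAR_BUDGET = 1800


def file_to_text(path, package, symbols):
    """Render a file and its symbols as embedding text within a fixed character budget.

    symbols is a list of (name, kind, signature_or_None) tuples.
    """
    text = f"file {path} in {package}"
    # Sort by kind, then by name.
    for name, kind, signature in sorted(symbols, key=lambda s: (s[1], s[0])):
        entry = signature if signature else f"{kind} {name}"  # prefer the signature
        if len(text) + 1 + len(entry) > CHAR_BUDGET:
            break  # budget exhausted
        text += "\n" + entry
    return text


print(file_to_text("src/app.py", "app", [("run", "function", None), ("App", "class", "class App(Base)")]))
```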
Embedding runs in a background thread spawned during the build, executing concurrently with post-build housekeeping:
- Files are processed in batches of 64 to balance throughput and memory
- A progress callback reports batch completion for progress bar updates
- Errors (model init, embedding, DB write) are reported on the progress bar rather than failing the build