How Probe Works

Probe combines fast text search (ripgrep) with syntax-aware code analysis (tree-sitter) to find relevant code in your codebase.

System Architecture

Probe operates in six stages:

Scan: File identification using ripgrep
Parse: Code structure analysis via Abstract Syntax Trees
Process: Query enhancement with NLP techniques
Rank: Result prioritization
Extract: Code block isolation
Format: Output generation

Search Workflow

Here's how a search for "error handling" works:

Query Processing:
- Tokenize → [error, handling]
- Stem → [error, handl]
- Generate patterns → \berror\b, \bhandl\w*\b
- Parse query syntax → AND(error, handling)
File Scanning:
- Identify potential matches using ripgrep
- Filter based on .gitignore and custom patterns
- Exclude test files (unless specified otherwise)
Code Analysis:
- Parse matching files into ASTs
- Identify complete code blocks containing matches
- Extract metadata (function names, parameters, etc.)
Result Ranking:
- Calculate relevance scores
- Apply position weights for terms in identifiers
- Sort by combined relevance metrics
- Filter based on session cache (if enabled)
Block Extraction:
- Extract complete functions/methods with matches
- Merge related code blocks when appropriate
- Apply context lines if requested
Result Delivery:
- Format with syntax highlighting
- Apply token/size constraints
- Generate structured output
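The query processing step above can be sketched in a few lines. This is a minimal illustration, not Probe's implementation: the suffix list and the helper names (`tokenize`, `naive_stem`, `build_patterns`, `matches_all`) are assumptions made for the example, and the stemmer is a crude suffix stripper standing in for a real stemming algorithm.

```python
import re

# Hypothetical suffix list; a real stemmer applies many more rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(query: str) -> list[str]:
    """Lowercase the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_patterns(query: str) -> list[str]:
    """Turn each token into a word-bounded regex pattern.

    Unchanged tokens match exactly; stemmed tokens match any word
    that starts with the stem.
    """
    patterns = []
    for token in tokenize(query):
        stem = naive_stem(token)
        if stem == token:
            patterns.append(rf"\b{re.escape(token)}\b")
        else:
            patterns.append(rf"\b{re.escape(stem)}\w*\b")
    return patterns


def matches_all(text: str, patterns: list[str]) -> bool:
    """AND semantics: every pattern must match somewhere in the text."""
    return all(re.search(p, text, re.IGNORECASE) for p in patterns)
```

For the query "error handling", `build_patterns` yields the two patterns shown in the workflow, and `matches_all` accepts text such as "error handler" while rejecting text that mentions only one of the terms.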

Rapid Scanning

The foundation of Probe's speed:

Ripgrep Engine: Fast line-oriented scanning at the core
Parallel Processing: Utilizes all CPU cores
Smart Filtering: Respects .gitignore patterns
Stream Processing: Minimal memory footprint
Incremental Matching: Stops scanning when limits are reached
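The stream processing and incremental matching ideas can be illustrated with a lazy generator. This sketch uses Python's `re` module rather than ripgrep, and the function names `scan_lines` and `first_matches` are hypothetical; it only shows the general technique of stopping as soon as a match limit is reached.

```python
import re
from itertools import islice
from typing import Iterable, Iterator


def scan_lines(lines: Iterable[str], pattern: str) -> Iterator[tuple[int, str]]:
    """Lazily yield (line_number, line) for each line matching the pattern."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


def first_matches(lines: Iterable[str], pattern: str, limit: int) -> list[tuple[int, str]]:
    """Collect at most `limit` matches without reading past the last one."""
    return list(islice(scan_lines(lines, pattern), limit))
```

Because an open file object is itself an iterable of lines, `first_matches(open(path), pattern, limit)` reads only as much of the file as it needs, which keeps memory use flat regardless of file size.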

Code Structure Parsing

Where Probe becomes more than just text search:

Tree-sitter: Widely used incremental parsing library
AST Generation: Builds complete code structure map
Language-specific: Understands each language's unique patterns
Robust Handling: Works with partial or imperfect code
Symbol Resolution: Maps identifiers to declarations
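To show what "finding the complete code block around a match" means, here is a small sketch that maps a matching line number to its enclosing function. Probe uses tree-sitter for this; the example substitutes Python's standard-library `ast` module so that it stays self-contained, and the function name `enclosing_function` and the `SAMPLE` source are made up for the illustration.

```python
import ast

SAMPLE = (
    "def load(path):\n"
    "    return open(path).read()\n"
    "\n"
    "def handle_error(exc):\n"
    "    message = str(exc)\n"
    "    return message\n"
)


def enclosing_function(source: str, line: int):
    """Return (name, start_line, end_line) of the innermost function
    containing `line`, or None if the line is outside every function."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.lineno <= line <= node.end_lineno:
            span = node.end_lineno - node.lineno
            # Prefer the smallest enclosing span, i.e. the innermost function.
            if best is None or span < best[2] - best[1]:
                best = (node.name, node.lineno, node.end_lineno)
    return best
```

A text match on line 5 of `SAMPLE` resolves to `handle_error` spanning lines 4 to 6, so the whole function can be returned instead of a single line.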

Session-based Caching

Avoiding duplicate results across multiple searches:

Unique Identifiers: Cache keys based on file path and line numbers
Result Tracking: Remembers which blocks have been shown
Session Management: Generates and maintains session IDs
Cache Invalidation: Clears cache when appropriate
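A session cache of this kind can be sketched as a set of keys built from file path and line range. The class name `SessionCache`, the key format, and the use of a random UUID as session ID are assumptions for the example, not Probe's actual data structures.

```python
import uuid


class SessionCache:
    """Tracks which code blocks have already been returned in a session."""

    def __init__(self, session_id=None):
        # Generate a session ID when the caller does not supply one.
        self.session_id = session_id or uuid.uuid4().hex
        self._seen = set()

    @staticmethod
    def key(path, start_line, end_line):
        """Build a cache key from the file path and line range."""
        return f"{path}:{start_line}-{end_line}"

    def filter_new(self, blocks):
        """Return only blocks not seen before, and remember them."""
        fresh = []
        for path, start, end in blocks:
            k = self.key(path, start, end)
            if k not in self._seen:
                self._seen.add(k)
                fresh.append((path, start, end))
        return fresh

    def clear(self):
        """Invalidate the cache so every block counts as new again."""
        self._seen.clear()
```

Passing each search's results through `filter_new` means a second search in the same session returns only blocks the caller has not seen yet.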

Output Strategies

Delivering results in the most useful format:

Markdown/Syntax: Rich, readable code presentation
JSON: Structured for programmatic use
Token Limiting: Fits within AI context windows
Priority Handling: Most relevant results survive limits
Streaming: Real-time output for interactive use
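Token limiting with priority handling can be pictured as a greedy selection over results sorted by score. The sketch below is an assumption-laden illustration: the `Result` type, the whitespace-based `estimate_tokens` heuristic, and the stop-at-first-overflow policy are all hypothetical choices, not Probe's actual behavior.

```python
from dataclasses import dataclass


@dataclass
class Result:
    path: str
    code: str
    score: float


def estimate_tokens(text: str) -> int:
    """Crude token estimate: count whitespace-separated words."""
    return len(text.split())


def apply_token_limit(results: list[Result], max_tokens: int) -> list[Result]:
    """Keep the highest-scoring results that fit within the token budget."""
    kept = []
    used = 0
    for result in sorted(results, key=lambda r: r.score, reverse=True):
        cost = estimate_tokens(result.code)
        if used + cost > max_tokens:
            # Stop at the first result that does not fit, so a lower-ranked
            # result never displaces a higher-ranked one.
            break
        kept.append(result)
        used += cost
    return kept
```

Stopping at the first overflow keeps the output strictly ordered by relevance; an alternative policy would skip oversized results and keep filling the budget with smaller ones.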

Integration Architecture

How Probe connects with other tools:

MCP Server

STDIO Transport: Communicates via standard input/output
JSON Protocol: Structured message format
Tool Definitions: Exposes search, query, and extract capabilities
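As a rough picture of what travels over the STDIO transport, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the kind MCP clients send, and serializes it as one line of JSON. The tool name `search` and the argument names `query` and `path` used in the note are placeholders for illustration; the actual tool names and parameters exposed by Probe's server may differ.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def encode_line(message: dict) -> str:
    """Serialize a message as a single newline-terminated line for stdio."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def decode_line(line: str) -> dict:
    """Parse one line received from the other side of the pipe."""
    return json.loads(line)
```

A client would write something like `encode_line(tool_call_request(1, "search", {"query": "error handling", "path": "./src"}))` to the server's standard input and read the response line from its standard output.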

Node.js SDK

JS Bindings: JavaScript interface to core functionality
Vercel AI SDK: Integration for streaming AI responses
Promise-based: Async/await compatible API
Type Definitions: TypeScript support

Web Interface

Express Backend: Node.js server for API endpoints
Vanilla JS Frontend: No framework dependencies
Streaming Responses: Real-time AI output
Markdown Rendering: Rich text and code formatting

For detailed information on specific features, see:
