How Probe Works
Probe combines efficient text search with code intelligence to find relevant code in your codebase.
System Architecture
Probe operates in six stages:
- Scan: File identification using ripgrep
- Parse: Code structure analysis via Abstract Syntax Trees
- Process: Query enhancement with NLP techniques
- Rank: Result prioritization
- Extract: Code block isolation
- Format: Output generation
Search Workflow
Here's how a search for "error handling" works:
Query Processing:
- Tokenize → [error, handling]
- Stem → [error, handl]
- Generate patterns →
\berror\b
,\bhandl\w*\b
- Parse query syntax → AND(error, handling)
File Scanning:
- Identify potential matches using ripgrep
- Filter based on .gitignore and custom patterns
- Exclude test files (unless specified otherwise)
Code Analysis:
- Parse matching files into ASTs
- Identify complete code blocks containing matches
- Extract metadata (function names, parameters, etc.)
Result Ranking:
- Calculate relevance scores
- Apply position weights for terms in identifiers
- Sort by combined relevance metrics
- Filter based on session cache (if enabled)
Block Extraction:
- Extract complete functions/methods with matches
- Merge related code blocks when appropriate
- Apply context lines if requested
Result Delivery:
- Format with syntax highlighting
- Apply token/size constraints
- Generate structured output
Rapid Scanning
The foundation of Probe's speed:
- Ripgrep Engine: Fast line scanning at core
- Parallel Processing: Utilizes all CPU cores
- Smart Filtering: Respects .gitignore patterns
- Stream Processing: Minimal memory footprint
- Incremental Matching: Stops scanning when limits are reached
Code Structure Parsing
Where Probe becomes more than just text search:
- Tree-sitter: Industry-standard parsing technology
- AST Generation: Builds complete code structure map
- Language-specific: Understands each language's unique patterns
- Robust Handling: Works with partial or imperfect code
- Symbol Resolution: Maps identifiers to declarations
Session-based Caching
Avoiding duplicate results across multiple searches:
- Unique Identifiers: Cache keys based on file path and line numbers
- Result Tracking: Remembers which blocks have been shown
- Session Management: Generates and maintains session IDs
- Cache Invalidation: Clears cache when appropriate
Output Strategies
Delivering results in the most useful format:
- Markdown/Syntax: Rich, readable code presentation
- JSON: Structured for programmatic use
- Token Limiting: Fits within AI context windows
- Priority Handling: Most relevant results survive limits
- Streaming: Real-time output for interactive use
Integration Architecture
How Probe connects with other tools:
MCP Server
- STDIO Transport: Communicates via standard input/output
- JSON Protocol: Structured message format
- Tool Definitions: Exposes search, query, and extract capabilities
Node.js SDK
- JS Bindings: JavaScript interface to core functionality
- Vercel AI SDK: Integration for streaming AI responses
- Promise-based: Async/await compatible API
- Type Definitions: TypeScript support
Web Interface
- Express Backend: Node.js server for API endpoints
- Vanilla JS Frontend: No framework dependencies
- Streaming Responses: Real-time AI output
- Markdown Rendering: Rich text and code formatting
For detailed information on specific features, see: