2024 · Active

oqab

High-performance CLI search utility in Rust. Three search modes — file find, content grep, and fuzzy match — backed by a multi-threaded worker pool with composable filters.

Rust · Crossbeam · Regex · Clap · Fuzzy-Matcher
Overview

Oqab (Arabic for "eagle") is a CLI search utility I built in Rust for finding files and content across large, deeply nested directory trees. The core problem it solves is composability: tools like find and grep work fine in isolation but composing them for multi-criteria queries — filter by extension, search content, constrain by size and date, run in parallel — gets unwieldy fast. Oqab provides a single binary with three distinct search modes, a pluggable filter pipeline, and a multi-threaded worker pool, all configured from a coherent CLI or a reusable JSON config file.

Key features
01 Three search modes dispatched at runtime: SearchCommand for file discovery by name and extension, GrepCommand for regex content search across matched files, and FuzzyCommand for approximate name matching using fuzzy-matcher.
02 Multi-threaded WorkerPool with two separate MPSC channels: one for directories, one for files. Workers drain the directory queue first so subdirectories fan out across threads immediately, keeping all cores busy on deep trees.
03 TraversalStrategy trait with three implementations: DefaultTraversalStrategy (hidden file/dir filtering), RegexTraversalStrategy (include/exclude path patterns), and CompositeTraversalStrategy (AND-chains multiple strategies). Swapped at runtime with no changes to the search core.
04 CompositeFilter with AND and OR logic and a three-value FilterResult enum: Accept, Reject, and Prune. Prune signals that an entire directory subtree can be skipped, short-circuiting traversal before any filesystem reads.
05 Observer pattern for decoupled progress reporting: TrackingObserver collects matched paths behind a Mutex<Vec<PathBuf>>, ProgressReporter logs every 100 files and every 50 directories, SilentObserver counts silently. Swapped without touching search logic.
06 JSON config file support: FileSearchConfig is fully serde-serializable. Save any query with --save-config, reload it with --config. CLI flags always take precedence over loaded values through an explicit merge step.
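
To make the observer idea from feature 05 concrete, here is a minimal sketch using only the standard library. The trait name SearchObserver, its method names, and the report_matches helper are assumptions made for illustration; only the observer type names and the Mutex<Vec<PathBuf>> storage come from the description above.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical observer interface; the method names are assumptions.
pub trait SearchObserver: Send + Sync {
    fn file_found(&self, path: &Path);
    fn files_seen(&self) -> usize;
}

/// Collects every matched path behind a mutex so worker threads can share it.
#[derive(Default)]
pub struct TrackingObserver {
    found: Mutex<Vec<PathBuf>>,
}

impl TrackingObserver {
    /// Returns a snapshot of the paths collected so far.
    pub fn paths(&self) -> Vec<PathBuf> {
        self.found.lock().unwrap().clone()
    }
}

impl SearchObserver for TrackingObserver {
    fn file_found(&self, path: &Path) {
        self.found.lock().unwrap().push(path.to_path_buf());
    }
    fn files_seen(&self) -> usize {
        self.found.lock().unwrap().len()
    }
}

/// Counts matches without storing or printing anything.
#[derive(Default)]
pub struct SilentObserver {
    count: AtomicUsize,
}

impl SearchObserver for SilentObserver {
    fn file_found(&self, _path: &Path) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }
    fn files_seen(&self) -> usize {
        self.count.load(Ordering::Relaxed)
    }
}

/// Stand-in for search code: it only sees the trait, so observers can be swapped.
pub fn report_matches(observer: &dyn SearchObserver, matches: &[&str]) {
    for m in matches {
        observer.file_found(Path::new(m));
    }
}
```

Because report_matches takes a &dyn SearchObserver, passing a TrackingObserver or a SilentObserver changes what is recorded without changing the calling code.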

Command Routing

main.rs picks the command by inspecting the parsed config after arguments are processed. If help was requested, or if no extension, name, or pattern was given, it returns HelpCommand, so showing help is an ordinary command rather than a special case scattered through the logic. If --grep supplied a pattern, it dispatches GrepCommand for content search. If --fuzzy is set, it dispatches FuzzyCommand for approximate name matching. Otherwise it falls through to SearchCommand for standard file discovery. The pattern check runs before the fuzzy check, so --grep wins when both are set.

This means each command owns its entire execution path. SearchCommand handles file traversal and metrics. GrepCommand first runs a file traversal to build the candidate list, then opens each file with BufReader and scans lines with a compiled regex. Neither knows about the other.

src/main.rs
fn create_command(config: &FileSearchConfig) -> Result<Box<dyn Command + '_>> {
    if config.help || (config.file_extension.is_none()
        && config.file_name.is_none()
        && config.pattern.is_none())
    {
        return Ok(Box::new(HelpCommand::new()));
    }
    if config.pattern.is_some() {
        return Ok(Box::new(GrepCommand::new(config)));
    }
    if config.fuzzy {
        return Ok(Box::new(FuzzyCommand::new(config)));
    }
    Ok(Box::new(SearchCommand::new(config)))
}

Worker Pool

The WorkerPool spins up N threads and gives them two receivers: one for directories, one for files. Each thread checks the directory channel first, then the file channel. When a worker picks up a directory it runs the directory_consumer closure — which calls process_directory, discovers subdirectories, and submits them back into the directory channel. Found files are forwarded to the file channel and consumed by the file_consumer closure, which runs the filter registry and notifies observers.

Shutdown works via an AtomicBool and a Done sentinel. Calling complete() sends Done into both channels. The first worker to receive Done re-sends it (on the directory channel in the excerpt below) so every other worker eventually sees it and exits. The Drop impl stores true into the stopped flag and calls complete() again, so the pool cleans up even if join() is never called.

src/core/worker.rs
pub enum WorkerMessage {
    Directory(PathBuf),
    File(PathBuf),
    Done,
}

// Each worker: drain directories first, then files
loop {
    let dir_msg = directory_rx.lock().ok()?.try_recv();
    match dir_msg {
        Ok(WorkerMessage::Directory(dir)) => { directory_consumer(dir); }
        Ok(WorkerMessage::Done) => {
            directory_tx.send(WorkerMessage::Done).ok();
            break;
        }
        _ => {}
    }
    let file_msg = file_rx.lock().ok()?.try_recv();
    match file_msg {
        Ok(WorkerMessage::File(file)) => { file_consumer(file); }
        _ => { thread::sleep(timeout); }
    }
}
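
The sentinel-forwarding shutdown can be tried in isolation with a reduced sketch. It is not the WorkerPool itself: it assumes a single std::sync::mpsc channel instead of two, uses a blocking recv() instead of try_recv() plus sleep, and the names Msg and sum_with_pool are hypothetical.

```rust
use std::sync::mpsc::channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical message type: a unit of work or the shutdown sentinel.
enum Msg {
    Item(u32),
    Done,
}

/// Runs `workers` threads over one shared queue and returns the sum of all items.
pub fn sum_with_pool(items: Vec<u32>, workers: usize) -> u32 {
    let (tx, rx) = channel::<Msg>();
    // std's Receiver has a single consumer, so workers share it behind a mutex.
    let rx = Arc::new(Mutex::new(rx));
    let total = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let tx = tx.clone();
            let total = Arc::clone(&total);
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the lock is held while receiving but not while working.
                let msg = rx.lock().unwrap().recv();
                match msg {
                    Ok(Msg::Item(n)) => *total.lock().unwrap() += n,
                    Ok(Msg::Done) => {
                        // Forward the sentinel so the next worker also stops.
                        tx.send(Msg::Done).ok();
                        break;
                    }
                    Err(_) => break, // every sender was dropped
                }
            })
        })
        .collect();

    for n in items {
        tx.send(Msg::Item(n)).unwrap();
    }
    // One sentinel is enough: each worker re-sends it before exiting.
    tx.send(Msg::Done).unwrap();

    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}
```

The channel is FIFO and all items are queued before Done, so every item is received before any worker sees the sentinel. A single Done then shuts down any number of workers.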

Filter Pipeline

Every filter implements a single trait with one method: filter(&Path) -> FilterResult. FilterResult has three variants: Accept passes the file, Reject drops it, and Prune drops it and signals to the traversal layer that the whole subtree under this path can be skipped — no further reads needed.

CompositeFilter holds a Vec<Box<dyn Filter>> and a FilterOperation (And or Or). In And mode it short-circuits on the first non-Accept result. In Or mode it short-circuits on the first Accept. There is also TypedCompositeFilter<F1, F2> for zero-cost static dispatch when both filter types are known at compile time. The factory assembles the right combination from the config — extension, name, regex, min/max size, newer-than, older-than — and hands it to FileFinder as a FilterRegistry.
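
The static-dispatch variant could be sketched roughly as follows. Only the name TypedCompositeFilter<F1, F2> and the filter signature come from the description above. The AND semantics, the field names, and the two leaf filters (ExtensionFilter, NameContainsFilter) are assumptions made for illustration.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterResult {
    Accept,
    Reject,
    Prune,
}

pub trait Filter {
    fn filter(&self, path: &Path) -> FilterResult;
}

/// Combines two filters whose concrete types are known at compile time.
/// No Box and no vtable: the compiler can inline both calls.
/// AND semantics are assumed here.
pub struct TypedCompositeFilter<F1, F2> {
    first: F1,
    second: F2,
}

impl<F1, F2> TypedCompositeFilter<F1, F2> {
    pub fn new(first: F1, second: F2) -> Self {
        Self { first, second }
    }
}

impl<F1: Filter, F2: Filter> Filter for TypedCompositeFilter<F1, F2> {
    fn filter(&self, path: &Path) -> FilterResult {
        match self.first.filter(path) {
            // Only consult the second filter if the first one accepted.
            FilterResult::Accept => self.second.filter(path),
            other => other, // Reject or Prune short-circuits
        }
    }
}

/// Hypothetical leaf filter: accepts paths with the given extension.
pub struct ExtensionFilter(pub &'static str);

impl Filter for ExtensionFilter {
    fn filter(&self, path: &Path) -> FilterResult {
        if path.extension().and_then(|e| e.to_str()) == Some(self.0) {
            FilterResult::Accept
        } else {
            FilterResult::Reject
        }
    }
}

/// Hypothetical leaf filter: accepts paths whose file name contains a substring.
pub struct NameContainsFilter(pub &'static str);

impl Filter for NameContainsFilter {
    fn filter(&self, path: &Path) -> FilterResult {
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        if name.contains(self.0) {
            FilterResult::Accept
        } else {
            FilterResult::Reject
        }
    }
}
```

For example, TypedCompositeFilter::new(ExtensionFilter("rs"), NameContainsFilter("main")) accepts src/main.rs and rejects src/lib.rs. The trade-off against Vec<Box<dyn Filter>> is that the combination must be fixed at compile time.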

src/filters/composite.rs
impl Filter for CompositeFilter {
    fn filter(&self, path: &Path) -> FilterResult {
        match self.operation {
            FilterOperation::And => {
                for filter in &self.filters {
                    match filter.filter(path) {
                        FilterResult::Accept => continue,
                        other => return other, // Reject or Prune short-circuits
                    }
                }
                FilterResult::Accept
            }
            FilterOperation::Or => {
                for filter in &self.filters {
                    if let FilterResult::Accept = filter.filter(path) {
                        return FilterResult::Accept;
                    }
                }
                FilterResult::Reject
            }
        }
    }
}
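
How a traversal layer might act on Prune is easier to see on a toy tree. The sketch below uses an in-memory Node type in place of the real filesystem. Node, walk, and rust_sources_outside_target are hypothetical names, and treating Reject on a directory as "not a match, but still descend" is an assumption about the design.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterResult {
    Accept,
    Reject,
    Prune,
}

/// Hypothetical in-memory stand-in for a directory tree.
pub enum Node {
    File(&'static str),
    Dir(&'static str, Vec<Node>),
}

/// Walks the tree, collecting accepted files and counting directories read.
pub fn walk(
    node: &Node,
    parent: &Path,
    filter: &dyn Fn(&Path) -> FilterResult,
    matches: &mut Vec<PathBuf>,
    dirs_read: &mut usize,
) {
    match node {
        Node::File(name) => {
            let path = parent.join(name);
            if filter(&path) == FilterResult::Accept {
                matches.push(path);
            }
        }
        Node::Dir(name, children) => {
            let path = parent.join(name);
            // Prune: skip the whole subtree without reading its entries.
            if filter(&path) == FilterResult::Prune {
                return;
            }
            *dirs_read += 1;
            for child in children {
                walk(child, &path, filter, matches, dirs_read);
            }
        }
    }
}

/// Example filter: prune anything named "target", accept .rs files.
pub fn rust_sources_outside_target(path: &Path) -> FilterResult {
    if path.file_name().and_then(|n| n.to_str()) == Some("target") {
        return FilterResult::Prune;
    }
    if path.extension().and_then(|e| e.to_str()) == Some("rs") {
        FilterResult::Accept
    } else {
        FilterResult::Reject
    }
}
```

With a tree containing src/main.rs and target/debug/build.rs, the walk never enters target, so build.rs is never examined even though it would pass the extension check.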

Content Search

GrepCommand runs in two phases. First it calls search_directory to collect every file that passes the file-level filters (extension, name, size, date). Then it iterates those files, opens each one, wraps it in a BufReader, and tests every line yielded by lines() against a compiled Regex. The regex is built once with RegexBuilder, which takes the ignore_case flag, so case-insensitivity is compiled into the pattern and adds no per-line work such as lowercasing each line.

Permission-denied errors on open are silently skipped, because the tool is designed to be run against directories you only partially own, like a home directory with some protected paths. Any other error on open is returned with the file path attached as context. Errors while reading an individual line (typically invalid UTF-8) skip that line rather than aborting the file. Output is colored with the console crate: filenames in bold cyan, line numbers in green.

src/commands/grep.rs
fn search_file(&self, path: &Path, regex: &Regex) -> Result<Vec<(usize, String)>> {
    let file = match File::open(path) {
        Ok(f) => f,
        Err(e) if e.kind() == ErrorKind::PermissionDenied => return Ok(vec![]),
        Err(e) => return Err(e).context(format!("open: {}", path.display())),
    };
    let mut matches = vec![];
    for (i, line) in BufReader::new(file).lines().enumerate() {
        let Ok(line) = line else { continue }; // skip encoding errors
        if regex.is_match(&line) {
            matches.push((i + 1, line));
        }
    }
    Ok(matches)
}
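
The line-scanning loop can be exercised without touching the filesystem by making it generic over any BufRead. This is a sketch rather than the code in grep.rs: the name matching_lines is hypothetical, and a plain closure stands in for Regex::is_match to keep the example dependency-free.

```rust
use std::io::BufRead;

/// Returns (1-based line number, line text) for every readable line
/// that the matcher accepts. The matcher stands in for a compiled regex.
pub fn matching_lines<R: BufRead>(
    reader: R,
    matcher: impl Fn(&str) -> bool,
) -> Vec<(usize, String)> {
    let mut matches = Vec::new();
    for (i, line) in reader.lines().enumerate() {
        // A line that fails to read is skipped but still counted, so the
        // reported numbers match the line's position in the file.
        let Ok(line) = line else { continue };
        if matcher(&line) {
            matches.push((i + 1, line));
        }
    }
    matches
}
```

In a test, a std::io::Cursor over a string can replace BufReader::new(file), which keeps the scanning logic separate from how files are opened.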

CLI & Config

The CLI is built with Clap's derive API. The positional QUERY argument is parsed with smart detection: if it contains glob characters (* ? [) it's treated as a name pattern; if it contains a dot it's split into name and extension; otherwise it's used as a plain name filter. This means oqab main.rs, oqab '*.log', and oqab config all do the expected thing without extra flags. The glob is quoted so that the shell passes it through instead of expanding it first.
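
The three detection rules could be written along these lines. QueryKind and classify_query are hypothetical names. Splitting at the last dot and treating a leading-dot name such as .gitignore as a plain name are assumptions, since the description above does not cover those cases.

```rust
/// Hypothetical result of classifying the positional QUERY argument.
#[derive(Debug, PartialEq, Eq)]
pub enum QueryKind {
    /// Contains glob characters, used as a name pattern.
    Pattern(String),
    /// Contains a dot, split into name and extension.
    NameAndExtension { name: String, extension: String },
    /// Anything else, used as a plain name filter.
    Name(String),
}

pub fn classify_query(query: &str) -> QueryKind {
    // Rule 1: glob characters take priority over everything else.
    if query.contains(|c: char| matches!(c, '*' | '?' | '[')) {
        return QueryKind::Pattern(query.to_string());
    }
    // Rule 2: split at the last dot (assumption), if both sides are non-empty.
    match query.rsplit_once('.') {
        Some((name, extension)) if !name.is_empty() && !extension.is_empty() => {
            QueryKind::NameAndExtension {
                name: name.to_string(),
                extension: extension.to_string(),
            }
        }
        // Rule 3: plain name filter.
        _ => QueryKind::Name(query.to_string()),
    }
}
```

Checking for glob characters first matters for a query like *.log, which also contains a dot but should not be split into a name and an extension.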

FileSearchConfig implements serde Serialize and Deserialize, so --save-config writes the fully resolved config (after CLI/file merge) to disk as JSON. --config loads it back. When both are present, CLI args win via a selective_apply_to_config pass that only overwrites fields that were explicitly set on the command line, leaving loaded values intact for everything else.
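
One way to get "CLI wins only where explicitly set" is to model CLI values as Options and overwrite a loaded field only when the matching Option is Some. The sketch below illustrates that idea and is not the actual selective_apply_to_config. SearchConfig, CliOverrides, apply_overrides, and the field set are hypothetical, and serde is omitted to keep it dependency-free.

```rust
/// Hypothetical, reduced stand-in for a config loaded from JSON.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SearchConfig {
    pub path: String,
    pub extension: Option<String>,
    pub pattern: Option<String>,
    pub workers: usize,
}

/// Values from the command line; None means "flag not given".
#[derive(Debug, Default)]
pub struct CliOverrides {
    pub path: Option<String>,
    pub extension: Option<String>,
    pub pattern: Option<String>,
    pub workers: Option<usize>,
}

/// Overwrites only the fields that were explicitly set on the command line.
pub fn apply_overrides(mut loaded: SearchConfig, cli: CliOverrides) -> SearchConfig {
    if let Some(path) = cli.path {
        loaded.path = path;
    }
    if cli.extension.is_some() {
        loaded.extension = cli.extension;
    }
    if cli.pattern.is_some() {
        loaded.pattern = cli.pattern;
    }
    if let Some(workers) = cli.workers {
        loaded.workers = workers;
    }
    loaded
}
```

Under this model, reloading a saved query and passing only --path corresponds to a CliOverrides where path is Some and every other field is None. The loaded extension and pattern then survive the merge.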

Usage examples
# Find all Rust files with "unsafe" modified after 2024-01-01
oqab --grep "unsafe" --ext rs --newer-than 2024-01-01

# Fuzzy-match file names against "cnfg"
oqab --fuzzy cnfg --path ./src

# Search with 8 threads (hidden files are skipped by the default traversal strategy)
oqab --ext log --path /var/log --workers 8

# Save this query for later
oqab --ext ts --grep "TODO" --save-config todo-scan.json

# Reload it, override path only
oqab --config todo-scan.json --path ./packages/api