Suture Performance Baseline
Date: 2026-05-12
Platform: Linux x86_64, release profile (optimized)
Toolchain: Rust stable, Criterion.rs 0.5
1. DAG Operations
- dag_perf_commit/commit_1000_files
- dag_perf_log/log_1000_commits
- dag_perf_log/log_10000_commits
- dag_perf_merge/merge_100_files
Thresholds
2. Semantic Merge (JSON)
- semantic_merge_perf/json_small_10_fields
- semantic_merge_perf/json_large_100_fields
- semantic_merge_perf/json_conflict/10
- semantic_merge_perf/json_conflict/100
Thresholds
3. CAS Operations
- cas_perf_store/store_100_blobs/1KB
- cas_perf_store/store_100_blobs/10KB
- cas_perf_store/store_100_blobs/100KB
- cas_perf_store/store_100_blobs/1MB
- cas_perf_lookup/store_1000_lookup_100
Thresholds
4. Patch Serialization
- patch_perf_serialize/serialize_100_patches
- patch_perf_deserialize/deserialize_100_patches
Thresholds
5. Hub Operations
- hub_perf_repo/create_100_repos
- hub_perf_push_pull/push_50_patches
- hub_perf_push_pull/pull_50_patches
- hub_perf_push_pull/push_pull_roundtrip_50
Thresholds
6. Historical Benchmark Results (from existing suite)
These results are from the pre-existing benchmark suite in benchmarks.rs:
- blake3_hashing/hash/1024
- blake3_hashing/hash/102400
- dag_insertion/linear_chain/1000
- repo_add_commit/commit_n_files/1000
- repo_merge/merge_clean
- semantic_merge_json/merge/1000
- hub_storage/push_n_patches_blobs/1000
- diff_large/patience_diff_1k_lines
- compress_decompress/compress_1MB
Top 3 Bottlenecks
1. Log walk at scale (FIXED in v3.2.1)
The repo_log timeout at 10,000 commits was caused by commit() calling snapshot_uncached() after every commit, which replayed the entire patch chain from root to tip (O(n) per commit → O(n²) total). Fixed by computing the file tree incrementally: load the parent's cached tree from SQLite (O(1)) and apply only the new patch's changes (O(k) where k = files in the batch).
Before: 10,000 commits → >600s (timeout). After: 10,000 commits → 8.5s (>70x faster).
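A minimal sketch of the incremental approach, assuming a file tree is a map from path to blob hash and a patch is a list of per-file changes. FileTree, Change, and TreeCache are hypothetical names, not Suture's actual types. A plain HashMap keyed by commit id stands in for the SQLite tree cache.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical: path -> blob hash.
type FileTree = BTreeMap<String, String>;

/// Hypothetical per-file change carried by a patch.
enum Change {
    Upsert { path: String, blob: String },
    Delete { path: String },
}

/// Stand-in for the SQLite tree cache: commit id -> materialized tree.
#[derive(Default)]
struct TreeCache {
    trees: HashMap<u64, FileTree>,
}

impl TreeCache {
    /// Records the tree for `commit` by starting from the parent's cached
    /// tree and applying only this patch's k changes. The old path replayed
    /// every patch from root to tip, so its cost grew with history length.
    fn commit(&mut self, parent: Option<u64>, commit: u64, patch: &[Change]) {
        // One cache lookup instead of a full replay. Copying the parent tree
        // is O(files), but it no longer depends on the number of commits.
        let mut tree = parent
            .and_then(|p| self.trees.get(&p).cloned())
            .unwrap_or_default();
        for change in patch {
            match change {
                Change::Upsert { path, blob } => {
                    tree.insert(path.clone(), blob.clone());
                }
                Change::Delete { path } => {
                    tree.remove(path);
                }
            }
        }
        self.trees.insert(commit, tree);
    }

    fn tree(&self, commit: u64) -> Option<&FileTree> {
        self.trees.get(&commit)
    }
}
```

Each commit now does work proportional to its own changes (plus one tree copy in this sketch), so the total for n commits is linear in n rather than quadratic.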
2. Commit 1000 files — 652 ms (repo_add_commit vs dag_perf_commit)
The full repo add + commit path for 1000 files takes 652 ms (~0.65 ms/file), while the DAG-level commit benchmark takes 97.8 ms. The ~6.7x overhead comes from per-file filesystem I/O: reading files, hashing, writing blobs to CAS, and updating the staging index.
Recommendation: Batch filesystem reads with parallel hashing (rayon). Defer CAS writes until commit time. Use mtime/size stat cache to skip unchanged files.
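The stat-cache part of this recommendation can be sketched with std alone. StatCache and hash_if_changed are hypothetical names. DefaultHasher stands in for the project's content hash, and the rehashed counter exists only to make skipped files observable.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Cached stat fingerprint plus the content hash computed for it.
struct Entry {
    mtime: SystemTime,
    size: u64,
    hash: u64,
}

#[derive(Default)]
struct StatCache {
    entries: HashMap<PathBuf, Entry>,
    /// Number of files actually read and hashed.
    rehashed: usize,
}

impl StatCache {
    /// Returns the content hash for `path`, re-reading the file only when
    /// its (mtime, size) pair differs from the cached one.
    fn hash_if_changed(&mut self, path: &Path) -> io::Result<u64> {
        let meta = fs::metadata(path)?;
        let (mtime, size) = (meta.modified()?, meta.len());
        if let Some(e) = self.entries.get(path) {
            if e.mtime == mtime && e.size == size {
                return Ok(e.hash); // stat unchanged: skip the read and hash
            }
        }
        let bytes = fs::read(path)?;
        let mut h = DefaultHasher::new(); // stand-in for the real content hash
        bytes.hash(&mut h);
        let hash = h.finish();
        self.rehashed += 1;
        self.entries.insert(path.to_path_buf(), Entry { mtime, size, hash });
        Ok(hash)
    }
}
```

Git's index relies on the same stat-based shortcut and has to handle "racily clean" files, which are modified within the filesystem's mtime granularity without changing size. A production cache would need a similar guard.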
3. Patience diff on large files — 30 ms for 1K lines
The patience diff algorithm takes 30 ms for a 1000-line file with 10% changes. That is slow in absolute terms: even under linear scaling, a 10K-line file would take ~300 ms, which is perceptible, and superlinear scaling would make it worse.
Recommendation: Profile to determine if the bottleneck is LCS computation or unique-line fingerprinting. Consider the imara-diff crate for large-file diffing, or switch to Myers' algorithm for files above a threshold.
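To make the profiling question concrete, here is a sketch of the anchor step that sets patience diff apart from a plain LCS. It keeps only lines that occur exactly once in each file, then takes the longest increasing subsequence (LIS) of their positions using patience sorting. unique_lines and patience_anchors are hypothetical names. A full patience diff would then recurse between anchors and fall back to another diff for the gaps.

```rust
use std::collections::HashMap;

/// Maps each line that occurs exactly once to its index.
fn unique_lines<'a>(lines: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut seen: HashMap<&'a str, Option<usize>> = HashMap::new();
    for (i, &line) in lines.iter().enumerate() {
        seen.entry(line)
            .and_modify(|slot| *slot = None) // repeated line: not unique
            .or_insert(Some(i));
    }
    seen.into_iter().filter_map(|(l, i)| i.map(|i| (l, i))).collect()
}

/// Returns (old_index, new_index) pairs for lines unique in both inputs
/// that appear in the same relative order: the patience "anchors".
fn patience_anchors<'a>(old: &[&'a str], new: &[&'a str]) -> Vec<(usize, usize)> {
    let old_unique = unique_lines(old);
    let new_unique = unique_lines(new);
    // Candidate matches, ordered by position in the new file.
    let mut pairs: Vec<(usize, usize)> = new_unique
        .iter()
        .filter_map(|(line, &j)| old_unique.get(line).map(|&i| (i, j)))
        .collect();
    pairs.sort_by_key(|&(_, j)| j);

    // LIS over old indices via patience sorting, O(m log m).
    // tails[k] = index into `pairs` of the smallest tail of a run of length k + 1.
    let mut tails: Vec<usize> = Vec::new();
    let mut prev: Vec<Option<usize>> = vec![None; pairs.len()];
    for (p, &(i, _)) in pairs.iter().enumerate() {
        let k = tails.partition_point(|&t| pairs[t].0 < i);
        if k > 0 {
            prev[p] = Some(tails[k - 1]);
        }
        if k == tails.len() {
            tails.push(p);
        } else {
            tails[k] = p;
        }
    }
    // Walk back from the top of the last pile to recover the subsequence.
    let mut out = Vec::new();
    let mut cur = tails.last().copied();
    while let Some(p) = cur {
        out.push(pairs[p]);
        cur = prev[p];
    }
    out.reverse();
    out
}
```

Profiling this step separately from the recursion between anchors would show whether unique-line fingerprinting (the two hash-map passes) or the LCS-style work dominates the 30 ms.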
Quick-Win Optimizations Applied
- #[inline] on hot-path methods: added to TouchSet::intersects, TouchSet::len, TouchSet::iter, TouchSet::insert, TouchSet::contains, DagNode::id, and PatchDag::patch_count (hash_bytes was already inlined).
- Removed redundant from_utf8 in hash_with_context: the function already takes &str, so from_utf8(context.as_bytes()) re-validated bytes that are guaranteed to be valid UTF-8. It cost an O(n) validation pass and added an error path that could never be taken.
- Eliminated duplicate method definitions: removed accidental duplicates of len, iter, and insert on TouchSet and has_patch on PatchDag that were causing code bloat.
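For illustration, here is a sketch of the shape of the first two changes. TouchSet's real representation is not described in this report, so this version assumes a HashSet<String> of touched paths. hash_with_context is likewise a hypothetical stand-in that uses DefaultHasher rather than the project's hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical TouchSet: the set of file paths a patch touches.
#[derive(Default)]
struct TouchSet {
    paths: HashSet<String>,
}

impl TouchSet {
    // Small, frequently called accessors. #[inline] makes non-generic
    // functions available for inlining across crate boundaries without LTO.
    #[inline]
    fn insert(&mut self, path: &str) -> bool {
        self.paths.insert(path.to_owned())
    }

    #[inline]
    fn contains(&self, path: &str) -> bool {
        self.paths.contains(path)
    }

    #[inline]
    fn len(&self) -> usize {
        self.paths.len()
    }

    /// True if any path is touched by both sets.
    #[inline]
    fn intersects(&self, other: &TouchSet) -> bool {
        // Iterate the smaller set and probe the larger one.
        let (small, large) = if self.len() <= other.len() { (self, other) } else { (other, self) };
        small.paths.iter().any(|p| large.contains(p))
    }
}

/// Before: std::str::from_utf8(context.as_bytes()) re-validated bytes that a
/// &str already guarantees are UTF-8. After: hash the &str directly.
fn hash_with_context(context: &str, data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new(); // stand-in for the real hash
    context.hash(&mut h);
    data.hash(&mut h);
    h.finish()
}
```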
Binary Size
- dev profile
- opt-level = 3 only
- strip after pre-optimization build
- opt-level = 3, lto = true, codegen-units = 1, panic = "abort", strip = true
Optimizations Applied
- LTO (Link-Time Optimization): lto = true enables cross-crate inlining and dead code elimination.
- Single codegen unit: codegen-units = 1 gives better optimization at the cost of slower compilation.
- Panic = abort: panic = "abort" removes the unwinding machinery, saving ~50-100 KB.
- Strip: strip = true removes debug info and symbol tables.
Build Times
Benchmark Files
- benches/benchmarks.rs
- benches/dag_perf.rs
- benches/semantic_merge_perf.rs
- benches/cas_perf.rs
- benches/hub_perf.rs
7. Comprehensive Merge Benchmarks (v5.1.0)
Date: 2026-05-01
Platform: Linux x86_64, release profile, Criterion.rs 0.5
Method: Median of 50 iterations
JSON
YAML
TOML
CSV
XML
Key Takeaways
- JSON is the fastest format — <1ms for typical files (up to 1,000 keys)
- All formats merge in <5ms for typical config files (<100 keys)
- CSV is slowest at scale due to row-based parsing — O(rows × cols)
- Conflict detection is fast (<100µs for most formats) because it only needs to find the first mismatch
- JSON scales sub-linearly past 1K keys (likely due to serde_json's optimized parser)
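A minimal sketch of a key-level three-way merge over flat maps shows why a conflict check can stop at the first mismatch while a full merge must visit every key. It assumes values are already parsed into a BTreeMap<String, String>. The real engine works on parsed JSON/YAML/TOML/CSV/XML structures, and resolve, merge_flat, and has_conflict are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Doc = BTreeMap<String, String>;

/// Marker for a key changed differently on both sides.
#[derive(Debug)]
struct Conflict;

/// Resolves one key from its base/ours/theirs values (None = key absent).
fn resolve<'a>(
    base: Option<&'a String>,
    ours: Option<&'a String>,
    theirs: Option<&'a String>,
) -> Result<Option<&'a String>, Conflict> {
    if ours == theirs {
        Ok(ours) // both sides agree
    } else if ours == base {
        Ok(theirs) // only theirs changed
    } else if theirs == base {
        Ok(ours) // only ours changed
    } else {
        Err(Conflict) // both changed, differently
    }
}

/// Full merge: visits every key and collects all conflicting keys.
fn merge_flat(base: &Doc, ours: &Doc, theirs: &Doc) -> Result<Doc, Vec<String>> {
    let keys: BTreeSet<&String> = base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut merged = Doc::new();
    let mut conflicts = Vec::new();
    for k in keys {
        match resolve(base.get(k), ours.get(k), theirs.get(k)) {
            Ok(Some(v)) => {
                merged.insert(k.clone(), v.clone());
            }
            Ok(None) => {} // key absent after the merge (deleted)
            Err(Conflict) => conflicts.push(k.clone()),
        }
    }
    if conflicts.is_empty() { Ok(merged) } else { Err(conflicts) }
}

/// Conflict check only: `any` short-circuits at the first conflicting key.
/// Keys present in several maps may be checked more than once, which is
/// harmless for a yes/no answer.
fn has_conflict(base: &Doc, ours: &Doc, theirs: &Doc) -> bool {
    base.keys()
        .chain(ours.keys())
        .chain(theirs.keys())
        .any(|k| resolve(base.get(k), ours.get(k), theirs.get(k)).is_err())
}
```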
8. Recent Optimizations (v5.4.0)
8.1 Merge Engine (2026-05-12)
- output_lines() returns &[String] instead of Vec<String> (engine/merge.rs).
- engine/merge.rs: removes a format!() allocation for the <<<<<<<, =======, >>>>>>> markers on every conflict region.
- diff_trees() uses sort_by instead of sort_by_key (engine/diff.rs).
8.2 Stash Operations (2026-05-12)
- repository/repo_impl.rs: builds a HashSet<String> of staged paths before the head-tree loop. This replaces files.iter().any(…), a linear scan per file, with hashset.contains(path), which is O(1). For repos with many tracked files, this eliminates quadratic behavior during suture stash push.
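The merge-engine and stash changes above share one theme: avoid per-item allocation and repeated linear scans. The sketch below illustrates three of those patterns, using hypothetical stand-ins (MergeResult, TreeEntry, sort_entries, staged_head_paths) rather than Suture's actual types. First, it returns a borrowed slice instead of a cloned Vec. Second, it sorts with sort_by so the comparator borrows each path. sort_by_key's closure cannot return a reference into the element, so keying by a String usually means a clone per key extraction. Third, it builds a HashSet once so that each membership test has expected O(1) cost.

```rust
use std::collections::HashSet;

/// Hypothetical merge result holding the merged lines.
struct MergeResult {
    lines: Vec<String>,
}

impl MergeResult {
    /// Borrow instead of clone: read-only callers no longer pay for a
    /// Vec<String> copy.
    fn output_lines(&self) -> &[String] {
        &self.lines
    }
}

/// Hypothetical tree entry produced by a tree diff.
struct TreeEntry {
    path: String,
}

/// sort_by_key(|e| e.path.clone()) would allocate a String per key
/// extraction; sort_by compares the borrowed paths directly.
fn sort_entries(entries: &mut [TreeEntry]) {
    entries.sort_by(|a, b| a.path.cmp(&b.path));
}

/// Returns the head-tree paths that are also staged. Building the set once
/// makes each lookup expected O(1), so the loop is O(head + staged) instead
/// of O(head × staged) with a linear scan per path.
fn staged_head_paths<'a>(head_paths: &'a [String], staged: &[String]) -> Vec<&'a str> {
    let staged_set: HashSet<&str> = staged.iter().map(String::as_str).collect();
    head_paths
        .iter()
        .map(String::as_str)
        .filter(|p| staged_set.contains(p))
        .collect()
}
```

When a sort key is expensive to compute rather than merely cloned, the standard library's sort_by_cached_key is the other common option.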