The Challenge of Large Commits

Modern development often involves commits that change many files. Whether it's a major feature addition, a refactoring effort, or merging a long-running branch, large commits can exceed the size limits of AI providers. AI Diff Review solves this with intelligent batching that automatically splits large commits into manageable chunks.

How Batching Works

When a commit or selected scope exceeds the active budget (256 KB for cloud providers, 128 KB for local), AI Diff Review automatically splits the request into multiple batches. It doesn't split arbitrarily: it uses a relatedness-first strategy to keep related files together, improving analysis quality.
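The trigger condition can be sketched as a simple budget check. This is an illustrative sketch, not the plugin's code; only the two budget figures come from the article.

```python
# Budget constants as stated above: 256 KB for cloud, 128 KB for local.
CLOUD_BUDGET = 256 * 1024
LOCAL_BUDGET = 128 * 1024

def needs_batching(estimated_request_bytes: int, provider: str) -> bool:
    """Return True when the estimated payload exceeds the active budget."""
    budget = CLOUD_BUDGET if provider == "cloud" else LOCAL_BUDGET
    return estimated_request_bytes > budget
```

A 300 KB request to a cloud provider would be batched, while the same request fits within no budget locally either; a 100 KB local request goes through in one piece.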

Relatedness Strategy

The batching system uses lightweight heuristics to determine which files are related:

Import and Package References

Files that import symbols defined in other changed files are considered related. For example, if UserService.java imports User from User.java, and both are changed, they'll be grouped together.

Symbol Mentions

Files that mention class, object, or function names defined in other changed files are grouped together. This captures relationships even when explicit imports aren't present.

Directory Proximity

Files in the same package or directory are considered weakly related, serving as a tie-breaker when other relationships aren't clear.
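The three heuristics above can be combined into a single relatedness score. The sketch below is illustrative: the dict fields (`path`, `imports`, `defines`, `text`) and the weights (3/2/1) are assumptions, not the plugin's actual representation or values.

```python
import os
import re

def relatedness(a: dict, b: dict) -> int:
    """Score how related two changed files are.

    Each file is a dict with 'path', 'imports' (imported symbols),
    'defines' (declared names), and 'text' (file contents).
    """
    score = 0
    # 1. Import references: one file imports a symbol the other defines.
    if set(a["imports"]) & set(b["defines"]) or set(b["imports"]) & set(a["defines"]):
        score += 3
    # 2. Symbol mentions: one file's text mentions a name the other defines.
    elif any(re.search(rf"\b{re.escape(s)}\b", a["text"]) for s in b["defines"]) or \
         any(re.search(rf"\b{re.escape(s)}\b", b["text"]) for s in a["defines"]):
        score += 2
    # 3. Directory proximity: a weak tie-breaker for same-package files.
    if os.path.dirname(a["path"]) == os.path.dirname(b["path"]):
        score += 1
    return score
```

For the User/UserService example from earlier, the import heuristic alone would link the two files even though they live in different packages.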

Clustering Algorithm

The batching system builds an undirected graph of files connected by the relationships above, then computes connected components as initial groups. This ensures that files that reference each other stay together in the same batch.
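The grouping step above is standard connected-components computation. A minimal sketch, assuming files are identified by path and edges come from the relatedness heuristics:

```python
def connected_components(files: list, edges: list) -> list:
    """Group files into initial batches as connected components of the
    relatedness graph, so files that reference each other stay together."""
    adj = {f: set() for f in files}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in files:
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            f = stack.pop()
            component.append(f)
            for n in adj[f]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        groups.append(sorted(component))
    return groups
```

Files with no relationships form singleton groups, which the packing step is then free to combine into shared batches.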

Packing Strategy

Groups are placed into batches using a first-fit-decreasing algorithm by estimated request size. This:

  • Preserves groups when possible (keeps related files together)
  • Maximizes batch size efficiency
  • Ensures each batch stays under the size limit
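First-fit-decreasing is a classic bin-packing heuristic: sort groups largest-first, then place each into the first batch with room, opening a new batch only when none fits. A sketch under assumed data shapes (groups as `(name, estimated_size)` pairs):

```python
def pack_groups(groups: list, budget: int) -> list:
    """First-fit-decreasing packing of related-file groups into batches.

    'groups' is a list of (group, estimated_size) pairs; each returned
    batch is a dict with its running 'size' and the 'groups' it holds.
    """
    batches = []
    # Largest groups first: they are hardest to place.
    for group, size in sorted(groups, key=lambda g: g[1], reverse=True):
        for batch in batches:
            if batch["size"] + size <= budget:  # first batch with room
                batch["groups"].append(group)
                batch["size"] += size
                break
        else:
            batches.append({"size": size, "groups": [group]})
    return batches
```

Because whole groups are placed as units, related files are only separated when a group by itself exceeds the budget, which the next section handles.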

Oversized Group Handling

If a single group of related files exceeds the budget, it's split by breadth-first search (BFS) order while preserving high-weight edges. This means the strongest relationships are maintained even when splitting is necessary.
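The split can be sketched as cutting a BFS traversal order into budget-sized chunks, so files tend to land next to their graph neighbors. This simplified version treats all edges equally; the edge-weight preference the article mentions is omitted.

```python
from collections import deque

def split_group_bfs(files: list, edges: list, sizes: dict, budget: int) -> list:
    """Split one oversized group into budget-sized chunks in BFS order."""
    adj = {f: [] for f in files}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    # Produce a BFS ordering so neighbors appear close together.
    order, seen = [], set()
    for start in files:
        if start in seen:
            continue
        seen.add(start)
        q = deque([start])
        while q:
            f = q.popleft()
            order.append(f)
            for n in adj[f]:
                if n not in seen:
                    seen.add(n)
                    q.append(n)
    # Cut the ordering into chunks that each fit the budget.
    chunks, current, used = [], [], 0
    for f in order:
        if current and used + sizes[f] > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(f)
        used += sizes[f]
    if current:
        chunks.append(current)
    return chunks
```

In a chain A-B-C-D with a budget of two files' worth of bytes, the cut falls between B and C, keeping each adjacent pair together.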

User Experience

When batching occurs, the tool window shows progress as "Batch X/Y", making it clear that multiple batches are being processed. Results are merged seamlessly across batches, so you see a unified view of all findings. Sorting, filtering, and search work across all results as if they came from a single analysis.
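Conceptually, merging is a flatten-and-sort over per-batch findings. The field names below (`file`, `line`, `message`) are illustrative assumptions, not the plugin's actual result schema:

```python
def merge_batch_results(batch_results: list) -> list:
    """Flatten findings from all batches into one unified list,
    ordered by file and line for a single consistent view."""
    merged = [finding for batch in batch_results for finding in batch]
    merged.sort(key=lambda f: (f["file"], f["line"]))
    return merged
```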

Default Batch Sizes

The system prefers batches of 3-7 files each when many files are present, but exact sizes vary based on:

  • Per-file diff and content size
  • Provider-specific size caps
  • Relatedness relationships

This balance ensures efficient processing while maintaining analysis quality.

Benefits of Intelligent Batching

Better Analysis Quality

By keeping related files together, the AI can understand context and relationships, leading to more accurate findings. For example, if you change both a model class and its service, analyzing them together helps the AI understand the full impact of your changes.

Automatic Handling

You don't need to manually split commits or worry about size limits. The plugin handles it automatically, so you can focus on your code.

Seamless Experience

Results are merged automatically, so you see a unified view regardless of how many batches were needed. You don't need to manually combine results or switch between different analyses.

When Batching Occurs

Batching is triggered when:

  • A commit changes many files (typically 10+ files)
  • Files are large (even a few large files can trigger batching)
  • Full file content is included (increases size significantly)
  • The combined size exceeds provider limits

Best Practices

Keep Commits Focused

While batching handles large commits well, smaller, focused commits are still better for code review and understanding. Consider splitting large changes into logical commits when possible.

Trust the System

The batching algorithm is designed to preserve relationships and maintain analysis quality. Trust that related files will be analyzed together, even across batches if necessary.

Review All Results

Since results are merged, make sure to review findings from all batches. The unified view makes this easy, but be aware that large commits may have many findings across multiple batches.

Technical Details

The batching system uses a fast request-size estimator that approximates JSON-wrapped payload size to stay under the cap per batch. This estimation happens before sending, ensuring efficient use of provider limits.
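A size estimator of this kind can be approximated by serializing a stand-in envelope. The envelope shape below is an assumption for illustration; the real request format is not documented here.

```python
import json

def estimate_request_bytes(files: list) -> int:
    """Approximate the JSON-wrapped payload size for a batch before
    sending, so the batch can be kept under the provider's cap."""
    envelope = {"files": [{"path": f["path"], "diff": f["diff"]} for f in files]}
    return len(json.dumps(envelope).encode("utf-8"))
```

The estimate is deliberately cheap: it only needs to be accurate enough to keep each batch safely under the cap, not byte-exact.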

Conclusion

Intelligent batching makes it possible to analyze large commits without manual intervention. By keeping related files together and automatically managing batch sizes, AI Diff Review ensures comprehensive analysis regardless of commit size.

The relatedness-first strategy maintains analysis quality while the automatic splitting ensures you can analyze commits of any size. Combined with seamless result merging, batching provides a smooth experience even for the largest changes.

Ready to analyze large commits? Install AI Diff Review and see how it handles your largest changes.