KV Cache Quantization Benchmark

Results for Qwen3-30B on LongBench-v2

Full Benchmark (16 Combinations)

Accuracy vs. Speed from the first benchmark run.

Synchronous Combinations (3-Test Average)

X-axis: Speed (further left is better) | Y-axis: Accuracy (higher is better) | Bubble size: VRAM usage (smaller is better).