MCG — Modular Compute Governor
Engine tuning for AI models. MCG analyses neural networks across architectures and identifies which parts are unnecessary, then removes them. The result is a standard model with fewer layers.
Shut off the cylinders you don't need
Like a modern engine that deactivates cylinders in city driving, MCG discovers which "cylinders" in a neural network can be shut off, and which ones must stay active. Every model has a different structure, so every model reveals different savings. Unlike compression, nothing is damaged: the remaining parts keep their full original precision.
Most approaches to efficient AI start from a finished model and try to make it smaller: pruning weights, reducing precision, or distilling into a smaller network. MCG takes a fundamentally different approach: it identifies redundancy during training itself.
Learns what matters
During training, MCG automatically discovers which parts of the network are critical and which are structurally redundant. Each architecture reveals a different optimal structure; there are no universal shortcuts.
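The source does not publish MCG's scoring method, but the idea of ranking layers by contribution can be sketched with a simple skip-ablation proxy: measure how much the loss worsens when each residual block is bypassed. Everything here (`TinyNet`, `score_layers`) is illustrative, not MCG's actual API.

```python
# Hypothetical sketch of per-layer contribution scoring via skip-ablation.
# A higher score means the layer matters more: skipping it hurts the loss.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, dim=16, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.head = nn.Linear(dim, 1)

    def forward(self, x, skip=None):
        for i, blk in enumerate(self.blocks):
            if i != skip:
                x = x + torch.relu(blk(x))  # residual block, can be bypassed
        return self.head(x)

def score_layers(model, x, y, loss_fn):
    """Loss increase when each block is skipped, relative to the full model."""
    with torch.no_grad():
        base = loss_fn(model(x), y).item()
        return [loss_fn(model(x, skip=i), y).item() - base
                for i in range(len(model.blocks))]

torch.manual_seed(0)
net, x, y = TinyNet(), torch.randn(8, 16), torch.randn(8, 1)
scores = score_layers(net, x, y, nn.MSELoss())
```

Blocks with scores near zero are candidates for removal; a real governor would track such signals continuously during training rather than in a one-off pass.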
Verified layer removal
Parts identified as low-contribution can be physically removed from the original model for inference, with no retraining required. The result is a standard model with fewer layers.
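Because the removed parts are whole layers, the operation itself is mechanically simple. A minimal sketch in PyTorch, assuming the network keeps its blocks in an `nn.ModuleList` (the helper name `remove_layers` is ours, not MCG's):

```python
# Hypothetical sketch: physically drop redundant blocks from a module list.
# The survivors keep their original weights at full precision.
import torch.nn as nn

def remove_layers(blocks: nn.ModuleList, redundant: set) -> nn.ModuleList:
    """Return a new ModuleList containing only the non-redundant blocks."""
    return nn.ModuleList([b for i, b in enumerate(blocks) if i not in redundant])

blocks = nn.ModuleList([nn.Linear(8, 8) for _ in range(6)])
smaller = remove_layers(blocks, {2, 4})  # drop blocks 2 and 4
assert len(smaller) == 4  # a standard, smaller model; no retraining step
```

The result is an ordinary model that any standard inference stack can load; nothing about the remaining weights is modified.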
Architecture-agnostic
Works on CNNs (ResNet, WideResNet), Vision Transformers (ViT), and LLMs (TinyLlama, Qwen, Llama). Verified across multiple architectures and scales from 11M to 72B parameters.
Additive, stacks with existing tools
MCG has been verified alongside quantization (4-bit) and LoRA. Layer removal compounds with your existing optimization stack.
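"Compounds" here means the savings multiply rather than compete: removing layers shrinks the model, and quantizing what remains shrinks it again. A back-of-envelope illustration (the specific numbers are ours, chosen to mirror the 72B example below):

```python
# Hypothetical arithmetic: layer removal and 4-bit quantization multiply.
layers_kept = 76 / 80          # e.g. 4 of 80 layers removed -> 95% kept
bytes_ratio = 0.5 / 2.0        # 4-bit weights vs. fp16 -> 25% of the bytes
memory_fraction = layers_kept * bytes_ratio  # ~0.24 of original weight memory
```

The same logic applies to LoRA: adapters attach to whichever layers survive, so governance and fine-tuning do not interfere.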
Tested across scales
During a short analysis phase, the model reveals which layers are critical and which are redundant. All results verified and reproducible.
Compute reduction during governed training
Verified on models up to 14B parameters. FLOP savings and quality measured during the governance phase vs. the unmodified dense baseline.
| Architecture | Parameters | FLOP reduction | Quality vs. baseline |
|---|---|---|---|
| ResNet-18 | 11M | 55% | Preserved |
| WideResNet-28-10 | 36M | 47% | Preserved |
| ViT-B/16 | 86M | 78% | Preserved |
| TinyLlama | 1.1B | 48% | Improved |
| Qwen-3B | 3B | 51% | Preserved |
| Qwen-7B | 7B | 48% | Improved |
| Llama-3-8B | 8B | 40% | Preserved |
| Qwen-14B | 14B | 35% | Improved |
72B: four layers removed, zero quality loss
On a 72-billion parameter model (80 layers), MCG identified four layers that can be removed together, with no quality degradation. MMLU score actually improved by 0.1 percentage points. A second independent run confirmed the result: different layers identified, same outcome.
Verified on Qwen-72B-Instruct. Two independent seeds. Layer removal applied to the original, unmodified model, with no retraining required.
MCG vs. existing approaches
MCG is not pruning. It does not remove weights or reduce precision. It governs compute allocation, a fundamentally different approach that preserves quality where other methods fail.
Pruning and compression
Existing methods remove or simplify weights after training. At 30%+ reduction, generation-based benchmarks collapse: the model loses coherent multi-step reasoning. Quality always degrades.
MCG governance
MCG identifies structural redundancy during training. The original model is then physically reduced based on what MCG discovered. All weights in the remaining layers stay intact. A different paradigm, not a better pruning method.