MCG — Modular Compute Governor

Engine tuning for AI models. MCG analyses neural networks across architectures and identifies which parts are unnecessary, then removes them. The result is a standard model with fewer layers.

Shut off the cylinders you don't need

Like a modern engine that deactivates cylinders in city driving, MCG discovers which "cylinders" in a neural network can be shut off, and which ones must stay active. Every model has a different structure, so every model reveals different savings. Unlike compression, nothing is damaged: the remaining parts keep their full original precision.

Most approaches to efficient AI start from a finished model and try to make it smaller: pruning weights, reducing precision, or distilling it into a smaller network. MCG takes a fundamentally different approach: it identifies redundancy during training itself.

Learns what matters

During training, MCG automatically discovers which parts of the network are critical and which are structurally redundant. Each architecture reveals a different optimal structure; there are no universal shortcuts.
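MCG's actual discovery mechanism is not detailed here. As a hedged sketch of the general idea, one simple proxy for per-layer importance is ablation: run the network with and without each layer and measure how much the output changes. The layer functions, scalar input, and threshold below are purely illustrative, not MCG's algorithm.

```python
# Illustrative sketch only: ablation-based layer importance.
# Layers are plain functions on a scalar so the example is self-contained.

def forward(layers, x):
    """Apply each layer in sequence."""
    for layer in layers:
        x = layer(x)
    return x

def layer_importance(layers, x):
    """Importance of layer i = |output change| when layer i is skipped."""
    baseline = forward(layers, x)
    scores = []
    for i in range(len(layers)):
        ablated = layers[:i] + layers[i + 1:]
        scores.append(abs(forward(ablated, x) - baseline))
    return scores

# Hypothetical 4-layer "network":
layers = [
    lambda x: x * 2.0,     # strongly transforms the signal: critical
    lambda x: x + 0.001,   # near-identity: structurally redundant
    lambda x: x - 1.0,     # shifts the signal: critical
    lambda x: x * 1.0001,  # near-identity: structurally redundant
]
scores = layer_importance(layers, x=3.0)
redundant = [i for i, s in enumerate(scores) if s < 0.01]  # -> [1, 3]
```

The redundant set differs for every layer stack, which mirrors the point above: each architecture reveals its own savings.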

Verified layer removal

Parts identified as low-contribution can be physically removed from the original model at inference time; no retraining is required. The result is a standard model with fewer layers.
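In a real transformer, physical removal would mean deleting entries from the model's layer stack (for example, a PyTorch `nn.ModuleList`). The toy sketch below uses plain functions and hypothetical layer indices so it stays self-contained; it is not MCG's implementation.

```python
# Illustrative sketch only: drop flagged layers, keep everything else intact.

def forward(layers, x):
    for layer in layers:
        x = layer(x)
    return x

def remove_layers(layers, drop_indices):
    """Return a standard, smaller model: same layer types, fewer layers."""
    drop = set(drop_indices)
    return [layer for i, layer in enumerate(layers) if i not in drop]

full = [lambda x: x * 2.0, lambda x: x + 0.001, lambda x: x - 1.0]
small = remove_layers(full, drop_indices=[1])  # index 1 flagged low-contribution

# The remaining layers are untouched, and the output barely moves:
assert len(small) == len(full) - 1
assert abs(forward(small, 3.0) - forward(full, 3.0)) < 0.01
```

Because the surviving layers keep their original weights, the reduced model needs no retraining and runs with any standard inference stack.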

Architecture-agnostic

Works on CNNs (ResNet, WideResNet), Vision Transformers (ViT), and LLMs (TinyLlama, Qwen, Llama). Verified across multiple architectures and scales from 11M to 72B parameters.

Additive, stacks with existing tools

MCG has been verified alongside quantization (4-bit) and LoRA. Layer removal compounds with your existing optimization stack.

Tested across scales

During a short analysis phase, the model reveals which layers are critical and which are redundant. All results verified and reproducible.

Compute reduction during governed training

Verified on models up to 14B parameters. FLOP savings and quality measured during the governance phase vs. the unmodified dense baseline.

Architecture       Parameters   FLOP reduction   Quality vs. baseline
ResNet-18          11M          55%              Preserved
WideResNet-28-10   36M          47%              Preserved
ViT-B/16           86M          78%              Preserved
TinyLlama          1.1B         48%              Improved
Qwen-3B            3B           51%              Preserved
Qwen-7B            7B           48%              Improved
Llama-3-8B         8B           40%              Preserved
Qwen-14B           14B          35%              Improved

72B: four layers removed, zero quality loss

On a 72-billion parameter model (80 layers), MCG identified four layers that can be removed together, with no quality degradation. MMLU score actually improved by 0.1 percentage points. A second independent run confirmed the result: different layers identified, same outcome.

Verified on Qwen-72B-Instruct with two independent seeds. Layer removal was applied to the original, unmodified model; no retraining required.

MCG vs. existing approaches

MCG is not pruning. It does not remove weights or reduce precision. It governs compute allocation: a fundamentally different approach that preserves quality where other methods fail.

Pruning and compression

Existing methods remove or simplify weights after training. At 30%+ reduction, generation-based benchmarks collapse. The model loses coherent multi-step ability. Quality always degrades.

MCG governance

MCG identifies structural redundancy during training. The original model is then physically reduced based on what MCG discovered. All weights in the remaining layers stay intact. A different paradigm, not a better pruning method.
