OhMyCalc

Inference Latency Calculator

Estimate inference latency for ML models by analyzing compute-bound and memory-bound components.

How to Use the Inference Latency Calculator

  1. Enter model size in millions of parameters.
  2. Set batch size and sequence length.
  3. Specify GPU TFLOPS and memory bandwidth.
  4. Click Calculate for latency analysis.

Use Cases

Formula

Latency = max(Compute, Memory)
Compute = 2·Params·Batch·Seq / TFLOPS
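
As a rough illustration, the formula can be sketched in Python. The page does not define the Memory term, so this sketch assumes it is the time to read every weight once from GPU memory (parameter count × bytes per parameter / memory bandwidth). The function name, its argument names, the GB/s bandwidth unit, and the 2-byte (FP16) default are all hypothetical choices made for this example, not the calculator's actual implementation.

```python
def estimate_latency(params_millions, batch_size, seq_len,
                     gpu_tflops, mem_bandwidth_gbs, bytes_per_param=2):
    """Return a rough latency estimate in milliseconds.

    Compute follows the page's formula: 2 * Params * Batch * Seq / TFLOPS.
    Memory is an ASSUMPTION (not given on the page): time to read all
    weights once, i.e. Params * bytes_per_param / bandwidth.
    """
    params = params_millions * 1e6                 # millions -> raw count

    # Compute-bound time: total FLOPs divided by peak FLOP/s.
    flops = 2 * params * batch_size * seq_len
    compute_s = flops / (gpu_tflops * 1e12)        # TFLOPS -> FLOP/s

    # Memory-bound time (assumed definition): bytes moved / bytes per second.
    memory_s = params * bytes_per_param / (mem_bandwidth_gbs * 1e9)

    # Latency = max(Compute, Memory); the larger term is the bottleneck.
    latency_s = max(compute_s, memory_s)
    return {
        "compute_ms": compute_s * 1e3,
        "memory_ms": memory_s * 1e3,
        "latency_ms": latency_s * 1e3,
        "bound": "compute" if compute_s >= memory_s else "memory",
    }


if __name__ == "__main__":
    # Hypothetical inputs: 7,000M parameters, batch 1, 512 tokens,
    # 100 TFLOPS, 1,000 GB/s memory bandwidth.
    print(estimate_latency(7000, 1, 512, 100, 1000))
```

With these hypothetical inputs the compute term is about 71.7 ms and the assumed memory term is 14 ms, so the estimate is compute-bound. Reducing the sequence length to 1 makes the memory term dominate instead.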

Frequently Asked Questions

How accurate is this calculator?
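Results come from the simplified formula above and are suitable for preliminary estimates. Measured latency also depends on factors the formula does not model, such as kernel efficiency and framework overhead.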
What units are used?
Standard IT units (requests/sec, ms, %, USD) are used unless otherwise noted.
Is it free?
Yes, all calculators are completely free.