OhMyCalc

Inference Latency Calculator

Estimate inference latency for ML models by analyzing compute-bound and memory-bound components.

How to Use the Inference Latency Calculator

  1. Enter model size in millions of parameters.
  2. Set batch size and sequence length.
  3. Specify GPU TFLOPS and memory bandwidth.
  4. Click Calculate for latency analysis.

Usage Example
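As an illustrative run (all numbers are assumptions, not calculator defaults): a 7B-parameter model at batch 1 and sequence length 512 on an A100-class GPU (312 TFLOPS fp16, 2039 GB/s memory bandwidth), with weights stored in fp16 (2 bytes per parameter):

```python
# Illustrative numbers only: 7B params, batch 1, seq 512,
# 312 TFLOPS (A100 fp16), 2039 GB/s memory bandwidth.
compute = 2 * 7e9 * 1 * 512 / 312e12  # compute-bound time: FLOPs / peak FLOP/s
memory = 7e9 * 2 / 2039e9             # memory-bound time: fp16 weight bytes / bandwidth
latency = max(compute, memory)        # roofline-style bound, ~0.023 s
```

Here the compute term (~23 ms) dominates the memory term (~7 ms), so the estimate is compute-bound.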

Formula

Latency = max(Compute, Memory); Compute = 2·Params·Batch·Seq / (TFLOPS·10¹²); Memory = Params·BytesPerParam / Bandwidth
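The formula above can be sketched as a small function. The parameter names, the GB/s bandwidth unit, and the default of 2 bytes per parameter (fp16 weights) are assumptions for illustration, not the calculator's actual implementation:

```python
def inference_latency_ms(params_millions, batch, seq, tflops,
                         bandwidth_gbps, bytes_per_param=2):
    """Estimate latency as the max of compute-bound and memory-bound time."""
    params = params_millions * 1e6
    # Compute-bound: ~2 FLOPs per parameter per token, over peak FLOP/s.
    compute_s = 2 * params * batch * seq / (tflops * 1e12)
    # Memory-bound: time to stream the weights once over memory bandwidth.
    memory_s = params * bytes_per_param / (bandwidth_gbps * 1e9)
    return max(compute_s, memory_s) * 1000  # milliseconds
```

Taking the larger of the two terms reflects that compute and weight loading overlap on modern GPUs, so the slower one sets the floor on latency.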

Frequently Asked Questions

How accurate is this calculator?
Results are based on standard roofline-style formulas and are suitable for preliminary estimates; measured latency will also depend on kernel efficiency and framework overhead.
What units are used?
Standard IT units (requests/sec, ms, %, USD) are used unless otherwise noted.
Is it free?
Yes, all calculators are completely free.