OhMyCalc

Inference Latency Calculator

Estimate inference latency for ML models by analyzing compute-bound and memory-bound components.

How to Use the Inference Latency Calculator

  1. Enter model size in millions of parameters.
  2. Set batch size and sequence length.
  3. Specify GPU TFLOPS and memory bandwidth.
  4. Click Calculate for latency analysis.

Use Cases

Formula

Latency = max(Compute, Memory)
Compute = 2 · Params · Batch · Seq / TFLOPS
Memory = Params · BytesPerParam / Bandwidth
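The roofline-style estimate above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the function name, the GB/s bandwidth unit, and the FP16 default of 2 bytes per parameter are assumptions for the example.

```python
def inference_latency_ms(params_m, batch, seq_len, tflops, bandwidth_gbs,
                         bytes_per_param=2):
    """Estimate inference latency (ms) as the max of compute- and
    memory-bound time, per the roofline formula above.

    Hypothetical sketch; assumes FP16 weights (2 bytes/param) by default.
    """
    params = params_m * 1e6  # model size given in millions of parameters
    # Compute-bound time: ~2 FLOPs per parameter per token
    compute_s = 2 * params * batch * seq_len / (tflops * 1e12)
    # Memory-bound time: reading all weights once per forward pass
    memory_s = params * bytes_per_param / (bandwidth_gbs * 1e9)
    return max(compute_s, memory_s) * 1000


# Example: a 7,000M-parameter model decoding one token (batch 1, seq 1)
# on a GPU with 312 TFLOPS and 2039 GB/s is memory-bound at roughly 7 ms.
print(inference_latency_ms(7000, 1, 1, 312, 2039))
```

At small batch sizes the memory term dominates (weight reads), while large batches shift the bottleneck to the compute term, which is why the formula takes the max of the two.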

Frequently Asked Questions

How accurate is this calculator?
Results are based on a standard roofline model (the maximum of compute-bound and memory-bound time) and are suitable for preliminary, order-of-magnitude estimates rather than precise benchmarks.
What units are used?
Standard IT units (requests/sec, ms, %, USD) are used unless otherwise noted.
Is it free?
Yes, all calculators are completely free.