LLM VRAM & Time Estimator
Plan memory and runtime for full fine-tuning, LoRA, or RL (GRPO).

Hugging Face Model ID (e.g., meta-llama/Llama-2-7b-hf)
  Default: meta-llama/Llama-2-7b-hf
Training Mode: fp32 | fp16 | lora
Compute Precision (weights/grads): fp32 | fp16
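
The precision choice fixes the bytes per parameter for the resident weights and, in full fine-tuning, their gradients (in lora mode gradients cover only the adapters). A minimal sketch of that baseline term, assuming the standard 4 bytes for fp32 and 2 bytes for fp16 and decimal gigabytes; how the tool computes it internally is an assumption here:

    # Baseline memory for model weights plus same-precision gradients.
    # Assumption: 4 bytes/param fp32, 2 bytes/param fp16, decimal GB.
    BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

    def weights_and_grads_gb(n_params: int, precision: str) -> float:
        b = BYTES_PER_PARAM[precision]
        return n_params * b * 2 / 1e9  # x2: weights + gradients

    print(weights_and_grads_gb(7_000_000_000, "fp16"))  # 28.0 GB for a 7B model
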

LoRA Settings
  LoRA Rank (r): 1–256
  LoRA Alpha (scaling): 1–256
  Target module name filters (comma-separated)
    Default: q_proj,k_proj,v_proj,o_proj,up_proj,down_proj,gate_proj
  Show per-layer LoRA table
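
The per-layer table follows from the LoRA factorization itself: each adapted linear layer gains two small matrices. A sketch assuming every matched module is a linear layer of shape (d_out, d_in); for a real checkpoint the shapes would come from the model config:

    # LoRA adds A (r x d_in) and B (d_out x r) per targeted linear layer,
    # so each module contributes r * (d_in + d_out) trainable params.
    # Alpha only rescales the update (alpha / r); it adds no parameters.
    def lora_params_per_module(d_in: int, d_out: int, r: int) -> int:
        return r * (d_in + d_out)

    # Example: a 4096x4096 q_proj (Llama-2-7B-sized) at r=8.
    p = lora_params_per_module(4096, 4096, r=8)
    print(p)                          # 65536 trainable params
    print(p * 2 / 2**20, "MiB fp16")  # 2 bytes/param -> 0.125 MiB
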

Batch & Sequence
  Max Sequence Length: 128–16384
  Micro-batch Size (per GPU): 1–1024
  Gradient Accumulation Steps: 1–1024
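
Together with world size, these three inputs set how many tokens each optimizer step consumes; a sketch of the usual arithmetic (whether the tool reports this exact quantity is an assumption):

    # Tokens consumed per optimizer step across all GPUs.
    def tokens_per_step(seq_len: int, micro_batch: int,
                        grad_accum: int, world_size: int) -> int:
        return seq_len * micro_batch * grad_accum * world_size

    print(tokens_per_step(4096, 4, 8, 8))  # 1048576 tokens (~1M) per step
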

Optimizer & Activations
  Optimizer
  Keep FP32 master weights (mixed precision)
  Activation Checkpointing
  Activation Checkpointing Reduction Factor: 0.3–1
  Activation Memory Multiplier (heuristic): 0.5–4
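
Optimizer state usually dominates full fine-tuning. The byte counts below are the standard Adam mixed-precision figures (fp32 master copy plus two fp32 moments); the activation formula is only a stand-in for whatever heuristic the multiplier and reduction factor actually feed:

    # Adam with fp32 master weights: 4 (master) + 4 (m) + 4 (v) = 12 B/param.
    def optimizer_state_gb(n_params: int, fp32_master: bool = True) -> float:
        bytes_per = 8 + (4 if fp32_master else 0)
        return n_params * bytes_per / 1e9

    # Hypothetical activation heuristic: scales with batch*seq*hidden*layers,
    # then shrinks by the checkpointing reduction factor.
    def activation_gb(micro_batch, seq_len, hidden, n_layers,
                      multiplier=1.0, ckpt_factor=1.0, bytes_per=2):
        return (multiplier * ckpt_factor * micro_batch * seq_len
                * hidden * n_layers * bytes_per) / 1e9

    print(optimizer_state_gb(7_000_000_000))            # 84.0 GB for 7B params
    print(activation_gb(4, 4096, 4096, 32, 1.0, 0.3))   # ~1.3 GB
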

Parallelism / Memory Strategies
  World Size (number of GPUs): 1–256
  DeepSpeed ZeRO Stage: None | ZeRO-1 | ZeRO-2 | ZeRO-3
  Enable FSDP (Fully Sharded Data Parallel)
  FSDP Auto Wrap Policy
  CPU Offload (optimizer/params)
  Temp AllGather Overhead (fraction): 0–0.5
  Safety Headroom (fraction): 0–0.5
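
ZeRO stages shard progressively more state across the world size. The stage-to-state mapping below is standard ZeRO behavior; how the two overhead fractions multiply into this tool's estimate is an assumption of the sketch:

    # ZeRO-1 shards optimizer states, ZeRO-2 adds gradients,
    # ZeRO-3 adds parameters; FSDP full-shard behaves like stage 3.
    def sharded_total_gb(params_gb, grads_gb, optim_gb, zero_stage,
                         world_size, allgather_frac=0.0, headroom_frac=0.0):
        if zero_stage >= 1:
            optim_gb /= world_size
        if zero_stage >= 2:
            grads_gb /= world_size
        if zero_stage >= 3:
            params_gb /= world_size
        total = params_gb + grads_gb + optim_gb
        total *= 1.0 + allgather_frac          # temporary all-gather buffers
        return total * (1.0 + headroom_frac)   # safety headroom

    # 7B fp16 weights/grads (14 GB each) + Adam states (84 GB) on 8 GPUs:
    print(round(sharded_total_gb(14, 14, 84, 3, 8, 0.1, 0.1), 1))  # 16.9 GB
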

Reinforcement Learning (GRPO) Settings
  Enable RL with verifiable rewards (GRPO-like)
  Load Reference Model (same as policy)
  Reference Model Precision: fp16 | fp32
  Rollout Batch Size (per step): 1–8192
  Number of Rollouts per Step: 1–64
  Reward Model Size (B params): 0–20
  Verifier / Extra Overheads (GB): 0–40
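
Enabling RL adds resident memory on top of the policy: a frozen reference model for the KL term, optionally a reward model, and any verifier overhead. A sketch assuming all extra models sit on-GPU at the chosen precision; rollout batch size and rollouts per step mainly drive generation-time activation and KV-cache memory, which this sketch does not model:

    # Extra resident memory for GRPO-like RL, beyond the policy itself.
    PREC_BYTES = {"fp16": 2, "fp32": 4}

    def rl_extra_gb(policy_params_b: float, load_reference: bool,
                    ref_precision: str, reward_model_b: float = 0.0,
                    verifier_gb: float = 0.0) -> float:
        ref = policy_params_b * PREC_BYTES[ref_precision] if load_reference else 0.0
        reward = reward_model_b * PREC_BYTES["fp16"]  # assume fp16 reward model
        return ref + reward + verifier_gb

    # 7B policy with an fp16 reference, 1B reward model, 2 GB verifier:
    print(rl_extra_gb(7, True, "fp16", 1, 2))  # 14 + 2 + 2 = 18.0 GB
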

Training Time & Cost
  Total Tokens to Train (Billions): 0.1–200
  Throughput (tokens/sec) per GPU: 1000–500000
  Cost per GPU-hour (USD): 0–100
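
The time and cost fields combine in the obvious way; whether the tool applies exactly this formula is an assumption, but the arithmetic is worth seeing once:

    # Wall-clock time and cost from aggregate throughput.
    def time_and_cost(total_tokens_b: float, tok_per_sec_per_gpu: float,
                      world_size: int, usd_per_gpu_hour: float):
        seconds = total_tokens_b * 1e9 / (tok_per_sec_per_gpu * world_size)
        wall_hours = seconds / 3600
        cost = wall_hours * world_size * usd_per_gpu_hour
        return wall_hours, cost

    hours, usd = time_and_cost(10, 3000, 8, 2.0)
    print(f"{hours:.1f} h wall-clock, ${usd:,.0f}")  # 115.7 h, $1,852
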
Estimate VRAM & Time

Outputs
  VRAM Breakdown (after sharding + overhead): table with columns component, GB
  Per-layer LoRA Params: table with columns layer, lora_params, lora_params_millions, memory_fp16_MB
  Raw VRAM JSON
  Raw Time/Cost JSON