Passioni - Armando Passaro

Guide: GPU benchmarks, measuring performance

04/03/2026 21:16

Testing and comparing the homelab GPUs

Before deciding which GPU to assign to which workload, it helps to measure each card's performance with AI-specific benchmarks.

1. llama-bench (llama.cpp)

# The standard benchmark for LLM inference
# -m: model file, -ngl 99: offload all layers to the GPU, -t 8: CPU threads
./llama-bench -m modello.gguf -ngl 99 -t 8

# Output: tokens/second for prompt processing and for generation
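
To compare cards side by side it can help to reduce the llama-bench table to plain "test, tokens/s" pairs. The following is a minimal sketch, not part of llama.cpp: the helper name parse_llama_bench is hypothetical, and it assumes the default markdown output, where the last two columns are the test label and the "t/s" value (mean ± deviation). Column layout can differ between llama.cpp versions, so check it against your own output first.

```shell
# Hypothetical helper: reads llama-bench markdown output on stdin and
# prints "test<TAB>mean tokens/s" for every result row.
# Assumption: the last two table columns are the test label and "t/s".
parse_llama_bench() {
    awk -F'|' '
        NF > 3 && $0 !~ /---/ {
            test = $(NF-2); rate = $(NF-1)
            gsub(/^ +| +$/, "", test)     # trim padding around the label
            gsub(/^ +| +$/, "", rate)     # trim padding around the value
            if (rate == "t/s") next       # skip the header row
            split(rate, parts, " ")       # keep the mean, drop the deviation
            printf "%s\t%s\n", test, parts[1]
        }'
}

# Possible usage (-p, -n and -r set prompt length, generated tokens, repetitions):
# ./llama-bench -m modello.gguf -ngl 99 -t 8 -p 512 -n 128 -r 5 | parse_llama_bench
```

Lines that are not table rows (log messages, the build line) contain no pipe characters and are ignored.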

2. Benchmarking with Ollama

# Measure generation speed
time ollama run llama3.2 --verbose "Write a paragraph about computer security"

# The verbose output shows:
# - prompt eval rate: X tokens/s
# - eval rate: X tokens/s
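
A single run is noisy, so averaging a few runs gives a steadier number. The sketch below pulls the "eval rate" figure out of the verbose statistics and averages it. The helper names eval_rate and mean are hypothetical, and the usage line assumes Ollama writes the statistics to stderr; if your version prints them elsewhere, adjust the redirection.

```shell
# Hypothetical helper: reads verbose statistics on stdin and prints the
# generation speed. Matches "eval rate:" but not "prompt eval rate:".
eval_rate() {
    awk '$1 == "eval" && $2 == "rate:" { print $3 }'
}

# Hypothetical helper: averages one number per line read from stdin.
mean() {
    awk '{ sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n }'
}

# Possible usage: three runs, model output discarded, statistics kept.
# for i in 1 2 3; do
#     ollama run llama3.2 --verbose "Write a paragraph about computer security" \
#         2>&1 >/dev/null | eval_rate
# done | mean
```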

3. CUDA bandwidth test

# Measure GPU memory bandwidth
/usr/local/cuda/extras/demo_suite/bandwidthTest
# Result: GB/s for Host to Device, Device to Host, and Device to Device transfers

4. Typical results on this infrastructure

GPU	LLM model	Tokens/s
Tesla P4	Llama 3.2 8B Q4	~15-20
Tesla P100	Llama 3.2 8B Q4	~25-35
RTX 3060	Llama 3.2 8B Q4	~35-45
Tesla P100	Mistral 7B Q4	~30-40
2x RTX 3060	Llama 3.2 70B Q4	~8-12

5. GPU stress test

# With gpu-burn
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn && make
./gpu_burn 60  # 60-second stress test
# Monitor the temperature with nvidia-smi in parallel
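
One way to do the parallel monitoring is to let nvidia-smi log to a file in the background while gpu_burn runs, then read the peak temperature per card afterwards. This is a sketch under assumptions: the function names burn_with_log and max_temp_per_gpu and the log file name are hypothetical, and it expects to be run from inside the gpu-burn directory.

```shell
# Hypothetical wrapper: runs gpu_burn for $1 seconds while logging
# "index, temperature" once per second to the CSV file $2.
burn_with_log() {
    seconds=$1
    log=$2
    nvidia-smi --query-gpu=index,temperature.gpu \
               --format=csv,noheader,nounits -l 1 > "$log" &
    logger_pid=$!
    ./gpu_burn "$seconds"
    kill "$logger_pid"        # stop the background logger
}

# Hypothetical helper: prints the highest logged temperature for each GPU.
max_temp_per_gpu() {
    awk -F', *' '
        { t = $2 + 0                                   # force numeric compare
          if (!($1 in max) || t > max[$1]) max[$1] = t }
        END { for (g in max) printf "GPU %s: %s C\n", g, max[g] }
    ' "$1" | sort
}

# Possible usage:
# burn_with_log 60 gpu_temps.csv
# max_temp_per_gpu gpu_temps.csv
```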

These benchmarks help decide which GPU to assign to which workload: the P4s for light tasks, the P100s for medium-sized models, and the RTX 3060s for fast inference.