What GPU Do I Need for AI Agents
The Answer Depends on Your Model Size
GPU selection for AI agents is driven almost entirely by one specification: VRAM (Video RAM) capacity. VRAM determines which models fit in GPU memory, and models that fit in VRAM run 5x to 20x faster than models that spill to system RAM. The most common mistake is buying a GPU with fast compute but insufficient VRAM, only to discover that the target model cannot load entirely on the card.
The VRAM rule of thumb is simple: at Q4 quantization (the most common format for local inference), a model needs approximately 0.5 GB per billion parameters for weights, plus 1 to 3 GB of overhead for KV-cache and runtime. A 7B model needs about 5 GB total. A 13B model needs about 8 GB. A 30B model needs about 18 GB. A 70B model needs about 38 GB.
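The rule of thumb above fits in a small calculator. This is a minimal sketch. It assumes weight memory scales linearly with bits per weight, so 4 bits gives the 0.5 GB per billion parameters quoted above, and it treats the 1 to 3 GB of overhead as a range. Real quantization formats store slightly more than their nominal bit width. The function name `estimate_vram_gb` is purely illustrative.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.0,
                     overhead_gb: tuple = (1.0, 3.0)) -> tuple:
    """Return a (low, high) VRAM estimate in GB from the rule of thumb.

    Assumption: weights take params * bits / 8 bytes, so Q4 is about
    0.5 GB per billion parameters; overhead covers KV-cache and runtime.
    """
    weights_gb = params_billion * bits_per_weight / 8
    low_overhead, high_overhead = overhead_gb
    return (weights_gb + low_overhead, weights_gb + high_overhead)


if __name__ == "__main__":
    for size in (7, 13, 30, 70):
        low, high = estimate_vram_gb(size)
        print(f"{size}B at Q4: {low:.1f} to {high:.1f} GB")
```

The totals quoted above (about 5, 8, 18, and 38 GB) fall inside these ranges. Passing `bits_per_weight=8` gives a rough Q8 estimate.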
NVIDIA vs AMD vs Apple Silicon
NVIDIA is the default recommendation for AI GPUs. The CUDA software ecosystem, Tensor Core acceleration, and near-universal framework compatibility make NVIDIA GPUs work out of the box with every major AI tool. If you want the simplest setup experience and broadest compatibility, choose NVIDIA.
AMD GPUs offer more VRAM per dollar in some cases (the RX 7900 XTX provides 24 GB for $600 to $700 versus $1,600 for the RTX 4090 with the same VRAM), but the ROCm software ecosystem requires more setup effort and has occasional compatibility gaps with specific frameworks and libraries. AMD is a good choice for experienced Linux users who value hardware cost savings.
Apple Silicon Macs use unified memory, where system RAM serves as VRAM. An M4 Max MacBook Pro with 128 GB of unified memory can run 70B models that would require expensive discrete GPUs on a PC. The trade-off is lower inference speed: Apple Silicon produces roughly one-third to one-fifth the tokens per second of a comparable high-end NVIDIA GPU, because its memory bandwidth is lower. Choose Apple Silicon when you need to run very large models on a single machine with minimal setup.
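A common first-order approximation makes the bandwidth argument concrete. During token generation, each new token requires reading roughly all model weights from memory once. The speed ceiling is therefore about memory bandwidth divided by the size of the weights. The sketch below assumes that model. The bandwidth figures in it are hypothetical and are not vendor specifications.

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on generation speed for a memory-bandwidth-bound model.

    Assumption: each generated token reads every weight once, so the
    ceiling is bandwidth divided by the size of the weights. Real
    throughput is lower (KV-cache reads, kernel efficiency, compute limits).
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # Hypothetical bandwidths for illustration only, not vendor specifications.
    weights_70b_q4_gb = 35.0  # 70B parameters x 0.5 GB per billion at Q4
    for bandwidth in (400.0, 800.0):
        ceiling = max_decode_tokens_per_sec(bandwidth, weights_70b_q4_gb)
        print(f"{bandwidth:.0f} GB/s -> at most {ceiling:.1f} tokens/s")
```

Under this approximation the speed ceiling scales directly with bandwidth: halving the bandwidth halves the achievable tokens per second for the same model.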
Budget-Based Recommendations
Under $200: Used GTX 1070 8 GB ($120) or GTX 1080 Ti 11 GB ($180). Both handle 7B models at Q4. The 1080 Ti's 11 GB also fits 7B at Q8, which is tight or out of reach on 8 GB. The 1080 Ti is worth the extra $60 for the additional 3 GB of VRAM.
$200 to $500: RTX 3060 12 GB ($200 used) or RTX 4060 Ti 16 GB ($450 new). Both handle 7B to 13B models comfortably. The 3060 offers better VRAM value; the 4060 Ti offers newer Tensor Cores and lower power consumption.
$500 to $1,000: RTX 3090 24 GB ($700 to $800 used). This is the single best value GPU for AI. The 24 GB of VRAM handles 13B Q8 and 30B Q4. It can even run 70B Q4 with partial offloading to system RAM, though at a substantial speed penalty. Nothing else in this price range comes close.
$1,000 to $2,000: RTX 4090 24 GB ($1,600 new) or RTX 5090 32 GB ($2,000 new). The 4090 delivers roughly 1.5x the inference speed of the 3090 with the same VRAM. The 5090 adds 8 GB of VRAM (32 GB total), which brings 30B Q8 models onto a single card. However, roughly 30 GB of Q8 weights leaves little headroom for KV-cache.
Over $2,000: Dual RTX 3090s ($1,500 used, 48 GB), A100 80 GB ($10,000 used), or cloud instances for occasional high-end workloads. At this level, evaluate whether building or renting makes more financial sense for your usage patterns.
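The tiers above can be turned into a lookup that picks the cheapest listed option with enough VRAM for a given model. This is a minimal sketch. It uses the lower end of each price quoted in the list. It assumes the inference runtime can split one model across both cards of a dual-3090 build. The names `GpuOption` and `cheapest_fit` are hypothetical.

```python
from typing import NamedTuple, Optional


class GpuOption(NamedTuple):
    name: str
    vram_gb: int
    price_usd: int  # lower end of the price quoted in the text


# Options and prices as quoted in the budget tiers (used or new).
OPTIONS = [
    GpuOption("GTX 1070 8 GB (used)", 8, 120),
    GpuOption("GTX 1080 Ti 11 GB (used)", 11, 180),
    GpuOption("RTX 3060 12 GB (used)", 12, 200),
    GpuOption("RTX 4060 Ti 16 GB (new)", 16, 450),
    GpuOption("RTX 3090 24 GB (used)", 24, 700),
    GpuOption("RTX 4090 24 GB (new)", 24, 1600),
    GpuOption("RTX 5090 32 GB (new)", 32, 2000),
    # Assumption: the runtime splits the model across both cards.
    GpuOption("Dual RTX 3090 48 GB (used)", 48, 1500),
    GpuOption("A100 80 GB (used)", 80, 10000),
]


def cheapest_fit(required_gb: float) -> Optional[GpuOption]:
    """Return the lowest-priced option whose VRAM covers required_gb, or None."""
    candidates = [o for o in OPTIONS if o.vram_gb >= required_gb]
    return min(candidates, key=lambda o: o.price_usd, default=None)


if __name__ == "__main__":
    # Q4 totals for 7B/13B/30B/70B using the high (3 GB) overhead estimate.
    for need in (6.5, 9.5, 18, 38):
        pick = cheapest_fit(need)
        if pick is None:
            print(f"{need} GB -> no single listed option; consider renting")
        else:
            print(f"{need} GB -> {pick.name} (${pick.price_usd})")
```

Price per gigabyte of VRAM is not the only factor. The trade-offs noted above, such as newer Tensor Cores or lower power draw, can still justify a pricier card.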
Do Not Overbuy
Another common mistake in GPU selection is buying more GPU than you need based on aspirational rather than actual workloads. If you currently use 7B models and might someday want to try 70B models, buy for your current workload and rent cloud instances for the occasional experiment. An RTX 3060 12 GB at $200 handles your daily 7B workload. A cloud A100 instance at $1.50 per hour covers the rare 70B experiment for far less than hardware that would sit underutilized.
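A simple break-even calculation shows why renting wins for occasional use. This is a minimal sketch. It ignores electricity, resale value, and cloud storage or data-transfer fees. It compares the $1.50 per hour A100 rate above with the extra cost of dual used RTX 3090s ($1,500) over an RTX 3060 ($200). The function name `break_even_hours` is hypothetical.

```python
def break_even_hours(hardware_cost_usd: float, cloud_rate_usd_per_hour: float) -> float:
    """Hours of cloud rental that cost the same as buying the hardware.

    Simplifying assumptions: ignores electricity, resale value, and any
    cloud storage or data-transfer fees.
    """
    if cloud_rate_usd_per_hour <= 0:
        raise ValueError("cloud rate must be positive")
    return hardware_cost_usd / cloud_rate_usd_per_hour


if __name__ == "__main__":
    # Extra cost of dual used RTX 3090s over an RTX 3060, vs. a $1.50/hour A100.
    extra_hardware_usd = 1500 - 200
    hours = break_even_hours(extra_hardware_usd, 1.50)
    print(f"Renting is cheaper until about {hours:.0f} hours of 70B use")
```

At a few hours of 70B experiments per month, that threshold is years away.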
Start with the GPU that matches your current needs, validate that local inference meets your requirements, and upgrade when your actual workload demands it. The used GPU market makes upgrading straightforward: sell your current card and put the proceeds toward the next tier.
Match your GPU VRAM to your target model size: 8 GB minimum for 7B models, 12 to 16 GB for 13B models, 24 GB for 30B models, and 40 GB or more for 70B models. The NVIDIA RTX 3090 24 GB at $700 to $800 used is the best overall value for most AI workloads.