Hardware Options for Running Local LLMs 2025

Explore the optimal hardware for running large language models (LLMs) locally, from entry-level edge devices to enterprise-grade systems.

Entry-Level Edge Devices

  • Raspberry Pi AI Kit with Hailo-8L Accelerator

    • M.2 HAT+ & Hailo-8L Module: Pre-installed 13 TOPS AI accelerator for Raspberry Pi 5
    • Performance: 13 tera-operations per second (INT8)
    • Price: $70
  • Hailo-8 M.2 AI Acceleration Kit (26 TOPS)

    • 26 TOPS Performance: Double the compute power of Hailo-8L
    • PCIe Gen3 x4 Interface: 4-lane connection for maximum bandwidth
    • Thermal Design: Supports -40°C to 85°C operation with optional cooling fan
    • Compatibility: Requires PCIe-to-M.2 adapter for Raspberry Pi 5 (included in kit)
    • Framework Support: TensorFlow, PyTorch, ONNX, Keras
    • Use Cases: Multi-model inference, high-resolution video analysis
    • Price: $140 (kit includes M.2 module, adapter, thermal pad)
    • Key Features:
      • 2.5W typical power consumption (8.65W max)
      • CE/FCC certified for industrial deployment
      • Supports simultaneous 4K@60fps video processing
    • Comparison vs. 13 TOPS Model:
      • 100% higher AI performance
      • 50% faster model inference times
      • 2× larger neural network capacity
  • NVIDIA Jetson Orin Nano Super

  1. Base Model (8GB)
  • Price: $249 (Developer Kit)
  • AI Performance: 40 TOPS (67 TOPS in Super Mode)
  • GPU: 1024-core Ampere (1.02 GHz Max)
  • CPU: 6-core Arm Cortex-A78AE @ 1.7 GHz
  • Memory: 8GB LPDDR5 (102 GB/s bandwidth in Super Mode)
  • Power: 7-25W configurable
  2. 4GB Lite Version
  • Price: $199
  • AI Performance: 34 TOPS (Max)
  • GPU: 512-core Ampere
  • Memory: 4GB LPDDR5 (51 GB/s)

Key Features:

  • Overclocking Capability: Existing owners can unlock "Super Mode" via software update (JetPack 6.2+)
  • Camera Support: 8x MIPI CSI-2 lanes for 4K60 video input
  • Real-World Performance:
    • 13B parameter LLM @ 18 tokens/sec (4-bit quantized)
    • 70% faster generative AI performance than the previous-generation Jetson Orin Nano
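
To sanity-check quoted tokens-per-second numbers on a board like this, it is easy to time a generation yourself. The following is a minimal sketch, assuming an Ollama server is already running locally with a quantized model pulled (the model tag is illustrative); it reads the eval_count and eval_duration fields that Ollama returns from its /api/generate endpoint.

```python
import requests

# Assumes a local Ollama server with a 4-bit model already pulled,
# e.g. `ollama pull llama2:13b` (the model tag is illustrative).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2:13b",
        "prompt": "Explain quantization in one paragraph.",
        "stream": False,
    },
    timeout=600,
)
resp.raise_for_status()
data = resp.json()

# Ollama reports generation statistics in nanoseconds.
tokens = data["eval_count"]
seconds = data["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/sec")
```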

Specialized & Alternative Accelerators

  • Google Coral Edge TPU, Intel Neural Compute Stick 2
    • For tiny models and edge inference, but limited for LLMs
    • Approximate price: $40–$100

Performance Comparison Table (Entry-Level)

| Feature | Hailo-8 (26 TOPS) + RPi 5 | Hailo-8L (13 TOPS) + RPi 5 | NVIDIA Jetson Orin Nano Super |
|---|---|---|---|
| Price | $140 | $70 | $249 |
| AI Performance (TOPS) | 26 | 13 | 67 |
| Power Consumption | 2.5–8.65W | 1–5W | 7–25W |
| LLM Support | 3–5B params | 1–3B params | 7–13B params |
| Camera Streams | 4× 4K@30fps | 2× 4K@30fps | 8× 4K@30fps |
| Thermal Design | Active cooling option | Passive only | Active cooling required |

Mid-Range Laptops & Mini-PCs

  • AMD Ryzen 7 8745HS with 64GB RAM

    • 8 cores / 16 threads, Zen 4 (Hawk Point), 3.8–4.9 GHz
    • Integrated Radeon 780M (12 CUs, up to 4 TFLOPS)
    • Supports up to 256GB DDR5/LPDDR5x (dual channel, up to 120GB/s bandwidth)
    • 35–54W configurable TDP
    • Can run quantized LLMs up to 7B–13B parameters; performance is limited by memory bandwidth and iGPU speed, but 64GB RAM allows for larger context windows and models
    • For experimentation, local chatbots, and small-scale inference
    • NOTE: Linux driver support for this platform is currently incomplete
    • Approximate price (mini-PC): $440–$549 (barebones or with 16GB RAM; expect $700–$900 with 64GB RAM)

    (Figure: AMD Ryzen 7 8745HS with 64GB RAM performance results)

  • AMD Ryzen AI 9 HX 370 Systems

    • Architecture: Zen 5 (4 cores) + Zen 5c (8 cores), 24 threads
    • Clock Speeds: 2.0 GHz base, up to 5.1 GHz boost
    • NPU: XDNA 2 with 50 TOPS AI performance (80 TOPS combined with CPU/GPU)
    • GPU: Radeon 890M (16 RDNA 3.5 cores @ 2.9 GHz)
    • Memory: LPDDR5x-8000 (up to 256GB)
    • TDP: 28W default (configurable 15–54W)
    • Performance Highlights
      • Handles 13B-30B parameter LLMs with 4-bit quantization
      • 80 TOPS total AI processing for real-time inference
  • AMD Ryzen AI Max+ 395 Systems

    • Architecture: Zen 5 (16 cores/32 threads)
    • Clock Speeds: 3.0 GHz base, up to 5.1 GHz boost
    • NPU: XDNA 2 with 50 TOPS AI performance (126 TOPS combined with CPU/GPU)
    • GPU: Radeon 8060S (40 RDNA 3.5 CUs @ 2.9 GHz)
    • Memory: LPDDR5x-8000 (up to 128GB, 96GB allocatable as VRAM)
    • TDP: 45-120W configurable
    • PCIe Support: 4.0 x16 lanes
    • Thermal Design: Honeywell PTM7958 phase-change material + 6 heatpipes
    • Performance Highlights
      • Handles 70B+ parameter LLMs with 4-bit quantization
      • 29.7 TFLOPS GPU performance (rivals RTX 4070 Mobile)
      • 273 GB/s memory bandwidth via 256-bit bus
  • Apple M1/M2/M3/M4 (with 32–64GB RAM)

    • Unified memory, strong neural engine
    • Efficient for 7B–13B parameter models with frameworks like MLX
    • Approximate price: $1,800–$3,500 (varies by model and RAM)
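
For Apple silicon, a minimal sketch using the mlx-lm package looks like the following; the model repository name is illustrative and assumes a 4-bit MLX conversion is available on Hugging Face.

```python
# pip install mlx-lm  (Apple silicon only)
from mlx_lm import load, generate

# The repo name is illustrative; other 4-bit conversions from the
# mlx-community organization work the same way.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Summarize the benefits of unified memory for LLM inference.",
    max_tokens=128,
)
print(text)
```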

Performance Comparison Table (Mid-Range)

| Feature | AMD Ryzen 7 8745HS | AMD Ryzen AI 9 HX 370 | AMD Ryzen AI Max+ 395 | Apple M3 Systems |
|---|---|---|---|---|
| Price Range | $700–$900 | $850–$1,500 | $1,699–$3,699 | $1,800–$3,500 |
| CPU Cores | 8 Zen 4 | 12 (4 Zen 5 + 8 Zen 5c) | 16 Zen 5 | 8–16 (M3 to M3 Max) |
| NPU TOPS | N/A | 50 | 50 | 18 |
| GPU Performance | 4 TFLOPS (780M) | 12 TFLOPS (890M) | 29.7 TFLOPS (8060S) | 20 TFLOPS (M3 GPU) |
| Max RAM | 256GB | 256GB | 128GB | 128GB |
| LLM Support | 7B–13B | 13B–30B | 70B+ | 7B–13B |
| Power Consumption | 35–54W | 15–54W | 45–120W | 20–40W |

High-End Workstations

  • Apple Mac Studio with M3 Ultra (512GB Unified Memory)
    • M3 Ultra chip: up to 32-core CPU, 80-core GPU, 32-core Neural Engine
    • 512GB unified memory, 819GB/s memory bandwidth
    • Can load and run extremely large models, including 4-bit quantized LLMs exceeding 600B parameters
    • Real-world use: DeepSeek R1 (671B) runs locally using ~404GB storage and ~448GB RAM, at 17–18 tokens/sec, under 200W power
    • Ideal for researchers, developers, and organizations needing local, private AI processing for sensitive data
    • Robust I/O: Thunderbolt 5, HDMI 2.1, 10Gb Ethernet, Wi-Fi 6E, Bluetooth 5.3
    • Approximate price: Starts at $3,999; fully configured (32-core CPU, 80-core GPU, 512GB RAM, 16TB SSD): $14,099
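
The RAM figure quoted for DeepSeek R1 is consistent with simple back-of-envelope math. The sketch below estimates the weight footprint of a 4-bit quantized model; the bits-per-weight value is a rough assumption that accounts for quantization scales and mixed-precision layers, not a measured number.

```python
# Rough memory estimate for a 4-bit quantized model (assumption, not measurement).
params = 671e9          # DeepSeek R1 parameter count
bits_per_weight = 4.8   # ~4-bit quantization plus per-block scales and mixed-precision layers

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")  # ~403 GB, close to the ~404GB on-disk figure

# The reported ~448GB of RAM additionally covers the KV cache,
# activations, and runtime overhead on top of the weights.
```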

Consumer GPUs (Desktops)

  • NVIDIA RTX 4060 Ti (16GB)

    • Entry-level for desktop LLMs, suitable for 7B–13B models
    • Approximate price: $449.99
  • NVIDIA RTX 3080 (12GB)

    • Good for 13B–22B parameter models (with quantization)
    • Approximate price (3080 new): $1,158; used: $400–$600
  • NVIDIA RTX 3090 (24GB)

    • Good for 13B–22B parameter models (with quantization)
    • Approximate price (3090 new): $1,400–$1,800; used: $800–$1,200
  • NVIDIA RTX 4080 (16GB)

    • Can handle 30B–40B parameter models with high performance
    • Approximate price (4080): $1,199–$1,299
  • NVIDIA RTX 4090 (24GB)

    • Can handle 30B–40B parameter models with high performance
    • Approximate price (4090): $1,599–$2,099 (varies by brand and availability)
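
A quick way to judge whether a model fits a given card is to estimate its weight footprint from the parameter count and quantization level. The sketch below is a rough rule of thumb only; the KV cache and runtime overhead (roughly 1–2GB, plus cache that grows with context length) come on top of the weights.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Weight-only memory estimate; KV cache and runtime overhead are extra."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Example: which 4-bit models fit in 16GB or 24GB of VRAM?
for size in (7, 13, 33, 70):
    print(f"{size:>3}B @ ~4-bit: ~{weight_footprint_gb(size):.1f} GB for weights")
# 7B ~3.9GB, 13B ~7.3GB, 33B ~18.6GB, 70B ~39.4GB
```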

Prosumer & Workstation GPUs

  • NVIDIA RTX 5090 (32GB)

    • 32GB of GDDR7 provides the VRAM and throughput for even larger models
    • Approximate price: $1,999 MSRP; street prices often $2,000–$2,500+ due to limited availability
  • NVIDIA RTX A6000 (48GB)

    • For professionals needing large context windows or fine-tuning
    • Approximate price: $4,000–$4,500 (varies by vendor)

Enterprise & Datacenter GPUs

  • NVIDIA A100 (40GB/80GB HBM2e)

    • Designed for 13B–70B parameter models at high speed
    • Approximate price (40GB): $23,699–$32,449 (varies by model and vendor)
    • Approximate price (80GB): $37,699
  • NVIDIA H100 (80GB HBM3)

    • Up to 2× A100 performance, 250–300 tokens/sec on large models
    • Approximate price: Starts at ~$25,000 per GPU; multi-GPU setups can exceed $400,000
  • AMD Instinct MI250 (128GB HBM2e)

    • High memory capacity, competitive for large LLMs
    • Approximate price: $14,364

NVIDIA DGX Systems

NVIDIA DGX systems are purpose-built AI supercomputers designed for the most demanding machine learning and large language model workloads.

  • NVIDIA DGX A100

    • 8× NVIDIA A100 GPUs (up to 640GB total GPU memory)
    • 6× NVIDIA NVSwitches for high-speed GPU interconnect
    • Dual 64-core AMD CPUs, 1TB system memory
    • 15TB Gen4 NVMe SSD storage
    • Approximate price: $149,000 (was $289,000 at launch)
  • NVIDIA DGX B200

    • 8× Blackwell B200 GPUs (each with ~180GB HBM3e, 1.44TB total GPU memory)
    • NVLink/NVSwitch for up to 1.8TB/s per GPU link
    • 2× Intel Xeon Platinum 8570 CPUs (112 CPU cores total), 4TB DDR5 RAM
    • Approximate price: $515,410
  • NVIDIA DGX Station

    • Desktop AI supercomputer with NVIDIA GB300 Grace Blackwell Ultra chip
    • Up to 784GB unified system memory, 1× NVIDIA Blackwell Ultra GPU
    • Approximate price: $50,000–$70,000 (estimate; varies by configuration)
  • NVIDIA DGX Spark

    • Compact AI workstation built on NVIDIA GB10 Grace Blackwell Superchip
    • 20-core Arm CPU, Blackwell GPU architecture, 128GB unified system memory
    • NOTE: The 273 GB/s memory bandwidth may limit effective utilization of the 1,000 AI TOPS compute for bandwidth-bound workloads (see the worked estimate in the DGX Spark pricing section below)
    • Approximate price: $3,000–$4,000 (estimate; not widely available retail)
  • NVIDIA DGX Cloud
    • Fully managed AI platform providing DGX infrastructure as a service
    • Approximate price: Cloud subscription, typically $10–$30 per GPU/hour (varies by provider and GPU type)

NVIDIA DGX systems are widely used in enterprise AI, research labs, and cloud data centers to accelerate LLM development, fine-tuning, and deployment at scale.


NVIDIA DGX Spark Price

The NVIDIA DGX Spark has undergone significant price adjustments since its initial announcement.

Base Configuration

  • 1TB Storage Model: Originally announced at $3,000 under the "Project Digits" codename, this entry-level configuration is now rarely available directly from NVIDIA but may still be offered by third-party OEMs like ASUS or Dell.
  • 4TB Storage Model: Post-rebranding to DGX Spark, the primary retail configuration now starts at $3,999 for the 4TB NVMe SSD version. This reflects a 33% price increase compared to the original Project Digits pricing.

Market Variability

  • Retail Markups: Due to limited availability and high demand, reseller prices often exceed NVIDIA’s MSRP. Current market rates range from $4,200–$4,500 for the 4TB model.
  • Clustered Configurations: A pre-configured two-unit cluster with 200GbE RDMA networking and dual 4TB storage is priced at $8,500–$9,000, though this is not widely available to consumers.

Key Factors Influencing Cost

Technical Specifications

The DGX Spark’s pricing reflects its unique hardware:

  • GB10 Grace Blackwell Superchip: Combines a 20-core Arm CPU (10 Cortex-X925 + 10 Cortex-A725) and Blackwell GPU.
  • 128GB Unified LPDDR5x Memory: Provides 273 GB/s bandwidth, critical for running models up to 200 billion parameters.
  • ConnectX-7 Networking: Dual-port 200GbE enables scalable clustering for distributed AI workloads.
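
Because autoregressive decoding reads essentially all of the weights once per generated token, memory bandwidth sets a hard ceiling on single-stream throughput. The sketch below gives a rough upper bound under that assumption; real throughput is lower once KV-cache reads and non-ideal memory utilization are accounted for.

```python
# Upper-bound decode speed if every token must stream all weights from memory
# once (a simplifying assumption; KV cache and overhead reduce this further).
bandwidth_gbs = 273  # DGX Spark LPDDR5x bandwidth, GB/s

for params_b in (70, 120, 200):
    weight_gb = params_b * 1e9 * 4.5 / 8 / 1e9   # ~4-bit quantization
    print(f"{params_b}B (~{weight_gb:.0f} GB): <= ~{bandwidth_gbs / weight_gb:.1f} tokens/sec")
# 70B -> ~6.9 t/s, 120B -> ~4.0 t/s, 200B -> ~2.4 t/s
```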

Competitive Positioning

  • Compared to consumer GPUs like the RTX 5090 ($1,999 MSRP), the DGX Spark targets developers needing enterprise-grade features (e.g., NVLink-C2C, preinstalled DGX OS).
  • Apple’s M3 Ultra Mac Studio (512GB unified memory, $14,099) outperforms the DGX Spark in raw memory bandwidth (819 GB/s vs. 273 GB/s) but lacks the Spark’s built-in 200GbE RDMA clustering.

Regional Pricing Examples

  • United Kingdom: Scan.co.uk lists the 4TB model at £3,699.98 (~$4,700 USD including VAT).
  • European Union: Nextron Denmark offers configurations starting at €4,200 (~$4,550 USD).

Long-Term Value Considerations

  • Energy Efficiency: At 170W TDP, the DGX Spark draws roughly one-third the power of high-end desktop GPUs like the RTX 4090 (450W).
  • Software Stack: Includes NVIDIA’s full AI toolkit (CUDA, TensorRT, Triton), valued at $10,000+ if licensed separately for enterprise use.

Notes on AMD Ryzen 7 8745HS for LLMs

  • The Ryzen 7 8745HS is a modern laptop processor with a strong CPU and integrated Radeon 780M GPU.
  • With 64GB RAM, it can load and run quantized LLMs up to 13B parameters, though performance is limited by memory bandwidth and iGPU speed.
  • The architecture supports DDR5/LPDDR5x, which helps with bandwidth, but the iGPU is still much slower than dedicated GPUs for LLM inference.
  • For best results, use optimized frameworks (such as llama.cpp with the Vulkan or ROCm backend) and focus on quantized models; a minimal example follows this list.
  • This configuration is for portable, low-power, or experimental LLM use, but not for high-throughput or large-scale deployments.
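
As a minimal sketch, assuming llama-cpp-python is installed with a GPU backend enabled (a Vulkan build for this iGPU; the build flag and model path below are illustrative and may vary by version):

```python
# Illustrative install with the Vulkan backend (flag may differ by version):
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # offload all layers to the Radeon 780M; reduce if memory-limited
    n_ctx=4096,       # ample system RAM allows generous context windows
)

out = llm("Q: What limits LLM inference speed on an iGPU? A:", max_tokens=64)
print(out["choices"][0]["text"])
```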

All price values are approximate and reflect typical retail or direct purchase prices in the United States as of May 2025.

Actual prices may vary by region, configuration, and market conditions.

Published on 5/26/2025