Hardware Options for Running Local LLMs 2025

Explore the optimal hardware for running large language models (LLMs) locally, from entry-level edge devices to enterprise-grade systems.

Entry-Level Edge Devices

  • Raspberry Pi AI Kit with Hailo-8L Accelerator

    • M.2 HAT+ & Hailo-8L Module: Pre-installed 13 TOPS AI accelerator for Raspberry Pi 5
    • Performance: 13 tera-operations per second (INT8)
    • Price: $70
  • Hailo-8 M.2 AI Acceleration Kit (26 TOPS)

    • 26 TOPS Performance: Double the compute power of Hailo-8L
    • PCIe Gen3 x4 Interface: 4-lane connection for maximum bandwidth
    • Thermal Design: Supports -40°C to 85°C operation with optional cooling fan
    • Compatibility: Requires PCIe-to-M.2 adapter for Raspberry Pi 5 (included in kit)
    • Framework Support: TensorFlow, PyTorch, ONNX, Keras
    • Use Cases: Multi-model inference, high-resolution video analysis
    • Price: $140 (kit includes M.2 module, adapter, thermal pad)
    • Key Features:
      • 2.5W typical power consumption (8.65W max)
      • CE/FCC certified for industrial deployment
      • Supports simultaneous 4K@60fps video processing
    • Comparison vs. 13 TOPS Model:
      • 100% higher AI performance
      • 50% faster model inference times
      • 2× larger neural network capacity
  • NVIDIA Jetson Orin Nano Super

  1. Base Model (8GB)
  • Price: $249 (Developer Kit)
  • AI Performance: 40 TOPS (67 TOPS in Super Mode)
  • GPU: 1024-core Ampere (1.02 GHz Max)
  • CPU: 6-core Arm Cortex-A78AE @ 1.7 GHz
  • Memory: 8GB LPDDR5 (102 GB/s bandwidth in Super Mode)
  • Power: 7-25W configurable
  2. 4GB Lite Version
  • Price: $199
  • AI Performance: 34 TOPS (Max)
  • GPU: 512-core Ampere
  • Memory: 4GB LPDDR5 (51 GB/s)

Key Features:

  • Overclocking Capability: Existing owners can unlock "Super Mode" via software update (JetPack 6.2+)
  • Camera Support: 8x MIPI CSI-2 lanes for 4K60 video input
  • Real-World Performance:
    • 13B parameter LLM @ 18 tokens/sec (4-bit quantized)
    • 70% faster generative AI performance than the previous-generation Jetson Orin Nano
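
To sanity-check quoted tokens-per-second numbers on a board like this, it is easy to time a generation yourself. The following is a minimal sketch, assuming an Ollama server is already running locally with a quantized model pulled (the model tag is illustrative); it reads the eval_count and eval_duration fields that Ollama returns from its /api/generate endpoint.

```python
import requests

# Assumes a local Ollama server with a 4-bit model already pulled,
# e.g. `ollama pull llama2:13b` (the model tag is illustrative).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2:13b",
        "prompt": "Explain quantization in one paragraph.",
        "stream": False,
    },
    timeout=600,
)
resp.raise_for_status()
data = resp.json()

# Ollama reports generation statistics in nanoseconds.
tokens = data["eval_count"]
seconds = data["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/sec")
```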

Specialized & Alternative Accelerators

  • Google Coral Edge TPU, Intel Neural Compute Stick 2
    • For tiny models and edge inference, but limited for LLMs
    • Approximate price: $40–$100

Performance Comparison Table (Entry-Level)

| Feature | Hailo-8 (26 TOPS) + RPi 5 | Hailo-8L (13 TOPS) + RPi 5 | NVIDIA Jetson Orin Nano Super |
|---|---|---|---|
| Price | $140 | $70 | $249 |
| AI Performance (TOPS) | 26 | 13 | 67 |
| Power Consumption | 2.5–8.65W | 1–5W | 7–25W |
| LLM Support | 3–5B params | 1–3B params | 7–13B params |
| Camera Streams | 4× 4K@30fps | 2× 4K@30fps | 8× 4K@30fps |
| Thermal Design | Active cooling option | Passive only | Active cooling required |

Mid-Range Laptops & Mini-PCs

  • AMD Ryzen 7 8745HS with 64GB RAM

    • 8 cores / 16 threads, Zen 4 (Hawk Point), 3.8–4.9 GHz
    • Integrated Radeon 780M (12 CUs, up to 4 TFLOPS)
    • Supports up to 256GB DDR5/LPDDR5x (dual channel, up to 120GB/s bandwidth)
    • 35–54W configurable TDP
    • Can run quantized LLMs up to 7B–13B parameters; performance is limited by memory bandwidth and iGPU speed, but 64GB RAM allows for larger context windows and models
    • For experimentation, local chatbots, and small-scale inference
    • NOTE: Linux driver support for this platform is currently incomplete
    • Approximate price (mini-PC): $440–$549 (barebones or with 16GB RAM; expect $700–$900 with 64GB RAM)

    (Figure: AMD Ryzen 7 8745HS with 64GB RAM performance results)

  • AMD Ryzen AI 9 HX 370 Systems

    • Architecture: Zen 5 (4 cores) + Zen 5c (8 cores), 24 threads
    • Clock Speeds: 2.0 GHz base, up to 5.1 GHz boost
    • NPU: XDNA 2 with 50 TOPS AI performance (80 TOPS combined with CPU/GPU)
    • GPU: Radeon 890M (16 RDNA 3.5 cores @ 2.9 GHz)
    • Memory: LPDDR5x-8000 (up to 256GB)
    • TDP: 28W default (configurable 15–54W)
    • Performance Highlights
      • Handles 13B-30B parameter LLMs with 4-bit quantization
      • 80 TOPS total AI processing for real-time inference
  • AMD Ryzen AI Max+ 395 Systems

    • Architecture: Zen 5 (16 cores/32 threads)
    • Clock Speeds: 3.0 GHz base, up to 5.1 GHz boost
    • NPU: XDNA 2 with 50 TOPS AI performance (126 TOPS combined with CPU/GPU)
    • GPU: Radeon 8060S (40 RDNA 3.5 CUs @ 2.9 GHz)
    • Memory: LPDDR5x-8000 (up to 128GB, 96GB allocatable as VRAM)
    • TDP: 45-120W configurable
    • PCIe Support: 4.0 x16 lanes
    • Thermal Design: Honeywell PTM7958 phase-change material + 6 heatpipes
    • Performance Highlights
      • Handles 70B+ parameter LLMs with 4-bit quantization
      • 29.7 TFLOPS GPU performance (rivals RTX 4070 Mobile)
      • 273 GB/s memory bandwidth via 256-bit bus
  • Apple M1/M2/M3/M4 (with 32–64GB RAM)

    • Unified memory, strong neural engine
    • Efficient for 7B–13B parameter models with frameworks like MLX
    • Approximate price: $1,800–$3,500 (varies by model and RAM)
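
For Apple silicon, a minimal sketch using the mlx-lm package looks like the following; the model repository name is illustrative and assumes a 4-bit MLX conversion is available on Hugging Face.

```python
# pip install mlx-lm  (Apple silicon only)
from mlx_lm import load, generate

# The repo name is illustrative; other 4-bit conversions from the
# mlx-community organization work the same way.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Summarize the benefits of unified memory for LLM inference.",
    max_tokens=128,
)
print(text)
```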

Performance Comparison Table (Mid-Range)

| Feature | AMD Ryzen 7 8745HS | AMD Ryzen AI 9 HX 370 | AMD Ryzen AI Max+ 395 | Apple M3 Systems |
|---|---|---|---|---|
| Price Range | $700–$900 | $850–$1,500 | $1,699–$3,699 | $1,800–$3,500 |
| CPU Cores | 8 Zen 4 | 12 (4 Zen 5 + 8 Zen 5c) | 16 Zen 5 | 8–16 (M3 to M3 Max) |
| NPU TOPS | N/A | 50 | 50 | 18 |
| GPU Performance | 4 TFLOPS (780M) | 12 TFLOPS (890M) | 29.7 TFLOPS (8060S) | 20 TFLOPS (M3 GPU) |
| Max RAM | 256GB | 256GB | 128GB | 128GB |
| LLM Support | 7B–13B | 13B–30B | 70B+ | 7B–13B |
| Power Consumption | 35–54W | 15–54W | 45–120W | 20–40W |

High-End Workstations

  • Apple Mac Studio with M3 Ultra (512GB Unified Memory)
    • M3 Ultra chip: up to 32-core CPU, 80-core GPU, 32-core Neural Engine
    • 512GB unified memory, 819GB/s memory bandwidth
    • Can load and run extremely large models, including 4-bit quantized LLMs exceeding 600B parameters
    • Real-world use: DeepSeek R1 (671B) runs locally using ~404GB storage and ~448GB RAM, at 17–18 tokens/sec, under 200W power
    • Ideal for researchers, developers, and organizations needing local, private AI processing for sensitive data
    • Robust I/O: Thunderbolt 5, HDMI 2.1, 10Gb Ethernet, Wi-Fi 6E, Bluetooth 5.3
    • Approximate price: Starts at $3,999; fully configured (32-core CPU, 80-core GPU, 512GB RAM, 16TB SSD): $14,099
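
The RAM figure quoted for DeepSeek R1 is consistent with simple back-of-envelope math. The sketch below estimates the weight footprint of a 4-bit quantized model; the bits-per-weight value is a rough assumption that accounts for quantization scales and mixed-precision layers, not a measured number.

```python
# Rough memory estimate for a 4-bit quantized model (assumption, not measurement).
params = 671e9          # DeepSeek R1 parameter count
bits_per_weight = 4.8   # ~4-bit quantization plus per-block scales and mixed-precision layers

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")  # ~403 GB, close to the ~404GB on-disk figure

# The reported ~448GB of RAM additionally covers the KV cache,
# activations, and runtime overhead on top of the weights.
```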

Consumer GPUs (Desktops)

  • NVIDIA RTX 4060 Ti (16GB)

    • Entry-level for desktop LLMs, suitable for 7B–13B models
    • Approximate price: $449.99
  • NVIDIA RTX 3080 (12GB)

    • Good for 13B–22B parameter models (with quantization)
    • Approximate price (3080 new): $1,158; used: $400–$600
  • NVIDIA RTX 3090 (24GB)

    • Good for 13B–22B parameter models (with quantization)
    • Approximate price (3090 new): $1,400–$1,800; used: $800–$1,200
  • NVIDIA RTX 4080 (16GB)

    • Can handle 30B–40B parameter models with high performance
    • Approximate price (4080): $1,199–$1,299
  • NVIDIA RTX 4090 (24GB)

    • Can handle 30B–40B parameter models with high performance
    • Approximate price (4090): $1,599–$2,099 (varies by brand and availability)
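
A quick way to judge whether a model fits a given card is to estimate its weight footprint from the parameter count and quantization level. The sketch below is a rough rule of thumb only; the KV cache and runtime overhead (roughly 1–2GB, plus cache that grows with context length) come on top of the weights.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Weight-only memory estimate; KV cache and runtime overhead are extra."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Example: which 4-bit models fit in 16GB or 24GB of VRAM?
for size in (7, 13, 33, 70):
    print(f"{size:>3}B @ ~4-bit: ~{weight_footprint_gb(size):.1f} GB for weights")
# 7B ~3.9GB, 13B ~7.3GB, 33B ~18.6GB, 70B ~39.4GB
```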

Prosumer & Workstation GPUs

  • NVIDIA RTX 5090 (32GB)

    • 32GB of GDDR7 provides the VRAM and throughput for even larger models
    • Approximate price: $1,999 MSRP; street prices often $2,000–$2,500+ due to limited availability
  • NVIDIA RTX A6000 (48GB)

    • For professionals needing large context windows or fine-tuning
    • Approximate price: $4,000–$4,500 (varies by vendor)

Enterprise & Datacenter GPUs

  • NVIDIA A100 (40GB/80GB HBM2e)

    • Designed for 13B–70B parameter models at high speed
    • Approximate price (40GB): $23,699–$32,449 (varies by model and vendor)
    • Approximate price (80GB): $37,699
  • NVIDIA H100 (80GB HBM3)

    • Up to 2× A100 performance, 250–300 tokens/sec on large models
    • Approximate price: Starts at ~$25,000 per GPU; multi-GPU setups can exceed $400,000
  • AMD Instinct MI250 (128GB HBM2e)

    • High memory capacity, competitive for large LLMs
    • Approximate price: $14,364

NVIDIA DGX Systems

NVIDIA DGX systems are purpose-built AI supercomputers designed for the most demanding machine learning and large language model workloads.

  • NVIDIA DGX A100

    • 8× NVIDIA A100 GPUs (up to 640GB total GPU memory)
    • 6× NVIDIA NVSwitches for high-speed GPU interconnect
    • Dual 64-core AMD CPUs, 1TB system memory
    • 15TB Gen4 NVMe SSD storage
    • Approximate price: $149,000 (was $289,000 at launch)
  • NVIDIA DGX B200

    • 8× Blackwell B200 GPUs (each with ~180GB HBM3e, 1.44TB total GPU memory)
    • NVLink/NVSwitch for up to 1.8TB/s per GPU link
    • 2× Intel Xeon Platinum 8570 CPUs (112 CPU cores total), 4TB DDR5 RAM
    • Approximate price: $515,410
  • NVIDIA DGX Station

    • Desktop AI supercomputer with NVIDIA GB300 Grace Blackwell Ultra chip
    • Up to 784GB unified system memory, 1× NVIDIA Blackwell Ultra GPU
    • Approximate price: $50,000–$70,000 (estimate; varies by configuration)
  • NVIDIA DGX Spark

    • Compact AI workstation built on NVIDIA GB10 Grace Blackwell Superchip
    • 20-core Arm CPU, Blackwell GPU architecture, 128GB unified system memory
    • NOTE: The 273 GB/s memory bandwidth may limit effective utilization of the 1,000 AI TOPS compute for bandwidth-bound workloads (see the worked estimate in the DGX Spark pricing section below)
    • Approximate price: $3,000–$4,000 (estimate; not widely available retail)
  • NVIDIA DGX Cloud
    • Fully managed AI platform providing DGX infrastructure as a service
    • Approximate price: Cloud subscription, typically $10–$30 per GPU/hour (varies by provider and GPU type)

NVIDIA DGX systems are widely used in enterprise AI, research labs, and cloud data centers to accelerate LLM development, fine-tuning, and deployment at scale.


NVIDIA DGX Spark Price

The NVIDIA DGX Spark has undergone significant price adjustments since its initial announcement.

Base Configuration

  • 1TB Storage Model: Originally announced at $3,000 under the "Project Digits" codename, this entry-level configuration is now rarely available directly from NVIDIA but may still be offered by third-party OEMs like ASUS or Dell.
  • 4TB Storage Model: Post-rebranding to DGX Spark, the primary retail configuration now starts at $3,999 for the 4TB NVMe SSD version. This reflects a 33% price increase compared to the original Project Digits pricing.

Market Variability

  • Retail Markups: Due to limited availability and high demand, reseller prices often exceed NVIDIA’s MSRP. Current market rates range from $4,200–$4,500 for the 4TB model.
  • Clustered Configurations: A pre-configured two-unit cluster with 200GbE RDMA networking and dual 4TB storage is priced at $8,500–$9,000, though this is not widely available to consumers.

Key Factors Influencing Cost

Technical Specifications

The DGX Spark’s pricing reflects its unique hardware:

  • GB10 Grace Blackwell Superchip: Combines a 20-core Arm CPU (10 Cortex-X925 + 10 Cortex-A725) and Blackwell GPU.
  • 128GB Unified LPDDR5x Memory: Provides 273 GB/s bandwidth, critical for running models up to 200 billion parameters.
  • ConnectX-7 Networking: Dual-port 200GbE enables scalable clustering for distributed AI workloads.
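
Because autoregressive decoding reads essentially all of the weights once per generated token, memory bandwidth sets a hard ceiling on single-stream throughput. The sketch below gives a rough upper bound under that assumption; real throughput is lower once KV-cache reads and non-ideal memory utilization are accounted for.

```python
# Upper-bound decode speed if every token must stream all weights from memory
# once (a simplifying assumption; KV cache and overhead reduce this further).
bandwidth_gbs = 273  # DGX Spark LPDDR5x bandwidth, GB/s

for params_b in (70, 120, 200):
    weight_gb = params_b * 1e9 * 4.5 / 8 / 1e9   # ~4-bit quantization
    print(f"{params_b}B (~{weight_gb:.0f} GB): <= ~{bandwidth_gbs / weight_gb:.1f} tokens/sec")
# 70B -> ~6.9 t/s, 120B -> ~4.0 t/s, 200B -> ~2.4 t/s
```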

Competitive Positioning

  • Compared to consumer GPUs like the RTX 5090 ($1,999 MSRP), the DGX Spark targets developers needing enterprise-grade features (e.g., NVLink-C2C, preinstalled DGX OS).
  • Apple’s M3 Ultra Mac Studio (512GB unified memory, $14,099) outperforms the DGX Spark in raw memory bandwidth (819 GB/s vs. 273 GB/s) but lacks the Spark’s built-in 200GbE RDMA clustering.

Regional Pricing Examples

  • United Kingdom: Scan.co.uk lists the 4TB model at £3,699.98 (~$4,700 USD including VAT).
  • European Union: Nextron Denmark offers configurations starting at €4,200 (~$4,550 USD).

Long-Term Value Considerations

  • Energy Efficiency: At 170W TDP, the DGX Spark draws roughly one-third the power of high-end desktop GPUs like the RTX 4090 (450W).
  • Software Stack: Includes NVIDIA’s full AI toolkit (CUDA, TensorRT, Triton), valued at $10,000+ if licensed separately for enterprise use.

Notes on AMD Ryzen 7 8745HS for LLMs

  • The Ryzen 7 8745HS is a modern laptop processor with a strong CPU and integrated Radeon 780M GPU.
  • With 64GB RAM, it can load and run quantized LLMs up to 13B parameters, though performance is limited by memory bandwidth and iGPU speed.
  • The architecture supports DDR5/LPDDR5x, which helps with bandwidth, but the iGPU is still much slower than dedicated GPUs for LLM inference.
  • For best results, use optimized frameworks (such as llama.cpp with the Vulkan or ROCm backend) and focus on quantized models; a minimal example follows this list.
  • This configuration is for portable, low-power, or experimental LLM use, but not for high-throughput or large-scale deployments.
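
As a minimal sketch, assuming llama-cpp-python is installed with a GPU backend enabled (a Vulkan build for this iGPU; the build flag and model path below are illustrative and may vary by version):

```python
# Illustrative install with the Vulkan backend (flag may differ by version):
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # offload all layers to the Radeon 780M; reduce if memory-limited
    n_ctx=4096,       # ample system RAM allows generous context windows
)

out = llm("Q: What limits LLM inference speed on an iGPU? A:", max_tokens=64)
print(out["choices"][0]["text"])
```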

All price values are approximate and reflect typical retail or direct purchase prices in the United States as of May 2025.

Actual prices may vary by region, configuration, and market conditions.

Published on 5/26/2025