How to Run AI Locally on Windows

Updated May 2026

Running AI locally on Windows works well with any recent NVIDIA GPU. Install Ollama from ollama.com, make sure your NVIDIA drivers are current, and run a single command to download and start chatting with a local model. The entire setup takes under ten minutes, and Windows 10 and 11 are both fully supported.

Windows is the most common platform for local AI among gaming PC owners and general desktop users. Most Windows machines with a dedicated NVIDIA GPU already have the hardware needed to run AI models at good speeds. The setup process is straightforward: install Ollama, ensure your GPU drivers are up to date, and start running models. This guide covers the Windows-specific steps and considerations.

Check Your System Requirements

You need Windows 10 version 22H2 or later, or Windows 11. Check your version by pressing Windows key + R, typing winver, and pressing Enter.
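If you would rather check the version from a script, the sketch below compares the Windows build number against the requirement. It relies on the fact that Windows 10 22H2 is build 19045 and that every Windows 11 release is build 22000 or higher. The function name meets_requirement is purely illustrative.

```python
import sys

# Windows 10 22H2 is build 19045; all Windows 11 releases are 22000 or higher,
# so a single lower bound covers both supported cases.
MIN_BUILD = 19045


def meets_requirement(build: int) -> bool:
    """Return True for Windows 10 22H2 or any newer build."""
    return build >= MIN_BUILD


if __name__ == "__main__":
    if sys.platform == "win32":
        build = sys.getwindowsversion().build
        verdict = "OK" if meets_requirement(build) else "update Windows first"
        print(f"Windows build {build}: {verdict}")
    else:
        print("Not running on Windows; nothing to check.")
```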

Check your RAM by right-clicking the Start button, selecting System, and looking at the "Installed RAM" value. You need at least 8 GB for small models and 16 GB for 8B parameter models.
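The same check can be scripted. The sketch below reads total physical memory through the Win32 GlobalMemoryStatusEx call and maps it onto the two thresholds above. The helper names (total_ram_gb, ram_tier) are illustrative, and the value is rounded because Windows usually reports slightly less than the installed amount.

```python
import ctypes
import sys
from typing import Optional


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]


def total_ram_gb() -> Optional[float]:
    """Return total physical RAM in GiB, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        return None
    return status.ullTotalPhys / 2**30


def ram_tier(gb: float) -> str:
    """Map a RAM figure onto the thresholds used in this guide."""
    rounded = round(gb)  # 15.9 GiB usable counts as a 16 GB machine
    if rounded >= 16:
        return "8B-parameter models"
    if rounded >= 8:
        return "small models"
    return "below the 8 GB minimum"


if __name__ == "__main__":
    ram = total_ram_gb()
    if ram is None:
        print("Could not read RAM (this check only works on Windows).")
    else:
        print(f"{ram:.1f} GiB RAM: {ram_tier(ram)}")
```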

Check your GPU by pressing Windows key + X, selecting Device Manager, and expanding "Display adapters." If you see an NVIDIA GeForce RTX card (RTX 2060 or newer), you have a capable GPU for AI. Older GTX cards and AMD cards work too, but with varying levels of support and performance.

If you have no dedicated GPU, Ollama will run on CPU only. This is slower but functional, especially for smaller models up to 8B parameters.

Update Your GPU Drivers

For NVIDIA GPUs, updated drivers are essential for proper CUDA support. Go to nvidia.com/drivers, enter your GPU model, and download the latest Game Ready or Studio driver. Run the installer, choosing "Express Installation" for the simplest path. Restart your computer after installation.

Alternatively, if you have NVIDIA GeForce Experience installed, open it and check the Drivers tab for available updates. Either method works. The important thing is having a recent driver that supports CUDA 11.8 or later.

For AMD GPUs, download the latest Adrenalin drivers from AMD's website. AMD GPU support in Ollama works through ROCm and has improved significantly, though NVIDIA remains more consistently reliable for AI workloads on Windows.

You can verify your CUDA version by opening Command Prompt and running nvidia-smi. The top-right corner of the output shows "CUDA Version", which is the highest CUDA version the installed driver supports. It should read 11.8 or higher.
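To automate this check, the sketch below runs nvidia-smi, pulls the "CUDA Version" field out of the header, and compares it with the 11.8 minimum. The function names are illustrative, and the regular expression assumes the usual header wording printed by nvidia-smi.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CUDA = (11, 8)  # minimum named in this guide


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' header field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_ok(version: Optional[Tuple[int, int]]) -> bool:
    """True when a version was found and it is at least MIN_CUDA."""
    return version is not None and version >= MIN_CUDA


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        output = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True
        ).stdout
        version = parse_cuda_version(output)
        print(f"CUDA version: {version}, sufficient: {cuda_ok(version)}")
```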

Download and Install Ollama

Go to ollama.com and click the download button for Windows. Run the downloaded .exe installer. The setup wizard installs Ollama as a background application with a system tray icon, and it starts automatically when you sign in to Windows.

After installation completes, you may see a Windows Security or Windows Defender firewall prompt asking if you want to allow Ollama to communicate on private or public networks. Allow access on private networks so that Ollama can accept connections from Open WebUI and other local tools.

Verify the installation by opening Command Prompt (press Windows key, type cmd, and press Enter) and running ollama --version. You should see a version number confirming Ollama is installed correctly.
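A script can perform the same verification. This minimal sketch looks for the ollama executable on PATH and returns whatever the --version flag prints. The function name ollama_version is illustrative. If it returns None right after installing, try a new terminal window, since terminals opened before the install may still have the old PATH.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version(executable: str = "ollama") -> Optional[str]:
    """Return the text printed by `ollama --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=15
    )
    # Return the raw text rather than parsing it, since the exact wording
    # may differ between releases.
    return result.stdout.strip() or result.stderr.strip() or None


if __name__ == "__main__":
    version = ollama_version()
    print(version if version else "Ollama was not found on PATH.")
```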

Run Your First Model

In Command Prompt or PowerShell, run:

ollama run qwen3:8b

Ollama downloads the model (approximately 4.9 GB, stored in C:\Users\YourName\.ollama\models by default) and starts a chat session. Type a message and press Enter to get a response. Type /bye to exit.
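Once the model is downloaded, scripts can reach it through Ollama's local HTTP API, which listens on http://localhost:11434 by default. The sketch below sends a single prompt to the /api/generate endpoint using only the Python standard library. The helper names (build_request, ask) and the example prompt are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_request(prompt: str, model: str = "qwen3:8b") -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "qwen3:8b", timeout: float = 120) -> str:
    """Send the prompt and return the model's full reply text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    try:
        print(ask("Say hello in one short sentence."))
    except OSError as exc:  # covers connection refused and HTTP errors
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
```

Setting "stream" to False returns the whole reply in one JSON object, which keeps the example short. Streaming responses arrive as one JSON object per line instead.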

While the model is loaded, open a second Command Prompt window and run ollama ps to verify GPU acceleration is active. You should see the model listed with a PROCESSOR value such as "100% GPU". If it shows CPU, or a CPU/GPU split, check that your NVIDIA drivers are up to date and that your GPU has enough VRAM for the model.
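The same information is available from the local API. The sketch below queries the /api/ps endpoint and compares each loaded model's size_vram field with its total size to estimate how much of the model sits in GPU memory. The function names are illustrative, and the default address http://localhost:11434 is assumed.

```python
import json
import urllib.request


def gpu_share(model: dict) -> float:
    """Fraction of the loaded model held in VRAM (0.0 to 1.0)."""
    size = model.get("size", 0)
    return model.get("size_vram", 0) / size if size else 0.0


def loaded_models(base_url: str = "http://localhost:11434", timeout: float = 5) -> list:
    """Return the list of currently loaded models from /api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp).get("models", [])


if __name__ == "__main__":
    try:
        models = loaded_models()
    except OSError as exc:
        print(f"Could not reach Ollama: {exc}")
    else:
        if not models:
            print("No model is loaded right now.")
        for m in models:
            print(f"{m.get('name')}: {gpu_share(m):.0%} on GPU")
```

A share well below 100% suggests the model did not fit entirely in VRAM and part of it is running on the CPU.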

If you get an out-of-memory error, try a smaller model. For example, ollama run phi4-mini downloads a model of about 2.5 GB that works well on systems with limited RAM or VRAM.

Install Open WebUI for a Visual Interface

The command-line chat works, but most users prefer a browser-based interface. Open WebUI provides this and is easiest to install via Docker Desktop on Windows.

First, install Docker Desktop from docker.com. During installation, ensure the WSL 2 backend is selected (this is the default on modern Windows). You may need to enable virtualization in your BIOS if it is not already enabled, though most recent PCs have it on by default.

After Docker Desktop is installed and running, open Command Prompt or PowerShell and run the Open WebUI Docker command:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Once the container starts, open your browser to http://localhost:3000. Create an admin account, select your model from the dropdown, and start chatting with a polished visual interface.
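If port 3000 is already in use on your machine, only the host side of the -p mapping needs to change. The sketch below rebuilds the command above as an argument list with the host port as a parameter. The function name open_webui_command is illustrative, and the script only prints the command rather than running it.

```python
from typing import List


def open_webui_command(host_port: int = 3000) -> List[str]:
    """Return the docker run arguments, mapping host_port to the container's 8080."""
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:8080",
        "--add-host=host.docker.internal:host-gateway",
        "-v", "open-webui:/app/backend/data",
        "--name", "open-webui",
        "--restart", "always",
        "ghcr.io/open-webui/open-webui:main",
    ]


if __name__ == "__main__":
    # Print the command for a different host port; the UI would then be
    # reachable at http://localhost:3001 instead of port 3000.
    print(" ".join(open_webui_command(host_port=3001)))
```

To launch the container from Python instead of printing the command, pass the list to subprocess.run with check=True.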

Windows-Specific Tips and Considerations

Windows Defender interference: Windows Defender's real-time scanning can occasionally slow down model loading because it scans the large model files as they are read into memory. If model loading seems unusually slow, you can add the Ollama models directory (C:\Users\YourName\.ollama\models) to Windows Defender's exclusion list. Go to Windows Security, Virus and Threat Protection, Manage Settings, then Exclusions, and add the folder path.

Model storage location: If your C: drive is small, you can move model storage to another drive by setting the OLLAMA_MODELS environment variable. Go to System Properties, Advanced, Environment Variables, and add a new user variable named OLLAMA_MODELS with the path to your preferred location (for example, D:\ollama\models). Quit Ollama from its system tray icon and start it again for the change to take effect.
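Before moving anything, it helps to know where models currently live and how much space they take. This sketch resolves the models folder the way described above (OLLAMA_MODELS if set, otherwise .ollama\models under your user folder) and adds up the file sizes. The function names are illustrative.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def models_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return OLLAMA_MODELS if set, else the default folder in the user profile."""
    env = os.environ if env is None else env
    custom = env.get("OLLAMA_MODELS")
    return Path(custom) if custom else Path.home() / ".ollama" / "models"


def dir_size_gb(path: Path) -> float:
    """Total size of all files under path in GB (0.0 if the folder is missing)."""
    if not path.is_dir():
        return 0.0
    total_bytes = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total_bytes / 1e9


if __name__ == "__main__":
    folder = models_dir()
    print(f"Models folder: {folder}")
    print(f"Space used: {dir_size_gb(folder):.1f} GB")
```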

Power plan: Windows power-saving modes can throttle CPU and GPU performance. For best inference performance, set the power mode to "Best performance" on Windows 11 (Settings, System, Power & battery) or choose the "High performance" plan on Windows 10 (Control Panel, Power Options). This is especially important on laptops, where the default balanced setting may reduce inference speed significantly.

WSL 2 alternative: Some advanced users prefer running Ollama inside Windows Subsystem for Linux (WSL 2) rather than the native Windows version. WSL 2 provides a full Linux environment within Windows and supports NVIDIA GPU passthrough. This approach is useful if you are already comfortable with Linux or want to follow Linux-specific tutorials. Install WSL 2, install Ubuntu, and follow the Linux installation instructions for Ollama inside WSL.

Task Manager monitoring: While running a model, open Task Manager (Ctrl + Shift + Esc) and switch to the Performance tab. Select the GPU entry and watch GPU utilization and dedicated GPU memory usage to confirm your model is using the GPU. The Memory entry on the same tab shows system RAM usage, which should stay below your total RAM to avoid slowdowns from paging.
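For NVIDIA cards, the same numbers can be read from the command line with nvidia-smi's query mode. The sketch below requests the GPU name, memory use, and utilization in CSV form and parses one line per GPU. The function name parse_gpu_line is illustrative, and the parser assumes every field is numeric, which may not hold on GPUs that report a value as unavailable.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line; memory values are in MiB, utilization in percent."""
    # Split from the right so a comma in the GPU name cannot shift the fields.
    name, used, total, util = [part.strip() for part in line.rsplit(",", 3)]
    return {
        "name": name,
        "used_mib": int(used),
        "total_mib": int(total),
        "util_pct": int(util),
    }


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; use Task Manager instead.")
    else:
        output = subprocess.run(QUERY, capture_output=True, text=True).stdout
        for line in output.strip().splitlines():
            gpu = parse_gpu_line(line)
            print(
                f"{gpu['name']}: {gpu['used_mib']} / {gpu['total_mib']} MiB "
                f"used, {gpu['util_pct']}% utilization"
            )
```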

Startup behavior: Ollama runs as a background application that starts automatically when you sign in. If you prefer to start it manually to save resources, open Task Manager (Ctrl + Shift + Esc), switch to the Startup tab (called Startup apps on Windows 11), select Ollama, and click Disable. You can then launch it from the Start menu only when you need it. Alternatively, leave it running, since the idle application uses minimal resources when no model is loaded.

Key Takeaway

Windows with an NVIDIA GPU is an excellent platform for local AI. Install Ollama, update your NVIDIA drivers, and run a model. Add Docker Desktop and Open WebUI for a visual chat interface comparable to cloud services.