Getting Started with Ollama
Ollama's CLI is designed to be intuitive and requires no configuration to get started. The commands follow a consistent pattern, and most common tasks require just one or two commands. Let us walk through everything you need to know to use Ollama productively.
Run Your First Model
Open your terminal and type ollama run llama4. If this is your first time running this model, Ollama downloads it from the model library. The download takes a few minutes depending on your internet speed, as model files range from 2GB to 45GB. Once downloaded, the model loads into memory and you see a prompt where you can type messages and receive responses.
During the interactive session, type your message and press Enter to send it. The model responds with generated text that streams to your terminal in real time. Type /bye to exit the session. You can also use /set parameter temperature 0.3 to adjust settings within the session, or /show info to display the model's configuration.
If Llama 4 Scout is too large for your hardware, try ollama run llama3.2 for a smaller 3B model that runs on virtually any computer, or ollama run qwen3:8b for a capable 8B model that fits in 8GB of VRAM.
Manage Your Model Library
Use ollama list to see all models installed on your system, including their sizes and modification dates. Use ollama pull modelname to download a model without starting a chat session, useful for pre-loading models you plan to use later. Use ollama rm modelname to delete a model and free its disk space.
Models are identified by name and optional tag. The name identifies the model family (like llama4, qwen3, or deepseek-r1), and the tag specifies the variant (like :14b, :32b-q5_K_M, or :latest). Running ollama pull qwen3:14b downloads the 14B variant specifically, while ollama pull qwen3 downloads the default variant.
To see available variants for a model, visit ollama.com/library and click on the model name. The model page lists all available sizes and quantization options with their download sizes and memory requirements. This helps you choose the variant that best matches your hardware before committing to a download.
Use the REST API
The Ollama API runs automatically on http://localhost:11434. The simplest way to test it is with curl. For a chat request, send a POST to /api/chat with a JSON body containing the model name and a messages array. The API returns the model's response along with token usage statistics.
For non-streaming requests (useful for scripting), add "stream": false to your request body. This returns the complete response as a single JSON object instead of streaming individual tokens. Non-streaming mode is simpler to parse in scripts and applications where you need the full response before proceeding.
The OpenAI-compatible endpoint at http://localhost:11434/v1 works with any tool or library that supports the OpenAI API. Set the base URL to Ollama's address, use any string as the API key, and specify an Ollama model name. This compatibility lets you switch between local and cloud models by changing only the configuration, not the code.
Create a Custom Model
Modelfiles let you create named configurations with custom parameters and system prompts. Create a text file with the base model, your preferred settings, and a system prompt. For example, a coding assistant Modelfile might specify a low temperature for deterministic output, a large context window for understanding long code files, and a system prompt that instructs the model to produce clean, well-commented code.
Run ollama create mycoder -f ./CodingModelfile to build the custom model. Then use ollama run mycoder to start a session with your custom configuration. The custom model appears in ollama list alongside standard models and can be used through the API just like any other model.
You can create as many custom models as you need, each optimized for different tasks. A creative writing model with high temperature, a data analysis model with structured output instructions, a translation model with multilingual system prompts, all sharing the same base model weights but behaving differently based on their Modelfile configuration.
Essential Commands Reference
ollama run model starts an interactive chat session. ollama pull model downloads a model. ollama list shows installed models. ollama rm model deletes a model. ollama show model displays model metadata. ollama create name -f Modelfile creates a custom model. ollama cp source dest copies a model. ollama ps shows running models. ollama serve starts the server manually if it is not running as a service.
Within an interactive session, /bye exits, /set parameter name value adjusts settings, /show info displays model configuration, /clear resets the conversation, and /help lists all available session commands. You can paste multi-line text by enclosing it in triple quotes.
Next Steps
Once you are comfortable with the basics, explore more advanced topics. Try different models to find the best ones for your specific tasks. Set up the Python or JavaScript client library for building applications. Create Modelfiles for your common workflows. Connect Ollama to an AI agent framework like LangChain or CrewAI. Set up Open WebUI for a graphical chat interface. Each of these topics is covered in detail in the articles linked below.
Getting started with Ollama requires just one command: ollama run followed by a model name. From there, the CLI, API, and Modelfile system provide everything you need to use local models productively for any task.