GitHub Copilot costs $19 per month. It is an excellent tool — I use it myself and recommend it in my courses. But there is a growing number of developers who either cannot use cloud-based coding assistants (company policy, security requirements, air-gapped environments) or simply do not want to pay a recurring subscription for AI code completions. If that is you, Gemma 4 running locally through Ollama and connected to VS Code is the most capable free alternative available today.
I have spent the past few weeks setting up and testing three different methods of integrating Gemma 4 with VS Code. This guide walks through each one with concrete setup steps, configuration details, and an honest assessment of what works well and where the experience falls short compared to Copilot. If you have not installed Gemma 4 yet, start with my beginner's guide to running Gemma 4 locally and come back here once Ollama is running.
Prerequisites
Before configuring any VS Code extension, make sure you have the following ready:
- Ollama installed and running. Download it from ollama.com and verify it is running by opening a terminal and typing `ollama list`. You should see your installed models.
- Gemma 4 pulled. Run `ollama pull gemma4` (for the default size) or `ollama pull gemma4:27b` for the larger version. The download is several gigabytes, so allow time on slower connections.
- VS Code installed. Any recent version works. I recommend keeping it updated to the latest stable release for the best extension compatibility.
- Adequate hardware. For coding tasks specifically, the 12B model is the sweet spot. It runs well on a GPU with 8GB VRAM or an Apple Silicon Mac with 16GB unified memory. The 27B model provides better code quality but needs 16GB+ VRAM.
Verify Ollama is serving by visiting http://localhost:11434 in your browser — you should see "Ollama is running". This is the local API endpoint that all VS Code extensions will connect to.
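If you prefer to verify from a script rather than the browser, the same endpoint can be queried programmatically. A minimal sketch using only the standard library (it assumes Ollama's default port 11434; `/api/tags` is the endpoint that lists installed models, and `ollama_status` is just an illustrative helper name):

```python
import json
import urllib.request

def ollama_status(base="http://localhost:11434"):
    """Return the root banner and installed model names, or None if unreachable."""
    try:
        with urllib.request.urlopen(base, timeout=2) as r:
            banner = r.read().decode()  # expected: "Ollama is running"
        with urllib.request.urlopen(f"{base}/api/tags", timeout=2) as r:
            models = [m["name"] for m in json.load(r)["models"]]
        return banner, models
    except OSError:
        return None

status = ollama_status()
print(status if status else "Ollama is not reachable on localhost:11434")
```

If this prints model names that include your Gemma 4 tags, every extension below will be able to connect.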
Method 1: Continue Extension — The Most Complete Option
Continue is an open-source AI coding assistant extension that supports multiple backends, including Ollama. It is the closest experience to Copilot you will get with a local model, offering chat, inline editing, and tab-autocomplete.
Installation and Setup
- Open VS Code and go to the Extensions panel (Ctrl+Shift+X).
- Search for "Continue" and install the extension by Continue.dev.
- After installation, click the Continue icon in the sidebar to open the configuration panel.
- Click the gear icon or open the configuration file. You need to add Ollama as a provider with Gemma 4 as the model.
In your Continue configuration file (~/.continue/config.json), add or modify the models section:
```json
{
  "models": [
    {
      "title": "Gemma 4 27B",
      "provider": "ollama",
      "model": "gemma4:27b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Gemma 4 12B",
    "provider": "ollama",
    "model": "gemma4:12b"
  }
}
```
Notice that I use the 27B model for chat (where quality matters more and latency is acceptable) and the 12B model for tab-autocomplete (where speed is critical). This dual-model approach gives you the best balance of quality and responsiveness.
Configuring Autocomplete
Tab-autocomplete is the feature most developers associate with Copilot — you type a few characters or a comment, and the AI suggests the rest of the line or block. With Continue and Gemma 4, this works but requires some tuning:
- Set `"tabAutocompleteOptions"` in your config to control how aggressively suggestions appear. I recommend starting with `"debounceDelay": 500` to avoid overwhelming your GPU with requests on every keystroke.
- If completions feel slow, switch the autocomplete model to the 4B version (`gemma4:4b`). The suggestions will be less sophisticated but appear faster.
- Enable `"multilineCompletions": "always"` if you want Gemma 4 to suggest entire function bodies rather than single lines.
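Put together, the autocomplete tuning described above might look like this in `~/.continue/config.json`. Treat this as a sketch: the key names come from the options discussed here, and Continue's config schema has changed between releases, so check the version you have installed:

```json
{
  "tabAutocompleteModel": {
    "title": "Gemma 4 12B",
    "provider": "ollama",
    "model": "gemma4:12b"
  },
  "tabAutocompleteOptions": {
    "debounceDelay": 500,
    "multilineCompletions": "always"
  }
}
```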
Chat and Inline Editing
Continue's chat panel works like a dedicated coding assistant. You can highlight code, press Ctrl+L (or Cmd+L on Mac) to send it to the chat, and ask Gemma 4 to explain it, refactor it, add error handling, or write tests. The inline editing feature (Ctrl+I) lets you select code and describe a change in natural language — Gemma 4 generates the edit and shows you a diff to accept or reject.
In my experience, the chat and inline editing features work well with Gemma 4 27B. Explanations are clear, refactoring suggestions are sensible, and test generation covers the main paths. Where it sometimes falls short compared to Copilot or Claude is in understanding the broader project context — Gemma 4 processes what you give it in the prompt, but does not index your entire workspace the way Copilot does.
Method 2: CodeGPT Extension — Connecting to Local Ollama
CodeGPT is another popular AI coding extension that supports local model backends. Its interface is slightly different from Continue — it focuses more on the chat experience and less on inline completions.
Setup Steps
- Install the CodeGPT extension from the VS Code marketplace.
- Open CodeGPT settings (click the extension icon in the sidebar, then the settings gear).
- Select "Ollama" as your AI provider.
- The extension will automatically detect models available in your local Ollama installation. Select Gemma 4 from the dropdown.
- Optionally, configure the system prompt to tailor Gemma 4's behaviour for your coding style. For example: "You are a senior developer. Write clean, well-commented code. Prefer TypeScript over JavaScript. Always include error handling."
What CodeGPT Does Well
CodeGPT's strength is its chat interface. You can ask Gemma 4 coding questions, paste error messages, and get explanations without leaving VS Code. It also supports selecting code in the editor and sending it to the chat with a right-click context menu, which is convenient for quick "explain this" or "find the bug" queries.
The extension also supports saving conversation history, which is useful when you are working on a complex feature over multiple sessions and want to reference earlier discussions.
Limitations
CodeGPT's inline autocomplete integration with local models is less polished than Continue's. If tab-autocomplete is your priority, Continue is the better choice. CodeGPT is strongest when used as a chat-based coding companion alongside your normal typing workflow.
Method 3: Ollama Extension — Lightweight Direct Integration
For developers who want minimal overhead, the Ollama extension for VS Code provides a stripped-down chat interface that connects directly to your local Ollama server. No account creation, no provider configuration — just install and use.
Setup Steps
- Install the "Ollama" extension from the VS Code marketplace (by Matt Williams or similar — look for the one with the highest download count).
- Ensure Ollama is running locally.
- Open the command palette (Ctrl+Shift+P) and search for "Ollama". Select the chat command.
- Choose Gemma 4 from the list of available models.
This extension is intentionally simple. It gives you a chat panel and nothing else — no inline completions, no inline editing, no workspace indexing. Its advantage is that it is lightweight, fast to set up, and has minimal impact on VS Code's performance. If you just want to ask occasional coding questions without leaving your editor, this is sufficient.
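All three extensions ultimately talk to the same local HTTP API, so it is worth seeing what happens under the hood. A minimal sketch that calls Ollama's `/api/generate` endpoint directly (the model tag and prompt are examples, `build_generate_request` is my own helper name, and the request only succeeds if Ollama is running):

```python
import json
import urllib.request

def build_generate_request(model, prompt, base="http://localhost:11434"):
    # /api/generate takes a JSON body; "stream": False returns one complete response.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(f"{base}/api/generate", data=body,
                                  headers={"Content-Type": "application/json"})

req = build_generate_request("gemma4:12b",
                             "Write a Python function that reverses a string.")
try:
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(json.loads(resp.read())["response"])
except OSError:
    print("Could not reach Ollama on localhost:11434")
```

This is essentially what every extension in this guide does on your behalf; the differences are in the UI and prompt construction, not the transport.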
Configuring Inline Code Completions
Tab-autocomplete is where the practical value of an AI coding assistant is felt most directly. Here is how to optimise the experience with Gemma 4:
- Model size matters for speed. For autocomplete specifically, response time is more important than response quality. A 200ms completion with an adequate suggestion beats a 3-second completion with a perfect suggestion. Start with the 12B model and drop to 4B if it feels sluggish.
- GPU offloading. Make sure Ollama is using your GPU rather than CPU. Run `ollama ps` to check which processor is being used. On NVIDIA systems, set the `OLLAMA_GPU_LAYERS` environment variable if needed.
- Context window. Reduce the context window size in your extension settings if completions are slow. A context of 2048 tokens is usually sufficient for inline completions and reduces memory pressure.
- Fill-in-the-middle (FIM). Gemma 4 supports FIM, which means it can look at code both before and after the cursor position to generate better suggestions. Make sure your extension has FIM enabled in its settings — Continue supports this by default.
Chat-Based Coding Workflows
Beyond autocomplete, the chat interface is where Gemma 4 provides the most value in day-to-day development. Here are the workflows I use most frequently:
Explaining Unfamiliar Code
Select a block of code you inherited or found in a library, send it to the chat, and ask: "Explain what this code does, step by step. Highlight any potential issues or anti-patterns." Gemma 4 27B does this well across most popular languages. It correctly identifies design patterns, explains control flow, and flags issues like missing error handling or potential memory leaks.
Refactoring
Select a function and ask: "Refactor this to be more readable. Extract any magic numbers into named constants. Add TypeScript types if possible." Gemma 4 produces clean refactored code, though it occasionally over-refactors — splitting code into more functions than necessary. I find a follow-up prompt like "simplify — keep it to two functions maximum" corrects this tendency.
Generating Tests
Paste a function and ask: "Write unit tests for this function using Jest. Cover the happy path, edge cases (empty input, null values), and error conditions." Gemma 4 generates reasonable test suites. The coverage is not as thorough as what Claude or ChatGPT produce — it tends to cover the main paths but misses some edge cases. For production test suites, use Gemma 4 as a starting point and add the edge cases manually.
Debugging Errors
Paste an error message along with the relevant code and ask: "What is causing this error and how do I fix it?" This is one of Gemma 4's stronger use cases. Stack traces and common error patterns are well-represented in its training data, so it correctly diagnoses most issues on the first attempt.
Performance Tips: Model Size and Hardware Trade-Offs
Choosing the right model size for your hardware is critical for a good experience. Here is a practical guide:
| Model | Min VRAM | Autocomplete Speed | Code Quality | Best For |
|---|---|---|---|---|
| Gemma 4 4B | 4GB | Very fast (100-300ms) | Basic — simple completions | Low-end GPUs, CPU-only, quick suggestions |
| Gemma 4 12B | 8GB | Fast (300-800ms) | Good — handles most tasks | Most developers, best balance |
| Gemma 4 27B | 16GB | Moderate (800ms-2s) | Very good — complex logic | Chat-based workflows, complex code generation |
If you have an NVIDIA RTX 3060 (12GB VRAM), the 12B model runs comfortably with room to spare. The RTX 4070 (12GB) handles it faster due to the newer architecture. Apple Silicon Macs with M1 Pro/Max (16GB+) run the 12B model smoothly and can handle the 27B model with slightly longer response times.
A practical approach I recommend: run the 12B model for autocomplete and keep the 27B model available for chat tasks. Most extensions let you configure different models for different features, as shown in the Continue configuration above.
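The thresholds in the table above can be condensed into a small helper for picking a sensible default tag from available VRAM. A hypothetical sketch (`pick_model` is my own name, not part of any extension, and the cut-offs are the minimum-VRAM figures from the table):

```python
def pick_model(vram_gb: float) -> str:
    """Map available VRAM (GB) to a Gemma 4 tag, per the table's minimums."""
    if vram_gb >= 16:
        return "gemma4:27b"  # very good quality, moderate speed
    if vram_gb >= 8:
        return "gemma4:12b"  # best balance for most developers
    return "gemma4:4b"       # low-end GPUs or CPU-only

for vram in (6, 12, 24):
    print(vram, "GB ->", pick_model(vram))
```

In practice you would pick once and hard-code the tag in your extension config, but the mapping makes the trade-off explicit.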
Gemma 4 vs GitHub Copilot for Coding: Honest Comparison
This is the comparison most developers want. Here is an honest assessment based on several weeks of using both side by side.
| Feature | Gemma 4 (Local) | GitHub Copilot |
|---|---|---|
| Inline autocomplete quality | Good for common patterns | Excellent across all contexts |
| Autocomplete speed | Hardware-dependent (300ms-2s) | Consistently fast (100-300ms) |
| Project context awareness | Limited to active file + prompt | Indexes workspace and open files |
| Chat / code explanation | Good (27B model) | Very good (Copilot Chat) |
| Test generation | Adequate — covers main paths | Good — better edge case coverage |
| Language support breadth | Strong in popular languages | Strong across all languages |
| Privacy | Fully local — code never transmitted | Cloud-based — code sent to GitHub servers |
| Offline availability | Yes | No |
| Cost | Free | $19/month individual, $39/month business |
| Setup effort | Moderate (install Ollama, pull model, configure extension) | Minimal (install extension, sign in) |
When Local Gemma 4 Makes Sense vs When Copilot Is Worth the Subscription
Choose Gemma 4 when:
- Your organisation prohibits sending code to external servers. This is common in finance, defence, healthcare, and any company with strict IP policies. Gemma 4 is the only option that keeps everything local.
- You work offline frequently — on flights, in restricted network environments, or in locations with unreliable internet.
- You are a student or early-career developer and $19/month is a meaningful expense. Gemma 4 gives you a capable coding assistant for free.
- You want to learn about AI infrastructure. Setting up and configuring local models is a valuable skill that translates directly to deploying AI in production environments.
- You primarily need chat-based assistance (explain, refactor, debug) rather than inline completions. The quality gap between Gemma 4 and Copilot is smaller for chat tasks than for autocomplete.
Choose Copilot when:
- Inline autocomplete speed and quality are your top priority. Copilot is still meaningfully better at predicting what you want to type next, especially in complex codebases.
- You work across many languages and frameworks. Copilot's training data and architecture give it broader coverage of less common languages and newer frameworks.
- You value zero-configuration setup. Copilot works out of the box with a single sign-in. No GPU requirements, no model downloads, no configuration files.
- Your team or company already has a Copilot Business licence. At that point, the cost is covered and Copilot's team features (organisation-wide policy controls, audit logs) add value.
For a broader perspective on how Gemma 4 compares to paid AI tools across all categories, see my full AI model comparison for 2026. And for specific use cases where Gemma 4 outperforms paid alternatives, see 5 real-world tasks where Gemma 4 beats paid models.
Frequently Asked Questions
What hardware do I need to run Gemma 4 as a coding assistant in VS Code?
For comfortable inline code completions, you need a GPU with at least 8GB VRAM to run the Gemma 4 12B model. The 27B model requires 16GB+ VRAM for reasonable speed. If you only have a CPU, the 4B model works but completions will be noticeably slower (2-5 seconds per suggestion instead of under 1 second). For most developers, a mid-range NVIDIA GPU like the RTX 3060 (12GB) or RTX 4060 Ti (16GB) provides a good experience. Apple Silicon Macs with 16GB unified memory also handle the 12B model well through Ollama.
Is Gemma 4 in VS Code as good as GitHub Copilot?
For inline code completions, GitHub Copilot is still ahead — its suggestions are faster, more contextually aware across your entire project, and handle a wider range of languages and frameworks. Where Gemma 4 closes the gap is in chat-based coding workflows (explaining code, refactoring, generating tests) where the quality difference is smaller. Gemma 4's advantages are cost (completely free), privacy (code never leaves your machine), and offline availability. For developers who cannot use cloud-based tools due to company policy or who want to avoid the $19/month Copilot subscription, Gemma 4 is a genuinely capable alternative.
Can I use Gemma 4 and GitHub Copilot together in VS Code?
Yes, you can run both simultaneously. A practical setup is to use Copilot for inline tab-completions (where it excels) and Gemma 4 through Continue for chat-based tasks like code explanation, refactoring, and test generation (where keeping your code local matters). This gives you the best of both worlds — fast cloud-powered completions for routine coding and a private local model for when you are working with sensitive codebases or need detailed architectural discussions.
Which Gemma 4 model size is best for coding in VS Code?
The 12B model offers the best balance for most developers. It is fast enough for interactive use (sub-second completions on a decent GPU), handles most programming languages well, and produces good quality code for standard tasks. The 27B model produces better code for complex logic, architectural decisions, and less common languages, but it is slower and requires more VRAM. The 4B model is usable for simple completions but struggles with anything requiring multi-step reasoning. Start with 12B and upgrade to 27B if you find yourself hitting quality limitations.
Sources and Further Reading
Related Posts
- How to Run Gemma 4 Locally for Free: A Beginner's Guide
- Gemma 4 vs ChatGPT vs Claude vs Copilot: Best AI Model Comparison in 2026
- 5 Real-World Tasks Where Gemma 4 Beats Paid AI Models
- Gemma 4 vs Gemini: What's the Difference and When to Use Which
- Gemma 4 vs GPT-4o vs Llama 4: Which Free AI Model Is Best for Excel Formulas?
- Gemma 4 for Data Analysis: Can It Replace ChatGPT for Spreadsheet Work?
Want to use AI tools more effectively?
My courses cover practical AI workflows, from spreadsheet formulas to app development, with real projects and honest tool comparisons.
Browse all courses