How to Use DeepSeek in VS Code: The $2/Month AI Coding Assistant Setup Guide

Coding Liquids blog cover featuring Sagnik Bhattacharya for How to Use DeepSeek in VS Code, showing VS Code editor with DeepSeek AI coding assistant integration.
Coding Liquids blog cover featuring Sagnik Bhattacharya for How to Use DeepSeek in VS Code, showing VS Code editor with DeepSeek AI coding assistant integration.

GitHub Copilot costs $19 per month. It is the industry standard for AI-assisted coding, and for good reason — it is fast, accurate, and deeply integrated into VS Code. But what if I told you there is an alternative that delivers near-GPT-4 coding quality for roughly $2 per month? That alternative is DeepSeek, and it has quietly become one of the most compelling budget AI coding assistants available in 2026.

I teach Flutter and Excel with AI — explore my courses if you want structured learning.

DeepSeek, developed by the Chinese AI lab of the same name, has made waves with its V3, R1, and now V4 model families. The current flagship is DeepSeek V4, released in preview on 24 April 2026, which ships in two tiers: deepseek-v4-flash (fast and cheap, with a non-thinking and thinking mode) and deepseek-v4-pro (higher reasoning capacity). Both expose a 1M-token context window through the API. The pricing is what makes DeepSeek remarkable: V4 Flash costs $0.14 per million input tokens and $0.28 per million output tokens, a fraction of what OpenAI or Anthropic charge. For the average developer writing code in VS Code, that translates to roughly $1.50 to $2.50 per month in real-world usage. If you prefer complete privacy and zero cost, you can also run DeepSeek Coder V2 locally through Ollama.

Note on legacy model names: If you have older tutorials or config snippets that reference deepseek-chat or deepseek-reasoner, those still work today as compatibility aliases — they map to V4 Flash's non-thinking and thinking modes respectively — but DeepSeek has scheduled them for deprecation on 24 July 2026. The deepseek-coder API model name was retired earlier (it was folded into deepseek-chat back when V2.5 launched in 2024). Use deepseek-v4-flash and deepseek-v4-pro in any new configuration.

Follow me on Instagram@sagnikteaches

I have spent the past few weeks setting up DeepSeek in VS Code using multiple methods, testing it across real coding tasks, and tracking my actual API costs. This guide walks through every step — from getting your API key to configuring autocomplete to understanding exactly what you are trading off compared to Copilot and Claude.

Connect on LinkedInSagnik Bhattacharya Subscribe on YouTube@codingliquids

Prerequisites

Before you begin, make sure you have the following ready:

  1. VS Code installed. Any recent stable release works. Keep it updated for the best extension compatibility.
  2. A DeepSeek API key (for the cloud method). You will create an account at platform.deepseek.com and generate an API key. DeepSeek requires a small initial top-up (as little as $2) to activate API access.
  3. Ollama installed (for the local method). Download it from ollama.com if you want to run DeepSeek Coder entirely on your machine with no API costs.
  4. The Continue extension. This is the open-source VS Code extension we will use to connect DeepSeek to your editor. It supports both API-based and local models, and provides chat, inline editing, and tab-autocomplete.

You do not need a powerful GPU for the API method — the model runs on DeepSeek's servers. For the local Ollama method, you will need at least 8GB of VRAM for the smaller DeepSeek Coder models.

Method 1: DeepSeek API + Continue Extension

This is the recommended method for most developers. You get access to DeepSeek's most powerful models (V4 Flash and V4 Pro) without any hardware requirements, and the cost is negligible compared to Copilot.

Step 1: Get Your DeepSeek API Key

  1. Go to platform.deepseek.com and create an account.
  2. Navigate to the API Keys section in your dashboard.
  3. Click "Create new API key" and copy the key. Store it somewhere safe — you will not be able to see it again.
  4. Top up your account with the minimum amount (typically $2). This balance will last most developers several weeks to a month depending on usage.

DeepSeek's API is OpenAI-compatible, which means any tool that works with the OpenAI API format can connect to DeepSeek by simply changing the base URL and API key. This is why the Continue extension works seamlessly with it.

Step 2: Install and Configure Continue

  1. Open VS Code and go to the Extensions panel (Ctrl+Shift+X).
  2. Search for "Continue" and install the extension by Continue.dev.
  3. After installation, click the Continue icon in the sidebar.
  4. Open the Continue configuration file. The current Continue extension uses YAML by default at ~/.continue/config.yaml. You can open it via the gear icon in the Continue panel, or with the command palette (Ctrl+Shift+P → "Continue: Open config.yaml"). The legacy config.json format still loads if present, but new installs are YAML.

Add the following to your config.yaml:

name: My Continue Config
version: 1.0.0
schema: v1
models:
  - name: DeepSeek V4 Flash
    provider: deepseek
    model: deepseek-v4-flash
    apiKey: YOUR_DEEPSEEK_API_KEY
    roles:
      - chat
      - edit
      - apply
  - name: DeepSeek V4 Pro
    provider: deepseek
    model: deepseek-v4-pro
    apiKey: YOUR_DEEPSEEK_API_KEY
    roles:
      - chat
      - edit
  - name: DeepSeek Autocomplete
    provider: deepseek
    model: deepseek-v4-flash
    apiKey: YOUR_DEEPSEEK_API_KEY
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 300
      maxPromptTokens: 1024

Replace YOUR_DEEPSEEK_API_KEY with the key you copied earlier. The configuration above gives you V4 Flash as your everyday chat/edit model, V4 Pro for harder reasoning tasks, and V4 Flash again as the autocomplete model — Flash is the right choice for completions because latency matters more than depth for tab suggestions.

If you are still on the legacy config.json format, the equivalent shape uses title instead of name, a top-level tabAutocompleteModel object instead of the autocomplete role, and no roles array — but I recommend migrating to YAML, which is what Continue's documentation now ships as the default.

Step 3: Choose Your Model

DeepSeek V4 ships in two tiers, and choosing the right one matters:

  • DeepSeek V4 Flash (deepseek-v4-flash) — Fast, cheap, and the default for most tasks. Has both a non-thinking mode (instant responses, ideal for autocomplete and quick chat) and a thinking mode (extended reasoning, exposed under the legacy deepseek-reasoner alias). At $0.14/M input and $0.28/M output, this is your daily-driver model.
  • DeepSeek V4 Pro (deepseek-v4-pro) — Higher-capacity reasoning model. Use it for architectural design discussions, deep refactors, complex debugging, and longer-horizon code reviews. Pricier than Flash (currently around $0.435/M input and $0.87/M output during the launch discount window), but worth it for the hard problems.

I recommend using V4 Flash for autocomplete and most chat interactions, and V4 Pro for when you genuinely need deeper reasoning. The configuration above wires this up directly — Continue's model picker lets you swap between them with a dropdown in the chat panel.

Method 2: DeepSeek via Ollama (Local, Free)

If you prefer not to send your code to any external server — or if you simply want zero ongoing cost — you can run DeepSeek Coder locally through Ollama. The trade-off is that you need decent hardware and the model quality is limited by what your machine can handle.

Step 1: Pull the DeepSeek Coder Model

Open a terminal and run:

ollama pull deepseek-coder-v2:16b

This pulls the 16-billion parameter "Lite" variant of DeepSeek Coder V2 (a Mixture-of-Experts model with roughly 2.4B active parameters). It downloads at ~8.9GB in its default 4-bit quantisation and is the right balance of quality and performance for most developer machines — it runs comfortably on 8–12GB of VRAM, or on Apple Silicon with 16GB+ unified memory. The :latest tag currently points to the same 16B model.

If you have a workstation with serious GPU memory (think 80GB+ across one or more cards) and want the full-quality 236B model, pull instead:

ollama pull deepseek-coder-v2:236b

Note that there is no deepseek-coder-v2:lite tag in the Ollama library — the 16B model is the Lite variant, just published under its parameter count. Verify what you have by running ollama list; you should see deepseek-coder-v2:16b in the output.

Step 2: Configure Continue for Local DeepSeek

Update your config.yaml to point to your local Ollama instance:

models:
  - name: DeepSeek Coder V2 (Local)
    provider: ollama
    model: deepseek-coder-v2:16b
    roles:
      - chat
      - edit
      - apply
  - name: DeepSeek Coder V2 Autocomplete
    provider: ollama
    model: deepseek-coder-v2:16b
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 500
      maxPromptTokens: 2048

Ensure Ollama is running — visit http://localhost:11434 in your browser to confirm you see "Ollama is running". Continue will automatically connect to the local Ollama API at that address and route all requests to your machine. No API key is required for local Ollama.

Step 3: Verify It Works

Open the Continue chat panel in VS Code and type a simple prompt like "Write a Python function that reverses a string." If you see a response from DeepSeek Coder, your local setup is working. Then open a code file and start typing — you should see tab-autocomplete suggestions appearing after a brief delay.

Configuring Autocomplete

Tab-autocomplete is the feature that makes the biggest difference in day-to-day coding. In Continue's config.yaml, autocomplete tuning lives under the autocompleteOptions block on whichever model has the autocomplete role (see the YAML examples above). Here is how to optimise DeepSeek's autocomplete behaviour:

  • Debounce delay. Set autocompleteOptions.debounceDelay: 300 on your autocomplete model. This waits 300 milliseconds after you stop typing before sending a completion request. Too low and you waste API calls (or GPU cycles) on partial words. Too high and suggestions feel sluggish. I find 300ms is the sweet spot for the DeepSeek API; bump it to 500ms if running locally on modest hardware.
  • Multiline completions. Set autocompleteOptions.multilineCompletions: "always" to let DeepSeek suggest entire function bodies, multi-line conditionals, and complete code blocks rather than just single lines. V4 Flash and DeepSeek Coder V2 both produce well-structured multi-line completions that are often correct on the first suggestion.
  • Prompt token budget. Set autocompleteOptions.maxPromptTokens to control how much surrounding context Continue sends with each completion request. The default of 1024 is fine for the API. For local Ollama, keeping it at 1024–2048 keeps autocomplete responsive on modest GPUs; most inline completions do not need more context than that.
  • Fill-in-the-middle (FIM). DeepSeek Coder V2 and V4 Flash both support FIM, meaning the model analyses code both before and after your cursor to generate contextually appropriate suggestions. Continue enables this by default for these models. This is particularly useful when you are writing code in the middle of an existing function — the suggestions will respect the surrounding code structure rather than treating your cursor position as the end of the file.

Chat-Based Workflows

Beyond autocomplete, the chat interface is where DeepSeek provides tremendous value. Here are the workflows I use most frequently with DeepSeek in VS Code:

Code Explanation

Select a block of unfamiliar code — perhaps something inherited from a colleague or pulled from a library — and press Ctrl+L to send it to Continue's chat. Ask: "Explain what this code does step by step. Highlight any potential issues." DeepSeek V4 Flash in thinking mode (or V4 Pro for denser code) handles this exceptionally well. It correctly identifies design patterns, traces control flow, and flags common issues like missing error handling, race conditions, or inefficient algorithms. The explanations are clear and well-structured, often rivalling what you would get from GPT-class models.

Refactoring

Select a function and ask DeepSeek to refactor it. For example: "Refactor this function to use async/await instead of callbacks. Add TypeScript types. Extract the configuration into a separate object." DeepSeek V4 produces clean, idiomatic refactored code across JavaScript, TypeScript, Python, Go, and Rust. It occasionally over-engineers the solution — adding abstraction layers that are unnecessary for simpler functions — but a follow-up prompt like "simplify this, keep it under 30 lines" corrects that tendency quickly.

Test Generation

Paste a function and ask: "Write unit tests for this function using Jest. Cover the happy path, edge cases (empty input, null values, type mismatches), and error conditions." DeepSeek V4 Flash generates solid test suites that cover the main execution paths. It handles standard testing frameworks well — Jest, Vitest, pytest, Go's testing package, Rust's built-in tests. Where it sometimes falls short is in generating truly creative edge cases; it covers the obvious ones but may miss domain-specific boundary conditions that a human tester would catch.

Debugging

Paste an error message along with the relevant code and ask: "What is causing this error and how do I fix it?" This is one of DeepSeek's strongest use cases. Common error patterns, stack traces, and framework-specific issues are well-represented in its training data. In my testing, DeepSeek correctly diagnosed the root cause on the first attempt roughly 80% of the time — comparable to Claude and GPT-4 for standard debugging tasks.

DeepSeek V4 Flash vs V4 Pro: When to Use Which

With V4, DeepSeek has consolidated everything into two tiers. Understanding when each one earns its keep saves you both money and latency:

Aspect DeepSeek V4 Flash DeepSeek V4 Pro
Primary strength Speed, cost, everyday tasks Deep reasoning, longer-horizon planning
Best for Autocomplete, quick chat, refactoring, test generation Architecture design, complex debugging, code review of large diffs
Modes Non-thinking (fast) and thinking (reasoning) Single high-capacity reasoning mode
Context window 1M tokens 1M tokens
Response speed Fast (typical 200–500ms first token) Slower; reasoning takes seconds, not milliseconds
Token cost (cache miss) $0.14/M input · $0.28/M output $0.435/M input · $0.87/M output (during launch discount)

The practical recommendation: keep V4 Flash as your default for autocomplete and most chat work, and reach for V4 Pro only when a problem genuinely needs deeper reasoning — refactors that touch many files, architectural trade-offs, or debugging that requires holding a lot of context in mind at once. The dual-model setup is already reflected in the configuration examples above.

Cost Breakdown: How $2/Month Actually Works

The $2/month figure is not marketing — it is based on real-world usage tracking. Here is how the maths works:

DeepSeek's API pricing (as of May 2026, post V4 launch) is:

  • DeepSeek V4 Flash (deepseek-v4-flash): $0.14 per million input tokens (cache miss), $0.0028 per million input tokens (cache hit), $0.28 per million output tokens.
  • DeepSeek V4 Pro (deepseek-v4-pro): $0.435 per million input tokens (cache miss), $0.003625 per million input tokens (cache hit), $0.87 per million output tokens — currently at a 75% launch discount through 31 May 2026, after which list prices apply.

The cache-hit pricing is the genuinely interesting line. DeepSeek's API automatically caches repeated prompt prefixes — system prompts, file headers, code you've already discussed — and bills repeated tokens at roughly 2% of the cache-miss rate. For tab-autocomplete and iterative chat, where the prompt prefix barely changes between requests, your effective bill drops dramatically below the headline numbers.

Compare this to OpenAI's GPT-4-class models at roughly $2.50–$10 per million input tokens, or Anthropic's Claude Sonnet 4.6 at $3 per million input tokens. DeepSeek V4 Flash is roughly 20× cheaper than the equivalents.

In a typical coding day, I make roughly 50–100 chat interactions and receive several hundred autocomplete suggestions. That works out to approximately 200,000–400,000 tokens per day in combined input and output. With caching, my actual DeepSeek bill comes in around $0.05 to $0.10 per day, or $1.50 to $3.00 per month.

If you use the local Ollama method, your cost is literally zero (beyond the electricity to run your GPU). The trade-off is that you need capable hardware and the local models are slightly less powerful than what the API provides.

For comparison, here is what the alternatives cost:

  • GitHub Copilot Individual: $19/month
  • GitHub Copilot Business: $39/month per seat
  • Claude Pro (for coding via API): $20/month subscription, or $3-15 per million tokens via API
  • ChatGPT Plus (for coding): $20/month
  • DeepSeek via API: ~$2/month for typical usage
  • Gemma 4 via Ollama (local): Free

DeepSeek vs Copilot vs Claude vs Gemma 4 (Local)

This is the comparison that matters. Here is an honest assessment based on weeks of side-by-side testing:

Feature DeepSeek V4 Flash (API) GitHub Copilot Claude Sonnet 4.6 (API) Gemma 4 (Local)
Monthly cost ~$2 $19 ~$5-15 (API usage) Free
Autocomplete quality Very good Excellent Good (via Continue) Good
Autocomplete speed Fast (200-500ms) Very fast (100-300ms) Moderate (300-800ms) Hardware-dependent
Chat / reasoning quality Very good Very good Excellent Good
Code explanation Excellent Very good Excellent Good (27B model)
Test generation Good Good Excellent Adequate
Language breadth Strong in popular languages Broad across all languages Strong across all languages Strong in popular languages
Privacy Code sent to China-based servers Code sent to GitHub/Microsoft Code sent to Anthropic Fully local
Offline availability No No No Yes
Setup effort Moderate (API key + Continue) Minimal (install + sign in) Moderate (API key + Continue) High (Ollama + model + Continue)
Project context awareness Limited to active file + prompt Indexes workspace Limited to active file + prompt Limited to active file + prompt

The takeaway is clear: DeepSeek occupies a unique position as the best value-for-money AI coding assistant. It is not the absolute best in any single category, but it is remarkably close to the best in most categories at a tenth of the price. If you are a budget-conscious developer, a student, or someone who simply objects to paying $19/month when a $2/month option exists that covers 90% of your needs, DeepSeek is the obvious choice.

Limitations and Considerations

DeepSeek is impressive for its price, but it is not without drawbacks. You should be aware of these before committing:

API Reliability

DeepSeek's API has experienced periodic outages and rate limiting, particularly during peak hours. When DeepSeek went viral in early 2025, the API was frequently unavailable for hours at a time. Reliability has improved significantly since then, but it is still not as consistently available as OpenAI's or Anthropic's APIs. If you rely on AI coding assistance for production work, consider having a fallback (a local Ollama model, for instance).

Privacy and Data Concerns

This is the elephant in the room. DeepSeek is a Chinese company, and when you use the API, your code is transmitted to servers based in China. DeepSeek's privacy policy states that data may be stored and processed in the People's Republic of China. For personal projects, open-source work, and learning, this is unlikely to be a practical concern. For proprietary corporate code, code subject to regulatory compliance (GDPR, HIPAA, SOX), or sensitive intellectual property, this is a serious consideration that may disqualify DeepSeek entirely.

If privacy is a hard requirement but you still want DeepSeek's capabilities, use the Ollama local method. Running DeepSeek Coder locally means no data leaves your machine — ever. The model quality is somewhat lower than the full API models, but it eliminates the data residency concern completely.

Content Filtering and Censorship

DeepSeek's models include content filters that occasionally interfere with legitimate coding tasks. In my testing, this was rare for standard development work, but it can surface when working with security-related code (penetration testing tools, encryption implementations), content that touches on politically sensitive topics (relevant for some NLP and content moderation projects), or certain medical or legal domain code. If you hit a refusal, rephrasing the prompt usually resolves it.

Language and Framework Coverage

DeepSeek excels at Python, JavaScript/TypeScript, Go, Rust, Java, and C++. It handles these languages at a level very close to Copilot. Where it falls behind is in less common languages (Elixir, Haskell, OCaml, Kotlin Multiplatform) and in very new frameworks or libraries that were released after its training cutoff. Copilot's advantage here comes from its continuous learning pipeline and GitHub's vast code corpus.

Frequently Asked Questions

Is DeepSeek safe to use for coding?

For personal projects, open-source contributions, and learning — yes, it is perfectly safe. The models produce high-quality code and the API functions reliably for most use cases. The primary concern is data privacy: your code is sent to DeepSeek's servers in China when using the API. If you work with proprietary or regulated code, either use the local Ollama method (which keeps everything on your machine) or check with your organisation's security team before using the cloud API. From a code quality perspective, DeepSeek's suggestions are comparable to other leading AI models — always review generated code before committing it, regardless of which AI tool produced it.

How does DeepSeek's $2/month compare to GitHub Copilot's $19/month in practice?

The day-to-day experience is surprisingly close. For autocomplete, Copilot is still faster and slightly more contextually aware (it indexes your entire workspace), but DeepSeek V4 Flash produces relevant completions for most standard coding tasks. For chat-based workflows — explaining code, refactoring, debugging, writing tests — the quality gap is negligible with V4 Flash, and V4 Pro closes it further on harder reasoning tasks. Where Copilot clearly wins is in setup simplicity (one-click install versus API key configuration), reliability (near-100% uptime versus occasional DeepSeek outages), and workspace-wide context awareness. Whether the $17/month difference justifies those advantages depends entirely on your priorities and budget.

Can I switch between DeepSeek and other models in the same VS Code setup?

Yes. The Continue extension supports multiple model providers simultaneously. You can configure DeepSeek, Claude, OpenAI, and local Ollama models all in the same config.yaml file and switch between them with a dropdown in the Continue panel. This is actually the setup I recommend: use DeepSeek as your primary model for cost efficiency, keep a local Ollama model as a fallback for when the API is unavailable, and optionally add Claude or GPT-4 for tasks where you want the absolute best quality regardless of cost.

Sources and Further Reading

Related Tutorials