Gemma 4 vs GPT-4o vs Llama 4: Which Free AI Model Is Best for Excel Formulas?

Coding Liquids blog cover featuring Sagnik Bhattacharya comparing Gemma 4, GPT-4o, and Llama 4 for Excel formula generation, with spreadsheet visuals and AI model logos.

If you work with Excel professionally, you have probably already experimented with AI to generate formulas. The usual recommendation is ChatGPT, and it is genuinely good. But ChatGPT is not the only option any more, and it is not always the best one -- especially if you want something free, private, or runnable on your own hardware.

Google's Gemma 4, OpenAI's GPT-4o (free tier), and Meta's Llama 4 all represent a new generation of AI models that can handle Excel formula requests with surprising competence. I have spent the last two weeks running structured tests across all three to find out which one actually delivers the most reliable Excel output. This guide shares the results, including the exact prompts I used and the formulas each model returned.

If you are comparing the broader AI landscape beyond Excel, see my full Gemma 4 vs ChatGPT vs Claude vs Copilot comparison. This post focuses specifically on spreadsheet formula generation.

Why Comparing Free AI Models for Excel Matters

Paid AI tiers give you priority access, faster responses, and sometimes better models. But many professionals -- freelancers, small business owners, students, and teams in cost-conscious organisations -- rely on the free tier or open-weight alternatives. The question is not whether AI can help with Excel formulas. It clearly can. The question is whether free and open models are good enough for real work.

The answer, as I discovered in testing, is that they are -- but with meaningful differences in where each model excels and where it stumbles. Understanding those differences saves you from wasted prompts and incorrect formulas ending up in live workbooks.

The Three Models at a Glance

Gemma 4 by Google

Gemma 4 is Google's open-weight model, available in several sizes from 2B to 27B parameters. Because it is open-weight, you can run Gemma 4 locally on your own machine using tools like Ollama or LM Studio. This means no usage caps, no data leaving your network, and zero ongoing cost. For Excel work, the 12B and 27B variants are the most capable. I tested with the 27B variant running locally, which is roughly comparable in capability to the free tiers of cloud-hosted models.

GPT-4o Free Tier by OpenAI

GPT-4o is OpenAI's multimodal model, accessible for free through ChatGPT with usage limits. The free tier provides access to the full GPT-4o model but with a message cap that resets periodically. For Excel formula work, it remains one of the most polished options thanks to OpenAI's extensive instruction tuning. I used the free tier through the ChatGPT web interface.

Llama 4 by Meta

Llama 4 is Meta's latest open-weight model family. Like Gemma, it can be run locally, though the larger variants require significant hardware. I tested with the Llama 4 Scout variant, which is designed for efficient deployment while maintaining strong reasoning capabilities. Access is available through Meta's platform or locally via Ollama.

Head-to-Head Test: 5 Real Excel Formula Prompts

I gave each model the same five prompts, using identical wording and context. Each prompt describes a realistic Excel scenario with column references and expected output. I evaluated the results on three criteria: formula correctness (does it produce the right answer?), explanation quality (does the model explain the logic clearly?), and edge-case handling (does it account for blanks, errors, or unusual inputs?).

For a deeper look at how to structure prompts for AI-generated formulas, see my guide on AI prompts for Excel that actually work.

Test 1: SUMIFS With Multiple Conditions

Prompt: "I have a sales table in Excel. Column A is the salesperson name, column B is the region (North, South, East, West), column C is the product category, and column D is the sale amount. Write a formula for cell F2 that sums all sales from the North region for the Electronics category."

Gemma 4: Returned =SUMIFS(D:D, B:B, "North", C:C, "Electronics"). Correct on the first attempt. Provided a clear explanation of how SUMIFS evaluates multiple criteria simultaneously. Also suggested using a table reference format as an alternative.

GPT-4o: Returned =SUMIFS(D:D, B:B, "North", C:C, "Electronics"). Identical correct formula. Added a longer explanation with a step-by-step breakdown of each argument. Also proactively mentioned that SUMIFS is case-insensitive.

Llama 4: Returned =SUMIFS(D:D, B:B, "North", C:C, "Electronics"). Correct. Explanation was adequate but briefer than the other two. Did not mention case sensitivity or alternatives.

Verdict: All three nailed this one. SUMIFS is well-represented in training data, so this is expected. GPT-4o gave the most thorough explanation.
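To make the SUMIFS logic concrete, here is a small Python sketch of how the function combines criteria -- the sample rows and names are invented for illustration:

```python
# Sample sales rows: (salesperson, region, category, amount)
sales = [
    ("Asha",  "North", "Electronics", 1200),
    ("Ben",   "South", "Electronics",  800),
    ("Chloe", "North", "Furniture",    450),
    ("Dev",   "North", "Electronics",  300),
]

# SUMIFS(D:D, B:B, "North", C:C, "Electronics") sums amounts where
# BOTH criteria hold -- the conditions are ANDed, never ORed.
total = sum(amount for _, region, category, amount in sales
            if region == "North" and category == "Electronics")

print(total)  # 1200 + 300 = 1500
```

Rows that match only one criterion (Ben's Electronics sale in the South, Chloe's Furniture sale in the North) are excluded, which is exactly how SUMIFS evaluates multiple criteria simultaneously.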

Test 2: Nested IF for Classification

Prompt: "Column E contains exam scores (0-100). Write a formula for F2 that classifies each score: 90 and above is Distinction, 75-89 is Merit, 60-74 is Pass, below 60 is Fail."

Gemma 4: Returned =IF(E2>=90, "Distinction", IF(E2>=75, "Merit", IF(E2>=60, "Pass", "Fail"))). Correct. Also suggested using IFS as a cleaner alternative for Microsoft 365 users and mentioned wrapping the formula in IFERROR to handle blank cells.

GPT-4o: Returned the same nested IF structure. Also offered the IFS alternative. Went further by suggesting a SWITCH-based approach and explaining why nested IFs evaluate top-down, which matters for the threshold ordering.

Llama 4: Returned a correct nested IF. Did not mention IFS or SWITCH. The explanation was functional but did not address edge cases like blank cells or non-numeric entries.

Verdict: All correct. Gemma 4 and GPT-4o both added practical edge-case handling suggestions. GPT-4o's explanation was the most educational.
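GPT-4o's point about top-down evaluation is worth seeing in action. This Python sketch (hypothetical helper names) mirrors the nested IF and shows what happens if the thresholds are checked in the wrong order:

```python
def classify(score):
    # Mirrors IF(E2>=90,"Distinction",IF(E2>=75,"Merit",IF(E2>=60,"Pass","Fail"))):
    # conditions run top-down, so each branch only fires after the
    # stricter thresholds above it have already failed.
    if score >= 90:
        return "Distinction"
    if score >= 75:
        return "Merit"
    if score >= 60:
        return "Pass"
    return "Fail"

def classify_wrong_order(score):
    # If the loosest threshold is checked first, it captures everything:
    # a score of 95 never reaches the Distinction branch.
    if score >= 60:
        return "Pass"
    if score >= 75:
        return "Merit"
    if score >= 90:
        return "Distinction"
    return "Fail"

print(classify(95))              # Distinction
print(classify_wrong_order(95))  # Pass -- the ordering bug
```

The same applies in Excel: a nested IF (or IFS) that tests `E2>=60` before `E2>=90` silently misclassifies every high score.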

Test 3: INDEX-MATCH With Multiple Criteria

Prompt: "I have a pricing table. Column A is the product name, column B is the size (Small, Medium, Large), and column C is the price. Write a formula for cell G2 that looks up the price for the product in E2 and the size in F2. There may be duplicate product names with different sizes."

Gemma 4: Returned =INDEX(C:C, MATCH(1, (A:A=E2)*(B:B=F2), 0)) with a note that this is an array formula and should be entered with Ctrl+Shift+Enter in older Excel versions, or will work normally in Microsoft 365. Correct and well-explained.

GPT-4o: Returned the same INDEX-MATCH approach. Also offered an XLOOKUP alternative using a helper column and a FILTER-based approach for Microsoft 365. Very thorough, but the additional options could overwhelm a beginner.

Llama 4: Returned =INDEX(C:C, MATCH(E2&F2, A:A&B:B, 0)). This concatenation approach works but is less robust -- it can produce false matches when product names and sizes happen to concatenate to the same string (for example, "LargeBox" + "Small" matching "Large" + "BoxSmall"). The model did not flag this limitation.

Verdict: Gemma 4 and GPT-4o both produced the more robust multiplication-based array approach. Llama 4's concatenation method works in most cases but has a subtle flaw it failed to mention: without a delimiter between the concatenated values, two different product-size pairs can collapse into the same lookup key. This is the kind of edge case that matters in production workbooks. For more on lookup formulas, see my advanced Excel formulas guide.
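The false-match risk in the concatenation approach is easy to demonstrate. This Python sketch uses the invented product names from the example above:

```python
# MATCH(E2&F2, A:A&B:B, 0) builds plain concatenated keys -- two
# different (product, size) pairs can collide.
products = ["LargeBox", "Large"]
sizes    = ["Small",    "BoxSmall"]

keys = [p + s for p, s in zip(products, sizes)]
print(keys)  # ['LargeBoxSmall', 'LargeBoxSmall'] -- two rows, one key

# Fix: include a delimiter in the key (or match on the pair itself,
# as the (A:A=E2)*(B:B=F2) approach does in Excel).
safe_keys = [p + "|" + s for p, s in zip(products, sizes)]
print(safe_keys)  # ['LargeBox|Small', 'Large|BoxSmall'] -- now distinct
```

In Excel terms, `MATCH(E2&"|"&F2, A:A&"|"&B:B, 0)` with a delimiter character that never appears in the data closes the same gap.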

Test 4: Text Extraction Between Delimiters

Prompt: "Column A contains order codes formatted like 'ORD-12345-UK'. Write a formula for B2 that extracts just the numeric part (12345) from the code, assuming the format is always PREFIX-NUMBER-SUFFIX with hyphens as delimiters."

Gemma 4: Returned =MID(A2, FIND("-",A2)+1, FIND("-",A2,FIND("-",A2)+1)-FIND("-",A2)-1). Correct. Explained the nested FIND logic clearly, noting that the second FIND starts searching after the first hyphen.

GPT-4o: Returned an identical MID-FIND formula. Also offered a TEXTSPLIT alternative for Microsoft 365: =INDEX(TEXTSPLIT(A2,"-"),2) to grab the second segment (TEXTSPLIT has no segment-index argument of its own, so INDEX selects the piece). Additionally suggested wrapping in VALUE() to convert the extracted text to a number if needed for calculations.

Llama 4: Returned a similar MID-FIND formula. The formula itself was correct, but the explanation labelled the intermediate FIND results inconsistently, which made the logic harder to follow.

Verdict: All three produced correct formulas. GPT-4o's TEXTSPLIT suggestion was a genuinely useful addition for Microsoft 365 users. Gemma 4's explanation was the clearest.
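The nested-FIND logic translates almost directly to Python string operations, which makes it easier to see what each piece of the MID-FIND formula does (note Excel's FIND is 1-based while Python's find is 0-based):

```python
code = "ORD-12345-UK"

# Mirror FIND("-",A2) and FIND("-",A2,FIND("-",A2)+1):
first = code.find("-")              # position of the first hyphen
second = code.find("-", first + 1)  # search starts after the first hyphen

# MID(A2, first+1, second-first-1) in Excel terms -- the text strictly
# between the two hyphens:
number = code[first + 1:second]
print(number)  # '12345'

# Equivalent of wrapping the result in VALUE() for calculations:
print(int(number))  # 12345
```

The key detail -- the one Gemma 4 explained well -- is that the second search must start one character past the first hyphen, otherwise it finds the same hyphen again.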

Test 5: Running Total That Resets Monthly

Prompt: "Column A has dates and column B has daily revenue amounts. Write a formula for C2 that creates a running total of revenue that resets to zero at the start of each new month."

Gemma 4: Returned =SUMIFS(B$2:B2, A$2:A2, ">="&DATE(YEAR(A2),MONTH(A2),1), A$2:A2, "<="&A2). Correct and elegant. The expanding range with the fixed start row creates the running total, and the date criteria restrict it to the current month of the current year, so the total resets at each month boundary. (SUMIFS criteria ranges must be actual ranges, so the month test is expressed as date boundaries rather than a MONTH() array.) Also noted that the data must be sorted by date for this to work correctly.

GPT-4o: Returned a SUMPRODUCT-based approach: =SUMPRODUCT((MONTH(A$2:A2)=MONTH(A2))*(YEAR(A$2:A2)=YEAR(A2))*B$2:B2). Also correct, and avoids potential array formula complications in older Excel versions. Provided a good explanation but did not mention the date-sorting requirement.

Llama 4: Returned =SUMPRODUCT((MONTH($A$2:A2)=MONTH(A2))*($B$2:B2)). This formula omits the YEAR check, which means it would incorrectly combine January 2025 with January 2026 if the data spans multiple years. This is a meaningful error for any dataset longer than 12 months.

Verdict: Gemma 4 was the cleanest and most complete. GPT-4o was correct but missed the sorting caveat. Llama 4 had a real logic bug. This test revealed the biggest quality gap between the models.
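The missing YEAR check is easiest to see with data spanning two Januaries. This Python sketch (invented sample dates and amounts) computes the running total both ways:

```python
from datetime import date

# Daily revenue spanning two Januaries -- the case Llama 4's formula misses.
rows = [
    (date(2025, 1, 30), 100),
    (date(2025, 1, 31), 200),
    (date(2026, 1, 1),   50),
]

def running_total(rows, i, check_year=True):
    # Running total up to row i, restricted to the same month as row i
    # (and optionally the same year) -- data assumed sorted by date.
    d = rows[i][0]
    return sum(amount for day, amount in rows[:i + 1]
               if day.month == d.month and (not check_year or day.year == d.year))

# With the YEAR check (Gemma 4 / GPT-4o): January 2026 starts fresh.
print(running_total(rows, 2))                    # 50

# Without it (Llama 4): January 2025 leaks into January 2026.
print(running_total(rows, 2, check_year=False))  # 100 + 200 + 50 = 350
```

On any dataset shorter than a year the two versions agree, which is exactly why this kind of bug survives a quick spot check.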

Overall Comparison Table

| Criteria | Gemma 4 (27B) | GPT-4o (Free Tier) | Llama 4 (Scout) |
| --- | --- | --- | --- |
| Formula Accuracy | 5/5 correct | 5/5 correct | 3/5 correct (2 had issues) |
| Explanation Quality | Clear and concise | Thorough, sometimes verbose | Adequate but surface-level |
| Edge-Case Handling | Strong -- flags limitations | Strong -- offers alternatives | Weak -- misses subtle bugs |
| Speed | Depends on local hardware | Fast (cloud-hosted) | Depends on local hardware |
| Privacy | Full (runs locally) | Data goes to OpenAI | Full (runs locally) |
| Cost | Free (open-weight) | Free with message limits | Free (open-weight) |
| Best For | Accurate formulas + privacy | Best explanations + polish | Simple formulas only |

Which Model Handles Edge Cases Better?

Edge-case handling is where the real differences emerge. Simple formulas -- SUMIFS, basic IF, straightforward VLOOKUP -- work well across all three models. The trouble starts when the prompt involves multi-step logic, date boundaries, potential data inconsistencies, or version-specific functions.

Gemma 4 was the most consistent at flagging potential problems proactively. When it generated the running total formula, it noted the date-sorting requirement without being asked. When it used an array formula pattern, it mentioned the Ctrl+Shift+Enter requirement for legacy Excel. This kind of contextual awareness makes a real difference when you are building workbooks that other people will use.

GPT-4o excelled at offering alternative approaches. For nearly every prompt, it gave two or three ways to solve the same problem, often including modern Microsoft 365 functions alongside legacy-compatible versions. This is incredibly useful if you are working across teams with different Excel versions. However, it occasionally missed practical caveats (like the sorting requirement) while being thorough about syntax.

Llama 4 was the weakest on edge cases. The concatenation approach for INDEX-MATCH and the missing YEAR check in the running total are exactly the kind of errors that look correct at first glance but cause real problems in production. If you use Llama 4 for Excel formulas, plan to review every output more carefully. For a broader look at where Gemma 4 outperforms paid alternatives, see 5 real-world tasks where Gemma 4 beats paid AI models.

My Recommendation Based on Use Case

After running these tests and using all three models across my training workshops, here is how I would choose:

Use Gemma 4 if you want the best balance of accuracy, privacy, and cost. Running it locally means no message limits, no data leaving your machine, and reliable output for complex formulas. It is the model I now recommend to my corporate training clients who handle sensitive financial data in Excel. If you want to set it up, follow my step-by-step guide to running Gemma 4 locally.

Use GPT-4o free tier if you value the best explanations and the most polished user experience. If you are learning Excel formulas and want to understand the reasoning behind each function, GPT-4o's detailed breakdowns are genuinely educational. The message limit on the free tier is the main constraint. For a deeper comparison of ChatGPT with other paid models, see my ChatGPT Excel guide.

Use Llama 4 if you are already in the Meta ecosystem or specifically need a model that you can fine-tune for your own use cases. For standard Excel formula generation, Gemma 4 is the stronger open-weight choice. Llama 4 is better suited to broader language tasks where its larger context window provides an advantage.

For a comparison that includes paid models like Claude and Copilot alongside these free options, see my Gemma 4 vs ChatGPT vs Claude vs Copilot comparison.

Tips for Getting Better Excel Formulas From Any AI Model

Regardless of which model you choose, these prompting habits will improve your results:

  • Specify your column layout. Always tell the model which columns contain which data. "Column A has dates, column B has amounts" is far more useful than "I have a spreadsheet with some data."
  • Mention your Excel version. Functions like XLOOKUP, TEXTSPLIT, LAMBDA, and LET are only available in Microsoft 365 and Excel 2021+. If you are on Excel 2016, say so.
  • Describe edge cases explicitly. If your data might contain blanks, errors, or duplicates, mention this in the prompt. AI models handle edge cases much better when told about them upfront.
  • Ask for explanations. Even if you just need the formula, asking "explain how this works step by step" forces the model to verify its own logic, which often catches errors in the formula itself.
  • Test with sample data. Always paste the AI-generated formula into a test workbook with known data before using it in production. This is non-negotiable regardless of which model you use.

For 60 ready-to-use prompts structured around these principles, see my AI prompts for Excel guide.

How Gemma 4 Compares to Gemini for Excel Work

A common source of confusion is the difference between Gemma and Gemini. Gemma 4 is Google's open-weight model that you can run locally. Gemini is Google's cloud-hosted commercial model integrated into Google Workspace and available through the Gemini API. For a detailed breakdown of when to use which, see my Gemma 4 vs Gemini comparison.

For Excel-specific work, Gemini has the advantage of direct integration with Google Sheets (not Excel), while Gemma 4 requires a separate interface. If you work primarily in Excel rather than Sheets, Gemma 4 running locally through Ollama gives you more control. If you use Google Sheets, Gemini's built-in capabilities are worth exploring.

Can Gemma 4 Replace a Dedicated Excel AI Tool?

Not entirely, and that is not really the right framing. Tools like Microsoft Copilot have direct access to your workbook context -- they can see your data, understand your table structures, and modify cells in place. Gemma 4, GPT-4o, and Llama 4 are general-purpose language models that generate formulas from your text description. You still need to copy the formula into your spreadsheet and verify it against your actual data.

That said, for pure formula generation -- which is the most common AI use case in Excel -- these free models are remarkably capable. The gap between a free open-weight model and a paid integrated tool is much smaller than most people expect. For more on this topic, see whether Gemma 4 can replace ChatGPT for spreadsheet work.

For a detailed walkthrough of how Claude handles Excel formulas specifically, see my Claude AI for Excel formulas guide.

Frequently Asked Questions

Which free AI model is most accurate for Excel formulas?

In testing across five common formula types, Gemma 4 and GPT-4o (free tier) both scored highly for accuracy. Gemma 4 edged ahead on complex nested logic and multi-criteria lookups, while GPT-4o provided slightly better plain-English explanations. Llama 4 was competitive on simpler formulas but struggled with edge cases in advanced scenarios.

Can Gemma 4 handle advanced Excel formulas like INDEX-MATCH and SUMIFS?

Yes. Gemma 4 handles SUMIFS, INDEX-MATCH with multiple criteria, and nested IF statements reliably. In head-to-head tests, it produced correct formulas on the first attempt for four out of five advanced prompts, including a running total that resets monthly. The key is providing clear context about your column layout and Excel version.

Is Gemma 4 better than ChatGPT for Excel work?

Gemma 4 is comparable to GPT-4o on formula accuracy and slightly better at structured, multi-condition formulas. However, ChatGPT (GPT-4o) offers a more polished chat interface and tends to explain formulas more thoroughly. For pure formula generation on a budget, Gemma 4 is an excellent choice, especially if you run it locally.

Can I run these AI models locally for free Excel formula help?

Gemma 4 and Llama 4 are both open-weight models that can be run locally using tools like Ollama or LM Studio. GPT-4o is only available through OpenAI's platform. Running locally means no usage limits, full privacy, and zero cost after initial setup. See my guide on running Gemma 4 locally for step-by-step instructions.


Ready to level up your Excel skills?

The Complete Excel Course with AI Integration takes you from formulas to production-grade spreadsheets, with real projects and AI-assisted workflows.

Explore the Excel + AI course