
Executive summary
If you feed buggy code to an LLM you’ll get back buggy suggestions — and you’ll pay for them. The secret: fix as much as possible locally first, then send the smallest, most precise context the LLM needs. This guide gives a practical system you can adopt now:
- Local preflight: lint, unit tests, minimal reproducible example (MRE) generator.
- Prompt hygiene: diff-only prompts, test-driven prompts, and strict output formats.
- CI gating: only call LLM from CI when pre-checks pass or when a focused, failing-test payload is published.
- Token-aware engineering: estimate tokens, calculate cost, and budget.
- Developer tooling & templates: pre-commit hooks, Python/Node scripts, GitHub Actions examples.
Follow this and you’ll cut wasted tokens, shorten review cycles, and produce higher-quality LLM outputs.
Why engineers waste tokens
Common anti-patterns:
- Dumping entire repositories into the prompt.
- Asking for “fix my code” without failing tests or clear error output.
- Sending code with unresolved syntax errors or missing imports.
- No preflight: you ask the LLM first, then debug its output manually.
Consequences: wasted tokens (money), longer iteration times, lower signal-to-noise from LLMs, and risky incorrect code merged into prod.
The CyberDudeBivash 5-Step Workflow
- Local preflight — lint + run tests + reproduce error.
- Minimize context — produce a minimal reproducible example (MRE).
- Prompt for a patch — use a strict template asking for patch only or diff only.
- Validate — run returned patch inside sandbox/tests automatically.
- CI gate & telemetry — only accept LLM-assisted changes when tests pass and the token-cost budget is respected.
Practical toolset
- Linters: flake8/pylint (Python), eslint (JS/TS).
- Formatters: black, prettier.
- Unit tests: pytest, unittest, jest.
- Local sandbox: Docker + docker-compose, or ephemeral VMs.
- Pre-commit: pre-commit hooks.
- Token estimation helper: small script (below).
- CI: GitHub Actions (examples later).
Affiliate picks (recommended — use our affiliate links on your site):
- JetBrains Fleet / IntelliJ (IDE productivity; affiliate link placeholder).
- GitHub Copilot (assist, but use after preflight).
- Replit / Gitpod (ephemeral dev sandboxes).
(Include affiliate disclosure on publish.)
Preflight scripts & pre-prompt checklist
Pre-prompt checklist
- Code compiles / lints locally (flake8/eslint)
- Unit tests reproduce the failing behavior (pytest/jest)
- Minimal Reproducible Example (MRE) created; unrelated code removed
- Expected vs actual output logged (include traceback)
- Token budget estimated for the prompt (see calculator below)
- CI/CD gating strategy defined (where LLM patch will be validated)
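Most of this checklist can be automated before any LLM call. A minimal sketch (preflight.py is a hypothetical helper; it assumes the MRE lives in mre.py and that flake8 and pytest are installed):
# preflight.py - automate the pre-prompt checklist (sketch)
import pathlib
import subprocess
import sys

def run(cmd):
    print("->", " ".join(cmd))
    return subprocess.run(cmd, capture_output=True, text=True)

def main() -> None:
    # 1. lint must be clean
    lint = run(["flake8", "."])
    if lint.returncode != 0:
        sys.exit("Lint errors - fix locally first:\n" + lint.stdout)

    # 2. the MRE must exist
    if not pathlib.Path("mre.py").exists():
        sys.exit("mre.py missing - create a minimal reproducible example first")

    # 3. the MRE's test must actually reproduce the failure
    tests = run(["pytest", "mre.py", "-q"])
    if tests.returncode == 0:
        sys.exit("MRE tests pass - nothing to ask the LLM")
    print("Failure reproduced; paste this traceback into the prompt:")
    print(tests.stdout[-2000:])  # keep only the tail to protect the token budget

if __name__ == "__main__":
    main()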
Minimal reproducible example (MRE) template
Create mre.py that contains only:
- the function(s) under test
- the failing test case (assert)
- any minimal setup data (no large binary blobs)
Example (mre.py):
# mre.py
def add(a, b):
    return a + b  # failing due to edge-case elsewhere

def test_add():
    assert add(1, "2") == 3  # shows type error / failing case
Always include the test runner output (stack trace) with your prompt.
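To capture that output automatically, a small sketch (pytest_output.txt is just a convention reused in the prompt-assembly example later):
# capture_trace.py - run the MRE's test and save the traceback for the prompt
import subprocess

result = subprocess.run(["pytest", "mre.py", "-q"], capture_output=True, text=True)
with open("pytest_output.txt", "w", encoding="utf-8") as fh:
    fh.write(result.stdout + result.stderr)
print("Saved test output to pytest_output.txt, exit code:", result.returncode)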
Prompt templates — be strict: ask for diff only
Template: “Patch-only prompt”
CONTEXT:
- Language: Python 3.11
- File: add_utils.py (shown below)
- Test: test_add_fails.py (shown below)
- Failing pytest output: (paste entire traceback)
TASK:
Return a unified diff (git-style) patch that fixes the bug so that `pytest -q` passes for the provided test. Only return the patch, nothing else.
FILES:
<<insert only the minimal files: add_utils.py, test_add_fails.py >>
Important: insist on "Only return the patch", with no explanations. That avoids extra tokens and speeds up programmatic application.
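A minimal sketch of assembling such a patch-only prompt programmatically (build_prompt.py, pytest_output.txt, and the template text are assumptions; adapt them to your project):
# build_prompt.py - assemble a patch-only prompt from the MRE files and traceback
from pathlib import Path

TEMPLATE = """CONTEXT:
- Language: Python 3.11
- Failing pytest output:
{traceback}

TASK:
Return a unified diff (git-style) patch that fixes the bug so that `pytest -q` passes.
Only return the patch, nothing else.

FILES:
{files}
"""

def build_prompt(file_paths, traceback_path="pytest_output.txt") -> str:
    files = "\n\n".join(f"# {p}\n{Path(p).read_text()}" for p in file_paths)
    return TEMPLATE.format(traceback=Path(traceback_path).read_text(), files=files)

if __name__ == "__main__":
    Path("prompt.txt").write_text(build_prompt(["add_utils.py", "tests/test_add.py"]))
Writing the result to prompt.txt keeps it compatible with the estimate_tokens.py and call_llm.py steps shown later.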
Example — small Python patch flow
- Developer reproduces failing test:
$ pytest tests/test_add.py -q
F                                                                        [100%]
================================= FAILURES ===================================
___________________________ test_add_with_string ____________________________

    def test_add_with_string():
>       assert add(1, "2") == 3
E       TypeError: unsupported operand type(s) for +: 'int' and 'str'

tests/test_add.py:5: TypeError
- Build the MRE and include only add_utils.py and tests/test_add.py in the prompt.
- Send the Patch-only prompt (above). The LLM returns a unified diff:
--- a/add_utils.py
+++ b/add_utils.py
@@ -1,2 +1,5 @@
 def add(a, b):
-    return a + b
+    try:
+        return int(a) + int(b)
+    except Exception:
+        raise TypeError("add: both args must be numeric or numeric-strings")
- Apply patch and run tests automatically in CI.
Pre-commit & local automation
Add a pre-commit hook that runs lint and tests before letting you call the LLM:
.pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v8.40.0
    hooks:
      - id: eslint
  - repo: https://github.com/psf/black
    rev: 23.9.1
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
call-llm.sh (only after lint/tests pass)
#!/usr/bin/env bash
set -euo pipefail
flake8 . || { echo "Lint errors - fix locally first"; exit 1; }
pytest -q || { echo "Tests fail - fix locally first"; exit 1; }
python estimate_tokens.py --files add_utils.py tests/test_add.py --prompt-template prompt.txt
# if the token budget is OK, call the LLM
# call your LLM client here (curl / openai sdk)
CI pattern: GitHub Actions — only call LLM when tests reproduce AND MRE provided
/.github/workflows/llm-assist.yml
name: LLM Assist Patch Flow
on:
  workflow_dispatch:
    inputs:
      token_budget:
        required: true
        default: '2000'
jobs:
  preflight:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Check MRE present
        run: test -f mre.py || (echo "MRE missing" && exit 1)
      - name: Reproduce failing test
        run: |
          # the MRE's test must fail, i.e. the bug must reproduce
          if pytest -q mre.py; then
            echo "MRE tests already pass - nothing for the LLM to fix"
            exit 1
          fi
  llm-call:
    needs: preflight
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Estimate tokens
        run: python estimate_tokens.py --files mre.py --prompt-template prompt.txt --budget "${{ github.event.inputs.token_budget }}"
      - name: Call LLM
        if: success()
        run: |
          # call your LLM using a secured token
          python call_llm.py --prompt-file prompt.txt
      - name: Apply patch and run tests
        run: |
          git apply patch.diff
          pytest -q
This enforces: the failure must reproduce in preflight, an MRE must exist, the token budget is checked, and the LLM is only called from CI with secured keys.
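A minimal sketch of the call_llm.py script referenced in the workflow, assuming an OpenAI-style chat-completions endpoint, an LLM_API_KEY secret, and a placeholder model name; it writes the model's reply to patch.diff for the apply step:
# call_llm.py - send the prepared prompt and save the returned patch (sketch)
import argparse
import os

import requests  # any HTTP client works; requests is assumed here

API_URL = "https://api.openai.com/v1/chat/completions"  # adjust for your provider

def main() -> None:
    p = argparse.ArgumentParser()
    p.add_argument("--prompt-file", required=True)
    p.add_argument("--model", default="gpt-4o-mini")  # placeholder model name
    args = p.parse_args()

    with open(args.prompt_file, encoding="utf-8") as fh:
        prompt = fh.read()

    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"},
        json={"model": args.model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    patch = resp.json()["choices"][0]["message"]["content"]

    with open("patch.diff", "w", encoding="utf-8") as fh:
        fh.write(patch)

if __name__ == "__main__":
    main()
Store the API key as a repository secret and expose it to the step as LLM_API_KEY; never hard-code it in the workflow.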
Token estimation & cost calculator (simple, exact arithmetic)
Estimating tokens from characters
A practical rule: 1 token ≈ 4 characters of English text (an approximation; use your model's tokenizer for an exact count).
Formula: estimated_tokens = ceil(total_chars / 4)
Example calculation (step-by-step):
Suppose your prompt (files + traces) is 42,372 characters long.
- Divide by 4: 42,372 / 4 = 10,593.
- Round up (if needed): estimated_tokens = 10,593 tokens.
Cost example
Assume model price = $0.02 per 1,000 tokens (example pricing used solely for illustration).
- Tokens = 10,593.
- Thousands-of-tokens = 10,593 / 1000 = 10.593.
- Cost = 10.593 * $0.02 = $0.21186.
- Rounded to cents = $0.21.
(Every arithmetic step above computed explicitly.)
Tip: keep prompts ≤ 2,000–3,000 tokens when possible to reduce cost and improve latency.
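A minimal sketch of the estimate_tokens.py helper used in the earlier scripts; the chars/4 heuristic and flag names follow the examples above, and the default pricing is illustrative only:
# estimate_tokens.py - rough token and cost estimator using the chars/4 heuristic
import argparse
import math
import sys

def main() -> None:
    p = argparse.ArgumentParser()
    p.add_argument("--files", nargs="+", default=[])
    p.add_argument("--prompt-template", default=None)
    p.add_argument("--budget", type=int, default=2000)          # max tokens allowed
    p.add_argument("--price-per-1k", type=float, default=0.02)  # illustrative pricing
    args = p.parse_args()

    paths = list(args.files) + ([args.prompt_template] if args.prompt_template else [])
    chars = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            chars += len(fh.read())

    tokens = math.ceil(chars / 4)                       # 1 token ~= 4 characters
    cost = round(tokens / 1000 * args.price_per_1k, 2)
    print(f"chars={chars} est_tokens={tokens} est_cost=${cost}")

    if tokens > args.budget:
        print(f"Over budget ({tokens} > {args.budget}) - trim the prompt", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
A non-zero exit code lets call-llm.sh and the CI workflow gate the LLM call on the budget.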
Smart prompt compression strategies
- Send failure + single-file MRE, not whole repo.
- Remove comments, large whitespace, and long sample data.
- Send only failing test and relevant functions.
- Send diffs instead of full files. If you must send a full file, trim it to the essential parts.
- Use function signatures + types rather than full code when asking for algorithmic logic.
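As a concrete example of the "remove comments and whitespace" step, a crude sketch (naive by design: it does not handle '#' inside string literals):
# strip_prompt.py - drop comments and blank lines before estimating tokens
import re
import sys

def compress_python_source(src: str) -> str:
    out = []
    for line in src.splitlines():
        line = re.sub(r"#.*$", "", line).rstrip()  # strip trailing comments
        if line:                                   # drop now-empty lines
            out.append(line)
    return "\n".join(out)

if __name__ == "__main__":
    print(compress_python_source(open(sys.argv[1], encoding="utf-8").read()))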
Prompt engineering patterns that save tokens
- Test-first prompt: "I have the following failing pytest output (paste). Provide a git-style patch that fixes only the code necessary so tests pass. Only return the patch."
- Diff-only prompt: provide the current file and the desired behavior; ask for a unified diff patch.
- Small-step prompt: ask for a single small change (e.g., a function fix) rather than an end-to-end rewrite.
- Strict format enforcement: "Return JSON only with fields {patch, tests_run, success}" is easier to parse and validate.
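A minimal sketch of validating such a JSON-only response before acting on it (the field names follow the example above):
# parse_llm_json.py - validate a JSON-only LLM response before using it
import json

REQUIRED_FIELDS = {"patch", "tests_run", "success"}

def parse_response(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM did not return valid JSON: {exc}") from exc
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"Response missing fields: {sorted(missing)}")
    return data
If parsing fails, reject the response and re-prompt with the strict-format instruction instead of trying to salvage it.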
Validation harness — run returned patch automatically
validate_patch.py (conceptual)
import subprocess, sys

# apply patch
subprocess.run(["git", "apply", "patch.diff"], check=True)

# run tests
r = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
print(r.stdout)
if r.returncode != 0:
    print("Patch failed tests", r.stderr)
    sys.exit(2)
print("Patch validated")
Use this in a CI step immediately after receiving the patch.
Defensive prompts & guardrails (reduce hallucinations)
- Ask the LLM not to invent imports or API calls. Provide the exact dependency list, or require that code use only the project's existing imports.
- Request executable code only; require pytest to pass in CI.
- If the LLM returns explanations, automatically reject the response and re-run with "Only return the patch" enforcement.
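A simple guard for the "reject and re-run" rule, a heuristic sketch that checks whether a response looks like a pure unified diff before it is applied:
# looks_like_patch.py - reject responses that contain prose instead of a pure diff
def looks_like_unified_diff(response: str) -> bool:
    lines = [l for l in response.strip().splitlines() if l.strip()]
    if not lines:
        return False
    # a unified diff should open with file headers or a diff/index line
    if not lines[0].startswith(("---", "diff --git", "Index:")):
        return False
    # every remaining line should carry a diff prefix, not explanatory prose
    allowed = ("+++", "---", "@@", "+", "-", " ", "diff ", "index ", "\\")
    return all(l.startswith(allowed) for l in lines[1:])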
Common real-world patterns & examples
Pattern: Runtime type errors in Python
- Preflight: run mypy/pytest.
- Prompt: include the failing traceback and the function signature.
- Patch: LLM suggests type coercion or validation.
- Validation: run tests; on success, merge.
Pattern: Frontend CSS/JS regressions
- Preflight: run npm run test, eslint, and visual regression (Percy) or unit tests.
- Prompt: include the failing test and a minimal component snippet.
- Patch: the LLM returns a specific component diff.
FAQ
Q: When should I NOT use an LLM for code?
A: Don’t use it to fix failing tests if you can’t produce an MRE, or when code involves secrets/crypto primitives you cannot validate locally. Use LLMs more for design/boilerplate than for security-critical code unless heavily validated.
Q: How often should I call an LLM?
A: Prefer fewer, highly focused calls. Use local automation to reduce repetitive prompts.
Q: What about using LLMs as pair-programming assistants?
A: Great, but keep the same disciplines: run tests locally first, then ask LLM to suggest concise changes.
Metrics & KPIs to track
- Tokens consumed per merged PR (baseline vs. post-adoption).
- % of LLM-assisted patches that pass CI on first application.
- Mean time to first green build (MTTFGB) for LLM-assisted PRs.
- Token cost saved per sprint.
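A minimal sketch for recording these KPIs per merged PR (the CSV schema and helper name are assumptions; call it from a post-merge job):
# llm_metrics.py - append per-PR LLM usage metrics to a CSV for later analysis
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "pr_number", "tokens_used", "passed_ci_first_try", "cost_usd"]

def record(pr_number: int, tokens_used: int, passed_ci_first_try: bool,
           cost_usd: float, path: str = "llm_metrics.csv") -> None:
    new_file = not Path(path).exists()
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "pr_number": pr_number,
            "tokens_used": tokens_used,
            "passed_ci_first_try": passed_ci_first_try,
            "cost_usd": cost_usd,
        })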
Integration checklist
- Pre-commit hooks installed and enforced.
- MRE template created in /mre/ and required for LLM requests.
- CI workflow includes estimate_tokens.py and validate_patch.py.
- Token budget per PR set and monitored.
- Post-merge telemetry enabled (tokens/PR, success rate).