Prompt Engineering Is Software Engineering Now

A year ago, prompt engineering was copy-pasting from Twitter threads. Now it's the most impactful part of our LLM applications — and we treat it with the same rigor as code.

How we manage prompts

Version controlled. Every prompt lives in its own file, in Git, with a changelog. No prompts embedded as string literals in application code.
Tested. We have a test suite of 50+ input/output pairs for each critical prompt. When we change a prompt, we run the suite.
Reviewed. Prompt changes go through PR review, just like code changes. A teammate reviews the intent, structure, and test results.
Documented. Each prompt has a README explaining what it does, what edge cases it handles, and what it's known to fail on.
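
To make the workflow above concrete, here is a minimal sketch of a prompt regression suite. It is an illustration under assumptions, not a description of our actual tooling: the file names (prompt.txt, cases.jsonl), the {{input}} placeholder, the must_contain field, and the call_model callable are all hypothetical stand-ins for whatever layout and model client you use.

```python
import json
from pathlib import Path
from typing import Callable


def load_prompt(prompt_dir: Path) -> str:
    """Read the prompt template from its version-controlled file."""
    return (prompt_dir / "prompt.txt").read_text(encoding="utf-8")


def load_cases(prompt_dir: Path) -> list[dict]:
    """Read test cases stored as one JSON object per line."""
    text = (prompt_dir / "cases.jsonl").read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_suite(prompt_dir: Path, call_model: Callable[[str], str]) -> list[str]:
    """Run every case through the model and return the ids of failing cases.

    call_model is a hypothetical function that sends a prompt string to an
    LLM and returns its text output. A case passes when every phrase in its
    must_contain list appears in the output (case-insensitive).
    """
    template = load_prompt(prompt_dir)
    failures = []
    for case in load_cases(prompt_dir):
        # Plain replace instead of str.format, so literal braces in the
        # prompt (for example JSON samples) do not need escaping.
        prompt = template.replace("{{input}}", case["input"])
        output = call_model(prompt).lower()
        if not all(phrase.lower() in output for phrase in case["must_contain"]):
            failures.append(case["id"])
    return failures
```

Keeping the cases next to the prompt file means a PR that changes the prompt can show the suite result in the same diff. In CI, a non-empty list from run_suite would fail the build.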

Prompt architecture patterns

System → Context → Task → Format. This four-part structure works for 90% of our use cases. System sets the role, context provides the background, task defines the action, format specifies the output.
Few-shot over zero-shot. 3-5 examples in the prompt consistently outperform instructions alone. Pick examples that cover edge cases, not just the happy path.
Chain of thought for complex tasks. Ask the model to reason step by step for multi-part tasks. Skip it for simple extraction, where it adds latency without improving the output.
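
The three patterns above can be combined in one small prompt builder. This is a sketch under stated assumptions: the PromptSpec class, the section labels, the placement of examples between task and format, and the exact step-by-step wording are illustrative choices, not a fixed recipe.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the four-part structure plus options."""
    system: str          # role the model should take
    context: str         # background the model needs
    task: str            # the action to perform
    output_format: str   # the required shape of the answer
    examples: list[tuple[str, str]] = field(default_factory=list)  # few-shot pairs
    chain_of_thought: bool = False  # enable only for multi-part tasks


def render(spec: PromptSpec, user_input: str) -> str:
    """Assemble the prompt in System, Context, Task, Format order."""
    task = spec.task
    if spec.chain_of_thought:
        task += "\nReason step by step before giving the final answer."

    sections = [
        f"SYSTEM\n{spec.system}",
        f"CONTEXT\n{spec.context}",
        f"TASK\n{task}",
    ]
    if spec.examples:
        shots = "\n\n".join(
            f"Input: {inp}\nOutput: {out}" for inp, out in spec.examples
        )
        sections.append(f"EXAMPLES\n{shots}")
    sections.append(f"FORMAT\n{spec.output_format}")
    sections.append(f"Input: {user_input}\nOutput:")
    return "\n\n".join(sections)


spec = PromptSpec(
    system="You are a support ticket classifier.",
    context="Tickets come from a billing product.",
    task="Assign each ticket one label: refund, bug, or other.",
    output_format='Return JSON like {"label": "refund"}.',
    examples=[
        ("I was charged twice", '{"label": "refund"}'),
        ("asdf ???", '{"label": "other"}'),  # an edge case, not only the happy path
    ],
)
print(render(spec, "The invoice page crashes"))
```

Leaving chain_of_thought off by default mirrors the advice above: simple extraction and classification tasks pay the latency cost without a matching gain.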

The testing problem

LLM outputs are non-deterministic, so assertEqual-style exact-match tests on the full output are too brittle to be useful. Instead, we check properties of the output:

Schema validation — did the output match the expected JSON structure?
Keyword checks — does the output contain required fields or phrases?
LLM-as-judge — use a second model to evaluate whether the output is correct (surprisingly effective for subjective quality)
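
The three checks above might look like the following. This is a minimal sketch with assumptions labeled: the function names are hypothetical, the schema check is a hand-rolled key and type test rather than a full JSON Schema validator, and judge stands in for a call to a second model that is asked to reply PASS or FAIL.

```python
import json
from typing import Callable


def check_schema(output: str, required: dict[str, type]) -> bool:
    """Return True if output is a JSON object with the required typed keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(
        key in data and isinstance(data[key], expected)
        for key, expected in required.items()
    )


def check_keywords(output: str, keywords: list[str]) -> bool:
    """Return True if every keyword appears in the output (case-insensitive)."""
    lowered = output.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def check_with_judge(task: str, output: str, judge: Callable[[str], str]) -> bool:
    """Ask a second model to grade the output.

    judge is a hypothetical callable that sends a prompt to the grading
    model and returns its raw text reply.
    """
    verdict = judge(
        "You are grading another model's answer.\n"
        f"Task: {task}\n"
        f"Answer: {output}\n"
        "Reply with exactly PASS or FAIL."
    )
    return verdict.strip().upper().startswith("PASS")
```

The first two checks are cheap and deterministic, so they can run on every case. The judge call costs another model request and is itself non-deterministic, so it may make sense to reserve it for cases where quality is subjective.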