Prompt Control – The mind behind the machine — Part 4

Structured Output generation is the untold technique to tame your language model and get what you want. And here is how…

In the past weeks we discovered together the power of system prompts. What is the real difference between huge LLMs and Small Language Models? I believe every one of us can answer this question. ChatGPT, Claude, and Qwen are certainly powerful AIs, running 50-billion-plus parameters under the hood. But when you force the output structure of a Small Language Model, even a mere 3-billion-parameter model can achieve performance comparable to the giants mentioned above. And it can run on your own computer. You don’t need to pay a subscription, or pray that your free API quota (like an openrouter.ai one) isn’t depleted. I am putting together a special package for the upcoming Black Friday. You can join the wait-list to be the first to know about it… why?
In the meantime… today I will show you how to do it yourself, with two practical (and useful) examples. All you need is:
To recap: Taming the Beast — Controlling LLM Output

The reality is that I knew little about all of this before I started asking questions in forums, replying to Medium articles, and running endless tests on my local machine. Sure, prompting helps. But if you want predictable, programmable responses — not just poetic guesses — you need more than vibes and trial-and-error. You need output control. And there are actually three powerful ways to force an LLM to generate exactly what you want, when you want it:

1. Grammar-Based Sampling (GGUF-style)

This one’s for the hardcore tinkerers. With tools like llama.cpp and its GBNF grammars (the Backus-Naur style format used across the GGUF ecosystem), you write a formal grammar and the sampler simply refuses to pick any token that would break it. Example: want only a clean “yes” or “no” out of the model? Define a grammar that allows nothing else, and nothing else can come out.
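Here is a minimal sketch of what that looks like with the llama-cpp-python bindings, assuming you have a GGUF model on disk; the model path, prompt, and grammar are mine, purely for illustration, not from the original article.

```python
# Sketch: constrain a local GGUF model to answer only "yes" or "no"
# via a GBNF grammar (pip install llama-cpp-python).
from llama_cpp import Llama, LlamaGrammar

# The whole grammar: the root rule may only expand to "yes" or "no".
YES_NO_GBNF = r'''
root ::= "yes" | "no"
'''

llm = Llama(model_path="./models/my-small-model.gguf", verbose=False)  # placeholder path
grammar = LlamaGrammar.from_string(YES_NO_GBNF)

out = llm(
    "Is Rome the capital of Italy? Answer yes or no.",
    grammar=grammar,   # every sampled token must keep the output valid under the grammar
    max_tokens=4,
)
print(out["choices"][0]["text"])  # "yes" or "no", never anything else
```

The point is that the constraint lives in the sampler, not in the prompt: even a tiny model cannot wander off into prose because the off-grammar tokens are never allowed.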
2. Logit Bias

Here, you manually boost or suppress specific tokens. For instance, if you don’t want the model saying “maybe,” you can penalize those tokens at the logits level, before sampling. It’s like tuning a radio — you reduce noise by turning down unwanted frequencies.
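A hedged sketch of the same idea through the OpenAI Python SDK, which exposes a logit_bias parameter on chat completions (values from -100 to 100 per token id). The model name and the word being suppressed are illustrative, and I assume a tiktoken version that knows the model’s encoding.

```python
# Sketch: push the token logits for "maybe" down to -100 so the model
# effectively cannot emit that word.
import tiktoken
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY is set in the environment
model = "gpt-4o-mini"      # illustrative model name

enc = tiktoken.encoding_for_model(model)
banned = {}
for variant in ("maybe", " maybe", "Maybe", " Maybe"):
    for token_id in enc.encode(variant):
        banned[str(token_id)] = -100   # the API expects token ids as string keys

resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Will it rain tomorrow in Milan?"}],
    logit_bias=banned,      # applied to the logits before sampling
)
print(resp.choices[0].message.content)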
3. Structured Output (my preferred option)

This is structured output generation, an elegant middle ground. Instead of fighting tokens or writing Backus-Naur Form grammars (honestly more difficult than ancient Greek grammar…), you simply tell the model which fields you want back and in what shape, typically as a JSON schema.
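To make that concrete, here is one hypothetical way of describing the shape you want, as a small Pydantic model; the class and field names are mine and purely illustrative.

```python
# Sketch: the structure you want back, expressed as a Pydantic model.
from pydantic import BaseModel

class MeetingSummary(BaseModel):
    title: str
    attendees: list[str]
    decisions: list[str]
    follow_up_required: bool
```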
Behind the scenes, the API (like OpenAI’s structured output support, or the JSON/schema modes offered by local runtimes such as llama.cpp and Ollama) turns your schema into a constraint on token generation, so the model can only return a response that matches the shape you asked for.
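A minimal sketch of that round trip with the OpenAI Python SDK’s structured-output helper, reusing the illustrative MeetingSummary model from above (redefined here so the snippet stands alone); the model name and meeting notes are placeholders.

```python
# Sketch: ask the API for a response that validates against MeetingSummary.
from openai import OpenAI
from pydantic import BaseModel

class MeetingSummary(BaseModel):   # same illustrative schema as in the previous snippet
    title: str
    attendees: list[str]
    decisions: list[str]
    follow_up_required: bool

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",           # illustrative model name
    messages=[
        {"role": "system", "content": "Summarize the meeting notes you are given."},
        {"role": "user", "content": "Notes: Anna and Luca agreed to ship v2 on Friday."},
    ],
    response_format=MeetingSummary,  # the SDK converts this into a JSON schema constraint
)
print(completion.choices[0].message.parsed)  # a validated MeetingSummary instance
```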