A Beginner's Guide To Prompt Engineering

For the past few years, people in the AI community have been claiming that prompt engineering is the future, practically the only career option you'll soon have, and that if you don't learn it now, you'll be out of luck when looking for a job.

As someone who's spent the past few months working on a new AI startup, I can say it's definitely a complex skill to acquire and to get good at.

So in this post I'll break down much of what I've learned: the more technical side of prompt engineering that often doesn't get talked about. We're talking structured schemas, tokenization, temperature, and parameter configurations. The fun stuff.

What people think it is

Let's start here, because I often talk to people who proclaim that they're prompt engineering when, really, they're just typing prompts into ChatGPT and storing them in an Excel sheet.

Technically not 'wrong', but I'd consider that junior-level prompt engineering at best. The following is indeed a prompt that I engineered.

Write a JavaScript function that validates an email :)  

And on the more complex side, you'll often see massive prompts that span thousands of tokens, written in the hope that the output will somehow be 100% perfect.

# Modern Next.js Portfolio Site Template  
  
Please create a modern, responsive portfolio website using Next.js 14  
 with the following specifications:  
  
## Project Structure  
- Use the Next.js App Router  
- Implement TypeScript for type safety  
- Structure the project with the following directories:  
  - app/ (for routes and layouts)  
  - components/ (reusable UI components)  
  - lib/ (utility functions and configurations)  
  - public/ (static assets)  
  - styles/ (global styles and CSS modules)  
...hundreds of lines more...  

There's a chance that you'll actually end up with a more accurate response in the simple prompt above than the complex one. I've seen plenty of long-form prompts online that claim to do miracles, like build and launch an entire SaaS product in an hour. But when you actually go and paste that prompt into any model, the results are random and chaotic at best.

Now that we've covered the common misconceptions, let's dive into the core technical components that truly define prompt engineering.

Understanding the models

Every AI company implements their own models, versions, and modalities. Some offer multimodal capabilities spanning text, speech, and image recognition, while others specialize in text-only processing but may excel in particular domains.

Context window sizes vary dramatically between models—some only accept up to 4,096 tokens, while others handle 100,000 or more. However, a larger context window doesn't automatically guarantee better results. Smaller context models can sometimes deliver more focused responses for specific tasks, while larger models can maintain comparable accuracy while processing much more information.

When comparing model quality, specialization matters. Domain-specific models trained on relevant data often outperform general-purpose models on targeted tasks, even with fewer parameters or smaller context windows.

Pricing structures also differ significantly across providers. More expensive models don't necessarily deliver higher accuracy; sometimes you're paying for faster inference times, higher reliability, specialized knowledge, or enhanced capabilities rather than raw performance.

The optimal choice depends on your specific use case, required response quality, and operational constraints.

Costs

If you're doing all of your prompt engineering on a browser client, like ChatGPT or Claude or Gemini, then your costs are essentially the monthly subscription fee.

If you're using any of the APIs, however, then pricing can get tricky. Most providers offer different pricing tiers depending on the specific model that you use.

Anthropic API models and pricing
OpenAI models and pricing

And pretty much every provider charges based on the number of tokens that you're using per request. For most developers just getting started, the pricing is typically so low that you won't have to stress too much.

As an example, running a few dozen tests per day with moderate token sizes costs me a few cents on average.

But if you know that you're going to be pushing volume at some point, then you definitely need to take this into consideration when creating your prompts.

Most people don't realize that the input prompt counts toward your token limit, meaning that if you have a 2,000-token prompt, you only have 2,096 tokens left for output (on a 4,096-token max model).
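
To put rough numbers on it, here's a minimal sketch of how per-request cost works. The prices below are placeholders, not any provider's real rates; check the pricing pages linked above for current numbers.

```javascript
// Rough per-request cost estimate. The prices are hypothetical --
// look up your provider's current per-million-token rates.
const PRICE_PER_MILLION_INPUT = 3.0;   // USD, placeholder
const PRICE_PER_MILLION_OUTPUT = 15.0; // USD, placeholder

function estimateCost(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * PRICE_PER_MILLION_INPUT;
  const outputCost = (outputTokens / 1_000_000) * PRICE_PER_MILLION_OUTPUT;
  return inputCost + outputCost;
}

// A 2,000-token prompt with a 500-token response:
console.log(estimateCost(2000, 500).toFixed(4)); // "0.0135" -- about a cent and a half
```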

Tokenization

Tokenization is how AI models break down text into processable units called tokens. These tokens serve as the fundamental building blocks that language models can understand and manipulate.

From a technical standpoint, tokens vary significantly across models. A token might represent a single character, a complete word, a word fragment, or even punctuation. This variability stems from each model's specific tokenization algorithm, which is designed to optimize processing efficiency and language understanding.

For example, the sentence "I love programming" might be tokenized as:

["I", " love", " programming"]  

While a complex word like "unbelievable" might be broken down as:

["un", "believ", "able"]  

Understanding tokenization becomes crucial when working with models that have token limits. These limits constrain how much text you can include in a single prompt (input) and how much the model can generate in response (output).

English text typically averages about 0.75 words per token (roughly 1.3 tokens per word), but this ratio varies by language and writing style. Technical content with specialized terminology often consumes more tokens than conversational text.

Token usage directly impacts:

  • Costs: Most commercial AI services charge by token count
  • Response time: Longer prompts generally take more time to process
  • Context utilization: Effective use of available context window

Prompt engineers often need to balance comprehensiveness with conciseness, crafting prompts that provide sufficient guidance while minimizing token consumption. This becomes particularly important when working with complex tasks requiring detailed instructions or when processing large documents within token limitations.
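
If you want to check token counts yourself before sending a prompt, here's a minimal sketch using the open-source js-tiktoken package. It mirrors OpenAI's tokenizer, so treat the counts as approximate for other providers, which use their own tokenizers.

```javascript
// npm install js-tiktoken
import { encodingForModel } from "js-tiktoken";

// Tokenizer used by OpenAI's GPT-4-era models.
const enc = encodingForModel("gpt-4");

const text = "I love programming";
const tokens = enc.encode(text);

console.log(tokens.length); // how many tokens this text will consume
console.log(tokens);        // the raw token ids
```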

Temperature

Temperature is a key parameter that controls the randomness and creativity in AI-generated text. Ranging typically from 0 to 1 (or sometimes 0 to 2), this setting fundamentally alters how the model selects its next tokens when generating responses.

At a low temperature (0-0.3):

  • The model becomes highly deterministic, consistently selecting the most probable tokens
  • Responses are more predictable, factual, and conservative
  • Ideal for tasks requiring accuracy and consistency: factual Q&A, technical documentation, or logical reasoning

At a medium temperature (0.4-0.7):

  • The model balances predictability with variety
  • Outputs maintain coherence while introducing some creative elements
  • Well-suited for conversational interfaces, explanations, and general-purpose applications

At a high temperature (0.8-1.0+):

  • The model explores lower-probability token selections
  • Responses become more diverse, surprising, and creative
  • Excellent for creative writing, brainstorming, poetry, and unconventional thinking

Temperature doesn't just affect creativity. It impacts the risk profile of your applications. Higher temperatures can produce more engaging content but may also increase the likelihood of errors, logical inconsistencies, or off-topic diversions.

The optimal temperature setting depends heavily on your specific use case and quality requirements. For mission-critical applications, consider implementing human review for high-temperature outputs or using multiple generation attempts with varied temperatures and selecting the best result.
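
To make this concrete, here's a minimal sketch of setting temperature on a chat completion request to OpenAI's API. The same idea applies to other providers, though parameter names and ranges can differ.

```javascript
// Run as an ES module on Node 18+ (built-in fetch, top-level await).
// Assumes OPENAI_API_KEY is set in your environment.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    temperature: 0.2, // low temperature: predictable, factual output
    messages: [
      { role: "user", content: "Summarize what tokenization is in two sentences." },
    ],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```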

Handling Edge Cases in AI Systems

Edge cases represent the boundary conditions where AI systems are most likely to fail or produce unexpected results. While your prompt engineering might work flawlessly 90% of the time, that remaining 10% can significantly impact user experience, reliability, and the overall viability of your application.

For production-grade applications with substantial user bases, addressing these edge cases becomes critical. Consider implementing:

Prompt Guardrails:

  • Include explicit instructions to handle known problematic patterns
  • Provide clear format specifications: "Return only JSON with these exact keys..."
  • Set boundaries on content: "Provide exactly three recommendations, no more and no less"
  • Add fail-safe instructions: "If you're uncertain about any answer, acknowledge this uncertainty"

Post-Processing Validation:

  • Implement programmatic validation of AI outputs before displaying them to users
  • Verify structural integrity (JSON parsing, expected fields)
  • Check content against business rules and constraints
  • Sanitize outputs to remove potentially harmful or malformed content

Robust Error Handling:

  • Develop graceful fallback mechanisms when the AI response doesn't meet quality thresholds
  • Consider multi-pass approaches where initial outputs are refined through follow-up prompts
  • Implement automatic retry logic with modified prompts when failures are detected

Remember that even with extensive guardrails, some edge cases will inevitably emerge in production.

Design your systems with the assumption that AI components will occasionally fail, and ensure these failures don't compromise the core user experience or system integrity.
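
As a rough illustration, here's what post-processing validation plus retry logic might look like. The callModel() helper is hypothetical, standing in for whatever wrapper you have around your provider's API.

```javascript
// callModel(prompt) is a hypothetical helper that returns the model's raw text output.
async function getValidatedJson(prompt, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(prompt);

    try {
      const parsed = JSON.parse(raw);

      // Business-rule check: we expect exactly three recommendations.
      if (Array.isArray(parsed) && parsed.length === 3) {
        return parsed;
      }
    } catch {
      // Parsing failed -- fall through and retry with a firmer instruction.
    }

    prompt += "\n\nReturn ONLY valid JSON. No commentary, no markdown fences.";
  }

  // Graceful fallback so the failure never reaches the user unhandled.
  return null;
}
```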

Contextual priming

Priming is a technique used to guide an AI model's responses by providing relevant background information up front. It's important to remember that AI models don't have memory by default, so your prompt should contain any and all data needed to produce a specific output.

So if you're writing a prompt that helps someone to learn quantum physics, instead of writing:

"Explain quantum physics"  

You would rather prompt:

"Explain quantum physics to a high school   
student using simple language and focus on superposition"  

The first prompt above might return an overly complex explanation that makes for a bad user experience, while the more specific prompt primes the model ahead of time to produce a more focused response.

You can also prime the model by giving it a specific persona, such as a known figure or role:

Take on the role of a quantum physics professor who uses humor to teach complex topics.
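
When you're calling an API rather than typing into a chat window, this kind of priming usually lives in the system message. A minimal sketch using OpenAI's chat message format (other providers have an equivalent, such as Anthropic's top-level system field):

```javascript
const messages = [
  {
    role: "system",
    content:
      "You are a quantum physics professor who uses humor to teach complex topics. " +
      "Explain concepts at a high school level and focus on superposition.",
  },
  { role: "user", content: "Explain quantum physics." },
];

// Pass `messages` to your chat completion call, as in the temperature example above.
```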

Few-shot learning

Few-shot learning is a technique used to improve the accuracy of a response by giving the model a few examples in order to teach it how to output content in a specific way.

The more examples that you give, the better the chance that you'll get a response closer to what you are looking for.

You do this by specifying a sample input, the expected output for that sample, and then the real data followed by a placeholder for the output.

For example:

Sample Transcript: "The CEO emphasized the importance   
of customer satisfaction.   
A new product line will be introduced   
in Q2, and employees are encouraged   
to share ideas for innovation."  
Key Points:  
  
Customer satisfaction is a top priority.  
A new product line will be launched in Q2.  
Employees are encouraged to contribute innovative ideas.  
Transcript: {transcript}  
Key Points:"*  

In the snippet above, the sample transcript and the key points after it have nothing to do with the real data. They're just there to show the model what its output should look like.

You then provide the model with the real information, and the model infers what it should do next.

It's definitely a clever technique that can help you get more exact with how you want to return data.
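
In a chat-style API, you can express the same few-shot pattern as alternating user/assistant messages instead of one long prompt string. A minimal sketch, where the transcript variable stands in for your real data:

```javascript
const transcript = "...your real meeting transcript goes here...";

const fewShotMessages = [
  { role: "system", content: "Extract the key points from meeting transcripts as a short bullet list." },
  // One worked example: sample input...
  {
    role: "user",
    content:
      'Transcript: "The CEO emphasized the importance of customer satisfaction. ' +
      'A new product line will be introduced in Q2, and employees are encouraged to share ideas for innovation."',
  },
  // ...and the output format we want the model to imitate.
  {
    role: "assistant",
    content:
      "- Customer satisfaction is a top priority.\n" +
      "- A new product line will be launched in Q2.\n" +
      "- Employees are encouraged to contribute innovative ideas.",
  },
  // The real data the model should now process.
  { role: "user", content: `Transcript: "${transcript}"` },
];
```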

Chain-of-thought prompting

Chain-of-thought prompting is a technique where the AI model is encouraged to "think step-by-step" by explicitly guiding it through intermediate reasoning steps before arriving at a final answer.

This method is particularly effective for tasks that require logical reasoning, complex problem-solving, or multi-step calculations.

Instead of asking the model for a direct answer, you structure the prompt to encourage a progressive breakdown of the problem. The intermediate steps provide a clearer pathway for the model, improving its reasoning and accuracy.

For example, if you wanted to ask a simple math question without chain-of-thought prompting, you might prompt something like the following:

Prompt: "If a store has 7 apples and sells 3,   
how many are left?"  
 Response: 4  

With chain-of-thought prompting, your prompt might look something like the following:

Prompt: "A store has 7 apples. It sells 3   
apples. First, determine how many   
apples are sold. Then subtract that   
number from the total apples.   
How many are left?"  
  
Response: The store starts with 7 apples. It sells 3 apples. Subtracting 3 from 7 gives 4 apples left.  

By specifying the particular steps that you want the model to follow, you are able to get a more detailed and accurate response.

This technique might be more beneficial in cases where you have to process user input in a very specific way in order to get an exact response.

Response formatting

Most people are used to AI models responding to a prompt with a single output, usually one giant string of text.

The latest models, however, are able to respond in more structured ways, such as JSON schemas that you specify yourself in the prompt.

For example, let's say that you were working on an app that generated random coding questions for people to practice with. It would be difficult if a model responded with a giant string that you would then have to parse yourself.

Instead, you can specify the JSON schema that you would like the response in and tell the model to only return the data in that format.

Generate 3 technical programming questions and return them in JSON with the following schema:
[{title: 'title goes here', question: 'question goes here'}]

This is really the heart of prompt engineering. In a real-world application, you're going to need structured data in order to do anything with the UI/UX.

If you're building someone a resume using AI then you need to know what data corresponds to their past experience, education, skills, etc.

Do note, though, that this isn't guaranteed to work 100% of the time either. From my experiments so far, it works 'most' of the time. But on occasion the model might respond with something like "You got it. Here's the data! [{...", which makes it difficult to parse.

Most of these edge cases can be fixed by telling the model not to return anything except valid JSON. But again, that only seems to work 'most' of the time.
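
If your provider supports it, you can also enforce JSON at the request level. Here's a minimal sketch combining OpenAI's JSON mode (the response_format parameter on recent models) with a defensive parse, in case the model still wraps the JSON in extra text:

```javascript
// Run as an ES module on Node 18+. Assumes OPENAI_API_KEY is set.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" }, // ask the API to return strict JSON
    messages: [
      {
        role: "user",
        content:
          "Generate 3 technical programming questions. Return only JSON shaped as " +
          '{"questions": [{"title": "...", "question": "..."}]}',
      },
    ],
  }),
});

const raw = (await response.json()).choices[0].message.content;

// Defensive parse: strip anything before the first brace, just in case.
const jsonStart = raw.indexOf("{");
const questions = JSON.parse(raw.slice(jsonStart)).questions;
console.log(questions);
```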

A few last words

Prompt engineering is evolving rapidly. The rules change as models improve, meaning that the best engineers are those who stay adaptable and continuously refine their methods.

Being able to craft concise prompts that produce exact, accurate data on the other end is going to be a huge skill in the near future as people interact more and more with artificial intelligence.
