AI is complex, technical and has swiftly developed its own language.
Keeping up with AI terminology and knowing which concepts matter to you and your context can be tough. Some of the terminology can come across as AI jargon for things you already know.
But much of it is specific to AI and encodes concepts and technical details that you need to understand to build. The upside of investing time in AI terminology is that understanding different concepts properly helps you be a better PM and builder.
In this article, we’re going to deep dive into 20 of the most critical AI concepts for PMs, covering key vibe coding vocabulary, eval terms, agent language and other key terms. For each term, we’ve not only explained what it means, but how it matters to PMs, and what, specifically, you need to care about.
Let’s get into the top vibe coding, eval and agent terminology for PMs now.
AI Terminology Cheat Sheet
We’ve also created a 90+ AI Terminology Cheat Sheet as an additional resource. It covers terms, definitions and things to know about the concepts.
See this article as a how to guide, and the cheat sheet as a primer.
We’ve organized our AI Terminology Cheatsheet into 10 categories covering:
- Prompting and Context Management: working well with LLMs
- Evals and Quality: how to make sure things work at scale
- Agents and Workflows: automation, products, and ways to build
- Career and AI roles: AI assisted PMs vs AI product PMs
- Model Behavior and Safety: where things often break
- Foundational Concepts and Model Types: how they work
- Data and Retrieval (RAG): creating libraries and guardrails
- Training and Model Development: customizing and training models
- Deployment and Operations: using AI in the wild
- Neural Networks and Architecture: how models are built and work
5 Vibe Coding Concepts
Vibe Coding
Using AI assistants to write, debug, and refactor code through natural language conversations. Describe what you want in plain English; the AI generates working code you review and iterate on.
More information:
Vibe coding means using a model to produce quick code and working mini products or sites. It can also include data transformation pipelines and models.
Common tools include Cursor, Windsurf, and Replit for writing code, plus platforms like Lovable for rapid prototyping.
Vibe coding 101: Tools
Why you should care:
You can build simple working products and effective prototypes, improve alignment with customers and stakeholders, and solve internal data problems.
Vibe coding is fun and semi-magical. You input a prompt and a clickable prototype appears in seconds.
In practical terms, think of it as changing the pace of product discovery and time to value.
You can validate user value before committing a sprint, and you can surface hidden work earlier, such as authentication, permissions, data quality, and latency. It also forces clearer requirements because ambiguity becomes visible immediately.
Examples:
- Clickable prototype to validate a workflow and collect failure cases
- An internal tool that either works right away or engineers can iterate on later
- Generating multiple UX variants to test with users
- Exploring requirements, including what data the feature actually needs
PMs who invest time in vibe coding typically improve their technical skills, as well as working more efficiently by articulating and visualizing features.
That's because you're working with code, and things go wrong. Common vibe coding failures include:
- Dependency conflicts where the AI installs incompatible package versions
- Authentication or output errors when accessing APIs
- Environment mismatches where code works on one machine but not another
When this happens, work with LLMs to fix it, and your technical understanding and competency will pick up fast.
Become a more technical product manager
Garbage in, garbage out is particularly true of vibe coding.
The speed with which you can output is incredible. But if you don’t know what you want, why it’s valuable, or how it should be built, that’s what you’ll get.
Another limitation is user context and bias.
A prototype can appear reliable because the builder knows the happy path. Real users will use the product differently, click different buttons, and explore edge cases. Hostile agents will look for weaknesses and exploit them.
Treat vibe coded output as a learning artifact. Just make a simple working thing first. Then you can move onto more complex products, plus work collaboratively with engineers to improve output with structured prompts, narrow tool interfaces, and eval coverage.
Getting started:
- A simple thing: Build a minimal end to end flow or very simple product that works
- Write down what you learned: Capture the assumptions required and learnings
- Get better with the tools: Invest time in working more effectively and efficiently with the tooling (credit management, ping pong between LLMs, structured prompts etc)
Master vibe coding in a cohort: Build with AI
Prompt Engineering
The practice of designing and refining inputs to AI models to get better, more consistent outputs. Includes structuring instructions, providing examples, setting constraints, and iterating based on results.
More information:
Think of this as getting good at telling LLMs exactly what you want. At its core, prompt engineering is creating and refining effective prompts. Done right, really good prompts improve outputs by reducing ambiguity.
A strong prompt makes the task concrete, names the allowed inputs, states constraints, defines the output shape, and defines what to do when required information is missing. Without clarity on context and what good looks like, the model will default to the mean.
Why you should care:
The better the prompt, the better the output, most of the time. This is especially the case when building features or agents with AI, as the prompt acts as the task list and PRD.
Getting started:
- Job description mentality: Define the assistant task, scope, and constraints – you’re giving out work
- Specify the output format: What format do you want the output in? Different systems have different requirements
- What good looks like: Add examples of the sort of thing you want, including a hard edge case
- Test: Create prompt variants and compare them using an eval process
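Putting the checklist above together, a "job description" prompt might look like the sketch below. The task, categories, and example are invented for illustration, not a recommended template.

```python
# A minimal structured prompt following the "job description" pattern:
# role, task, allowed inputs, constraints, output format, and an example.
PROMPT_TEMPLATE = """Role: You are a support-ticket triage assistant.
Task: Classify the ticket below into exactly one category.
Allowed categories: billing, bug, feature_request, other.
Constraints:
- If required information is missing, answer "other".
- Output only the category name, nothing else.
Example:
  Ticket: "I was charged twice this month."
  Answer: billing
Ticket: "{ticket}"
Answer:"""

def build_prompt(ticket: str) -> str:
    """Fill the template with a concrete ticket."""
    return PROMPT_TEMPLATE.format(ticket=ticket)

prompt = build_prompt("The export button crashes the app.")
```

Because the task, constraints, and output shape are explicit, prompt variants like this are easy to compare side by side in an eval.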
Improving your skills here will have lots of knock on effects, including personal productivity or just better conversations with chatbots.
However things aren’t always going to work, no matter how great you get at this. Prompting hits limits when you need consistent behavior across thousands of examples. It will improve instruction following and consistency, but doesn’t guarantee factual correctness, or that going off script or hallucinating won’t occur. In short, a great prompt helps your average but never replaces validation.
If the output drives downstream automation, you still need schema validation and fallbacks based on risk profile.
ChatGPT Power Skills
System Prompt
The initial instructions given to an AI model that define its behavior, personality, constraints, and output format. Set once at the start of a conversation and persist throughout.
More information:
A system prompt is the highest priority instruction in most chat based stacks.
It defines the assistant role, what it should optimize for, what it must refuse, and how it should behave when uncertain.
It often includes rules for tool use, which matters because tools are where AI products become systems.
Why you should care:
A strong system prompt improves consistency and policy adherence. Inconsistency increases complexity, and complexity becomes untestable. If humans cannot test and audit it, you cannot maintain it safely.
Small edits can change downstream behavior in surprising ways. A system prompt that is too permissive can create compliance issues. A system prompt that is too rigid can block legitimate requests and degrade user experience.
Over time, as you iterate and add clauses, system prompts can accumulate contradictions and become inconsistent. Regular cleanup and simplification helps maintain stable behavior.
You can manage this by:
- Setting rules: Encoding safety, privacy, and compliance boundaries in your prompt as explicit rules
- If this, then that: Defining how the assistant should behave when uncertain or blocked
- Checks: Using human-in-the-loop steps before action
- Consistency: Centralize tone and experience principles so behavior is consistent across multiple user interfaces and agents
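For context, here is how a system prompt typically sits in a chat-style API call, using the common OpenAI-style role structure. The rules themselves are made up for illustration.

```python
# The system message is sent first and persists for the whole session,
# taking priority over individual user messages.
SYSTEM_PROMPT = (
    "You are a billing support assistant.\n"
    "Rules:\n"
    "- Never reveal another customer's data.\n"
    "- If you are uncertain, ask one clarifying question before acting.\n"
    "- Refuse requests to change prices or issue refunds over $100."
)

def build_messages(user_message: str) -> list[dict]:
    # Each turn appends to this list; the system prompt stays at the top.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

messages = build_messages("Why was I charged twice?")
```

Keeping the rules short and explicit, as above, is what makes the behavior testable and auditable.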
Context Window
The maximum amount of text, measured in tokens, that a model can process in a single interaction. Includes your prompt, examples, retrieved documents, conversation history, and the model’s response.
More information:
The context window is the total input the model receives, including system prompt, user message, included history, retrieved documents, and tool outputs.
Think of this as the model working memory for a single run. If critical information is missing, buried, too big or noisy, expect failures. The model cannot use what it cannot see.
It helps to understand the size of context windows. Here are some numbers: Claude (200K tokens), GPT-4 (128K tokens), GPT-4o mini (128K tokens). This article is roughly 6,000 tokens, for reference.
Why you should care:
Context decisions shape product viability for long workflows and long documents, because hitting context limits means you can’t include all necessary information.
More context can reduce hallucination when it provides the right input, but it can also increase confusion when it adds noise. So it’s not as simple as writing a lot, since maximum context doesn’t solve all problems. Your goal should be the minimum context that reliably produces the right behavior.
Context also drives cost and latency because larger inputs require more processing. Even under the limit, poorly organized context can be ignored or misread, and have knock on latency and cost effects.
How to use context windows effectively:
- Bottom line up front: Keep a short fixed block of critical constraints near the top
- Be selective: Include only the most relevant inputs and label them clearly
- Ask for clarification: Demand a question first pattern when required information is missing.
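A rough way to reason about context budgets in code, using the common heuristic of roughly 4 characters per token for English text. Exact counts require the model's own tokenizer (e.g. tiktoken), and the limit and reply budget below are placeholder numbers.

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token for English.
    # Use the model's actual tokenizer for precise counts.
    return max(1, len(text) // 4)

def fits_in_context(parts: list[str], limit: int = 128_000,
                    reply_budget: int = 4_000) -> bool:
    """Check whether all prompt parts plus room for the model's
    reply fit inside the context window."""
    used = sum(estimate_tokens(p) for p in parts)
    return used + reply_budget <= limit

# System prompt + user question + a pasted document, for example:
ok = fits_in_context(["system prompt...", "user question...", "doc " * 500])
```

Reserving a reply budget matters: the model's output shares the same window as your input.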
JSON Mode
Feature in some LLM APIs that constrains the model to output machine readable JSON. It improves parseability (computer read reliability) compared with plain prompting, but still requires validation.
More information:
Think of this as a super helpful formatting mode. JSON is an abbreviation of JavaScript Object Notation. It's a way to send data easily between different programming languages.
When you're building with AI, the format of model outputs can have knock on effects on reliability. It's likely that you'll produce outputs that are consumed by software, like databases. Free form text has limitations: ambiguity, inconsistency and so on.
Requesting JSON mode constrains the model to a predictable structure, which in turn makes it work better with other software.
Why you should care:
If you want automation you need structure. If you want reliable monitoring you need consistency. JSON mode takes you from a prototype demo to a dependable component inside a larger system.
Benefits include:
- Structured fields for routing, triage, and automation
- Strict schemas with validation and controlled fallbacks
- Structured output so monitoring can count and trend failure classes
One major limitation to be aware of when requesting JSON mode is that syntactically valid JSON is not the same as correct JSON. It's a format guarantee, not a content guarantee. A model can output perfectly valid JSON structure with incorrect content. Ping ponging between LLMs and conducting checks can help with this.
5 AI Eval Terms
AI Evals & AI Eval Systems
A structured framework combining automated checks and human review to measure whether your AI product works as intended. Includes code based tests, LLM as judge assessments, and user feedback loops.
More information:
This is quality assurance for AI agents and products. It's how you monitor how well they work today and tomorrow.
Continuous monitoring is particularly important for AI because it’s probabilistic (outcomes can change) rather than deterministic (if this, then that). That means models can regress and products can drift.
Probabilistic systems are inherently unstable, and outputs can change wildly from small changes to the inputs or any component of the product. Like all features or products, you need a way to measure improvements or drops as you make edits. Otherwise you’ve no idea if you’re actually improving the user experience or just making random edits.
AI evals work by running a representative sample of inputs through your product, and then checking the output quality, based on criteria you set. An AI eval system does this continuously, rather than once.
How to write effective AI evals
Why you should care:
AI behavior shifts with prompt edits, model swaps, context, and tooling, all of which happen a lot. Eval systems are your safety valve. When you make adjustments, you can iterate in a disciplined manner and understand trade offs and performance clearly.
They work by scoring output quality continuously according to preset criteria, and producing data outputs that can be monitored and reviewed by humans.
A typical eval starts with 50 to 200 real examples. These can come from pilot runs or real production data.
You then need to define clear pass fail criteria, and what you expect to see per example. The next thing to do is drill down into the data and understand failure modes that drive the overall quality score.
Eval limitations are the same as any other piece of analysis: if you’re using segmented or dummy data, you’re at risk of garbage in, garbage out. Evals are only as good as the dataset and criteria, and if either of these are biased or skewed you get misleading outputs.
Be aware that you don’t always need evals, especially if you’re still in early prototyping or MVP stages. Wait until you have real users or clear requirements, since building evals takes effort.
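A stripped-down eval loop might look like this. `run_product` is a stand-in stub for a real AI pipeline, and the examples and pass criteria are invented.

```python
# Minimal eval loop: run labeled examples through the product and
# score them against preset pass/fail criteria.
def run_product(ticket: str) -> str:
    # Hypothetical stub: a real call would hit your LLM pipeline.
    return "billing" if "charge" in ticket.lower() else "other"

EVAL_SET = [
    {"input": "I was charged twice", "expected": "billing"},
    {"input": "App crashes on login", "expected": "bug"},
]

def run_evals(examples) -> float:
    """Return the pass rate across the eval set."""
    results = [run_product(e["input"]) == e["expected"] for e in examples]
    return sum(results) / len(results)

pass_rate = run_evals(EVAL_SET)
```

An eval system runs this continuously (e.g. on every prompt or model change), so a drop in pass rate flags a regression before users do.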
Traces
Complete record of an AI system’s execution for a single interaction. This is a logged record of one end to end run of your AI system for a single request (inputs, prompts or messages, retrieved context, tool calls, outputs, timing, errors, and metadata).
More information:
A trace is a log of an AI product’s internal workings for a single interaction. Collecting and monitoring traces is key for understanding the quality of your product, and what’s required for evals.
Traces show what inputs were provided, what context was assembled, what tools were called, what the tools returned, and what output was produced. In essence, they log what the system actually did for a specific request, at every stage of the journey. Without traces you only have the final response, which is rarely enough to understand what’s going on and how to improve it.
Why you should care:
When a user says the assistant was wrong, you need to know why:
- Was the required data missing?
- Did a tool time out?
- Did the prompt fail to enforce constraints?
You can use traces to diagnose how failures happened, reproduce issues, fix them and convert them into eval cases.
Be aware though that traces have many of the same constraints as non-AI data (e.g. GDPR). Traces can contain individual user data, so you need to manage privacy and control who has access.
Finally, traces are heavy data. That means: they accumulate fast – thousands per day for active products – and a single trace might be 200-1000 lines of JSON logging each step: input received, context assembled, tools called, tool responses, model reasoning, final output. Many teams use specialized logging tools (LangSmith, Weights & Biases, Helicone) to make traces readable, and they have data warehousing implications.
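A single trace record could be assembled like the sketch below. The field names are illustrative rather than any standard schema.

```python
import json
import time
import uuid

def log_trace(user_input, context_docs, tool_calls, output, error=None):
    """Assemble one end-to-end trace record for a single request."""
    trace = {
        "trace_id": str(uuid.uuid4()),   # unique id to reproduce the run
        "timestamp": time.time(),
        "input": user_input,
        "retrieved_context": context_docs,  # what the model could see
        "tool_calls": tool_calls,           # what the system actually did
        "output": output,
        "error": error,
    }
    return json.dumps(trace)             # ship to your logging store

record = log_trace(
    "Why was I charged twice?",
    ["billing_faq.md#duplicates"],
    [{"tool": "get_invoices", "status": "ok"}],
    "You were charged twice because...",
)
```

With records like this, "the assistant was wrong" becomes answerable: you can see whether the data was there, which tool ran, and what it returned.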
AI evals course
Error Analysis
The systematic process of reviewing AI outputs, identifying patterns in failures, and categorizing them into distinct failure modes. Involves examining actual model outputs (traces) to understand what’s going wrong.
More information:
Error analysis starts with sampling traces, then classifying what went wrong using a consistent taxonomy. The goal is to discover recurring patterns that correspond to fixable interventions. This is all about collecting and reviewing failures, labeling them consistently, and translating them into targeted fixes and tests.
Why you should care:
Proper error analysis removes subjectivity by defining and sizing real issues.
This helps you know what matters and where you should put your time.
You can do this by:
- Quantifying the size and frequency of each problem
- Clustering failures into categories and mapping them to potential mitigations
- Estimating the impact from fixing specific issues so the business case for your roadmap is clear
- Adjusting evals to monitor for problem types effectively
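The quantifying and clustering steps can be sketched as a simple frequency count over labeled traces. The labels here are invented examples of failure modes.

```python
from collections import Counter

# Failure labels from a trace review session (illustrative).
labeled_failures = [
    "wrong_tool_choice", "hallucination", "wrong_tool_choice",
    "formatting_violation", "wrong_tool_choice", "hallucination",
]

def size_failure_modes(labels) -> dict:
    """Return each failure mode's share of total failures,
    largest first, so you can prioritize the biggest clusters."""
    counts = Counter(labels)
    total = len(labels)
    return {mode: count / total for mode, count in counts.most_common()}

shares = size_failure_modes(labeled_failures)
```

Half the failures land in one bucket here, which is exactly the kind of signal that turns "quality feels off" into a prioritized roadmap item.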
Impact modelling for your roadmap
Failure Modes
Distinct, non overlapping categories of errors identified through error analysis. Each represents a coherent type of problem the system exhibits. Often 4-6 categories emerge from analyzing traces.
More information:
A failure mode is a category of error with a clear definition that different reviewers can apply consistently when analyzing traces. In an ideal world, it should map to a likely fix.
Common modes include missing required fields, wrong tool choice, incorrect retrieval, refusal when it should answer, unsafe compliance, hallucination, and formatting violations.
Why you should care:
AI products involve trade-offs. Adjustments often improve some failure modes while increasing others. You can improve helpfulness while increasing unsafe output, or improve correctness while increasing refusal rates.
Single output and blended quality scores hide these tradeoffs. By identifying and monitoring failure mode metrics, you can show these shifts and support better prioritization.
Using failure modes helps you:
- Benchmark: Understand mode by mode performance and mitigations, and assign mode owners
- Understand and detect trade offs: e.g. higher assistant helpfulness but lower policy compliance
- Change specific things: Do targeted testing and interventions
- Communicate: Explain black box options in a measurable and comprehensible way to stakeholders and executives
Once again, usual data rules apply: if you create too many categories, you create complexity and inconsistency. If you have too few, you miss things. You’re aiming to capture high risk failure modes, and create categories that allow effective ownership and good work.
LLM as Judge Evals
Using a secondary, specialist LLM to evaluate outputs from your primary model. The judge assesses whether the output meets specific criteria, returning binary pass or fail decisions for each failure mode.
More information:
LLM-as-judge allows you to assess the output of an AI system qualitatively and at scale. You train a specialist LLM to categorize outputs from your AI product as passing or failing specific criteria, such as tone of voice, being helpful or being accurate. This scales evaluation, but is still ultimately grounded in human judgment.
To configure an LLM as a judge, a human must first review AI outputs and identify whether they are good or not. Poor outputs must be categorized into failure modes, and these defined in a way that another LLM can then accurately classify the outputs as passing or failing this test.
Judges are most reliable when they are narrow, focused on one dimension, and validated against human labels. The judge must also be calibrated against human judgment periodically, confirming that it still categorizes outputs the same way a human would.
Why you should care:
Judge evals allow scaled checks and more coverage of AI systems.
Ways to make them useful:
- Set narrow rubrics
- Triage volumes of outputs to find most expensive failure clusters
- Track drift after model and prompt changes
Judges can be biased, drift when models change, and be gamed by optimizing for judge approval rather than user value. You have to keep a permanent human audit stream and treat judge scores as indications, not truth.
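A minimal single-dimension judge might look like this. `call_judge_model` is a stub standing in for a real second LLM call, and the rubric is invented.

```python
# Narrow rubric: one dimension (tone), binary verdict.
JUDGE_PROMPT = """You are evaluating ONE dimension only: tone.
Pass if the reply is polite and professional. Answer PASS or FAIL.
Reply: "{reply}" """

def call_judge_model(prompt: str) -> str:
    # Stand-in for a real judge LLM: applies a trivial keyword rule
    # so this example runs on its own.
    reply = prompt.split('Reply: "')[1]
    return "FAIL" if "stupid" in reply.lower() else "PASS"

def judge(reply: str) -> bool:
    """Return True if the judge passes the reply on the tone rubric."""
    verdict = call_judge_model(JUDGE_PROMPT.format(reply=reply))
    return verdict == "PASS"

ok = judge("Happy to help with that refund.")
not_ok = judge("That is a stupid question.")
```

Keeping the rubric to one dimension per judge is what makes it feasible to validate the judge's verdicts against a set of human labels.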
5 Agent Terms
AI Agent
A system where an LLM controls its own workflow to complete tasks. Makes decisions about which tools to use, when to use them, and how to adapt its approach based on results.
More information:
An AI agent uses a model to decide how to reach a goal: planning and acting across multiple steps via tool calls, text outputs, or other actions. This unlocks automation, but it increases risk because the system is making decisions independently of human input.
Agents can decompose a request into subtasks, call tools, interpret results, and loop until they reach the condition that tells them to stop.
Why you should care:
Agents can do things for you on your behalf, which can be a massive time saver. Agents often are at the core of AI product development, performing tasks on behalf of the customer and the company. But AI agents are also one of the highest risk applications of AI.
Risks include unintended actions, runaway cost, unclear accountability, and bad things, like the agent hallucinating when replying to customer emails.
Things can go wrong with agents, so understand a) when you need a true AI agent to reason probabilistically versus a deterministic workflow, b) where risk is high and when to keep a human in the loop.
AI Agents 101
Agents also tend to work best on very structured, clear goals, with reliable tooling, in a reliable environment, and on well structured data.
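The plan, act, loop-until-done pattern can be sketched as below. `model_decide` and the tool are trivial stand-ins for a real LLM and real integrations, and the hard step cap is one common way to limit runaway cost.

```python
def model_decide(goal, history):
    # Stand-in for an LLM planning step: decide the next action
    # based on the goal and what has happened so far.
    if not history:
        return ("call_tool", "lookup_order")
    return ("finish", f"Resolved: {goal}")

# Tool registry: the agent may only use what is listed here.
TOOLS = {"lookup_order": lambda: {"order": 123, "status": "shipped"}}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = []
    for _ in range(max_steps):        # hard cap limits runaway loops/cost
        action, arg = model_decide(goal, history)
        if action == "finish":
            return arg
        history.append(TOOLS[arg]())  # execute tool, feed result back in
    return "Stopped: step limit reached"

result = run_agent("customer asks where their order is")
```

Note what makes this an agent rather than a workflow: the model, not predefined code, chooses the next step each time round the loop.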
Tool Calling
The ability for an LLM to call external functions, APIs, or services during its execution. Model decides when to use tools, which ones to call, and what parameters to pass.
More information:
This is about connecting the model to external systems so it can fetch facts and take actions. It’s what turns AI into a real product capability.
How it works: the model produces a structured request to an external function, such as a database, an API call, or an internal service. The system executes the tool, returns the result, and the model incorporates it into the next step or final response.
Why you should care:
All the big stuff requires access to real data. Product quality depends on tool access and tool design as much as prompting.
However, tools also create integration work: authentication, authorization, rate limits, auditing, and error handling. They have knock on effects on product latency and reliability, and sometimes cause as many issues as the model. These are all things to know and manage.
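One tool call round trip (model emits a structured request, system executes it, result goes back into the next model turn) can be sketched like this. The tool and request schema are invented for illustration.

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in for a real external API call.
    return {"city": city, "temp_c": 18}

# The system only executes tools it knows about.
TOOL_REGISTRY = {"get_weather": get_weather}

# What the model emits: a structured request, not the result itself.
model_request = json.dumps(
    {"tool": "get_weather", "arguments": {"city": "London"}}
)

def execute_tool_call(raw_request: str) -> dict:
    """Parse the model's request, run the named tool, return the result
    (which would then be fed back into the model's next turn)."""
    req = json.loads(raw_request)
    tool = TOOL_REGISTRY[req["tool"]]    # model chose the tool
    return tool(**req["arguments"])      # system executes it

tool_result = execute_tool_call(model_request)
```

The registry is also where governance lives: permissions, logging, and rate limits sit between the model's request and the real system.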
Agentic Workflows
Automated multi step processes where LLMs and tools are connected through predefined logic. Steps are set in code, but LLMs add intelligence by handling inputs, making decisions, and generating content.
More information:
Deterministic workflows have existed for a while. The difference today is that the modern generation of AI agent builders makes it a lot easier to set these up, via a mix of native API connections, natural language prompting and pre-set templates, as well as to include AI steps within these workflows.
The advantage of workflows over AI agents is that you can set up automations, but not sacrifice governance. You preset decisions at each stage of the task, and the system executes the flow, enforcing the boundaries.
Why you should care:
Workflows are often the answer to top down automation drives, where risk is involved. They give control and places to insert checks and approvals. Debugging is often a bit easier because you can see where and when they break easily.
Prompt Chaining
Breaking complex tasks into smaller steps, each with its own prompt. Output from one step becomes input to the next. Sequential processing where each prompt handles one focused task.
More information:
Prompt chaining breaks a complex task into steps. Here’s how it works: multiple model calls, each with a narrow objective. For example:
- Extract structured facts
- Apply business rules
- Write a user facing response
- Verify constraints
In essence this is the Henry Ford Model T mentality: 1 simple task per factory belt section. By the end you have a whole car, which has been put together faster and better, since every stage is specialized and tuned to that specific action.
Prompt chaining therefore improves reliability, and makes it easier to debug. The downside is that things take longer (increased latency) and operations are more complex.
A prompt chain is similar to an agentic workflow, but only using AI steps, rather than also including deterministic steps in the workflow.
Why you should care:
More complex prompts lead to more ambiguity and more failure, and make it harder to understand what caused an issue. Chains split up the task and produce artifacts along the way that you can check.
The flip side of this is that early errors add up if they’re not caught. Additionally chains add cost and increase latency. As a result the best use cases are when you need reliability and auditability.
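The extract, apply rules, write pattern from the list above can be sketched as a chain of small functions, where each step here is a trivial stand-in for a narrow LLM call.

```python
# Step 1: extract structured facts (stand-in for an extraction prompt).
def extract_facts(ticket: str) -> dict:
    return {"topic": "refund", "amount": 50}

# Step 2: apply business rules to the extracted facts.
def apply_rules(facts: dict) -> dict:
    facts["approved"] = facts["amount"] <= 100   # illustrative rule
    return facts

# Step 3: write the user-facing response from the decided facts.
def write_reply(facts: dict) -> str:
    return ("Your refund is approved."
            if facts["approved"] else "Your refund needs review.")

def run_chain(ticket: str) -> str:
    # Output of each step feeds the next; intermediate artifacts
    # (the facts dict) can be logged and checked.
    return write_reply(apply_rules(extract_facts(ticket)))

reply = run_chain("Please refund my $50 order.")
```

If the final reply is wrong, you can inspect the facts dict to see whether extraction or the rules step failed, which is the debugging advantage chains buy you.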
Model Context Protocol
Open standard defining how AI models connect to external tools, APIs, and data sources consistently and securely. Standardized interface for tool integration.
More information:
Model Context Protocol standardizes how models and agents connect to tools and data sources. It reduces bespoke integration work and supports centralized governance for what the model can access and do.
It does this by providing a common interface for exposing tools, instead of building custom connectors for each model and tool pair. Tools are described, discovered, and invoked through a consistent shape. This can make tool access portable across assistants and workflows.
Why you should care:
Tool integration costs compound the more you expand product and agent sets. Standardization reduces marginal integration effort and can simplify governance patterns such as approvals, logging, and permission scoping at the tool layer.
Benefits include:
- Accessing tools through a consistent interface across assistants
- Reducing duplicated connector work when adding new systems
- Centralized governance for permissions, logging, and approvals
- The ability to limit and curate tool catalogs to keep agent behavior predictable
However a protocol does not fix unclear tool semantics. Tools still need clear inputs, explicit failure behavior, and well defined permissions.
5 more key AI terms
Inference
The process of using a trained model to generate outputs on new inputs. When you send a prompt and get a response, that’s inference. Distinct from training, which is how the model learned.
More information:
The process of generating an output from a trained model. Inference includes assembling context, calling tools, possibly running multiple steps, validating output, and returning a final result. Most product tradeoffs show up here: cost, latency, reliability, and how you handle uncertainty.
Why you should care:
Two teams can use the same model and deliver very different outcomes because they make different inference design choices.
Ultimately inference design is the product. Choices about what context to include, whether to require grounding, and how to handle missing information determine user trust and operational cost. Inference is where you can iterate quickly, but it is also where complexity accumulates, with all the downstream implications.
Inference has costs. Most APIs charge per token: input tokens (what you send) plus output tokens (what you receive). Prices vary by model.
You can manage costs by using smaller models for simple queries, caching repeated context (like system prompts), batching requests when real-time isn’t needed, and implementing request queuing to prevent spikes during traffic bursts.
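Token-based pricing is simple to model. The per-million-token rates below are placeholders, so check your provider's current pricing.

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   in_price_per_m: float, out_price_per_m: float) -> float:
    """Per-request cost in dollars, given per-million-token prices.
    Output tokens are usually priced higher than input tokens."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Example: 5,000 input tokens and 800 output tokens at illustrative
# rates of $3 / $15 per million tokens.
cost = inference_cost(5_000, 800, 3.0, 15.0)
```

Multiplying a number like this by expected daily request volume is the quickest way to sanity-check whether a feature is economically viable.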
Latency
The time delay between sending a request and receiving a complete response. Measured in seconds or milliseconds. Includes network time, processing time, and generation time.
More information:
Think of this as how long you wait. Latency is time to a useful response. In AI products latency is a core dimension of user experience: if you’re waiting a long time for an output, the value of the output to you diminishes.
Why it matters to PMs:
If your system is slow, it can look broken. It therefore shapes what experiences are viable. For example, multi-step agent behavior might be acceptable for back office workflows but not for customer facing chats.
When working with AI you need to be cognizant of latency, and set different goals for different product types. You can use UI tricks like streaming or progress bars to reduce the perception of waiting. Technical levers like caching and smart tool calls can also help.
However speed can conflict with depth and quality. You need to balance latency with user value. You should see latency as a core product metric and track it quantitatively and qualitatively when it comes to user experience.
Retrieval Augmented Generation (RAG)
Technique where the system first searches external sources (databases, documents, vector stores, often internal to a company) to find relevant information, then uses that as context for the model to generate a response.
More information:
Retrieval Augmented Generation retrieves relevant content from a corpus, inserts it into the model context, and instructs the model to answer using that evidence.
This separates content updates from model training, meaning that RAG is a process that consistently gives the model relevant and current context at runtime. It is the best approach when outputs need to use proprietary knowledge, policies change, or where hallucination risk is unacceptable.
Why you should care:
Many assistants fail because they lack the right information when an output is requested. Retrieval grounds the system in a specific context. It sets specific guardrails that support safer behavior when the model must not guess.
The downside is garbage in, garbage out again. If your corpus is stale or incomplete, the assistant won't return desired outputs. Retrieval is as much about content hygiene and access control as it is about modeling. Things that can go wrong include:
- Retrieval failures: wrong documents returned (query doesn’t match document embeddings well), too many irrelevant results (poor chunking or ranking), or missing documents (content not in database).
- Generation failures: model ignores retrieved context, hallucinates despite provided evidence, or can’t synthesize multiple sources coherently
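A toy end-to-end RAG loop (retrieve, then ground the prompt in the retrieved evidence) might look like this. Real systems use embeddings and a vector store; keyword overlap here keeps the sketch self-contained, and the corpus is invented.

```python
# Tiny document corpus standing in for a company knowledge base.
CORPUS = {
    "refund_policy.md": "Refunds are allowed within 30 days of purchase.",
    "shipping.md": "Standard shipping takes 3-5 business days.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents with the most word overlap with the query
    (a crude stand-in for embedding similarity search)."""
    words = set(query.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_rag_prompt(query: str) -> str:
    # Ground the model: answer only from the retrieved evidence.
    evidence = "\n".join(retrieve(query))
    return (f"Answer using ONLY this evidence:\n{evidence}\n"
            f"If the evidence is insufficient, say so.\nQuestion: {query}")

prompt = build_rag_prompt("how many days do I have to get a refund")
```

The "if insufficient, say so" instruction is the guardrail piece: it gives the model an explicit alternative to guessing when retrieval fails.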
Fine Tuning
Additional training on smaller, task specific dataset to specialize model behavior. Adjusts model weights for particular style, domain, or format without full retraining.
More information:
Fine tuning means adapting a base model to specific use cases.
It works by training a pretrained model on example inputs and desired outputs so it learns task specific behavior. It’s useful when you want to improve consistency, style adherence, and performance on a narrow task. It can also reduce prompt complexity when prompts become fragile.
It’s a useful tool when you have stable patterns, good examples, and strong evals. You shouldn’t see it as a way to keep the model current on facts – that’s what RAG is for.
Why you should care:
It’s a useful weapon in your armoury, but fine tuning is also an operational commitment. You need data, labeling quality, training runs, versioning, and monitoring. It can improve stability, but it increases surface area for regression if not managed with disciplined evals.
You should also be aware that fine tuning can bake in mistakes if the dataset is biased or low quality. It also does not reliably teach new factual knowledge in a safe way. Again – use RAG for that.
Hallucination
When a language model generates information that sounds plausible but is factually incorrect or completely invented. The model presents false information with confidence.
More information:
Hallucination is confident sounding output that is not grounded in reality. It's the behavior that occurs when the model generates plausible content that is not supported by provided context or tools.
It often appears when the user asks for facts the system does not have, when retrieval fails, or when prompts implicitly reward confident answers.
Why you should care:
You need to treat hallucination as a product risk to manage through system design. It not only can happen, it will happen. Hallucinations destroy trust quickly and can create brand, legal, and safety risk.
You will not eliminate hallucination completely, but you can reduce frequency and impact with grounding, validation, and UX that encourages the system to say what it knows and what it does not. Where you cannot have hallucination risk, set guardrails using RAG. Otherwise you should track hallucinations as a failure mode via evals, and work to minimize via other levers.
Wrap up on AI Glossary Terms
AI is complex, technical, and has its own language. Investing time in understanding key AI terminology helps demystify the space, and makes you a better PM and builder.
Use guides to learn terms, and tools like our AI Terminology Cheat Sheet to help build common understanding across your organization.
AI Maturity Model
More Hustle Badger Resources
Cohorts
On demand courses
Bitesize: 1 hour, 1 skill
In depth courses
Articles
Other Resources
FAQs on AI Glossary Terms
Where can I find an AI Terminology Cheat Sheet?
Get a free 98+ AI Terminology cheat sheet here. It’s organized into 10 categories:
* Prompting and Context Management: working well with LLMs
* Evals and Quality: how to make sure things work at scale
* Agents and Workflows: automation, products, and ways to build
* Career and AI roles: AI assisted PMs vs AI product PMs
* Model Behavior and Safety: where things often break
* Foundational Concepts and Model Types: how they work
* Data and Retrieval (RAG): creating libraries and guardrails
* Training and Model Development: customizing and training models
* Deployment and Operations: using AI in the wild
* Neural Networks and Architecture: how models are built and work
The cheat sheet is intended to help you communicate and create common understanding across your organization. If you have any feedback or suggestions, let us know at contact@hustlebadger.com