AI hallucinations in education: how Examplary prevents them

One of the most common worries about AI in education is as simple as it is fair: what if it just makes something up? A test question with a date that's wrong. A "model answer" that's factually incorrect. A citation to a book that doesn't exist. A single error in an assessment undermines trust in the whole instrument.

That worry deserves an honest answer. In this article we explain why AI sometimes invents things, and how Examplary is built to prevent that: by grounding every question and every scoring criterion in your own source material.

What is a "hallucination", really?

A language model, the technology behind tools like Mistral, ChatGPT and Claude, is not a database of facts. Put simply, it's an extraordinarily sophisticated system that predicts which word is most likely to come next, given the text so far. That works remarkably well for producing fluent, convincing text, but the model doesn't "know" whether what it says is actually true. When it hits a gap in its knowledge, it fills that gap with something that sounds plausible. We call that a hallucination: an answer that looks confident and credible, but is invented.

The danger lies precisely in that persuasiveness. A hallucination looks just as tidy as a correct answer, and in education that is exactly the risk you don't want to take. Language models have become steadily better over the past few years and hallucinate less as a result, but for now the risk remains real whenever you use them.

Your teaching material as the foundation

Examplary works fundamentally differently from a general-purpose chatbot. We don't let the AI draw on its own broad, diffuse world knowledge. Instead, we force the model to stay within the lines of your source material. If you've used NotebookLM to create materials, the approach will feel familiar.

It starts with what you provide. You decide the source, and you can do that in several ways:

Upload documents: PDFs, Word files and presentations, such as your textbook, a chapter or a hand-out.
Paste text: copy in the exact material you want to assess.
Scan with your phone: scan a QR code and photograph paper material, such as a workbook, an article or your own notes.
Reuse earlier material: build on sources you've already added, or specific chapters from them.

Whatever you choose, the principle stays the same: the assessment is about your material, not about whatever an AI happens to think it knows.

How we keep the AI on topic

Behind the scenes, Examplary doesn't work as one big question to a chatbot, but as a series of controlled steps we call an agentic workflow. The most important rule within it: the model may only reason from the source material, not from its own general knowledge.

Concretely, it works like this. For each topic, we extract all the relevant facts and context from your source and store them separately. The questions and their scoring criteria are then generated per topic.

That has an important side effect. Because every question has a trail back to the original source, you as a teacher can always see why a question was asked and why a scoring criterion was marked right or wrong, and the system can generate better feedback. In short, this is how we keep AI suggestions verifiable. And as we wrote earlier in "AI-assisted grading: suggestions, not autonomous decisions", the teacher always makes the final call.
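For technically minded readers, the per-topic steps and the trail back to the source can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not Examplary's actual code: the names `Fact`, `Question`, `extract_facts`, `generate_question`, `is_grounded` and `trace` are hypothetical, and `generate_question` is a simple template standing in for the model call. What it shows is the principle: facts are stored per topic, each question records which facts it was built from, and a question citing anything outside the stored facts is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    fact_id: str
    text: str
    source_ref: str  # where in the teacher's material this fact was found


@dataclass
class Question:
    topic: str
    prompt: str
    criteria: list[str]
    fact_ids: list[str]  # the trail back to the source material


def extract_facts(source: dict[str, list[tuple[str, str]]]) -> dict[str, list[Fact]]:
    """Store the facts for each topic separately.

    `source` maps a topic to (fact text, source reference) pairs. In practice this
    extraction step would itself be model-assisted; here the facts are given.
    """
    return {
        topic: [Fact(f"{topic}:{i}", text, ref) for i, (text, ref) in enumerate(pairs)]
        for topic, pairs in source.items()
    }


def generate_question(topic: str, facts: list[Fact]) -> Question:
    """Hypothetical stand-in for the model call: it is given only this topic's facts."""
    return Question(
        topic=topic,
        prompt=f"Explain the key ideas of {topic}.",
        criteria=[f"Mentions that {f.text}" for f in facts],
        fact_ids=[f.fact_id for f in facts],
    )


def is_grounded(question: Question, store: dict[str, list[Fact]]) -> bool:
    """Accept a question only if every fact it cites is stored for its topic."""
    known = {f.fact_id for f in store.get(question.topic, [])}
    return bool(question.fact_ids) and set(question.fact_ids) <= known


def trace(question: Question, store: dict[str, list[Fact]]) -> list[str]:
    """Return the source references behind a question, so a teacher can check it."""
    by_id = {f.fact_id: f for f in store[question.topic]}
    return [by_id[i].source_ref for i in question.fact_ids]


if __name__ == "__main__":
    store = extract_facts({
        "photosynthesis": [
            ("plants convert light energy into chemical energy", "ch. 4, p. 12"),
            ("chlorophyll absorbs mostly red and blue light", "ch. 4, p. 13"),
        ]
    })
    question = generate_question("photosynthesis", store["photosynthesis"])
    print(is_grounded(question, store))  # True
    print(trace(question, store))        # ['ch. 4, p. 12', 'ch. 4, p. 13']
```

Keeping the fact IDs on the question itself is what makes the check cheap: verifying a question is a set comparison, and showing a teacher where it came from is a lookup.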

What does the research say?

Grounding an AI in a reliable source is a well-established principle, known in technical jargon as retrieval-augmented generation (RAG): instead of relying on the model's memory, the relevant information is retrieved first, and the answer must be based on it.
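The two RAG steps can be illustrated with a deliberately small sketch. It is an assumption-laden simplification, not how Examplary implements retrieval: real systems usually rank passages with text embeddings, whereas here word overlap stands in for that, and the call to an actual language model is left out. The function names `tokenize`, `retrieve` and `build_prompt` are hypothetical. Step 1 fetches the passages most relevant to the task; step 2 places them in front of the task with an instruction to use nothing else.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case word set; a crude stand-in for real text embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Step 1: fetch the k passages sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return [p for p in ranked[:k] if q & tokenize(p)]  # drop zero-overlap passages


def build_prompt(task: str, context: list[str]) -> str:
    """Step 2: put the retrieved passages before the task and rule out outside knowledge."""
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return (
        "Use ONLY the numbered sources below. "
        "If they do not contain the answer, say so instead of guessing.\n\n"
        f"Sources:\n{sources}\n\nTask: {task}"
    )


if __name__ == "__main__":
    passages = [
        "The Treaty of Utrecht was signed in 1713.",
        "Photosynthesis takes place in the chloroplasts.",
        "The treaty ended the War of the Spanish Succession.",
    ]
    task = "When was the Treaty of Utrecht signed?"
    print(build_prompt(task, retrieve(task, passages)))
```

Here the two passages about the treaty are retrieved and the one about photosynthesis is left out, so the model would only ever see material relevant to the question. The explicit "say so instead of guessing" instruction matters as much as the retrieval: it gives the model a legitimate alternative to inventing an answer.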

A meta-analysis that pooled 20 separate studies found that grounding AI in retrieved source material significantly improved performance compared to the same models without a source.

In a study, the share of hallucinated components fell from around 21% without a source to under 4.5% with one: a reduction of roughly 80%. That study is already two years old, and AI models have improved so much since then that they hallucinate less to begin with.
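For readers who want to check the "roughly 80%", the relative reduction follows directly from the two reported figures, taken at face value:

```latex
\text{relative reduction} = \frac{21\% - 4.5\%}{21\%} = \frac{16.5}{21} \approx 0.79
```

Because the grounded figure is reported as "under 4.5%", the actual reduction is at least about 79%, in line with the rounded number above.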

An AI that has to stick to a source therefore invents considerably less than an AI answering off the top of its head.

Trust starts with control

Responsible AI in education isn't about an AI that never makes a mistake; no one can promise that. It's about control: you know where the content comes from, you can check it, and you have the final word. By anchoring everything in your source material, Examplary drastically reduces the risk of invented content while keeping you in charge.

On top of that, Examplary is fully GDPR and AI Act compliant: we don't train on your data, personal data is never fed to AI models, and all personal data is securely stored in Europe.

Curious how it works with your own teaching material? Create a free account or, for a school account, get in touch at hi@examplary.ai.

Examplary: AI made for teachers.
