    EU AI Act compliance

    • Application Owner: Examplary AI Team (hi@examplary.ai)
    • Document Version: 1.0.0
    • Last Updated: 2 November 2025

    General Information

    Purpose and Intended Use

    • Primary Purpose: Examplary is an AI-powered examination platform that enables educators, assessors, and institutions to create, administer, and grade exams, assessments, essays, and assignments efficiently. The system uses several AI models, including the Google Gemini 2.5 Flash and Pro models, to generate exam questions, provide automated grading suggestions, and offer feedback to students.

    • Sector of Deployment: Education (Higher Education, Professional Training, K-12)

    • Problem Statement: Examplary addresses the time-consuming nature of exam creation and grading, helping educators focus more on teaching and student support rather than administrative tasks.

    • Target Users and Stakeholders:

      • Primary Users: Educators, teachers, assessors, and instructors
      • Secondary Users: Educational administrators
      • End Users: Students taking exams and practice tests, as well as anyone taking any type of assessment
      • Stakeholders: Educational institutions, learning management system administrators
    • Key Performance Indicators (KPIs):

      • Question generation quality and relevance
      • Grading accuracy and consistency
      • Time saved in exam creation and grading
      • Feedback accuracy and relevance
      • User satisfaction scores
      • System uptime and reliability
    • Ethical Considerations:

      • Fair and unbiased assessment of student performance
      • Protection of student data and privacy
      • Transparency in AI-assisted grading decisions
      • Accessibility for students with diverse needs
      • Prevention of academic misconduct
    • Regulatory Constraints:

      • GDPR compliance for EU users
      • FERPA compliance for US educational institutions
      • Accessibility standards (WCAG 2.1)
      • EU AI Act requirements for high-risk AI systems in education
    • Prohibited Uses:

      • Using the system for high-stakes decisions without human oversight
      • Social scoring or profiling of students beyond academic performance
      • Discriminatory practices based on protected characteristics
      • Unauthorized sharing of student data with third parties
      • Using generated content for purposes outside the educational context
    • Operational Environment:

      • Cloud-based platform (Amazon Web Services)
      • Web browser interface for desktop and mobile devices
      • API integration with Learning Management Systems (Canvas, Moodle, etc.)
      • Multi-tenant SaaS architecture

    Risk Classification

    Examplary is classified as a High-Risk AI system under the EU AI Act for the following reasons:

    1. Educational Context (Article 6, Annex III, Section 3): The system is used to:

      • Determine access to educational institutions or other organizations (through assessment)
      • Evaluate learning outcomes and student performance
      • Influence decisions on educational paths and grading
    2. Automated Decision-Making: While human oversight is maintained, the system provides AI-generated:

      • Test, exam and assessment questions and content
      • Automated grading suggestions
      • Performance assessments in practice tests
      • Rubrics for essays and assignments
      • Feedback and comment suggestions
    3. Potential Impact: Educational assessments can significantly affect:

      • Student advancement and graduation
      • Access to further educational opportunities
      • Career prospects and professional development
      • Student self-perception and confidence

    Application Functionality

    Instructions for Use for Deployers

    Examplary should be deployed with the following considerations:

    • Human Oversight Required: All AI-generated content and grading must be reviewed by qualified educators before final decisions
    • Training: Educators must be trained on the system's capabilities, limitations, and proper use
    • Context Awareness: Results should be interpreted within the broader educational context
    • Regular Review: Periodic assessment of system performance and outcomes
    • Student Communication: Clear communication to students about AI use in assessments, feedback and grading

    Model Capabilities

    What Examplary Can Do:

    • Generate exam questions based on course materials and learning objectives
    • Create questions in multiple formats (multiple choice, short answer, essay, etc.)
    • Provide automated grading for objective question types
    • Suggest grades and feedback for subjective responses
    • Analyze exam difficulty and question quality
    • Generate question variations to support academic integrity
    • Generate rubrics for assignments and essays
    • Provide automated grading of assignments and essays based on rubrics
    • Import and export questions in standard formats (QTI, Moodle)
    • Integrate with Learning Management Systems

    Limitations:

    • Cannot fully evaluate complex critical thinking without human review
    • May not capture nuanced or creative answers outside training patterns
    • Requires source and reference materials for context-specific question generation
    • Performance depends on quality and clarity of input materials
    • Cannot assess non-textual elements without additional context
    • May require adjustment for specialized or highly technical domains

    Input Data Requirements

    Format Expectations:

    • Source and reference materials: PDF, DOCX, TXT, Markdown, web page content
    • Learning objectives: Structured text format
    • Student responses: Text-based submissions
    • Grading rubrics: Structured text format

    Output Explanation

    Exam Outline Generation Outputs:

    • Exam intro and outro
    • Question suggestions with question type and title, based on selected taxonomy

    Question Generation Outputs:

    • Generated questions include difficulty level indicators
    • Bloom's taxonomy or other taxonomy classification for each question
    • Suggested scoring criteria per question, aligning with learning objectives and chosen taxonomy

    Grading Suggestion Outputs:

    • Numerical scores with confidence levels
    • Feedback comments and suggestions
    • Rubric alignment indicators
    • Certainty scores to help educators identify areas needing attention
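
The grading-suggestion fields above (score, confidence, feedback, rubric alignment) can be sketched as a simple data structure. This is an illustrative assumption, not Examplary's actual schema: the field names, the `needs_attention` helper, and the 0.75 review threshold are all hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative threshold: suggestions below this confidence are flagged
# for closer educator attention. The real value would be a product decision.
REVIEW_ATTENTION_THRESHOLD = 0.75

@dataclass
class GradingSuggestion:
    question_id: str
    suggested_score: float                # numerical score proposed by the model
    max_score: float                      # maximum achievable score for the question
    confidence: float                     # model certainty in [0, 1]
    feedback: str                         # feedback comment shown to the educator
    rubric_criteria_met: list[str] = field(default_factory=list)

    def needs_attention(self) -> bool:
        """Flag low-confidence suggestions so educators know where to look first."""
        return self.confidence < REVIEW_ATTENTION_THRESHOLD

s = GradingSuggestion("q1", 7.5, 10.0, 0.62, "Partially correct; second step missing.")
print(s.needs_attention())  # 0.62 < 0.75 -> True
```

Regardless of the exact schema, the key property is that every suggestion carries a confidence value, so low-certainty grades can be surfaced for mandatory human review.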

    Rubric Generation Outputs:

    • Detailed rubrics with criteria and performance levels
    • Alignment with learning objectives
    • Suggestions for weighting and scoring

    System Architecture Overview

    Core Components:

    1. Content Processing Engine

      • Document parsing and analysis
      • Content extraction and structuring
      • Learning objective mapping
    2. AI Generation Module

      • Google Gemini 2.5 for question generation and editing
      • Google Gemini 2.5 for grading and feedback suggestions
      • Prompt engineering and template management
    3. Assessment Management System

      • Exam creation and configuration
      • Question bank management
      • Rubric creation and management
      • Source and reference materials management
      • Session and submission tracking
      • Results analytics and reporting
    4. Integration Layer

      • LMS connectors (Canvas, Moodle)
      • API endpoints for third-party systems
      • Import/export functionality (QTI, custom formats)
      • Ability to develop custom question types
    5. User Interface

      • Web-based educator dashboard
      • Student assessment interface
      • Administrative controls and settings

    Models and Datasets

    Models

    Model                   | Provider | Version | Documentation | Application Usage
    Google Gemini 2.5 Flash | Google   | 2.5     | Link          | Question generation, content analysis, initial grading suggestions
    Google Gemini 2.5 Pro   | Google   | 2.5     | Link          | Complex reasoning, essay grading, advanced feedback generation

    Datasets

    Dataset                        | Source                   | Application Usage
    User-Provided Course Materials | Educational Institutions | Source content for course-specific question generation and feedback/feedforward
    Grading Rubrics                | Educational Institutions | Assessment criteria for automated and assisted grading
    Taxonomies of Learning         | Examplary                | Classification frameworks for organizing educational content

    Data Characteristics:

    • Provenance: User-uploaded course materials, internally developed templates
    • Scope: Educational content across multiple domains and difficulty levels
    • Collection Method: User uploads, API integrations with LMS platforms
    • Labeling: Manual validation by educators
    • Privacy: All user data processed in compliance with GDPR and data protection regulations

    Deployment

    Infrastructure and Environment Details

    Cloud Setup:

    • Provider: Amazon Web Services (AWS)
    • Regions: Europe (eu-central-1)

    Integration with External Systems

    External Dependencies:

    • Google Gemini API (AI generation)
    • Learning Management Systems (Canvas, Moodle, Brightspace, etc.)
    • Identity providers (AWS Cognito, OAuth, SAML)
    • Email service providers (AWS SES)
    • Payment processing (Stripe)

    Error Handling:

    • Retry logic with exponential backoff for API calls
    • Fallback to cached responses when possible
    • Graceful degradation for non-critical features
    • User notification of processing errors
    • Detailed error logging for debugging

    Deployment Plan

    Environments:

    • Development: For feature development and testing
    • Staging: Pre-production testing and validation
    • Production: Live user environment with multi-region deployment

    Infrastructure Scaling:

    • Serverless architecture for automatic scaling
    • CDN caching for static assets

    User Information:

    Lifecycle Management

    Risk Management System

    Risk Assessment Methodology:

    • ISO 31000 Risk Management Framework
    • NIST AI Risk Management Framework
    • Regular risk assessments conducted quarterly
    • Continuous monitoring of system performance and user feedback

    Identified Risks:

    1. Bias in Question Generation or Grading

    Potential Harm: Unfair assessment outcomes for certain student groups based on demographics, language proficiency, or cultural background.

    Likelihood: Medium | Severity: High

    Mitigation Measures:

    • Human review required for all assessments
    • Teacher feedback mechanism for reporting potential bias
    • Regular updates to prompts and model configurations

    2. Inaccurate Grading or Feedback

    Potential Harm: Students receiving incorrect grades due to incorrect AI judgement or misinterpretation of handwritten responses, affecting their academic progress, self-confidence, and future opportunities.

    Likelihood: Medium | Severity: High

    Mitigation Measures:

    • Confidence scoring on all automated grades
    • Mandatory human review for all grading suggestions
    • Clear communication of AI suggestions vs. human grading
    • Teacher feedback mechanism
    • Regular updates to prompts and model configurations

    3. Privacy Breaches and Data Leakage

    Potential Harm: Unauthorized access to student exam responses, grades, or personal information.

    Likelihood: Low | Severity: Critical

    Mitigation Measures:

    • End-to-end encryption for data in transit
    • Strict access controls and authentication
    • GDPR-compliant data processing agreements
    • Data minimization and retention policies

    4. Academic Integrity Concerns

    Potential Harm: Students misusing the system or generated questions being leaked, compromising exam validity.

    Likelihood: Medium | Severity: Medium

    Mitigation Measures:

    • Question variation and randomization
    • Access controls and security
    • Usage monitoring
    • Educator controls for exam release
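
One way to realize the question variation and randomization mitigation is a deterministic per-student shuffle: each student sees a stable but distinct ordering of answer options, which makes answer sharing harder without affecting grading. This is a hedged sketch of the general technique, not Examplary's actual implementation.

```python
import hashlib
import random

def shuffled_options(question_id: str, student_id: str, options: list[str]) -> list[str]:
    """Deterministically shuffle answer options per (question, student) pair.

    The shuffle is seeded by a hash of the two IDs, so a student sees the
    same order on every page reload, while different students see different
    orders. Sketch only: identifiers and hashing scheme are illustrative.
    """
    seed = int.from_bytes(
        hashlib.sha256(f"{question_id}:{student_id}".encode()).digest()[:8], "big"
    )
    rng = random.Random(seed)
    shuffled = options.copy()
    rng.shuffle(shuffled)
    return shuffled

opts = ["Paris", "Lyon", "Marseille", "Nice"]
a = shuffled_options("q42", "student-1", opts)
b = shuffled_options("q42", "student-1", opts)
print(a == b)                     # True: stable for the same student
print(sorted(a) == sorted(opts))  # True: same options, different order
```

The same seeding idea extends to drawing question variants from a pool, so each student receives an equivalent but not identical exam.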

    5. Pedagogic Integrity of Generated Content

    Potential Harm: Generated questions or feedback not aligning with learning objectives or educational standards.

    Likelihood: Medium | Severity: Medium

    Mitigation Measures:

    • Alignment checks with curriculum standards
    • Educator review and approval workflows
    • Continuous updates based on pedagogical research

    6. Model Drift and Performance Degradation

    Potential Harm: Decreased quality of generated questions or grading accuracy over time.

    Likelihood: Medium | Severity: Medium

    Mitigation Measures:

    • Regular validation against benchmark datasets
    • User feedback mechanism
    • Regular updates to prompts and model configurations

    7. System Availability and Reliability

    Potential Harm: System downtime during critical exam periods affecting student assessments.

    Likelihood: Low | Severity: High

    Mitigation Measures:

    • 99.9% uptime SLA commitment
    • Load testing and capacity planning
    • Scheduled maintenance during low-usage periods

    Monitoring and Maintenance

    Performance Metrics:

    Application Performance:

    • Response time (p50, p95, p99)
    • Error rate and types
    • API success rate
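
The response-time percentiles above (p50, p95, p99) summarize the latency distribution: p50 is the typical request, while p95/p99 expose tail latency that averages hide. A minimal nearest-rank computation, purely illustrative:

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    # ceil(n * p / 100) - 1, clamped to a valid index; integer arithmetic only.
    k = max(0, -(-len(ordered) * p // 100) - 1)
    return ordered[int(k)]

# Hypothetical request latencies in milliseconds; one slow outlier.
latencies_ms = [120, 95, 310, 101, 98, 2050, 130, 99, 105, 97]
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
# p50 = 101 ms   (typical request)
# p95 = 2050 ms  (the outlier dominates the tail)
# p99 = 2050 ms
```

Production systems usually compute these from streaming histograms rather than sorting raw samples, but the interpretation is the same: alerting on p95/p99 catches degradation that the median misses.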

    Model Performance:

    • Grading accuracy vs. human assessments
    • Generation success rate
    • Model latency and throughput

    Monitoring Procedures:

    • Real-time dashboards for all key metrics
    • Automated alerting for anomalies and threshold breaches

    Change Log Maintenance:

    All changes are documented with:

    • Version number and release date
    • Description of new features added
    • Updates to existing functionality
    • Deprecated features with migration path
    • Removed features and rationale
    • Bug fixes and issue resolution
    • Security patches and vulnerability fixes
    • Performance improvements
    • Model updates and retraining

    Versioning Strategy:

    • Semantic version numbers (MAJOR.MINOR.PATCH, e.g. 1.0.0)
    • API versioning with deprecation notices
    • Change log published for all releases

    Testing and Validation

    Accuracy Throughout the Lifecycle

    Data Quality and Management:

    • High-Quality Training Data: Source materials validated by educators
    • Data Preprocessing: Normalization, format standardization
    • Data Validation: Automated checks for completeness and consistency
    • Continuous Monitoring: Regular assessment of input data quality

    Model Selection and Optimization:

    • Algorithm Selection: Google Gemini models chosen for their educational reasoning capabilities
    • Prompt Engineering: Iterative refinement of prompts for optimal outputs

    Feedback Mechanisms:

    • Real-time educator feedback collection
    • Comparison of AI suggestions with final versions (grades, questions)
    • Error tracking and root cause analysis
    • Continuous improvement based on usage patterns

    Robustness

    Robustness Measures:

    • Adversarial Testing: Regular testing with edge cases and unusual inputs
    • Stress Testing: Load testing with concurrent users and requests
    • Error Handling: Graceful degradation when encountering unexpected inputs
    • Domain Adaptation: Testing across diverse subjects and educational levels

    Scenario-Based Testing:

    • Ambiguous or poorly structured input materials
    • Student responses with unconventional formatting
    • Non-standard language use or dialects
    • Content in mixed languages
    • Extremely short or long responses
    • Special characters and formatting edge cases

    Uncertainty Estimation:

    • Confidence scores for all generated grades
    • Human review required for all grading suggestions

    Cybersecurity

    Data Security:

    • End-to-end encryption (TLS 1.3) for data in transit
    • Encrypted backups with secure key management
    • Regular security audits and vulnerability assessments

    Access Control:

    • Role-based access control (RBAC) with least privilege principle
    • OAuth 2.0 and SAML for identity federation
    • Session management and timeout controls
    • Audit logging of all access and changes
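
The RBAC and least-privilege bullets above boil down to a deny-by-default permission check: each role is granted only what it needs, and anything not explicitly granted fails closed. The role and permission names below are illustrative, not Examplary's actual access model.

```python
# Illustrative role-to-permission grants following least privilege.
ROLE_PERMISSIONS = {
    "educator": {"exam:create", "exam:grade", "question:edit"},
    "admin":    {"exam:create", "exam:grade", "question:edit", "user:manage"},
    "student":  {"exam:take"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and ungranted permissions both fail closed."""
    return permission in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("educator", "exam:grade"))  # True
print(is_allowed("student", "exam:grade"))   # False: students cannot grade
print(is_allowed("intruder", "exam:take"))   # False: unknown role denied
```

Every allow/deny decision would additionally be written to the audit log mentioned above, so access patterns can be reviewed after the fact.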

    Threat Modeling:

    • Regular threat assessments following STRIDE methodology
    • Security code reviews for all releases
    • Dependency scanning for known vulnerabilities

    Incident Response:

    • 24/7 security monitoring and alerting
    • Post-incident analysis and remediation tracking

    Secure Development Practices:

    • Security training for all developers
    • Secure coding guidelines and automated checks
    • Code review requirements including security review
    • Dependency updates and patch management

    Human Oversight

    Human-in-the-Loop Mechanisms:

    1. Question Generation Review:

      • All generated questions and scoring criteria are reviewed by educators before use
      • Ability to edit, approve, or reject generated content
      • Version history and change tracking
    2. Grading Oversight:

      • AI grading presented as suggestions, not final grades
      • Mandatory human review for all grading suggestions
      • AI grading suggestions based on transparent scoring criteria, ensuring equal treatment
    3. Quality Assurance:

      • Random sampling of AI outputs for manual review
      • Educator feedback integration into system improvements

    Limitations and Constraints:

    What the System Cannot Do:

    • Make final grading decisions without educator approval
    • Assess non-textual elements (diagrams, calculations) without context
    • Evaluate interpersonal skills or practical demonstrations
    • Account for individual student circumstances or accommodations
    • Replace pedagogical judgment and teaching expertise
    • Guarantee perfect accuracy in subjective assessment

    Known Weaknesses:

    • May struggle with highly specialized or technical terminology
    • Performance varies with input quality and clarity
    • Limited context for individual student learning trajectories
    • May not capture creative or unconventional correct answers
    • Requires regular human calibration and validation

    Performance Degradation Scenarios:

    • Very long or very short student responses
    • Mixed-language or code-switching in responses
    • Highly ambiguous or poorly worded questions
    • Responses requiring external knowledge not in source materials
    • New or emerging topics not well-represented in training data

    EU Declaration of Conformity

    Conformity Assessment Status: In Progress

    As a high-risk AI system under the EU AI Act, Examplary is undergoing conformity assessment procedures. Upon completion, a formal EU Declaration of Conformity will be issued including:

    • System name and version
    • Provider name and address (Examplary AI)
    • Statement of conformity with EU AI Act requirements
    • Compliance with GDPR (Regulation (EU) 2016/679)
    • Reference to harmonized standards applied
    • Conformity assessment procedure description
    • Notified body information (when applicable)
    • Declaration signature and date

    Expected Completion: Q2 2026 (aligned with EU AI Act enforcement timeline)

    Documentation Metadata

    Template Version

    Documentation Authors

    • Examplary AI Team (Owner)

    Review Schedule

    • Updates triggered by:
      • Major system changes
      • Model updates
      • Regulatory changes
      • Significant incidents
      • User feedback trends

    Version History

    • v1.0.0 (2 November 2025): Initial EU AI Act compliance documentation

    This document is maintained in accordance with EU AI Act requirements for high-risk AI systems. For questions or updates, please contact the team at hi@examplary.ai.