    EU AI Act compliance

    • Application Owner: Examplary AI Team (hi@examplary.ai)
    • Document Version: 1.0.0
    • Last Updated: 2 November 2025

    General Information

    Purpose and Intended Use

    • Primary Purpose: Examplary is an AI-powered examination platform that enables educators, assessors, and institutions to create, administer, and grade exams, assessments, essays, and assignments efficiently. The system uses several AI models, including the Google Gemini 2.5 Flash and Pro models, to generate exam questions, provide automated grading suggestions, and offer feedback to students.

    • Sector of Deployment: Education (Higher Education, Professional Training, K-12)

    • Problem Statement: Examplary addresses the time-consuming nature of exam creation and grading, helping educators focus more on teaching and student support rather than administrative tasks.

    • Target Users and Stakeholders:

      • Primary Users: Educators, teachers, assessors, and instructors
      • Secondary Users: Educational administrators
      • End Users: Students taking exams and practice tests, as well as anyone taking any type of assessment
      • Stakeholders: Educational institutions, learning management system administrators
    • Key Performance Indicators (KPIs):

      • Question generation quality and relevance
      • Grading accuracy and consistency
      • Time saved in exam creation and grading
      • Feedback accuracy and relevance
      • User satisfaction scores
      • System uptime and reliability
    • Ethical Considerations:

      • Fair and unbiased assessment of student performance
      • Protection of student data and privacy
      • Transparency in AI-assisted grading decisions
      • Accessibility for students with diverse needs
      • Prevention of academic misconduct
    • Regulatory Constraints:

      • GDPR compliance for EU users
      • FERPA compliance for US educational institutions
      • Accessibility standards (WCAG 2.1)
      • EU AI Act requirements for high-risk AI systems in education
    • Prohibited Uses:

      • Using the system for high-stakes decisions without human oversight
      • Social scoring or profiling of students beyond academic performance
      • Discriminatory practices based on protected characteristics
      • Unauthorized sharing of student data with third parties
      • Using generated content for purposes outside the educational context
    • Operational Environment:

      • Cloud-based platform (Amazon Web Services)
      • Web browser interface for desktop and mobile devices
      • API integration with Learning Management Systems (Canvas, Moodle, etc.)
      • Multi-tenant SaaS architecture

    Risk Classification

    Examplary is classified as a High-Risk AI system under the EU AI Act for the following reasons:

    1. Educational Context (Article 6, Annex III, Section 3): The system is used to:

      • Determine access to educational institutions or other organizations (through assessment)
      • Evaluate learning outcomes and student performance
      • Influence decisions on educational paths and grading
    2. Automated Decision-Making: While human oversight is maintained, the system provides AI-generated:

      • Test, exam and assessment questions and content
      • Automated grading suggestions
      • Performance assessments in practice tests
      • Rubrics for essays and assignments
      • Feedback and comment suggestions
    3. Potential Impact: Educational assessments can significantly affect:

      • Student advancement and graduation
      • Access to further educational opportunities
      • Career prospects and professional development
      • Student self-perception and confidence

    Application Functionality

    Instructions for Use for Deployers

    Examplary should be deployed with the following considerations:

    • Human Oversight Required: All AI-generated content and grading must be reviewed by qualified educators before final decisions
    • Training: Educators must be trained on the system's capabilities, limitations, and proper use
    • Context Awareness: Results should be interpreted within the broader educational context
    • Regular Review: Periodic assessment of system performance and outcomes
    • Student Communication: Clear communication to students about AI use in assessments, feedback and grading

    Model Capabilities

    What Examplary Can Do:

    • Generate exam questions based on course materials and learning objectives
    • Create questions in multiple formats (multiple choice, short answer, essay, etc.)
    • Provide automated grading for objective question types
    • Suggest grades and feedback for subjective responses
    • Analyze exam difficulty and question quality
    • Generate question variations to support academic integrity
    • Generate rubrics for assignments and essays
    • Provide automated grading of assignments and essays based on rubrics
    • Import and export questions in standard formats (QTI, Moodle)
    • Integrate with Learning Management Systems

    Limitations:

    • Cannot fully evaluate complex critical thinking without human review
    • May not capture nuanced or creative answers outside training patterns
    • Requires source and reference materials for context-specific question generation
    • Performance depends on quality and clarity of input materials
    • Cannot assess non-textual elements without additional context
    • May require adjustment for specialized or highly technical domains

    Input Data Requirements

    Format Expectations:

    • Source and reference materials: PDF, DOCX, TXT, Markdown, web page content
    • Learning objectives: Structured text format
    • Student responses: Text-based submissions
    • Grading rubrics: Structured text format

    Output Explanation

    Exam Outline Generation Outputs:

    • Exam intro and outro
    • Question suggestions with question type and title, based on selected taxonomy

    Question Generation Outputs:

    • Generated questions include difficulty level indicators
    • Bloom's taxonomy or other taxonomy classification for each question
    • Suggested scoring criteria per question, aligning with learning objectives and chosen taxonomy

    Grading Suggestion Outputs:

    • Numerical scores with confidence levels
    • Feedback comments and suggestions
    • Rubric alignment indicators
    • Certainty scores to help educators identify areas needing attention
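
The grading-suggestion fields above (score, confidence, feedback, rubric alignment) can be sketched as a simple data structure. This is an illustrative assumption, not Examplary's actual schema: the field names, the `needs_attention` helper, and the 0.75 review threshold are all hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative threshold: suggestions below this confidence are flagged
# for closer educator attention. The real value would be a product decision.
REVIEW_ATTENTION_THRESHOLD = 0.75

@dataclass
class GradingSuggestion:
    question_id: str
    suggested_score: float                # numerical score proposed by the model
    max_score: float                      # maximum achievable score for the question
    confidence: float                     # model certainty in [0, 1]
    feedback: str                         # feedback comment shown to the educator
    rubric_criteria_met: list[str] = field(default_factory=list)

    def needs_attention(self) -> bool:
        """Flag low-confidence suggestions so educators know where to look first."""
        return self.confidence < REVIEW_ATTENTION_THRESHOLD

s = GradingSuggestion("q1", 7.5, 10.0, 0.62, "Partially correct; second step missing.")
print(s.needs_attention())  # 0.62 < 0.75 -> True
```

Regardless of the exact schema, the key property is that every suggestion carries a confidence value, so low-certainty grades can be surfaced for mandatory human review.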

    Rubric Generation Outputs:

    • Detailed rubrics with criteria and performance levels
    • Alignment with learning objectives
    • Suggestions for weighting and scoring

    System Architecture Overview

    Core Components:

    1. Content Processing Engine

      • Document parsing and analysis
      • Content extraction and structuring
      • Learning objective mapping
    2. AI Generation Module

      • Google Gemini 2.5 for question generation and editing
      • Google Gemini 2.5 for grading and feedback suggestions
      • Prompt engineering and template management
    3. Assessment Management System

      • Exam creation and configuration
      • Question bank management
      • Rubric creation and management
      • Source and reference materials management
      • Session and submission tracking
      • Results analytics and reporting
    4. Integration Layer

      • LMS connectors (Canvas, Moodle)
      • API endpoints for third-party systems
      • Import/export functionality (QTI, custom formats)
      • Ability to develop custom question types
    5. User Interface

      • Web-based educator dashboard
      • Student assessment interface
      • Administrative controls and settings

    Models and Datasets

    Models

    Model                   | Provider | Version | Documentation | Application Usage
    Google Gemini 2.5 Flash | Google   | 2.5     | Link          | Question generation, content analysis, initial grading suggestions
    Google Gemini 2.5 Pro   | Google   | 2.5     | Link          | Complex reasoning, essay grading, advanced feedback generation

    Datasets

    Dataset                        | Source                   | Application Usage
    User-Provided Course Materials | Educational Institutions | Source content for course-specific question generation and feedback/feedforward
    Grading Rubrics                | Educational Institutions | Assessment criteria for automated and assisted grading
    Taxonomies of Learning         | Examplary                | Classification frameworks for organizing educational content

    Data Characteristics:

    • Provenance: User-uploaded course materials, internally developed templates
    • Scope: Educational content across multiple domains and difficulty levels
    • Collection Method: User uploads, API integrations with LMS platforms
    • Labeling: Manual validation by educators
    • Privacy: All user data processed in compliance with GDPR and data protection regulations

    Deployment

    Infrastructure and Environment Details

    Cloud Setup:

    • Provider: Amazon Web Services (AWS)
    • Regions: Europe (eu-central-1)

    Integration with External Systems

    External Dependencies:

    • Google Gemini API (AI generation)
    • Learning Management Systems (Canvas, Moodle, Brightspace, etc.)
    • Identity providers (AWS Cognito, OAuth, SAML)
    • Email service providers (AWS SES)
    • Payment processing (Stripe)

    Error Handling:

    • Retry logic with exponential backoff for API calls
    • Fallback to cached responses when possible
    • Graceful degradation for non-critical features
    • User notification of processing errors
    • Detailed error logging for debugging

    Deployment Plan

    Environments:

    • Development: For feature development and testing
    • Staging: Pre-production testing and validation
    • Production: Live user environment with multi-region deployment

    Infrastructure Scaling:

    • Serverless architecture for automatic scaling
    • CDN caching for static assets

    User Information:

    Lifecycle Management

    Risk Management System

    Risk Assessment Methodology:

    • ISO 31000 Risk Management Framework
    • NIST AI Risk Management Framework
    • Regular risk assessments conducted quarterly
    • Continuous monitoring of system performance and user feedback

    Identified Risks:

    1. Bias in Question Generation or Grading

    Potential Harm: Unfair assessment outcomes for certain student groups based on demographics, language proficiency, or cultural background.

    Likelihood: Medium | Severity: High

    Mitigation Measures:

    • Human review required for all assessments
    • Teacher feedback mechanism for reporting potential bias
    • Regular updates to prompts and model configurations

    2. Inaccurate Grading or Feedback

    Potential Harm: Students receiving incorrect grades due to incorrect AI judgement or misinterpretation of handwritten responses, affecting their academic progress, self-confidence, and future opportunities.

    Likelihood: Medium | Severity: High

    Mitigation Measures:

    • Confidence scoring on all automated grades
    • Mandatory human review for all grading suggestions
    • Clear communication of AI suggestions vs. human grading
    • Teacher feedback mechanism
    • Regular updates to prompts and model configurations

    3. Privacy Breaches and Data Leakage

    Potential Harm: Unauthorized access to student exam responses, grades, or personal information.

    Likelihood: Low | Severity: Critical

    Mitigation Measures:

    • End-to-end encryption for data in transit
    • Strict access controls and authentication
    • GDPR-compliant data processing agreements
    • Data minimization and retention policies

    4. Academic Integrity Concerns

    Potential Harm: Students misusing the system or generated questions being leaked, compromising exam validity.

    Likelihood: Medium | Severity: Medium

    Mitigation Measures:

    • Question variation and randomization
    • Access controls and security
    • Usage monitoring
    • Educator controls for exam release
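
One way to realize the question variation and randomization mitigation is a deterministic per-student shuffle: each student sees a stable but distinct ordering of answer options, which makes answer sharing harder without affecting grading. This is a hedged sketch of the general technique, not Examplary's actual implementation.

```python
import hashlib
import random

def shuffled_options(question_id: str, student_id: str, options: list[str]) -> list[str]:
    """Deterministically shuffle answer options per (question, student) pair.

    The shuffle is seeded by a hash of the two IDs, so a student sees the
    same order on every page reload, while different students see different
    orders. Sketch only: identifiers and hashing scheme are illustrative.
    """
    seed = int.from_bytes(
        hashlib.sha256(f"{question_id}:{student_id}".encode()).digest()[:8], "big"
    )
    rng = random.Random(seed)
    shuffled = options.copy()
    rng.shuffle(shuffled)
    return shuffled

opts = ["Paris", "Lyon", "Marseille", "Nice"]
a = shuffled_options("q42", "student-1", opts)
b = shuffled_options("q42", "student-1", opts)
print(a == b)                     # True: stable for the same student
print(sorted(a) == sorted(opts))  # True: same options, different order
```

The same seeding idea extends to drawing question variants from a pool, so each student receives an equivalent but not identical exam.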

    5. Pedagogic Integrity of Generated Content

    Potential Harm: Generated questions or feedback not aligning with learning objectives or educational standards.

    Likelihood: Medium | Severity: Medium

    Mitigation Measures:

    • Alignment checks with curriculum standards
    • Educator review and approval workflows
    • Continuous updates based on pedagogical research

    6. Model Drift and Performance Degradation

    Potential Harm: Decreased quality of generated questions or grading accuracy over time.

    Likelihood: Medium | Severity: Medium

    Mitigation Measures:

    • Regular validation against benchmark datasets
    • User feedback mechanism
    • Regular updates to prompts and model configurations

    7. System Availability and Reliability

    Potential Harm: System downtime during critical exam periods affecting student assessments.

    Likelihood: Low | Severity: High

    Mitigation Measures:

    • 99.9% uptime SLA commitment
    • Load testing and capacity planning
    • Scheduled maintenance during low-usage periods

    Monitoring and Maintenance

    Performance Metrics:

    Application Performance:

    • Response time (p50, p95, p99)
    • Error rate and types
    • API success rate
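
The response-time percentiles above (p50, p95, p99) summarize the latency distribution: p50 is the typical request, while p95/p99 expose tail latency that averages hide. A minimal nearest-rank computation, purely illustrative:

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    # ceil(n * p / 100) - 1, clamped to a valid index; integer arithmetic only.
    k = max(0, -(-len(ordered) * p // 100) - 1)
    return ordered[int(k)]

# Hypothetical request latencies in milliseconds; one slow outlier.
latencies_ms = [120, 95, 310, 101, 98, 2050, 130, 99, 105, 97]
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
# p50 = 101 ms   (typical request)
# p95 = 2050 ms  (the outlier dominates the tail)
# p99 = 2050 ms
```

Production systems usually compute these from streaming histograms rather than sorting raw samples, but the interpretation is the same: alerting on p95/p99 catches degradation that the median misses.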

    Model Performance:

    • Grading accuracy vs. human assessments
    • Generation success rate
    • Model latency and throughput

    Monitoring Procedures:

    • Real-time dashboards for all key metrics
    • Automated alerting for anomalies and threshold breaches

    Change Log Maintenance:

    All changes are documented with:

    • Version number and release date
    • Description of new features added
    • Updates to existing functionality
    • Deprecated features with migration path
    • Removed features and rationale
    • Bug fixes and issue resolution
    • Security patches and vulnerability fixes
    • Performance improvements
    • Model updates and retraining

    Versioning Strategy:

    • Semantic version numbers (MAJOR.MINOR.PATCH, e.g. 1.0.0)
    • API versioning with deprecation notices
    • Change log published for all releases

    Testing and Validation

    Accuracy Throughout the Lifecycle

    Data Quality and Management:

    • High-Quality Training Data: Source materials validated by educators
    • Data Preprocessing: Normalization, format standardization
    • Data Validation: Automated checks for completeness and consistency
    • Continuous Monitoring: Regular assessment of input data quality

    Model Selection and Optimization:

    • Algorithm Selection: Google Gemini models chosen for their educational reasoning capabilities
    • Prompt Engineering: Iterative refinement of prompts for optimal outputs

    Feedback Mechanisms:

    • Real-time educator feedback collection
    • Comparison of AI suggestions with final versions (grades, questions)
    • Error tracking and root cause analysis
    • Continuous improvement based on usage patterns

    Robustness

    Robustness Measures:

    • Adversarial Testing: Regular testing with edge cases and unusual inputs
    • Stress Testing: Load testing with concurrent users and requests
    • Error Handling: Graceful degradation when encountering unexpected inputs
    • Domain Adaptation: Testing across diverse subjects and educational levels

    Scenario-Based Testing:

    • Ambiguous or poorly structured input materials
    • Student responses with unconventional formatting
    • Non-standard language use or dialects
    • Content in mixed languages
    • Extremely short or long responses
    • Special characters and formatting edge cases

    Uncertainty Estimation:

    • Confidence scores for all generated grades
    • Human review required for all grading suggestions

    Cybersecurity

    Data Security:

    • End-to-end encryption (TLS 1.3) for data in transit
    • Encrypted backups with secure key management
    • Regular security audits and vulnerability assessments

    Access Control:

    • Role-based access control (RBAC) with least privilege principle
    • OAuth 2.0 and SAML for identity federation
    • Session management and timeout controls
    • Audit logging of all access and changes
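
The RBAC and least-privilege bullets above boil down to a deny-by-default permission check: each role is granted only what it needs, and anything not explicitly granted fails closed. The role and permission names below are illustrative, not Examplary's actual access model.

```python
# Illustrative role-to-permission grants following least privilege.
ROLE_PERMISSIONS = {
    "educator": {"exam:create", "exam:grade", "question:edit"},
    "admin":    {"exam:create", "exam:grade", "question:edit", "user:manage"},
    "student":  {"exam:take"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and ungranted permissions both fail closed."""
    return permission in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("educator", "exam:grade"))  # True
print(is_allowed("student", "exam:grade"))   # False: students cannot grade
print(is_allowed("intruder", "exam:take"))   # False: unknown role denied
```

Every allow/deny decision would additionally be written to the audit log mentioned above, so access patterns can be reviewed after the fact.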

    Threat Modeling:

    • Regular threat assessments following STRIDE methodology
    • Security code reviews for all releases
    • Dependency scanning for known vulnerabilities

    Incident Response:

    • 24/7 security monitoring and alerting
    • Post-incident analysis and remediation tracking

    Secure Development Practices:

    • Security training for all developers
    • Secure coding guidelines and automated checks
    • Code review requirements including security review
    • Dependency updates and patch management

    Human Oversight

    Human-in-the-Loop Mechanisms:

    1. Question Generation Review:

      • All generated questions and scoring criteria are reviewed by educators before use
      • Ability to edit, approve, or reject generated content
      • Version history and change tracking
    2. Grading Oversight:

      • AI grading presented as suggestions, not final grades
      • Mandatory human review for all grading suggestions
      • AI grading suggestions based on transparent scoring criteria, ensuring equal treatment
    3. Quality Assurance:

      • Random sampling of AI outputs for manual review
      • Educator feedback integration into system improvements

    Limitations and Constraints:

    What the System Cannot Do:

    • Make final grading decisions without educator approval
    • Assess non-textual elements (diagrams, calculations) without context
    • Evaluate interpersonal skills or practical demonstrations
    • Account for individual student circumstances or accommodations
    • Replace pedagogical judgment and teaching expertise
    • Guarantee perfect accuracy in subjective assessment

    Known Weaknesses:

    • May struggle with highly specialized or technical terminology
    • Performance varies with input quality and clarity
    • Limited context for individual student learning trajectories
    • May not capture creative or unconventional correct answers
    • Requires regular human calibration and validation

    Performance Degradation Scenarios:

    • Very long or very short student responses
    • Mixed-language or code-switching in responses
    • Highly ambiguous or poorly worded questions
    • Responses requiring external knowledge not in source materials
    • New or emerging topics not well-represented in training data

    EU Declaration of Conformity

    Conformity Assessment Status: In Progress

    As a high-risk AI system under the EU AI Act, Examplary is undergoing conformity assessment procedures. Upon completion, a formal EU Declaration of Conformity will be issued including:

    • System name and version
    • Provider name and address (Examplary AI)
    • Statement of conformity with EU AI Act requirements
    • Compliance with GDPR (Regulation (EU) 2016/679)
    • Reference to harmonized standards applied
    • Conformity assessment procedure description
    • Notified body information (when applicable)
    • Declaration signature and date

    Expected Completion: Q2 2026 (aligned with EU AI Act enforcement timeline)

    Documentation Metadata

    Template Version

    Documentation Authors

    • Examplary AI Team (Owner)

    Review Schedule

    • Updates triggered by:
      • Major system changes
      • Model updates
      • Regulatory changes
      • Significant incidents
      • User feedback trends

    Version History

    • v1.0.0 (2 November 2025): Initial EU AI Act compliance documentation

    This document is maintained in accordance with EU AI Act requirements for high-risk AI systems. For questions or updates, please contact the team at hi@examplary.ai.