AI Engineering
AI Ethics & Legal: EU AI Act, Copyright, Liability - The Legal Side of AI
Lesson Objectives
- Understand the EU AI Act: risk classification, requirements for each level, timeline
- Grasp the copyright landscape: training on copyrighted data, AI output ownership, C2PA
- Learn bias detection: types of bias, testing, fairness metrics
- Apply a practical compliance checklist to an AI product
In 2023, Samsung allowed engineers to use ChatGPT for code review. Within three weeks, semiconductor source code, internal meeting notes, and performance data had leaked through prompts, and Samsung banned ChatGPT company-wide. Meanwhile, Getty Images sued Stability AI for training on 12 million licensed photos without permission. And in Europe, Meta was fined EUR 1.2B (about USD 1.3B) under GDPR. AI ethics isn't activism - it's concrete legal exposure that materializes with the first lawsuit.
- Samsung data leak (2023): three ChatGPT incidents in three weeks - chip source code, meeting transcripts, test data. Result: full corporate ban
- Getty vs Stability AI: suit over `12M` photos allegedly used without a license - in November 2025 the UK High Court rejected most of Getty's claims; the US case is still pending
- Meta GDPR fine of EUR 1.2B, about USD 1.3B (2023) - largest in GDPR history, for transferring EU user data to the US without adequate safeguards
- Amazon AI recruiting tool (reported 2018): trained on 10 years of hiring data, it reproduced gender bias and was quietly shut down
The year AI law became real
For a decade AI regulation lived in guidelines and ethics boards. That changed fast. On December 27, 2023, The New York Times sued OpenAI and Microsoft in federal court, alleging that millions of its articles were copied to train ChatGPT - the highest-profile copyright case of the AI era, still unresolved. Months later the European Union passed the AI Act (Regulation 2024/1689, in force August 1, 2024), the world's first comprehensive AI law, sorting systems into four risk tiers from minimal to prohibited. Layered on top of GDPR (in force since 2018), these moves turned AI compliance from a nice-to-have into a hard engineering constraint.
Prerequisites
EU AI Act: The World's First AI Law
The **EU AI Act** came into force on August 1, 2024, with phased implementation through 2027. It is the world's first comprehensive AI regulation, and it places compliance obligations on deployers as well as on model developers (providers). Fines are written into the law and are comparable in scale to GDPR: up to `35M EUR` or `7%` of global annual turnover for prohibited practices, and up to `15M EUR` or `3%` for most other violations, including breaches of GPAI obligations.
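The "fixed amount or share of turnover" rule is easier to reason about with numbers. The sketch below assumes the "whichever is higher" reading that applies to large companies (smaller companies are treated differently); the function name and the two example turnovers are hypothetical.

```python
def max_fine_eur(global_turnover_eur: float, fixed_cap_eur: float, turnover_pct: float) -> float:
    """Upper bound of a fine: the higher of a fixed amount and a share of turnover.

    Assumption: the 'whichever is higher' rule for large undertakings.
    """
    return max(fixed_cap_eur, global_turnover_eur * turnover_pct / 100)


# Hypothetical company with 100M EUR turnover: 7% is 7M, so the 35M figure dominates.
small_company = max_fine_eur(100_000_000, 35_000_000, 7)

# Hypothetical company with 2B EUR turnover: 7% is 140M, so the percentage dominates.
large_company = max_fine_eur(2_000_000_000, 35_000_000, 7)

print(small_company, large_company)
```

For large companies the percentage is what matters, which is why the exposure scales with revenue rather than stopping at a flat ceiling.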
**General Purpose AI Models (GPAI)** - a separate category covering foundation models (GPT-4, Claude, Llama). From August 2025, the following are required:
- Technical documentation (model card)
- Compliance with EU copyright law during training
- Disclosure of a training data summary
- For **systemic risk** models (>10^25 FLOP during training): adversarial testing, incident reporting, cybersecurity measures
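To get a feel for where the `10^25` FLOP line sits, training compute can be roughly estimated from model size and dataset size. The sketch below uses the common rule of thumb of about 6 FLOP per parameter per training token; that heuristic is an assumption brought in for illustration, not something the Act prescribes, and both example models are hypothetical.

```python
SYSTEMIC_RISK_FLOP = 1e25  # threshold from the GPAI rules above


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough estimate: ~6 FLOP per parameter per training token (heuristic, assumed)."""
    return 6.0 * n_params * n_tokens


def exceeds_systemic_risk_threshold(n_params: float, n_tokens: float) -> bool:
    """True if the estimated training compute is above the 10^25 FLOP line."""
    return estimate_training_flop(n_params, n_tokens) > SYSTEMIC_RISK_FLOP


# Hypothetical 7B-parameter model trained on 2T tokens: about 8.4e22 FLOP.
print(exceeds_systemic_risk_threshold(7e9, 2e12))

# Hypothetical 1T-parameter model trained on 10T tokens: about 6e25 FLOP.
print(exceeds_systemic_risk_threshold(1e12, 1e13))
```

The estimate is only good for an order-of-magnitude check; a real assessment would rely on measured compute.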
**The EU AI Act applies to ANY company whose AI systems operate in the EU.** It doesn't matter where the company is registered - if the product is available in the EU, the law applies. Think of it as GDPR for AI.
**Implementation timeline:**
| Date | What takes effect |
|---|---|
| February 2025 | Ban on unacceptable risk AI systems |
| August 2025 | Requirements for GPAI (foundation models) |
| August 2026 | Requirements for high-risk AI systems |
| August 2027 | Full enforcement of all requirements |
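The timeline can be turned into a small lookup that answers "what already applies on a given date?". This is a minimal sketch: the table gives only months, so the code assumes the 2nd of each month as the start date, and the function name is hypothetical.

```python
from datetime import date

# (start date, what takes effect) - day of month is an assumption, see lead-in.
MILESTONES = [
    (date(2025, 2, 2), "Ban on unacceptable risk AI systems"),
    (date(2025, 8, 2), "Requirements for GPAI (foundation models)"),
    (date(2026, 8, 2), "Requirements for high-risk AI systems"),
    (date(2027, 8, 2), "Full enforcement of all requirements"),
]


def obligations_in_effect(on: date) -> list[str]:
    """Return every milestone whose start date is on or before the given date."""
    return [label for start, label in MILESTONES if on >= start]


for item in obligations_in_effect(date(2025, 12, 1)):
    print(item)
```

A check like this fits naturally into a release checklist: pass the planned launch date and see which requirement sets the product must already satisfy.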
An AI system for screening job candidates' resumes in the EU is classified as:
Copyright & AI: Who Owns AI-Generated Content
Two independent copyright questions in AI: (1) **Can copyrighted data be used for training?** (2) **Who owns AI-generated content?** The answers differ by jurisdiction and get rewritten with each ruling. Getty Images vs Stability AI shows how unsettled this is: Getty sued over `12 million` licensed photos, yet in November 2025 the UK High Court rejected most of its claims, while the parallel US case is still pending. NYT vs OpenAI (still ongoing) could fundamentally reshape whether training on web data is viable at all.
**Practical decisions for AI products - what to actually do right now:**
- **Terms of Service:** directly define rights to AI-generated content. Who owns the output - the user or the platform?
- **Disclosure:** label AI-generated content ("Generated with AI" or a metadata tag). The EU AI Act requires this for limited risk systems
- **Training data audit:** document data sources. EU GPAI rules require a training data summary
- **Opt-out mechanism:** support robots.txt / ai.txt for web crawling. Respect do-not-train requests
- **Human-in-the-loop for copyrightable output:** if content needs to be protected, a human must have substantially contributed
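The opt-out item in the list above can be sketched with Python's standard `urllib.robotparser`. The crawler name `ExampleTrainingBot` and the robots.txt content are hypothetical; `ai.txt` has no standard-library parser, so it is not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site blocks one training crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether robots.txt rules allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/articles/1"
print(may_crawl(ROBOTS_TXT, "ExampleTrainingBot", url))
print(may_crawl(ROBOTS_TXT, "SearchBot", url))
```

In a real crawler the file would be fetched per host and cached; logging each decision alongside the URL also gives the training data audit a paper trail.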
**C2PA (Coalition for Content Provenance and Authenticity)** - a content provenance standard backed by Adobe, Microsoft, the BBC, and others, used among other things to label AI-generated content. It embeds cryptographically signed metadata into files: who created it, with what tool, and when. Camera manufacturers (Nikon, Sony) already support it for photos.
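The idea of signed provenance metadata can be illustrated with a toy example. This is deliberately NOT the C2PA format: real C2PA manifests use certificate-based signatures and are embedded in the file, while this sketch uses a shared-secret HMAC and a plain dictionary purely to show the "who, with what tool, when, bound to this exact content" principle. All names and values are hypothetical.

```python
import hashlib
import hmac
import json


def make_manifest(content: bytes, creator: str, tool: str, created_at: str, key: bytes) -> dict:
    """Build a signed provenance record bound to the content's hash (toy stand-in, not C2PA)."""
    claim = {
        "creator": creator,
        "tool": tool,
        "created_at": created_at,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(content: bytes, manifest: dict, key: bytes) -> bool:
    """Check that the claim was not altered and still matches the content."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    hash_ok = manifest["claim"]["content_sha256"] == hashlib.sha256(content).hexdigest()
    return signature_ok and hash_ok


key = b"demo-secret"  # hypothetical key, for illustration only
image = b"fake image bytes"
manifest = make_manifest(image, "Example Studio", "ExampleImageModel", "2025-01-01T00:00:00Z", key)

print(verify_manifest(image, manifest, key))
print(verify_manifest(image + b"edited", manifest, key))
```

Any change to the content or to the claim breaks verification, which is the property that makes provenance labels harder to forge than a plain "Generated with AI" caption.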
According to the US Copyright Office (2023), an AI-generated image created solely from a prompt:
Bias & Fairness: AI System Prejudice
AI models inherit bias from training data - mechanically, without intent. Amazon's recruiting tool was trained on 10 years of hiring history where senior roles went predominantly to men. The result: the system downgraded resumes mentioning the word "women's" (as in "women's chess club"). For **high-risk systems** (HR, credit scoring, medical diagnostics), bias testing is mandatory under the EU AI Act - and must be continuous, not a one-time check.
**Bias testing is not a one-time procedure.** Models get updated, data changes, use cases expand. Bias audits should be part of the CI/CD pipeline for high-risk AI systems. The EU AI Act requires ongoing monitoring.
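A bias audit in CI/CD needs a metric and a threshold that can fail the build. The sketch below computes two common group fairness metrics on binary decisions (1 = candidate advanced). The 0.8 threshold is the "four-fifths" rule of thumb from US employment practice, used here as an assumption; the EU AI Act does not prescribe a specific number, and the sample data is made up.

```python
def selection_rate(decisions: list[int]) -> float:
    """Share of positive decisions (1 = selected) in a group."""
    return sum(decisions) / len(decisions)


def demographic_parity_difference(group_a: list[int], group_b: list[int]) -> float:
    """Absolute gap between the two groups' selection rates (0 means parity)."""
    return abs(selection_rate(group_a) - selection_rate(group_b))


def disparate_impact_ratio(group_a: list[int], group_b: list[int]) -> float:
    """Lower selection rate divided by the higher one (1.0 means parity)."""
    low, high = sorted([selection_rate(group_a), selection_rate(group_b)])
    return 1.0 if high == 0 else low / high


def passes_bias_gate(group_a: list[int], group_b: list[int], min_ratio: float = 0.8) -> bool:
    """CI gate: fail when the disparate impact ratio drops below the threshold."""
    return disparate_impact_ratio(group_a, group_b) >= min_ratio


# Made-up screening results for two groups of ten candidates each.
men = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]    # 7 of 10 selected
women = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # 4 of 10 selected

print(demographic_parity_difference(men, women))
print(disparate_impact_ratio(men, women))
print(passes_bias_gate(men, women))
```

Running this on a fixed evaluation set after every model or data update turns the "continuous, not one-time" requirement into an automated check.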
An AI resume screening system shows bias: it rates male resumes 15% higher. The most likely cause?
Practical Compliance Checklist for an AI Product
After the Samsung data leak, the Getty lawsuit, and the Meta GDPR fine, the pattern is clear: legal risk in AI doesn't materialize "someday" - it shows up on the first serious scale-up. Below is a **concrete checklist** applicable to any AI product. Structured by priority: must-have (cannot launch in the EU without these), should-have (strong recommendation), nice-to-have (competitive differentiator).
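**Liability** - who gets sued when AI causes harm. The proposed EU AI Liability Directive would introduce a presumption of causality: if an AI system violated regulatory requirements, the victim would not need to prove direct causation. The European Commission announced in 2025 that it intends to withdraw the proposal, so treat it as a signal of direction rather than binding law. The presumption would flip the burden of proof: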
| Scenario | Who is responsible | Why |
|---|---|---|
| AI chatbot gives harmful medical advice | Product developer (deployer) | The deployer must provide safety guardrails |
| AI model hallucinates false information | Product developer (deployer) | Duty of care: output validation is mandatory |
| AI resume screener discriminates | Developer + deployer | High-risk system: bias testing is mandatory |
| User uses AI for harm | User (primarily) | But the deployer must have moderation in place |
| Model provider (OpenAI/Anthropic) outputs harmful content | Shared liability | Provider: safety training, Deployer: guardrails |
**Practical advice:** for most AI products (chatbot, summarization, code assistant), the MUST-HAVE checklist + basic bias testing is sufficient. Full compliance is only needed for high-risk systems. Don't over-engineer compliance for low-risk products.
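The advice above amounts to a triage step: classify the product first, then pick the compliance scope. Below is a deliberately simplified sketch, not a legal determination. The domain labels, the `talks_to_users` flag, and the function names are hypothetical; the tiers and the high-risk examples (HR, credit scoring, medical diagnostics) come from this lesson.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"  # listed for completeness; not detected by this sketch
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# High-risk examples named in this lesson; a real mapping would be longer.
HIGH_RISK_DOMAINS = {"hr", "credit_scoring", "medical_diagnostics"}


def classify_risk(domain: str, talks_to_users: bool) -> RiskTier:
    """Very rough triage: high-risk domain first, then disclosure-type systems."""
    if domain in HIGH_RISK_DOMAINS:
        return RiskTier.HIGH
    if talks_to_users:
        return RiskTier.LIMITED
    return RiskTier.MINIMAL


def compliance_scope(tier: RiskTier) -> str:
    """Map a tier to the scope recommended in the practical advice."""
    if tier is RiskTier.UNACCEPTABLE:
        return "do not ship in the EU"
    if tier is RiskTier.HIGH:
        return "full compliance"
    return "MUST-HAVE checklist + basic bias testing"


print(compliance_scope(classify_risk("hr", talks_to_users=False)))
print(compliance_scope(classify_risk("customer_support", talks_to_users=True)))
```

Keeping the mapping in code makes the classification reviewable: when a feature moves into a high-risk domain, the change shows up in a diff.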
AI chatbot for customer support. From the checklist, what is MUST-HAVE?
Key Takeaways
- EU AI Act risk mapping: before shipping an AI feature, classify the risk level. HR, credit, medical = high risk; full compliance required from August 2026
- Copyright: pure AI output has no copyright protection (US Copyright Office, 2023). Training on unlicensed data without an opt-out mechanism is what exposed Stability AI to the Getty lawsuit
- PII filtering before LLM: the Samsung incident shows what happens without it - proprietary data leaks through prompts. A Data Processing Agreement with the provider is the legal baseline
- Bias audits in CI/CD: Amazon's recruiting tool was in development for years (reportedly 2014 to 2017) before it was scrapped over gender bias. One-time testing is not enough - models update, data drifts
- Liability chain: model providers are responsible for safety training; deployers are responsible for guardrails and moderation. When an AI chatbot causes harm, the deployer gets the first lawsuit
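The PII filtering takeaway above can be sketched as a redaction pass that runs before a prompt leaves the company network. The patterns below are illustrative assumptions (including the `sk-` key format) and far from complete; regexes will not recognize proprietary source code like the material in the Samsung incident, so this is a baseline, not a full data loss prevention layer.

```python
import re

# Illustrative patterns only; order matters because earlier replacements
# remove text that later patterns might otherwise match.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),  # hypothetical key format
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(prompt: str) -> str:
    """Replace each match with a placeholder such as [EMAIL] before sending to an LLM."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


raw = "Contact jane.doe@example.com or call +1 415 555 0100, key sk-abcdef1234567890ABCD"
print(redact(raw))
```

Placing the filter in a shared gateway rather than in each application keeps the rule set in one place and gives the audit log a single point to record what was redacted.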
What's Next
The legal and ethical frameworks are clear. The final lesson is the capstone project: designing and building a complete AI application from requirements to deployment, applying everything learned.
- Capstone Project — Final project - a design doc for a production AI application
- Guardrails — Technical safeguards for AI systems that complement legal compliance
Related Lessons
- aie-33-guardrails — Compliance enforces guardrails on outputs
- aie-35-observability — Audit logs need observability and tracing
- aie-64-synthetic-data — Synthetic data reduces privacy and copyright risk
- sd-23-security — Data privacy reuses security and isolation controls
- stat-20-causal — Bias auditing leans on causal fairness analysis
- stat-05-hypothesis