AI Engineering

AI Ethics & Legal: EU AI Act, Copyright, Liability - The Legal Side of AI

Lesson objectives

Understand the EU AI Act: risk classification, requirements for each level, timeline
Grasp the copyright landscape: training on copyrighted data, AI output ownership, C2PA
Learn bias detection: types of bias, testing, fairness metrics
Apply a practical compliance checklist to an AI product

In 2023, Samsung allowed engineers to use ChatGPT for code review. Within three weeks, semiconductor source code, internal meeting notes, and performance data had leaked through prompts, and Samsung responded with a company-wide ban on ChatGPT. Meanwhile, Getty Images sued Stability AI for training on 12 million licensed photos without permission. And in Europe, Meta was fined EUR 1.2B (about USD 1.3B) under GDPR. AI ethics isn't activism - it's concrete legal exposure that materializes with the first lawsuit.

Samsung data leak (2023): three ChatGPT incidents in three weeks - chip source code, meeting transcripts, test data. Result: full corporate ban
Getty vs Stability AI: suits in the US and UK over `12M` photos used without a license - in November 2025 the UK High Court rejected most of Getty's copyright claims; the US case is still pending
Meta GDPR fine of EUR 1.2B, about USD 1.3B (2023) - largest in GDPR history, for transferring EU user data to the US without adequate safeguards
Amazon AI recruiting tool (2018): trained on 10 years of hiring data - reproduced gender bias and was quietly shut down

The year AI law became real

For a decade AI regulation lived in guidelines and ethics boards. That changed fast. On December 27, 2023, The New York Times sued OpenAI and Microsoft in federal court, alleging that millions of its articles were copied to train ChatGPT - the highest-profile copyright case of the AI era, still unresolved. Months later the European Union passed the AI Act (Regulation 2024/1689, in force August 1, 2024), the world's first comprehensive AI law, sorting systems into four risk tiers from minimal to prohibited. Layered on top of GDPR (in force since 2018), these moves turned AI compliance from a nice-to-have into a hard engineering constraint.

Prerequisites

Guardrails: LLM Security - Prompt Injection, Jailbreak, Content Filtering

EU AI Act: The World's First AI Law

The **EU AI Act** came into force on August 1, 2024, with phased implementation through 2027. It is the world's first comprehensive AI regulation, and it places compliance obligations on the deployer, not just the model developer. Fines are already written into law: up to `35M EUR` or `7%` of global annual turnover for prohibited practices, and up to `15M EUR` or `3%` for GPAI providers, which puts them at or above GDPR scale.
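To make the `35M EUR` / `7%` ceiling quoted above concrete, here is a minimal arithmetic sketch. It assumes the two caps combine as "whichever is higher", which is how the rule is usually read for large companies; smaller companies may be treated differently. The function name and the turnover figures are hypothetical.

```python
def max_fine_eur(global_turnover_eur: float,
                 fixed_cap_eur: float = 35_000_000,
                 turnover_share: float = 0.07) -> float:
    """Upper bound of a fine: the higher of the fixed cap and the turnover share.

    Assumption: "whichever is higher" applies to the company in question.
    """
    return max(fixed_cap_eur, turnover_share * global_turnover_eur)


# Hypothetical companies, annual global turnover in EUR.
small = max_fine_eur(50_000_000)       # 7% would be 3.5M, so the fixed cap dominates
large = max_fine_eur(2_000_000_000)    # 7% is about 140M, above the fixed cap

print(f"small company ceiling: {small:,.0f} EUR")
print(f"large company ceiling: {large:,.0f} EUR")
```

For a large company the turnover share dominates, which is why the percentage, not the fixed amount, is the number to watch.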

**General Purpose AI Models (GPAI)** - a separate category covering foundation models (GPT-4, Claude, Llama). From August 2025, the following are required:

Technical documentation (model card)
Compliance with EU copyright law during training
Disclosure of a training data summary
For **systemic risk** models (>10^25 FLOP during training): adversarial testing, incident reporting, cybersecurity measures
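A rough way to see where a model sits relative to the 10^25 FLOP threshold is sketched below. It assumes the common rule of thumb of about 6 FLOP per parameter per training token for dense transformers; that heuristic is not part of the Act, and the model sizes are hypothetical.

```python
SYSTEMIC_RISK_FLOP = 1e25  # threshold quoted in the list above


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough training compute: ~6 FLOP per parameter per token (heuristic, an assumption)."""
    return 6.0 * n_params * n_tokens


def is_systemic_risk(n_params: float, n_tokens: float) -> bool:
    """True if the estimated training compute exceeds the threshold."""
    return estimate_training_flop(n_params, n_tokens) > SYSTEMIC_RISK_FLOP


# Hypothetical models: (parameters, training tokens)
for name, params, tokens in [("mid-size", 70e9, 15e12), ("frontier", 400e9, 15e12)]:
    flop = estimate_training_flop(params, tokens)
    print(f"{name}: {flop:.1e} FLOP, systemic risk: {is_systemic_risk(params, tokens)}")
```

An estimate like this is only a first signal; a provider close to the threshold would need an actual compute accounting.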

**The EU AI Act applies to ANY company whose AI systems operate in the EU.** It doesn't matter where the company is registered - if the product is available in the EU, the law applies. Think of it as GDPR for AI.
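The scope rule above, combined with the Act's four risk tiers, can be sketched as a first-pass triage step. This is an illustration, not a legal determination: the use-case labels and the lookup sets are hypothetical, and a real classification has to follow the Act's annexes.

```python
from enum import Enum
from typing import Optional


class RiskTier(Enum):
    PROHIBITED = "prohibited"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Hypothetical labels for illustration only.
PROHIBITED_USES = {"social_scoring"}
HIGH_RISK_USES = {"credit_scoring", "medical_diagnostics", "hiring"}
LIMITED_RISK_USES = {"customer_support_chatbot", "image_generation"}


def triage(use_case: str, available_in_eu: bool) -> Optional[RiskTier]:
    """Return a provisional tier, or None if the product is not offered in the EU."""
    if not available_in_eu:
        return None  # simplified scope rule: where the company is registered is irrelevant
    if use_case in PROHIBITED_USES:
        return RiskTier.PROHIBITED
    if use_case in HIGH_RISK_USES:
        return RiskTier.HIGH
    if use_case in LIMITED_RISK_USES:
        return RiskTier.LIMITED
    return RiskTier.MINIMAL


print(triage("credit_scoring", available_in_eu=True))
print(triage("spam_filter", available_in_eu=True))
```

Note that the only scope input is whether the product is available in the EU, mirroring the rule that registration country does not matter.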

**Implementation timeline:**

Date	What takes effect
February 2025	Ban on unacceptable risk AI systems
August 2025	Requirements for GPAI (foundation models)
August 2026	Requirements for high-risk AI systems
August 2027	Full enforcement of all requirements
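The timeline can be turned into a small lookup that answers "what already applies on a given date?". The table gives months only; the sketch assumes each milestone starts on the 2nd of the month, and the function name is hypothetical.

```python
from datetime import date

# (start date, obligation) pairs taken from the table; day-of-month is an assumption.
MILESTONES = [
    (date(2025, 2, 2), "Ban on unacceptable risk AI systems"),
    (date(2025, 8, 2), "Requirements for GPAI (foundation models)"),
    (date(2026, 8, 2), "Requirements for high-risk AI systems"),
    (date(2027, 8, 2), "Full enforcement of all requirements"),
]


def obligations_in_effect(on: date) -> list:
    """Return every milestone whose start date is on or before `on`."""
    return [label for start, label in MILESTONES if on >= start]


for label in obligations_in_effect(date(2026, 1, 15)):
    print(label)
```

Wiring a check like this into release planning makes the next deadline visible before it becomes a blocker.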

An AI system for screening job candidates' resumes in the EU is classified as:

Copyright & AI: Who Owns AI-Generated Content

AI raises two independent copyright questions: (1) **Can copyrighted data be used for training?** (2) **Who owns AI-generated content?** The answers differ by jurisdiction and get rewritten with each ruling. Getty Images vs Stability AI, over `12 million` licensed photos, shows how unsettled the first question is: in November 2025 the UK High Court rejected most of Getty's copyright claims, while the US case is still pending. NYT vs OpenAI (still ongoing) could fundamentally reshape whether training on web data is viable at all.

**Practical decisions for AI products - what to actually do right now:**

**Terms of Service:** directly define rights to AI-generated content. Who owns the output - the user or the platform?
**Disclosure:** label AI-generated content ("Generated with AI" or a metadata tag). The EU AI Act requires this for limited risk systems
**Training data audit:** document data sources. EU GPAI rules require a training data summary
**Opt-out mechanism:** support robots.txt / ai.txt for web crawling. Respect do-not-train requests
**Human-in-the-loop for copyrightable output:** if content needs to be protected, a human must have substantially contributed
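The opt-out item in the list above can be sketched with Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the sample robots.txt, and the helper name are hypothetical; ai.txt is not handled here, and respecting robots.txt is only one part of honoring do-not-train requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site blocks one AI crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl_for_training(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network call
    return parser.can_fetch(user_agent, url)


url = "https://example.com/photos/1.jpg"
print(may_crawl_for_training(ROBOTS_TXT, "ExampleAIBot", url))
print(may_crawl_for_training(ROBOTS_TXT, "OtherBot", url))
```

In a real crawler the result would be logged alongside the URL, which also feeds the training data audit.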

**C2PA (Coalition for Content Provenance and Authenticity)** - an open standard backed by Adobe, Microsoft, the BBC, and others for recording content provenance, including whether AI was involved. It embeds cryptographically signed metadata into files: who created the content, with what tool, and when. Camera manufacturers (Nikon, Sony) already support it for photos.

According to the US Copyright Office (2023), an AI-generated image created solely from a prompt:

Bias & Fairness: AI System Prejudice

AI models inherit bias from training data - mechanically, without intent. Amazon's recruiting tool was trained on 10 years of hiring history where senior roles went predominantly to men. The result: the system downgraded resumes mentioning the word "women's" (as in "women's chess club"). For **high-risk systems** (HR, credit scoring, medical diagnostics), bias testing is mandatory under the EU AI Act - and must be continuous, not a one-time check.
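Two simple fairness metrics make this kind of bias measurable: the demographic parity difference (gap between group selection rates) and the disparate impact ratio (lowest rate divided by highest). The sketch below uses hypothetical data; the 0.8 cut-off is the "four-fifths" heuristic from US employment guidance, used here as an assumption rather than an EU AI Act requirement.

```python
from collections import defaultdict


def selection_rates(records):
    """records: iterable of (group, selected) pairs. Returns {group: selection rate}."""
    totals, selected = defaultdict(int), defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}


def demographic_parity_difference(rates):
    """Gap between the highest and lowest group selection rate (0 means parity)."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates):
    """Lowest rate divided by highest rate (1.0 means parity)."""
    highest = max(rates.values())
    return 1.0 if highest == 0 else min(rates.values()) / highest


# Hypothetical screening outcomes: 40 of 100 men and 25 of 100 women shortlisted.
records = ([("men", True)] * 40 + [("men", False)] * 60
           + [("women", True)] * 25 + [("women", False)] * 75)
rates = selection_rates(records)
print("parity difference:", round(demographic_parity_difference(rates), 3))
print("impact ratio:", round(disparate_impact_ratio(rates), 3))
print("passes four-fifths heuristic:", disparate_impact_ratio(rates) >= 0.8)
```

These metrics only compare outcomes across groups; they do not say why the gap exists, which is where the training data audit comes in.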

**Bias testing is not a one-time procedure.** Models get updated, data changes, use cases expand. Bias audits should be part of the CI/CD pipeline for high-risk AI systems. The EU AI Act requires ongoing monitoring.
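One way to put a bias audit into CI/CD is a counterfactual test: swap a gendered term in otherwise identical inputs and fail the build if the score moves. The sketch below is pytest-style but runs as plain Python; `score_resume` is a hypothetical stand-in for the model under test, and the tolerance is an assumed value.

```python
def score_resume(text: str) -> float:
    """Hypothetical stand-in for the model under test; replace with a real model call."""
    # Toy scorer: fraction of skill keywords present. It ignores gendered terms.
    skills = ("python", "sql", "leadership")
    return sum(word in text.lower() for word in skills) / len(skills)


# Pairs that differ only in a gendered term (the Amazon case hinged on "women's").
COUNTERFACTUAL_PAIRS = [
    ("Captain of the women's chess club, Python and SQL",
     "Captain of the men's chess club, Python and SQL"),
    ("Women's coding society lead, leadership and Python",
     "Men's coding society lead, leadership and Python"),
]

MAX_SCORE_GAP = 0.01  # assumed tolerance; tune per product


def test_gendered_terms_do_not_change_score():
    """Fail if swapping a gendered term changes the score beyond the tolerance."""
    for original, swapped in COUNTERFACTUAL_PAIRS:
        gap = abs(score_resume(original) - score_resume(swapped))
        assert gap <= MAX_SCORE_GAP, f"score gap {gap:.3f} for: {original!r}"


if __name__ == "__main__":
    test_gendered_terms_do_not_change_score()
    print("counterfactual bias check passed")
```

Because the test runs on every model or data update, a regression surfaces in the pipeline instead of in production.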

An AI resume screening system shows bias: it rates male resumes 15% higher. The most likely cause?

Practical Compliance Checklist for an AI Product

After the Samsung data leak, the Getty lawsuit, and the Meta GDPR fine, the pattern is clear: legal risk in AI doesn't materialize "someday" - it shows up on the first serious scale-up. Below is a **concrete checklist** applicable to any AI product. Structured by priority: must-have (cannot launch in the EU without these), should-have (strong recommendation), nice-to-have (competitive differentiator).

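**Liability** - who gets sued when AI causes harm. The proposed EU AI Liability Directive would introduce a presumption of causality: if an AI system violated regulatory requirements, the victim would not need to prove direct causation, which flips the burden of proof. The European Commission signalled in 2025 that it intends to withdraw the proposal, so treat it as a direction of travel rather than settled law. In practice, responsibility tends to split as follows: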

Scenario	Who is responsible	Why
AI chatbot gives harmful medical advice	Product developer (deployer)	The deployer must provide safety guardrails
AI model hallucinates false information	Product developer (deployer)	Duty of care: output validation is mandatory
AI resume screener discriminates	Developer + deployer	High-risk system: bias testing is mandatory
User uses AI for harm	User (primarily)	But the deployer must have moderation in place
Model from a provider (OpenAI/Anthropic) outputs harmful content	Shared liability	Provider: safety training, Deployer: guardrails

**Practical advice:** for most AI products (chatbot, summarization, code assistant), the MUST-HAVE checklist + basic bias testing is sufficient. Full compliance is only needed for high-risk systems. Don't over-engineer compliance for low-risk products.

AI chatbot for customer support. From the checklist, what is MUST-HAVE?

Key Takeaways

EU AI Act risk mapping: before shipping an AI feature, classify the risk level. HR, credit, medical = high risk; full compliance required from August 2026
Copyright: pure AI output has no copyright protection (US Copyright Office, 2023). Training on unlicensed data without an opt-out mechanism invites the kind of litigation Getty brought against Stability AI
PII filtering before LLM: the Samsung incident shows what happens without it - proprietary data leaks through prompts. A Data Processing Agreement with the provider is the legal baseline
Bias audits in CI/CD: Amazon started building its recruiting tool in 2014 and, according to 2018 reporting, abandoned it after the gender bias could not be reliably fixed. One-time testing is not enough - models update, data drifts
Liability chain: model providers are responsible for safety training; deployers are responsible for guardrails and moderation. When an AI chatbot causes harm, the deployer gets the first lawsuit
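The PII filtering takeaway above can be sketched as a redaction pass that runs before any prompt leaves the company network. The regular expressions and the `sk-`/`key-` token format are illustrative assumptions; regex-only filtering misses a lot and produces false positives, so production systems usually layer a dedicated PII detector on top.

```python
import re

# Hypothetical patterns for illustration; order matters (keys before phone numbers).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "API_KEY": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(prompt: str) -> str:
    """Replace each match with a placeholder such as [EMAIL] before calling the LLM."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


raw = "Contact jane.doe@example.com or +1 415 555 0100, token sk-ABCDEFGHIJKLMNOP1234"
print(redact(raw))
```

Placeholders keep the prompt readable for the model while the sensitive values stay inside the company boundary.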

What's Next

That covers the legal and ethical framework. The final lesson is the capstone project: designing and building a complete AI application from requirements to deployment, applying everything learned.

Capstone Project — Final project - a design doc for a production AI application
Guardrails — Technical safeguards for AI systems that complement legal compliance

Related lessons

aie-33-guardrails — Compliance enforces guardrails on outputs
aie-35-observability — Audit logs need observability and tracing
aie-64-synthetic-data — Synthetic data reduces privacy and copyright risk
sd-23-security — Data privacy reuses security and isolation controls
stat-20-causal — Bias auditing leans on causal fairness analysis
stat-05-hypothesis