Databases
MongoDB: Document Database in Production
When Forbes migrated its site to a new platform, it needed a database for articles with unpredictable metadata structure. Thirty years ago an article was text and an author. Today it is text, author, tags, SEO metadata, embed codes, A/B headline variants, and translation versions. MongoDB allowed schema changes without painful migrations.
Other production users include:
- **Cisco**: MongoDB for IoT analytics - 500 million events per day from devices worldwide.
- **Adobe Experience Manager**: document model for managing content across millions of media files.
- **eBay**: MongoDB for the listings system - the flexible schema handles different product types with unique attributes.
The Document Data Model
MongoDB stores data as BSON (Binary JSON) documents grouped into collections. BSON extends JSON with types such as Date, ObjectId, Binary, and Decimal128. An ObjectId is a 12-byte identifier whose first 4 bytes hold the creation time in seconds since the Unix epoch, so sorting by _id approximates creation order without an additional field.
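To make the embedded timestamp concrete, here is a minimal sketch that reads the creation time from an ObjectId's hex form using only the Python standard library. The function name `objectid_created_at` and the sample ids are made up for illustration; in a real application the driver's own ObjectId type would normally do this for you.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex: str) -> datetime:
    """Return the creation time encoded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are big-endian seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical ids: only the leading timestamp bytes differ.
older = "5f0000000000000000000000"
newer = "5f0000010000000000000000"

# Because the timestamp comes first, ordering by id follows creation time
# (to one-second resolution).
ids_by_creation = sorted([newer, older])
```

Ids created within the same second are not guaranteed to sort in insertion order across machines, so this is an approximation rather than a strict sequence.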
The central data modeling decision is embedding versus referencing. If data is read together, embed it as a nested document. If data is updated independently or can grow without bound, store it in a separate collection and reference it. Airbnb embeds listing photos in the listing document (they are read together), but stores reviews in a separate collection (there can be thousands of them).
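The two shapes can be sketched with plain Python dictionaries standing in for documents. All field names, ids, and the `reviews_for` helper below are hypothetical and chosen only to mirror the listing example; they are not taken from any real schema.

```python
# Embedded: photos live inside the listing and arrive in the same read.
listing = {
    "_id": "listing-1",
    "title": "Loft near the river",
    "photos": [
        {"url": "https://example.com/1.jpg", "caption": "Living room"},
        {"url": "https://example.com/2.jpg", "caption": "Kitchen"},
    ],
}

# Referenced: each review is its own document in a separate collection
# and points back to the listing through listing_id.
reviews = [
    {"_id": "review-1", "listing_id": "listing-1", "rating": 5},
    {"_id": "review-2", "listing_id": "listing-1", "rating": 4},
    {"_id": "review-3", "listing_id": "listing-2", "rating": 3},
]


def reviews_for(listing_id, all_reviews):
    """Stand-in for a query that filters the reviews collection by listing_id."""
    return [r for r in all_reviews if r["listing_id"] == listing_id]


listing_reviews = reviews_for(listing["_id"], reviews)
```

The trade-off is visible in the access pattern: photos cost nothing extra to read, while reviews need a second lookup but never inflate the listing document.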
The 16 MB rule: a single MongoDB document cannot exceed 16 MB. eBay discovered this the hard way: a bid_history array was accumulating bids directly inside the listing document, and after thousands of bids the document hit the limit. The fix was to move bids into a separate collection, with each bid holding a reference to its listing.
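The fix described here amounts to a small reshaping step, sketched below on plain Python dictionaries. The function `extract_bids` and the `listing_id` field name are assumptions made for illustration; only `bid_history` comes from the example above, and a real migration would also write the results back to the database.

```python
def extract_bids(listing):
    """Split a listing with an embedded bid_history array into a slim
    listing document and a list of standalone bid documents."""
    # Copy every field except the unbounded array.
    slim_listing = {k: v for k, v in listing.items() if k != "bid_history"}
    # Each bid becomes its own document that references the listing.
    bids = [
        {**bid, "listing_id": listing["_id"]}
        for bid in listing.get("bid_history", [])
    ]
    return slim_listing, bids


# Hypothetical listing with bids embedded in the document.
fat_listing = {
    "_id": "item-42",
    "title": "Vintage camera",
    "bid_history": [
        {"bidder": "alice", "amount": 100},
        {"bidder": "bob", "amount": 120},
    ],
}

slim_listing, bid_docs = extract_bids(fat_listing)
```

After the split, the listing document stays the same size no matter how many bids arrive, and each bid is a small document of its own.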
When is it correct to use a reference instead of an embed in MongoDB?