MongoDB Data Modeling Best Practices
- Authors
- Name
- Mamun Rashid
- @mmncit
Welcome to Part 4 of our MongoDB Zero to Hero series. After mastering CRUD operations, it's time to understand how to structure your data effectively in MongoDB.
Understanding Document-Oriented Design
Unlike relational databases with rigid schemas, MongoDB offers flexible document-based data modeling. That flexibility is powerful, but a schema designed without regard to how the application reads and writes data can lead to oversized documents, slow queries, and awkward updates.
Key Principles
- Model for Your Application: Design around your application's data access patterns
- Embrace Denormalization: Duplicating some data is often cheaper than joining collections at read time
- Think in Terms of Documents: Not tables and rows
- Consider Read/Write Patterns: Optimize for your most common operations
Data Modeling Patterns
1. Embedding (One-to-One and One-to-Few)
When to Use: When you have related data that's accessed together and doesn't grow unbounded.
// User with embedded address (One-to-One)
{
_id: ObjectId("..."),
name: "John Doe",
email: "john@example.com",
address: {
street: "123 Main St",
city: "New York",
state: "NY",
zipCode: "10001",
country: "USA"
},
createdAt: ISODate("2024-01-15")
}
// Blog post with embedded comments (One-to-Few)
{
_id: ObjectId("..."),
title: "MongoDB Data Modeling",
content: "Content of the blog post...",
author: "Jane Smith",
tags: ["mongodb", "database", "nosql"],
comments: [
{
_id: ObjectId("..."),
author: "Reader1",
text: "Great post!",
date: ISODate("2024-01-16")
},
{
_id: ObjectId("..."),
author: "Reader2",
text: "Very helpful, thanks!",
date: ISODate("2024-01-17")
}
],
publishedAt: ISODate("2024-01-15")
}
Advantages:
- Single query to get all related data
- Atomic updates: a write to a single document, including its embedded fields, is atomic
- Better performance for read operations
Disadvantages:
- Documents can grow toward the 16 MB BSON document size limit
- Potential for data duplication
- Complex updates when embedded data changes
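As a rough sketch of the atomic-update advantage, the helper below builds a single update that appends a comment to the embedded `comments` array and caps the array with `$slice`, so the relationship stays one-to-few. The helper name `buildAddCommentUpdate` and the default cap of 50 are assumptions made for illustration; they are not part of the schema shown above.

```javascript
// Hypothetical helper: builds the filter and update for appending one comment.
// $each and $slice are standard $push modifiers; a negative $slice keeps the last N.
function buildAddCommentUpdate(postId, comment, maxComments = 50) {
  return {
    filter: { _id: postId },
    update: {
      $push: {
        comments: {
          $each: [comment],
          $slice: -maxComments, // keep only the newest maxComments entries
        },
      },
    },
  };
}

// Usage in mongosh (assumes a `posts` collection shaped like the example above):
// const { filter, update } = buildAddCommentUpdate(postId, {
//   author: "Reader3", text: "Nice!", date: new Date(),
// });
// db.posts.updateOne(filter, update);
```

Because the push and the trim happen in one `updateOne` call on one document, readers never observe a half-applied change.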
2. Referencing (One-to-Many and Many-to-Many)
When to Use: When you have large amounts of related data or many-to-many relationships.
// User document
{
_id: ObjectId("user1"),
name: "Alice Johnson",
email: "alice@example.com",
createdAt: ISODate("2024-01-15")
}
// Order documents (One-to-Many)
{
_id: ObjectId("order1"),
userId: ObjectId("user1"), // Reference to user
items: [
{
productId: ObjectId("product1"),
name: "MongoDB Book",
price: 29.99,
quantity: 1
}
],
total: 29.99,
status: "completed",
orderDate: ISODate("2024-01-16")
}
// Many-to-Many: Users and Roles
// User document
{
_id: ObjectId("user1"),
name: "Bob Smith",
email: "bob@example.com",
roleIds: [ObjectId("role1"), ObjectId("role2")] // Array of references
}
// Role documents
{
_id: ObjectId("role1"),
name: "admin",
permissions: ["read", "write", "delete"]
}
{
_id: ObjectId("role2"),
name: "editor",
permissions: ["read", "write"]
}
Advantages:
- Avoids data duplication
- Better for frequently changing data
- Supports large datasets
- Easier to maintain consistency
Disadvantages:
- Requires multiple queries or $lookup
- No foreign key constraints: MongoDB does not enforce referential integrity, so the application must keep references valid
- More complex application logic
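To make the `$lookup` trade-off concrete, here is a minimal sketch of an aggregation pipeline that fetches one user's orders and joins the referenced user document. The function name `buildOrdersWithUserPipeline` and the collection names `orders` and `users` are assumptions based on the example documents above.

```javascript
// Hypothetical helper: builds a pipeline returning a user's orders
// with the referenced user document joined in.
function buildOrdersWithUserPipeline(userId) {
  return [
    { $match: { userId: userId } },
    {
      $lookup: {
        from: "users",        // collection to join (assumed name)
        localField: "userId", // reference stored on the order
        foreignField: "_id",  // key on the user document
        as: "user",
      },
    },
    { $unwind: "$user" }, // $lookup yields an array; flatten it to one object
  ];
}

// Usage in mongosh:
// db.orders.aggregate(buildOrdersWithUserPipeline(someUserId));
```

An index on `orders.userId` would normally accompany this query so that the `$match` stage does not scan the whole collection.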
3. Hybrid Approach
Often, the best solution combines embedding and referencing:
// E-commerce product with embedded variants but referenced reviews
{
_id: ObjectId("product1"),
name: "Laptop",
brand: "TechBrand",
category: "Electronics",
// Embedded variants (few, stable)
variants: [
{
sku: "LAPTOP-001-BLK",
color: "Black",
storage: "256GB",
price: 999.99,
inventory: 50
},
{
sku: "LAPTOP-001-SLV",
color: "Silver",
storage: "512GB",
price: 1199.99,
inventory: 30
}
],
// Basic review stats (frequently accessed)
reviewSummary: {
averageRating: 4.2,
totalReviews: 156,
lastUpdated: ISODate("2024-01-16")
},
// Detailed reviews (many, growing) are not embedded here; they live in a
// separate reviews collection and point back through productId
createdAt: ISODate("2024-01-10")
}
// Separate reviews collection
{
_id: ObjectId("review1"),
productId: ObjectId("product1"),
userId: ObjectId("user1"),
rating: 5,
title: "Excellent laptop!",
content: "Fast performance, great build quality...",
helpful: 23,
verified: true,
createdAt: ISODate("2024-01-16")
}
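One way to keep the embedded `reviewSummary` in step with the separate reviews collection is to fold each new rating into the running average. The sketch below is an assumption about how that could be done; `applyReview` is a hypothetical helper, not something the schema above prescribes.

```javascript
// Hypothetical helper: folds one new rating into the embedded reviewSummary.
// Returns a new summary object and does not modify the input.
function applyReview(summary, rating, now = new Date()) {
  const totalReviews = summary.totalReviews + 1;
  const averageRating =
    (summary.averageRating * summary.totalReviews + rating) / totalReviews;
  return { averageRating, totalReviews, lastUpdated: now };
}

// Usage in mongosh (assumes `products` and `reviews` collections as above):
// const product = db.products.findOne({ _id: productId });
// db.reviews.insertOne({ productId, userId, rating: 5, createdAt: new Date() });
// db.products.updateOne(
//   { _id: productId },
//   { $set: { reviewSummary: applyReview(product.reviewSummary, 5) } }
// );
```

This read-modify-write sequence can race when two reviews arrive at once, so a real system would likely wrap it in a transaction or recompute the summary periodically from the reviews collection.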
Schema Design Patterns
1. Polymorphic Pattern
Store different types of entities in the same collection:
// Events collection with different event types
{
_id: ObjectId("event1"),
type: "user_registration",
timestamp: ISODate("2024-01-16"),
userId: ObjectId("user1"),
data: {
email: "user@example.com",
source: "website"
}
}
{
_id: ObjectId("event2"),
type: "purchase",
timestamp: ISODate("2024-01-16"),
userId: ObjectId("user1"),
data: {
orderId: ObjectId("order1"),
amount: 99.99,
paymentMethod: "credit_card"
}
}
{
_id: ObjectId("event3"),
type: "page_view",
timestamp: ISODate("2024-01-16"),
userId: ObjectId("user1"),
data: {
page: "/products/laptop",
referrer: "https://google.com",
duration: 45
}
}
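Application code that reads a polymorphic collection usually branches on the discriminator field. The following sketch shows one possible shape for that dispatch using the three event types above; `eventFormatters` and `describeEvent` are hypothetical names introduced for illustration.

```javascript
// Hypothetical handler table keyed by the `type` discriminator field.
const eventFormatters = new Map([
  ["user_registration", (e) => `registered ${e.data.email} via ${e.data.source}`],
  ["purchase", (e) => `paid ${e.data.amount} by ${e.data.paymentMethod}`],
  ["page_view", (e) => `viewed ${e.data.page} for ${e.data.duration}s`],
]);

function describeEvent(event) {
  const format = eventFormatters.get(event.type);
  // Unknown types are reported rather than thrown, so event types
  // added later do not break older readers.
  return format ? format(event) : `unknown event type: ${event.type}`;
}

// Usage in mongosh (assumes an `events` collection as above):
// db.events.find({ userId: someUserId }).forEach((e) => print(describeEvent(e)));
```

Keeping the per-type logic in one table makes it clear which shapes of `data` the application expects.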
2. Attribute Pattern
Handle documents with many similar fields or sparse data:
// Traditional approach (sparse, many null values)
{
_id: ObjectId("product1"),
name: "Laptop",
color: "Black",
weight: 2.5,
screenSize: 15.6,
ramSize: 16,
storageSize: 512,
cpuSpeed: 2.4,
// ... potentially hundreds of other attributes
batteryLife: null,
waterproof: null,
// ... many null values
}
// Attribute pattern (flexible, efficient)
{
_id: ObjectId("product1"),
name: "Laptop",
category: "Electronics",
attributes: [
{ name: "color", value: "Black", type: "string" },
{ name: "weight", value: 2.5, type: "number", unit: "kg" },
{ name: "screenSize", value: 15.6, type: "number", unit: "inches" },
{ name: "ramSize", value: 16, type: "number", unit: "GB" },
{ name: "storageSize", value: 512, type: "number", unit: "GB" },
{ name: "cpuSpeed", value: 2.4, type: "number", unit: "GHz" }
]
}
// Create index for efficient attribute queries
db.products.createIndex({ "attributes.name": 1, "attributes.value": 1 })
// Query example
db.products.find({
"attributes": {
$elemMatch: {
"name": "ramSize",
"value": { $gte: 16 }
}
}
})
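Moving existing data from the flat layout to the attribute layout can be done with a small conversion step. The sketch below assumes the flat fields and their units are supplied as plain objects; `toAttributes` is a hypothetical helper, and the `type` value is simply JavaScript's `typeof` result.

```javascript
// Hypothetical converter: turns flat spec fields into the attributes array.
// `units` maps a field name to its unit string.
function toAttributes(specs, units = {}) {
  return Object.entries(specs)
    .filter(([, value]) => value !== null && value !== undefined) // drop sparse fields
    .map(([name, value]) => {
      const attribute = { name, value, type: typeof value };
      if (units[name]) attribute.unit = units[name];
      return attribute;
    });
}

// Usage:
// const attributes = toAttributes(
//   { color: "Black", weight: 2.5, batteryLife: null },
//   { weight: "kg" }
// );
// db.products.updateOne({ _id: productId }, { $set: { attributes } });
```

Null fields are dropped entirely, which is the point of the pattern: absent attributes cost nothing in the document or the index.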
3. Bucket Pattern
Group time-series or similar sequential data into one document per time window:
// Instead of one document per data point
{
_id: ObjectId("reading1"),
sensorId: "sensor001",
timestamp: ISODate("2024-01-16T10:00:00Z"),
temperature: 23.5,
humidity: 45.2
}
// Bucket pattern: group multiple readings
{
_id: ObjectId("bucket1"),
sensorId: "sensor001",
date: ISODate("2024-01-16"),
hour: 10,
readings: [
{
minute: 0,
temperature: 23.5,
humidity: 45.2
},
{
minute: 1,
temperature: 23.6,
humidity: 45.1
},
// ... up to 60 readings per hour
],
count: 60,
averageTemp: 23.8,
averageHumidity: 45.0
}
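Writing into a bucket is typically done with an upsert, so the first reading of an hour creates the bucket and later readings append to it. This is a minimal sketch under that assumption; `buildBucketUpsert` is a hypothetical helper, and it maintains only `count`, leaving the averages shown above to be computed separately.

```javascript
// Hypothetical helper: builds an upsert that appends one reading to its hourly bucket.
function buildBucketUpsert(sensorId, timestamp, reading) {
  // Truncate to midnight UTC so all readings of a day share one `date` value.
  const date = new Date(Date.UTC(
    timestamp.getUTCFullYear(),
    timestamp.getUTCMonth(),
    timestamp.getUTCDate()
  ));
  return {
    filter: { sensorId, date, hour: timestamp.getUTCHours() },
    update: {
      $push: { readings: { minute: timestamp.getUTCMinutes(), ...reading } },
      $inc: { count: 1 },
    },
    options: { upsert: true }, // create the bucket on the first reading of the hour
  };
}

// Usage in mongosh (the collection name `sensor_buckets` is an assumption):
// const { filter, update, options } = buildBucketUpsert(
//   "sensor001", new Date(), { temperature: 23.6, humidity: 45.1 }
// );
// db.sensor_buckets.updateOne(filter, update, options);
```

On an upsert, the equality fields in the filter become fields of the new bucket document, so `sensorId`, `date`, and `hour` do not need to be set again in the update.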
4. Outlier Pattern
Handle documents that don't fit the normal pattern:
// Normal social media post
{
_id: ObjectId("post1"),
userId: ObjectId("user1"),
content: "Just learned MongoDB data modeling!",
likes: ["user2", "user3", "user4"], // Few likes, can embed
comments: [
{ userId: ObjectId("user2"), text: "Great!" },
{ userId: ObjectId("user3"), text: "Awesome!" }
],
createdAt: ISODate("2024-01-16")
}
// Viral post (outlier with many likes)
{
_id: ObjectId("post2"),
userId: ObjectId("user1"),
content: "Viral post content...",
likes: {
count: 50000,
isOverflow: true // Flag indicating likes are in separate collection
},
comments: {
count: 5000,
isOverflow: true // Comments also in separate collection
},
createdAt: ISODate("2024-01-16")
}
// Separate collections for overflow data
// likes_overflow collection
{
_id: ObjectId("likes1"),
postId: ObjectId("post2"),
userIds: ["user1", "user2", ..., "user1000"] // Batch of 1000 user IDs
}
// comments_overflow collection
{
_id: ObjectId("comments1"),
postId: ObjectId("post2"),
comments: [
{ userId: ObjectId("user1"), text: "Amazing!", date: ISODate("...") },
// ... more comments
]
}
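With the outlier pattern, the application has to decide per document where a write goes. The sketch below illustrates that decision for likes only. `planLikeWrite` and the `maxEmbedded` threshold are assumptions, and the sketch deliberately leaves out two things a real implementation needs: converting a post to the overflow shape when it crosses the threshold, and splitting overflow documents into batches.

```javascript
// Hypothetical planner: decides where a new like should be written.
// `maxEmbedded` is an assumed application threshold, not a MongoDB setting.
function planLikeWrite(post, userId, maxEmbedded = 1000) {
  const embedded = Array.isArray(post.likes);
  if (embedded && post.likes.length < maxEmbedded) {
    // Normal post: keep the like inside the post document.
    return {
      collection: "posts",
      filter: { _id: post._id },
      update: { $addToSet: { likes: userId } },
    };
  }
  // Outlier post: append to the overflow collection instead.
  return {
    collection: "likes_overflow",
    filter: { postId: post._id },
    update: { $push: { userIds: userId } },
  };
}

// Usage in mongosh:
// const plan = planLikeWrite(post, currentUserId);
// db.getCollection(plan.collection).updateOne(plan.filter, plan.update);
```

The common case stays a single cheap write to the post, and only the rare viral post pays for the extra collection.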
Relationships in MongoDB
One-to-One Relationships
// Embed when data is accessed together
{
_id: ObjectId("user1"),
name: "John Doe",
email: "john@example.com",
profile: { // One-to-one embedded
bio: "Software developer...",
avatar: "https://example.com/avatar.jpg",
preferences: {
theme: "dark",
notifications: true
}
}
}
// Reference when data is large or accessed separately
{
_id: ObjectId("user1"),
name: "John Doe",
email: "john@example.com",
profileId: ObjectId("profile1") // Reference to separate profile document
}
One-to-Many Relationships
// Embed Many in One (when "many" is limited)
{
_id: ObjectId("order1"),
customerId: ObjectId("customer1"),
items: [ // Embedded line items
{
productId: ObjectId("product1"),
name: "Product Name",
price: 19.99,
quantity: 2
}
],
total: 39.98
}
// Reference One from Many (when "many" is unlimited)
// Customer document
{
_id: ObjectId("customer1"),
name: "Customer Name",
email: "customer@example.com"
}
// Many order documents
{
_id: ObjectId("order1"),
customerId: ObjectId("customer1"), // Reference to customer
total: 39.98,
date: ISODate("2024-01-16")
}
Many-to-Many Relationships
// Students and Courses (embed array of references)
// Student document
{
_id: ObjectId("student1"),
name: "Alice Smith",
email: "alice@university.edu",
courseIds: [ // Array of course references
ObjectId("course1"),
ObjectId("course2"),
ObjectId("course3")
]
}
// Course document
{
_id: ObjectId("course1"),
name: "Database Systems",
code: "CS301",
instructor: "Dr. Johnson",
studentIds: [ // Array of student references
ObjectId("student1"),
ObjectId("student2"),
// ... more students
]
}
// Alternative: Junction collection for complex many-to-many
// enrollment collection
{
_id: ObjectId("enrollment1"),
studentId: ObjectId("student1"),
courseId: ObjectId("course1"),
enrollmentDate: ISODate("2024-01-16"),
grade: null,
status: "active"
}
Denormalization Strategies
When to Denormalize
- Frequently accessed together: Data that's always read together
- Read-heavy workloads: Optimize for query performance
- Stable data: Information that doesn't change often
- Acceptable redundancy: When the extra storage and update work cost less than the added query complexity
Denormalization Example
// Normalized approach (requires multiple queries)
// Users collection
{
_id: ObjectId("user1"),
name: "Alice Johnson",
email: "alice@example.com"
}
// Posts collection
{
_id: ObjectId("post1"),
title: "My First Post",
content: "Post content...",
authorId: ObjectId("user1"),
createdAt: ISODate("2024-01-16")
}
// Denormalized approach (single query)
// Posts collection with embedded author info
{
_id: ObjectId("post1"),
title: "My First Post",
content: "Post content...",
author: { // Denormalized author data
_id: ObjectId("user1"),
name: "Alice Johnson",
email: "alice@example.com"
},
createdAt: ISODate("2024-01-16")
}
Managing Denormalized Data
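// Update the user's name in both the users and posts collections.
// These are two separate writes, so they are not atomic as a pair: a failure
// between them leaves posts with the old name unless the writes are retried
// or wrapped in a multi-document transaction.
function updateUserName(userId, newName) {
// Update the source of truth in the users collection
db.users.updateOne({ _id: userId }, { $set: { name: newName } });
// Propagate the change to the denormalized copies in posts
db.posts.updateMany({ 'author._id': userId }, { $set: { 'author.name': newName } });
}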
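The two writes in `updateUserName` are independent operations. If both must succeed or fail together, one option is a multi-document transaction, which requires a replica set or sharded cluster. The sketch below assumes a mongosh session, for example one obtained from `db.getMongo().startSession()`; the function name and the `dbName` parameter are introduced here for illustration.

```javascript
// Sketch: run both writes in one multi-document transaction.
// `session` is a mongosh session; `dbName` is the database that holds
// the `users` and `posts` collections (assumed names).
function updateUserNameInTransaction(session, dbName, userId, newName) {
  const database = session.getDatabase(dbName);
  session.startTransaction();
  try {
    database.users.updateOne({ _id: userId }, { $set: { name: newName } });
    database.posts.updateMany(
      { "author._id": userId },
      { $set: { "author.name": newName } }
    );
    session.commitTransaction();
  } catch (error) {
    session.abortTransaction(); // roll back both writes on any failure
    throw error;
  }
}

// Usage in mongosh (the database name "blog" is an assumption):
// const session = db.getMongo().startSession();
// updateUserNameInTransaction(session, "blog", someUserId, "Alice J.");
// session.endSession();
```

Transactions add latency, so for rarely changing fields a retried or background update of the denormalized copies may be an acceptable alternative.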
Performance Considerations
Document Size Limits
- 16 MB limit: A single BSON document cannot exceed 16 MB, and in practice documents should stay far smaller
- Working set: Frequently accessed documents should fit in memory
- Index size: Consider index size when designing schema
Query Patterns
// Design for your most common queries
// If you frequently query posts by author and date:
{
_id: ObjectId("post1"),
authorId: ObjectId("user1"), // Index: { authorId: 1, publishedAt: -1 }
title: "Post Title",
publishedAt: ISODate("2024-01-16"),
// ... other fields
}
// Create compound index
db.posts.createIndex({ authorId: 1, publishedAt: -1 })
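A query shaped to match that compound index filters on the leading field with equality and sorts on the second field. The helper below is a minimal sketch of such a query; `findRecentPostsByAuthor` is a hypothetical name, and `db` is passed in explicitly only to keep the function self-contained.

```javascript
// Hypothetical helper: newest posts for one author, shaped to match the
// { authorId: 1, publishedAt: -1 } index (equality field first, then sort field).
function findRecentPostsByAuthor(db, authorId, limit = 10) {
  return db.posts
    .find({ authorId: authorId })
    .sort({ publishedAt: -1 })
    .limit(limit)
    .toArray();
}

// Usage in mongosh:
// findRecentPostsByAuthor(db, someAuthorId, 5);
```

Replacing `.toArray()` with `.explain("executionStats")` is a quick way to check whether the planner chose an index scan for this query.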
Write Patterns
// Optimize for write-heavy workloads
// Time-series data with bucketing
{
_id: "metrics_2024_01_16_10", // descriptive string _id identifying the hour bucket
date: ISODate("2024-01-16"),
hour: 10,
data: [
{ minute: 0, cpu: 45.2, memory: 67.8 },
{ minute: 1, cpu: 46.1, memory: 68.2 },
// ... more data points
]
}
Schema Evolution
Versioning Strategy
// Version field approach
{
_id: ObjectId("user1"),
schemaVersion: 2,
name: "John Doe",
email: "john@example.com",
// New fields in version 2
preferences: {
theme: "dark",
notifications: true
}
}
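// Handle different versions in application code
function getUser(userId) {
const user = db.users.findOne({ _id: userId });
if (!user) {
return null; // no document with this _id
}
// Documents written before versioning have no schemaVersion field; treat them as version 1
if ((user.schemaVersion ?? 1) < 2) {
// Provide version 2 defaults in memory without overwriting existing preferences
user.preferences = user.preferences ?? {
theme: "light",
notifications: true
};
}
return user;
}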
Migration Strategies
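// Bulk migration: update every document that has no schemaVersion field in one pass
// (a lazy migration would instead upgrade each document the first time the application reads it)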
db.users.updateMany(
{ schemaVersion: { $exists: false } },
{
$set: {
schemaVersion: 2,
preferences: {
theme: 'light',
notifications: true,
},
},
},
);
// Progressive migration script
const cursor = db.users.find({ schemaVersion: 1 });
while (cursor.hasNext()) {
const user = cursor.next();
// Perform migration
const updatedUser = migrateUserToV2(user);
db.users.replaceOne({ _id: user._id }, updatedUser);
}
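The progressive script above calls `migrateUserToV2` without defining it. One possible definition, based only on the version 2 fields shown earlier, is sketched below; the default values are assumptions, and a real migration would cover whatever else changed between versions.

```javascript
// Hypothetical implementation of the migrateUserToV2 helper used above.
// It returns a new object and leaves the input untouched.
function migrateUserToV2(user) {
  return {
    ...user,
    schemaVersion: 2,
    // Keep any preferences already present; otherwise apply version 2 defaults.
    preferences: user.preferences ?? { theme: "light", notifications: true },
  };
}
```

Returning a complete replacement document fits `replaceOne`, which swaps the stored document for the one supplied while keeping the same `_id`.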
Common Anti-Patterns to Avoid
1. Unnecessary Normalization
// Anti-pattern: Over-normalization
{
_id: ObjectId("address1"),
street: "123 Main St",
cityId: ObjectId("city1") // Unnecessary reference
}
{
_id: ObjectId("city1"),
name: "New York",
stateId: ObjectId("state1") // Another unnecessary reference
}
// Better: Embed stable data
{
_id: ObjectId("user1"),
name: "John Doe",
address: {
street: "123 Main St",
city: "New York",
state: "NY",
country: "USA"
}
}
2. Massive Arrays
// Anti-pattern: Unbounded array growth
{
_id: ObjectId("post1"),
title: "Popular Post",
likes: [userId1, userId2, ..., userId50000] // Too many elements
}
// Better: Use separate collection or bucketing
{
_id: ObjectId("post1"),
title: "Popular Post",
likeCount: 50000,
// Store likes in separate collection
}
3. Inappropriate Embedding
// Anti-pattern: Embedding frequently changing data
{
_id: ObjectId("user1"),
name: "John Doe",
orders: [ // Orders change frequently, grow unbounded
{ orderId: ObjectId("order1"), total: 99.99, status: "shipped" },
{ orderId: ObjectId("order2"), total: 149.99, status: "pending" },
// ... potentially thousands of orders
]
}
// Better: Use references
{
_id: ObjectId("order1"),
userId: ObjectId("user1"),
total: 99.99,
status: "shipped"
}
Best Practices Summary
- Understand your data access patterns
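- Favor embedding for one-to-one and one-to-few relationships
- Use references for large or unbounded one-to-many relationships and for many-to-many
- Denormalize data that is read together often and changes rarely
- Keep documents well under the 16 MB limit and avoid unbounded arrays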
- Plan for schema evolution
- Index your queries
- Monitor and optimize performance
What's Next?
Now that you understand data modeling, it's time to optimize your queries with Indexing and Performance, or learn how to process your data with the Aggregation Pipeline.
Series Navigation
- Previous: MongoDB CRUD Operations
- Next: MongoDB Indexing and Performance
- Hub: MongoDB Zero to Hero - Complete Guide
This is Part 4 of the MongoDB Zero to Hero series. Data modeling is crucial for building scalable MongoDB applications - take time to understand these patterns before moving to more advanced topics.