Published on

MongoDB Indexing and Performance Optimization

Authors

MongoDB Indexing and Performance Optimization

Welcome to Part 5 of our MongoDB Zero to Hero series. After understanding data modeling, it's crucial to learn how to make your MongoDB queries fast and efficient through proper indexing and performance optimization.

Understanding MongoDB Indexes

Indexes are data structures that improve query performance by creating shortcuts to your data. Without indexes, MongoDB must scan every document in a collection (collection scan) to find matching documents.

How Indexes Work

// Without index: MongoDB scans all documents
db.users.find({ email: 'alice@example.com' }); // Scans entire collection

// With index on email field: MongoDB uses index for fast lookup
db.users.createIndex({ email: 1 });
db.users.find({ email: 'alice@example.com' }); // Uses index for fast lookup

Think of indexes like a book's index - instead of reading every page to find a topic, you look it up in the index and jump directly to the relevant pages.

Types of Indexes

1. Single Field Indexes

The most basic type of index on a single field:

// Create ascending index
db.users.createIndex({ name: 1 });

// Create descending index
db.users.createIndex({ age: -1 });

// Create index on embedded document field
db.users.createIndex({ 'address.city': 1 });

// Create index on array field
db.posts.createIndex({ tags: 1 });

2. Compound Indexes

Indexes on multiple fields, order matters:

// Compound index: age ascending, name ascending
db.users.createIndex({ age: 1, name: 1 });

// This index supports queries on:
// 1. { age: ... }
// 2. { age: ..., name: ... }
// But NOT efficiently on: { name: ... } alone

// Example queries that use this index efficiently:
db.users.find({ age: 25 });
db.users.find({ age: 25, name: 'John' });
db.users.find({ age: { $gt: 25 } }).sort({ name: 1 });

Index Prefix Rule: A compound index can support queries on any prefix of the indexed fields.

// Index: { a: 1, b: 1, c: 1 }
// Supports queries on:
// - { a: ... }
// - { a: ..., b: ... }
// - { a: ..., b: ..., c: ... }
// Does NOT efficiently support:
// - { b: ... }
// - { c: ... }
// - { b: ..., c: ... }

3. Multikey Indexes

Automatically created when indexing array fields:

// Document with array field
{
  _id: ObjectId("..."),
  name: "Alice",
  hobbies: ["reading", "photography", "hiking"]
}

// Create index on array field
db.users.createIndex({ hobbies: 1 })

// Queries that use the multikey index
db.users.find({ hobbies: "reading" })
db.users.find({ hobbies: { $in: ["reading", "gaming"] } })

Important: Compound indexes with multiple array fields are not allowed.

// This will fail if both tags and categories are arrays
db.posts.createIndex({ tags: 1, categories: 1 });

4. Text Indexes

For text search functionality:

// Create text index on single field
db.articles.createIndex({ title: 'text' });

// Create text index on multiple fields
db.articles.createIndex({
    title: 'text',
    content: 'text',
});

// Text search queries
db.articles.find({ $text: { $search: 'mongodb database' } });

// Text search with phrase
db.articles.find({ $text: { $search: '"mongodb database"' } });

// Text search with exclusion
db.articles.find({ $text: { $search: 'mongodb -database' } });

5. Geospatial Indexes

For location-based queries:

// 2dsphere index for modern geospatial queries
db.places.createIndex({ location: "2dsphere" })

// Sample document with GeoJSON
{
  _id: ObjectId("..."),
  name: "Central Park",
  location: {
    type: "Point",
    coordinates: [-73.965355, 40.782865]  // [longitude, latitude]
  }
}

// Find places near a point
db.places.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [-73.9857, 40.7484]
      },
      $maxDistance: 1000  // meters
    }
  }
})

// Find places within a polygon
db.places.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [-74.0, 40.7], [-74.0, 40.8],
          [-73.9, 40.8], [-73.9, 40.7],
          [-74.0, 40.7]
        ]]
      }
    }
  }
})

6. Sparse Indexes

Only index documents that have the indexed field:

// Create sparse index (only indexes documents with email field)
db.users.createIndex({ email: 1 }, { sparse: true });

// Regular index would index documents with null/missing email
// Sparse index skips them entirely

7. Partial Indexes

Index only documents that meet specified criteria:

// Index only active users over 18
db.users.createIndex(
    { name: 1, age: 1 },
    {
        partialFilterExpression: {
            age: { $gt: 18 },
            active: true,
        },
    },
);

// This query can use the partial index
db.users.find({ name: 'John', age: 25, active: true });

// This query cannot use the partial index (missing active: true)
db.users.find({ name: 'John', age: 25 });

8. TTL (Time To Live) Indexes

Automatically delete documents after a certain time:

// Delete documents 24 hours after createdAt
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 86400 }  // 24 hours
)

// Delete documents at specific date
db.logs.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 })

// Document with expiration date
{
  _id: ObjectId("..."),
  message: "This will be deleted",
  expireAt: new Date("2024-01-20T00:00:00Z")
}

9. Unique Indexes

Ensure field values are unique:

// Create unique index
db.users.createIndex({ email: 1 }, { unique: true });

// Compound unique index
db.products.createIndex({ name: 1, category: 1 }, { unique: true });

// Unique sparse index (allows multiple null values)
db.users.createIndex({ socialSecurityNumber: 1 }, { unique: true, sparse: true });

Index Management

Creating Indexes

// Basic syntax
db.collection.createIndex(
    { field1: 1, field2: -1 }, // Index specification
    {
        name: 'custom_index_name', // Optional custom name
        background: true, // Build in background (deprecated in 4.2+)
        unique: true, // Unique constraint
        sparse: true, // Sparse index
        expireAfterSeconds: 3600, // TTL
    },
);

// Create multiple indexes at once
db.users.createIndexes([{ name: 1 }, { email: 1 }, { age: 1, city: 1 }]);

Background vs Foreground Index Building

// Foreground (default): Faster build, blocks database
db.users.createIndex({ name: 1 });

// Background (deprecated in 4.2+): Slower build, doesn't block
db.users.createIndex({ name: 1 }, { background: true });

// MongoDB 4.2+: All index builds are background by default
// No blocking unless using special syntax

Listing Indexes

// List all indexes in collection
db.users.getIndexes();

// List index stats
db.users.stats().indexSizes;

// Get index usage statistics
db.users.aggregate([{ $indexStats: {} }]);

Dropping Indexes

// Drop specific index by name
db.users.dropIndex('name_1');

// Drop specific index by specification
db.users.dropIndex({ name: 1 });

// Drop all indexes except _id
db.users.dropIndexes();

// Drop multiple specific indexes
db.users.dropIndexes(['name_1', 'email_1']);

Query Optimization

Using explain()

The explain() method shows how MongoDB executes queries:

// Basic explain
db.users.find({ age: 25 }).explain();

// Execution stats (more detailed)
db.users.find({ age: 25 }).explain('executionStats');

// All plans considered (most detailed)
db.users.find({ age: 25 }).explain('allPlansExecution');

Understanding Explain Output

// Sample explain output
{
  "queryPlanner": {
    "plannerVersion": 1,
    "indexFilterSet": false,
    "parsedQuery": { "age": { "$eq": 25 } },
    "winningPlan": {
      "stage": "IXSCAN",      // Index scan (good!)
      "indexName": "age_1",
      "direction": "forward"
    },
    "rejectedPlans": []
  },
  "executionStats": {
    "totalDocsExamined": 5,   // Documents examined
    "totalDocsReturned": 5,   // Documents returned
    "executionTimeMillis": 1, // Execution time
    "indexBounds": {
      "age": ["[25.0, 25.0]"]
    }
  }
}

Key fields to watch:

  • stage: IXSCAN (good) vs COLLSCAN (bad for large collections)
  • totalDocsExamined vs totalDocsReturned: Lower ratio is better
  • executionTimeMillis: Lower is better

Common Query Patterns

Equality Queries

// Single field equality
db.users.find({ status: 'active' });
// Index: { status: 1 }

// Multiple field equality
db.users.find({ status: 'active', city: 'New York' });
// Index: { status: 1, city: 1 }

Range Queries

// Single field range
db.products.find({ price: { $gte: 10, $lte: 50 } });
// Index: { price: 1 }

// Range + equality
db.products.find({
    category: 'electronics',
    price: { $gte: 10, $lte: 50 },
});
// Index: { category: 1, price: 1 }

Sort Queries

// Sort on indexed field
db.users.find().sort({ age: 1 });
// Index: { age: 1 }

// Sort with equality filter
db.users.find({ status: 'active' }).sort({ age: 1 });
// Index: { status: 1, age: 1 }

// Sort with range filter
db.products
    .find({
        price: { $gte: 10, $lte: 50 },
    })
    .sort({ createdAt: -1 });
// Index: { price: 1, createdAt: -1 }

Index Strategy Guidelines

ESR Rule (Equality, Sort, Range)

For compound indexes, order fields by:

  1. Equality fields first
  2. Sort fields second
  3. Range fields last
// Query pattern
db.products
    .find({
        category: 'electronics', // Equality
        price: { $gte: 10, $lte: 50 }, // Range
    })
    .sort({ createdAt: -1 }); // Sort

// Optimal index following ESR rule
db.products.createIndex({
    category: 1, // Equality
    createdAt: -1, // Sort
    price: 1, // Range
});

Index Cardinality

Higher cardinality (more unique values) should come first in compound indexes:

// Good: High cardinality first
db.users.createIndex({ email: 1, status: 1 }); // email is unique, status has few values

// Less optimal: Low cardinality first
db.users.createIndex({ status: 1, email: 1 }); // status has only few values

Index Selectivity

Create indexes that filter out the most documents first:

// Query: Find active premium users in New York
db.users.find({
    status: 'active', // 80% of users
    type: 'premium', // 10% of users
    city: 'New York', // 5% of users
});

// Better index: Most selective first
db.users.createIndex({ city: 1, type: 1, status: 1 });

Performance Monitoring

Index Usage Statistics

// Check index usage
db.users.aggregate([
  { $indexStats: {} }
])

// Sample output
{
  "name": "email_1",
  "key": { "email": 1 },
  "host": "hostname:27017",
  "accesses": {
    "ops": 1000,           // Number of operations using this index
    "since": ISODate("...")  // Since when stats were collected
  }
}

Slow Query Profiling

// Enable profiling for slow queries (>100ms)
db.setProfilingLevel(1, { slowms: 100 });

// Profile all queries
db.setProfilingLevel(2);

// Check profiling status
db.getProfilingStatus();

// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(5);

// Disable profiling
db.setProfilingLevel(0);

Database Statistics

// Collection statistics
db.users.stats();

// Database statistics
db.stats();

// Index statistics
db.users.totalIndexSize();

// Current operations
db.currentOp();

Performance Best Practices

1. Index Design

// ✅ Good: Support your query patterns
db.orders.createIndex({ customerId: 1, orderDate: -1 });

// Query this index supports well
db.orders.find({ customerId: ObjectId('...') }).sort({ orderDate: -1 });

// ❌ Bad: Too many single-field indexes
db.users.createIndex({ name: 1 });
db.users.createIndex({ email: 1 });
db.users.createIndex({ age: 1 });
db.users.createIndex({ city: 1 });
// Better: Create compound indexes for actual query patterns

2. Query Optimization

// ✅ Use projection to limit returned fields
db.users.find(
    { status: 'active' },
    { name: 1, email: 1, _id: 0 }, // Only return name and email
);

// ✅ Use limit() for large result sets
db.products.find({ category: 'electronics' }).sort({ price: 1 }).limit(20);

// ✅ Use hint() to force specific index (when needed)
db.users.find({ name: 'John' }).hint({ name: 1, age: 1 });

// ❌ Avoid regex at the beginning of string
db.users.find({ name: /^John/ }); // Can't use index efficiently

// ✅ Better: Use text search or redesign
db.users.find({ $text: { $search: 'John' } });

3. Aggregation Optimization

// ✅ Put $match early in pipeline
db.orders.aggregate([
    { $match: { status: 'completed' } }, // Filter first
    { $group: { _id: '$customerId', total: { $sum: '$amount' } } },
    { $sort: { total: -1 } },
]);

// ✅ Use indexes in aggregation
db.orders.aggregate([
    { $match: { customerId: ObjectId('...') } }, // Uses index
    { $sort: { orderDate: -1 } }, // Uses index if properly indexed
]);

// Create supporting index
db.orders.createIndex({ customerId: 1, orderDate: -1 });

4. Schema Design for Performance

// ✅ Embed frequently accessed data
{
  _id: ObjectId("order1"),
  customerId: ObjectId("customer1"),
  customerName: "John Doe",        // Denormalized for performance
  customerEmail: "john@example.com", // Avoid join for common data
  items: [...],
  total: 199.99
}

// ✅ Use bucketing for time-series data
{
  _id: ObjectId("metrics_2024_01_17_14"),
  date: ISODate("2024-01-17"),
  hour: 14,
  measurements: [
    { minute: 0, cpu: 45.2, memory: 67.8 },
    { minute: 1, cpu: 46.1, memory: 68.2 },
    // ... up to 60 measurements
  ]
}

Common Performance Pitfalls

1. Missing Indexes

// ❌ No index on frequently queried field
db.users.find({ email: 'user@example.com' }); // Collection scan

// ✅ Create index
db.users.createIndex({ email: 1 });

2. Unused Indexes

// ❌ Creating indexes that are never used
db.users.createIndex({ middleName: 1 }); // If you never query by middleName

// Monitor index usage and drop unused indexes
db.users.aggregate([{ $indexStats: {} }]);

3. Over-Indexing

// ❌ Too many indexes slow down writes
db.users.createIndex({ name: 1 });
db.users.createIndex({ email: 1 });
db.users.createIndex({ age: 1 });
db.users.createIndex({ city: 1 });
db.users.createIndex({ status: 1 });
// Each insert/update needs to update all these indexes

// ✅ Create compound indexes for actual query patterns
db.users.createIndex({ status: 1, city: 1, age: 1 });

4. Inefficient Queries

// ❌ Large skip operations
db.users.find().skip(10000).limit(10); // Slow for large skips

// ✅ Use range queries for pagination
db.users.find({ _id: { $gt: lastSeenId } }).limit(10);

// ❌ Inefficient regex
db.users.find({ name: /john/i }); // Case-insensitive, can't use index well

// ✅ Use text index
db.users.createIndex({ name: 'text' });
db.users.find({ $text: { $search: 'john' } });

Index Maintenance

Reindexing

// Rebuild all indexes (use carefully in production)
db.users.reIndex();

// Check index consistency
db.users.validate();

Index Build Strategies

// For large collections, consider:
// 1. Build indexes during off-peak hours
// 2. Use replica sets to minimize downtime
// 3. Monitor index build progress

// Check current index builds
db.currentOp({
    'command.createIndexes': { $exists: true },
});

Monitoring Tools

MongoDB Compass

  • Visual query performance analysis
  • Index usage statistics
  • Real-time performance metrics

MongoDB Atlas

  • Performance Advisor
  • Query profiler
  • Index suggestions

Command Line Tools

// mongostat: Real-time MongoDB statistics
mongostat --host localhost:27017

// mongotop: Track MongoDB read/write activity
mongotop --host localhost:27017

What's Next?

Now that you understand indexing and performance, you're ready to learn about MongoDB's powerful Aggregation Pipeline for complex data processing, or explore Node.js integration to build applications.

Series Navigation


This is Part 5 of the MongoDB Zero to Hero series. Proper indexing is crucial for MongoDB performance - master these concepts before building production applications.

Enjoyed this post?

Subscribe to get notified about new posts and updates. No spam, unsubscribe anytime.

By subscribing, you agree to our Privacy Policy. You can unsubscribe at any time.

Discussion (0)

This website is still under development. If you encounter any issues, please contact me