When working with databases, it’s easy to overlook the impact of schema design on query performance. I learned this lesson the hard way. Here’s my story of how I optimized my Mongoose schema and aggregation logic to handle thousands of records efficiently.
The Problem: Slow Queries and Lookup Overhead
In my application, I needed to fetch `CollectionOrder` documents, which are linked to several other collections: `RetailStore`, `DeliveryMan`, and `CollectionCompany`. Each `CollectionOrder` had fields referencing these collections, which required multiple `$lookup` stages to fetch related data.
Initially, my schema design looked like this:
```typescript
@Prop({
  required: true,
  type: mongoose.Schema.Types.ObjectId,
  ref: 'RetailStore',
})
retailStore: mongoose.Schema.Types.ObjectId;

@Prop({
  required: true,
  type: mongoose.Schema.Types.ObjectId,
  ref: 'DeliveryMan',
})
deliveryMan: mongoose.Schema.Types.ObjectId;
```
This design seemed fine at first, but as my database grew, the queries became unbearably slow. Why? Because each `$lookup` operation introduced significant overhead, especially with over 10,000 `RetailStore` and `CollectionCompany` documents.
The Realization: Embedding Subschemas
It was clear that relying on `$lookup` for every query wasn't scalable. I needed a more efficient approach, one that would eliminate the need for additional lookups. That's when I decided to embed the related data directly into the `CollectionOrder` schema as subschemas.
Here’s how the updated schema looks:
```typescript
@Prop({ type: RetailStoreSchema })
retailStore: RetailStore;

@Prop({ type: DeliveryManSchema })
deliveryMan: DeliveryMan;

@Prop({ type: CollectionCompanySchema })
collectionCompany?: CollectionCompany;
```
Instead of storing just the IDs, I now store the entire `RetailStore`, `DeliveryMan`, and `CollectionCompany` objects. This means the data is denormalized, but the performance gains make it worthwhile.
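To make the shape change concrete, here is a dependency-free sketch of the document shapes before and after embedding. The field names follow the post; the interfaces and sample values are my own simplified assumptions (e.g. IDs modeled as strings rather than `ObjectId`):

```typescript
// Hypothetical, simplified document shapes illustrating the denormalization.
interface RetailStore {
  _id: string;
  name: string;
}

// Before: the order holds only a reference,
// so resolving the store requires a $lookup.
interface CollectionOrderNormalized {
  _id: string;
  retailStore: string;
}

// After: the full store document is embedded,
// so the related data travels with the order.
interface CollectionOrderEmbedded {
  _id: string;
  retailStore: RetailStore;
}

const store: RetailStore = { _id: "rs1", name: "Main Street Store" };
const order: CollectionOrderEmbedded = { _id: "co1", retailStore: store };

// The related data is available directly on the fetched document:
console.log(order.retailStore.name); // → "Main Street Store"
```

The trade-off is visible in the types: the embedded variant is larger per document, but a single read returns everything.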
The Benefits: Faster Queries, Simpler Aggregations
With the updated schema, I no longer need `$lookup` stages in my aggregations. When fetching `CollectionOrder` documents, all the related data is already embedded, making the queries significantly faster.
Here’s an example query before and after the change:
Before:
```json
[
  { "$lookup": { "from": "retailstores", "localField": "retailStore", "foreignField": "_id", "as": "retailStore" } },
  { "$unwind": "$retailStore" }
]
```
After:
```json
[
  { "$match": { "status": "active" } }
]
```
By embedding the data, the query complexity dropped dramatically. This change alone shaved seconds off my query execution time.
Challenges: Keeping Data Up-to-Date
Of course, this approach isn't without its challenges. One major drawback of embedding data is keeping it up to date. For instance, if the name of a `RetailStore` changes, I need to ensure that all related `CollectionOrder` documents are updated.
To handle this, I implemented update hooks in my `RetailStore` and `DeliveryMan` services. These hooks propagate changes to all embedded documents whenever a related field is updated.
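The original hook code isn't shown, but the propagation step can be sketched as a pure function. In a real Mongoose service this would typically be a single `updateMany` filtered on the embedded `_id` (noted in the comment below); everything here is an in-memory illustration with hypothetical names:

```typescript
interface RetailStore {
  _id: string;
  name: string;
}

interface CollectionOrder {
  _id: string;
  retailStore: RetailStore;
}

// Propagate an updated RetailStore into every order that embeds it.
// With Mongoose models, the same idea maps to something like:
//   orderModel.updateMany(
//     { "retailStore._id": updated._id },
//     { $set: { retailStore: updated } },
//   )
function propagateRetailStoreUpdate(
  orders: CollectionOrder[],
  updated: RetailStore,
): CollectionOrder[] {
  return orders.map((order) =>
    order.retailStore._id === updated._id
      ? { ...order, retailStore: { ...updated } }
      : order,
  );
}

const orders: CollectionOrder[] = [
  { _id: "co1", retailStore: { _id: "rs1", name: "Old Name" } },
  { _id: "co2", retailStore: { _id: "rs2", name: "Other Store" } },
];

const result = propagateRetailStoreUpdate(orders, { _id: "rs1", name: "New Name" });
// result[0] now embeds the renamed store; co2 is untouched.
```

Whether this propagation runs synchronously in the service or asynchronously (e.g. from a change stream or job queue) is a consistency trade-off worth deciding explicitly.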
My Takeaway
This experience taught me the importance of designing schemas with performance in mind. While normalization is often the default choice, it’s not always the best solution — especially for queries that require joining large collections.
Switching to embedded schemas transformed how my application handles data. Queries that used to take several seconds now return results in milliseconds. This optimization not only improved performance but also simplified my codebase.
Final Thoughts
If you’re facing similar challenges with slow queries in Mongoose, consider revisiting your schema design. While embedding data isn’t a one-size-fits-all solution, it can be a game-changer in scenarios like mine. And remember, always benchmark your changes to measure the impact.
Happy coding!