When working with databases, it’s easy to overlook the impact of schema design on query performance. I learned this lesson the hard way. Here’s my story of how I optimized my Mongoose schema and aggregation logic to handle thousands of records efficiently.
The Problem: Slow Queries and Lookup Overhead
In my application, I needed to fetch `CollectionOrder` documents, which are linked to several other collections: `RetailStore`, `DeliveryMan`, and `CollectionCompany`. Each `CollectionOrder` had fields referencing these collections, which required multiple `$lookup` stages to fetch related data.
Initially, my schema design looked like this:
```typescript
@Prop({
  required: true,
  type: mongoose.Schema.Types.ObjectId,
  ref: 'RetailStore',
})
retailStore: mongoose.Schema.Types.ObjectId;

@Prop({
  required: true,
  type: mongoose.Schema.Types.ObjectId,
  ref: 'DeliveryMan',
})
deliveryMan: mongoose.Schema.Types.ObjectId;
```
This design seemed fine at first, but as my database grew, the queries became unbearably slow. Why? Because each `$lookup` operation introduced significant overhead, especially with over 10,000 `RetailStore` and `CollectionCompany` documents.
The Realization: Embedding Subschemas
It was clear that relying on `$lookup` for every query wasn't scalable. I needed a more efficient approach, one that would eliminate the need for additional lookups. That's when I decided to embed the related data directly into the `CollectionOrder` schema as subschemas.
Here’s how the updated schema looks:
```typescript
@Prop({ type: RetailStoreSchema })
retailStore: RetailStore;

@Prop({ type: DeliveryManSchema })
deliveryMan: DeliveryMan;

@Prop({ type: CollectionCompanySchema })
collectionCompany?: CollectionCompany;
```
Instead of storing just the IDs, I now store the entire `RetailStore`, `DeliveryMan`, and `CollectionCompany` objects. This means the data is denormalized, but the performance gains make it worthwhile.
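To make the shape change concrete, here is a dependency-free sketch of the document shapes before and after embedding. The field names follow the post; the interfaces and sample values are my own simplified assumptions (e.g. IDs modeled as strings rather than `ObjectId`):

```typescript
// Hypothetical, simplified document shapes illustrating the denormalization.
interface RetailStore {
  _id: string;
  name: string;
}

// Before: the order holds only a reference,
// so resolving the store requires a $lookup.
interface CollectionOrderNormalized {
  _id: string;
  retailStore: string;
}

// After: the full store document is embedded,
// so the related data travels with the order.
interface CollectionOrderEmbedded {
  _id: string;
  retailStore: RetailStore;
}

const store: RetailStore = { _id: "rs1", name: "Main Street Store" };
const order: CollectionOrderEmbedded = { _id: "co1", retailStore: store };

// The related data is available directly on the fetched document:
console.log(order.retailStore.name); // → "Main Street Store"
```

The trade-off is visible in the types: the embedded variant is larger per document, but a single read returns everything.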
The Benefits: Faster Queries, Simpler Aggregations
With the updated schema, I no longer need `$lookup` stages in my aggregations. When fetching `CollectionOrder` documents, all the related data is already embedded, making the queries significantly faster.
Here’s an example query before and after the change:
Before:
```json
[
  { "$lookup": { "from": "retailstores", "localField": "retailStore", "foreignField": "_id", "as": "retailStore" } },
  { "$unwind": "$retailStore" }
]
```
After:
```json
[
  { "$match": { "status": "active" } }
]
```
By embedding the data, the query complexity dropped dramatically. This change alone shaved seconds off my query execution time.
Challenges: Keeping Data Up-to-Date
Of course, this approach isn't without its challenges. One major drawback of embedding data is keeping it up to date. For instance, if the name of a `RetailStore` changes, I need to ensure that all related `CollectionOrder` documents are updated.
To handle this, I implemented update hooks in my `RetailStore` and `DeliveryMan` services. These hooks propagate changes to all embedded documents whenever a related field is updated.
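The original hook code isn't shown, but the propagation step can be sketched as a pure function. In a real Mongoose service this would typically be a single `updateMany` filtered on the embedded `_id` (noted in the comment below); everything here is an in-memory illustration with hypothetical names:

```typescript
interface RetailStore {
  _id: string;
  name: string;
}

interface CollectionOrder {
  _id: string;
  retailStore: RetailStore;
}

// Propagate an updated RetailStore into every order that embeds it.
// With Mongoose models, the same idea maps to something like:
//   orderModel.updateMany(
//     { "retailStore._id": updated._id },
//     { $set: { retailStore: updated } },
//   )
function propagateRetailStoreUpdate(
  orders: CollectionOrder[],
  updated: RetailStore,
): CollectionOrder[] {
  return orders.map((order) =>
    order.retailStore._id === updated._id
      ? { ...order, retailStore: { ...updated } }
      : order,
  );
}

const orders: CollectionOrder[] = [
  { _id: "co1", retailStore: { _id: "rs1", name: "Old Name" } },
  { _id: "co2", retailStore: { _id: "rs2", name: "Other Store" } },
];

const result = propagateRetailStoreUpdate(orders, { _id: "rs1", name: "New Name" });
// result[0] now embeds the renamed store; co2 is untouched.
```

Whether this propagation runs synchronously in the service or asynchronously (e.g. from a change stream or job queue) is a consistency trade-off worth deciding explicitly.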
My Takeaway
This experience taught me the importance of designing schemas with performance in mind. While normalization is often the default choice, it’s not always the best solution — especially for queries that require joining large collections.
Switching to embedded schemas transformed how my application handles data. Queries that used to take several seconds now return results in milliseconds. This optimization not only improved performance but also simplified my codebase.
Final Thoughts
If you’re facing similar challenges with slow queries in Mongoose, consider revisiting your schema design. While embedding data isn’t a one-size-fits-all solution, it can be a game-changer in scenarios like mine. And remember, always benchmark your changes to measure the impact.
Happy coding!