So the easy solution is to add indexes to everything, then just observe whether they're being used. One problem with the "index everything" strategy is that indexes can take up lots of disk space, and they also increase the number of writes to the database. But, as we discovered here at Rap Genius, when you have a large-scale app, that's not all that can go wrong.
I recently learned this technique from rosser, who shared it with everyone on Hacker News. Hopefully you've gleaned a thing or two from this post and are now ready to venture out and write some robust and awesome data migrations of your own. Voilà: sensible defaults, a better shell, and a better overall Postgres experience.
Here's a concrete example of how we can safely upgrade our most trusted "moderator" users to the "admin" role on a large internet forum, without contending with concurrent writes or incurring substantial I/O all at once. The `FOR UPDATE` clause immediately locks the rows retrieved in the first step (as if they were about to be updated), which prevents them from being modified or deleted by other transactions until the current transaction ends. I tried to keep this post fairly concise, but for the boldest of explorers, there is an excellent wiki page containing many other Postgres gems that you also may not have known about.
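Here is a minimal sketch of such a batched migration. The table and column names (`users`, `role`, `trusted`) are illustrative assumptions, not from the original post; the key idea is that `FOR UPDATE` in the inner `SELECT` locks only the current batch of rows, and each small transaction commits before the next batch begins:

```sql
-- Promote trusted moderators to admin in batches of 1000.
-- Run this repeatedly (e.g. from a script) until it updates 0 rows.
BEGIN;

UPDATE users
SET role = 'admin'
WHERE id IN (
    SELECT id
    FROM users
    WHERE role = 'moderator'
      AND trusted = true
    LIMIT 1000
    FOR UPDATE  -- lock just this batch against concurrent writes
);

COMMIT;
```

Because each batch touches at most 1000 rows, the migration never holds long-lived locks on the whole table and never generates one enormous burst of I/O.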
A single index can wreak tremendous havoc with PostgreSQL's query planner, which we've found sometimes operates unpredictably.
To avoid this pitfall, do your own analysis to make sure that your indexes match your usage patterns, and be sure to automatically monitor slow queries, which might crop up at any moment.
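One concrete way to check whether your indexes match your usage patterns is to ask Postgres's statistics views which indexes are never scanned. This is a standard diagnostic query (not from the original post) against `pg_stat_user_indexes`:

```sql
-- Indexes that have never been scanned since statistics were last reset,
-- largest first: candidates for removal.
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Note that `idx_scan` counts scans since the stats were last reset, so give the application time to exercise its full query mix before trusting the result.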
In plain language, we're asking the database to find which annotation was updated most recently from a list of 9 candidates, and the database's response is, "Okay, let me get all annotations, order them by last-updated time using an index, then go through that whole ordered list to find the ones that match your 9 IDs." This is hugely inefficient, because there are millions of records in the table.
It might be fast to order the whole table by the timestamp column using its index, but scanning that entire ordered list for 9 matching IDs is not. Sorting without the index would be fine: there aren't that many results from the query (a few dozen at most), so there's no need to use an index to sort.
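Assuming a query shaped roughly like the one described (the table name `annotations` and column name `updated_at` are illustrative), the problem and one commonly cited workaround look like this:

```sql
-- The problematic shape: the planner may walk the updated_at index over
-- millions of rows, discarding everything not in the 9-ID list.
SELECT *
FROM annotations
WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9)
ORDER BY updated_at DESC
LIMIT 1;

-- Workaround: fetch the 9 rows by primary key first, then sort the tiny
-- result set. OFFSET 0 acts as an optimization fence that stops Postgres
-- from flattening the subquery back into the original plan.
SELECT *
FROM (
    SELECT *
    FROM annotations
    WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9)
    OFFSET 0
) AS candidates
ORDER BY updated_at DESC
LIMIT 1;
```

The inner query uses the primary key to grab a handful of rows, and sorting a handful of rows in memory is trivially cheap compared to an index scan over the whole table.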
If you plan to keep history for long periods (many months), I suggest having a look at partitioning options: maybe one partition for each day or week, and so on.
It also depends on the access patterns of your history table (do you run queries that access data across dates?). Have a look at materialized views for storing aggregates/summaries.
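As a minimal sketch, declarative range partitioning (available since Postgres 10) with one partition per week might look like this; the `history` table and its columns are hypothetical:

```sql
-- Parent table: rows are routed to child partitions by recorded_at.
CREATE TABLE history (
    id          bigserial,
    payload     jsonb,
    recorded_at timestamptz NOT NULL
) PARTITION BY RANGE (recorded_at);

-- One partition per week. Old weeks can later be detached or dropped
-- cheaply, instead of running a huge DELETE.
CREATE TABLE history_2024_w01 PARTITION OF history
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-08');

CREATE TABLE history_2024_w02 PARTITION OF history
    FOR VALUES FROM ('2024-01-08') TO ('2024-01-15');
```

The main payoff for a history table is retention: `DROP TABLE history_2024_w01` (or `ALTER TABLE history DETACH PARTITION ...`) removes a week of data instantly, with none of the bloat a mass `DELETE` would leave behind.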
In our case, the consequences of having the wrong index were much more severe.