Friday, October 18, 2013

Managing Large Transactional Tables Part I of III

As transactional tables grow, it becomes increasingly difficult to perform DDL operations on those tables, like adding columns or adding indexes. These operations take hours, and various gymnastics are to be performed to ensure tables are not locked and the application is available always. I will cover the "gymnastics" in part II, and this post will cover the sharding strategies.

Sharding is the primary strategy for scalability of the Internet giants, but should be implemented by smaller applications as well for operational efficiency. There are excellent articles on sharding available on the Internet, like this one: http://www.codefutures.com/database-sharding/.

In summary, sharding involves using multiple database instances, where the reference data are replicated in all instances and large transactional tables are horizontally partitioned across instances.

Reference data are smallish tables, not exceeding 100k rows per table, and typically much less - in 100s or 1000s. Transactional data are in tables that grow to 10s or 100s of millions of rows per table and keep growing day by day.

Key deciding factor is, based on what criteria transaction tables are to be partitioned. Key constraint is, no SQL query should span across database instances. Partitioned data, together with the reference data in an instance should be self contained to run the application.

In our case, which is a multi-tenant SaaS application, the decision is simple. Tenant account is the deciding criteria for partitioning. Each database instance holds transactional data for a group of tenant accounts, and the grouping is currently based on geographies.

Currently we are targeting to keep about 10-20 million rows per table per database instance. By the way, we do need certain relational database features in our application like joins (both inner and outer), aggregation, sorting and searching based on different columns, sub-queries, etc., so NoSQL databases are not an option.

No comments:

Post a Comment