TL;DR:
Database changes are different from code changes and require more planning. Especially when you do a gradual migration with feature toggles. In this article I’ll talk about the things you should know before doing the migration.
This photo by Joshua Sortino on Unsplash
Feature Toggles in Wix
At Wix we all work on the master branch (main) and with full CI/CD so our production environment is constantly integrated. It's definitely part of our development and deployments. Feature toggles have many great advantages like internal testing, gradual rollout, and more (you can read more about how we use it in this article).
But when it comes to DB changes, it can be tricky.
Our use case
In our case, we have a standard MySQL table with one column as primary key. For a certain business need, our goal is to change the PK to a composite primary key consisting of two columns. And guess what? We are going to use a feature toggle.
Our table before the migration
First, let’s create a table for customers:
Our query will look like this:
Now we can write some code
As mentioned above, our goal is to migrate the “customer” table into a composite key table by adding the customer_id as primary key as well as session (which is already a PK). Before making the DB changes, we want to add the feature toggle. So when the toggle is “OFF” it will query by the “session” while when the toggle is “ON” it will query by both customer_id and session.
Let’s create two different queries in Java, and use them based on the toggle:
Naive Solution
Let’s run the following statement:
Take a moment and see if you can notice the problems above…
For the following example, I inserted ~13 million dummy rows using this script.
I used “Explain Analyze” (IntelliJ as my MySQL client), which will execute the query and provide us with information about the plan and cost.
Problem #1
When feature toggle is “ON”
We’ll query both session and customer_id columns:
The output will be:
Great!
The number of rows that will be fetched is 1 row (the query took ~50 ms).
When the feature toggle is “OFF”
We will query only by the session column:
Then the output will be:
And BOOM!!!
Image by Gerd Altmann from Pixabay
We can see that the query will require a full table scan on the table and the query took ~4.2 seconds!
Why?
Because we created the wrong order of the primary key.
Instead of
When you define a PRIMARY KEY on a composite key table, InnoDB uses it as a clustered index. Which will define how the order of the data is physically stored. This often saves a disk I/O operation when compared to a non-clustered index. Therefore, the order of the PK definition is super important (as well as when defining indexes).
If we define the following
and then query by B, it will then perform a full table scan which may lead to potential performance issues.
However if we define the following
and query by B (or both by A and B together) we will then avoid a full table scan.
Problem #2
We made the migration in such a destructive way that now we don’t actually have a rollback plan.
Once we DROP PRIMARY KEY, we can’t rollback with our feature toggle and it requires us to make another DB migration statement.
This might take time and in the meantime our users may experience performance issues.
Possible Solution
So far our code is ready. We have set a feature toggle so we can deploy our changes and support backward compatibility. The only thing that remains is the DB changes.
For that, a possible solution is to add a temporary secondary index on the session column, And after fully moving to the new feature we can remove the index.
Now we can fully test our feature with both scenarios when FT is ON/OFF.
And then remove the index and make our DB changes.
Conclusion
It's worth mentioning that it’s not only about the queries, since we also have inserts and other database operations.
We learned how our performance might vary depending on how we defined our primary key on multiple columns tables. However, what you should take from here is that you must take extra planning when doing DB changes with feature toggles.
You must do the migration in a non-destructive way so you’ll have a rollback plan.
It’s better to use the “EXPLAIN” keyword to show its statement execution plan, And see how our queries will be affected before and after the migration.
By knowing that, we can take more precautionary steps during development and understand the impact of our software when it comes to database migration and feature toggles.
P.S
At Wix we use gh-ost, an OS util, to alter tables in a non-blocking way.
Want to read more about DB changes at Wix? Read this interesting article by Stas Wishnevetsky.
This post was written by Idan Dagan
You can follow him on Twitter
For more engineering updates and insights:
Join our Telegram channel
Visit us on GitHub
Subscribe to our YouTube channel