Breaking an Endless Cycle - Huge Migrations Done Right

Oct 23, 2022 · 6 min read

Updated: Feb 6, 2023

A few months ago my team and I made a brave decision - we decided to migrate our entire API layer to a new, better API that was much better suited to our future needs. I've been working at Wix Stores for about 4 years, and that decision was the biggest one I have ever faced.

Photo by Patrick Hendry on Unsplash

But, before we start, a little bit about Wix Stores.

We are responsible for all the store-related (eCommerce) services on your Wix site - basically, everything you need to enable shopping on your website, from handling your product inventory and your site's store widgets (like the product gallery or cart) to your checkout process and the user's order.

Why am I telling you about it?

Though it might sound simple, “enabling shopping on your site” can sometimes get complicated. For example:

Meet David. He is a yoga instructor who has a yoga studio. He created a website to enable registrations for his yoga classes - just a simple booking service. So when Maya, a yoga student, picks a yoga class, we need to handle her selected class until payment is received from her (sounds like a “cart” maybe?). Then, when she completes the checkout process, she needs to get a confirmation email for her yoga class registration - and we will have to handle that too. Sounds familiar? Like any other order handling and product checkout.

But let’s spice things up. David also wants to start selling yoga mats on his site. Now sometimes we will have to handle two different entities in the same checkout - a booking service and a product purchase.

As obvious as it sounds, to support it we will need to create one checkout service which could handle all of the different entities - and that was exactly where I came in.

We worked on creating a new checkout service with a new API that could support all the different entities that exist out there. One of the Wix principles is “eat your own dog food” - meaning you should be the first user of what you build. Thus, the first “big” user of the new API would be ourselves - Wix Stores.

My task was simple: replace the current API with the new API. Essentially, I had to migrate all of the API usage in our client-side projects.

Sounds easy, no? Just replace X with Y.

Well…

Meet the checkout service

Let's first talk about the importance of the checkout service. Checkout is an integral part of every store - you basically can’t earn a single dollar without it. What does that mean for us as developers? As checkout is the heart of every store, we cannot afford even a short downtime or a single mistake - it would affect our users’ earnings straight away.

Now that we understand the importance of the checkout service, let's jump into the migration process.

The migration process

My first instinct was to jump straight into the code and add a feature toggle around each API call, just to add support for the new API. Sounds good and simple, right?
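To show why that instinct is risky, here is a minimal sketch of what a toggle at every call site might look like. The object shapes, the `CheckoutApis` interface and the `useNewApi` flag are all hypothetical names made up for illustration; the real checkout objects are far larger.

```typescript
// Hypothetical, heavily simplified shapes of the two checkout payloads.
interface LegacyCheckout {
  cartTotal: number;
}
interface NewCheckout {
  totals: { total: number };
}

// Hypothetical client-side access to both APIs.
interface CheckoutApis {
  getLegacy(id: string): LegacyCheckout;
  getNew(id: string): NewCheckout;
}

// Every call site has to branch on the toggle AND know both payload shapes.
function getTotal(apis: CheckoutApis, id: string, useNewApi: boolean): number {
  if (useNewApi) {
    return apis.getNew(id).totals.total;
  }
  return apis.getLegacy(id).cartTotal;
}
```

Multiply this branch by every place the client reads the checkout object, and the size of the change becomes clear.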

But before I rushed into it, I took a deep breath and thought about some important points regarding the code environment.

Here are some of the details I knew about our legacy checkout API usage:

We control everything in client-side services
We don’t have a mapper layer for API data from the server. Let me try to explain what a mapper layer is in one sentence: it's a layer that is in charge of transferring data from a persistent data store to an in-memory data representation while keeping the two isolated from each other. In our case, it is the data we transfer from the server to the client.
The checkout object is huge and complex - it holds the data needed to complete a purchase
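To make the mapper idea concrete, here is a small sketch of what such a layer could look like. The wire format (`ServerCheckoutDto`) and the client model (`CheckoutModel`) are invented for this example and are not the real Wix Stores types.

```typescript
// Hypothetical wire format as it arrives from the server.
interface ServerCheckoutDto {
  checkout_id: string;
  line_items: { product_name: string; unit_price: string; qty: number }[];
}

// Hypothetical in-memory model the client code works with.
interface CheckoutModel {
  id: string;
  items: { name: string; price: number; quantity: number }[];
  total: number;
}

// The mapper is the only place that knows about both shapes.
function mapCheckout(dto: ServerCheckoutDto): CheckoutModel {
  const items = dto.line_items.map((lineItem) => ({
    name: lineItem.product_name,
    price: Number(lineItem.unit_price), // the wire format sends prices as strings
    quantity: lineItem.qty,
  }));
  const total = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  return { id: dto.checkout_id, items, total };
}
```

With a layer like this, a change in the server payload only touches the mapper. Without it, as in our case, every consumer of the checkout object depends on the server's shape directly.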

The heart of the entire project is a huge and complex object - which we will need to replace.

One more detail before we continue: the new API doesn’t have any side effects that could change the data we consume in the checkout client.

The cycle of migration

I’ve always seen the migration process as a cycle, so let me introduce the cycle of migration:

First of all, it all starts with a new feature toggle, then:

Split the code - new and old API support
Test all new scenarios
Pass the QA checks
Open to users

But wait, something went wrong! We got new user complaints! So now we need to start the whole cycle again…

Linear migration

And now I want to tell you how I managed to avoid this hellish cycle, and even make the migration process faster, safer, and simpler.

Let me introduce linear migration. To avoid the cycle, we will have 4 phases in the migration:

Legacy phase (current production stage)
Compare and return legacy
Compare and return new
New (feature toggle is now open in production)
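The four phases can be sketched as a small lookup that says which API is called and which result is returned. The phase names and the `PhasePlan` shape below are my own hypothetical naming for illustration, not the actual implementation.

```typescript
// Hypothetical identifiers for the four phases listed above.
type MigrationPhase =
  | "legacy"
  | "compareReturnLegacy"
  | "compareReturnNew"
  | "new";

interface PhasePlan {
  callLegacy: boolean; // is the legacy API called in this phase?
  callNew: boolean; // is the new API called in this phase?
  compare: boolean; // are both results sent for comparison?
  returns: "legacy" | "new"; // which result the client receives
}

function planFor(phase: MigrationPhase): PhasePlan {
  switch (phase) {
    case "legacy":
      return { callLegacy: true, callNew: false, compare: false, returns: "legacy" };
    case "compareReturnLegacy":
      return { callLegacy: true, callNew: true, compare: true, returns: "legacy" };
    case "compareReturnNew":
      return { callLegacy: true, callNew: true, compare: true, returns: "new" };
    case "new":
      return { callLegacy: false, callNew: true, compare: false, returns: "new" };
  }
}
```

Moving from one phase to the next only flips one of these flags at a time, which is what keeps the migration linear.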

What do all those phases even mean? I’ll explain it step by step and together we will achieve the goal of completing a linear migration.

First, to avoid a huge client code change (it could get up to an entire project rewrite), we will add a Frontend server (FES) - a server that will serve as an API bridge from the old API to the new API. The checkout client will call the FES, which will then be in charge of calling the new API.

The FES will hold a layer of mappers, so the client side won’t need any code change - we will return the new object in the shape of the legacy object, which makes the switch seamless to the client. Each phase will be responsible for deciding whether to call the legacy or the new API.
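As a rough sketch of the bridge, the FES could fetch the new object and hand it back in the legacy shape. All field names, `toLegacy`, `fesGetCheckout` and the injected `fetchNew` function are assumptions for illustration only.

```typescript
// Hypothetical shape the checkout client already understands.
interface LegacyCheckout {
  cartId: string;
  totalPrice: number;
  buyerEmail: string;
}

// Hypothetical shape returned by the new API.
interface NewCheckout {
  id: string;
  priceSummary: { total: number };
  buyer: { email: string };
}

// Mapper living in the FES: new object in, legacy-shaped object out.
function toLegacy(checkout: NewCheckout): LegacyCheckout {
  return {
    cartId: checkout.id,
    totalPrice: checkout.priceSummary.total,
    buyerEmail: checkout.buyer.email,
  };
}

// Hypothetical FES endpoint logic: the client keeps receiving LegacyCheckout.
async function fesGetCheckout(
  id: string,
  fetchNew: (id: string) => Promise<NewCheckout>,
): Promise<LegacyCheckout> {
  return toLegacy(await fetchNew(id));
}
```

The client never sees `NewCheckout`, so none of its code has to change while the server side is being swapped.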

Another key factor in achieving the linear migration is the data - with the help of the FES, we will be able to return both a new API object and a legacy object in the same structure. This brings us to the main part of linear migration - real data comparison.

The secret ingredient of linear migration is real data comparison: to be certain that our new API is ready for production, we will use data comparison as our guide.

Wait, what does that mean?

The compare phase

We will send both objects for a comparison before returning one of them to the client.

As you saw above, there are two compare phases. In each of them, we will call both APIs but will return only one result as the response. Both results will be sent to a comparable engine.
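A compare phase might be sketched like this. The `CompareDeps` interface and its function names are hypothetical, and swallowing failures of the non-returned ("shadow") call is my own assumption, added so the comparison can never break a real checkout.

```typescript
type ComparePhase = "compareReturnLegacy" | "compareReturnNew";

// Hypothetical dependencies injected into the FES handler.
interface CompareDeps<T> {
  callLegacy: () => Promise<T>;
  callNewMapped: () => Promise<T>; // new API result, already mapped to the legacy shape
  report: (legacy: T, mapped: T) => void; // hands the pair to the comparable engine
}

type Settled<T> = { ok: true; value: T } | { ok: false };

async function handleComparePhase<T>(
  phase: ComparePhase,
  deps: CompareDeps<T>,
): Promise<T> {
  const returnLegacy = phase === "compareReturnLegacy";
  // Start both calls right away so they run concurrently.
  const legacyCall = deps.callLegacy();
  const newCall = deps.callNewMapped();
  const primary = returnLegacy ? legacyCall : newCall;
  const shadow = returnLegacy ? newCall : legacyCall;
  // Assumption: a failing shadow call must not affect the response.
  const shadowSettled = shadow.then(
    (value): Settled<T> => ({ ok: true, value }),
    (): Settled<T> => ({ ok: false }),
  );
  const primaryResult = await primary; // errors here propagate as usual
  const shadowResult = await shadowSettled;
  if (shadowResult.ok) {
    const legacy = returnLegacy ? primaryResult : shadowResult.value;
    const mapped = returnLegacy ? shadowResult.value : primaryResult;
    try {
      deps.report(legacy, mapped);
    } catch {
      // A broken comparison is ignored; the client still gets its answer.
    }
  }
  return primaryResult;
}
```

Note that this sketch waits for the shadow call before responding. A real service would probably report in the background or add a timeout so the extra call does not slow the client down.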

With the help of the comparable engine, we will be able to tell what’s missing and what needs to be fixed in the new API. Our goal will be to reach 100% equality between both objects; that way we will be certain that the new API is ready to be released to our users.

There will be cases where the two APIs’ payloads differ. We can handle them with a couple of simple approaches.

When we encounter a missing field - for example, the new API has a new field that the old API doesn't have - we can simply ignore it in the comparable engine, or add defaults to cover the missing field during comparison.
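Both approaches, ignoring a field and filling in a default, can be sketched as options of a small recursive diff. This is only an illustration of the idea, not the actual comparable engine; `diffPaths`, `CompareOptions` and the dot-separated path format are all assumptions.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface CompareOptions {
  ignorePaths: string[]; // dot-separated paths skipped entirely
  defaults: { [path: string]: Json }; // value assumed when one side lacks the field
}

function isPlainObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the paths at which the two payloads differ.
function diffPaths(
  legacy: Json | undefined,
  mapped: Json | undefined,
  options: CompareOptions,
  path: string = "",
): string[] {
  if (options.ignorePaths.indexOf(path) !== -1) {
    return [];
  }
  const fallback = options.defaults[path];
  const left = legacy === undefined ? fallback : legacy;
  const right = mapped === undefined ? fallback : mapped;
  if (isPlainObject(left) && isPlainObject(right)) {
    const rightOnly = Object.keys(right).filter((key) => !(key in left));
    const keys = Object.keys(left).concat(rightOnly);
    const differences: string[] = [];
    keys.forEach((key) => {
      const childPath = path === "" ? key : path + "." + key;
      differences.push(...diffPaths(left[key], right[key], options, childPath));
    });
    return differences;
  }
  // Leaves and arrays are compared by their JSON text (order-sensitive).
  return JSON.stringify(left) === JSON.stringify(right) ? [] : [path];
}
```

An empty result means the two objects are equal under the configured exceptions. Anything else is a list of fields to fix in the new API or in the mappers.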

As we all know, nothing is stronger as a source of truth than real production data - we will recruit the power of millions of users who use our systems to make sure our migration is done well, without any chance of hurting them in any way.

By creating the compare phases we moved the “Pass the QA” and “Open to users” phases in the cycle to be part of the development process and entirely got rid of the “incident” part.

Furthermore, when we get to 100% parity, we get a level of certainty that we would have never been able to accomplish before, which means we will be able to open the feature toggle, and behind the scenes, our users will get the new API while we will be able to sleep well at night.
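Tracking that parity can be as simple as counting comparisons. The `ParityTracker` class and the idea of a minimum sample size before moving on are my own illustrative assumptions, not something the real system is known to use.

```typescript
// Hypothetical counter fed by the comparison results.
class ParityTracker {
  private total = 0;
  private matching = 0;

  // diffCount is the number of differing fields found in one comparison.
  record(diffCount: number): void {
    this.total += 1;
    if (diffCount === 0) {
      this.matching += 1;
    }
  }

  // Share of comparisons with no differences, between 0 and 1.
  parity(): number {
    return this.total === 0 ? 0 : this.matching / this.total;
  }

  // Assumption: only advance after enough samples AND full parity.
  readyToAdvance(minSamples: number): boolean {
    return this.total >= minSamples && this.matching === this.total;
  }
}
```

A single mismatch keeps `readyToAdvance` false, which matches the goal of reaching 100% before opening the feature toggle.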

Let's go briefly through all the steps again:

We created the FES - a server that serves as an API bridge from the old API to the new API
We moved the checkout client to work with the FES instead of working directly with the server API
We created 4 steps to pass in order to migrate fully and safely: the legacy phase, two compare phases, and the new phase
The FES will hold a layer of mappers, so the client side won’t need any code changes
In order to progress along the steps, we will use a comparable engine, with 100% equality as our goal
We will open the new feature toggle to our users - the migration will be done!

Conclusion

To conclude, linear migration is based on data, and lots of it.

We used the biggest challenge in a huge migration - supporting millions of users while it lasts - as our biggest leverage. Nothing is better than a huge chunk of real data as our production-readiness guide.

Our goals were to refactor as little as possible, migrate as fast as possible, and keep production safe. And with the help of linear migration we’ve done it.

This post was written by Ariel Livshits
