Migration Strategy From MySQL to ScyllaDB

Rickhy Siswanto
6 min read · Nov 6, 2023


I want to share a short story from my time at my former company. Some clients complained that the application was very slow when they tried to download report data, and when my colleague investigated the problem, it turned out the database was not working: it was down. Hmm, was it because we used the wrong indexing strategy? Or because we didn't cluster the database?

In other cases, my company also used MongoDB to split off and record other data, but it was sometimes still very slow and the problem remained unsolved. This was because, at the start of the project, my old company did not set up a database cluster or separate data into a new database for each new client, even though using multiple databases can improve performance and spread the load. The data kept growing until we had collected more than 30 million records. That was bound to happen, because we had several clients in the government and in big companies in my country, and each client had on average more than one hundred thousand customers.

If I ever face the same problem again, I would like to be prepared to prevent it. So I currently want to learn how to mitigate it, and since this is just a learning exercise for me, I decided to explore moving to a modern database like ScyllaDB. Many of us have heard stories about its performance, lower latency, and scalability, and from what I hear, many companies now want to try migrating their databases to ScyllaDB. It is neither a relational database nor a document store, so I have to change my paradigm about it.

Strategy

There are two very popular strategies called Cold and Hot Migration, also commonly called Offline and Live Migration. Cold migration involves transferring data or services from one system to another while the source system is temporarily shut down. It usually requires a planned downtime window during the migration process.

Use Cases

  • Non-critical applications or systems with scheduled maintenance windows.
  • Migrating legacy systems to new hardware or platforms.
  • Data center relocations that allow for planned downtime.
source: https://university.scylladb.com/courses/scylla-operations/lessons/migrating-to-scylla/topic/migrating-to-scylla/

Hot migration involves transferring data or services from one system to another while the source system is still running and actively serving users or applications. It usually implies minimal to no downtime during the migration process.


Use Cases:

  • Mission-critical applications that cannot afford downtime.
  • Continuous service availability during data or resource migration.
  • Cloud computing environments where resources can be dynamically allocated and migrated while applications are active.

Planning or Preparation

Before starting the migration process, plan to back up the database and clean stale data out of the current one. Then identify the schema, the data size, and the specific needs and constraints of the application. ScyllaDB uses a wide-column store data model, which is quite different from MySQL's relational model, so you are required to design a schema that fits your data requirements in ScyllaDB. Consider partitioning, denormalization, and data modeling best practices.
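To make the schema-mapping step concrete, here is a small Python sketch of my own (a hypothetical helper, not part of any official tool) that maps a few common MySQL column types to CQL types and builds a `CREATE TABLE` statement. The type table is a simplification for illustration only:

```python
# Hypothetical helper: map common MySQL column types to CQL types.
# This mapping is a simplification for illustration only.
MYSQL_TO_CQL = {
    "int": "int",
    "bigint": "bigint",
    "varchar": "text",
    "text": "text",
    "datetime": "timestamp",
    "decimal": "decimal",
}

def build_create_table(keyspace, table, columns, partition_key):
    """Build a CQL CREATE TABLE statement from (name, mysql_type) pairs."""
    defs = []
    for name, mysql_type in columns:
        base = mysql_type.split("(")[0].lower()  # strip e.g. "varchar(255)"
        defs.append(f"{name} {MYSQL_TO_CQL.get(base, 'text')}")
    cols = ", ".join(defs)
    return (f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
            f"({cols}, PRIMARY KEY ({partition_key}));")

print(build_create_table(
    "inventory", "customers",
    [("id", "int"), ("first_name", "varchar(255)"), ("last_name", "varchar(255)")],
    "id"))
```

In a real migration the partition key deserves the most thought: it should match your application's query patterns, not just mirror the MySQL primary key.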

Data Extraction and Synchronization

Use the MySQL command-line utility or any MySQL client to export data from your MySQL database tables to CSV files. Replace the placeholders in the command with your actual database and table names. For example, say I have customer data like the following:

mysql> SELECT * FROM customers;
+------+------------+-----------+
| id   | first_name | last_name |
+------+------------+-----------+
| 1001 | Sally      | Thomas    |
| 1002 | George     | Bailey    |
| 1003 | Edward     | Walker    |
+------+------------+-----------+
$ mysql -u username -p database-name -e "SELECT * FROM customers" | sed 's/\t/","/g;s/^/"/;s/$/"/' > customers.csv
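The sed one-liner above simply wraps tab-separated fields in quotes. As a sketch of the same conversion in Python, the standard `csv` module also escapes quotes and commas that appear inside field values, which the sed approach does not handle:

```python
import csv
import io

# Sketch: convert tab-separated query output to properly quoted CSV.
# Unlike a plain sed substitution, csv.writer escapes embedded quotes
# and commas inside field values.
def tsv_to_csv(tsv_text):
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for line in tsv_text.splitlines():
        writer.writerow(line.split("\t"))
    return out.getvalue()

rows = "id\tfirst_name\tlast_name\n1001\tSally\tThomas"
print(tsv_to_csv(rows))
```

This matters for real customer data, where names and addresses often contain commas.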

Repeat this process for each table you want to migrate, creating a separate CSV file for each. Then create a keyspace and a table in ScyllaDB that match the schema of the MySQL table you are migrating:

CREATE KEYSPACE IF NOT EXISTS inventory WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS inventory.customers (
    id int PRIMARY KEY,
    firstname text,
    lastname text
);

To import data into ScyllaDB from CSV, we can use the CQL COPY command to load the CSV files. However, COPY has the disadvantage that it does not work well on large data sets.

COPY inventory.customers (id, firstname, lastname) FROM '/path/to/customers.csv' WITH HEADER = TRUE;
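For data sets too large for COPY, one common alternative is to stream the CSV in fixed-size batches and insert through a driver. The sketch below shows only the batching logic; the actual insertion (shown as comments) would use the DataStax Python driver (`cassandra-driver`, which also works with ScyllaDB) against a live cluster:

```python
import csv
from itertools import islice

# Sketch: stream a large CSV in fixed-size batches instead of loading
# it all at once. Insertion itself would go through the Python driver
# against ScyllaDB; that part is commented out because it needs a
# running cluster.
def batches(reader, size=500):
    """Yield lists of up to `size` rows from any row iterator."""
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk

# with open("customers.csv") as f:
#     reader = csv.reader(f)
#     next(reader)  # skip the header row
#     for chunk in batches(reader):
#         for row in chunk:
#             session.execute(insert_stmt, row)  # prepared statement
```

Keeping batches small and using prepared statements avoids overloading a single coordinator node during the bulk load.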

If you want to use Hot Migration, I recommend using Kafka; you can check out Debezium. Debezium is an open-source distributed platform for change data capture, and it provides Docker container images to run the required services. Configure Debezium to connect to your MySQL database by creating a Debezium connector configuration file. An example configuration, where `database.hostname` refers to the MySQL container name, might look like this:

{
  "name": "mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "root",
    "database.password": "debezium",
    "database.server.id": "184054",
    "topic.prefix": "dbserver1",
    "database.include.list": "inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory"
  }
}

To communicate with the Kafka Connect service, you can use the curl command to send API requests:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d @mysql-connector.json

After that, create a connector configuration file for ScyllaDB according to your needs. This file specifies the ScyllaDB connection details and the topic from which changes will be consumed. Both systems need to be fed at the same time while the migration is happening.

Testing and Validation

During replication, you should validate the newly inserted data along with the migrated data, to ensure data integrity and the proper functioning of applications. This involves several key aspects and best practices for a successful migration. Here is a deeper explanation of data validation and testing:

  1. Ensure that the schema of the migrated data in ScyllaDB matches the schema of the source MySQL database. Differences in data types, column names, and primary keys can lead to data inconsistencies.
  2. Compare the data in ScyllaDB with the data in MySQL to verify consistency. We can use SQL queries to retrieve records from both databases and compare them for discrepancies.
  3. Validate that the number of records in ScyllaDB matches the number of records in MySQL. This helps ensure that no data has been lost during the migration.
  4. Check for data quality issues, such as missing values, incorrect data types, or data outliers, which can affect application functionality.
  5. Assess the performance of queries against ScyllaDB. Ensure that any indexes or materialized views are created and configured correctly to support our application’s query patterns.
  6. Test the application using ScyllaDB as the backend database. Ensure that all application features and functionalities work as expected.
  7. Conduct benchmark tests to evaluate the performance of ScyllaDB compared to MySQL. This helps us understand the system’s capacity and whether it meets the application’s performance requirements.
  8. Develop a rollback plan in case issues are discovered during testing. This plan should allow us to revert to the original MySQL database if the migration to ScyllaDB encounters unexpected problems.
  9. If applicable, involve end-users in the testing process. Conduct user acceptance testing to ensure that the application meets their expectations and requirements.
  10. Testing is an iterative process. Reiterate the validation and testing steps as needed, especially after making adjustments to the migration process based on initial test results.
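Points 2 and 3 above can be sketched as a small comparison script. In practice the two row lists would come from `SELECT` queries against MySQL and ScyllaDB; here they are stubbed in memory just to show the count-and-checksum logic:

```python
import hashlib

# Sketch: compare row counts and per-row checksums between the source
# and target databases. Rows are stubbed in memory; a real run would
# fetch them from MySQL and ScyllaDB.
def row_digest(row):
    """Stable checksum of one row, independent of the source database."""
    joined = "|".join(str(v) for v in row)
    return hashlib.sha256(joined.encode()).hexdigest()

def compare(mysql_rows, scylla_rows):
    if len(mysql_rows) != len(scylla_rows):
        return False, "row count mismatch"
    a = sorted(row_digest(r) for r in mysql_rows)
    b = sorted(row_digest(r) for r in scylla_rows)
    return (a == b), ("checksums match" if a == b else "checksum mismatch")

ok, msg = compare([(1001, "Sally", "Thomas")], [(1001, "Sally", "Thomas")])
print(ok, msg)
```

Sorting the digests makes the comparison order-independent, which matters because ScyllaDB does not return rows in MySQL's insertion order.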

It’s essential to thoroughly plan and execute data validation and testing to minimize the risk of data loss, ensure data consistency, and avoid application downtime. After the migration succeeds, we can update the application’s configuration to point to ScyllaDB as the primary database and shut down the legacy database. Next, I want to learn about scaling and monitoring ScyllaDB.

Reference:

Tomer Sandler. (2019). Migrating to Scylla [Video]. YouTube. https://youtu.be/i0F1QOSDUg8

Debezium. (2023). Debezium. Change data capture for a whole new level of applications. https://debezium.io/

COPY. (n.d.). DataStax CQL reference. https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/cqlshCopy.html
