Rerunnable Data Update Scripts for MySQL/MariaDB

Published: {ts '2022-10-11 00:00:00'}
Author: Steven Neiland
Site Url: http://www.neiland.net/article/rerunnable-data-update-scripts-for-mysqlmariadb/

For the past year I've been working on bringing the various databases at my current job under version control. To do this we are using a product called liquibase in combination with Jenkins to auto deploy changes to our database. This removes any manual user interaction with our databases and allows us to keep code and database changes locked together.

Managing Reference Data Changes

When deploying changes to our databases it is not enough to just modify the data structures and functions. You also need to be able to version control certain table data. While we do not want liquibase modifying user data, we do however need to be able to update reference/lookup tables data ( e.g. a list of countries ) so that we can change that data inline with changing the code that may depend on said data during deployment.

This is easily achieved with liquibase as it allows us to automatically run sql scripts on the database if the contents of that script changes. All we need to do is write our sql scripts for these specific tables so that new records are inserted when they don't exist, otherwise update the existing records based on the records Primary Key.

Cannot use "REPLACE INTO"

When working with MySQL/MariaDB databases the seemingly obvious solution to making re-runnable data updates is the REPLACE statement to insert new data or replace existing data. For example:

REPLACE INTO countries ( id , name , iso ) VALUES ( 1 , 'Afghanistan', 'AF' ) ,( 2 , 'Albania', 'AL' ) ,( 3 , 'Algeria', 'DZ' ) ,( 4 , 'Andorra', 'AD' ) ,( 5 , 'Angola', 'AO' )

There is a problem with this approach however. Lets look at how this statement actually operates.

  1. Insert a new row into the table, if a duplicate key error occurs.
  2. If the insertion fails due to a duplicate-key error occurs:
    1. Delete the conflicting row that causes the duplicate key error from the table.
    2. Insert the new row into the table again.

The problem with this approach happens on step 2a. You will note that REPLACE does not UPDATE the record in question, it delete it. When this statement attempts the DELETE operation it will fail if there are any Foreign Key dependencies on the record being deleted.

Solution: "INSERT IGNORE"

The solution fortunately is relatively simple. The INSERT statement has a modifier IGNORE which allows us to add new records to our table ignoring any existing ones. When we pair this with the ON DUPLICATE KEY UPDATE modifier we achieve our goal of updating any existing records.

INSERT IGNORE INTO countries ( id , name , iso ) VALUES ( 1 , 'Afghanistan', 'AF' ) ,( 2 , 'Albania', 'AL' ) ,( 3 , 'Algeria', 'DZ' ) ,( 4 , 'Andorra', 'AD' ) ,( 5 , 'Angola', 'AO' ) ON DUPLICATE KEY UPDATE name = values(name) , iso = values(iso);

Using this pattern we can easily manage our reference tables data via version controlled sql scripts.