In this blog post, we’ll look at what MySQL does under the hood to transform NoSQL requests into SQL (and then store the data in the transactional InnoDB engine) when using the X Plugin.
X Plugin allows MySQL to function as a document store: we don’t need to define a schema or use the SQL language, yet we still get a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly! 🙂
Alexander already wrote a blog post exploring how the X Plugin works, with some examples. In this post, I am going to show some more query examples and how they are transformed.
I have enabled the slow query log to see what is actually being executed when I run NoSQL queries.
Creating our first collection
We start the MySQL shell and create our first collection:
$ mysqlsh -u root --py
Creating an X Session to root@localhost:33060
No default schema selected.
[...]
Currently in Python mode. Use sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("people")
What is a collection in SQL terms? A table. Let’s check what MySQL does by reading the slow query log:
CREATE TABLE `people` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
As we correctly guessed, it creates a table with two columns. One is called “doc”, and it stores the JSON document. The second, named “_id”, is a stored generated column whose value is extracted from that JSON document. “_id” is used as the primary key, and if we don’t specify a value, MySQL will assign a random UUID every time we write a document.
So, the basics are clear.
- It stores everything inside a JSON column.
- Indexes are created on generated columns that extract data from that JSON. Every time we add a new index, a generated column is added. That means that under the hood, an ALTER TABLE runs, adding the column and the corresponding index.
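To illustrate that mechanism, here is a small Python sketch that builds the kind of ALTER TABLE a hypothetical createIndex call would need to run. This DDL was not captured from the slow log, and the generated column name is made up for illustration:

```python
# Sketch (assumption, not from the slow log): indexing a document field means
# adding a generated column that extracts it, then indexing that column.
def index_ddl(table, index_name, field, sql_type):
    # Hypothetical generated-column name; the real X Plugin uses its own scheme.
    col = "${}_{}".format(index_name, field)
    return (
        "ALTER TABLE `{0}` "
        "ADD COLUMN `{1}` {2} GENERATED ALWAYS AS "
        "(JSON_EXTRACT(doc, '$.{3}')) VIRTUAL, "
        "ADD INDEX `{4}` (`{1}`)"
    ).format(table, col, sql_type, field, index_name)

print(index_ddl("people", "age_idx", "Age", "INT"))
```

Dropping the index would then just be a matter of dropping the index and its backing generated column.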
Let’s run getCollections, which is similar to “SHOW TABLES” in the SQL world:
mysql-py> db.getCollections()
[
    <Collection:people>
]
This is what MySQL actually runs:
SELECT C.table_name AS name,
  IF(ANY_VALUE(T.table_type)='VIEW', 'VIEW',
    IF(COUNT(*) = COUNT(CASE WHEN (column_name = 'doc' AND data_type = 'json') THEN 1 ELSE NULL END)
      + COUNT(CASE WHEN (column_name = '_id' AND generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))') THEN 1 ELSE NULL END)
      + COUNT(CASE WHEN (column_name != '_id' AND generation_expression RLIKE '^(json_unquote[[.(.]])?json_extract[[.(.]]`doc`,''[[.$.]]([[...]][^[:space:][...]]+)+''[[.).]]{1,2}$') THEN 1 ELSE NULL END),
    'COLLECTION', 'TABLE')) AS type
FROM information_schema.columns AS C
LEFT JOIN information_schema.tables AS T USING (table_name)
WHERE C.table_schema = 'test'
GROUP BY C.table_name
ORDER BY C.table_name;
This time, the query is a bit more complex. It queries information_schema.columns joined with information_schema.tables, searching for tables whose “doc” and “_id” columns match the collection layout.
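The classification logic boils down to this: a table counts as a COLLECTION only if every column is either the JSON “doc” column or a column generated from it. A simplified Python sketch of that rule (the real query’s RLIKE pattern is stricter about what counts as a generated column):

```python
# Simplified sketch of the COLLECTION-vs-TABLE check in the query above.
# columns: list of (name, data_type, generation_expression) tuples.
def classify(columns):
    for name, data_type, gen_expr in columns:
        if name == "doc" and data_type == "json":
            continue  # the JSON document column itself
        if gen_expr and "json_extract(`doc`" in gen_expr:
            continue  # a column generated from the document
        return "TABLE"  # anything else disqualifies the table
    return "COLLECTION"

people = [
    ("doc", "json", None),
    ("_id", "varchar", "json_unquote(json_extract(`doc`,'$._id'))"),
]
print(classify(people))                 # COLLECTION
print(classify([("id", "int", None)]))  # TABLE
```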
Inserting and reading documents
I am going to start adding data to our collection. Let’s add our first document:
mysql-py> db.people.add(
      ... {
      ...   "Name": "Miguel Angel",
      ...   "Country": "Spain",
      ...   "Age": 33
      ... }
      ... )
In the background, MySQL inserts a JSON object and auto-assigns a primary key value.
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Age',33,'Country','Spain','Name','Miguel Angel','_id','a45c69cd2074e611f11f62bf9ac407d7'));
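A rough Python sketch of what the shell has to do here: take the document, auto-assign a random 32-hex-character _id when none is given, and serialize it into the INSERT. The real plugin builds a JSON_OBJECT call and uses its own _id layout; uuid4 and json.dumps below are illustrative stand-ins:

```python
import json
import uuid

# Sketch: auto-assign an _id and serialize the document for insertion.
# (Assumption: the real X Plugin's _id format differs; uuid4().hex simply
# gives us 32 hex characters, matching the varchar(32) column.)
def build_insert(table, doc):
    doc = dict(doc)  # don't mutate the caller's document
    doc.setdefault("_id", uuid.uuid4().hex)
    return "INSERT INTO `{}` (doc) VALUES ('{}')".format(
        table, json.dumps(doc, sort_keys=True))

stmt = build_insert("people", {"Name": "Miguel Angel", "Country": "Spain", "Age": 33})
print(stmt)
```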
Ok, this is supposed to be schemaless. So let’s add someone else using different fields:
mysql-py> db.people.add(
      ... {
      ...   "Name": "Thrall",
      ...   "Race": "Orc",
      ...   "Faction": "Horde"
      ... }
      ... )
Same as before, MySQL just writes another JSON object (with different fields):
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Faction','Horde','Name','Thrall','Race','Orc','_id','7092776c2174e611f11f62bf9ac407d7'));
Now we are going to read the data we have just inserted. First, we are going to find all documents stored in the collection:
mysql-py> db.people.find()
MySQL translates to a simple:
SELECT doc FROM `test`.`people`;
And this is how filters are transformed:
mysql-py> db.people.find("Name = 'Thrall'")
It uses a SELECT with the WHERE clause on data extracted from the JSON object.
SELECT doc FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
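The translation for this simple equality filter can be sketched in a few lines of Python. This handles only the single pattern `Field = 'value'`, purely for illustration; the real plugin parses a much richer expression grammar:

```python
import re

# Sketch: translate a "Field = 'value'" NoSQL filter into the WHERE clause
# MySQL logged above. Only this one pattern is supported here.
def translate_filter(expr):
    m = re.match(r"^(\w+) = '([^']*)'$", expr)
    if not m:
        raise ValueError("unsupported filter: " + expr)
    field, value = m.groups()
    return "(JSON_EXTRACT(doc,'$.{}') = '{}')".format(field, value)

print(translate_filter("Name = 'Thrall'"))
# (JSON_EXTRACT(doc,'$.Name') = 'Thrall')
```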
Updating documents
Thrall decided that he doesn’t want to belong to the Horde anymore. He wants to join the Alliance. We need to update the document:
mysql-py> db.people.modify("Name = 'Thrall'").set("Faction", "Alliance")
MySQL runs an UPDATE, again using a WHERE clause on the data extracted from the JSON. Then, it updates the “Faction”:
UPDATE `test`.`people` SET doc=JSON_SET(doc,'$.Faction','Alliance') WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
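JSON_SET rewrites just the given path and leaves the rest of the document alone. A minimal Python imitation of that behavior, limited to top-level '$.key' paths (enough for this example):

```python
import json

# Sketch of JSON_SET semantics: set (or overwrite) one path in a JSON
# document, leaving every other field untouched. Top-level paths only.
def json_set(doc_text, path, value):
    if not path.startswith("$."):
        raise ValueError("only top-level '$.key' paths supported here")
    doc = json.loads(doc_text)
    doc[path[2:]] = value
    return json.dumps(doc)

doc = '{"Name": "Thrall", "Race": "Orc", "Faction": "Horde"}'
print(json_set(doc, "$.Faction", "Alliance"))
```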
Now I want to remove my own document:
mysql-py> db.people.remove("Name = 'Miguel Angel'")
As you can already imagine, it runs a DELETE, searching for my name on the data extracted from the JSON object:
DELETE FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Miguel Angel');
Summary
The magic that makes our MySQL work like a document-store NoSQL database is:
- Create a simple InnoDB table with a JSON column.
- Auto-generate the primary key with UUID values and expose it as a generated column.
- All searches are done by extracting data with JSON_EXTRACT and using it in the WHERE clause.
I would describe the solution as really clever, simple, and clean. Congrats to Oracle! 🙂
“I have enabled the slow query log to see what it is actually being executed when I run NoSQL queries.”
Why would you enable the slow query log (rather than the general query log)? Every NoSQL query is slow enough to show up in the slow log? 🙂
Hehe. To avoid all the “connect – disconnect – connect – disconnect” noise 😛
We like to set long_query_time=0; this also works better with a lot of tools.
thanks!
It seems that using a UUID as the primary key, without caveats, would negatively impact performance rather quickly.
http://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/
Hi Miguel,
Good post, thanks for that. I want to translate it into Russian. Please let me know if you have any objections.
No objection. Thanks 🙂
Here is the Russian translation of your article:
http://sqlinfo.ru/articles/info/29.html