In this blog post, we’ll look at what MySQL does under the hood to transform NoSQL requests into SQL (and then store the data in the transactional InnoDB engine) when using the X Plugin.
X Plugin allows MySQL to function as a document store: we don’t need to define a schema or use the SQL language, yet we still get a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly! 🙂
Alexander already wrote a blog post exploring how the X Plugin works, with some examples. In this post, I am going to show some more query examples and how they are transformed.
I have enabled the slow query log to see what is actually being executed when I run NoSQL queries.
Creating our first collection
We start the MySQL shell and create our first collection:
$ mysqlsh -u root --py
Creating an X Session to root@localhost:33060
No default schema selected.
[...]
Currently in Python mode. Use sql to switch to SQL mode and execute queries.
mysql-py> db.createCollection("people")
What is a collection in SQL terms? A table. Let’s check what MySQL does by reading the slow query log:
CREATE TABLE `people` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
As we correctly guessed, it creates a table with two columns. One is called “doc”, and it stores the JSON document. The second, named “_id”, is a stored generated column whose value is extracted from that JSON document. “_id” is used as the primary key, and if we don’t specify a value, MySQL will assign a random UUID every time we write a document.
So, the basics are clear.
- It stores everything inside a JSON column.
- Indexes are created on generated columns that extract data from that JSON. Every time we add a new index, a generated column is added. That means that under the hood, an ALTER TABLE runs, adding the column and the corresponding index.
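To illustrate that mechanism, here is a small Python sketch that builds the kind of ALTER TABLE a hypothetical createIndex call would need to run. This DDL was not captured from the slow log, and the generated column name is made up for illustration:

```python
# Sketch (assumption, not from the slow log): indexing a document field means
# adding a generated column that extracts it, then indexing that column.
def index_ddl(table, index_name, field, sql_type):
    # Hypothetical generated-column name; the real X Plugin uses its own scheme.
    col = "${}_{}".format(index_name, field)
    return (
        "ALTER TABLE `{0}` "
        "ADD COLUMN `{1}` {2} GENERATED ALWAYS AS "
        "(JSON_EXTRACT(doc, '$.{3}')) VIRTUAL, "
        "ADD INDEX `{4}` (`{1}`)"
    ).format(table, col, sql_type, field, index_name)

print(index_ddl("people", "age_idx", "Age", "INT"))
```

Dropping the index would then just be a matter of dropping the index and its backing generated column.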
Let’s run getCollections, which is similar to “SHOW TABLES” in the SQL world:
mysql-py> db.getCollections()
[
    <Collection:people>
]
This is what MySQL actually runs:
SELECT C.table_name AS name,
  IF(ANY_VALUE(T.table_type)='VIEW', 'VIEW',
    IF(COUNT(*) = COUNT(CASE WHEN (column_name = 'doc' AND data_type = 'json') THEN 1 ELSE NULL END)
      + COUNT(CASE WHEN (column_name = '_id' AND generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))') THEN 1 ELSE NULL END)
      + COUNT(CASE WHEN (column_name != '_id' AND generation_expression RLIKE '^(json_unquote[[.(.]])?json_extract[[.(.]]`doc`,''[[.$.]]([[...]][^[:space:][...]]+)+''[[.).]]{1,2}$') THEN 1 ELSE NULL END),
    'COLLECTION', 'TABLE')) AS type
FROM information_schema.columns AS C
LEFT JOIN information_schema.tables AS T USING (table_name)
WHERE C.table_schema = 'test'
GROUP BY C.table_name
ORDER BY C.table_name;
This time, the query is a bit more complex. It queries information_schema.columns joined with information_schema.tables, searching for tables whose “doc” and “_id” columns match the collection layout.
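The classification logic boils down to this: a table counts as a COLLECTION only if every column is either the JSON “doc” column or a column generated from it. A simplified Python sketch of that rule (the real query’s RLIKE pattern is stricter about what counts as a generated column):

```python
# Simplified sketch of the COLLECTION-vs-TABLE check in the query above.
# columns: list of (name, data_type, generation_expression) tuples.
def classify(columns):
    for name, data_type, gen_expr in columns:
        if name == "doc" and data_type == "json":
            continue  # the JSON document column itself
        if gen_expr and "json_extract(`doc`" in gen_expr:
            continue  # a column generated from the document
        return "TABLE"  # anything else disqualifies the table
    return "COLLECTION"

people = [
    ("doc", "json", None),
    ("_id", "varchar", "json_unquote(json_extract(`doc`,'$._id'))"),
]
print(classify(people))                 # COLLECTION
print(classify([("id", "int", None)]))  # TABLE
```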
Inserting and reading documents
I am going to start adding data to our collection. Let’s add our first document:
mysql-py> db.people.add(
      ... {
      ...   "Name": "Miguel Angel",
      ...   "Country": "Spain",
      ...   "Age": 33
      ... }
      ... )
In the background, MySQL inserts a JSON object and auto-assigns a primary key value.
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Age',33,'Country','Spain','Name','Miguel Angel','_id','a45c69cd2074e611f11f62bf9ac407d7'));
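A rough Python sketch of what the shell has to do here: take the document, auto-assign a random 32-hex-character _id when none is given, and serialize it into the INSERT. The real plugin builds a JSON_OBJECT call and uses its own _id layout; uuid4 and json.dumps below are illustrative stand-ins:

```python
import json
import uuid

# Sketch: auto-assign an _id and serialize the document for insertion.
# (Assumption: the real X Plugin's _id format differs; uuid4().hex simply
# gives us 32 hex characters, matching the varchar(32) column.)
def build_insert(table, doc):
    doc = dict(doc)  # don't mutate the caller's document
    doc.setdefault("_id", uuid.uuid4().hex)
    return "INSERT INTO `{}` (doc) VALUES ('{}')".format(
        table, json.dumps(doc, sort_keys=True))

stmt = build_insert("people", {"Name": "Miguel Angel", "Country": "Spain", "Age": 33})
print(stmt)
```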
Ok, this is supposed to be schemaless. So let’s add someone else using different fields:
mysql-py> db.people.add(
      ... {
      ...   "Name": "Thrall",
      ...   "Race": "Orc",
      ...   "Faction": "Horde"
      ... }
      ... )
Same as before, MySQL just writes another JSON object (with different fields):
INSERT INTO `test`.`people` (doc) VALUES (JSON_OBJECT('Faction','Horde','Name','Thrall','Race','Orc','_id','7092776c2174e611f11f62bf9ac407d7'));
Now we are going to read the data we have just inserted. First, we are going to find all documents stored in the collection:
mysql-py> db.people.find()
MySQL translates to a simple:
SELECT doc FROM `test`.`people`;
And this is how filters are transformed:
mysql-py> db.people.find("Name = 'Thrall'")
It uses a SELECT with the WHERE clause on data extracted from the JSON object.
SELECT doc FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
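The translation for this simple equality filter can be sketched in a few lines of Python. This handles only the single pattern `Field = 'value'`, purely for illustration; the real plugin parses a much richer expression grammar:

```python
import re

# Sketch: translate a "Field = 'value'" NoSQL filter into the WHERE clause
# MySQL logged above. Only this one pattern is supported here.
def translate_filter(expr):
    m = re.match(r"^(\w+) = '([^']*)'$", expr)
    if not m:
        raise ValueError("unsupported filter: " + expr)
    field, value = m.groups()
    return "(JSON_EXTRACT(doc,'$.{}') = '{}')".format(field, value)

print(translate_filter("Name = 'Thrall'"))
# (JSON_EXTRACT(doc,'$.Name') = 'Thrall')
```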
Updating documents
Thrall decided that he doesn’t want to belong to the Horde anymore. He wants to join the Alliance. We need to update the document:
mysql-py> db.people.modify("Name = 'Thrall'").set("Faction", "Alliance")
MySQL runs an UPDATE, again using a WHERE clause on the data extracted from the JSON. Then, it updates the “Faction”:
UPDATE `test`.`people` SET doc=JSON_SET(doc,'$.Faction','Alliance') WHERE (JSON_EXTRACT(doc,'$.Name') = 'Thrall');
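JSON_SET rewrites just the given path and leaves the rest of the document alone. A minimal Python imitation of that behavior, limited to top-level '$.key' paths (enough for this example):

```python
import json

# Sketch of JSON_SET semantics: set (or overwrite) one path in a JSON
# document, leaving every other field untouched. Top-level paths only.
def json_set(doc_text, path, value):
    if not path.startswith("$."):
        raise ValueError("only top-level '$.key' paths supported here")
    doc = json.loads(doc_text)
    doc[path[2:]] = value
    return json.dumps(doc)

doc = '{"Name": "Thrall", "Race": "Orc", "Faction": "Horde"}'
print(json_set(doc, "$.Faction", "Alliance"))
```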
Now I want to remove my own document:
mysql-py> db.people.remove("Name = 'Miguel Angel'")
As you can already imagine, it runs a DELETE, searching for my name on the data extracted from the JSON object:
DELETE FROM `test`.`people` WHERE (JSON_EXTRACT(doc,'$.Name') = 'Miguel Angel');
Summary
The magic that makes our MySQL work like a document-store NoSQL database is:
- Create a simple InnoDB table with a JSON column.
- Auto-generate the primary key with UUID values and expose it as a generated column.
- All searches are done by extracting data with JSON_EXTRACT and using it in the WHERE clause.
I would describe the solution as really clever, simple, and clean. Congrats to Oracle! 🙂
“I have enabled the slow query log to see what it is actually being executed when I run NoSQL queries.”
Why would you enable the slow query log (rather than the general query log)? Every NoSQL query is slow enough to show up in the slow log? 🙂
Hehe. To avoid all the “connect – disconnect – connect – disconnect” noise 😛
We like to set long_query_time=0; this also works better with a lot of tools.
thanks!
It seems that using a UUID as the primary key, without caveats, would negatively impact performance rather quickly.
http://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/
Hi Miguel,
Good post, thanks for that. I want to translate it into Russian. Please let me know if you have any objections.
No objection. Thanks 🙂
Here is the Russian translation of your article:
http://sqlinfo.ru/articles/info/29.html