MySQL and Materialized Views
May 13, 2008 – 1:01 amI was poking around the MySQL Worklog again over the weekend, and found a request for materialized views for MySQL. This feature has existed in Oracle for a while, in DB2 as a materialized query table, and appeared in MS SQL Server 2000 and 2005 as indexed views.
What is a materialized view?
A materialized view is a database object that contains the results of a query. The FROM clause of the query can name tables, views, and other materialized views. (from Oracle).
Essentially a materialized view lets you tell the database to periodically refresh a table with the results of a query. You may join, group, and perform calculations. The goal is to increase query performance in a read-heavy environment.
Additionally, at least in the other DBs listed, you can put indexes on the fields within the view.
Disadvantages:
- There can be issues with altering underlying tables, just as if you were to remove a column that a standard view references.
- If you’re inserting and updating into the base tables frequently, you will either see a performance hit or have to deal with stale data.
- Since the data is actually stored as a table on disk, it can take up considerable space
According to the high level architecture
Support creation of materialized views, with only the bare minimum — no automatic refresh, no query rewrite.
Which would be a shame, because it seems that adding in automatic refresh would be a pretty small part of a very complex feature. I haven’t dealt with any of the MySQL codebase so this is just my speculation.
Of course, when looking at any feature, it’s important to determine how useful it actually would be to implement. What is the target audience, and are they the existing customers? What’s the goal of adding this feature?
For ther reading, Database journal has a very good overview of indexed views in SQL Server.



10 Responses to “MySQL and Materialized Views”
Like I imagine many others, this sort of feature would be incredibly useful for our implementation work. In one recent case, we had a client managing property details across our state and they wanted to provide in-progress visibility of assessment of applications across the state which they were responsible for.
Because of strong normalisation associated with tracking real live jobs, with status headings, logic relevant date fields as well as status designations, the use of a view – the most logical way to group together the various tables data.
Of course, with millions of records, this wasn’t really practical.
As a result, we’re concocting the query to select and insert all the relevant information daily and inserting it into a separate table. The processing time for the query isn’t too terrible – it has been heavily optimized and indexed – and runs for a few minutes, after which we have to do some table shuffling to ensure the live site doesn’t take a hit.
Anyway, I’d like to add my voice to the benefit of this feature… if your query can’t be run from a view, you’re having to do something like the above which requires disk space anyway…
Geoff
By Geoff McQueen on May 13, 2008
Views are the most direct analog to getting a consistent look at a set of tables from one angle… well, a view.
It is possible to implement something like this using triggers on the tables that comprise your table and creating a table that gets built on changes rather than periodically. However, depending on what process is involved in the view, this could be very inefficient.
Like Geoff, I had a problem with many records not working well with a view; after all, queries on a view are not optimized in the same way as a normal query can be; the entire results of the view must be built before the query can be executed.
I’d welcome this feature.
By Eric on May 13, 2008
Materialized views are a really bad solution to the problem at hand. In relational algebra views are ment as a shorthand for a table expressions. Similar to normal calculus one can create a new function bij composing other functions. They are a tool to manage complexity, an abstraction as you will.
Due to the closure property of relation algebra one can sove this complex function to its basetables, and thus use the indexes on the base tables. Mysql’s ‘merge’ algorithm however, can only handle the simplest of views, rendering mysql views useless in a normalised database.
MYSQL view peformance is very poor due to the fact that they are are almost always fully materialized per query. The correct solution would be to improve the query optimizer such that it can also handle complex views. Materialized views are imho just a ‘quick fix’ for the problem at hand.
Advanced db engines such as Postgresql and SQLServer can optimize complex views. And they are a very powerfull tool to manage complexity in a normalized schema’s.
SQLServer has also option called ‘Indexed Views’. This entails placing a clustered index on a view. Since the clustered index is maped to the underlaying tables it will be automatically updated after an modification of one of the basetables. This ensures that the ‘cached’ data is always correct.
By Joost Moesker on May 13, 2008
Some of the comments here from may reflect a misunderstanding, caused by the unfortunate naming of the feature being discussed.
A “materialized view” is not really a VIEW at all. It is more like a stored query result. What happens is that you use a CREATE MATERIALIZED VIEW command to evaluate a query, and store the result. That stored result is maintained as the tables referenced in the query are updated.
Subsequent queries OF THE TABLES (not of the “view”) that can be resolved using the stored query result reference that result, rather than being re-evaluated. Thus, if a stored query (aka, “materialized view”) did an (expensive) multi-table join and GROUP BY, any query result that can be derived from the stored result does not need to re-execute the join or GROUP BY.
For example, suppose the “materialized view” were something like this …
CREATE MATERIALIZED VIEW mv AS
SELECT * FROM A,B,C
JOIN ON
Then, the query
SELECT MAX(A.somecolumn) FROM A,B
JOIN ON
can be resolved without re-joining A and B by referencing the stored query result. Note that the query does NOT reference the “materialized view”. What happens is a “query-rewrite”, not optimization like a view, where the query references the view and is then re-written to go against the base table(s). Indeed, it’s the opposite: the query references (a subset of) the base table(s) and it is re-written to go against the “materialized view” (or stored query result).
Hope this helps.
By Ken Jacobs on May 13, 2008
yup, mviews are pretty useful indeed. In fact, I’ve create a set of stored procedures that automate the creation and incremental maintenance of materialized views for mysql.
I’m currently in the process of getting them licensed under LGPL by my former company so I can maintain them for public release.
You can read about the stored procedures on my blog, with a pretty detailed description of how they work. Once I get the release (it /should/ be next week) I am going to want alpha and beta testers, so if you are interested let me know.
http://swanhart.livejournal.com/117389.html
By Justin Swanhart on May 13, 2008
This would be indeed a great addition to the MySQL engines.
Here is another approach to provide flexible automatic materialized views for MySQL:
http://www.shinguz.ch/MySQL/mysql_mv.html
By René Leonhardt on May 13, 2008
My concern with automatic refresh is if multiple transactions have to update the same materialized view row then contention can occur in the database. Effectively, the database will serialize on this one materialized view a good amount of the time. The likelyhood of this occuring depends on the size of the summarization grain.
Compare this to a database without a materialized view and the various transactions are updating separate rows without impacting each other.
In the high volume environments I work in, automatic refresh won’t work and a periodic materialized view update is the only way that will work.
By Brooks on May 13, 2008
Brooks,
Immediate maintenance of materialized views is expensive, but by using a DML log, asynchronous maintenance techniques can be employed. Batched asynchronous updates perform much better than an immediate maintenance strategy.
By Justin Swanhart on May 14, 2008
“I haven’t dealt with any of the MySQL codebase so this is just my speculation.”
Considering it’s not open source, no shit. That was a dumb statement!
By Douche on May 21, 2008
Thanks Douche troll – great contribution.
By Geoff McQueen on May 21, 2008