Views and materialized views
In a database, there are views and materialized views. Materialized view stores the query result data and provides an update mechanism. Query on a materialized view replaces the direct query data table to achieve accelerated queries and simplify query logic.
View does not really store data, but only reads the data to perform operations, which can be regarded as a saved SQL query statement.
Materialized view stores the data contained in the SQL query statement, to avoid recalculation and aggregation of the data during the query. Therefore, it can speed up the query by trading space for time.
As a pre-computing optimisation method, materialized view is widely used in traditional databases such as Oracle, MSSQL server, etc. With the popularisation of big data, data warehouses and query engines have become increasingly important. Materialized view, as an accelerator for data query, enhances the user experience in data analysis.
A materialized view uses “query rewrite”, and there is no need to change the original query statement. The engine optimiser automatically selects the appropriate materialized view for query rewrite, completely transparent to the application.
A materialized view is a special table that pre-computes and stores query results. The word “materialized” is relative to a normal view. A normal view provides ease of use and flexibility over a normal table, but does not speed up data access. A materialized view is like a cache of views. Instead of building and computing a dataset at runtime, it pre-computes, stores, and optimises data access at creation time, and automatically refreshes to keep the data real time.
The biggest value of materialized views is query acceleration. There are numerous complex queries executed on large tables in a data warehouse which consume lots of resources and time. Materialized views can answer queries with predicted results, eliminating the overhead of expensive Join and Aggregation calculations, greatly improving query processing time and reducing system load. Materialized views are especially useful for queries that use the same sub-query results repeatedly.
The core content of the materialized view is data update and query rewriting.
Create materialized view
Entry: Database > ⊕ View > Create materialized view
According to SQL sample, fill in the materialized view SQL statement.
After successful creation, if you need to materialize the previous data partitions, manually refresh the defined partition according to the SQL sample.
Recommended usage: Manually define target table for materialized view.
CREATE MATERIALIZED VIEW [IF NOT EXISTS] mv_name [TO [db_name.]target_table_name] [ENGINE = engine_name] [PARTITION BY par_name] [ORDER BY col_name [POPULATE]] AS SELECT select_statement FROM base_table_name;
Other usage: The system internally defines the target table.
CREATE MATERIALIZED VIEW [IF NOT EXISTS] mv_name（ [col1 DataType1] [col2 DataType2] ... ） [ENGINE = engine_name] [PARTITION BY par_name] [ORDER BY col_name [POPULATE]] AS SELECT select_statement FROM base_table_name;
Update materialized view
From the moment the materialized view is created, the data of the materialized view is updated synchronously with the data of the original table. If the previous historical data partition needs to be materialized, we provide a refresh partition function for the running materialized view.
REFRESH MATERIALIZED VIEW xxx PARTITION xxx
- Refresh materialized view test partition '2019-01-01' (synchronises the original data source of the '2019-01-01' partition)
This function is used to update the data of a certain partition of the materialized view table, and the cascading operation is performed by default, that is, when the partition data of the view table is updated, the partition of all materialized views that depend on the current view will be updated at the same time, and it will be passed in cascading. If you do not want to cascade, you can add SETTINGS and set the cascading_refresh_materialized_view to 0, that is:
REFRESH MATERIALIZED VIEW xxx PARTITION xxx SETTINGS cascading_refresh_materialized_view = 0
When updating the view partition data, the data volume of the corresponding bottom sheet partition may be very large. The update will take up a lot of CPU and memory resources, and may also cause the update to fail. Here, you can use the parameter max_rows_to_refresh_by_partition. In ClickHouse, a partition comprises multiple data parts. Using this parameter, we can control that when the total number of rows of the partition’s stand-alone data exceeds the value defined by this parameter, the partition is updated part by part based on the part level, instead of the entire partition, which can control the resource usage. Of course, if the view is an aggregation table, updating it part by part will cause the aggregation effect of the last view data to be inferior to updating it on the entire partition, and you need to balance it yourself. The default value of this parameter is 100000000 (100 million). Usage example:
REFRESH MATERIALIZED VIEW xxx PARTITION xxx SETTINGS max_rows_to_refresh_by_partition = xxx
In addition, you can use the partitionStatus function to get the status of the corresponding partition of a materialized view table. There are three types of statuses: None (showing that the partition does not exist), Normal (showing that the partition exists and is in a normal state), and Refreshing (showing that the partition is being updated). Example:
select partitionStatus(test, test_mv, '2020-01-01')
Manage materialized views
Entry: Database > Materialized views
ByteHouse lists all materialized views in the database, along with their base table/target table row ratio, and hit time.
- Bottom table/target table row ratio: when the ratio > 10, it means that the materialized view is more efficient.
- Hit rate: When the hit rate is high, it means that the materialized view is frequently accessed within 60 days.
Query materialized view
Users can query the materialized view directly, but it is generally recommended to query the base table directly. The ByteHouse optimiser automatically rewrites the query to improve the processing time.
Delete materialized view
Users can delete materialized views through the interface or SQL .
If the user created the target table, they also need to manually drop the target table.
drop view xxxx
Materialized view SQL restrictions
- At present, only three materialized views can be created per base table.
- Unique key engine (UniqueMergeTree) does not currently support the use of materialized views.
- JOIN/SUBQUERY is not supported.
- Nested Aggregation Functions are not supported. For example, sum (c + 1) is supported, while sum (c) + 1 is not supported.
- If there is no corresponding result after the SQL calculation of the view, it cannot be created.
- All GROUP BY fields must appear in SELECT.
- All columns used in the where condition in the query statement need to be defined in the select statement, otherwise the query may not be able to successfully override the match.
- It is recommended that the fields of the created view keep the columns in the source table as much as possible. For example, "select (a + 1) / 2 from table group by a" can be rewritten as "select a from table group by a", so that you can use one view to cover all possible multiple query scenarios.
- Field aliases in view statements do not support underscores beginning with "_".
Updated 2 months ago