Materialized View

Feature definition

In a database, there are Views & Materialized Views. Materialized View stores the query result data and provides an update mechanism. Query on Materialized View replaces the direct query data table, so as to achieve the purpose of query acceleration and simplification of query logic.
View: does not really store data, but only reads the data to perform operations, which can be regarded as a saved SQL query statement.

Materialized view: stores the data contained in the SQL query statement, to avoid re-calculation and aggregation of the data during the query. Therefore, it can speed up the query by trading space for time.

As an optimization method of pre-computing, materialized view is widely used in traditional databases, such as Oracle, MSSQL Server, etc. With the popularization of big data technology, various data warehouse and query engines play an increasingly important role in business data analysis. Materialized view, as an accelerator for data query, will greatly enhance the user experience in data analysis work.

The materialized view uses the "query rewrite" mechanism, there is no need to modify the original query statement, the engine optimizer will automatically select the appropriate materialized view for query rewrite, completely transparent to the application.

Implementation principle

A materialized view is a special table that precomputes and stores query results. The word "materialized" is relative to a normal view. A normal view provides ease of use and flexibility over a normal table, but does not speed up data access. A materialized view is like a cache of views. Instead of building and computing a dataset at runtime, it precomputes, stores, and optimizes data access at creation time, and automatically refreshes to keep the data real-time.

The most important value of materialized views is query acceleration. There are a large number of complex queries executed on large tables in the data warehouse , which consume a lot of resources and time. Materialized views can answer queries with predicted results, eliminating the overhead of expensive Join and Aggregation calculations, greatly improving query processing time and reducing system load. Materialized views are especially useful for queries that use the same subquery results repeatedly.

The core content of materialized view is data update and query rewriting.

User Guide

Create materialized view

User Interface
Entry: Database > + View > Create Materialized View

1280

According to SQL sample, fill in the materialized view SQL statement.

1280

After successful creation. If you need to materialize the previous data partitions, manually refresh the defined partition according to the SQL sample.

1280

SQL
Recommended Usage - Manually define target table for materialized view.

CREATE MATERIALIZED VIEW [IF NOT EXISTS] mv_name [TO [db_name.]target_table_name]
[ENGINE = engine_name]
[PARTITION BY par_name]
[ORDER BY col_name [POPULATE]]
AS SELECT select_statement FROM base_table_name;

Other Usage - The system internally defines the target table

CREATE MATERIALIZED VIEW [IF NOT EXISTS] mv_name(
[col1 DataType1]
[col2 DataType2]
...
)
[ENGINE = engine_name]
[PARTITION BY par_name]
[ORDER BY col_name [POPULATE]]
AS SELECT select_statement FROM base_table_name;

Update Materialized View

From the moment the materialized view is created, the data of the materialized view is updated synchronously with the data of the original table. If the previous historical data partition needs to be materialized, we provide a refresh partition function for the running materialized view.

Update syntax

REFRESH MATERIALIZED VIEW xxx PARTITION xxx

Example:

  • Refresh materialized view test partition '2019-01-01' (synchronize the original data source of the '2019-01-01' partition)

Update process
This function is used to update the data of a certain partition of the materialized view table, and the cascading operation is performed by default, that is, when the partition data of the view table is updated, the partition of all materialized views that depend on the current view will be updated at the same time, and it will be passed in cascading. If you do not want to cascade, you can add SETTINGS and set the cascading_refresh_materialized_view to 0, that is:

REFRESH MATERIALIZED VIEW xxx PARTITION xxx SETTINGS cascading_refresh_materialized_view = 0

When updating the view partition data, the data volume of the corresponding bottom sheet partition may be very large. The update will take up a lot of CPU and memory, and may also cause the update to fail. In this case, you can use the parameter max_rows_to_refresh_by_partition. In Clickhouse , a partition consists of multiple data parts. Using this parameter, we can control when the total number of rows of the partition's stand-alone data exceeds the value defined by this parameter, update the partition based on part of the part level, instead of updating the entire partition, which can control the resource usage. Of course, if the view is an aggregation table, updating it part by part will cause the aggregation effect of the last view data to be inferior to updating it on the entire partition, and you need to balance it yourself. The default value of this parameter is 100000000 (100 million), use example:

REFRESH MATERIALIZED VIEW xxx PARTITION xxx SETTINGS max_rows_to_refresh_by_partition = xxx

In addition, you can use the partitionStatus Function to obtain the status of the corresponding partition of a materialized view table. There are three types of status: None (indicating that the partition does not exist), Normal (indicating that the partition exists and is in a normal state), and Refreshing (indicating that the partition is being updated). Example:

select partitionStatus(test, test_mv, '2020-01-01')

Manage Materialized Views

Entry: Database > Materialized View

1280

Bytehouse lists all materialized views in the database, along with their base table/target table row ratio, and hit time.

  • Bottom table/target table row ratio: when the ratio > 10, it means that the materialized view is more efficient
  • Hit rate: When the hit rate is high, it means that the materialized view is frequently accessed within 60 days

Query materialized view

Users can query the materialized view directly, but it is generally recommended to query the base table directly. The Bytehouse optimizer automatically rewrites the query to greatly improve the query processing time.

Delete materialized view

Users can delete materialized views through the interface or SQL .
UI
If the user created the target table, you also need to manually drop the target table.

1280

SQL

drop view xxxx

Materialized View SQL Restrictions

  • At present, only three materialized views can be created per base table, and the creation of more than this number will be prohibited
  • Unique key engine (UniqueMergeTree) does not currently support the use of materialized views
  • JOIN/SUB QUERY is not supported.
  • Nested Aggregation Functions are not supported. For example, sum (c + 1) is supported, while sum (c) + 1 is not supported.
  • If the view SQL is calculated without any corresponding results, it cannot be created at this time
  • All GROUP BY fields must appear in the SELECT.
  • All columns used in the where condition in the query statement need to be defined in the select statement, otherwise the query may not be able to successfully override the match
  • It is recommended for the creation of materialized view to as far as possible to keep the columns in the source table. For example, "select (a + 1)/2 from table group by a" can be rewritten to "select a from table group by a", so that you can use a view to cover as many query scenarios as possible when querying.
  • Field aliases in view statements do not support underscores beginning with "_"