Statistics information and execution plan


ByteHouse’s self-developed query optimiser uses statistics information to improve query performance. Currently, users can manually collect statistics information for data tables or change query designs to improve query performance. We will support automatic collection and update capabilities in the future. The supported syntax is:

  1. Create statistics information
CREATE STATS [IF NOT EXISTS] (<db>.* | ALL | [<db>.]<table>['(' <col1>, <col2>, ..., <coln> ')']) 
        [WITH (FULLSCAN | SAMPLE [<rows> ROWS] [<ratio> RATIO])]
  1. Show statistics information
SHOW (STATS|COLUMN_STATS) (<db>.* | ALL | [<db>.]<table>'('<col1>, <col2>, ..., <coln> ')' )
  1. Delete statistics information
DROP STATS (<db>.* | ALL | [<db>.]<table>['(' <col1>, <col2>, ..., <coln> ')' ])

Execution plan

For data engineers, troubleshooting problems using execution plans is an important task. For example, finding the cause of issues with join order, missing statistics information, reasonable estimation, effective pushdown, etc. ByteHouse can display the physical execution plan, analyse the execution with Explain + Query, and facilitate an intuitive analysis and optimisation of queries. The Query Profiler can display query details visually.

Note: Statistics information needs to be manually collected before generating an execution plan.

  1. Generate physical execution plan
EXPLAIN (QUERY Statement) ;

Note: The current EXPLAIN syntax only supports the analysis of DQL SELECT statements.

Usage example

  1. Create a database table

CREATE TABLE sampledb.t1 
  id Int64,
  name Int64
) ENGINE = CnchMergeTree()

CREATE TABLE sampledb.t2
  uid Int64,
  name Int64
) ENGINE = CnchMergeTree()
  1. Insert sample data
INSERT INTO sampledb.t1 VALUES (1,'bob'),(2,'alice'),(3,'peter'),(4,'jay'),(5,'yiyang'),(6,'aurora');
INSERT INTO sampledb.t2 VALUES (1,'smith'),(2,'parker'),(3,'taylor'),(4,'davis'),(5,'joines');
  1. Collect statistics information for the table
CREATE stats sampledb.t1;
CREATE stats sampledb.t2;

The result:

  1. Display statistics information for the table
SHOW stats sampledb.t1;
  1. Generate an execution plan for the data table
USE sampledb;
EXPLAIN SELECT * FROM t1 JOIN t2 ON ( = t2.uid);

The result will be displayed as follows:

Gather Exchange est. 6 rows
└─ Projection est. 6 rows
│ Expressions: [id, name, uid],
└─ Inner Join est. 6 rows
│ Condition: id == uid
├─ Local Exchange est. 6 rows
│ └─ ReadFromStorage 2100069045.sampledb.t1 est. 6 rows
│ Outputs: [id, name]
└─ Broadcast Exchange est. 5 rows
└─ ReadFromStorage 2100069045.sampledb.t2 est. 5 rows
Outputs: [uid], name_1:=name
  1. Delete statistics information for the table
DROP stats sampledb.t1;
DROP stats sampledb.t1;