Normal table configurations

🚧

Most parameter fields can't be modified after the table is created. Please read this guide carefully before creating tables.

Table parameters configuration

1. Sort key (required) - Order By

  • Definition and principle:In order to improve query performance, the data is stored in a sorted index order, also called the sorted key/primary key. Sorted index is not the same as MySQL primary key, and the value can repeat. Also, it is not the same as MySQL index - it can’t be changed after the table is created.
  • Field restriction: Does not support Nullable; supports setting 1-5 fields.
  • Configuration recommendations : Select 1-3 fields that are often used as filters for sorted indexes to improve the performance of related queries. The search speed is the fastest with the first sorted index as the condition, and lower with the next two indexes.
  • After the table is created successfully, sorted index does not allow modification of or adding/subtracting columns. If you want to modify, you will need to delete the table and retrace the data.

2. Partition key/ Granularity (required) - Partition By

  • Definition and principle: Partition is a logical data set divided by specific rules in a table. It can be partitioned by day, month, or year. In order to reduce the data that needs to be operated on, each partition is stored in sub-files. When accessing data, ByteHouse tries to use the smallest subset of these partitions to improve performance.
  • Field restriction: Does not support Nullable; supports setting 1-5 fields.
  • Partitioning granularity: When the column type is Date, day and month level aggregation partitioning is supported; when the column type is Datetime, minute, hour, day, and month level aggregation partitioning is supported.
  • Configuration recommendations:
    • Ensure that the type/content of this field is time. Any other type/content will create too many partitioned files, which will adversely affect the import and query speed.
    • Configure partition fields as frequently filtered date fields, such as order time.
    • For single-day data volume over 10 billion, you can consider partitioning by hour, else we recommend choosing to partition by day by default.
  • After the table is successfully created, the field does not allow modification of or adding/subtracting columns.

3. Bucketing (optional) - Cluster By

  • Definition and principle: When partitioning can’t achieve uniform distribution of data, the bucketing field can be utilised. The bucketing field ensures that a column of data is evenly distributed under each node of the cluster. This can maximise the cluster performance for a query. A reasonable setting of the partition field also helps to solve for skewed data and ensures that the data is more evenly distributed.
  • Field restriction : Does not support Nullable.
  • Configuration recommendations: Select the fields that appear frequently in Group by.
  • After the table is successfully created, the field does not allow modification of the column type.

4. Data retention time TTL (optional)

  • Definition and principle: Set data life cycle, day-level unit. Data retention time is calculated based on "partition field".
  • Field restriction: ≥ 0. If field restriction is "0", it means that the TTL of the table is not set, that is, the data is permanently retained.
  • The data retention time can be modified after the table is created, and once the time limit of the data exceeds the retention time, it will be automatically deleted.

5. Sample Key (optional) - Sample Key

  • Definition and principle: Used in sample queries. A sample query executes on a specific part of the (sample) data. This is ideal for queries that sacrifice accuracy for performance, or require approximate results.
  • Field restrictions:
    • Sample fields only support Int, Float, String types.
    • When the sample field is not specified, the system sets it as the first sorted index field by default. The sample field must be one of the sort keys.
  • Configuration recommendations: If the full table has sampling query requirements, and the data is evenly distributed (such as in time class field), it can be set as a sampling field.
  • After the table is successfully created, it can't be modified.

6. Unique Key (optional) - Unique Key

  • Definition and principle: The unique key is developed by the ByteHouse team for efficient query performance and to support primary key update. It solves the user pain point of supporting efficient updates, and helps businesses develop real-time analysis applications easily. Upsert can update write semantics by specifying the Unique Key, and the query automatically returns the latest value of each unique key.
  • Field restrictions: Unique key can only accept up to 10 values and can't contain empty columns. Two granularities are currently supported, namely, partition-level and table-level unique keys.
  • For more best practices and query examples of unique keys, see Unique table.