ByteHouse lets you load data from external sources into ByteHouse tables. Data loading is modeled by the concept of Loading Jobs. You can create a loading job in the web console and trigger it from the web or via API. Each loading job is defined by a set of common concepts: source location, loading mode, source format, sink table, refresh mode, and schema mappings. These let you customize loading jobs to best fit your scenarios.
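To make the common concepts concrete, here is a minimal sketch of what a loading-job definition could look like. All field names and values are illustrative assumptions, not the actual ByteHouse API; consult the API reference for the real request shape.

```python
# Hypothetical loading-job specification covering the common concepts
# listed above. Field names are illustrative, not the ByteHouse API.
loading_job = {
    "source_location": "kafka",      # e.g. hive | kafka | confluent_cloud | local_file
    "loading_mode": "incremental",   # full | incremental
    "source_format": "json",
    "sink_table": "analytics.page_views",
    "refresh_mode": "scheduled",
    "schema_mappings": [
        {"source_field": "ts", "sink_column": "event_time"},
        {"source_field": "uid", "sink_column": "user_id"},
    ],
}

# Every job supplies all of the common concepts described above.
required = {"source_location", "loading_mode", "source_format",
            "sink_table", "refresh_mode", "schema_mappings"}
assert required <= loading_job.keys()
```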
We currently support the following source locations and are continuously expanding the ecosystem:
- Hive (1.0+)
- Apache Kafka
- Confluent Cloud
- Local file system
Batch loading applies when you have chunks of data ready in the source location and want to load them into ByteHouse in one shot.
Depending on whether the sink table is partitioned, different loading modes are provided:
- Full loading replaces the entire table with the latest source batch.
- Incremental loading appends new batches to the sink table partition by partition. If a partition already exists, ByteHouse replaces it rather than merging.
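The two modes can be illustrated with a toy in-memory "table" keyed by partition. This only demonstrates the replace-not-merge semantics described above, not ByteHouse internals.

```python
def full_load(table, batch):
    """Full loading: replace the entire table with the latest batch."""
    return dict(batch)

def incremental_load(table, batch):
    """Incremental loading: append new partitions; an incoming partition
    that already exists replaces the old one instead of merging into it."""
    result = dict(table)
    result.update(batch)  # matching partitions are overwritten
    return result

table = {"2024-01-01": ["a", "b"], "2024-01-02": ["c"]}
batch = {"2024-01-02": ["d"], "2024-01-03": ["e"]}

assert incremental_load(table, batch) == {
    "2024-01-01": ["a", "b"],
    "2024-01-02": ["d"],  # replaced, not merged with ["c"]
    "2024-01-03": ["e"],
}
assert full_load(table, batch) == batch
```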
The following file formats are supported by ByteHouse for batch loading:
- JSON (multiline)
- Excel (xls)
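Assuming "JSON (multiline)" refers to records formatted across multiple lines (for example, a pretty-printed JSON array) rather than one record per line, a batch file could be prepared with the standard library. This interpretation is an assumption; check the format documentation for the exact definition.

```python
import json

# Illustrative records; the schema is hypothetical.
records = [
    {"user_id": 1, "event": "click"},
    {"user_id": 2, "event": "view"},
]

# Pretty-printing produces multiline JSON output.
payload = json.dumps(records, indent=2)
assert len(payload.splitlines()) > len(records)
assert json.loads(payload) == records
```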
ByteHouse can connect to your Kafka source and continuously stream data into your tables. Unlike batch loading, a Kafka job runs continuously once started. ByteHouse Kafka loading provides exactly-once semantics: you can stop and resume the job, and ByteHouse keeps track of the consumed offsets.
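The exactly-once guarantee rests on recording the consumed offset together with the written data, so that messages replayed after a stop/resume are not applied twice. The sketch below simulates that idea in memory; it is a conceptual illustration, not how ByteHouse is implemented.

```python
class Sink:
    """Toy sink that stores rows and the last committed Kafka offset
    in one atomic step (here, a single in-memory update)."""

    def __init__(self):
        self.rows = []
        self.committed_offset = -1

    def write_atomic(self, offset, row):
        if offset <= self.committed_offset:
            return  # already applied; skip the replayed message
        self.rows.append(row)
        self.committed_offset = offset

sink = Sink()
stream = [(0, "a"), (1, "b")]
for off, msg in stream:
    sink.write_atomic(off, msg)

# Simulate a job restart that replays earlier offsets plus a new message:
for off, msg in stream + [(2, "c")]:
    sink.write_atomic(off, msg)

assert sink.rows == ["a", "b", "c"]  # replays produced no duplicates
```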
The following message formats are supported by ByteHouse in streaming loading: