Kafka stream data import: Parsing nested JSON

When you import data from Kafka to ByteHouse, you may encounter nested JSON in the source data. For this, you can use virtual columns and parsing functions to import the data. This page explains the import process for this scenario.

A Kafka table has a virtual column named \_content (string). The content of \_content is the JSON string for each row. The parsing process extracts individual columns from the \_content string info. This extraction is based on the JSON path and uses the JSON extract function.

JSON data example

        "npc_info": {       
                "npc":  "bedok"
        "case_info": {    
                "time_info": {  
                                "offence":  "stealing",  
                                "case_no":  123       

Import requirements

Create a database and table.

CREATE TABLE `sample_data`.`sample_data_npc`
        `npc` String,
        `year` String,
        `offence` String,
        `case_no` Int64
ENGINE = CnchMergeTree
ORDER BY (case_no)
PRIMARY KEY (case_no)

The data will be imported into the target table in the following format.

ColumnTarget TypeExpressionValue
npcStringJSONExtract(_content, 'npc_info', 'npc', 'String')"bedok"
yearStringJSONExtract(_content, 'case_info', 'time_info', 'year', 'String')"2021"
offenceStringJSONExtract(_content, 'case_info', 'offence', 'String')"stealing"
case_noInt64JSONExtract(_content, 'case_info', 'case_no', 'Int64')123

Import data configuration

  1. Choose Data Loading --> New Loading Job

  1. Choose Kafka Data Stream data source, topic, consumer group, and format.

  1. Select the target database, the table and the data will be parsed automatically.

  1. Add in the expression for each row according to the table at the import requirement section and choose Create when you're done.

  1. You have successfully created a data loading job. Choose OK and wait for it to run.

  1. After the status changes to 'success', you can query the target data table.

Kafka data import and JSON parsing are now complete.

About JSONExtract function

The usage of JSONExtract function is consistent with that in the ClickHouse community.

JSONExtract(json[, indices_or_keys…], Return_type)
JSONExtractRaw(json[, indices_or_keys…])