ByteHouse Python driver

ByteHouse provides a Python driver that supports Python Database API Specification v2.0. The driver can be used with most client tools/applications/BI tools which accept Python driver following Python DB API 2.0. The driver uses TCP/Native protocol to connect to ByteHouse.

🚧

Requires Python v3.6 or higher

Install from PyPI

Latest release version can be installed from here:

pip install bytehouse-driver

Install from github

Current development version can be installed from here:

pip install git+https://github.com/bytehouse-cloud/[email protected]#egg=bytehouse-driver

ByteHouse account

You need to create ByteHouse account in order to use Python driver.

Supported regions

Currently, the driver supports the following region names across different cloud providers. Alternatively, if you know the host address of ByteHouse server, you can directly use host address and omit region name.

Region Name Target Server
AP-SOUTHEAST-1 gateway.aws-ap-southeast-1.bytehouse.cloud:19000
VOLCANO-CN-NORTH-1 bytehouse-cn-beijing.volces.com:19000

Connection and authentication

Region and password format

Required parameters: region account user password

'bytehouse:///?region={}&account={}&user={}&password={}'.format(REGION, ACCOUNT, USER, PASSWORD)

Host address and password format

Required parameters: host port account user password

'bytehouse://{}:{}/?account={}&user={}&password={}'.format(HOST, PORT, ACCOUNT, USER, PASSWORD)

For API Key authentication, user is always 'bytehouse'

Region and API key format

Required parameters: region password

'bytehouse:///?region={}&user=bytehouse&password={}'.format(REGION, API_KEY)

Host address and API key format

Required parameters: host port password

'bytehouse://{}:{}/?user=bytehouse&password={}'.format(HOST, PORT, API_KEY)

Virtual warehouse and role management

Connection initialization with ByteHouse always assumes default virtual warehouse and active role, so these values cannot be empty. Before using the driver, users need to set/ensure these values through the console.

465

Constructing client object

Passing parameters

from bytehouse_driver import Client

client = Client(
    region=REGION,
    account=ACCOUNT,
    user=USER,
    password=PASSWORD
)

From URL

from bytehouse_driver import Client

client = Client.from_url('bytehouse:///?region={}&account={}&user={}&password{}'.format(
     REGION, ACCOUNT, USER, PASSWORD)
)

Performing SQL queries

from bytehouse_driver import Client

client = Client(
    region=REGION,
    account=ACCOUNT,
    user=USER,
    password=PASSWORD
)
### DDL Query
client.execute("CREATE DATABASE demo_db")
client.execute("CREATE TABLE demo_db.demo_tb (id INT) ENGINE=CnchMergeTree() ORDER BY tuple()")

### DML Query
client.execute("INSERT INTO demo_db.demo_tb VALUES", [[1], [2], [3]])

### DQL Query
result_set = client.execute("SELECT * FROM demo_db.demo_tb")
for result in result_set:
    print(result)

client.execute("DROP DATABASE demo_db")

Supported data types

ByteHouse typePython type for INSERTPython type for SELECT
Integar family (UInt8/UInt16/UInt32/UInt64 / Int8/Int16/Int32/Int64)int longint
Float family (Float32/Float64)float int longfloat
Stringstr bytesstr bytes
FixedStringstr bytesstr bytes
NullableNone TNone T
Datedate datetimedate
DateTimedatetime int longdatetime
Arraylist tuplelist
Enum familyEnum int long strstr
DecimalDecimal float int longDecimal
IP familyIPv4Address IPv6Address int long strIPv4Address IPv6Address
Mapdictdict
LowCardinalityTT
UUIDUUID strUUID

Setting types_check=True

Default value for 'types_check' is false for performance. If set to true, then explicit type checking and transformation would happen before passing the data onto the server. It is recommended to set it to true, for float/decimal formats where raw data needs to be transformed into appropriate type.

Integer family

Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64

client.execute("CREATE TABLE demo_db.demo_tb (a Int8, b Int16, c Int32, d Int64, e UInt8, f UInt16, g UInt32, h UInt64) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [
    (-10, -300, -123581321, -123581321345589144, 10, 300, 123581321, 123581321345589144)
]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

Float family

Float32 Float64

client.execute("CREATE TABLE demo_db.demo_tb (a Float32, b Float64) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [
    (3.4028235e38, 3.4028235e38),
    (3.4028235e39, 3.4028235e39),
    (-3.4028235e39, 3.4028235e39),
    (1, 2)
]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data, types_check=True)

String

client.execute("CREATE TABLE demo_db.demo_tb (a String) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [('axdfgrt', )]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

FixedString

client.execute("CREATE TABLE demo_db.demo_tb (a FixedString(4)) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [('a', ), ('bb', ), ('ccc', ), ('dddd', ), ('я', )]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

Nullable

client.execute("CREATE TABLE demo_db.demo_tb (a Nullable(Int32)) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [(3, ), (None, ), (2, )]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

Date

from datetime import date, datetime

client.execute("CREATE TABLE demo_db.demo_tb (a Date) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [(date(1970, 1, 1), ), (datetime(2015, 6, 6, 12, 30, 54), )]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

DateTime

from datetime import datetime

client.execute("CREATE TABLE demo_db.demo_tb (a DateTime) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [(datetime(2015, 6, 6, 12, 30, 54), ), (1530211034,)]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

Array

client.execute("CREATE TABLE demo_db.demo_tb (a Array(Int32)) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [([], ), ([100, 500], )]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

Enum family

Enum8 Enum16

from enum import IntEnum

class A(IntEnum):
    hello = -1
    world = 2

class B(IntEnum):
    foo = -300
    bar = 300

client.execute("CREATE TABLE demo_db.demo_tb (a Enum8('hello' = -1, 'world' = 2), b Enum16('foo' = -300, 'bar' = 300)) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [(A.hello, B.bar), (A.world, B.foo), (-1, 300), (2, -300)]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

Decimal

from decimal import Decimal

client.execute("CREATE TABLE demo_db.demo_tb (a Decimal(9, 5)) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [(Decimal('300.42'),), (300.42,), (-300,)]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data, types_check=True)

IP family

IPv4 IPv6

from ipaddress import IPv6Address, IPv4Address

client.execute("CREATE TABLE demo_db.demo_tb (a IPv4, b IPv6) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [
    (IPv4Address("10.0.0.1"), IPv6Address('79f4:e698:45de:a59b:2765:28e3:8d3a:35ae'),),
]
client.execute("INSERT INTO demo_db.demo_tb (a, b) VALUES", data)

Map

client.execute("CREATE TABLE demo_db.demo_tb (a Map(String, UInt64)) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [
    ({},),
    ({'key1': 1},),
    ({'key1': 2, 'key2': 20},),
    ({'key1': 3, 'key2': 30, 'key3': 50},)
]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

LowCardinality

client.execute("CREATE TABLE demo_db.demo_tb (a LowCardinality(UInt8)) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [(x,) for x in range(255)]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

UUID

from uuid import UUID

client.execute("CREATE TABLE demo_db.demo_tb (a UUID) ENGINE=CnchMergeTree() ORDER BY tuple()")
data = [
    (UUID('c0fcbba9-0752-44ed-a5d6-4dfb4342b89d'),),
    ('2efcead4-ff55-4db5-bdb4-6b36a308d8e0',)
]
client.execute("INSERT INTO demo_db.demo_tb VALUES", data)

Cursor support: DB API 2.0

Cursors are supported following DB API 2.0 specifications. Cursors are created by the connection.cursor() method. They are bound to the connection for the entire lifetime and all the commands are executed in the context of the database session wrapped by the connection.

from bytehouse_driver import connect

kwargs = {}
kwargs.setdefault('region', REGION)
kwargs.setdefault('account', ACCOUNT)
kwargs.setdefault('user', USER)
kwargs.setdefault('password', PASSWORD)

connection = connect(**kwargs)
cursor = connection.cursor()

cursor.execute("DROP TABLE IF EXISTS cursor_tb")
cursor.execute("CREATE TABLE cursor_tb (id INT) ENGINE=CnchMergeTree() ORDER BY tuple()")

cursor.executemany("INSERT INTO cursor_tb (id) VALUES", [{'id': 100}])

result_set = cursor.execute("SELECT * FROM cursor_tb")
for result in result_set:
    print(result)

connection.close()

User defined query-id

User can manually supply query-id for each query execution. Users are encouraged to maintain uniqueness or relevancy of the query-id string. If not set, then server will assign a randomly generated UUID as the query-id.

client = Client(
    region=self.region,
    account=self.account,
    user=self.user,
    password=self.password
)
client.execute("SELECT 1", query_id="ba2e2cea-2a11-4926-a0b8-e694ded0cf65")

Local development

Change setup.cfg file to include your connection credentials. For running tests locally, follow these steps:

python testsrequire.py && python setup.py develop
py.test -v

Reporting issues

If you have found a bug or if you have a feature request, please report them at this repository issues section.
Alternatively, you can directly create an issue with our support platform.

License

This project is distributed under the terms of the MIT license.