Currently available in the AWS US environment only
User-defined functions (UDF) are routines that accept parameters, perform actions like complex calculations, and return the result of that action as a value. The return value can either be a single scalar value or a result set.
ByteHouse supports the following types of UDF:
Lambda UDF: User-defined lambda function
Python UDF: Define functions in Python and return scalar result
## 1. Lambda UDF
Creates a UDF from a lambda expression. The expression must contain function parameters, constants, operators, or other function calls.
### Syntax
For more general syntax instructions, please refer to the [Python UDF syntax section](🔗)
### Restrictions
A function can have an arbitrary number of parameters, with a few restrictions:
The name of a function must be unique, different from other user-defined functions and system functions.
Recursive lambda functions are not allowed.
All variables used by a function must be specified in its parameter list.
If any of the above rules are violated, an exception will be raised.
### Examples
As a UDF is created in a specific database, you need to create and specify a database to start creating and using UDF.
#### Example 1
#### Example 2
A condition function is called in a user-defined function in the following query:
## 2. Python UDF
### Introduction
A Python UDF is a python script that can be called by ByteHouse. ByteHouse now supports Python written UDF. A scalar Python UDF incorporates a Python program that runs when the function is called and returns a single value.
When a query calls a Python UDF, the following steps occur at runtime:
The function converts the input arguments to Python Numpy [data types](🔗).
The function passes the converted input arguments then runs the Python program.
The Python code returns a single value. The data type of the return value must correspond to the RETURNS data type specified by the function definition.
The function converts Python's return value to the defined data type, then returns that value to the query.
### System Architect

ByteHouse Cloud UDF capabilities ensure isolation while ensuring high performance and security:
Each server and worker container will have its own UDF sidecar container
High performance: Extended Python to access ClickHouse's raw vectorized memory data format, batch processing and zero network IO cost
Security: Minimized external access for UDF container - Only shared volume access is supported and network access can be disabled - Process isolation inside the UDF container
### Data Types
SQL Type | Python Type | Notes |
UInt8 | NPY_UINT8 | |
UInt16 | NPY_UINT16 | |
UInt32 | NPY_UINT32 | |
UInt64 | NPY_UIN64 | |
Int8 | NPY_INT8 | |
Int16 | NPY_INT16 | |
Int32 | NPY_INT32 | |
Int64 | NPY_INT64 | |
Float32 | NPY_FLOAT32 | |
Float64 | NPY_FLOAT64 | |
Decimal32 | Cast to Float | |
Decimal64 | Cast to Float | |
UUID | NPY_STRING | Fixed length of 16 bytes |
Date | NPY_UINT16 | https://numpy.org/doc/stable/reference/arrays.datetime.html |
DateTime | NPY_UINT32 | |
String | NPY_STRING | |
FixedString | NPY_STRING | |
Nullable | masked array | https://numpy.org/doc/stable/reference/maskedarray.html |
Array | ndarray | Only support input as Array. https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html |
**Date Type:** Stored in two bytes as the number of days since 1970-01-01 (unsigned). Supported range of values: [1970-01-01, 2148-12-31].
Allows storing values from just after the beginning of the Unix Epoch to the upper threshold defined by a constant at the compilation stage (currently, this is until the year 2149, but the final fully-supported year is 2148).
**DateTime Type:** Stored in four bytes as the number of seconds since 1970-01-01 00:00:00 (unsigned). Supported range of values: [1970-01-01 00:00:00, 2105-12-31 23:59:59].
### Runtime Manual
ByteHouse provides standard [libraries](🔗), and we can also help users to pre-install customized dependencies if needed.
### Syntax
#### 1. CREATE Function
`
Flags
` The following flags are stored in BitMap. These flags are defined to optimize queries as well as to define certain functionality. When creating a UDF, it is optional to specify the flags.
`
$custom_tag$
` `$custom_tag$
` is a dollar-quoted string. In between the $$ symbols, you can use any custom tag or leave it empty, just make sure to close it again with the same tag. (Refer to [PostgreSQL](🔗) 4.1.2.4. Dollar-quoted String Constants).For example, here are two different ways to specify the string "Dianne's horse" using dollar quoting:
#### 2. DROP Function
`
IF EXISTS
` is optional, it prevents errors if the function doesn't exist. In a normal case, if the function doesn't exist and you try to drop the function, you will face an error.`
db_name
` is optional if you use `USE db_name
` to check the database initially.
#### 3. SHOW Function
To see all UDF functions and specific UDF of the currently used DB function list
#### 4. SHOW CREATE Function
To see the definition of the specific UDF
#### 5. SELECT Function
UDF works like any other function in ByteHouse, but it requires you to specify a database name to access it. For example, to use a UDF which calculates the area of a circle:
### Examples
#### Example 1:
This UDF example will add the value of each row in this table and output the result.
**Create the Database and Table for UDF** Before creating a Python UDF, you must specify the database name and table.
**Create the UDF** Next, create the UDF function by using `
CREATE FUNCTION
`. This is an example of how to write the entry function in Python and uses `[code]
` as the delimiter.
Alternatively, you can use the fully column-based method provided by Python Numpy as shown below. It does the same operation as `test.f_plus_int
`.
Note: The performance of using the column-based method is better than using loops.
**Call the UDF** After the Python UDF function is created, it can be called just like a general built-in function.
**Drop the UDF**
#### Example 2:
**Create the Database and Table for UDF**
**Create the UDF** For return types like `
FixedString
`, you must specify a length for the returned string.
**Call the UDF**
**Drop the UDF**
#### Example 3:
**Create the Database and Table you need for UDF**
**Create the UDF** In the entry function, this example just uses a simple for loop to do the calculation.
**Call the UDF**
**Drop the UDF**
### Overload Function
In addition, ByteHouse provides `overload.py
` which allows you to create multiple UDFs in a single create function statement. Each function can take in different parameters as input as shown below:
`
@overload
` is the decorator that wraps the function and returns a callable object of type Function. If you use this entry function, you must use the decorator `@overload
` in the function. You need to call from entry to avoid errors of having mismatched parameter numbers.