Data Types & Variables
ClickHouse supports numerous data types including Integers, Floats, Decimals, Bools, Strings, UUIDs, DateTime, Array, Tuples, AggregateFunctions, nested structures and more.
As ClickHouse is a strongly typed datastore, Vulkn attempts to map existing Python datatypes to ClickHouse datatypes in a similar manner to an ORM. Vulkn also allows for creating additional datatypes the make it easier to work with ClickHouse from the Python eco-system.
ClickHouse reference
Variables
Variables are declared and assigned using normal Python notation:
x = Int8(34) s = String('Hello World!')
Once declared you can use most variables as you would Python types and then use them within queries:
s = String('Hello world ') s = s.strip() l = s.len() v.select(s.alias('stripped string'), l.alias('length')).s row stripped string length ----- ----------------- -------- 1 Hello world 13 (1 row)
Typed variables also allow datatypes to propagate through their respective method calls
v.select(s.typename().alias('stripped string'), l.typename().alias('length')).s row stripped string length ----- ----------------- -------- 1 String UInt64 (1 row)
You can view the generated SQL at any time either by printing the individual variable using the print
statement or using the .show_sql()
method of a DataTable:
v.select(s.typename().alias('stripped string'), l.typename().alias('length')).show_sql() """SELECT toTypeName(replaceRegexpAll('Hello world ', '^s*|s*$', '')) AS `stripped string`, toTypeName(length(replaceRegexpAll('Hello world ', '^s*|s*$', ''))) AS length;"""
Existing variables and columns can also be declared using <Type>(name='')
(or <Type>(n='')
).
This allows use of the object/datatype type system by name rather than by value.
For example:
database = String(n='database') tablename = String(n='name') v.table('system.tables').select(database, tablename).limit(3).s row database name ----- ----------------- -------------- 1 cached test 2 cached test1 3 database_for_dict table_for_dict (3 rows)
Available Types
Family | ClickHouse Data Type | Range | Vulkn Data Type | Vulkn Shortcut | Examples |
---|---|---|---|---|---|
Integer | Int8 | [-128, 127] | Int8 | vI8 | x = Int8(34) v.select(c(34).cast(Int8)).r x = vI8(34) |
Integer | Int16 | [-32768, 32767] | Int16 | vI16 | x = Int16(3445) v.select(c(3445).cast(Int16)).r |
Integer | Int32 | [-2147483648, 2147483647] | Int32 | vI32 | x = Int32(545678) v.select(c(545678).cast(Int32)).r |
Integer | Int64 | [-9223372036854775808, 9223372036854775807] | Int64 | vI64 | x = vI64(3372036854775) v.select(c(3372036854775).cast(Int64)).r |
Integer | UInt8 | [0, 255] | UInt8 | vU8 | x = UInt8(34) v.select(c(34).cast(UInt8)).r |
Integer | UInt16 | [0, 65535] | UInt16 | vU16 | x = UInt16(3445) v.select(c(3445).cast(UInt16)).r |
Integer | UInt32 | [0, 4294967295] | UInt32 | vU32 | x = UInt32(545678) v.select(c(545678).cast(UInt32)).r |
Integer | UInt64 | [0, 18446744073709551615] | UInt64 | vU64 | x = UInt64(3372036854775) v.select(c(3372036854775).cast(UInt64)).r |
Float | Float, Float32 | [3.40282e+38, 3.40282e+38] | Float32 | vF32 | x = Float32(567.34) v.select(c(567.34).cast(Float32)).r |
Float | Float64 | [-1.79769e+308, 1.79769e+308] | Float64 | vF64 | x = Float64(567.34) v.select(c(567.34).cast(Float64)).r |
String | String | - | String | vS | x = String('hello world') v.select(c('hello world').cast(String)).r |
String | - | - | URL | vU | x = URL('https://www.yandex.ru') |
DateTime | Date | ['0000-00-00', '2106-02-07'] | Date | vD | x = Date('2019-01-01') |
DateTime | DateTime | ['0000-00-00 00:00:00', '2106-02-07 17:28:15'] | DateTime | vDT, vDT32 | x = DateTime('2106-02-07 17:28:15') |
Array | Array | - | Array | vA | x = Array([1,2,3]) |
Variables and DataTables
Most methods within a DataTable allow for any combination of typed variables, column names (strings), column expressions (strings) or column literals.
Column Names
Column names are simple string names.
v.table('system.tables').select('database', 'name').limit(3).s row database name ----- ----------------- -------------- 1 cached iot_car_inbound 2 cached iot_car_outbound 3 dict_store number_plate_lookup (3 rows)
Column Expressions
Column expressions are simple strings that allow you to specify the entire column formula/function as you would within a standard SQL statement. This can be applied to any method.
v.table('system.tables').select('substring(database, 3)', 'upper(name)').where("database = 'system'").limit(3).s row substring(database, 3) upper(name) ----- ------------------------ ------------------------------ 1 stem AGGREGATE_FUNCTION_COMBINATORS 2 stem ASYNCHRONOUS_METRICS 3 stem BUILD_OPTIONS (3 rows)
Column Literals
Column literals enable users to mix and match a literal SQL string with the type system. Use the
functions c
or col
to declare a column literal.
ss = 'substring(database, 3)' l = c('upper(name)').cast(String).length().alias('ucase_name') v.table('system.tables').select(ss, l).where("database = 'system'").limit(3).s row substring(database, 3) ucase_name ----- ------------------------ ------------ 1 stem 30 2 stem 20 3 stem 13 (3 rows)
Mixing Python variables
Most Python variables, except dictionaries, can be mixed with most Vulkn variables and will be converted to the correct SQL type or a literal constant at query time.
Arrays, tuples, strings, integers, floats and datetime types can all be use within Vulkn:
>>> v.select(list(range(0,10)), ('a','b'), ['my','array']).s row [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] tuple(\'a\', \'b\') [\'my\', \'array\'] ----- -------------------------------- --------------------- --------------------- 1 [0,1,2,3,4,5,6,7,8,9] ('a','b') ['my','array'] (1 row)
Dictionaries will be supported in the future as variants of the JSON data type and the JoinStorage table type.