DataTables

DataTables are specialised container objects that can refer to either tables or queries including queries comprised of other DataTables or tables in the form of subselects and joins. Vulkn provides a number of DataTable entrypoints. All DataTable variants are compatible and can be used in joins, for creating cache tables or creating new tables, relations and vectors.

By default DataTables are executed 'lazily'. This means that they are only executed when a result is requested. Parameters and caching are also supported to enable re-use and acceleration.

The following provides an overview of the available entrypoints. Please see the Reference section for more detail.

Entrypoints

q

A simple SQL string interface. Mostly identical in behavior to issuing queries via the clickhouse-client CLI

>>> v.q('select count() from system.tables').s 

  row    count()
-----  ---------
    1         40

(1 row)

table, select

An ORM interface similar to those found within other BigData frameworks.

>>> pprint.pprint(v.table('system.tables').all().limit(1).r)
[{'create_table_query': nan,
  'data_paths': '[]',
  'database': 'system',
  'dependencies_database': '[]',
  'dependencies_table': '[]',
  'engine': 'SystemAggregateFunctionCombinators',
  'engine_full': nan,
  'is_temporary': 0,
  'metadata_modification_time': '0000-00-00 00:00:00',
  'metadata_path': '/tmp/ironman-5cb7f7dc-d45b-443d-bd9c-2a25aa0b077a/clickhouse/metadata/systemaggregate_function_combinators.sql',
  'name': 'aggregate_function_combinators',
  'partition_key': nan,
  'primary_key': nan,
  'sampling_key': nan,
  'sorting_key': nan,
  'storage_policy': nan}]

one

Table function interface to system.one.

>>> v.one()
<vulkn.datatable.SelectQueryDataTable object at 0x7f5ee84b2da0>
>>> v.one().s

  row    dummy
-----  -------
    1        0

(1 row)

numbers

Generates a sequence of numbers via the system.numbers, system.numbers_mt tables and the table functions numbers(), numbers_mt().

>>> v.numbers(100).exec().to_list()
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]

range

Variant on the numbers DataTable that allows negative start/end positions.

>>> v.range(-3,3).s

  row    range
-----  -------
    1       -3
    2       -2
    3       -1
    4        0
    5        1
    6        2
    7        3

(7 rows)

>>> v.range(3,9).select(funcs.agg.sum(c('range'))).s

  row    sum(range)
-----  ------------
    1            42

(1 row)

random, randfloat

Generates random 64 bit integers within the prescribed range. The randfloat variant generates a Float64 type.

>>> v.random(5, start=0, end=5).s

  row    number
-----  --------
    1         3
    2         5
    3         2
    4         2
    5         4

(5 rows)

Others

update

Many BigData or timeseries systems require rewriting entire datasets or partitions. ClickHouse permits updates and deletes for non-key columns.

Todo

Documentation is incomplete.

delete

Many BigData or timeseries systems require rewriting entire datasets or partitions. ClickHouse permits updates and deletes for non-key columns.

Todo

Documentation is incomplete.

data

Data is a special dunder dictionary that automatically maps databases and tables under a dictionary-like interface. Each database.table is mapped to a key/value pair with the values/object represented by a BaseTableDataTable object.

Examples

  • Listing all tables by key

    >>> pprint.pprint(list(v.data.keys())[:5])
    ['system.aggregate_function_combinators',
     'system.asynchronous_metrics',
     'system.build_options',
     'system.clusters',
     'system.collations']
    

  • Values as BaseTableDataTables

    >>> pprint.pprint(list(v.data.values())[:5])
    [<vulkn.datatable.BaseTableDataTable object at 0x7f5ee84b2630>,
     <vulkn.datatable.BaseTableDataTable object at 0x7f5ee8456198>,
     <vulkn.datatable.BaseTableDataTable object at 0x7f5eb8b4f5f8>,
     <vulkn.datatable.BaseTableDataTable object at 0x7f5ee845d978>,
     <vulkn.datatable.BaseTableDataTable object at 0x7f5ee84b85f8>]
    

  • Using the data object in queries.

>>> v.data['system.tables'].select('count()').exec().to_list()
[(44,)]
  • DataTables can also be created from nested properties of the data dictionary.
>>> v.data.system.tables.count().s

  row    count()
-----  ---------
    1         44

(1 row)