Object Management & Profiling

Vulkn provides some simple functions for managing and interrogating objects. These range from simple CRUD operations to partition management and data profiling.

Data Profiling

v.profile(table: str)

Description: Profiles the table returning statistical information regarding the shape of the data. Data types are inspected using different methods or options such as max_length for character types vs max values for decimals.
Parameters
- table - the table to inspect
Returns
- dict

Example

>>> v.profile('timeseries')
{'key': {'data_type': 'String',
        'kurt_pop': 13.22782691569896,
        'kurt_samp': 13.227785078755689,
        'max': 'device-999',
        'max_length': 10,
        'mean_length': 10,
        'median_length': 10,
        'min': 'device-0',
        'min_length': 8,
        'mode': 'device-990',
        'nulls': 0,
        'skew_pop': -3.2023548144821627,
        'skew_samp': -3.2023472181783337,
        'stddev_pop': 0.34284272666371696,
        'stddev_samp': 0.34284299774981736,
        'topK': ['device-990',
                'device-984',
                'device-851',
                'device-878',
                'device-991',
                'device-849',
                'device-934',
                'device-828',
                'device-839',
                'device-362'],
        'total_records': 632351,
        'uniq_values': 1000,
        'var_pop': 0.11754113522621215,
        'var_samp': 0.11754132110608125},
'ts': {'data_type': "DateTime(\\'Australia/Melbourne\\')",
        'histogram': [('2019-01-01 00:00:00', '2019-01-01 00:01:35', 59936),
                    ('2019-01-01 00:01:35', '2019-01-01 00:03:09', 59411),
                    ('2019-01-01 00:03:09', '2019-01-01 00:04:45', 60835),
                    ('2019-01-01 00:04:45', '2019-01-01 00:06:24', 63316),
                    ('2019-01-01 00:06:24', '2019-01-01 00:08:07', 64741),
                    ('2019-01-01 00:08:07', '2019-01-01 00:09:51', 65790),
                    ('2019-01-01 00:09:51', '2019-01-01 00:11:37', 67164),
                    ('2019-01-01 00:11:37', '2019-01-01 00:13:22', 66577),
                    ('2019-01-01 00:13:22', '2019-01-01 00:15:02', 63605),
                    ('2019-01-01 00:15:02', '2019-01-01 00:16:39', 60975)],
        'kurt_pop': -251816852146683.28,
        'kurt_samp': -251816055700870.62,
        'max': '2019-01-01 00:16:39',
        'mean': '2019-01-01 00:08:20',
        'median': '2019-01-01 00:08:16',
        'min': '2019-01-01 00:00:00',
        'mode': '2019-01-01 00:16:38',
        'nulls': 0,
        'skew_pop': 16028484.98636335,
        'skew_samp': 16028446.965204256,
        'stddev_pop': 231.27362468169503,
        'stddev_samp': 231.2738075499905,
        'topK': ['2019-01-01 00:16:38',
                '2019-01-01 00:15:48',
                '2019-01-01 00:15:55',
                '2019-01-01 00:16:06',
                '2019-01-01 00:16:09',
                '2019-01-01 00:16:15',
                '2019-01-01 00:16:18',
                '2019-01-01 00:16:19',
                '2019-01-01 00:16:20',
                '2019-01-01 00:16:23'],
        'total_records': 632351,
        'uniq_values': 1000,
        'var_pop': 53487.48947340954,
        'var_samp': 53487.57405867004},
'value': {'data_type': 'UInt32',
        'histogram': [(246, 454075146.24017453, 66170),
                        (454075146.24017453, 902366140.8160808, 65836),
                        (902366140.8160808, 1310216847.6019177, 62907),
                        (1310216847.6019177, 1674819224.429403, 50024),
                        (1674819224.429403, 2067148057.3543935, 60125),
                        (2067148057.3543935, 2492222219.7494526, 61820),
                        (2492222219.7494526, 2922957520.3494115, 62958),
                        (2922957520.3494115, 3366978285.4684887, 65516),
                        (3366978285.4684887, 3825636033.0746126, 67631),
                        (3825636033.0746126, 4294934229, 69363)],
        'kurt_pop': 1.8003812394411711,
        'kurt_samp': 1.8003755451991572,
        'max': 4294934229,
        'mean': 2148598926,
        'median': 2178824636,
        'min': 246,
        'mode': 1290303696,
        'nulls': 0,
        'skew_pop': -0.0002275688367801933,
        'skew_samp': -0.0002275682969642996,
        'stddev_pop': 1239547006.8484387,
        'stddev_samp': 1239547986.9595456,
        'topK': [1290303696,
                    2903569362,
                    2615129534,
                    33938448,
                    4286069806,
                    586779699,
                    662198152,
                    2473897446,
                    2284234981,
                    1655328950],
        'total_records': 632351,
        'uniq_values': 632339,
        'var_pop': 1536476782186923500,
        'var_samp': 1536479211975462000}}

Management Operations

v.use(database: str)

Description: Sets the current table search path to the named database.
Parameters
- database: str - the new default database/search path
Example
```
v.use('mydatabase')
```

v.db()

Description: Returns a list of the databases available in the current instance
Returns
- vulkn.recordset.RecordSet

Example

>>> v.db().s

row  name
-----  -----------------
    1  billy
    2  cached
    3  database_for_dict
    4  default
    5  dicts
    6  system
    7  tmp
    8  vulkn

(8 rows)

v.tables(database: str), v.t(database: str)*

Description: Returns a list of the tables available in the specified or current database (default)
Parameters
- database - the database to inspect
Returns
- vulkn.recordset.RecordSet

Example

>>> v.t('cached').s

row  database    name    engine       partition_key  sorting_key
-----  ----------  ------  ---------  ---------------  -------------
    1  cached      test    Log                    nan  nan
    2  cached      test1   MergeTree              nan  number

(2 rows)

v.table_exists(database, table)

Description: Tests if the specified table exists
Parameters
- database - the database containing the object to test
- table - the table/object to test
Returns
- bool

Example

v.table_exists('default', 'test')
False

v.desc(table: str), v.d(table: str)

Description: Describe the schema/configuration of the specified table.
Aliases: v.d(table: str)
Parameters
- table: str - the new default database/search path
Returns
- vulkn.recordset.RecordSet

Example

>>> v.desc('system.databases').s

row  name           type      default_type    default_expression    comment    codec_expression    ttl_expression
-----  -------------  ------  --------------  --------------------  ---------  ------------------  ----------------
    1  name           String             nan                   nan        nan                 nan               nan
    2  engine         String             nan                   nan        nan                 nan               nan
    3  data_path      String             nan                   nan        nan                 nan               nan
    4  metadata_path  String             nan                   nan        nan                 nan               nan

(4 rows)

v.get_create(table: str)

Description: Get the CREATE statement for the table or object
Parameters
- table: str - the new default database/search path
Returns
- str

Example

>>> v.get_create('default.timeseries')
"CREATE TABLE default.timeseries\n(\n    `key` String, \n    `ts` DateTime('Australia/Melbourne'), \n    `value` UInt32\n)\nENGINE = MergeTree\nPARTITION BY toStartOfDay(ts)\nORDER BY (key, ts)\nSETTINGS index_granularity = 8192;

v.show_create(table: str), v.c(table: str)

Description: Show/pretty print the CREATE statement for the table or object
Aliases: v.c(table: str)
Parameters
- table: str - the new default database/search path
Returns
- str

Example

>>> v.show_create('default.timeseries')
CREATE TABLE default.timeseries
(
    `key` String, 
    `ts` DateTime('Australia/Melbourne'), 
    `value` UInt32
)
ENGINE = MergeTree
PARTITION BY toStartOfDay(ts)
ORDER BY (key, ts)
SETTINGS index_granularity = 8192;

v.get_schema(database: str=None, table: str=None)

Description: Returns a list of the table schema suitable for the reader interface.
Parameters
- database: str - the database where the table can be found
- table: str - the table to inspect
Returns
- list[ColumnType]

Example

>>> pprint.pprint(v.get_schema('default', 'timeseries'))
[ColumnType('key', 'String', default_kind='nan', default_expression='nan' compression_codec='nan'),
 ColumnType('ts', 'DateTime(\'Australia/Melbourne\')', default_kind='nan', default_expression='nan' compression_codec='nan'),
 ColumnType('value', 'UInt32', default_kind='nan', default_expression='nan' compression_codec='nan')]

v.optimize(database: str, table: str, partition=None, final=False, deduplicate=False)

Description: Executes the OPTIMIZE operation on the specified database.table [optional partition]
Parameters
- database - the database containing the table to inspect
- table - the table to optimize
- partition - the optional partition of the table to optimize
- final - whether to execute in OPTIMIZE FINAL mode
- deduplicate - whether to execute in OPTIMIZE DEDUPLICATE mode
Returns
- bool

Example

>>> v.optimize('default', 'timeseries')
True

v.parts(database=None, table=None)

Description: Returns a list of the parts for the specified database.table.
Parameters
- database - the database containing the table
- table - the table to inspect
Returns
- vulkn.recordset.RecordSet

Example

>>> v.parts('default', 'iot_nw_netflow').s
  row  partition         rows  path
-----  -----------  ---------  -----------------------------------------------------------------------------
    1  2019-10-01   354001673  /var/lib/clickhouse/data/default/iot_nw_netflow/20191001_1809_2141_5_12328/
    2  2019-10-02   344417952  /var/lib/clickhouse/data/default/iot_nw_netflow/20191002_2142_3127_5_12328/
    3  2019-10-03   349287299  /var/lib/clickhouse/data/default/iot_nw_netflow/20191003_2151_3095_5_12328/
    4  2019-10-04   353890586  /var/lib/clickhouse/data/default/iot_nw_netflow/20191004_2170_3121_5_12328/
    5  2019-10-05   351980346  /var/lib/clickhouse/data/default/iot_nw_netflow/20191005_3128_4419_5_12328/
    6  2019-10-06   337347169  /var/lib/clickhouse/data/default/iot_nw_netflow/20191006_3139_4444_5_12328/
    7  2019-10-07   360367914  /var/lib/clickhouse/data/default/iot_nw_netflow/20191007_3140_4445_6_12328/
    8  2019-10-08   350891850  /var/lib/clickhouse/data/default/iot_nw_netflow/20191008_4446_5097_5_12328/
    9  2019-10-09   353855520  /var/lib/clickhouse/data/default/iot_nw_netflow/20191009_4456_5108_5_12328/
   10  2019-10-10   349907606  /var/lib/clickhouse/data/default/iot_nw_netflow/20191010_5109_5765_5_12328/

(10 rows)

v.clone_table(from_database, from_table, database, table)

Description: Copies a table definition only from database.table to another.
Parameters
- from_database - the source database to clone from
- from_table - the source table to clone from
- database - the target database to clone to
- table - the target table to clone to
Returns
- bool

Example

>>> v.clone_table('default', 'iot_nw_netflow', 'other', 'iot_nw_netflow2')
True

v.copy_table(from_database, from_table, database, table)

Description: Copies a table, including it's data, from the source table to the targe table
Parameters
- from_database - the source database to copy from
- from_table - the source table to copy from
- database - the target database to copy to
- table - the target table to copy to
Returns
- bool

Example

>>> v.copy_table('default', 'iot_nw_netflow', 'other', 'iot_nw_netflow2')
True

v.create_table(database=None, table=None, schema=None, engine=None, exists_ok=False)

Description: Creates a new table
Parameters
- database - the database to create the table in
- table - the table name
- schema - the schema ColumnType definition
- engine - the database engine definition
- exists_ok - don't error if the table already exists
Returns
- bool

Example

>>> v.create_table('default', 'iot_nw_netflow3', v.getSchema('default', 'iot_nw_netflow'), engines.Memory())
True

v.create_view(database, view, query, exists_ok=False)

Description: Creates a new view based on the specified query
Parameters
- database - the database to create the table in
- view - the view name
- query - the source query for the view - can be a DataTable
- exists_ok - don't error if the table already exists
Returns
- bool

Example

>>> vdef = 'SELECT count() FROM myrecords'
>>> v.create_view('default', 'myview', vdef)
True

v.create_mat_view(database, view, query, to_database, to_table, engine, exists_ok)

Description: Creates a new materialized view in the target database with name view using the specified query
Parameters
- database - the database to create the table in
- view - the materialized view name
- query - the source query for the MatView
- to_database - the database to write the result table in
- to_table - the table to write the result table to
- engine - the database engine definition
- exists_ok - don't error if the table already exists
Returns
- bool

Example

>>> mv_def = 'SELECT device_id, substring(def_col), count() FROM myrecords'
>>> v.create_mat_view('default', 'mymatview', mv_def, 'default', 'targettable', engines.MergeTree())
True

v.create_sink(database, sink, schema, exists_ok=False)

Description: Creates a new input sink (table with the Null engine) suitable for streaming input with transformations.
Parameters
- database - the database to create the table in
- table - the table name
- schema - the schema ColumnType definition
- exists_ok - don't error if the table already exists
Returns
- bool

Example

>>> v.create_sink('default', 'netflow_sink', v.getSchema('default', 'iot_nw_netflow'))
True

v.drop_table(database, table), drop_sink, drop_view, drop_mat_view

Description: Drops the specified table/object.
Parameters
- database - the database containing the object to drop
- table - the table/object to remove
Returns
- bool

Example

v.drop_table('default','old_table')
True

v.drop_part(database=None, table=None, partitions=None)

Description: Drops the specified partitions from the table
Parameters
- database - the database containing the required table
- table - the table to call the drop operation on
- partitions: list - a list of partitions to drop
Returns
- bool

Example

v.drop_part('default','trades_2015',partitions=['2015-02-01','2015-03-01'])
True

v.drop_columns(database, table, columns)*

Description: Drops the specified columns from the table
Parameters
- database - the database containing the required table
- table - the table to call the drop operation on
- columns: list - a list of columns to drop
Returns
- bool

Example

v.drop_columns('default','trades_2015',['city','state'])
True

v.truncate_table(database, table)

Description: Truncates the specified table.
Parameters
- database - the database containing the object to truncate
- table - the table/object to truncate
Returns
- bool

Example

v.truncate_table('default', 'truncateme')
True

Object Management & Profiling

Data Profiling

v.profile(table: str)

Management Operations

v.use(database: str)

v.db()

v.tables(database: str), v.t(database: str)*

v.table_exists(database, table)

v.desc(table: str), v.d(table: str)

v.get_create(table: str)

v.show_create(table: str), v.c(table: str)

v.get_schema(database: str=None, table: str=None)

v.optimize(database: str, table: str, partition=None, final=False, deduplicate=False)

v.parts(database=None, table=None)

v.clone_table(from_database, from_table, database, table)

v.copy_table(from_database, from_table, database, table)

v.create_table(database=None, table=None, schema=None, engine=None, exists_ok=False)

v.create_view(database, view, query, exists_ok=False)

v.create_mat_view(database, view, query, to_database, to_table, engine, exists_ok)

v.create_sink(database, sink, schema, exists_ok=False)

v.drop_table(database, table), drop_sink, drop_view, drop_mat_view

v.drop_part(database=None, table=None, partitions=None)

v.drop_columns(database, table, *columns)

v.truncate_table(database, table)

v.drop_columns(database, table, columns)*