Object Management & Profiling
Vulkn provides some simple functions for managing and interrogating objects. These range from simple CRUD operations to partition management and data profiling.
Data Profiling
v.profile(table: str)
- Description: Profiles the table returning statistical information regarding the shape of the data. Data types are inspected using different methods or options such as max_length for character types vs max values for decimals.
- Parameters
table
- the table to inspect
- Returns
dict
- Example
>>> v.profile('timeseries') {'key': {'data_type': 'String', 'kurt_pop': 13.22782691569896, 'kurt_samp': 13.227785078755689, 'max': 'device-999', 'max_length': 10, 'mean_length': 10, 'median_length': 10, 'min': 'device-0', 'min_length': 8, 'mode': 'device-990', 'nulls': 0, 'skew_pop': -3.2023548144821627, 'skew_samp': -3.2023472181783337, 'stddev_pop': 0.34284272666371696, 'stddev_samp': 0.34284299774981736, 'topK': ['device-990', 'device-984', 'device-851', 'device-878', 'device-991', 'device-849', 'device-934', 'device-828', 'device-839', 'device-362'], 'total_records': 632351, 'uniq_values': 1000, 'var_pop': 0.11754113522621215, 'var_samp': 0.11754132110608125}, 'ts': {'data_type': "DateTime(\\'Australia/Melbourne\\')", 'histogram': [('2019-01-01 00:00:00', '2019-01-01 00:01:35', 59936), ('2019-01-01 00:01:35', '2019-01-01 00:03:09', 59411), ('2019-01-01 00:03:09', '2019-01-01 00:04:45', 60835), ('2019-01-01 00:04:45', '2019-01-01 00:06:24', 63316), ('2019-01-01 00:06:24', '2019-01-01 00:08:07', 64741), ('2019-01-01 00:08:07', '2019-01-01 00:09:51', 65790), ('2019-01-01 00:09:51', '2019-01-01 00:11:37', 67164), ('2019-01-01 00:11:37', '2019-01-01 00:13:22', 66577), ('2019-01-01 00:13:22', '2019-01-01 00:15:02', 63605), ('2019-01-01 00:15:02', '2019-01-01 00:16:39', 60975)], 'kurt_pop': -251816852146683.28, 'kurt_samp': -251816055700870.62, 'max': '2019-01-01 00:16:39', 'mean': '2019-01-01 00:08:20', 'median': '2019-01-01 00:08:16', 'min': '2019-01-01 00:00:00', 'mode': '2019-01-01 00:16:38', 'nulls': 0, 'skew_pop': 16028484.98636335, 'skew_samp': 16028446.965204256, 'stddev_pop': 231.27362468169503, 'stddev_samp': 231.2738075499905, 'topK': ['2019-01-01 00:16:38', '2019-01-01 00:15:48', '2019-01-01 00:15:55', '2019-01-01 00:16:06', '2019-01-01 00:16:09', '2019-01-01 00:16:15', '2019-01-01 00:16:18', '2019-01-01 00:16:19', '2019-01-01 00:16:20', '2019-01-01 00:16:23'], 'total_records': 632351, 'uniq_values': 1000, 'var_pop': 53487.48947340954, 'var_samp': 53487.57405867004}, 'value': {'data_type': 'UInt32', 'histogram': [(246, 454075146.24017453, 66170), (454075146.24017453, 902366140.8160808, 65836), (902366140.8160808, 1310216847.6019177, 62907), (1310216847.6019177, 1674819224.429403, 50024), (1674819224.429403, 2067148057.3543935, 60125), (2067148057.3543935, 2492222219.7494526, 61820), (2492222219.7494526, 2922957520.3494115, 62958), (2922957520.3494115, 3366978285.4684887, 65516), (3366978285.4684887, 3825636033.0746126, 67631), (3825636033.0746126, 4294934229, 69363)], 'kurt_pop': 1.8003812394411711, 'kurt_samp': 1.8003755451991572, 'max': 4294934229, 'mean': 2148598926, 'median': 2178824636, 'min': 246, 'mode': 1290303696, 'nulls': 0, 'skew_pop': -0.0002275688367801933, 'skew_samp': -0.0002275682969642996, 'stddev_pop': 1239547006.8484387, 'stddev_samp': 1239547986.9595456, 'topK': [1290303696, 2903569362, 2615129534, 33938448, 4286069806, 586779699, 662198152, 2473897446, 2284234981, 1655328950], 'total_records': 632351, 'uniq_values': 632339, 'var_pop': 1536476782186923500, 'var_samp': 1536479211975462000}}
Management Operations
v.use(database: str)
- Description: Sets the current table search path to the named database.
- Parameters
database: str
- the new default database/search path
- Example
v.use('mydatabase')
v.db()
- Description: Returns a list of the databases available in the current instance
- Returns
vulkn.recordset.RecordSet
- Example
>>> v.db().s row name ----- ----------------- 1 billy 2 cached 3 database_for_dict 4 default 5 dicts 6 system 7 tmp 8 vulkn (8 rows)
v.tables(database: str), v.t(database: str)*
- Description: Returns a list of the tables available in the specified or current database (default)
- Parameters
database
- the database to inspect
- Returns
vulkn.recordset.RecordSet
- Example
>>> v.t('cached').s row database name engine partition_key sorting_key ----- ---------- ------ --------- --------------- ------------- 1 cached test Log nan nan 2 cached test1 MergeTree nan number (2 rows)
v.table_exists(database, table)
- Description: Tests if the specified table exists
- Parameters
database
- the database containing the object to testtable
- the table/object to test
- Returns
bool
- Example
v.table_exists('default', 'test') False
v.desc(table: str), v.d(table: str)
- Description: Describe the schema/configuration of the specified table.
- Aliases:
v.d(table: str)
- Parameters
table: str
- the new default database/search path
- Returns
vulkn.recordset.RecordSet
- Example
>>> v.desc('system.databases').s row name type default_type default_expression comment codec_expression ttl_expression ----- ------------- ------ -------------- -------------------- --------- ------------------ ---------------- 1 name String nan nan nan nan nan 2 engine String nan nan nan nan nan 3 data_path String nan nan nan nan nan 4 metadata_path String nan nan nan nan nan (4 rows)
v.get_create(table: str)
- Description: Get the CREATE statement for the table or object
- Parameters
table: str
- the new default database/search path
- Returns
str
- Example
>>> v.get_create('default.timeseries') "CREATE TABLE default.timeseries\n(\n `key` String, \n `ts` DateTime('Australia/Melbourne'), \n `value` UInt32\n)\nENGINE = MergeTree\nPARTITION BY toStartOfDay(ts)\nORDER BY (key, ts)\nSETTINGS index_granularity = 8192;
v.show_create(table: str), v.c(table: str)
- Description: Show/pretty print the CREATE statement for the table or object
- Aliases:
v.c(table: str)
- Parameters
table: str
- the new default database/search path
- Returns
str
- Example
>>> v.show_create('default.timeseries') CREATE TABLE default.timeseries ( `key` String, `ts` DateTime('Australia/Melbourne'), `value` UInt32 ) ENGINE = MergeTree PARTITION BY toStartOfDay(ts) ORDER BY (key, ts) SETTINGS index_granularity = 8192;
v.get_schema(database: str=None, table: str=None)
- Description: Returns a list of the table schema suitable for the reader interface.
- Parameters
database: str
- the database where the table can be foundtable: str
- the table to inspect
- Returns
list[ColumnType]
- Example
>>> pprint.pprint(v.get_schema('default', 'timeseries')) [ColumnType('key', 'String', default_kind='nan', default_expression='nan' compression_codec='nan'), ColumnType('ts', 'DateTime(\'Australia/Melbourne\')', default_kind='nan', default_expression='nan' compression_codec='nan'), ColumnType('value', 'UInt32', default_kind='nan', default_expression='nan' compression_codec='nan')]
v.optimize(database: str, table: str, partition=None, final=False, deduplicate=False)
- Description: Executes the OPTIMIZE operation on the specified database.table [optional partition]
- Parameters
database
- the database containing the table to inspecttable
- the table to optimizepartition
- the optional partition of the table to optimizefinal
- whether to execute in OPTIMIZE FINAL modededuplicate
- whether to execute in OPTIMIZE DEDUPLICATE mode
- Returns
bool
- Example
>>> v.optimize('default', 'timeseries') True
v.parts(database=None, table=None)
- Description: Returns a list of the parts for the specified database.table.
- Parameters
database
- the database containing the tabletable
- the table to inspect
- Returns
vulkn.recordset.RecordSet
- Example
>>> v.parts('default', 'iot_nw_netflow').s row partition rows path ----- ----------- --------- ----------------------------------------------------------------------------- 1 2019-10-01 354001673 /var/lib/clickhouse/data/default/iot_nw_netflow/20191001_1809_2141_5_12328/ 2 2019-10-02 344417952 /var/lib/clickhouse/data/default/iot_nw_netflow/20191002_2142_3127_5_12328/ 3 2019-10-03 349287299 /var/lib/clickhouse/data/default/iot_nw_netflow/20191003_2151_3095_5_12328/ 4 2019-10-04 353890586 /var/lib/clickhouse/data/default/iot_nw_netflow/20191004_2170_3121_5_12328/ 5 2019-10-05 351980346 /var/lib/clickhouse/data/default/iot_nw_netflow/20191005_3128_4419_5_12328/ 6 2019-10-06 337347169 /var/lib/clickhouse/data/default/iot_nw_netflow/20191006_3139_4444_5_12328/ 7 2019-10-07 360367914 /var/lib/clickhouse/data/default/iot_nw_netflow/20191007_3140_4445_6_12328/ 8 2019-10-08 350891850 /var/lib/clickhouse/data/default/iot_nw_netflow/20191008_4446_5097_5_12328/ 9 2019-10-09 353855520 /var/lib/clickhouse/data/default/iot_nw_netflow/20191009_4456_5108_5_12328/ 10 2019-10-10 349907606 /var/lib/clickhouse/data/default/iot_nw_netflow/20191010_5109_5765_5_12328/ (10 rows)
v.clone_table(from_database, from_table, database, table)
- Description: Copies a table definition only from database.table to another.
- Parameters
from_database
- the source database to clone fromfrom_table
- the source table to clone fromdatabase
- the target database to clone totable
- the target table to clone to
- Returns
bool
- Example
>>> v.clone_table('default', 'iot_nw_netflow', 'other', 'iot_nw_netflow2') True
v.copy_table(from_database, from_table, database, table)
- Description: Copies a table, including it's data, from the source table to the targe table
- Parameters
from_database
- the source database to copy fromfrom_table
- the source table to copy fromdatabase
- the target database to copy totable
- the target table to copy to
- Returns
bool
- Example
>>> v.copy_table('default', 'iot_nw_netflow', 'other', 'iot_nw_netflow2') True
v.create_table(database=None, table=None, schema=None, engine=None, exists_ok=False)
- Description: Creates a new table
- Parameters
database
- the database to create the table intable
- the table nameschema
- the schema ColumnType definitionengine
- the database engine definitionexists_ok
- don't error if the table already exists
- Returns
bool
- Example
>>> v.create_table('default', 'iot_nw_netflow3', v.getSchema('default', 'iot_nw_netflow'), engines.Memory()) True
v.create_view(database, view, query, exists_ok=False)
- Description: Creates a new view based on the specified query
- Parameters
database
- the database to create the table inview
- the view namequery
- the source query for the view - can be a DataTableexists_ok
- don't error if the table already exists
- Returns
bool
- Example
>>> vdef = 'SELECT count() FROM myrecords' >>> v.create_view('default', 'myview', vdef) True
v.create_mat_view(database, view, query, to_database, to_table, engine, exists_ok)
- Description: Creates a new materialized view in the target database with name view using the specified query
- Parameters
database
- the database to create the table inview
- the materialized view namequery
- the source query for the MatViewto_database
- the database to write the result table into_table
- the table to write the result table toengine
- the database engine definitionexists_ok
- don't error if the table already exists
- Returns
bool
- Example
>>> mv_def = 'SELECT device_id, substring(def_col), count() FROM myrecords' >>> v.create_mat_view('default', 'mymatview', mv_def, 'default', 'targettable', engines.MergeTree()) True
v.create_sink(database, sink, schema, exists_ok=False)
- Description: Creates a new input sink (table with the Null engine) suitable for streaming input with transformations.
- Parameters
database
- the database to create the table intable
- the table nameschema
- the schema ColumnType definitionexists_ok
- don't error if the table already exists
- Returns
bool
- Example
>>> v.create_sink('default', 'netflow_sink', v.getSchema('default', 'iot_nw_netflow')) True
v.drop_table(database, table), drop_sink, drop_view, drop_mat_view
- Description: Drops the specified table/object.
- Parameters
database
- the database containing the object to droptable
- the table/object to remove
- Returns
bool
- Example
v.drop_table('default','old_table') True
v.drop_part(database=None, table=None, partitions=None)
- Description: Drops the specified partitions from the table
- Parameters
database
- the database containing the required tabletable
- the table to call the drop operation onpartitions: list
- a list of partitions to drop
- Returns
bool
- Example
v.drop_part('default','trades_2015',partitions=['2015-02-01','2015-03-01']) True
v.drop_columns(database, table, *columns)
- Description: Drops the specified columns from the table
- Parameters
database
- the database containing the required tabletable
- the table to call the drop operation oncolumns: list
- a list of columns to drop
- Returns
bool
- Example
v.drop_columns('default','trades_2015',['city','state']) True
v.truncate_table(database, table)
- Description: Truncates the specified table.
- Parameters
database
- the database containing the object to truncatetable
- the table/object to truncate
- Returns
bool
- Example
v.truncate_table('default', 'truncateme') True