Strings

String types are used to represent any non-numeric data.

>>> v.select(String('ClickHouse rocks!').alias('string_value')).s

  row  string_value
-----  -----------------
    1  ClickHouse rocks!

(1 row)

ClickHouse references

Operators

The Vulkn String object supports all common operators. Operators can be applied between Python, Vulkn and ClickHouse SQL types.

Operation                 Operator     Example
------------------------  -----------  -----------------------------------------------------
equals                    ==           v.select(String('ClickHouse') == 'clickhouse').s
not equals                !=           v.select(String('ClickHouse') != 'clickhouse').s
greater than              >            v.select(String('ClickHouse') > 'clickhouse').s
less than                 <            v.select(String('ClickHouse') < 'clickhouse').s
greater than or equal to  >=           v.select(String('ClickHouse') >= 'clickhouse').s
less than or equal to     <=           v.select(String('ClickHouse') <= 'clickhouse').s
concatenation             +            v.select(String('ClickHouse') + ' ' + 'clickhouse').s
or                        or           v.select(String('ClickHouse') or 'clickhouse').s
by index                  [idx]        v.select(String('ClickHouse')[1]).s
slicing                   [start:end]  v.select(String('ClickHouse')[1:5]).s

Functions

alphaTokens

Selects substrings of consecutive bytes from the ranges a-z and A-Z and returns them as an array.

>>> v.select(String('freedom is the right of all sentient beings').alphaTokens()).s

  row  alphaTokens(\'freedom is the right of all sentient beings\')
-----  --------------------------------------------------------------
    1  ['freedom','is','the','right','of','all','sentient','beings']

(1 row)
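
For intuition, the tokenisation shown above can be approximated in plain Python. This is only a sketch of the described behaviour (maximal runs of ASCII letters), not Vulkn's or ClickHouse's implementation; the helper name alpha_tokens is hypothetical.

```python
import re
from typing import List


def alpha_tokens(text: str) -> List[str]:
    """Return every maximal run of ASCII letters (a-z, A-Z) in text."""
    return re.findall(r"[A-Za-z]+", text)


tokens = alpha_tokens("freedom is the right of all sentient beings")
print(tokens)
```

Digits, punctuation and whitespace all act as separators in this sketch, so alpha_tokens('abc1def') yields two tokens.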

appendTrailingCharIfAbsent(char_arg: str), append_if_missing

Appends the character char_arg to the string if the string is non-empty and does not already end with that character.

>>> v.select(String('http://google.com').appendTrailingCharIfAbsent('/')).s

  row  appendTrailingCharIfAbsent(\'http://google.com\', \'/\')
-----  ----------------------------------------------------------
    1  http://google.com/

(1 row)
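
The rule can be written out in a few lines of plain Python. This is a hedged sketch of the semantics documented for the ClickHouse function (append only when the string is non-empty and does not already end with the character); the helper name is hypothetical and is not part of Vulkn.

```python
def append_trailing_char_if_absent(text: str, char: str) -> str:
    """Append char unless text is empty or already ends with char."""
    if text and not text.endswith(char):
        return text + char
    return text


print(append_trailing_char_if_absent("http://google.com", "/"))
print(append_trailing_char_if_absent("http://google.com/", "/"))
```

Both calls print the same URL with exactly one trailing slash, which is why the function is convenient for normalising paths and URLs.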

base64Decode

Decodes a base64 encoded string and returns the original value.

>>> v.select(String('Q2xpY2tIb3VzZSByb2NrcyE=').base64Decode()).s

  row  base64Decode(\'Q2xpY2tIb3VzZSByb2NrcyE=\')
-----  --------------------------------------------
    1  ClickHouse rocks!

(1 row)

base64Encode

Return a base64 encoded version of the string.

>>> v.select(String('ClickHouse rocks!').base64Encode()).s

  row  base64Encode(\'ClickHouse rocks!\')
-----  -------------------------------------
    1  Q2xpY2tIb3VzZSByb2NrcyE=

(1 row)
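
The encoded and decoded values above can be cross-checked locally with Python's standard base64 module. This is only a verification sketch, assuming the text is UTF-8 encoded; it does not go through Vulkn or ClickHouse.

```python
import base64

# Encode the same text used in the examples above.
encoded = base64.b64encode("ClickHouse rocks!".encode("utf-8")).decode("ascii")

# Decoding the result should give the original text back.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```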

len, length, lengthUTF8, character_length, CHARACTER_LENGTH, char_length, CHAR_LENGTH

Returns the length of the string in bytes. lengthUTF8 returns the length in Unicode code points, assuming the string is valid UTF-8.

>>> v.select(String('ClickHouse rocks!').len()).s

  row    length(\'ClickHouse rocks!\')
-----  -------------------------------
    1                               17

(1 row)
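
In ClickHouse, length counts bytes while lengthUTF8 counts Unicode code points, so the two only agree for ASCII text. The plain-Python sketch below illustrates the difference under the assumption that the string is stored as UTF-8; it is an illustration, not a call into Vulkn.

```python
text = "na\u00efve"  # 'naive' spelled with U+00EF, a two-byte character in UTF-8

byte_length = len(text.encode("utf-8"))  # what a byte-based length reports
code_point_length = len(text)            # what a code-point-based length reports

print(byte_length, code_point_length)
```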

concatAssumeInjective(*args)

Same as concat, except that the caller guarantees that concat(s1, s2, s3) -> s4 is injective, that is, different argument values always produce different results. ClickHouse uses this guarantee to optimize GROUP BY.
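
Plain concatenation is not injective for arbitrary inputs, which is why the guarantee has to come from the caller. The sketch below shows a collision in plain Python and one possible way to avoid it; the '|' separator is an illustrative assumption and only helps if it never occurs in the inputs.

```python
# Two different argument tuples produce the same concatenation,
# so concatenation is not injective in general.
collision_a = "ab" + "c"
collision_b = "a" + "bc"
print(collision_a == collision_b)

# With a separator that never occurs in the inputs (an assumption the
# caller must guarantee), distinct tuples stay distinct.
distinct_a = "|".join(["ab", "c"])
distinct_b = "|".join(["a", "bc"])
print(distinct_a == distinct_b)
```

Only use concatAssumeInjective when the data makes collisions like the first one impossible.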

concat(*args)

Concatenates multiple strings.

Concat can be used against both ClickHouse and Python strings:

>>> v.select(String('hello').concat(String(' world')).concat('!')).s

  row  concat(concat(\'hello\', \' world\'), \'!\')
-----  ----------------------------------------------
    1  hello world!

(1 row)

Multiple strings can be specified per concat call:

>>> v.select(String('hello').concat(String(' world'), '!')).s

  row  concat(\'hello\', \' world\', \'!\')
-----  --------------------------------------
    1  hello world!

(1 row)

The Python '+' operator can be used in place of the concat function call:

>>> v.select(String('hello') + String(' world') + '!').s

  row  concat(concat(\'hello\', \' world\'), \'!\')
-----  ----------------------------------------------
    1  hello world!

(1 row)

convertCharset(from_arg: str, to_arg: str)

Converts the string from encoding from_arg to encoding to_arg.
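
As a rough analogy, the same kind of conversion can be sketched with Python's built-in codecs. The codec names 'cp1251' and 'utf-8' are Python's spellings and are assumptions here; the names accepted by ClickHouse may be spelled differently.

```python
# Bytes as they would be stored in the source encoding.
raw = "Привет".encode("cp1251")

# Convert: interpret the bytes in the source encoding, re-encode in the target.
converted = raw.decode("cp1251").encode("utf-8")

print(len(raw), len(converted))
print(converted.decode("utf-8"))
```

The text is unchanged by the conversion; only its byte representation differs.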

CRC32

Returns the CRC32 checksum of a string, using CRC-32-IEEE 802.3 polynomial and initial value 0xffffffff (zlib implementation).

The result type is UInt32.

>>> v.select(String('hello').CRC32()).s

  row    CRC32(\'hello\')
-----  ------------------
    1           907060870

(1 row)
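
Because the checksum follows the zlib implementation, the value above can be cross-checked with Python's standard zlib module. This is a local verification sketch and does not involve Vulkn.

```python
import zlib

# zlib.crc32 returns an unsigned 32-bit integer in Python 3.
checksum = zlib.crc32(b"hello")
print(checksum)
```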

empty

Returns a boolean 1/true, 0/false value indicating if the string is an empty string.

>>> v.select(String('').empty()).s

  row    empty(\'\')
-----  -------------
    1              1

(1 row)

endswith(pattern: str)

Returns a boolean (1 for true, 0 for false) indicating if the target string ends with the specified string.

>>> v.select(String('Hello World!').endswith('Hello')).s

  row    like(\'Hello World!\', \'%Hello\')
-----  ------------------------------------
    1                                     0

(1 row)

extractAll(pattern: str)

Extracts all instances of the string that match the specified regular expression.

>>> v.select(String('the cat sat').extractAll('.at')).s

  row  extractAll(\'the cat sat\', \'.at\')
-----  --------------------------------------
    1  ['cat','sat']

(1 row)

extract(pattern: str)

Extracts the first instance of the string that matches pattern.

>>> v.select(String('the cat sat').extract('.at')).s

  row  extract(\'the cat sat\', \'.at\')
-----  -----------------------------------
    1  cat

(1 row)

isdecimal

Returns boolean true (1) if the value is a decimal value, boolean false (0) otherwise.

>>> v.select(String('1.2').isdecimal()).s

  row    match(\'1.2\', \'^[0-9]*.[0-9]*$\')
-----  -------------------------------------
    1                                      1

(1 row)

isnumeric

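Returns boolean true (1) if the value consists only of the digits 0-9, boolean false (0) otherwise. A value containing a decimal point, such as '1.2', is therefore not considered numeric.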

>>> v.select(String('1.2').isnumeric()).s

  row    match(\'1.2\', \'^[0-9]*$\')
-----  ------------------------------
    1                               0

(1 row)

>>> v.select(String('1').isnumeric()).s

  row    match(\'1\', \'^[0-9]*$\')
-----  ----------------------------
    1                             1

(1 row)

isValidUTF8

Returns 1 if the set of bytes is valid UTF-8, otherwise 0.
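
No example is given for this function, so here is a plain-Python sketch of the same kind of check. It relies on Python's strict UTF-8 decoder, which may differ from ClickHouse in edge cases; the helper name is hypothetical.

```python
def is_valid_utf8(data: bytes) -> int:
    """Return 1 if data decodes as UTF-8, otherwise 0."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return 0
    return 1


print(is_valid_utf8("Hello World!".encode("utf-8")))
print(is_valid_utf8(b"\xff\xfe"))
```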

join(join_chars: str)

Concatenates/joins the given String array by the specified join_chars characters.

>>> v.select(String('hello world').split(' '), String('hello world').split(' ').join('|')).s

  row  splitByString(\' \', \'hello world\')    arrayStringConcat(splitByString(\' \', \'hello world\'), \'|\')
-----  ---------------------------------------  -----------------------------------------------------------------
    1  ['hello','world']                        hello|world

(1 row)

Python lists can also be joined by declaring the list as an Array type.

>>> v.select(Array(['hello','world']).join('|')).s

  row  arrayStringConcat([\'hello\', \'world\'], \'|\')
-----  --------------------------------------------------
    1  hello|world

(1 row)

like(pattern: str)

Returns boolean true (1) if the whole string matches the SQL LIKE pattern, in which % matches any number of bytes (including none) and _ matches exactly one byte.

>>> v.select(String('the cat sat').like('%cat%')).s

  row    like(\'the cat sat\', \'%cat%\')
-----  ----------------------------------
    1                                   1

(1 row)
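
To make the pattern semantics concrete, the sketch below translates a LIKE pattern into a Python regular expression. It assumes the usual SQL meaning of % (any run of characters) and _ (exactly one character) and ignores backslash escapes; the helper name like is hypothetical and this is not how Vulkn or ClickHouse evaluate the pattern.

```python
import re


def like(text: str, pattern: str) -> int:
    """Return 1 if the whole of text matches the LIKE pattern, else 0."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return int(re.fullmatch(regex, text, flags=re.DOTALL) is not None)


print(like("the cat sat", "%cat%"))
print(like("Hello World!", "Hello%"))
print(like("Hello World!", "%Hello"))
```

The last two calls mirror the generated SQL shown for startswith and endswith, which are expressed as LIKE patterns with a trailing or leading %.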

lower, lowerUTF8, lcase

Converts the string to lowercase.

>>> v.select(String('HELLO WOrlD').lower()).s

  row  lower(\'HELLO WOrlD\')
-----  ------------------------
    1  hello world

(1 row)

match(pattern: str)

Returns boolean true (1) if the regular expression pattern can be found within the specified string.

>>> v.select(String('the cat sat').match('.at')).s

  row    match(\'the cat sat\', \'.at\')
-----  ---------------------------------
    1                                  1

(1 row)

notEmpty

Returns a boolean 1/true, 0/false value indicating if the string is not an empty string.

>>> v.select(String('').notEmpty()).s

  row    notEmpty(\'\')
-----  ----------------
    1                 0

(1 row)

notLike(pattern: str)

Returns a boolean 1/true, 0/false value indicating if the string doesn't match the specified LIKE pattern.

>>> v.select(String('Hello World! World!').notLike('%World%')).s

  row    notLike(\'Hello World! World!\', \'%World%\')
-----  -----------------------------------------------
    1                                                0

(1 row)

positionCaseInsensitive(needle: str), positionCaseInsensitiveUTF8

Returns the starting position (counting from 1) of the first instance of the specified string, ignoring case.

>>> v.select(String('Hello World! World!').positionCaseInsensitive('world')).s

  row    positionCaseInsensitive(\'Hello World! World!\', \'world\')
-----  -------------------------------------------------------------
    1                                                              7

(1 row)

position(needle: str), positionUTF8

Returns the starting position (counting from 1) of the first instance of the specified string, or 0 if the string is not found.

>>> v.select(String('Hello World! World!').position('World')).s

  row    position(\'Hello World! World!\', \'World\')
-----  ----------------------------------------------
    1                                               7

(1 row)
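
Positions are 1-based, unlike Python's 0-based str.find. The sketch below maps between the two for ASCII text; the helper names are hypothetical, and for non-ASCII text byte positions and character positions would differ.

```python
def position(haystack: str, needle: str) -> int:
    """1-based position of needle in haystack, 0 if absent."""
    return haystack.find(needle) + 1  # find returns -1 when absent


def position_case_insensitive(haystack: str, needle: str) -> int:
    """Same as position, but compares lower-cased copies."""
    return haystack.lower().find(needle.lower()) + 1


print(position("Hello World! World!", "World"))
print(position_case_insensitive("Hello World! World!", "world"))
print(position("Hello World! World!", "world"))
```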

replaceAll(pattern, replacement)

Replace all found instances of the specified string.

>>> v.select(String('Hello World! World!').replaceAll('World', 'foobar')).s

  row  replaceAll(\'Hello World! World!\', \'World\', \'foobar\')
-----  ------------------------------------------------------------
    1  Hello foobar! foobar!

(1 row)

replaceOne(pattern, replacement)

Replace the first found instance of the specified string.

>>> v.select(String('Hello World! World!').replaceOne('World', 'foobar')).s

  row  replaceOne(\'Hello World! World!\', \'World\', \'foobar\')
-----  ------------------------------------------------------------
    1  Hello foobar! World!

(1 row)

replaceRegexpAll(pattern, replacement)

Replace all found instances that match the regular expression.

>>> v.select(String('Hello World! World!').replaceRegexpAll('World(!|$)', 'foobar')).s

  row  replaceRegexpAll(\'Hello World! World!\', \'World(!|$)\', \'foobar\')
-----  -----------------------------------------------------------------------
    1  Hello foobar foobar

(1 row)

replaceRegexpOne(pattern, replacement)

Replace the first found instance matching the regular expression.

>>> v.select(String('Hello World! World!').replaceRegexpOne('! World.*$', 'foobar')).s

  row  replaceRegexpOne(\'Hello World! World!\', \'! World.*$\', \'foobar\')
-----  -----------------------------------------------------------------------
    1  Hello Worldfoobar

(1 row)
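
The two regular-expression examples above can be reproduced with Python's re module as a sanity check. This is only an approximation: ClickHouse uses its own regular-expression engine, so advanced syntax and replacement-string escapes may behave differently from Python.

```python
import re

text = "Hello World! World!"

# Replace every match, as in the replaceRegexpAll example.
all_replaced = re.sub(r"World(!|$)", "foobar", text)

# Replace only the first match, as in the replaceRegexpOne example.
first_replaced = re.sub(r"! World.*$", "foobar", text, count=1)

print(all_replaced)
print(first_replaced)
```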

replace(pattern, replacement, count=None)

Replace the specified portion of the string.

>>> v.select(String('Hello World!').replace('World', 'foobar')).s

  row  replaceAll(\'Hello World!\', \'World\', \'foobar\')
-----  -----------------------------------------------------
    1  Hello foobar!

(1 row)

reverse, reverseUTF8

Reverse the given string.

>>> v.select(String('Hello World!').reverse()).s

  row  reverse(\'Hello World!\')
-----  ---------------------------
    1  !dlroW olleH

(1 row)

split(separator: str), splitByString(separator: str), splitByChar(separator_arg: str)

Splits the string into an array using the given separator. splitByChar only accepts a single-character separator.

>>> v.select(String('freedom is the right of all sentient beings').split(' ')).s

  row  splitByString(\' \', \'freedom is the right of all sentient beings\')
-----  -----------------------------------------------------------------------
    1  ['freedom','is','the','right','of','all','sentient','beings']

(1 row)

splitlines

Splits the string into an array using the '\n' newline character as the separator.

>>> v.select(String('Hello\nWorld!').splitlines()).s

  row  splitByChar(\'\\n\', \'Hello\\nWorld!\')
-----  ------------------------------------------
    1  ['Hello','World!']

(1 row)

startswith(pattern: str)

Returns a boolean (1 for true, 0 for false) indicating if the target string starts with the specified string.

>>> v.select(String('Hello World!').startswith('Hello')).s

  row    like(\'Hello World!\', \'Hello%\')
-----  ------------------------------------
    1                                     1

(1 row)

substring(offset, length=None), substringUTF8(offset, length=None)

Returns the substring of the target string that starts at position offset (counting from 1) and is length bytes long. substringUTF8 counts Unicode code points instead of bytes.

>>> v.select(String('Hello World!').substring(2, 3)).s

  row  substring(\'Hello World!\', 2, 3)
-----  -----------------------------------
    1  ell

(1 row)
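
The example returns 'ell' because the offset counts from 1. The sketch below maps that convention onto Python's 0-based slicing; it assumes offset >= 1 and ASCII text, and the helper name is hypothetical.

```python
from typing import Optional


def substring(text: str, offset: int, length: Optional[int] = None) -> str:
    """Slice text using a 1-based offset and an optional length."""
    start = offset - 1  # convert to Python's 0-based index
    end = None if length is None else start + length
    return text[start:end]


print(substring("Hello World!", 2, 3))
print(substring("Hello World!", 7))
```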

toValidUTF8

Converts the specified string to UTF8.

>>> v.select(String('Hello World!').toValidUTF8()).s

  row  toValidUTF8(\'Hello World!\')
-----  -------------------------------
    1  Hello World!

(1 row)

trimBoth

Trims empty white space from both sides of a string.

>>> v.select(String('  Hello World!  ').trimBoth()).s

  row  trimBoth(\'  Hello World!  \')
-----  --------------------------------
    1  Hello World!

(1 row)

trimLeft(trim_str: str='\\\s*'), ltrim, lstrip

Trims the specified regex/character from the left side of a string. Any white space character by default.

>>> v.select(String('  Hello World!').trimLeft()).s

  row  trimLeft(\'  Hello World!\')
-----  ------------------------------
    1  Hello World!

(1 row)

trimRight(trim_str: str='\\\s*'), rtrim, rstrip

Trims the specified regex/character from the right side of a string. Any white space character by default.

>>> v.with_(String('Hello World!   ').alias('example')).select(String(n='example').len(), String(n='example').trimRight().len()).s

  row    length(example)    length(trimRight(example))
-----  -----------------  ----------------------------
    1                 15                            12

(1 row)

trim(trim_str: str='\\\s*'), strip

Trims the specified regex/character from both sides of a string. Any white space character by default.

>>> v.select(String('  Hello World!   ').strip()).s

  row  replaceRegexpAll(\'  Hello World!   \', \'^\\\\s*|\\\\s*$\', \'\')
-----  --------------------------------------------------------------------
    1  Hello World!

(1 row)
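
The column header above shows that strip is expressed as a regular-expression replacement. The same pattern can be tried in plain Python as a sanity check; this is a sketch with Python's re module rather than ClickHouse's engine, and the helper name is hypothetical.

```python
import re


def strip_whitespace(text: str) -> str:
    """Remove leading and trailing whitespace with one regex replacement."""
    return re.sub(r"^\s*|\s*$", "", text)


stripped = strip_whitespace("  Hello World!   ")
print(repr(stripped), len(stripped))
```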

tryBase64Decode

Attempts to base64 decode the specified string. Unlike base64Decode, an empty string is returned if the input cannot be decoded.

>>> v.select(String('Hello World!').base64Encode().tryBase64Decode()).s

  row  tryBase64Decode(base64Encode(\'Hello World!\'))
-----  ------------------------------------------------------
    1  Hello World!

(1 row)

upper, ucase, upperUTF8

Converts the string to uppercase.

>>> v.select(String('hello world').ucase()).s

  row  ucase(\'hello world\')
-----  ------------------------
    1  HELLO WORLD

(1 row)

unhex

Applies the unhex operation to a string previously encoded into hex.

>>> v.select(funcs.encode.hex('HelloWorld').unhex()).s

  row  unhex(hex(\'HelloWorld\'))
-----  ----------------------------
    1  HelloWorld

(1 row)
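
The same round trip can be illustrated in plain Python. This sketch assumes UTF-8 text and upper-case hex digits and does not call Vulkn.

```python
# Encode the text as hexadecimal digits, two per byte.
hex_text = "HelloWorld".encode("utf-8").hex().upper()

# Decoding the hex digits restores the original bytes.
restored = bytes.fromhex(hex_text).decode("utf-8")

print(hex_text)
print(restored)
```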