Standard SQL Dialect

Overview

The underlying Database behind Cooladata is Google BigQuery. Cooladata has added extensions and processing methods in order to improve BigQuery’s performance, run time as well as enables easier behavioral analysis with it’s CQL functions.

Currently, BigQuery supports two SQL dialects: standard SQL and legacy SQL. There are a few differences between the two dialects, including syntax, functions, data types and semantics. BigQuery standard SQL is compliant with the SQL 2011 standard and also includes extensions that support querying nested and repeated data.

Up to August 2018, Cooladata enabled running SQL scripts using Legacy SQL only. Cooladata now enables running freehand queries over Cooladata using Standard SQL, as part of a strategic plan to migrate Cooladata to operate solely in Standard SQL, encompassing all the advantages of the language, including performance enhancements, query optimizations and superior functionality.

Note! Standard SQL in Cooladata is still in beta. Please contact us with any issues or questions.

Using Standard SQL in Queries

Standard SQL is currently only supported in freehand queries, in the following features:

  • CQL report
  • Aggregation tables
  • R/Python Reports
  • Models
  • CQL Segments
  • CQL alert builders

Note! Using the CQL behavioral extensions (such as the Cohort, Funnel and Path functions) is not yet supported when using Standard Dialect.

To use Standard SQL, turn on the Standard SQL toggle in the top right hand side of the query editor:

 

 

Make sure you use Standard SQL syntax when writing the query. We’ve gathered a few syntax differences for you to start with. For more information see BigQuery documentation or contact support@cooladata.com:

Casting and Conversion Functions

In Legacy SQL, conversion and casting was done using <data type> (<expression>)

For example (Legacy):

SELECT float(1), float('87.35'), float('david') 
|
returns: 1.0, 87.35, null

In Standard SQL, conversion and casting is done by calling the CAST or SAFE_CAST functions.

For example (Standard):

SELECT CAST(1 AS FLOAT), CAST('87.35' AS FLOAT), CAST('david' as FLOAT)
|
returns: 1, 87.35, error

When using CAST, a query can fail if BigQuery is unable to perform the cast. For example, CAST('david' as FLOAT)returned an error.

When using SAFE_CAST, returns NULL instead of raising an error:

SELECT SAFE_CAST(1 AS FLOAT), SAFE_CAST('87.35' AS FLOAT), SAFE_CAST('david' as FLOAT)
|
returns: 1, 87.35, NULL

 

Date and Time Functions

All Cooladata date_range functions, such as last n days, current month, previous quarter, etc. are supported in Standard SQL as well.

The main difference is in the extraction and date functions:

EXTRACT

The EXTRACT functions returns the value corresponding to the specified date part of a date or timestamp expression.

In Legacy SQL, extracting a part of a date would work like so: part(timestamp/date expression) would return the month number of the timestamp.

For example (Legacy):

SELECT MONTH(event_time_ts) --when event_time_ts is: 2018-10-03 00:01
|
returns: 10

In Standard SQL, extracting a part of a date needs to use the EXTRACT function: EXTRACT(part FROM date_expression)

For example (Standard):

SELECT EXTRACT(MONTH FROM event_time_ts) --when event_time_ts is: 2018-10-03 00:01
|
returns: 10

DATE_ADD

The DATE_ADD function adds a specified time interval to a DATE or TIMESTAMP.

In Legacy SQL, DATE_ADD worked in the following syntax: DATE_ADD(timestamp or date expression,interval, interval_units)

For example (Legacy):

SELECT DATE_ADD(TIMESTAMP('2018-10-01 02:03'), 5, 'DAY')
|
returns: 2018-10-06 02:03

In Standard SQL, DATE_ADD only works with a date_expression (see TIMESTAMP_ADD for timestamp expression), using in the following syntax: DATE_DIFF(date_expression, date_expression, date_part)

For example (Standard):

SELECT DATE_ADD(DATE('2018-10-01'), INTERVAL 5 DAY)
|
returns:  2018-10-06 00:00

To add a time interval to a timestamp expression, use TIMESTAMP_ADD

For example (Standard):

SELECT TIMESTAMP_ADD(TIMESTAMP('2018-10-01 02:03:00'), INTERVAL 5 DAY)
|
returns: 2018-10-06 02:03

See the BigQuery reference for the supported interval units.

DATE_DIFF

The DATE_DIFF function returns the difference between two specified dates or timestamps.

In Legacy SQL, the function is called: DATEDIFF (no underscore) with the following syntax:  DATEDIFF(<timestamp1>,<timestamp2>) and returns the number of days between two TIMESTAMP data types. The result is positive if the first TIMESTAMP data type comes after the second TIMESTAMP data type, and otherwise the result is negative.

For example (Legacy):

SELECT DATEDIFF(TIMESTAMP('2012-10-02 05:23:48'), TIMESTAMP('2011-06-24 12:18:35'))
|
returns: 466

In Standard SQL, as with the DATE_ADD function, DATE_DIFF is only supported for DATE inputs. The function is used with the following syntax: DATE_DIFF(date_expression, date_expression, date_part). Returns the number of date_part boundaries between the two date_expressions. If the first date occurs before the second date, then the result is non-positive.

For example (Standard):

SELECT  DATE_DIFF(DATE('2012-10-02 05:23:48'), DATE('2011-06-24 12:18:35'), DAY)
|
returns:  466

To return the difference between 2 timestamp expressions use TIMESTAMP_DIFF. Returns the number of whole specified date_part intervals between two timestamps. The first timestamp_expression represents the later date; if the first timestamp_expression is earlier than the second timestamp_expression, the output is negative. Throws an error if the computation overflows the result type, such as if the difference in microseconds between the two timestamps would overflow an INT64 value.

For example (Standard):

SELECT TIMESTAMP_DIFF(TIMESTAMP('2012-10-02 05:23:48'), TIMESTAMP('2011-06-24 12:18:35'), DAY)
|
returns:  465

 

Count Functions

Both legacy SQL and standard SQL contain a COUNT function. However, each function behaves differently, depending on the SQL dialect you use.

In legacy SQL, COUNT(DISTINCT x) returns an approximate count. In standard SQL, it returns an exact count. For an approximate count of distinct values that runs faster and requires fewer resources, use APPROX_COUNT_DISTINCT.

NOT IN conditions and NULL Values

Legacy SQL does not comply with the SQL standard in its handling of NULL with NOT IN conditions, whereas standard SQL does.

UNNEST function

The UNNEST function takes an ARRAY and returns a table, with one row for each element in the ARRAY. You can also use UNNEST outside of the FROM clause with the IN operator.

For example:

SELECT * FROM UNNEST ([1, 2, 3]) as unnest_alias;

Returns:

Unnest_alias

1

2

3

For input ARRAYs of most element types, the output of UNNEST generally has one column. This single column has an optional alias, which you can use to refer to the column elsewhere in the query. ARRAYS with these element types return multiple STRUCT columns.

OFFSET clause

OFFSET specifies a non-negative skip row of type INT64, and only rows from that offset in the table will be considered.

This clause accept only literal or parameter values.

 

Aggregate Functions

An aggregate function is a function that performs a calculation on a set of values.

ARRAY_AGG

ARRAY_AGG function returns an ARRAY of column values, this function supports all types except ARRAY.

Syntex:

ARRAY_AGG([DISTINCT] column [{IGNORE|RESPECT} NULLS]

         [ORDER BY key [{ASC|DESC}] [, ... ]]  [LIMIT n]) [OVER (...)]

For example:

SELECT FORMAT("%T", ARRAY_AGG(DISTINCT x)) AS array_agg

FROM UNNEST([NULL, 1, -2, 3, -2, 1, NULL]) AS x

Returns:  1, -2, 3, NULL]

STRING_AGG

STRING_AGG function returns a value (either STRING or BYTES) obtained by concatenating non-null values.

If a delimiter is specified, concatenated values are separated by that delimiter; otherwise, a comma is used as a delimiter.

Syntex:

STRING_AGG([DISTINCT] expression [, delimiter]  [ORDER BY key [{ASC|DESC}] [, ... ]] [LIMIT n]) [OVER (...)]

For example:

SELECT STRING_AGG(fruit, " & ") AS string_agg

FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit

Returns:  apple & pear & banana & pear

ARRAY_CONCAT_AGG

ARRAY_CONCAT_AGG function concatenates elements from an expression of type ARRAY, returning a single ARRAY as a result. This function ignores NULL input arrays, but respects the NULL elements in non-NULL input arrays.

Syntex:

ARRAY_CONCAT_AGG(expression  [ORDER BY key [{ASC|DESC}] [, ... ]]  [LIMIT n])

For example:

SELECT FORMAT("%T", ARRAY_CONCAT_AGG(x)) AS array_concat_agg FROM (

 SELECT [NULL, 1, 2, 3, 4] AS x

 UNION ALL SELECT [5, 6])

Returns: [NULL, 1, 2, 3, 4, 5, 6]

Array Functions

ARRAY

The ARRAY function returns an ARRAY with one element for each row in a subquery.

For example:

SELECT ARRAY

 (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS new_array

Returns: [1, 2, 3]

ARRAY_CONCAT

ARRAY_CONCAT function concatenates one or more arrays with the same element type into a single array.

For example:

SELECT ARRAY_CONCAT([1, 2], [3, 4], [5, 6]) as count_to_six;

Returns:[1, 2, 3, 4, 5, 6]

ARRAY_TO_STRING

ARRAY_TO_STRING function returns a concatenation of the elements in array_expression as a STRING. The value for array_expression can either be an array of STRING or BYTES data types.

For example:

SELECT ARRAY_TO_STRING(["apples", "bananas", "pears", "grapes"] , '--')

Returns: apples--bananas--pears--grapes

ARRAY_LENGTH

ARRAY_LENGTH function returns the size of the array. Returns 0 for an empty array. Returns NULL if the array is NULL.

For example:

SELECT ARRAY_LENGTH(["apples", "bananas", "pears", "grapes"])

Returns:  4

ARRAY_REVERSE

ARRAY_REVERSE function returns the input ARRAY with elements in reverse order.

For example:

SELECT ARRAY_REVERSE(["apples", "bananas", "pears"])

Returns:  ["pears", "bananas", "apples"]

OFFSET and ORDINAL

Accesses an ARRAY element by position and returns the element. OFFSET means that the numbering starts at zero, ORDINAL means that the numbering starts at one.

For example:

SELECT list, list[OFFSET(1)] as offset, list[ORDINAL(1)] as ordinal

Returns:

| list                              | offset | ordinal |

| [apples, bananas, pears, grapes] | bananas | apples  |

SAFE_OFFSET and SAFE_ORDINAL

SAFE functions identical to OFFSET and ORDINAL, except returns NULL if the index is out of range.

For example:

SELECT list, list[OFFSET(3)] as offset, list[ORDINAL(3)] as ordinal

Returns:

| list                  | offset | ordinal |

| [coffee, tea, milk]   | NULL | milk |

SELECT * EXCEPT / REPLACE

A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.

For example:

SELECT * EXCEPT (order_id) →  returns all the column except             ‘order_id’ column.

A SELECT * REPLACE statement specifies one or more expression AS identifier clauses. Each identifier must match a column name from the SELECT * statement. In the output column list, the column that matches the identifier in a REPLACE clause is replaced by the expression in that REPLACE clause.

For example:

SELECT * REPLACE ("widget" AS item_name) → the ‘item_name’ column returns "widget" value. 

Filtering selected tables using _TABLE_SUFFIX

To restrict the query so that it scans an arbitrary set of tables, use the _TABLE_SUFFIX pseudo column in the WHERE clause. The _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard.

For example:

The followed query scans only the tables customer_0 and customer_4.

SELECT *

FROM `external.mySqlData.customer_*`

WHERE ( _TABLE_SUFFIX = '0'

   AND _TABLE_SUFFIX = '4' )

 

Using Expressions in Standard SQL

Since the Virtual Properties of type expression use SQL syntax when called in a query, the expression needs to match the query dialect type. To support Standard SQL dialect when using expressions, we have provided a way to enter the Standard SQL expression in the expression configuration.

To do so, in the expression configuration click on the “Advanced” section – there you can configure the Standard SQL expression. Click on “Copy legacy to standard expression” to copy the Legacy expression to the Standard one.

When using a Standard SQL query, the system will use the Standard SQL expression and if empty, it will try to use the Legacy one (in most functions the syntax remains the same between the two dialects).

 

Using Standard SQL in Query API

When querying Cooladata using API requests or any other way outside the UI (like when using our JDBC connection) – you can turn on standard SQL using one of the following methods:

  1. Using the SQL annotation: #standardSQL in the beginning of the query:
    #standardSQL
    SELECT  DATE_DIFF(DATE('2012-10-02 05:23:48'), DATE('2011-06-24 12:18:35'), DAY)
    |
    returns:  466
  2. Adding the standard parameter as part of the queryAPI request:
    https://app.cooladata.com/api/v2/projects/<project id>/cql?standard=true
Print Friendly, PDF & Email