CQL Overview

CoolaSQL, otherwise known as CQL, is at the heart of our platform’s flexibility. CQL is an extension of SQL, which means we support full SQL with proprietary extensions, as described below. The main difference between CoolaSQL and SQL is that CoolaSQL provides the ability to examine a sequence of events, whereas SQL is more set-based.

CoolaSQL supports multiple powerful proprietary clauses, functions, and special fields – allowing for easy analysis of typical behavioral patterns such as with path analysis, cohort analysis, and funnel analysis. The great value of CQL is when it comes to very detailed, tailor-made questions that need specific answers.

Best Practices

  • Included date_range in every query FROM CoolaData
    The date range function must be included in every query running from CoolaData at the same query level (in the WHERE clause directly after the FROM CoolaData).
  • Included slicers in queries FROM CoolaData
    Including the slicers function in the query will enable report and document slicers to affect the query. If omitted they will have no effect. Note that the slicers function must be located at the same query level running from CoolaData  (in the WHERE clause directly after the FROM CoolaData)
  • Use Limit to make queries run faster
    CoolaData limits the result set that can be viewed in the AD to 25,000 rows. For example, enter limit 10 to limit the number of results to 10. Limit should be the last part of the Select clause.  For larger result sets use the Query API or contact your Customer Success Manager for more options.
  • Use user_id in CQL when possible
    CoolaData generates a user_id for better performance. When applicable always use this user_id instead of customer_user_id or alternative_user_id.
Print Friendly

Cohort Analysis

Cohort analysis breaks down data into groups which usually share common characteristics or actions within a defined time frame. Cohort analysis helps identify patterns in the life cycle of customers, adapt, and tailor the service to specific cohorts. A group of people who share a common characteristic or actions within a defined period form a cohort, represented by a row in the returned result set. The date range selected in the report will specify at which date the first cohort begins.

There are various options available for each cohort clause. This section outlines the options for these clauses.

The following is a typical CQL™ query for cohort analysis:

The syntax for the cohort function is as follows:

 

SELECT

Mandatory fields in the SELECT clause:

  • cohort_name: a string describing the cohort:
    • For DAY & WEEK – the date (YYYY-MM-DD) of the first day in the cohort
    • For MONTH – the month (YYYY-MM) of the cohort
    • For Properties – the value of the property
  • cohort_id: the ordinal number of the cohort (starts with 0)
  • cohort_size: the number of users in the cohort
  • bucket_id: the ordinal number of the bucket (starts with 0)
  • Aggregate Function: can be the count of users in the bucket, the cohort size minus the count of users in the bucket (i.e. the rest of the users), or the sum/average/max/min/stddev of a measure (property).

All arguments but the aggregate function must appear in the GROUP BY clause.

 

CLUSTER COHORT BY

The grouping logic for each cohort. Can be defined by one of the following:

  • Time period defined by a number and a time resolution (days, weeks, months, quarters or years)
  • User scope properties

 

STARTS WITH

The first event performed:

  • event_name: performed a specific event
  • TRUE: performed any event

Any number of additional conditions for the event can be specified, i.e. where any of the event’s properties have specific values, using “AND <condition>”.

 

FOLLOWED BY

The concurrent event performed:

  • event_name: performed a specific event
  • NOT event_name: did not perform a specific event
  • TRUE: performed any event

Note: to find the number of users who did nothing use the TRUE argument and, in the select clause, count the number of users left in the bucket – i.e.:

Any number of additional conditions for the event can be specified, i.e. where any of the event’s properties have specific values, using “AND <condition>”.

 

Reverse Cohort

It is also possible to analyze a sequence of two events in the reverse order, i.e. what happened prior to a certain event. Replace “starts with… followed by…” with “ending with… preceded by…”, as followed:

Replace

With:

The rest of the query would still use the same syntax.
Any number of additional conditions for the event can be specified, i.e. where any of the event’s properties have specific values, using “AND <condition>”.

BUCKET BY

The duration of each bucket and the rule by which to count users in each bucket:

  • FIRST: only counts the user once on the first occurrence of the event throughout his life cycle
  • ALL: the user is counted each time he performed the event
  • RECURRING: each cohort user is counted only if the user’s activity occurred consecutively in each bucket throughout the duration of the cohort.

 

HAVING

Used to limit the number of buckets and cohorts in the query.

Examples

 

Q: Which users bought something after registering?

The cohorts are clusters of users that registered within the same 7-day period (for the date range of the query), and then paid for something at some future time. We want to find on which month they made the first payment.

 

Q: What is the difference in purchasing patterns between men and woman after registration?

The cohorts are clusters of users who are either male or female who registered during the date range of the query, and then went on to make a payment at a future time, but only those who bought in consecutive buckets.

Note: clustering cohorts by properties will typically include an N/A cluster for all users who do not have this property value (e.g. did not register).

Q: List of user ID’s and emails for all users who bought something within the first two weeks after registration.

To receive an actionable list of user information – change the selection and eliminate the GROUP BY clause. Using the the same cohort query, select the user ID’s and emails instead of the cohorts and buckets. To limit the list to the first two weeks, select only buckets 0 (the week of the registration), 1 and 2. Note that only user scope properties can be used (see more on property scopes here).

Print Friendly

Path Analysis

A path is a chain of consecutive events performed by a given user or cohort. For example, a user logs on to your e-commerce platform, browses through a few product pages, creates an account, and then logs off. This path is portrayals the events performed during the user’s session.

Path analysis is used in behavioral analytics to investigate and understand user behavior and gain actionable insights.

SELECT clause

The PATH query SELECT clause can contain any User/Session Properties and/or path functions.

CLUSTER PATH BY

The following is the syntax diagram for the CLUSTER PATH BY clause:

CLUSTER PATH BY is composed of the following elements:

  • CLUSTER PATH BY session: lists the events performed during a session. If you need to drill down to data at the events scope use other CQL functions.
  • REMOVE DUPLICATES: (optional) when this element is used, repeated events are treated as one event. Example: the following path:
    [login, viewItem, viewItem, viewItem, saveItem, purchaseItem]

    will remove all consecutive viewItem events and be translated to:
    [login, viewItem, saveItem, purchaseItem]
    .
  • CONTAINS: Path contains is a non-sequential expression resulting in all the paths that contain any the events, regardless of their order. “Contains” should be used instead of the match expression, and can include multiple conditions using AND or OR operators .
  • MATCH : Describes a sequential path, comprised of steps, each defined by a condition. All steps are optional – any combination of them can be applied.
    • Path steps:
      • STARTS WITH: The first event.
      • FOLLOWED BY: Any following event. Can be used multiple times, consecutively.
      • ENDING WITH: The last event.
    • Supported conditions:
      • any: matching any event. Would replace the step’s entire condition.
      • [NOT] event_name = <event_name>: matching a specific events. A preceding NOT will negate the condition.
      • OR: matching a set of event names linked by OR, to indicate that any of these events can be applied to this step.
      • DIRECTLY: Forces the events to be consecutive, i.e. no events occurred between the preceding step and the current one. Applicable to any step except the first.
  • HAVING PATH: supports aggregative path functions (such as path_to_string, path_duration, path_count) with standard mathematical operators: >, <, =, =<, =>, !=, <>.

Path analysis can be used in two ways – sequential path, and non-sequential path:

Examples

Non-Sequential Path (Contains)
The sequence of events of users who made a purchase, and then performed a social action: Like or Share.

Sequential Path

The top 5 paths by number of session of users who started by logging in, did a search, and made a purchase.

 

Path Functions

  • sub_path(path, event1, event2, …. eventn)
    This function will show only the paths that contain all the events in the list and in this order, and remove preceding and succeeding events. All events must be included at least once within the path. Events preceding the first event in the list or succeeding the last event in the list will not be shown. The function should include the property “path” and then a list of events.
    Example:

    Original Path: login, search, view_item, add_to_cart, share, checkout, search
    Result: view_item, add_to_cart, share, checkout
  • part_path (path, event1, event2, …eventn)
    This function will show only the events included in this list of events, from paths which include all the events in the list, regardless of their order in the path.  The function should include the property “path” and then a list of events.
    Example:

    Original Path: login, search, view_item, add_to_cart, share, checkout, search
    Result: add_to_cart, checkout
  • path_to_string(path | category_path)
    This function converts the path or category path (as indicated by the reserved parameters) into a comma delimited string.
    Example:
  • path_duration()
    This function returns an integer representing the duration (in milliseconds) of a session path.
    Example:
  • path_count()
    This function returns an integer representing the number of events in a path.
    Example:
  • session_start_time()
    This function returns a timestamp of the session’s first event.
  • session_end_time()
    This function returns a timestamp of the session’s last event.
  • session_event_counter()
    This function returns the number of times the desired event appears in the session path, for every session. This function does not need to run on a path query.

    This query will return the number of times the events “login”, “like”, and “create playlist” appear in the session path for every session (running on the full path column).
  • first(path_to_string(path),[number_of_events])
    This presents the first x events in a path. It enables the analyst to learn what the popular first steps are that users take in each session. The function is intended for use in SELECT clause of the query.
    Example:

    In the example above we are interested in the learning the first events users take in a session.
  • last(path_to_string(path),[number_of_events])
    This presents the last x events in a path. It enables the analyst to learn what the popular last steps are that users take in each session. The function is intended for use in the SELECT clause of the query.
    Example:

    In the example above we are interested in the learning the last events users take in a session.
  • time_between(first_event, second_event[, first_event_condition, second_event_condition])
    This function calculates the time elapsed between two events with the option to apply conditions on each event. The function is intended for use in SELECT clause of the query. Event names must be quoted. Conditions may be any valid SQL style condition.
    Example:

    In the example above we are interested in the time elapsed between Gold users registration and a purchase they make over 5 dollars.
    As any field in the SELECT clause it is allowed to apply additional functions on top of the time_between(…) function result. For example:
Print Friendly

Arithmetic Functions

Function: abs(numeric_expr)

This function returns the absolute value of the numeric expression.

Function: ceil(numeric_expr)

This function returns the numeric expression rounded up to nearest whole number.

Function: floor(numeric_expr)

This function returns the numeric expression rounded down to the nearest whole number.

Function: pow(numeric_expr1, numeric_expr2)

This function raises the first numeric expression to the power of the second numeric expression and returns the result.

Function: round(numeric_expr)

This function returns the numeric expression rounded up or down to the nearest whole number.

Function: sqrt(numeric_expr)

This function returns the square root of the numeric expression.

Print Friendly

Arithmetic Operators

Operator: numeric_expr + numeric_expr

This operator performs the addition operation and returns the sum of the two numeric expressions.

Operator: numeric_expr – numeric_expr

This operator performs the subtraction operation and returns the difference of the two numeric expressions. This operator can also be used to turn a given value into a negative value.

Operator: numeric_expr * numeric_expr

This operator performs the multiplication of two expressions and returns the product.

Operator: numeric_expr / numeric_expr

This operator performs the division of the first expression by the second expression and returns the quotient.

Operator: numeric_expr % numeric_expr

This is the modulo operator. It returns the value of the remainder of a division operation.

Print Friendly