CQL Overview

CoolaSQL, otherwise known as CQL, is at the heart of our platform’s flexibility. CQL is an extension of SQL, which means we support full SQL with proprietary extensions, as described below. The main difference between CoolaSQL and SQL is that CoolaSQL provides the ability to examine a sequence of events, whereas SQL is more set-based.

CoolaSQL supports multiple powerful proprietary clauses, functions, and special fields – allowing for easy analysis of typical behavioral patterns such as with path analysis, cohort analysis, and funnel analysis. The great value of CQL is when it comes to very detailed, tailor-made questions that need specific answers.

Best Practices

  • Included date_range in every query FROM CoolaData
    The date range function must be included in every query running from CoolaData at the same query level (in the WHERE clause directly after the FROM CoolaData).
  • Included slicers in queries FROM CoolaData
    Including the slicers function in the query will enable report and document slicers to affect the query. If omitted they will have no effect. Note that the slicers function must be located at the same query level running from CoolaData  (in the WHERE clause directly after the FROM CoolaData)
  • Use Limit to make queries run faster
    CoolaData limits the result set that can be viewed in the AD to 25,000 rows. For example, enter limit 10 to limit the number of results to 10. Limit should be the last part of the Select clause.  For larger result sets use the Query API or contact your Customer Success Manager for more options.
  • Use user_id in CQL when possible
    CoolaData generates a user_id for better performance. When applicable always use this user_id instead of customer_user_id or alternative_user_id.
Print Friendly, PDF & Email

Cohort Analysis

Cohort analysis breaks down data into groups which usually share common characteristics or actions within a defined time frame. Cohort analysis helps identify patterns in the life cycle of customers, adapt, and tailor the service to specific cohorts. A group of people who share a common characteristic or actions within a defined period form a cohort, represented by a row in the returned result set. The date range selected in the report will specify at which date the first cohort begins.

Cohort

The syntax for the cohort function is as follows:

SELECT

Mandatory fields in the SELECT clause:

  • cohort_name: a string describing the cohort:
    • For DAY & WEEK – the date (YYYY-MM-DD) of the first day in the cohort
    • For MONTH – the month (YYYY-MM) of the cohort
    • For Properties – the value of the property
  • cohort_id: the ordinal number of the cohort (starts with 0)
  • cohort_size: the number of users in the cohort
  • bucket_id: the ordinal number of the bucket (starts with 0)
  • Aggregate Function: can be the count of users in the bucket, the cohort size minus the count of users in the bucket (i.e. the rest of the users), or the sum/average/max/min/stddev of a measure (property).

All arguments but the aggregate function must appear in the GROUP BY clause.

CLUSTER COHORT BY

The grouping logic for each cohort. Can be defined by one of the following:

  • Time period defined by a number and a time resolution (days, weeks, months, quarters or years)
  • User scope properties

STARTS WITH

The first event performed:

  • event_name: performed a specific event
  • TRUE: performed any event

Any number of additional conditions for the event can be specified, i.e. where any of the event’s properties have specific values, using “AND <condition>”.

FOLLOWED BY

The concurrent event performed:

  • event_name: performed a specific event
  • NOT event_name: did not perform a specific event
  • TRUE: performed any event

Note: to find the number of users who did nothing use the TRUE argument and, in the select clause, count the number of users left in the bucket – i.e.:

Any number of additional conditions for the event can be specified, i.e. where any of the event’s properties have specific values, using “AND <condition>”.

BUCKET BY

The duration of each bucket and the rule by which to count users in each bucket:

  • FIRST: only counts the user once on the first occurrence of the event throughout his life cycle
  • ALL: the user is counted each time he performed the event
  • RECURRING: each cohort user is counted only if the user’s activity occurred consecutively in each bucket throughout the duration of the cohort.

HAVING

Used to limit the number of buckets and cohorts in the query. When clustering cohorts by property, limiting cohort_id isn’t necessary (will return all the property values, sorted by cohort size).

 

Example:

Which users bought something after registering?
Users that registered within 7-day periods, and then paid for something, by months, showing only the first instance.
This will show us on which month each user made his first payment.

 

Reverse Cohort

It is also possible to analyze a sequence of two events in the reverse order, i.e. what happened prior to a certain event.
Replace “starts with… followed by…” with “ending with… preceded by…”, as follows:

replace

with

The rest of the query would still use the same syntax.

 

Export users

You can also generate a list of users and their properties using the cohort query.
To do so, replace the SELECT clause with the user properties you wish to export and remove the GROUP BY clause.

replace

with

 

Example:

List of user ID’s and emails who bought something within the first week after registration.
Using the the same cohort query, select the user ID’s and emails instead of the cohorts and buckets. To limit the list to the first week, limit the results to buckets 0 and 1.
Note that only user scope properties can be exported (see more on property scopes here).

 

Print Friendly, PDF & Email

Path Analysis

A path is a chain of consecutive events performed by a given user or cohort. For example, a user logs on to your e-commerce platform, browses through a few product pages, creates an account, and then logs off. This path is portrayals the events performed during the user’s session.

Path analysis is used in behavioral analytics to investigate and understand user behavior and gain actionable insights.

SELECT clause

The PATH query SELECT clause can contain any User/Session Properties and/or path functions.

CLUSTER PATH BY

The following is the syntax diagram for the CLUSTER PATH BY clause:

CLUSTER PATH BY is composed of the following elements:

  • CLUSTER PATH BY session: lists the events performed during a session. If you need to drill down to data at the events scope use other CQL functions.
  • REMOVE DUPLICATES: (optional) when this element is used, repeated events are treated as one event. Example: the following path:
    [login, viewItem, viewItem, viewItem, saveItem, purchaseItem]

    will remove all consecutive viewItem events and be translated to:
    [login, viewItem, saveItem, purchaseItem]
    .
  • CONTAINS: Path contains is a non-sequential expression resulting in all the paths that contain any the events, regardless of their order. “Contains” should be used instead of the match expression, and can include multiple conditions using AND or OR operators .
  • MATCH : Describes a sequential path, comprised of steps, each defined by a condition. All steps are optional – any combination of them can be applied.
    • Path steps:
      • STARTS WITH: The first event.
      • FOLLOWED BY: Any following event. Can be used multiple times, consecutively.
      • ENDING WITH: The last event.
    • Supported conditions:
      • any: matching any event. Would replace the step’s entire condition.
      • [NOT] event_name = <event_name>: matching a specific events. A preceding NOT will negate the condition.
      • OR: matching a set of event names linked by OR, to indicate that any of these events can be applied to this step.
      • DIRECTLY: Forces the events to be consecutive, i.e. no events occurred between the preceding step and the current one. Applicable to any step except the first.
  • HAVING PATH: supports aggregative path functions (such as path_to_string, path_duration, path_count) with standard mathematical operators: >, <, =, =<, =>, !=, <>.

Path analysis can be used in two ways – sequential path, and non-sequential path:

Examples

Non-Sequential Path (Contains)
The sequence of events of users who made a purchase, and then performed a social action: Like or Share.

Sequential Path

The top 5 paths by number of session of users who started by logging in, did a search, and made a purchase.

 

Path Functions

  • sub_path(path, event1, event2, …. eventn)
    This function will show only the paths that contain all the events in the list and in this order, and remove preceding and succeeding events. All events must be included at least once within the path. Events preceding the first event in the list or succeeding the last event in the list will not be shown. The function should include the property “path” and then a list of events.
    Example:

    Original Path: login, search, view_item, add_to_cart, share, checkout, search
    Result: view_item, add_to_cart, share, checkout
  • part_path (path, event1, event2, …eventn)
    This function will show only the events included in this list of events, from paths which include all the events in the list, regardless of their order in the path.  The function should include the property “path” and then a list of events.
    Example:

    Original Path: login, search, view_item, add_to_cart, share, checkout, search
    Result: add_to_cart, checkout
  • path_to_string(path | category_path)
    This function converts the path or category path (as indicated by the reserved parameters) into a comma delimited string.
    Example:
  • path_duration()
    This function returns an integer representing the duration (in milliseconds) of a session path.
    Example:
  • path_count()
    This function returns an integer representing the number of events in a path.
    Example:
  • session_start_time()
    This function returns a timestamp of the session’s first event.
  • session_end_time()
    This function returns a timestamp of the session’s last event.
  • session_event_counter()
    This function returns the number of times the desired event appears in the session path, for every session. This function does not need to run on a path query.

    This query will return the number of times the events “login”, “like”, and “create playlist” appear in the session path for every session (running on the full path column).
  • first(path_to_string(path),[number_of_events])
    This presents the first x events in a path. It enables the analyst to learn what the popular first steps are that users take in each session. The function is intended for use in SELECT clause of the query.
    Example:

    In the example above we are interested in the learning the first events users take in a session.
  • last(path_to_string(path),[number_of_events])
    This presents the last x events in a path. It enables the analyst to learn what the popular last steps are that users take in each session. The function is intended for use in the SELECT clause of the query.
    Example:

    In the example above we are interested in the learning the last events users take in a session.
  • time_between(first_event, second_event[, first_event_condition, second_event_condition])
    This function calculates the time elapsed between two events with the option to apply conditions on each event. The function is intended for use in SELECT clause of the query. Event names must be quoted. Conditions may be any valid SQL style condition.
    Example:

    In the example above we are interested in the time elapsed between Gold users registration and a purchase they make over 5 dollars.
    As any field in the SELECT clause it is allowed to apply additional functions on top of the time_between(…) function result. For example:
Print Friendly, PDF & Email

Funnel Analysis

Funnel analysis is best used in order to understand how many users progress through the different stages towards a goal decided upon by a company. In other words, the conversion of users between the different stages.

For example, let’s say that there are 100 users that clicked on a banner and were redirected to the company’s landing page. Only 20 users filled out the contact information on the landing page, meaning that the conversion rate for moving from clicking on the banner to filling the contact information on the landing page was 20%. This process continues with registration, deposits, purchases, transactions, etc.

Funnel analysis is relevant and important for any online company, specifically online gaming and e-commerce companies. Whether you want to investigate micro-conversions (tutorial completions, social activity, etc.) or macro-conversions, funnel analysis is important in that it lets you identify which stages of user activity do and do not contribute to attaining the desired goal.

CoolaData provides an intuitive and easy to use widget for this analysis. You can find the tutorial here.

CoolaData also provides the ability to use CQL in order to enrich the analysis and deepen it.

Getting Started – Basic Syntax

There’s a specific syntax in CQL that is used in order to analyze the funnel.

Below are the relevant parameters for the SELECT” clause.

    • STEP_ID: The step (event) number from first to last. For example for the first step, the STEP_ID will equal 1, for the second step the STEP_ID will equal 2, etc.

    • STEP_NAME: The name of the step (or event). For example, “Login” or “Registration”. You can use an “OR” function to select any of multiple events as a single step. For example:

    • USER_COUNT: The number of users that have done the relevant step.

CLUSTER FUNNEL BY Clause

The “WHERE” clause has unique syntax used only in CQL:

  • CLUSTER and END CLUSTER: These will border the conditions that you will use.

  • FUNNEL BY X days: This will enable you choose the time frame by which the user will have completed all the steps in the funnel.

  • STARTS WITH “X”: This will be the first event that the user starts with.

  • FOLLOWED BY “X”: This will be the next event that the user did. This command can be used multiple times, depending on the number of ç have been done.

Step 1 – Basic Statement

Below you’ll be able to see the basic implementation of the CQL query that will analyze the funnel, with two steps, without breakdown (group by) and additional filters (where clauses):

 

This will be the output of of the CQL query above:

FUNNEL01

 Step 2 – Adding a Breakdown

Below you’ll be able to see exactly the same query as was presented in step 1, with one difference – the additional parameter “age_range”, which is marked in bold red. Adding this property to the query will break all the steps down to this property. Please note that there is no need to add the “GROUP BY” command that we’re used to in SQL.

Please see the example below:

This will be the output of of the CQL query above:

Funnel2

Step 3 – Adding a Filter

Below you’ll be able to see exactly the same query as was presented in step 2, with one difference – the additional filter “user_country”, which is marked in bold red. Please note that this filter is added to the “WHERE” clause just before the Funnel CQL unique syntax, starting with “CLUSTER” as well as being able to use regular SQL operators “AND” and “OR” as for your company’s specific needs. In this example we will receive data only for users from France, Great Britain, and Germany.

This will be the output of of the CQL query above:

Funnel3

Step 4 – Adding Duration Between Steps

Below you’ll be able to see exactly the same query as was presented in step 3, with two differences – the additional field “step_duration” and a second “Next Song” step, both marked in bold and red.

“STEP_DURATION” is a field which shows the time between steps. As opposed to “FUNNEL BY X DAYS”, which will limit the time between the first and final steps, “STEP_DURATION” will show the time between the first and second steps, the second and third step, and so on for all steps.

Note that this field will show “0.00”, because the first step doesn’t have a time difference.

Please see the example below:

This will be the output of of the CQL query above:

Funnel4

Step 5 – Adding a Filter to the Funnel’s Steps

Below you’ll be able to see exactly the same query as was presented in step 4, with a few differences – the additional “WHERE” clauses after each event of the funnel, all marked in bold red.

These “WHERE” clauses are specific for each event in the funnel and therefore can be done for as many events in the funnel that we desire.

Please see the example below:

This will be the output of of the CQL query above:

Funnel5

Step 6 – Exporting Users which Met the Funnel Criteria

Below you’ll be able to see exactly the same query as was presented in step 4, with one difference – instead of the fields chosen before, we selected “user_id”, which is marked in bold red.

What this will do is return a list of users which have completed the funnel criteria.

Please see the example below:

This will be the output of of the CQL query above:

Funnel6

Keep in Mind:

  1. It is possible to add more properties to the “SELECT” clause and it will be displayed per user.

  2. When selecting “user_id” you will be able to choose certain steps which won’t be included in the funnel. This can be done by writing “NOT FOLLOWED BY X”.
    For example:

    This query will return users where their first event is “Play song”, their second event was “next song”, and their third event wasn’t “Next song”.

A Few Extra Tips

  1. The Query Time Frame: Please keep in mind that the selected time frame (“date_range”) is also affected by time interval (“CLUSTER FUNNEL BY 7 days”) selected between the first and last event. The time frame that you choose is only for the first event.
    For example, let’s say that the time frame you chose in between 1/1/2014 and 1/2/2014 and the time interval is 7 days. The results you will receive will be until the 8/2/2014 for first events that occurred on 1/2/2014.

  2. Number of Events: The maximum number of events supported in the funnel is five.

  3. When choosing a filter, if you select a session property, be aware that all the events will only refer to the particular session property selected. For example, if you choose to filter events created only in the Chrome browser then those will be the only events shown in the funnel. This means that if in the same sessions there are also sessions created in Internet Explorer or Firefox, they won’t be shown in the funnel.
Print Friendly, PDF & Email

Querying with Segmentation

Segmentation can be used in CQL for breakdown or filtering. Filtering supports using segments (to filter by users), or any other table in your project (Aggregation Tables, Integration tables, etc.).
Read more: Creating and managing segments, Using segments in reports and dashboards

 

Breakdown by Segments

To display users from specific segments: add the following expression to the select clause. In the brackets, include the names of all the segments you want to include in the query. Add the term “other” to also include users not included in these segments. The query output will include the segment name as a data column.

Example:

 

Filter by Segments

To filter any CQL query by users in a segment: add the following expression to the where clause. Use the segment name as saved.

To filter a query by data NOT in the specific segment: precede the segment statement by “NOT”.

Filter by multiple segments: it is possible to slice a query using multiple segments connected by an “AND” logical operator. “NOT segment” can also be used, and can be combined with regular segment statements. Only results that belong to the intersection of all the segments will be shown.
For example, the following would be a valid query:

Notes:

  • “OR” cannot be used on segments: To use an “OR” condition in conjunction with segments (on condition other than the segment statement), enclose it in brackets, with “And” operator. For example:
  • Cohort and Funnel can only be segmented by “user_id”: it is only possible to segment Cohort and Funnel queries by the “user_id” property (no need to declare it specifically).
  • Cannot segment by property “event_name”: the property “event_name” cannot be used when filtering by segment.

Example:

We’ll use a segment “new_users” we created to see the activity of these users over time.

 

Filter by table

Filter by segment can also be used with external tables (other than segments).
To filter using a table: specify the table name instead of the segment name, and append “.<propery_name>” to specify the segmentation property  (default is “user_id” and doesn’t have to be stated).

To filter a query by data NOT in the specific table: precede the segment statement by “NOT”.

Example:

We’ll use an emails list we built to easily create an email campaign (using Query API):

Print Friendly, PDF & Email