Welcome to Medallia Journey Analytics!

Medallia Journey Analytics provides an end-to-end Behavioral Analytics solution that tracks, enriches and stores your events. You can then use the Medallia Journey Analytics web interface to easily generate insights from these events.

Getting Started with Behavioral Analytics

Three easy steps will get you started generating user behavior insights on your own:

  1. Plan which events to track
  2. Implement Medallia Journey Analytics
  3. Build your first Dashboard

Within a couple of hours you’ll start getting events in the Dashboard and can query them in the Medallia Journey Analytics interface or database!

What Are Behavioral Analytics?

Behavioral analytics observes users and their behavior over time and shows the sequence of events they perform. Such a sequence of specific events occurring in a specific order is also called a funnel of events.

Medallia Journey Analytics is a complete behavioral analytics data analysis platform that connects all your event data from all your data points. It unifies data from a multitude of external and internal sources and inspects it as a single unit, thus facilitating time‑series behavior analysis that yields behavioral insights.

Medallia Journey Analytics provides behavioral analytics widgets and a behavioral query language (CQL) to make advanced behavioral analysis easy.

How Do Behavioral Analytics Differ from Regular Analytics?

Basic analytics (such as in KPI reports) differ from behavioral analytics in that regular analytics typically display quantitative reports that count data. For example:

  • Count 1 – How many people were leads?
  • Count 2 – How many people received sales calls?
  • Count 3 – How many people made a purchase?

Regular analytics display discrete (distinct) values: the total quantities of Counts 1, 2 and 3, regardless of their relationship to each other. This means that Count 1 shows the total number of people who were leads, Count 2 shows the total number of people to whom sales calls were made and Count 3 shows the total number of people who made a purchase, regardless of whether they were a lead and regardless of whether a sales call was made to them.

In a Behavioral Analytics report, Count 1 is the same as in the KPI report; however, Count 2 is a subset of Count 1, meaning that it represents the quantity of people who were leads and to whom sales calls were made, and Count 3 is a subset of Count 2, meaning that it represents the people who were leads, to whom sales calls were made and who made a purchase.
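
To make the distinction concrete, here is a minimal Standard SQL sketch, assuming a hypothetical events table with user_id and event_name columns (the table, columns and event names are illustrative, and the sketch ignores the event ordering a true funnel would enforce):

-- Regular analytics: three independent counts.
SELECT
  COUNT(DISTINCT IF(event_name = 'became_lead', user_id, NULL)) AS leads,
  COUNT(DISTINCT IF(event_name = 'sales_call', user_id, NULL))  AS called,
  COUNT(DISTINCT IF(event_name = 'purchase', user_id, NULL))    AS purchased
FROM events;

-- Behavioral analytics: each count is a subset of the previous one.
SELECT
  COUNT(DISTINCT IF(was_lead, user_id, NULL))                              AS count_1_leads,
  COUNT(DISTINCT IF(was_lead AND was_called, user_id, NULL))               AS count_2_leads_called,
  COUNT(DISTINCT IF(was_lead AND was_called AND purchased, user_id, NULL)) AS count_3_leads_called_purchased
FROM (
  SELECT
    user_id,
    LOGICAL_OR(event_name = 'became_lead') AS was_lead,
    LOGICAL_OR(event_name = 'sales_call')  AS was_called,
    LOGICAL_OR(event_name = 'purchase')    AS purchased
  FROM events
  GROUP BY user_id
);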

Table Partitioning by Column

Cooladata’s events and sessions tables are automatically partitioned by time.  Cooladata also enables the partitioning of other external tables in your project by a timestamp or date column.

Partitioning ensures that queries scan only the selected time range in the table instead of the entire table, improving performance and minimizing query run time. Partitioned tables can be created by models, aggregation tables, or Google Cloud Storage and Gmail integrations, and are only available for querying in the Standard SQL dialect.

Table Partitioning Settings

Table partitioning requires two settings:

Partition Column

Defines the column by which the table is partitioned. The column must be of the DATE or TIMESTAMP data type.

Data written to a partitioned table is automatically directed to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.

When querying the partitioned table, a filter on this partition column will automatically direct the query to run only on the relevant partitions.

Require Partition Filter

Defines whether queries running on this table must include a filter on the partition column in the WHERE clause. When TRUE, queries selecting from this table without a filter on the partition column will fail.
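
For illustration, a minimal sketch of how the two settings affect querying, assuming a hypothetical external table daily_costs partitioned by a DATE column cost_date with Require Partition Filter set to TRUE:

SELECT campaign, SUM(cost) AS total_cost
FROM daily_costs
WHERE cost_date BETWEEN '2018-09-01' AND '2018-09-30'  -- scans only the September partitions
GROUP BY campaign;

-- Without a filter on cost_date in the WHERE clause, the same query would fail,
-- because Require Partition Filter is TRUE for this table.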

Creating a Partitioned Table Using an Aggregation Table

Aggregation Tables automatically run scheduled calculations and data aggregations and save them in a separate, permanent table in your Cooladata project. To make this table partitioned, the Aggregation Table query must be written in the Standard SQL dialect (creating a partitioned table using Legacy SQL is not supported), and the partitioning settings described above (Partition Column and Require Partition Filter) need to be filled in before the aggregation table’s first run.

Notice that after the aggregation table’s first run the table can only be queried using Standard SQL dialect.
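
As an illustration, a minimal sketch of a Standard SQL aggregation query whose output includes a DATE column that could then be selected as the Partition Column (the source table name, event names and grouping are illustrative):

SELECT
  DATE(event_time_ts) AS day,        -- DATE column to use as the Partition Column
  event_name,
  COUNT(*) AS events
FROM cooladata
WHERE event_time_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day, event_name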

 

Creating a Partitioned Table Using a Model

Models are designed to add R and Python capabilities to your Cooladata workflow. The results of the R or Python script are saved to a table in your Cooladata project.
To make this table partitioned, fill in the partitioning settings described above (Partition Column and Require Partition Filter) before the model’s first run.

Notice that after the model’s first run the model’s table can only be queried using Standard SQL dialect.

 

Creating a Partitioned Table Using an Integration

Gmail or Google Cloud Storage integrations allow you to upload an external file to a table in your Cooladata project.
To make this table partitioned, open the advanced section in the integration settings and fill in the partitioning settings described above (Partition Column and Require Partition Filter) before the integration’s first run.

Notice that after the integration’s first run the table can only be queried using Standard SQL dialect.

Cooladata Fall 2018 Product Update 

In this product update you will find updates on:

New Features:

  • Standard SQL
  • Partitioning in External Tables
  • Writing Models to your external destination
  • Merge data using Integrations

Feature Improvements:

  • Simplifying working with several projects in Cooladata
  • Chart Visualization – choose whether or not to auto-fit your chart
  • Cohort Auto Captions
  • Column filters for Table Visualization
  • ETL auto-population by scope ignoring NULL values
  • Lookup values for virtual properties and expressions

New features:

  • Standard SQL
    Up until now, Cooladata’s SQL was compatible with Google BigQuery SQL Legacy dialect. Cooladata now enables running freehand queries over Cooladata using Standard SQL, as part of a strategic plan to migrate Cooladata to operate solely in Standard SQL, encompassing all the advantages of the language, including performance enhancements, query optimizations and superior functionality. Writing Standard SQL is now available when creating a CQL report, aggregation table, model, CQL segment or alert. For more information see our Standard SQL documentation.
  • Partitioning in External Tables
    Cooladata’s events and sessions tables are automatically partitioned by time. From now on, Cooladata also enables partitioning other external tables in your project by a timestamp or date column. Partitioning ensures that queries scan only the selected time range in the table instead of the entire table, improving performance and minimizing query run time. Partitioning is available only when creating aggregation tables or models using the Standard SQL dialect, and the resulting tables are only available for querying in Standard SQL.
  • Merge data using Integrations
    Integrations fetching external data can now write to a table using the “Append and Update” strategy, which updates existing rows (according to a unique key) and appends new rows to the destination table. This can reduce the number of rows fetched and integrated in each run to only the updated and new rows.
  • Writing Models to your external destination
    Models can now save their results to external destinations as available in aggregation tables: your Google Bucket or BigQuery dataset.

Feature Improvements:

  • Simplifying working with several projects in Cooladata
    We’ve listened to our customers’ requests: the dashboard and report lists are now filtered by the active project. Since you can add reports from several projects to the same dashboard, some dashboards might appear in more than one project. When creating filters, make sure you choose the right project and table to filter by in the filter widget.
  • Chart Visualization – Choose whether or not to auto-fit your chart
    We are aware that sometimes it’s the big picture showing broad trends that matters in a chart, while at other times it’s the little details we want our data consumers to focus on. That’s exactly why we’ve made auto-fitting a single chart to the report size optional – so you can choose how your report will be viewed in a dashboard.

  • Cohort Auto Captions
    The cohort widget now auto-populates the cohort captions that explain to your report consumers what it is they are seeing. You can override the captions manually as you wish.

  • ETL auto-population by scope ignoring NULL values
    An explicit Null sent for a session or user scope property will be ignored, and the last non-null value sent will be auto-populated (“smeared”) going forward.
  • Column filters for Table Visualization
    Filtering a single value has never been easier. Just click on a single column header and choose the value you want to see.

  • Lookup values for virtual properties and expressions
    Lookup values are the list of values in the filter drop down. These are populated as part of the ETL, so they weren’t available for virtual properties and expressions until now. From now on, all properties can have a drop down list of values in filters.

Standard SQL Dialect

Overview

The underlying database behind Cooladata is Google BigQuery. Cooladata has added extensions and processing methods to improve BigQuery’s performance and run time, as well as to enable easier behavioral analysis with its CQL functions.

Currently, BigQuery supports two SQL dialects: standard SQL and legacy SQL. There are a few differences between the two dialects, including syntax, functions, data types and semantics. BigQuery standard SQL is compliant with the SQL 2011 standard and also includes extensions that support querying nested and repeated data.

Up to August 2018, Cooladata enabled running SQL scripts using Legacy SQL only. Cooladata now enables running freehand queries over Cooladata using Standard SQL, as part of a strategic plan to migrate Cooladata to operate solely in Standard SQL, encompassing all the advantages of the language, including performance enhancements, query optimizations and superior functionality.

Note! Standard SQL in Cooladata is still in beta. Please contact us with any issues or questions.

Using Standard SQL in Queries

Standard SQL is currently only supported in freehand queries, in the following features:

  • CQL report
  • Aggregation tables
  • R/Python Reports
  • Models
  • CQL Segments
  • CQL alert builders

Note! Using the CQL behavioral extensions (such as the Cohort, Funnel and Path functions) is not yet supported when using the Standard SQL dialect.

To use Standard SQL, turn on the Standard SQL toggle in the top right hand side of the query editor:


Make sure you use Standard SQL syntax when writing the query. We’ve gathered a few syntax differences for you to start with. For more information see BigQuery documentation or contact support@cooladata.com:

Casting and Conversion Functions

In Legacy SQL, conversion and casting were done using <data type>(<expression>)

For example (Legacy):

SELECT float(1), float('87.35'), float('david')
returns: 1.0, 87.35, null

In Standard SQL, conversion and casting is done by calling the CAST or SAFE_CAST functions.

For example (Standard):

SELECT CAST(1 AS FLOAT64), CAST('87.35' AS FLOAT64), CAST('david' AS FLOAT64)
returns: 1, 87.35, error

When using CAST, a query fails if BigQuery is unable to perform the cast. For example, CAST('david' AS FLOAT64) returns an error.

SAFE_CAST returns NULL instead of raising an error:

SELECT SAFE_CAST(1 AS FLOAT64), SAFE_CAST('87.35' AS FLOAT64), SAFE_CAST('david' AS FLOAT64)
returns: 1, 87.35, NULL

 

Date and Time Functions

All Cooladata date_range functions, such as last n days, current month, previous quarter, etc. are supported in Standard SQL as well.

The main difference is in the extraction and date functions:

EXTRACT

The EXTRACT function returns the value corresponding to the specified date part of a date or timestamp expression.

In Legacy SQL, extracting a part of a date used the syntax part(timestamp/date expression); for example, MONTH(timestamp) returns the month number of the timestamp.

For example (Legacy):

SELECT MONTH(event_time_ts) --when event_time_ts is: 2018-10-03 00:01
returns: 10

In Standard SQL, extracting a part of a date needs to use the EXTRACT function: EXTRACT(part FROM date_expression)

For example (Standard):

SELECT EXTRACT(MONTH FROM event_time_ts) --when event_time_ts is: 2018-10-03 00:01
returns: 10

DATE_ADD

The DATE_ADD function adds a specified time interval to a DATE or TIMESTAMP.

In Legacy SQL, DATE_ADD worked in the following syntax: DATE_ADD(timestamp or date expression, interval, interval_units)

For example (Legacy):

SELECT DATE_ADD(TIMESTAMP('2018-10-01 02:03'), 5, 'DAY')
returns: 2018-10-06 02:03

In Standard SQL, DATE_ADD only works with a date expression (see TIMESTAMP_ADD for timestamp expressions), using the following syntax: DATE_ADD(date_expression, INTERVAL int64_expression date_part)

For example (Standard):

SELECT DATE_ADD(DATE('2018-10-01'), INTERVAL 5 DAY)
returns: 2018-10-06

To add a time interval to a timestamp expression, use TIMESTAMP_ADD

For example (Standard):

SELECT TIMESTAMP_ADD(TIMESTAMP('2018-10-01 02:03:00'), INTERVAL 5 DAY)
returns: 2018-10-06 02:03

See the BigQuery reference for the supported interval units.

DATE_DIFF

The DATE_DIFF function returns the difference between two specified dates or timestamps.

In Legacy SQL, the function is called: DATEDIFF (no underscore) with the following syntax:  DATEDIFF(<timestamp1>,<timestamp2>) and returns the number of days between two TIMESTAMP data types. The result is positive if the first TIMESTAMP data type comes after the second TIMESTAMP data type, and otherwise the result is negative.

For example (Legacy):

SELECT DATEDIFF(TIMESTAMP('2012-10-02 05:23:48'), TIMESTAMP('2011-06-24 12:18:35'))
returns: 466

In Standard SQL, as with the DATE_ADD function, DATE_DIFF is only supported for DATE inputs. The function is used with the following syntax: DATE_DIFF(date_expression, date_expression, date_part). Returns the number of date_part boundaries between the two date_expressions. If the first date occurs before the second date, then the result is non-positive.

For example (Standard):

SELECT DATE_DIFF(DATE('2012-10-02'), DATE('2011-06-24'), DAY)
returns: 466

To return the difference between 2 timestamp expressions use TIMESTAMP_DIFF. Returns the number of whole specified date_part intervals between two timestamps. The first timestamp_expression represents the later date; if the first timestamp_expression is earlier than the second timestamp_expression, the output is negative. Throws an error if the computation overflows the result type, such as if the difference in microseconds between the two timestamps would overflow an INT64 value.

For example (Standard):

SELECT TIMESTAMP_DIFF(TIMESTAMP('2012-10-02 05:23:48'), TIMESTAMP('2011-06-24 12:18:35'), DAY)
returns: 465

 

Count Functions

Both legacy SQL and standard SQL contain a COUNT function. However, each function behaves differently, depending on the SQL dialect you use.

In legacy SQL, COUNT(DISTINCT x) returns an approximate count. In standard SQL, it returns an exact count. For an approximate count of distinct values that runs faster and requires fewer resources, use APPROX_COUNT_DISTINCT.
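
For example, in Standard SQL (assuming a hypothetical events table with a user_id column):

SELECT
  COUNT(DISTINCT user_id)        AS exact_users,    -- exact count
  APPROX_COUNT_DISTINCT(user_id) AS approx_users    -- approximate, faster and cheaper
FROM events;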

NOT IN conditions and NULL Values

Legacy SQL does not comply with the SQL standard in its handling of NULL with NOT IN conditions, whereas standard SQL does.
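
A small sketch of the difference: in Standard SQL, a NULL in the NOT IN list makes the comparison evaluate to NULL rather than TRUE, so no rows pass the filter:

SELECT x
FROM UNNEST([1, 2, 3]) AS x
WHERE x NOT IN (2, NULL);

-- returns: no rows (the comparison against NULL is never TRUE)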

UNNEST function

The UNNEST function takes an ARRAY and returns a table, with one row for each element in the ARRAY. You can also use UNNEST outside of the FROM clause with the IN operator.

For example:

SELECT * FROM UNNEST([1, 2, 3]) AS unnest_alias;

Returns:

| unnest_alias |
| 1 |
| 2 |
| 3 |

For input ARRAYs of most element types, the output of UNNEST has one column. This single column has an optional alias, which you can use to refer to the column elsewhere in the query. Input ARRAYs of STRUCT type instead return multiple columns, with one column for each field of the STRUCT.
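
For example, UNNEST over an ARRAY of STRUCTs returns one column per STRUCT field:

SELECT *
FROM UNNEST([STRUCT(1 AS id, 'apples' AS fruit),
             STRUCT(2 AS id, 'pears'  AS fruit)]);

-- returns two columns (id, fruit) with the rows (1, apples) and (2, pears)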

OFFSET clause

OFFSET specifies a non-negative number of rows to skip, of type INT64; only rows from that offset onward in the table are considered.

This clause accepts only literal or parameter values.
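
For example (OFFSET is used together with LIMIT):

SELECT x
FROM UNNEST([10, 20, 30, 40, 50]) AS x
ORDER BY x
LIMIT 2 OFFSET 2;

-- returns: 30, 40 (skips the first two rows, then returns the next two)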

 

Aggregate Functions

An aggregate function is a function that performs a calculation on a set of values.

ARRAY_AGG

The ARRAY_AGG function returns an ARRAY of column values. It supports all data types except ARRAY.

Syntax:

ARRAY_AGG([DISTINCT] expression [{IGNORE|RESPECT} NULLS] [ORDER BY key [{ASC|DESC}] [, ... ]] [LIMIT n]) [OVER (...)]

For example:

SELECT FORMAT("%T", ARRAY_AGG(DISTINCT x)) AS array_agg

FROM UNNEST([NULL, 1, -2, 3, -2, 1, NULL]) AS x

Returns:  1, -2, 3, NULL]

STRING_AGG

STRING_AGG function returns a value (either STRING or BYTES) obtained by concatenating non-null values.

If a delimiter is specified, concatenated values are separated by that delimiter; otherwise, a comma is used as a delimiter.

Syntax:

STRING_AGG([DISTINCT] expression [, delimiter]  [ORDER BY key [{ASC|DESC}] [, ... ]] [LIMIT n]) [OVER (...)]

For example:

SELECT STRING_AGG(fruit, " & ") AS string_agg
FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit

Returns: apple & pear & banana & pear

ARRAY_CONCAT_AGG

ARRAY_CONCAT_AGG function concatenates elements from an expression of type ARRAY, returning a single ARRAY as a result. This function ignores NULL input arrays, but respects the NULL elements in non-NULL input arrays.

Syntax:

ARRAY_CONCAT_AGG(expression  [ORDER BY key [{ASC|DESC}] [, ... ]]  [LIMIT n])

For example:

SELECT FORMAT("%T", ARRAY_CONCAT_AGG(x)) AS array_concat_agg FROM (

 SELECT [NULL, 1, 2, 3, 4] AS x

 UNION ALL SELECT [5, 6])

Returns: [NULL, 1, 2, 3, 4, 5, 6]

Array Functions

ARRAY

The ARRAY function returns an ARRAY with one element for each row in a subquery.

For example:

SELECT ARRAY
  (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS new_array

Returns: [1, 2, 3]

ARRAY_CONCAT

ARRAY_CONCAT function concatenates one or more arrays with the same element type into a single array.

For example:

SELECT ARRAY_CONCAT([1, 2], [3, 4], [5, 6]) as count_to_six;

Returns: [1, 2, 3, 4, 5, 6]

ARRAY_TO_STRING

ARRAY_TO_STRING function returns a concatenation of the elements in array_expression as a STRING. The value for array_expression can either be an array of STRING or BYTES data types.

For example:

SELECT ARRAY_TO_STRING(["apples", "bananas", "pears", "grapes"] , '--')

Returns: apples--bananas--pears--grapes

ARRAY_LENGTH

ARRAY_LENGTH function returns the size of the array. Returns 0 for an empty array. Returns NULL if the array is NULL.

For example:

SELECT ARRAY_LENGTH(["apples", "bananas", "pears", "grapes"])

Returns:  4

ARRAY_REVERSE

ARRAY_REVERSE function returns the input ARRAY with elements in reverse order.

For example:

SELECT ARRAY_REVERSE(["apples", "bananas", "pears"])

Returns:  ["pears", "bananas", "apples"]

OFFSET and ORDINAL

Accesses an ARRAY element by position and returns the element. OFFSET means that the numbering starts at zero, ORDINAL means that the numbering starts at one.

For example:

WITH items AS (SELECT ["apples", "bananas", "pears", "grapes"] AS list)
SELECT list, list[OFFSET(1)] AS offset, list[ORDINAL(1)] AS ordinal FROM items

Returns:

| list                             | offset  | ordinal |
| [apples, bananas, pears, grapes] | bananas | apples  |

SAFE_OFFSET and SAFE_ORDINAL

The SAFE_OFFSET and SAFE_ORDINAL functions are identical to OFFSET and ORDINAL, except that they return NULL if the index is out of range.

For example:

WITH items AS (SELECT ["coffee", "tea", "milk"] AS list)
SELECT list, list[SAFE_OFFSET(3)] AS offset, list[SAFE_ORDINAL(3)] AS ordinal FROM items

Returns:

| list                | offset | ordinal |
| [coffee, tea, milk] | NULL   | milk    |

SELECT * EXCEPT / REPLACE

A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.

For example:

SELECT * EXCEPT (order_id) → returns all columns except the ‘order_id’ column.

A SELECT * REPLACE statement specifies one or more expression AS identifier clauses. Each identifier must match a column name from the SELECT * statement. In the output column list, the column that matches the identifier in a REPLACE clause is replaced by the expression in that REPLACE clause.

For example:

SELECT * REPLACE ("widget" AS item_name) → returns the value "widget" in the ‘item_name’ column.
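
A self-contained sketch of both statements, using a hypothetical inline orders row:

WITH orders AS (SELECT 5 AS order_id, 'sprocket' AS item_name, 200 AS quantity)
SELECT * EXCEPT (order_id) FROM orders;
-- returns: item_name = 'sprocket', quantity = 200

WITH orders AS (SELECT 5 AS order_id, 'sprocket' AS item_name, 200 AS quantity)
SELECT * REPLACE ('widget' AS item_name) FROM orders;
-- returns: order_id = 5, item_name = 'widget', quantity = 200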

Filtering selected tables using _TABLE_SUFFIX

To restrict the query so that it scans an arbitrary set of tables, use the _TABLE_SUFFIX pseudo column in the WHERE clause. The _TABLE_SUFFIX pseudo column contains the values matched by the table wildcard.

For example:

The following query scans only the tables customer_0 and customer_4.

SELECT *
FROM `external.mySqlData.customer_*`
WHERE _TABLE_SUFFIX = '0'
   OR _TABLE_SUFFIX = '4'

 

Using Expressions in Standard SQL

Since the Virtual Properties of type expression use SQL syntax when called in a query, the expression needs to match the query dialect type. To support Standard SQL dialect when using expressions, we have provided a way to enter the Standard SQL expression in the expression configuration.

To do so, in the expression configuration click on the “Advanced” section – there you can configure the Standard SQL expression. Click on “Copy legacy to standard expression” to copy the Legacy expression to the Standard one.

When using a Standard SQL query, the system will use the Standard SQL expression; if it is empty, it will fall back to the Legacy one (in most functions the syntax remains the same between the two dialects).

 

Using Standard SQL in Query API

When querying Cooladata using API requests or any other way outside the UI (like when using our JDBC connection) – you can turn on standard SQL using one of the following methods:

  1. Using the SQL annotation #standardSQL at the beginning of the query:
    #standardSQL
    SELECT DATE_DIFF(DATE('2012-10-02'), DATE('2011-06-24'), DAY)
    returns: 466
  2. Adding the standard parameter as part of the queryAPI request:
    https://app.cooladata.com/api/v2/projects/<project id>/cql?standard=true

Take control of your Data – How Cooladata lets you Manipulate data in Multiple ways

As you know, Cooladata lets you control which data to collect from numerous sources. You decide what you want to track and Cooladata warehouses it for all your analytical purposes.

Once you decide which sources you want to collect, Cooladata normalizes the data and makes it easy to execute queries, no matter how complex, and to summarize or aggregate data for reporting and for serving to other people or platforms. Since accuracy is essential, Cooladata manages the data at the raw event granularity.

Cooladata validates your data against our self-learning schema to ensure it is correctly formatted and processed successfully. On top of this validation, you can control how your data is processed, transformed and enriched before it is stored, or manipulate it after storage for easier querying and reporting.

Recently, we’ve added two very useful features to the data manipulation “tool set” Cooladata offers its customers as part of every project:

  1. Custom Manipulations as part of the ETL
    This new feature allows you to customize manipulations on the data before it is stored and requires custom set-up with your Cooladata Customer Success Manager.

    – Blocking or invalidating events: Every data scientist knows: “garbage in – garbage out”. To avoid entering messy data into your project, you can block or invalidate events that match a certain condition as part of your custom project set-up. A common use case is blocking out bots or internal QA users.

    – Changing events before they are stored: manipulating events as part of the ETL can have several use cases, such as hashing personal information, changing data types, extracting specific parameters from long strings or arrays into designated columns, etc. The biggest advantage of this type of manipulation is that it’s done before the data is even stored, so it doesn’t require any further manipulation and helps a lot with achieving data consistency.

  2. Sending retroactive events back to the ETL
    Cooladata allows sending events to the ETL to be stored in the “cooladata” events table with the rest of your events. This is a very common case for events uploaded from an external DB, historical data storage, the invalids table or even a service API. This task can be scheduled to run automatically like any other task in Cooladata and can even be set up as a step in a Job. Notice that these events are out of context, so the automatic sessionization done by our ETL might be affected. To avoid this, you can turn off sessionization for these events.

The above-mentioned features are already deployed and ready to use. As a reminder, here is the full list of data manipulation tools already available:

1. ETL – out of the box data manipulations:

  • Server-side sessionization: unifies sessions from all platforms into a single session, based on the user ID and the custom session definitions defined per project.
  • Enrichment: automatic breakdown of user agent and IP into device details and geolocation.
  • User identity matching: unifying several identities of the same user. The most common use case is an anonymous user who registers or logs in in the middle of a session. Cooladata unifies the two identities into one and stores the data accordingly. You can then query the entire journey of the user, from before registration to after, and understand the complete picture.
  • Auto-population based on scope: each column is auto-populated based on the scope of the property. For instance, a FIRST USER scope property stores the first value received for this user in all of that user’s events, whereas a LAST SESSION scope property stores the last non-null value in all the session’s events received after that value. This saves the analyst the effort of joining multiple events and tables. Each user also receives an automatic create-date property, to easily extract the first timestamp the user was seen in the app, and the user’s first session is always marked as is_new=1.

2. Virtual Properties – Some properties require dynamic manipulations that are applied ad hoc while querying the data. This feature allows you to store SQL expressions as virtual properties, and select these properties in your reports or even filter by them in dashboards. The expression can contain a formula that hides the complex processing of a property. A common use case is total order amount calculations based on several columns or unifications of two columns using the IFNULL function.
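
For illustration, two hypothetical expressions of the kind described above (the column names are illustrative):

IFNULL(discounted_price, list_price)           -- unify two columns into a single value
item_price * quantity - IFNULL(discount, 0)    -- total order amount from several columns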

3. Aggregation Tables – Aggregation Tables automatically run scheduled calculations and data aggregations and save them in a separate, permanent table. This can be used to compute and store aggregated data in order to enhance the performance of queries running on large scales of data. Most customers store aggregation tables based on several data sources and multiple joined tables. Querying these tables is easier than writing the complex joins and SQL in each query.

4. Models – Models are designed to add R and Python capabilities to your Cooladata workflow. Models enable writing tasks based on R or Python scripts. This allows you to harness the capabilities of these languages to manipulate your data and save the data frames created by those models into tables in your project.

Both the Aggregation Tables and Models, as well as the new retroactive event sender task, can be scheduled as steps in our Jobs to make up a repeating data manipulation workflow.