Table Partitioning by Column

Cooladata’s events and sessions tables are automatically partitioned by time. Cooladata also lets you partition other external tables in your project by a timestamp or date column.

Partitioning ensures that a query scans only the selected time range instead of the entire table, which improves performance and shortens query run time. Partitioned tables can be created by models, aggregation tables, and Google Cloud Storage or Gmail integrations, and can only be queried using the Standard SQL dialect.

Table Partitioning Settings

Table partitioning requires two settings:

Partition Column

Defines the column by which the table is partitioned. The column must be of the DATE or TIMESTAMP data type.

Data written to a partitioned table is automatically directed to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.

When querying the partitioned table, a filter on this partition column will automatically direct the query to run only on the relevant partitions.

Require Partition Filter

Defines whether queries running on this table must include the partition column in the WHERE clause. When TRUE, queries that select from this table without a filter on the partition column will fail.
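To illustrate how a partition filter looks in practice, here is a minimal sketch that builds a Standard SQL query restricted to a date range on the partition column. The table name `daily_revenue` and the column name `report_date` are hypothetical, and the sketch assumes the partition column is of type DATE; it only builds the query text and does not run it.

```python
from datetime import date


def build_partitioned_query(table: str, partition_column: str,
                            start: date, end: date) -> str:
    """Return a Standard SQL query limited to [start, end) on the partition column."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return (
        f"SELECT COUNT(*) AS row_count\n"
        f"FROM {table}\n"
        f"WHERE {partition_column} >= DATE '{start.isoformat()}'\n"
        f"  AND {partition_column} < DATE '{end.isoformat()}'"
    )


# Hypothetical table and column names, for illustration only.
query = build_partitioned_query("daily_revenue", "report_date",
                                date(2019, 1, 1), date(2019, 2, 1))
print(query)
```

A query built this way always carries a filter on the partition column, so it should still pass when Require Partition Filter is set to TRUE.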

Creating a partitioned Table using an Aggregation Table

Aggregation Tables automatically run scheduled calculations and data aggregations and save them in a separate, permanent table in your Cooladata project. To make this table partitioned, the Aggregation Table query should be in Standard SQL dialect (creating a partitioned table using Legacy SQL is not supported), and the following fields need to be filled in before the aggregation table’s first run:

 

Notice that after the aggregation table’s first run the table can only be queried using Standard SQL dialect.

 

Creating a partitioned Table using a Model

Models are designed to add R and Python capabilities to your Cooladata workflow. The results of the R or Python script are saved to a table in your Cooladata project.
To make this table partitioned, you need to fill in the following fields before the model’s first run:

 

Notice that after the model’s first run the model’s table can only be queried using Standard SQL dialect.

 

Creating a partitioned Table using an Integration

Gmail or Google Cloud Storage integrations allow you to upload an external file to a table in your Cooladata project.
To make this table partitioned, open the advanced section in the integration settings and fill in the following fields before the integration’s first run:

 

 

 

 

 

Notice that after the integration’s first run the table can only be queried using Standard SQL dialect.


Sending Data

Is there any size limit for each event I send?

Event size should not exceed 100KB.
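A client-side check along these lines can catch oversized events before they are sent. This is only a sketch: it assumes the limit applies to the UTF-8 encoded JSON body and that 100KB means 100 × 1024 bytes, neither of which is stated above. The event fields are hypothetical.

```python
import json

MAX_EVENT_BYTES = 100 * 1024  # assumption: 100KB interpreted as 102,400 bytes


def event_size_bytes(event: dict) -> int:
    """Size of the event when serialized as compact UTF-8 JSON."""
    body = json.dumps(event, separators=(",", ":"), ensure_ascii=False)
    return len(body.encode("utf-8"))


def is_within_size_limit(event: dict, limit: int = MAX_EVENT_BYTES) -> bool:
    """True if the serialized event does not exceed the limit."""
    return event_size_bytes(event) <= limit


# Hypothetical events, for illustration only.
small = {"event_name": "login", "user_id": "u1"}
large = {"event_name": "upload", "user_id": "u1", "payload": "x" * 200_000}

print(is_within_size_limit(small))
print(is_within_size_limit(large))
```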

I’ve noticed events on ‘Live Events’ view but cannot see them when I query.

There are two reasons that this is likely to happen:

  1. The events haven’t been loaded to the data warehouse (Google BigQuery) yet, and thus cannot be queried. In this case, wait about an hour until the data loading is completed.
  2. Some of the events’ properties were not sent in the correct format (wrong type, missing value, etc.), so the events were stored as invalid. You can query all invalid events up to the previous 7 days, using the following query:

Why am I getting the following error, when I am sure this property exists in my project: “Field ‘session_duration’ not found in project, Coolalog no:5522517”?

This error occurs when a query tries to pull data from two different partitioned tables: the events table and the sessions table. To solve it, join both tables using a property that appears in both, e.g. user_id.
For instance, the following query will produce this error, as event_name is an event scope property (saved in the events table), whilst session_duration is a session property (saved in the sessions table):

Does Cooladata support multiple customer identities?

A user can start as an anonymous user (with an automatically generated hash key) and then become a registered user. We support one old identity per user.
Once an event is sent with both identities, we know how to convert the old identity to the new one.

Is session_id mandatory?

No

Which columns are automatically generated?

Based on session_ip, we generate ip_country, ip_region, ip_city, ip_longitude and ip_latitude.
Based on the DUA (device user agent), we generate brand and model.
Based on the timestamp, we generate multiple columns (hour, day, month, year, week…).

See our common properties documentation for more details.
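As an illustration of the timestamp-based columns, the sketch below derives hour, day, month, year and week from an epoch-milliseconds timestamp in UTC. It is not Cooladata’s implementation: the output key names are hypothetical, and the week is assumed to be the ISO week number.

```python
from datetime import datetime, timezone


def time_parts(epoch_ms: int) -> dict:
    """Split an epoch-milliseconds timestamp into calendar parts (UTC)."""
    ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "week": ts.isocalendar()[1],  # assumption: ISO week number
    }


# 2018-05-25 13:45:00 UTC expressed as epoch milliseconds.
sample_ms = int(datetime(2018, 5, 25, 13, 45, tzinfo=timezone.utc).timestamp() * 1000)
parts = time_parts(sample_ms)
print(parts)
```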

Is the API Token the same as the App Key for using SDK implementation?

The App Key is used to send events. It is the same regardless of how you send the events (REST, JavaScript, etc.). The API Token, however, is your user token and is used for querying the system through the Query API. You can retrieve it by logging into Cooladata and clicking the avatar in the top right-hand corner of the screen.

How can I differentiate test data and organic data sent from my users?

There are several ways of differentiating between test data and real data:

  • A different project
  • A property

Both of the above methods require code intervention (in your app) to distinguish between real and test data.
In addition, if you have a distinct (and not too large) set of either devices or users that are generating test data, you can build segments that reflect those “test users”/“test devices” and filter them out in the dashboard slicers.
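The property-based approach can be sketched as follows. The property name `is_test` is hypothetical, and the snippet only tags and filters events in memory; how the flag is set in your app and how events are sent is up to your implementation.

```python
def tag_event(event: dict, is_test: bool) -> dict:
    """Return a copy of the event carrying a test-data flag."""
    tagged = dict(event)
    tagged["is_test"] = is_test  # hypothetical property name
    return tagged


def organic_only(events: list) -> list:
    """Keep only events that are not flagged as test data."""
    return [e for e in events if not e.get("is_test", False)]


events = [
    tag_event({"event_name": "purchase", "user_id": "qa-1"}, is_test=True),
    tag_event({"event_name": "purchase", "user_id": "u-42"}, is_test=False),
]
print(organic_only(events))
```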

Handling Personally Identifiable Information (PII)

Cooladata takes the utmost precautions to ensure the security of your data in the cloud and continually upgrades with the latest security options.

Cooladata accepts any event properties that you send without filtering them. Even so, we advise you not to send sensitive personal information (such as credit card numbers) that may help a malicious entity identify someone.

Here are a few tips for protecting personal information:

  • Conceal personally identifiable information, for example by scrambling, cloaking, encrypting, faking or hashing it.
  • Send a person’s location, instead of their IP address.
  • Send only partial information, such as a person’s country instead of their IP address.
  • Do not send combinations of information that may help someone piece together who the person is, such as session IP, address, gender and age.
  • Read here for the feature we have released in order to help our customers comply with GDPR and other PII regulations.
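One way to conceal identifying values before sending them is to replace them with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the field names and the secret key are hypothetical, and this is one possible technique rather than a Cooladata feature.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = {"email", "phone"}  # hypothetical property names


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with its keyed SHA-256 digest (hex encoded)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_event(event: dict, secret_key: bytes, fields=SENSITIVE_FIELDS) -> dict:
    """Return a copy of the event with the sensitive fields hashed."""
    return {
        key: pseudonymize(str(value), secret_key) if key in fields else value
        for key, value in event.items()
    }


key = b"replace-with-your-own-secret"  # placeholder, keep the real key private
event = {"event_name": "signup", "email": "jane@example.com"}
scrubbed = scrub_event(event, key)
print(scrubbed)
```

The same input always maps to the same digest, so hashed values can still be used for counting and joining, while the original value is not sent.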

Common reasons for data discrepancies

Companies often use several tools to understand their users’ behaviour. Alongside Cooladata, most companies also use Google Analytics or their own DB for comparison.

We suggest investigating discrepancies if there is more than a 5% difference between Cooladata and other tools. Any less is likely not material enough to warrant a full tracking audit, since often analytics are used to identify trends (e.g. how fast are we growing?), rather than exact numbers. If the difference is greater than 5% across all events, or specific events don’t match between systems, then further investigation is called for.
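The 5% rule of thumb can be expressed as a small check. The text does not say which count the percentage is relative to, so this sketch assumes the larger of the two counts as the baseline; the numbers are made up.

```python
def discrepancy_pct(cooladata_count: int, other_count: int) -> float:
    """Percentage difference between two counts, relative to the larger one."""
    baseline = max(cooladata_count, other_count)  # assumption: larger count is the baseline
    if baseline == 0:
        return 0.0
    return abs(cooladata_count - other_count) * 100 / baseline


def needs_investigation(cooladata_count: int, other_count: int,
                        threshold_pct: float = 5.0) -> bool:
    """True if the difference exceeds the threshold."""
    return discrepancy_pct(cooladata_count, other_count) > threshold_pct


print(discrepancy_pct(9700, 10000))
print(needs_investigation(9700, 10000))
print(needs_investigation(9000, 10000))
```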

Check this guide to identify common reasons for data discrepancies between various systems:

Timezone

Cooladata’s default timezone is UTC. When comparing data between Cooladata and other systems such as Google Analytics, consider the time zone differences.
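The effect of a time zone difference on daily numbers can be seen in a short sketch: the same event falls on different calendar days depending on the zone used for reporting. A fixed UTC-5 offset is assumed here for simplicity (no daylight saving handling), and the timestamp is made up.

```python
from datetime import datetime, timedelta, timezone


def local_date(utc_ts: datetime, utc_offset_hours: int):
    """Calendar date of a UTC timestamp in a zone with a fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_ts.astimezone(zone).date()


event_ts = datetime(2019, 3, 2, 2, 30, tzinfo=timezone.utc)

utc_day = event_ts.date()            # day the event is counted on in UTC
offset_day = local_date(event_ts, -5)  # day in a tool reporting in UTC-5

print(utc_day, offset_day)
```

Here the event is counted on March 2 in UTC but on March 1 in a tool reporting in UTC-5, so daily totals will differ even though the overall total matches.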

Double-check your query

When unexpected numbers appear in a report, it’s not always a data issue. Sometimes the report is simply not querying what we intend it to query. Make sure you are querying the correct date range and that no applied filter is excluding relevant data.

Invalids

Cooladata automatically validates the data sent to Cooladata in order to prevent Garbage In – Garbage Out situations. Events marked as invalid are not stored with the rest of the valid events, but in a separate, designated table for invalid events, which you can query to check what went wrong. See Handling Invalid Events to learn more.

Data Sampling

Google Analytics sometimes uses sampled data in reports, causing discrepancies with the numbers from Cooladata. Google Analytics sampling occurs automatically when more than 500K sessions are collected for a report. Google Analytics states in the text above the report that the report is based on sampling. When comparing Cooladata to a sampled Google Analytics report, discrepancies are to be expected.

Session Definition

A session in Cooladata starts when someone visits your site or app and sends an event, and ends after thirty minutes of inactivity. The session duration is calculated as the difference between the first and last event in that session. This thirty-minute timeframe is a configurable parameter. If you are the project’s admin, you can see this parameter under ‘Session timeout’ in your project settings page.

Most analytics tools, such as Google Analytics, use this thirty-minute definition, so setting the parameter to a different value in Cooladata might cause discrepancies when comparing session duration or number of sessions. Also, consider that Google Analytics counts additional sessions for clicks on AdWords campaigns and hard-stops all sessions at midnight, whereas in Cooladata a session that crosses midnight (starting before midnight and ending afterwards) is stored as one session. Other mobile analytics platforms also end a session if the user moves the app to the background for more than a minute.
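The session definition above can be sketched as a simple grouping of one user’s event timestamps. This is an illustration, not Cooladata’s implementation; in particular, it assumes that a gap of exactly thirty minutes still belongs to the same session.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout=timedelta(minutes=30)):
    """Group one user's event timestamps into sessions split by inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        # Assumption: a gap equal to the timeout does not start a new session.
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_duration(session) -> timedelta:
    """Difference between the first and last event in the session."""
    return session[-1] - session[0]


events = [
    datetime(2019, 1, 1, 23, 50),
    datetime(2019, 1, 2, 0, 5),   # crosses midnight, same session
    datetime(2019, 1, 2, 0, 20),
    datetime(2019, 1, 2, 1, 30),  # 70 minutes later, new session
]
sessions = sessionize(events)
print(len(sessions), session_duration(sessions[0]))
```

A tool that cuts sessions at midnight would report three sessions for the same events, which is one source of the discrepancies described above.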

Events are sent differently

A common cause for discrepancy is the way the events are sent to Cooladata and other tools. For instance, if one tool is receiving events from the server-side and the other from the client side, differences in numbers will most likely occur.

Even if both tools receive events from the client side, the code needs to be checked. Sometimes the logical condition for sending an event to one tool is not the same as the condition for sending it to the other. When using the JS SDK, the location of the trackEvent call is important. If the call sending the event to one tool is at the top of the code and the call sending the event to the other tool is at the bottom, discrepancies might occur because of an error in the code in between, or because the user manually closed the window before the trackEvent function was called.

Bots and Test Users

Some tools automatically filter out events created by bots; Cooladata does not. Cooladata does have several solutions available if you wish to slice out bad IPs or bots. To find the best solution for you, contact your Customer Support Manager.

If your operational DB automatically cleans test users’ activity, make sure you filter out test users in Cooladata as well.

Funnel and Conversion Definitions

Funnels in Cooladata count distinct (unique) users who completed the funnel in the date range in question, within the time window set in the report. Conversions are sometimes defined differently in other tools. For instance, Google Analytics counts the number of sessions in which the funnel’s steps were completed.

Notice that if you set the funnel in Cooladata to show users who completed the funnel within X days, then when looking at the last week or month, the funnel will only include users who performed the first event at least X days before the end of the report date range. This gives users who came in at the end of the date range the same “chance” to complete the funnel as users who came in at the beginning.
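The cutoff described above is simple date arithmetic. The sketch below computes the latest date on which a user’s first funnel event can fall and still be included; whether the boundary day itself is included is an assumption, and the dates are made up.

```python
from datetime import date, timedelta


def last_eligible_start(report_end: date, window_days: int) -> date:
    """Latest first-event date that leaves a full window before the report ends."""
    return report_end - timedelta(days=window_days)


def is_included(first_event_day: date, report_end: date, window_days: int) -> bool:
    """Assumption: the cutoff day itself is still included."""
    return first_event_day <= last_eligible_start(report_end, window_days)


cutoff = last_eligible_start(date(2019, 1, 31), 7)
print(cutoff)
print(is_included(date(2019, 1, 20), date(2019, 1, 31), 7))
print(is_included(date(2019, 1, 28), date(2019, 1, 31), 7))
```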

Redirects and Self Referrals

When looking at Cooladata’s “referring_url” and “referring_domain” you should see the URL and domain the user was referred from. Sometimes, you see URLs and domains that you do not expect, such as your own URL. This happens when your site uses redirecting rules, usually set up by your site admin.

Cooladata Sessions Table

Cooladata stores your data in two separate tables: one is your events table, in which every row is an event, and the other is your sessions table, in which every row is a session. The events table holds all the event-level data and event-scope properties, as well as user and session scope properties. The sessions table holds session-specific properties (such as session duration and the session path) as well as session and user scope properties.

In order to optimize performance, Cooladata automatically shifts your queries to run on the sessions table if all the data you are searching for is there. For instance, Daily Active Users (DAU), which counts the number of unique users per day, will run over the sessions table, but if you want to count Daily Active Payers (pDAU), you will have to run over the events table in order to add a filter for the payment event.

Shifting between the sessions and events tables might cause slight differences, because when running over the sessions table the date range is filtered according to the session start time (session_start_time_ts), whereas when running over the events table the date range is filtered according to the event timestamp (event_time_ts).
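The following sketch shows why the two filters can disagree, using a single made-up session that crosses midnight. The in-memory structures stand in for the events and sessions tables and are purely illustrative.

```python
from datetime import date, datetime

# One session for user u1 that starts on Jan 1 and continues into Jan 2.
events = [
    ("u1", datetime(2019, 1, 1, 23, 50)),
    ("u1", datetime(2019, 1, 2, 0, 5)),
]
sessions = [
    {"user_id": "u1", "session_start": datetime(2019, 1, 1, 23, 50)},
]


def users_by_event_time(events, day: date) -> set:
    """Users with at least one event whose timestamp falls on the given day."""
    return {user for user, ts in events if ts.date() == day}


def users_by_session_start(sessions, day: date) -> set:
    """Users with at least one session that started on the given day."""
    return {s["user_id"] for s in sessions if s["session_start"].date() == day}


jan_2 = date(2019, 1, 2)
print(users_by_event_time(events, jan_2))
print(users_by_session_start(sessions, jan_2))
```

Filtering by event time counts u1 as active on January 2, while filtering by session start time does not, because the session began on January 1.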


Opting out users

*Please note that this documentation is for a BETA feature. Please contact support@cooladata.com or your CSM for more information.

The opt-out feature allows you to mark a user as an “opted-out” user. This will block any future events of this user from being processed or stored.

The GDPR requires you, as a data controller, to provide your users (“data subjects”) with a clear explanation of how, where and by whom their data will be processed, in accordance with the requirements of Articles 13 and 14 of the GDPR. Controllers are also required to obtain their consent for the collection and processing of any data they share through your services, unless the controller can rely on another legal basis (the list of permitted legal bases is described in Articles 6 and 9 of the GDPR). This question, or request for consent, is your responsibility as a data controller to invoke, and we imagine it will look different across your different websites or mobile apps. We encourage our customers to carefully assess which legal basis they rely on for each processing operation and to display the relevant information required by the GDPR. If you receive an objection from a user, this feature should be used in order to stop tracking their activities.

Opting out a user is done either through the SDK or through a REST API request to Cooladata, by sending an event stating that the user has opted out of data collection. Our system will flag that user and block all of that user’s future events and sessions. The opt-out operation is carried out using customer_user_id. Since Cooladata allows you to link multiple identities of the same user, opting out also applies to all the alternative identities of that user, and not just to the records received with the user ID that asked for erasure or opting out. Notice that opting out only blocks events sent to Cooladata; user data uploaded from external data sources or integrations is not affected, and it is your responsibility to ensure that this user is deleted from these data updates.

Opt-out event

The structure of the opt-out event will be like so:

{"event_name":"cooladata_opt_out","user_id":"<user_id>"}
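The event body shown above could be assembled programmatically along these lines. The sketch only builds the JSON text with the two fields from the structure above; it does not send anything, and the helper name and sample user ID are hypothetical.

```python
import json


def build_opt_out_event(user_id: str) -> str:
    """Serialize an opt-out event for the given user as compact JSON."""
    if not user_id:
        raise ValueError("user_id must not be empty")
    event = {"event_name": "cooladata_opt_out", "user_id": user_id}
    return json.dumps(event, separators=(",", ":"))


payload = build_opt_out_event("user-123")  # hypothetical user ID
print(payload)
```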

This event can be sent from either the SDK by calling the track_event function (see the SDK documentation) or by sending a REST API call like so:

 

Investigating the Opt-out event

The actual opt-out event will be stored in Cooladata, but the events following the opt-out event will not be stored. This means that sessions will be cut off and the “opt-out” event will be the last event stored in those sessions.
You can run the following query to investigate users that requested to opt out in the last 7 days:

 

 


The Cooladata Guidelines for GDPR Preparations

The GDPR became fully enforceable on May 25, 2018, and has set a high bar for global privacy rights and compliance. We at Cooladata modified our system to help you meet the requirements of the GDPR. This guide is intended to set out the GDPR guidelines for our customers, as well as to inform our customers of the changes made in Cooladata to support their GDPR compliance, which include:

  • Support for opting out users
  • API for deleting user historical data and properties
  • Anonymizing personal information data as part of the ETL

Please note that this guide is for informational purposes only, and should not be relied upon as legal advice. We encourage you to work with legal and other professional counsel to determine precisely how the GDPR might apply to your organization.

What is GDPR?

The EU General Data Protection Regulation (GDPR), which replaces the Data Protection Directive, was designed to harmonize data privacy laws across Europe, to protect and empower all GDPR-protected individuals’ data privacy, and to reshape the way organizations across the region approach data privacy. The GDPR applies not only to companies that process the personal data of protected individuals and have a presence in the EU (e.g. offices or establishments), but also to companies that do not have any presence in the EU but offer goods or services to individuals in the EU and/or monitor the behavior of European individuals where their behavior takes place within the EU.

The GDPR regulates the “processing” of personal data of any protected individual (who is referred to as a “data subject”). This “processing” includes the collection, storage, transfer, or use of personal data. Any company that processes the personal data of any data subject, regardless of where the company is based, may be subject to the GDPR and its rules. We encourage our customers to seek legal advice to determine whether the GDPR applies to their specific processing operations or not.

 

What is “personal data”?

According to the GDPR, personal data is any information relating to an identified or identifiable individual; meaning, information that could be used, on its own or when joined with other data, to identify an individual. Personal data now includes not only data that is commonly considered to be personal in nature (such as email addresses, social security numbers, and physical addresses), but also data such as IP addresses, behavioral data, location data and much more. It’s also important to note that even personal data that has been “pseudonymized” can be considered personal data if the pseudonym can be linked to any particular individual or the pseudonymization is reversible. For these reasons, the additional information (such as the decryption key) should be kept separately from the pseudonymized data.

More information on what is considered personal data can be found on the GDPR-dedicated website: https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en

Under the new GDPR, protected individuals will have several important rights, including the right to be forgotten, the right of access, and the right of portability. If you are processing the personal data of protected individuals, you must ensure that you can accommodate these rights:

  • Right to be forgotten:  the data subject is entitled to have the data controller erase his/her personal data, cease further dissemination of the data, and potentially have third parties halt processing of the data (opt-out).
  • Right of access: the right for data subjects to obtain from the data controller confirmation as to whether or not personal data concerning them is being processed, where and for what purpose. Further, the controller shall provide a copy of the personal data, free of charge, in an electronic format.
  • Right of portability: the right for a data subject to receive the personal data concerning them, which they have previously provided, in a ‘commonly used and machine-readable format’, and the right to transmit that data to another controller.

So how does it affect Cooladata and its customers?

Between Cooladata and our customers, Cooladata is the “data processor” and the customer is the “data controller”. A data controller is the entity that determines the purposes, conditions and means of the processing of personal data, while the processor is an entity which processes personal data on behalf of, and for the benefit of, the controller. According to GDPR, it is the responsibility and the liability of the data controller to implement effective measures and be able to demonstrate GDPR compliance.

The technical How To’s:

The GDPR requires you, as a data controller, to provide your users (“data subjects”) with a clear explanation of how, where and by whom their data will be processed, in accordance with the requirements of Articles 13 and 14 of the GDPR. Controllers are also required to obtain their consent for the collection and processing of any data they share through your services, unless the controller can rely on another legal basis (the list of permitted legal bases is described in Articles 6 and 9 of the GDPR). This question, or request for consent, is your responsibility as a data controller to invoke, and we imagine it will look different across your different websites or mobile apps. We encourage our customers to carefully assess which legal basis they rely on for each processing operation and to display the relevant information required by the GDPR.

If you receive an objection from a user, Cooladata enables you to opt out that user in order to stop tracking their activities. Opting out a user is done either through the SDK or through a REST API request to Cooladata, by sending an event stating that the user has opted out of data collection. Our system will flag that user and block all of that user’s future events and sessions.

Since the GDPR also entitles the data subject to the “right of erasure”, we introduced a Delete API that allows you to delete any user properties or historical events and sessions of a user. Keep in mind that if you do not opt out the user and you send any events for that user in the future, the data will be stored. In addition, since Cooladata does not control your Aggregation Tables, external data sources or data uploaded through our integrations, it is your responsibility to delete that user from these external tables.

The API will let you know the status of your deletion request (whether it’s in progress or done) and the number of events and sessions that will have been deleted for this user once the erasure is completed.

Both opt-out and the delete API will be carried out using customer_user_id. Since Cooladata allows you to link between multiple identities of the same user, deleting and opting out will also work for all the alternative identities of a single user, and not just the records received with the user id that asked for erasure or opting out.

Also, since every query in Cooladata is calculated based on raw event-level data, once these tasks are completed, the data retrieved from queries on top of Cooladata will change and will not include these events and sessions, even when querying aggregated values.

The users’ “right of access”, as well as the “right of portability”, can be carried out using our Query API. You are in charge of writing the query that retrieves the data you want the user to access. If you need any help writing these queries, please contact your CSM.

Cooladata also enables you to anonymize personal data in your project. Since you control what data you send to Cooladata and what is stored, you are responsible for defining what is personal information and what is not.
If some properties in your project contain personal user information, you might want to hash that information based on a certain condition (for instance, if a user asked to be anonymized or if that user is based in the EU).

Notice that we automatically collect IPs, so if you need the collected IPs to be hashed before they are stored, you are responsible for using the above-mentioned functionality to set this up. Hashing IPs will not affect the geolocation enrichment we provide out of the box.

In terms of communication with Cooladata, please make sure all data is sent and retrieved over HTTPS. Cooladata supports sending events through both HTTP and HTTPS, so make sure you use the latter to ensure the communication is encrypted.

As for data storage and security, Cooladata uses third party tools for processing and storing your data based on Amazon and Google Cloud Services. Both providers have announced that they will be GDPR compliant. There is no explicit GDPR requirement that personal data must stay in the EU as long as there is a legal framework in place to validate the data transfer; the GDPR recognizes several frameworks including the Privacy Shield.

As for data transfers, please note the following:

  • Amazon Web Services and Google: Both AWS and Google have already announced that they will comply with the GDPR, and they are also registered with the EU-US Privacy Shield (see: https://www.privacyshield.gov/list).
  • Our staff: Our staff sits in Israel, which was declared by the European Commission as a country that offers an adequate level of data protection (see: https://ec.europa.eu/info/law/law-topic/data-protection/data-transfers-outside-eu/adequacy-protection-personal-data-non-eu-countries_en).
  • Other sub-processors, vendors and partners: We only share personal data that is subject to the GDPR with vendors and partners who, like Amazon Web Services or Google, have announced that they will comply with the GDPR and have undertaken to do so.

 

Finally, let’s talk legals:

Make sure your Terms of Service or Privacy Policy or other relevant documents properly communicate to your users how you are using Cooladata (and any other similar services) on your website or app.

We have drafted a Data Processing Agreement in accordance with Article 28 of the GDPR in order to enter an agreement with our customers who are subject to the GDPR. We request that you please download it, sign it and return it to your account manager.

More information

Additional information regarding the GDPR can be found at the European Commission’s website, found here: https://ec.europa.eu/info/law/law-topic/data-protection_en

Furthermore, if you have any questions about the GDPR vis-à-vis your relationship with Cooladata, you are welcome to contact us at support@cooladata.com.

Note: This guide might be subject to minor changes in accordance with Cooladata’s ongoing product modifications.

Last update: 2018-05-29
