Models add R and Python capabilities to your Cooladata workflow.

From the models page, you can create tasks based on R or Python scripts. This opens up a new dimension of exploration and analysis of your data, encompassing all capabilities of these languages.

Models are similar to Aggregation Tables:

  • They can run on any data source in your project, including events, tables, and linked data sources.
  • They can be scheduled to run automatically on a set frequency.
  • They generate results that are stored in a table, which can then be queried from anywhere, including reports, publications, and the Query API.

 

Creating/Editing a Model

  1. From the top menu, click Models.
  2. A list of your saved Models is displayed, or if you haven’t created any yet, an Add Model button is shown.
  3. To create a new Model, click the + button at the top right, or the Add Model button if your list is empty. This opens the Models gallery.
  4. From here, choose the type of builder you wish to create. The various builder types are explained below.
  5. Click any task (row) in the Models list to open it for editing, or click the options icon at the right end of the row to see the task history, run it, delete it, or see related Jobs.

 

Model Types

Cooladata provides builders for custom or predefined models. Custom model builders allow you to use any query and script you wish.
Predefined models (coming soon) will allow you to apply advanced machine learning algorithms to your data with just a few clicks and without writing any script.

 

Custom Models – R Script, Python Script

In the new Model gallery, choose either R or Python script to start writing your own custom models.

  • In the Model editor page, enter the following details:
    • Model Name: The name of the Model task. Supports any text. Enter something descriptive to help you identify the task.
    • Data (CQL): enter any query, on any data source, including all tables and linked data sources in your project, that will serve as input for your script.
      • The results from the query above are saved to “data”
      • Notes:
        • Do not use the “filters (context)” or “date_range (context)” features in queries run via Models. There is no report or dashboard context for this kind of query, so the query will fail. Instead, write explicit date ranges and conditions.
          Tip: use the report option “Show CQL” to see the final query (without “context”) when copying a query from a report to a Model.
        • Does not support “select * from cooladata”.
    • Script – R: enter your script here. We recommend running your script in a dedicated R IDE (preferably on Unix) and debugging it there before using it in Cooladata.
      • Some libraries are pre-installed in your project and can be used. You can also add other libraries for this specific script in the “Additional libraries” field below the script. The libraries pre-installed for R scripts are: ggplot2, plyr, reshape2, RColorBrewer, scales, grid, wesanderson, RJDBC, devtools, corrplot, testthat.
      • The results of the script must be in a table format. Save your result to coolaResult, like so:
      • coolaResult <- data
      • This will ensure that the data is stored in a table in your project (see Table name in settings).
      • Cooladata uses S4 OO when running your R script and currently supports R version 3.4.3 and any earlier versions.
    • Script – Python: enter your script here. We recommend running your script in a dedicated Python IDE (preferably on Unix) and debugging it there before using it in Cooladata.
      • Some libraries are pre-installed in your project and can be used. You can also add other libraries for this specific script in the “Additional libraries” field below the script. The libraries pre-installed for Python scripts are: JayDeBeApi, Pillow, h5py, ipykernel, jupyter, matplotlib, numpy, pandas, scipy, sklearn.
      • The results of the script must be in pandas data frame format. Save your result to coolaResult, like so:
      • coolaResult = data
      • This will ensure that the data is stored in a table in your project (see Table name in settings).
      • Keep in mind that Cooladata currently supports Python 2.7.12.
    • Additional Libraries: If you need additional libraries, add them in the Additional Libraries field by typing the library name and pressing Enter. For R, you may enter any library available in the CRAN repository. Keep in mind that this only ensures the library is installed; you still need to import it in your script (you may use aliases if you wish).
    • Run: click ‘Save and Run’ to save the Model and execute the query and script.
      • Run time will depend on the complexity of the query and script. You can either close the page and return to it later, or keep it open and the results will be shown once the first run is complete.
      • The default preview shows you the top 50 rows of the table created. You can also examine the run logs from the display.
      • You can see the full table by querying it from a CQL report using:
        select *
        from table_name
    • View Logs: The “View Logs” section is available after the first run. It shows logs for the 3 steps of the Model run:
      • Step 1: Initialization. Shows errors caused by installing invalid libraries or by issues creating the environment that runs your script.
      • Step 2: Running the script. Shows warnings and errors from the R/Python script or from the CQL query that retrieves the input data.
      • Step 3: Saving the data to the DB. Shows errors caused by invalid column names or table schema issues.
  • Settings:
    • Table name: the table to which the data will be saved. Table names are case sensitive, and cannot include spaces or special characters. Take care when using an existing table, as existing data might be overwritten.
    • Write mode:
      • Append: new rows are added to the table each time the query is computed.
      • Append and update: new rows are added to the table each time the query is computed, and existing rows that match the unique key you selected will be replaced with the updated data.
      • Replace: The entire table data is overwritten each time the query is computed.
    • Notify on Failure to: email address(es) to notify if any scheduled run fails.
  • Schedule: when to run the query.
    • Active: turn this off to prevent any run of this task. Note that this will also block Jobs from running it.
    • Frequency:
      • Daily: At a specific hour of the day (UTC).
      • Weekly: On a specific day of the week, at a specific hour of the day (UTC).
      • Monthly: On a specific day of the month, at a specific hour of the day (UTC).
      • CRON: Set the frequency by specifying a CRON expression. The expression has five fields, in this order: minutes (0-59), hour (0-23), day of month (1-31), month (1-12), day of week (0-6). Use the star symbol (*) as a wildcard. For example, this CRON expression runs the Model daily at 3:30 am UTC: 30 3 * * *. You may refer to www.cronmaker.com for a description of CRON expression syntax.
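
As an illustration of the Python script step described above, here is a minimal sketch of a script body. It assumes that "data" arrives as a pandas data frame. The column names (user_id, event_name) are hypothetical, and "data" is built by hand only so the sketch can run outside Cooladata. The syntax is kept compatible with Python 2.7.

```python
# Sketch of a custom Python model script (hypothetical column names).
import pandas as pd

# ASSUMPTION: inside Cooladata, `data` already holds the CQL query results
# as a pandas data frame. It is created by hand here only so that the
# sketch is self-contained.
data = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "event_name": ["login", "purchase", "login",
                   "login", "purchase", "purchase"],
})

# Count events per user, then flatten the grouped result back into a
# plain two-column table.
events_per_user = (
    data.groupby("user_id")
    .size()
    .reset_index(name="event_count")
)

# The result must be a pandas data frame assigned to coolaResult.
coolaResult = events_per_user
```

In a real Model, the hand-built data frame would be dropped, since the Data (CQL) query supplies "data". The output columns here use plain identifiers, a cautious choice given that invalid column names surface as Step 3 errors in the logs.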

 

Technical Details

Models are deployed in Docker containers with a static IP address, so you can use them to access IP-whitelisted services. The IP address is 18.205.8.38.
Note also that your model script runs in a Linux environment.

 

Querying Models

To query a Model's results in a CQL query, state the Model's table name in the FROM clause instead of “cooladata”, as you would with any other table in your project. For example:

FROM users_table
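
If you assemble such queries programmatically, a small helper can guard against malformed table names before the query is sent. The sketch below is hypothetical: build_table_query is not part of Cooladata, and the pattern reads the "no spaces or special characters" rule as "letters, digits, and underscores only", which is an assumption rather than the documented rule.

```python
import re

# ASSUMPTION: "no spaces or special characters" is interpreted here as
# letters, digits and underscores only. Cooladata's actual rule may differ.
_TABLE_NAME_PATTERN = re.compile(r"\A[A-Za-z0-9_]+\Z")


def build_table_query(table_name, columns=None):
    """Return a CQL string that selects from a Model's results table.

    Hypothetical helper: it only builds the query text and does not
    contact Cooladata.
    """
    if not _TABLE_NAME_PATTERN.match(table_name):
        raise ValueError("Unexpected table name: %r" % (table_name,))
    # Fall back to "*" when no explicit column list is given.
    column_list = ", ".join(columns) if columns else "*"
    return "select %s from %s" % (column_list, table_name)


print(build_table_query("users_table"))  # select * from users_table
```

The helper does not change the case of the name, because table names are case sensitive.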

You can also use these tables as the data source for various report builders, such as KPI. See the report documentation for more information.

To use a date picker on a query from a table, use the TABLE_DATE_RANGE function – see Date and Time Functions for more information.

Models saved to your project can be found in the Schema, under Models.

  • Expand any table to see the table’s columns and their data types.
  • Drag and drop a table or column name to use it in a query.

 

 
