How Justice Health and Forensic Mental Health Network Are Using Qlik in COVID-19

Justice Health and Forensic Mental Health Network has developed a Qlik-powered COVID-19 application to help hospital management oversee staff, patients and stock levels of personal protective equipment (PPE). Updated frequently, these three areas together provide insight into the current healthcare environment.

The application allows end users to identify stock levels of PPE by location — with this information, management can relocate available stock to staff caring for patients who have fever or respiratory symptoms at other locations. This insight saves time, as staff do not have to call a variety of locations to determine stock levels.

Using the application, healthcare managers will also be able to view absenteeism across locations and allocate staff to sites that most need support.

“Having an instant view of essential clinical and human resources information across the agency has been a key part of the Network’s response to COVID-19,” the Network’s Chief Executive Gary Forrest said.

“The Network delivers health care to adults and young people in contact with the NSW forensic mental health and criminal justice systems, in more than 100 custodial, inpatient and community settings.

“As a state-wide service, one of the biggest challenges is having visibility of PPE stock levels across our multiple sites and ensuring it is distributed appropriately.

“Using Qlik, we’ve developed an app that gives us visibility of PPE stock levels across our sites, meaning we can monitor use and reallocate PPE between locations, depending on need. Ultimately, this helps ensure our frontline health workers and patients are protected.

“Having shared visibility of this data is providing reassurance to staff that we have sufficient PPE stocks to keep them safe at work. It’s one less thing they need to contend with as they continue their outstanding job to maintain the health and safety of patients, and their colleagues and families,” Forrest said.

“We’re also using the app to track the number of patients in isolation, number of confirmed COVID-19 patient cases (zero as at 28 April 2020), number of staff on leave by reason (eg, sick, caring for someone, in isolation) and the number of staff on alternative work arrangements (eg, working from home).

“The app saves time on reporting and allows data to be viewed at an organisation-wide level, or by directorate, region and individual location. It’s also informing local and organisation-wide decisions on allocation of staffing and other clinical resources.

“As a result, we can optimise the care and support we provide to patients across NSW, and ensure our healthcare workers are protected while doing their very important job.”

Check out the full and original article on Hospital Health here.

15 Python Libraries That a Data Scientist Needs to Know

If you are a Data Scientist, Data Analyst or just an enthusiast, you should not miss some extremely popular and useful libraries for Python.

In this article, a total of 15 Python libraries will be listed and briefly introduced. I believe you may already be familiar with most of them, but if not, it is highly recommended to check them out yourself.

These libraries will be classified into several categories, which are:

  • Data Gathering
  • Data Cleansing and Transformation
  • Data Visualisation
  • Data Modelling
  • Audio and Image Recognition
  • Web

Data Gathering


Most data analytics projects start with data gathering and extraction. Sometimes the dataset might be given to you when you work for a certain company to solve an existing problem. However, the data might not be ready-made and you may need to collect it yourself. The most common scenario is that you need to crawl the data from the Internet.

1. Scrapy

Scrapy | A Fast and Powerful Scraping and Web Crawling Framework — scrapy.org

Scrapy is probably the most popular Python library when you want to write a Python crawler to extract information from websites. For example, you could use it to extract all the reviews for all the restaurants in a city or collect all the comments for a certain category of products on an e-commerce website.

The typical usage is to identify the patterns of the interesting information appearing on web pages, both in terms of URL patterns and XPath patterns. Once these patterns are figured out, Scrapy can automatically extract all the needed information and organise it in a data structure such as a table or JSON.

You can easily install Scrapy using pip

pip install scrapy
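
To give a flavour of the workflow, here is a minimal sketch of a spider that collects quotes from Scrapy's public demo site (quotes.toscrape.com); the selectors are specific to that page and would need adapting for your own target:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Each quote block on the page becomes one structured item
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }

Saved as quotes_spider.py, it can be run with scrapy runspider quotes_spider.py -o quotes.json to get the results as JSON.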

2. Beautiful Soup

Beautiful Soup — www.crummy.com

Beautiful Soup is yet another Python library for scraping Web content. It is generally accepted that it has a relatively shorter learning curve compared with Scrapy.

Also, Beautiful Soup will be a better choice for relatively smaller-scale problems and/or just a one-off job. Unlike Scrapy, where you have to develop your own “spider” and go back to the command line to run it, Beautiful Soup allows you to import its functions and use them in-line. Therefore, you can even use it in your Jupyter notebooks.
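
As a minimal sketch (assuming the requests library is installed and the page is plain static HTML), the same demo site can be scraped in a few in-line statements:

import requests
from bs4 import BeautifulSoup

html = requests.get("http://quotes.toscrape.com/").text
soup = BeautifulSoup(html, "html.parser")

# Print the text of every quote on the page
for span in soup.select("span.text"):
    print(span.get_text())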

3. Selenium

Selenium Client Driver — Selenium 3.14 documentation — www.selenium.dev

Originally, Selenium was developed to be an automated Web testing framework. However, developers found that it is quite convenient to use it as a Web scraper.

Selenium is usually utilised when you have to interact with web pages before you can get the data you are interested in. For example, you may need to register an account, log in, and then get the content after clicking some buttons and links, where those links are defined as JavaScript functions. In such cases it is usually not easy to use Scrapy or Beautiful Soup, but Selenium can do it.

However, it is important to note that Selenium will be much slower than the normal scraping libraries. This is because it actually initialises a web browser such as Chrome and then simulates all the actions defined in the code.

Therefore, when you are dealing with URL patterns and XPaths, do use Scrapy or Beautiful Soup. Only choose Selenium if you have to.
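
For illustration, here is a minimal sketch, assuming Chrome and a matching ChromeDriver are installed and on your PATH; the URL is the JavaScript-rendered version of the demo site used above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://quotes.toscrape.com/js/")  # content is rendered by JavaScript

# Wait until the JavaScript has rendered the quote elements, then read them
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "span.text"))
)
quotes = driver.find_elements(By.CSS_SELECTOR, "span.text")
print([q.text for q in quotes])

driver.quit()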

Data Cleansing and Transformation


I guess it is not necessary to point out how important data cleansing and transformation are in data analytics and data science. There are also many outstanding Python libraries that do these well. I’ll pick out some of them that you must know as a Data Scientist or Analyst.

4. Pandas

pandas — a fast, powerful, flexible and easy to use open source data analysis and manipulation tool — pandas.pydata.org

I am almost sure that listing Pandas in this list is unnecessary. As long as you are dealing with data, you must have used Pandas.

With Pandas, you can manipulate data in a Pandas Data Frame. There is an enormous number of built-in functions that help you to transform your data.

Don’t need too many words. If you want to learn Python, this is a must-learn library.
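
As a tiny, self-contained illustration of the kind of transformation Pandas makes trivial (the data here is made up):

import pandas as pd

# A small made-up dataset
df = pd.DataFrame({
    "city": ["Sydney", "Melbourne", "Sydney", "Brisbane"],
    "sales": [250, 180, 320, 90],
})

# Filter, group and aggregate in one line each
high_sales = df[df["sales"] > 100]
summary = high_sales.groupby("city")["sales"].sum()
print(summary)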

5. Numpy

NumPy — the fundamental package for scientific computing with Python — numpy.org

Similarly, Numpy is another must-learn library for Python users, and not only for Data Scientists and Analysts.

It extends Python list objects into comprehensive multi-dimensional arrays. There is also a huge number of built-in mathematical functions to support almost all your calculation needs. Typically, you can use Numpy arrays as matrices and Numpy will allow you to perform matrix calculations.
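
For example, two small arrays can be treated as matrices directly:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a @ b)             # matrix product
print(np.linalg.inv(a))  # matrix inverse
print(a.T)               # transpose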

I believe many Data Scientists will start their Python scripts as follows

import numpy as np
import pandas as pd

So it is safe to say that these two libraries are probably the most popular ones in the Python community.

6. Spacy

spaCy · Industrial-strength Natural Language Processing in Python — spacy.io

Spacy is probably not as famous as the previous ones. While Numpy and Pandas are the libraries dealing with numeric and structured data, Spacy helps us to convert free text into structured data.

Spacy is one of the most popular NLP (Natural Language Processing) libraries for Python. Imagine that you have scraped a lot of product reviews from an e-commerce website; you have to extract useful information from this free text before you can analyse it. Spacy has numerous built-in features to assist, such as a word tokeniser, named entity recognition, and part-of-speech detection.

Also, Spacy supports many different human languages. On its official site, it is claimed to support more than 55 of them.
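
A minimal sketch, assuming the small English model has already been downloaded with python -m spacy download en_core_web_sm:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Named entities and part-of-speech tags extracted from free text
print([(ent.text, ent.label_) for ent in doc.ents])
print([(token.text, token.pos_) for token in doc])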

Data Visualisation


Data Visualisation is absolutely an essential need in Data Analytics. We need to visualise the results and outcomes and tell the data story that we have found.

7. Matplotlib

Matplotlib: Python plotting — Matplotlib 3.2.1 documentation — matplotlib.org

Matplotlib is the most comprehensive data visualisation library for Python. Some say that Matplotlib is ugly. However, in my opinion, as probably the most basic visualisation library in Python, Matplotlib provides the most possibilities to realise your visualisation ideas. This is just like JavaScript developers who may prefer various high-level visualisation libraries, but when a lot of customised features are not supported by those libraries, D3.js has to be involved.
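
A basic example showing how little code a standard line plot needs:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), label="sin(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.title("A simple Matplotlib line plot")
plt.legend()
plt.show()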

I have written another article to introduce Matplotlib. Check it out if you want to read more about it.

An Introduction to Python Matplotlib with 40 Basic Examples — levelup.gitconnected.com

8. Plotly

Plotly Python Graphing Library — plotly.com

Honestly, although I believe Matplotlib is a must-learn library for visualisation, most of the time I would prefer to use Plotly because it enables us to create the fanciest graphs in the fewest lines of code.

Whether you want to build a 3D surface plot, a map-based scatter plot or an interactive animated plot, Plotly can fulfil the requirements in a short time.

It also provides a chart studio to which you can upload your visualisations; this online repository supports further editing and persistence.
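
As a quick sketch using Plotly Express and one of its built-in sample datasets:

import plotly.express as px

# The iris dataset ships with Plotly Express
df = px.data.iris()
fig = px.scatter_3d(
    df, x="sepal_length", y="sepal_width", z="petal_length",
    color="species"
)
fig.show()  # opens an interactive 3D scatter plot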

Data Modelling


When data analytics comes to modelling, we usually refer to it as Advanced Analytics. Nowadays, machine learning is no longer a novel concept, and Python is considered the most popular language for machine learning. Of course, there are a lot of outstanding libraries supporting this.

9. Scikit Learn

scikit-learn — scikit-learn.org

Before you dive into “deep learning”, Scikit Learn should be the Python library with which you start your path in machine learning.

Scikit Learn has 6 major modules covering:

  • Data Pre-Processing
  • Dimensionality Reduction
  • Regression
  • Classification
  • Clustering
  • Model Selection

I’m sure that a Data Scientist who has nailed Scikit Learn should already be considered a good Data Scientist.
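
A minimal sketch touching several of the modules above — a built-in dataset, a train/test split, a classifier and a score:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))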

10. PyTorch

PyTorch — an open source deep learning platform that provides a seamless path from research prototyping to production deployment — pytorch.org

PyTorch was authored by Facebook and open-sourced as a machine learning framework for Python.

Compared to Tensorflow, PyTorch is more “pythonic” in terms of its syntax, which also makes PyTorch a bit easier to learn and start using.

Finally, as a deep-learning-focused library, PyTorch has a very rich API and many built-in functions to help Data Scientists quickly train their deep learning models.
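
A minimal sketch of a single training step on random data, just to show the shape of the API:

import torch
import torch.nn as nn

model = nn.Linear(3, 1)   # 3 input features, 1 output
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(16, 3)    # a random mini-batch
y = torch.randn(16, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()           # autograd computes the gradients
optimizer.step()
print(loss.item())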

11. Tensorflow

TensorFlow — an end-to-end open source machine learning platform for everyone — www.tensorflow.org

Tensorflow is another machine learning library for Python that was open-sourced by Google.

One of the most popular features of Tensorflow is the Data Flow Graphs on the Tensorboard. The latter is an automatically generated Web-based dashboard visualising the machine learning flows and outcomes, which is extremely helpful for debugging and presentation purposes.
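
A minimal sketch of a Keras model that writes TensorBoard logs (the data here is random and only for illustration):

import numpy as np
import tensorflow as tf

x = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The TensorBoard callback writes logs that the dashboard visualises
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="./logs")
model.fit(x, y, epochs=3, callbacks=[tensorboard_cb], verbose=0)

Running tensorboard --logdir ./logs then serves the dashboard locally.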

Audio and Image Recognition


Machine learning is not only about numbers; it can also help with audio and images (videos are considered a series of image frames). Therefore, when we deal with this multimedia data, the machine learning libraries above will not be enough. Here are some popular audio and image recognition libraries for Python.

12. Librosa

LibROSA — librosa 0.7.2 documentation — librosa.github.io

Librosa is a very powerful audio and voice processing Python library. It can be utilised to extract various kinds of features from audio segments, such as the rhythm, beats and tempo.

With Librosa, those extremely complicated algorithms such as the Laplacian segmentation can be easily implemented in a few lines of code.
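
For instance, estimating the tempo of a recording takes only a few lines (the file path here is a hypothetical placeholder):

import librosa

# Load an audio file (path is a placeholder)
y, sr = librosa.load("audio_sample.wav")

# Estimate the global tempo (beats per minute) and beat positions
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print("Estimated tempo:", tempo)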

13. OpenCV

OpenCV — Open Computer Vision Library — opencv.org

OpenCV is the most ubiquitously used library for image and video recognition. It is no exaggeration to say that OpenCV enables Python to replace Matlab for image and video recognition.

It provides various APIs and supports not only Python but also Java and Matlab, as well as outstanding performance, which earns much appreciation both in the industry and academic research.
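
As a small sketch, face detection with one of the Haar cascade models that ship with the opencv-python package (the image path is a placeholder):

import cv2

img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Load a pre-trained face detector bundled with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print("Faces found:", len(faces))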

Web


Don’t forget that Python was commonly used in Web development before it became popular in the data science area. So, there are also a lot of excellent libraries for web development.

14. Django

Django — a high-level Python Web framework that encourages rapid development and clean, pragmatic design — www.djangoproject.com

If you want to use Python to develop a Web service backend, Django is always the best choice. It is designed to be a high-level framework that can build a website in very few lines of code.

It directly supports most of the popular databases, saving you time on setting up connections and developing data models. With Django you only need to focus on the business logic and never worry about CRUD manipulations, because it is a database-driven framework.
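
As a sketch of what that looks like in practice (assuming the model lives in an existing Django app), a single model class is enough for Django to create the table and the CRUD layer:

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    created_at = models.DateTimeField(auto_now_add=True)

# Django then generates the table and the CRUD operations, e.g.:
#   Product.objects.create(name="Notebook", price=12.50)
#   Product.objects.filter(price__lt=20)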

15. Flask

Welcome to Flask — Flask Documentation (1.1.x) — flask.palletsprojects.com

Flask is a light-weight Web development framework in Python. Its most valuable feature is that it can be customised for specific requirements very easily and flexibly.

Because of this light weight, a lot of other famous Python libraries and tools that provide a Web UI, such as Plotly Dash and Airflow, are built using Flask.
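
A complete Flask application can be as small as this sketch:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # A single JSON endpoint
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(port=5000)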

Conclusion


Indeed, there are more prominent Python libraries that are eligible to be listed here. It is always exciting to see the Python community thriving like this. If more libraries become must-knows for Data Scientists and Analysts, it might be necessary to organise them in another article.

Life is short, so I love Python!

Written by Christopher Tao, Principal Consultant@AtoBI Australia
Follow Christopher on LinkedIn Here

Article originally published on Towards Data Science

What Makes Snowflake Different from other DBMS as “Cloud-Based”

In recent years, Snowflake DB has become more and more popular because of its creative features. A lot of companies who are leaders in their business domains have started their journey with Snowflake, such as Sony, Logitech and Electronic Arts. So, what are these “features” that make Snowflake so attractive and competitive? In this article, I will pick out some points which make me love this database management system. Rather than measuring its performance and abilities, I will focus on the unique features that Snowflake has because it is a true “cloud-based” database.

What I will skip

I will skip some features that have been talked about too much everywhere on the Internet, such as the creative architecture, high performance and pricing model. Of course, these are what Snowflake is proud of. However, as a consultant, I have seen too many tools and software products that competed with each other on “existing” dimensions, and none of them can be guaranteed to stay on top forever.

Specifically, most customers will not likely be interested in its creative architecture if they are not technology enthusiasts. As for performance, in fact, none of the popular database management systems will be very bad. Finally, the price is not always the key factor that makes people buy.

Therefore, what I am doing here is trying to let you know the features of Snowflake that may make you say WOW. Also, where applicable, I will compare Snowflake with 2 other popular cloud-based DBMS, Azure Synapse Analytics (formerly Azure SQL Data Warehouse) and AWS Redshift, to see how Snowflake makes use of its “cloud-based” nature.

Zero efforts for development environment


Snowflake is not the only cloud-based DBMS, but it is probably the only one providing such a comprehensive UI that supports all popular web browsers.

Azure Synapse Analytics:
Need to download SQL Server Management Studio

AWS Redshift:
Need to download a third-party SQL client such as SQL Workbench

Snowflake:
In your web browser, go to the URL and press enter

It’s not only about downloading management software. Think about when you need to work on a new machine…

Retain your worksheet


I believe I am not the only one who tends to keep several worksheets open with many SQL statements for different purposes. Most of the time I will NEVER shut down the SQL client or my laptop for many days until reaching a milestone of the project, because I just don’t want to lose my SQL statements. I know I can save them into script files and import them next time, but I’m just lazy, and re-importing them causes some headaches, too.

With Snowflake, you can close your browser tab any time with peace of mind. Whenever you open the URL, your worksheets will always be there, even if you change to another machine, because everything is stored in the cloud.

Change your role


If you are an admin or someone who holds master permissions, you will probably be asked frequently to grant permissions to other users. When using other DBMS, you need to do your job and then confirm with the user you granted permission to whether the resource is now available.

With Snowflake, simply switch your role to see what permissions that role has. This also lets you easily test whether you have over-granted the permission.
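
For example (the role and table names here are hypothetical placeholders):

USE ROLE analyst_role;
SHOW GRANTS TO ROLE analyst_role;
SELECT * FROM my_db.my_schema.my_table LIMIT 10;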

SQL statement execution history


It is not uncommon to want to see what we have done before during development. At least, I used to write a lot of SQL statements for testing purposes, and once I got the query working, I deleted the testing code. But later on, I might realise that I want to see what I actually tested.

With Snowflake, all your query history will be retained for 1 YEAR. Whenever you want to view what you have done before, simply go to the history to find it.
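
Besides the History page in the web UI, the history can also be queried with SQL — a sketch assuming your role has access to the SNOWFLAKE.ACCOUNT_USAGE share:

SELECT query_text, user_name, start_time, total_elapsed_time
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;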

Configure notifications to email and SMS

As a cloud-based DBMS, Snowflake makes it extremely easy to configure your email or SMS notifications.

Azure Synapse Analytics:
Email: ask your IT department to provide you with the SMTP server and configure the white list for you. Then, configure your SMTP in Mail Profile
SMS: configure your Azure Action Group

AWS Redshift:
Need to create a Kinesis Data Firehose Delivery Stream and then configure it

Snowflake:
Fill in your email address and/or phone number. Then, enable it.

You can even enable MFA (Multi-Factor Authentication) in 30 seconds. Just provide your phone number, receive a verification code and fill it back.

Time Travel


This was the most WOW feature when I started exploring Snowflake DB. Specifically, with the Standard Edition, you can enable “time travel” on a table and have 1 day of data retained no matter how it is deleted or updated. With the Enterprise Edition, time travel goes up to 90 days.

How can we use the “time travel” feature? Easy!

SELECT <columns>
FROM <your_table>
AT (timestamp => '<timestamp>');

Also, I have never seen such an expression in any other DBMS:

UNDROP TABLE <table_name>;

Think about any other DBMS: you would need a well-rounded plan for how your data will be backed up regularly, just in case of disaster recovery, and then automate everything. You may need at least 1–3 developers/DBAs to do this job for at least several weeks and test it. With Snowflake, that is not necessary at all.

Quick Clone


With Snowflake, you can clone a table, a schema or even a database in no time, occupying no extra space. This is because the clone actually creates pointers to the stored data rather than copying the actual data. In other words, the cloned table only stores the data that differs from its original table.
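
Cloning is a single statement, and it can even be combined with time travel (object names are placeholders):

CREATE TABLE my_table_clone CLONE my_table;

CREATE DATABASE dev_db CLONE prod_db;

CREATE TABLE my_table_yesterday CLONE my_table
  AT (OFFSET => -60*60*24);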

Why do we need to clone a table?

  • When you want to perform some experiments with data but don’t want to hurt it by mistakes.
  • When you want to capture data in a snapshot at a certain time.
  • When you want to get a separate copy of a table’s history, then use clone together with time travel!
  • When you want to share production data with others, but don’t want to directly give access to the production database.
  • etc.

Quick Integration with Cloud Storage


If you use one of the 3 biggest cloud providers — Azure, AWS or GCP — you may use their cheap storage services: Blob Storage, S3 or GCS. It takes about a minute to configure Snowflake to import data from these storage services.
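
A sketch for Azure Blob Storage, assuming a container and a SAS token (both placeholders); S3 and GCS follow the same pattern with their own URL scheme and credentials:

CREATE STAGE my_azure_stage
  URL = 'azure://myaccount.blob.core.windows.net/mycontainer/'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<sas_token>');

COPY INTO my_table
FROM @my_azure_stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);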

Automatically Data Ingestion (SnowPipe)


Snowflake, as a DBMS, is not designed to be a comprehensive ETL tool, but if your data ingestion is not overly complicated, it will definitely satisfy your requirements, with a very quick implementation.

For example, if you have another process that puts data from whatever data sources into cloud storage (e.g. Azure Blob Storage), then all you need to do to automate the data ingestion into Snowflake is:

  • Create an Azure Event Grid to emit events when a blob is created in Blob Storage
  • In Snowflake, create an “integration” with the Event Grid (can be done in a single SQL command, CREATE NOTIFICATION INTEGRATION)
  • In Snowflake, create a SnowPipe to monitor the “integration” and set the destination table
CREATE PIPE <snowpipe_name>
AUTO_INGEST = TRUE
INTEGRATION = <integration_name>
AS
COPY INTO <destination_table>
FROM <blob_storage_path>
FILE_FORMAT = (type = '<e.g. JSON>')

Then, whenever a new file is created in the Azure Blob Storage, an event will be emitted, Snowpipe will become aware of it and will run the COPY INTO command to ingest the data into the destination table.

If you want to know more about this, please check out one of my previous post here: https://medium.com/@qiuyujx/building-snowpipe-on-azure-blob-storage-using-azure-portal-web-ui-for-snowflake-data-warehouse-f0cdd7997250

Easiest Scheduled SQL Tasks Ever


If you want to have your SQL statements run regularly, I find it has never been as easy as it is with Snowflake. All you need to do is:

CREATE TASK <task_name>
WAREHOUSE = <warehouse_name>
SCHEDULE = <frequency (e.g. '10 MINUTE')>
AS
<sql_statement>

Do you still remember how you do this in SQL Server? Enable the SQL Server Agent Windows service, create a new job, and then fill in several pages of forms…

If you want, you can also use Snowpipe and Tasks together to do some complicated ETL. It is not “drag-n-drop”, but you also don’t need to involve a separate tool — it’s all in your DBMS.

Resource

Snowflake Architecture:
https://www.snowflake.com/product/architecture/

Written by Christopher Tao, Principal Consultant@AtoBI Australia
Follow Christopher on LinkedIn Here

Article originally published on Medium.

ELT Data Pipelining in Snowflake Data Warehouse — using Streams and Tasks

Snowpipe was introduced to be utilised for automatically discovering new data files in cloud storage (e.g. Azure Blob Storage) and then loading the data into a certain table.

Snowpipe is a very convenient tool for the above purposes. However, Snowpipe itself is only considered the “E” (Extract) of ELT, because only the COPY INTO command is allowed in a Snowpipe creation statement. In other words, we can easily achieve the following with it:

  • Loading data files in different formats such as CSV, JSON, XML, Parquet and ORC
  • Adapting and tweaking the data source for better compatibility, such as stripping the outer array for JSON and stripping the outer element for XML
  • Changing column names
  • Changing column orders
  • Omitting columns
  • Parsing date/time strings into date/time objects

Therefore, for the following reasons, Streams and Tasks are needed for the rest of the data pipelining:

  1. Some data transformation might be necessary, such as number calculations and string concatenation.
  2. The data source is not in a typical third normal form, so it needs to be loaded into multiple tables based on certain relations.
  3. The ELT jobs may not be limited to appending to tables, but may also include more complex requirements such as SCD (Slowly Changing Dimension) to be implemented.

Therefore, we have to involve other objects in Snowflake to complete the data pipeline.

Snowflake Streams

A Snowflake Stream object tracks any changes to a table, including inserts, updates and deletes, and can then be consumed by other DML statements. One typical usage of a stream object is CDC (Change Data Capture).

Standard vs. Append-only Streams

There are 2 types of streams that we can define in Snowflake: standard and append-only. Standard streams capture any type of change to the table, whereas append-only streams only capture inserted rows. The former can be used for general purposes, while the latter can be utilised in a typical ELT model when we are only interested in the newly ingested rows.

For example, we can have a Snowpipe automatically ingest the data from CSV files in cloud storage and copy it into a staging table. A stream will then capture this bulk-inserting action and record the offset of these new rows.
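
Creating such a stream is a one-line statement (the table and stream names are placeholders):

CREATE STREAM my_stream ON TABLE staging_table APPEND_ONLY = TRUE;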

Stream Offset

Snowflake Streams do not physically store, contain or copy any data. A stream just takes a snapshot of the tracked table at the current time (e.g. upon stream creation); then every change made to this table is recorded by the stream as additional metadata. The difference between the previous version of the table and the current version is called the offset.

When another DML statement consumes the offset of the stream, the offset is reset, and the stream considers the consumed data as the “previous” version of the table. A typical DML statement that consumes a stream offset is INSERT INTO ... SELECT ... FROM STREAM.

Here is an example of the stream consumption flow:

  1. Suppose there are already 1 million rows in a table. We create a stream on this table; the stream will have no offset because the current version of the table has been snapshotted.
  2. Suppose 1 thousand rows are then inserted into the table by the Snowpipe COPY INTO statement.
  3. The stream will have these 1 thousand rows recorded as its offset.
  4. Suppose now we read these 1 thousand rows from the stream and insert them into another table (see the example statement after this list).
  5. The stream now has no offset, so it is empty, because we have consumed the previous offset.
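
Step 4 above is typically just an INSERT that selects from the stream (names are placeholders); running it consumes the offset:

INSERT INTO target_table
SELECT col1, col2, col3
FROM my_stream;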

It is obvious that Snowflake Streams are designed for ELT processes. For example, suppose high-frequency data is being ingested into the database and we consume the data every 5 minutes. The stream guarantees that each time we consume it, the new data has no gaps and no overlaps. This logic can be complicated to implement in other database management systems, or we may need an extra programming language to achieve it.

Stream Columns

Because a stream is actually a snapshot of the original table, all the columns in the original table are also accessible in the stream. For example, if the original table has 3 columns named col1, col2 and col3, then we can simply run

SELECT col1, col2, col3
FROM the_stream;

to retrieve the offset in this stream.

Additionally, there are 3 extra columns that are metadata specific to stream objects (see the example query after this list):

  • METADATA$ACTION indicates the type of change for this row. It can be either INSERT or DELETE. Please note that there is no “UPDATE” action. An update of a row is indicated by 2 rows: one is a DELETE of the previous row, the other is an INSERT of the new row. You’ll find this very convenient, because by comparing these two rows you can see exactly which fields were updated.
  • METADATA$ISUPDATE indicates whether this row was generated because of an update. This column is an additional indicator that allows you to identify the INSERT and DELETE rows that were actually generated by an update.
  • METADATA$ROW_ID is a unique ID of a particular row. It does not change even if the row is updated, and it will remain unchanged for any further updates in future offsets.
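
For example, the metadata columns can be selected alongside the table columns:

SELECT col1, col2, col3,
       METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
FROM the_stream;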

Here is an image from the official documentation that presents the data flow very clearly.

Image courtesy: https://docs.snowflake.net/manuals/user-guide/streams.html

Snowflake Tasks

A Snowflake Task is an object that schedules a SQL statement to be automatically executed as a recurring event.

Yes, I said “a” SQL statement for each task. Currently, this is a limitation of Snowflake Tasks. Hopefully, a future update will support transactions with proper BEGIN and COMMIT keywords, so that we can execute a series of SQL statements in a transaction and utilise a task to schedule that transaction.

Despite this limitation, Snowflake does provide an approach for scheduling multiple SQL statements, which is called a Task Tree. As the name suggests, we can define multiple tasks in a tree structure. Several tips:

  • A task can have multiple tasks as its offspring, and the offspring tasks will be executed when the parent task has finished.
  • A task can have only one parent task.
  • Therefore, a task tree is really just a tree, not a DAG. So, you can consider that it can do a “fork”, but there is no “join”.

Here is a graph to indicate a simple task tree.

Image courtesy: https://docs.snowflake.net/manuals/user-guide/tasks-intro.html

Trigger a Task

There are two ways for triggering a task, one is by defining a schedule and the other one is triggering by another task.

Snowflake supports two types of task schedule definition: a “CRON” expression and a time interval. A CRON expression can define the exact date/day/time that the task will be triggered; it is very powerful and can satisfy anything from simple to complex requirements, such as

  • Run task at certain time of every day
  • Run task at certain time of a certain day of week
  • Support timezones
  • Support daylight saving

For the specific syntax of CRON expressions, please visit this free website for detailed explanations and experiments: Free Online Cron Expression Generator and Describer — www.freeformatter.com
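
For example, a task that runs at 6 am Sydney time every day could be defined like this (the names are placeholders, following the CREATE TASK pattern shown earlier):

CREATE TASK daily_refresh
WAREHOUSE = <warehouse_name>
SCHEDULE = 'USING CRON 0 6 * * * Australia/Sydney'
AS
<sql_statement>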

The other approach to defining a schedule is much more straightforward: simply define the time interval. For example, if I want the task to be triggered every 5 minutes, I just define it like this:

SCHEDULE = '5 MINUTE'

Another way of triggering a task is to define a parent task, which builds a task tree. For example, if we want TASK2 to be triggered when TASK1 finishes, we simply define:

CREATE TASK TASK2
...
AFTER "TASK1"
AS
<SQL>

Conditional Tasks

A task can be defined with a condition by a “WHEN” clause that is used to decide whether this task should be executed.

This is an extremely convenient feature. For example, if you set a condition with your “Root Task”, when the condition is not satisfied, the whole task tree will not run at all.

Also, if this “condition test” happens in the cloud services layer of Snowflake (using metadata rather than a SQL query on a table), the condition test has ZERO cost, and consequently there is NO cost at all if the condition is not satisfied.

Using Tasks with Streams

It is very common for Snowflake Tasks and Streams to be utilised together to build a data pipeline. A very typical usage pattern is:

  1. Snowpipe loads raw data into a staging table.
  2. A Snowflake Stream is created on the staging table, so the newly ingested rows are recorded as the offset.
  3. A Snowflake Task then consumes the Stream offset with some DML statement to further load the data into production tables; some more complex transformations might be included.
  4. In practice, there will be a task tree to execute multiple SQL statements in order to perform the complex transformation, and sometimes populate the transformed entries into multiple production tables.
  5. Some intermediate tables might need to be created by the tasks in the task tree.
  6. There might be some tasks at the end of the task tree to clean up temporary tables.

We have already introduced the typical scenario where we need to integrate Snowpipe and a Stream. For Streams and Tasks, there is also a very typical usage pattern: the system function SYSTEM$STREAM_HAS_DATA.

Specifically, SYSTEM$STREAM_HAS_DATA returns a boolean value indicating whether there is a consumable offset in a stream. Therefore, we can simply put this condition in the “WHEN” clause of our Task definition, so the task will not be executed if there is nothing new in the stream. For example:

CREATE TASK mytask1
WAREHOUSE = mywh
SCHEDULE = '5 minute'
WHEN
SYSTEM$STREAM_HAS_DATA('MYSTREAM')
AS
INSERT INTO ... ;

In this example, the Task “mytask1” will be executed only if there is a consumable offset in the Stream “MYSTREAM”. Otherwise, the task will be skipped and the condition will be checked again after 5 minutes.

One more tip

It is important to remember that a Task that has just been created will be suspended by default. It is necessary to manually enable this task by “altering” the task as follows:

ALTER TASK mytask1 RESUME;

Future works

In this article, Snowflake Streams and Tasks have been introduced theoretically, and the general picture of using Snowpipe -> Stream -> Task to build an ELT data pipeline has been drawn. In the next part, I’ll use a real example data source to build the ELT data pipeline with these Snowflake components, including the necessary SQL scripts, which will be more hands-on.

Resources

Official documentation of Snowflake Streams: https://docs.snowflake.net/manuals/user-guide/streams.html

Official documentation of Snowflake Tasks:
https://docs.snowflake.net/manuals/user-guide/tasks-intro.html

Free online CRON expression generator:
https://www.freeformatter.com/cron-expression-generator-quartz.html

Written by Christopher Tao, Principal Consultant@AtoBI Australia
Follow Christopher on LinkedIn Here

Article originally published on Towards Data Science.

A Leading Australian Private Health Insurance Company

“The beta version has already assisted in decision making, understanding competitor movements and emerging trends.”

The Client

Recently, a representative from a leading Australian health provider spoke with us about their AtoBI journey. Their experience has once again affirmed why our clients continuously choose AtoBI as their business solutions provider. Read on below to find out more about their experiences with us.

My role is to be responsible for our company’s product portfolio. That is, managing the lifecycle of all our PHI and OVC products and product-related services; this includes all stages, from concept to market, to monitoring, to eventual closure.

Our company was founded on social values and community contribution. Established in response to unmet community needs, we continue to identify and help solve the health and financial needs of the broader community. In turn, we have helped almost 1 million Australians create a brighter future with our health, wealth and living services.

The Project

At our organisation, we conduct both Data Integration and BI Reporting.

Data Integration, for us, includes the integration of Private Health Insurance Statements (PHIS) data and manually collected flat files. There are many reasons why we integrate our data, one being so that we are able to filter by data fields such as fund discounts, brands, etc.

There are many reasons we use BI Reporting. Some of the main uses are:

Development of a feature score methodology to compare features across the marketplace including a product comparison page

The functionality to develop new concept products and compare against the rest of the market. We do this so that the user can include premiums and additional product benefits to understand the impact on feature score versus price

A product reference table to enable the user to extract data into a format for manual manipulation including concept products

A strategic summary page which outlines product movements by fund including average premiums, new products introduced, deleted, and overall market positioning based on product tier and feature score relativity

The development of filters to be able to drill down into specific data sets (e.g. product type, competitors, scale, product classification (product tier – gold/silver/bronze/basic government classifications), etc.)

Development of  a monthly competitor intelligence reporting framework and distribution process

What are some Major Challenges you’ve faced?

When we approached AtoBI, we had a number of challenges which needed to be addressed. We wanted to use our data to open our eyes to a range of information, from understanding our competitors and streamlining our processes, to uncovering the potential risks and benefits of a product.

We were looking to summarise and rank our products against the competitive set of 38 health funds; a problem which involved a large data set, with a number of data attributes. Using data, we also wanted to improve our decision making and risk management, as well as improve competitor knowledge across the organisation via reporting and analysis.

There was also the process of product development, and how to enhance this process to eventually get products on the market sooner, as well as identify actual in-market positioning, opportunities and potential risks. And lastly, we faced the challenge of not being able to conduct a more in-depth analysis of competitor fund movements across a number of variables, including state, class and product types.

Why AtoBI?

Having worked with AtoBI on a similar project whilst at another health fund, I knew with confidence that they would be able to help us solve the variety of data challenges we had accumulated.

The Benefits

Even though we are currently only implementing changes into our test environment, the difference the project made to our beta version has been tremendous. We have already had assistance in decision making, understanding competitor movements and emerging trends.  I would recommend AtoBI to other organisations, due to their flexibility, ability to meet deadlines and how responsive they are to client needs. AtoBI are agile; they test their products and adapt well to problems.

Business Benefits of Artificial Intelligence

There are many reasons a company would look to artificial intelligence (AI) technology to improve its efficiency. AI technology has the potential to aid your business in a myriad of ways, from helping reduce costs to increasing revenue. At AtoBI we like to view the “A” as “assist” rather than “artificial”, as these solutions are here to aid us.

Regardless of whether your business is new to AI, or you have a full range of technologies available to you, there are major benefits to implementing these solutions into your processes and products. The impact of artificial intelligence on your business depends on how you use it, but there are many overarching benefits for all businesses, no matter the way AI works for you. Such benefits include saving time and money through the automation of processes, making faster business decisions based on the outputs of cognitive technologies, and better predicting a customer’s preferences to then personalise their experience with your business.

While it may sometimes seem like AI is a solution that works best independently, to obtain the best business enhancement it is important to use it alongside human capabilities, rather than instead of them. AI is fantastic at completing repetitive, consistent tasks which gives your employees more time to make the most of this new technology.

If you are still unsure about opening up your business to the world of AI technology, then it is worth taking some more time to research the endless potential this technology can bring. No matter your reason, whether it be keeping up with competitors or wanting to improve your general efficiency, AI can help you achieve your goal.

To ensure you are able to keep up with all the changes in our fast changing technical world, choosing to work with a business solutions agency is the right choice, allowing you to continue to focus on your business deliverables. At AtoBI, we have consultants ready to help your business take the leap into the AI world, giving you the best tools to unlock the true potential of your business. Email us at info@atobi.com.au or visit our website to get in touch.

Qlik Customer Success: Our New Imperative

Retaining and Growing Customers in the Subscription Economy

Customer success may seem like an obvious concept because it’s essential to doing business. What company doesn’t want its customers to be successful with its products? Yet customer success is a relatively new and rising corporate discipline – something many organizations don’t have a good handle on.

Customer success is a business methodology, designed to ensure that your customers achieve their desired outcomes while using your product or service. By making sure your customers are wildly successful with your solutions, your company increases product or service use and adoption – ultimately boosting customer retention and growing customer relationships.

READ MORE

MapR: the future of MapR and HPE for AtoBI

At AtoBI, we are always sure to stay ahead of the curve when it comes to the software programs best suited to the needs of our clients. One of the software programs we currently use is MapR, a data platform providing dataware for AI and analytics. There are many benefits to using MapR, and yet recently there has been some concern as to its future and how best to support its users. The CEO of AtoBI, Elaine Graydon, attests to the efficiency and benefits of using MapR, and AtoBI will continue to support MapR throughout these changes.

Firstly, let’s look at the benefits of using MapR.

No matter your underlying infrastructure for data, the MapR dataware can handle a multitude of different data types. Users gain access to the MapR Data Platform, which allows them to store, manage, process and analyse data no matter their source. This software includes complete data protection and disaster recovery, to give you confidence knowing your data is protected.

MapR and HPE

After MapR Australia closed on May 17th 2019, there was a lot of uncertainty around customer support, expansion licensing and new purchases. Before this, MapR had announced they would be closing their corporate headquarters due to funding issues. This caused some upset for users, and the time between this occurrence and August 4th left MapR clients up in the air. However there is good news, as on August 4th Hewlett Packard Enterprise (HPE) acquired MapR signaling a positive change. AtoBI customers can be confident that MapR will continue to function and thrive. With support from MapR/HPE and us here at AtoBI, there is no better time to start using this dataware and make the most of your data capabilities.

We are pleased and excited to continue to bring this amazing toolset to our customers, a toolset we know will continue to meet outcome requirements. MapR will now continue to be innovated to match customer requirements, and we are confident in the future of MapR for Australians.

Want to know more about our work with MapR? Contact us online or email us at info@atobi.com.au to get in touch regarding MapR. AtoBI are industry experts in business solutions, and together with AI technology we can help you improve efficiency in all areas of your business.

Value of Data and Explosion of Data Volumes

As the business world delves deeper into the possibilities of technology, there is no doubting the power and importance of data. When it comes to increasing your sales profitability, it can be difficult to know which steps are best to ensure that your people are making the most of the avalanche of data now available to them. Businesses must manage data from a myriad of sources – and it is important to know which channels are worth your time. In a landscape that grows more complicated every day, where businesses must increase efficiencies and manage costs while continuing to show profitability, there is one vital resource which businesses are not taking full advantage of.

This resource is offline data. The power of offline data is that it allows you to see behind the scenes of what is really driving your decisions, as well as giving you the confidence to know your data is whole before then using it to develop important strategies.

Did you know that large amounts of customer activity happen offline, including 90% of retail sales? This means that sales and marketers are missing huge amounts of data, data which could make or break growth strategies and decisions.

Data onboarding allows you to combine your offline data with your online data, letting you use both types with a complete understanding of what’s going on. In doing this, you are able to create targeted, personalised communication for customers, and in a time where customers need individual validation this is a smart decision. Data onboarding takes the offline data and then links it to the relevant data in the digital world, resulting in a far more relevant strategy.

Having incomplete data is a real concern for businesses and markets alike, and it’s easy to become overwhelmed. At AtoBI, we are able to ensure you are making the most of all of your data, offline and online. You can be confident that our team will give you all the information you need to make the right decisions for your business, putting you one step ahead. For more information, visit our website or contact us at info@atobi.com.au

What You Need To Know About AI

Artificial Intelligence is now everywhere. Having gone from a science-fiction concept to a widespread technology, it often comes with clichés and misconceptions (yup, we’ve all seen how it ended in 2001: A Space Odyssey, The Matrix or Terminator…).

More and more organisations of all sizes and industries are trying to understand how they can leverage AI and get aboard the hype train. But to go from a buzzword to an actual solution delivering strong value, there are a few things to keep in mind before taking the leap.

Artificial Intelligence projects can fail if they are not undertaken for a really, really good reason. This might seem obvious, but having witnessed the significant impact they have on an organisation, the business case must be ambitious yet pragmatic, focusing on tangible business outcomes. Also, as sexy as AI can be, is it the most suitable technology to solve that specific business case?

There are many compelling use cases in all industries and departments, from Sales to Finance, Marketing, Health & Safety, Logistics… and more which create competitive advantage and massive savings. This Gartner paper also encompasses good practice when building your business case.

Until future systems can embrace human-like top-down decision-making, current Artificial Intelligence technologies rely on data. A lot of data. It must combine quantity with quality; it needs to be clean and comprehensive, and to avoid siloed information, bias or partial data (limited history, demographics…).

If it is sometimes challenging to figure out what data you will need for your algorithms to learn and deliver efficiently, working backwards from the problem you are looking to solve can help determine where to start.

In today’s digitized economy, the ability to use data represents a real and essential competitive advantage. To get to a future state of mature analytical competency, there’s real work to be done in integrating the data you have already. This is a strategic goal for the entire company and, when addressed properly, will lead you to develop experience and a data infrastructure that unlocks every next step. (HBR)

In other words, walk before you run and get a good understanding of your data first: Conduct an audit of what data you have and where it sits, making sure you have easy access to all relevant information.

As mentioned earlier, AI, Machine Learning (ML), Robotic Process Automation (RPA) are subject to many misconceptions. Amongst the most common ones are:

  • AI is a physical entity – Yes, some people still think of AI as a Terminator-like robot. Well, HAL 9000 would almost seem like a more accurate description…
  • AI can think like a human-being – Replicating the complexity and flexibility of a human brain is way beyond existing AI technologies. “The biggest misconception around AI is that people think we’re close to it.” Prof. Gary Marcus, NYU
  • AI can learn on its own – You have to teach it how to learn and improve, tweaking its algorithms constantly and feeding them with data
  • AI is always objective – AI can have massive bias too
  • AI will take your job – It doesn’t replace jobs, it makes them more strategic, according to many serious studies
  • AI can use and figure out your messy data – As seen above, you need to figure it out first.

Education is key, not only to raise awareness around what AI will bring to the business at an operational level, but also to get executives onboard, promote data literacy and manage their expectations. Support from the top-management is critical to drive change, get access to all data, overcome political hurdles and lead with vision.

Driving AI projects is everything but a one-off process. The best performing organisations have in-house experts and developers constantly improving models and algorithms, making sure they are updated on a frequent basis and adapting to the latest requirements and information.

Project Management methodologies also need to be adapted to this iterative approach, ensuring constant alignment with Business Outcomes and continuous improvement: Scrum, Agile

We have seen how strategic your data is, especially when it comes to AI. Data governance gives you a real competitive advantage, as it will directly impact the quality of your predictions and algorithms. An efficient data governance strategy must cover:

  • Security – Managing access to data
  • Integrity – Making sure the data is accurate
  • Loss prevention & backup – Mitigating data loss/corruption risks and protecting privacy, especially when it comes to sensitive information
  • Lineage – Keeping full transparency for everyone to understand where the information comes from and how it’s been processed, avoiding the “black box” effect
  • Completeness – Avoiding bias and partial views
  • Ethics – This is sometimes overlooked but we cannot emphasize enough how critical this is, in business as in our everyday life.