Share

In my previous posts, I was writing about Market Basket Analysis that you can perform in Oracle Autonomous Database and Frequent Itemsets that can also be created with Oracle Analytics

Association Rules blog 1

When working with Oracle Analytics, one important part of Market Basket Analysis was missing - Association Rules and Recommendations. Oracle Analytics from version 6.0 supports this feature, therefore complete Market Basket Analysis can be done now.

Let's take a look at the data flow we created for creation of Frequent Itemset. It consists of three steps:

Association Rules blog 10
As usual, we begin a data flow by reading data. In our case we read database data table CARBO_TRANSACTIONS. This table has relatively simple structure:

r02

For our analysis only BASKET and UPC columns are important.

In the second step, Database Analytics step with analytics operation Frequent Itemsets we define parameters for the algorithm.

In the Outputs, we define output columns. In our case we are looking only for combinations of the two product combination in a basket, which are specified as Item1 and Item2. Item3 and Item4 are not used.

Association Rules blog 9

In this same step we also define Parameters. And here is the difference from version 5.9.

Association Rules blog 8

The top 6 parameters are unchanged from before and are defining Frequent Itemset dataset:

  • Transaction Id Column is basically transaction identifier, in our case this is BASKET number.
  • Item Value Column is product identifier, in our case UPC. 
  • Maximum and Minimum Itemset Size define product combination sizes we are interested in. We are going to analyse only combinations of two items.
  • Minimum Support Percent filters out all item combinations with SupportPercentage less than the value set by this parameter. We had to go quite low do to low SupportPercentage for our frequent itemsets.
  • Top N Frequent Itemset specifies a number of top N itemsets ordered by SupportPercentage. In our case we left the default value of 50.
Additionally, there are three parameters more, and these define Association Rules:
  • Generate Association Rules parameter defines if association rules should be generated or no. 
  • Top N Association Rules parameter defines how many association rules are generated and stored. Please be aware that "top" is defined by Confidence and then Support.
  • Association Confidence Percent defines the cut off limit for minimal confidence of the rule generated. 

And here I hit on a problem. In case of large datasets with a lot of products and transactions, Support and Confidence Percents tend to be lower that 0.25 and 0.1 respectively. So in my case I didn't get any frequent itemsets and association rules generated at all. Which is quite annoying and tells me that I should go back to Autonomous Database and to my analysis there.

In order to reduce the number of rows I changed the data flow a bit and add a filter step to filter data based on a store. If we are looking at the best sales combinations and recommendation of two products, this has some results.

So here is data flow:

Association Rules blog 7

Original dataset is now filter using STORE value:

Association Rules blog 6

Data Analytics step stays unchanged and in the last step I am storing datasets into database:

Association Rules blog 5

There will be two tables created:

  • CARBO_BASKET_ANALYSIS and 
  • CARBO_BASKET_ANALYSIS_AR.

The first table, CARBO_BASKET_ANALYSIS, contains data about Frequent Itemsets:

Association Rules blog 4

and the second table, CARBO_BASKET_ANALYSIS_AR, is holding information about association rules and therefore product recommendations:

Association Rules blog 3

Yo can see the difference between the two. In the Frequent Itemset table, we are looking at particular combination of two items (it can go up to combinations of 4). Metrics that are presented are number of items in a combination, number of occasions of the combination and total number of frequent itemsets. This information gives us Support for the particular combination of items. 

In the Association Rules table we can see Antecedent Item(s), maximum 3, which are combined in an association rule with the Consequent Item. And there are also corresponding Support (for each of the items in a combination and for all items in a combination together), Confidence and Lift metrics. 

An at the end, we can visualise the model:

Association Rules blog 2

This concludes my writing about Market Basket Analysis, Frequent Itemsets and Association Rules. You can access previous blogposts using these links:

How to visualise Market Basket Analysis in Oracle Analytics

Oracle Analytics 5.9: Frequent Itemsets

Market Basket Analysis with Oracle Machine Learning and Oracle Analytics

Become a member

Newsletter

Receive our newsletter to stay on top of the latest posts.

Topics

Analytics

Ziga Vaupot

Author

Ziga Vaupot