Share

In my previous posts, I was writing about Market Basket Analysis that you can perform in Oracle Autonomous Database and Frequent Itemsets that can also be created with Oracle Analytics

Association Rules blog 1

When working with Oracle Analytics, one important part of Market Basket Analysis was missing - Association Rules and Recommendations. Oracle Analytics from version 6.0 supports this feature, therefore complete Market Basket Analysis can be done now.

Let's take a look at the data flow we created for the creation of Frequent Itemset. It consists of three steps:

Association Rules blog 10
As usual, we begin a data flow by reading data. In our case, we read the database data table CARBO_TRANSACTIONS. This table has a relatively simple structure:

r02

For our analysis only BASKET and UPC columns are important.

In the second step, Database Analytics step with analytics operation Frequent Itemsets we define parameters for the algorithm.

In the Outputs, we define output columns. In our case, we are looking only for combinations of the two product combinations in a basket, which are specified as Item1 and Item2. Item3 and Item4 are not used.

Association Rules blog 9

In this same step, we also define Parameters. And here is the difference from version 5.9.

Association Rules blog 8

The top 6 parameters are unchanged from before and are defining Frequent Itemset dataset:

  • Transaction Id Column is basically transaction identifier, in our case, this is BASKET number.
  • Item Value Column is product identifier, in our case UPC. 
  • Maximum and Minimum Itemset Size define product combination sizes we are interested in. We are going to analyse only combinations of two items.
  • Minimum Support Percent filters out all item combinations with SupportPercentage less than the value set by this parameter. We had to go quite low do to low SupportPercentage for our frequent itemsets.
  • Top N Frequent Itemset specifies the number of top N itemsets ordered by SupportPercentage. In our case, we left the default value of 50.
Additionally, there are three parameters more, and these define Association Rules:
  • Generate Association Rules parameter defines if association rules should be generated or not. 
  • The top N Association Rules parameter defines how many association rules are generated and stored. Please be aware that "top" is defined by Confidence and then Support.
  • Association Confidence Percent defines the cutoff limit for minimal confidence of the rule generated. 

And here I hit on a problem. In the case of large datasets with a lot of products and transactions, Support and Confidence Percents tend to be lower than 0.25 and 0.1 respectively. So in my case, I didn't get any frequent itemsets and association rules generated at all. Which is quite annoying and tells me that I should go back to Autonomous Database and to my analysis there.

In order to reduce the number of rows, I changed the data flow a bit and add a filter step to filter data based on a store. If we are looking at the best sales combinations and recommendations of two products, this has some results.

So here is the data flow:

Association Rules blog 7

The original dataset is now filtered using STORE value:

Association Rules blog 6

The Data Analytics step stays unchanged and in the last step I am storing datasets into the database:

Association Rules blog 5

There will be two tables created:

  • CARBO_BASKET_ANALYSIS and 
  • CARBO_BASKET_ANALYSIS_AR.

The first table, CARBO_BASKET_ANALYSIS, contains data about Frequent Itemsets:

Association Rules blog 4

and the second table, CARBO_BASKET_ANALYSIS_AR, is holding information about association rules and therefore product recommendations:

Association Rules blog 3

You can see the difference between the two. In the Frequent Itemset table, we are looking at a particular combination of two items (it can go up to combinations of 4). Metrics that are presented are the number of items in a combination, the number of occasions of the combination and the total number of frequent itemsets. This information gives us Support for the particular combination of items. 

In the Association Rules table, we can see Antecedent Item(s), maximum 3, which are combined in an association rule with the Consequent Item. And there are also corresponding Support (for each of the items in a combination and for all items in a combination together), Confidence and Lift metrics. 

An at the end, we can visualise the model:

Association Rules blog 2

This concludes my writing about Market Basket Analysis, Frequent Itemsets and Association Rules. You can access previous blogposts using these links:

How to visualise Market Basket Analysis in Oracle Analytics

Oracle Analytics 5.9: Frequent Itemsets

Market Basket Analysis with Oracle Machine Learning and Oracle Analytics

Become a member

Newsletter

Receive our newsletter to stay on top of the latest posts.

Topics

Analytics

Ziga Vaupot

Author

Ziga Vaupot