Market Basket Analysis with Oracle Machine Learning and Oracle Analytics
Recommendations are usually always there to stimulate you to buy additional products or services regardless if you are shopping online or in a store. Sometimes these recommendations are just intuitions of sales manager, but more often these are prepared using some machine learning algorithm, for example, Association Rules.
When Association Rules modeling is applied to transaction sales data, the model is called Market-Basket Analysis. Market-Basket Analysis can be used in various situations, from direct marketing, sales promotions, discovering business trends, but also for effectively managing store layouts, catalog design, or for exposing cross-sell opportunities in a web store.
Since I am not going to discuss a theory about Market Basket Analysis and Association Rules, please find more information here:
Key Rule Metrics: Support, Confidence and Lift
Support is a measure that tells us how many times an itemset appears in a dataset.
Confidence of a rule indicates the probability of both the antecedent and the consequent appearing in the same transaction.
Confidence is the ratio of the rule support to the number of transactions that include the antecedent.
Both support and confidence must be used to determine if a rule is valid. However, there are times when both of these measures may be high, and yet still produce a rule that is not useful.
Lift indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent, given their individual support. It provides information about the improvement, the increase in the probability of the consequent given the antecedent.
Any rule with an improvement of less than 1 does not indicate a real cross-selling opportunity.
The data source: Transactions
The basis for Market-Basket Analysis is transactions. The goal of Market-Basket Analysis is to identify the products which are usually sold together.
We can use the results of the Market-Basket Analysis to make recommendations. For example, when shopping online, once a customer adds a product to her basket, then the recommendation engine would suggest buying a product that is most often sold together with the product that has already been added to the basket.
Data that is required is expected at the lowest possible data level, for example, POS Sales transactions.
Data that I am using in this example can be downloaded from Dunnhumby source files for Carbo-Loading (https://www.dunnhumby.com/source-files/). The data model that is imported into Oracle Autonomous Data Warehouse is as follows:
For our Market-Basket Analysis exercise, CARBO_TRANSACTIONS database table is relevant, with the following two columns:
- BASKET is simply basket ID and
- UPC which contains product codes.
Each basket can have one or more rows in this table, for example:
Market-Basket Analysis model
We will use Oracle Autonomous Data Warehouse and Oracle Machine Learning for PL/SQL to create a model.
To get to this, just navigate to Service Console / Development in OCI console for your ADW. Once you select Oracle Machine Learning Notebooks and provide your credentials, the Zeppelin notebook will open.
Start by creating a new Notebook.
Provide your new notebook a name:
Once your new notebook is created, you can begin. First, let’s a view that we will use as a basis for our model:
We can create a model now.
First, we specify some parameters, that will be used for the model creation:
- ALGO_NAME is an algorithm name to be used in model creation. In our case, this is Apriori.
- PREP_AUTO is a parameter that defines if there is additional automatic data preparation required. In most cases, this should be set to ON. The algorithm will then perform the necessary transformation before the actual data model is created.
- ASSO_MIN_SUPPORT defines minimum “cut-off” support. We can define this value to avoid running into too many samples with very rare occurrences.
- ASSO_MIN_CONFIDENCE defines minimum confidence in itemsets. It doesn’t make sense to set this value too high or too low as there could be only a couple of cases or too many.
- ASSO_MAX_RULE_LENGTH simply defines the size of the itemset. In our case, it is set to 3, which means we will look into all itemsets with size 2 or 3.
- ODMS_ITEM_ID_COLUMN_NAME is the column that contains information about (in our case) products.
When parameters are prepared, a model can be built using DBMS_DATA_MINING.CREATE_MODEL2 function call. Parameters that are required to create association rules model are the following:
- MINING_FUNCTION is ASSOCIATION
- DATA QUERY which retrieves data, in our case this is a view we created in the beginning,
- SET_LIST is a list of parameters defined above,
- CASE_ID_COLUMN_NAME is basically basket ID, in our case BASKET.
When the creation of the model is completed, then we can start reviewing it and we can start our analysis:
When the model is created, there are several database tables and views created as well. We need to be familiar with these tables and as they will be the basis for our further analysis.
You can find a more detailed explanation on Model Detail Views in Oracle Data Mining User’s Guide (https://docs.oracle.com/en/database/oracle/oracle-database/19/dmprg/model-detail-views.html), however, I would like to emphasize the two which I find the most important:
Model detail view for Transactional Rule describes the transactional rule view and transactional itemsets view:
- Itemsets: DM$VTmodel_name view provides the itemsets information in transactional format:
- Transaction Rules: DM$VAmodel_name view provides information about Transactional Rules:
In our example, these two views are:
- DM$VTAR_CARBO_TRANS and
Let’s check which are the most Frequent Itemsets.
We can see that ITEMSET_ID, which consists of two ITEMS (product UPC codes), 360001369 and 9999985029, has a support 0.00169, meaning that this combination is represented 0,169% of all itemsets in our transaction data table.
Let’s examine now what is Association Rule behind this Frequent Itemset:
We can see that RULE_CONFIDENCE is pretty high, and RULE_LIFT is high as well. So, we have a clear candidate for cross-sale.
We can also take ANTECEDENT_PREDICATE (for example product with UPC 3620001369) and find out which products are also sold very often with that product, which can be treated as clear recommendations for cross-sale.
Market-Basket Analysis visualization and analysis
In the end, it would be great if we could bring the Market-Basket Analysis results into Oracle Analytics and visualize the whole thing. It is not to expect that regular “category managers” would be skilled enough to write SQL code to retrieve recommendations from model views.
In the dashboard below, we can see the Top 25 Itemsets by Itemset Support. By clicking on Itemset ID in the top left bar chart, detailed information can be displayed.
We can see Itemset ID 8561 with two known products, PRIVATE LABEL FETTUCCINI (UPC: 9999985029) and RAGU\CHZ CREATION ALFRDO (UPC: 3620001369).
At the bottom of the screen, we can examine the correlation between the two products in terms of sales value, units sold and # of households buying these two products.
In the middle section, we can see Association Rules for the selected Itemset. You can observe that information provided is very similar to the one we saw in the model view, DM$VAAR_CARBO_TRANS, we used in the previous step.
To further analyze association rules in our model, we have two action links defined to navigate to Recommendations for Antecedent and Recommendations for Consequent canvases.
By activating the action menu that takes us to Recommendations for Antecedent, we can analyze which products can be recommended to a buyer when he already picked RAGU\CHZ CREATION ALFRDO (UPC: 3620001369):
Sankey's chart presents these recommendations really well. You can now see by the darkness and width of the lines, which other products can be recommended for the initially selected product. We can see that there is another product that has a higher RULE_LIFT than the one we observed initially. This can happen as we initiated our analysis by looking at ITEMSET_SUPPORT.
We can now continue with our analysis to explore which products are sold well together with that other product, RAGU CHS CREATN PARM MOZZ (product UPC: 2620000217).
By activating another Data Action, Recommendations for Consequent, this time in the first line of the table on the right, we can get recommendations for RAGU CHS CREATN PARM MOZZ (product UPC: 2620000217):
We could go on now analyzing all these relationships. But the other question is: how it is done in backend?
I created a data model using the Data Modeler tool, which is part of Oracle Analytics and I came up with the following model:
In the model above, I have imported Frequent Itemset model view and Associations Rules model view as Fact tables.
Association Rules is linked, on logical level, to Antecedent and Consequent Product dimension tables, which enables us navigation in both directions. For the Frequent Itemsets fact table, an additional Itemsets dimension table is created which is also the dimension in the Association Rules submodel. The function of this dimension is also a linkage between Frequent Itemsets and Business Rules.
The model has been created using Data Modeler tool, Model Administration Tool is used only for data model visualization.