Market Basket Analysis is used to find associations among items in transactional data. This makes it very helpful for recommender systems, and it is widely used by retail companies and online shopping platforms. Market basket analysis is also referred to as affinity analysis.
Market basket analysis extracts useful insights from large datasets at scale, and this machine learning technique can be applied to a variety of big data problems.
Association Rules in Market Basket Analysis
This analysis requires defining association rules. A rule X -> Y states that if item X occurs in a transaction, then item Y also occurs with a certain probability, so Y can be associated with X. Frequent itemsets can be found using the Apriori algorithm. The Apriori algorithm also reduces a large dataset to a manageable set of results by keeping only the associations that meet a particular business or analytical criterion. The results it produces are easy for people to understand.
Market basket analysis is used to obtain sets of association rules. Some examples of association rules are:
{bread, fruit jam} -> {butter}
{Shaving Cream, Shaving razor} -> {After Shave Lotion}
{Trousers, Shoes} -> {Socks}
Apriori Algorithm
The Apriori algorithm is an unsupervised technique for association rule mining. It reduces the work of evaluating high-volume data by applying a prior rule to prune unnecessary parts of the search space. Please note that this algorithm is not recommended for smaller datasets.
An itemset is a set of items that appear together in a transaction, and Apriori searches for the itemsets that occur frequently. It follows a rule, known as the Apriori property, which states that all subsets of a frequent itemset must also be frequent. The algorithm uses this rule to find associations: if even one subset of a candidate itemset is not frequent, the candidate cannot be frequent and is not added to the frequent itemsets.
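The rule above (all subsets of a frequent itemset must also be frequent) is what lets Apriori grow itemsets one level at a time and discard candidates before counting them. The following is a minimal pure-Python sketch of that level-wise search, not the arules or mlxtend implementation; the function name `apriori_frequent_itemsets`, the toy `transactions` list, and the 0.4 support threshold are hypothetical choices for illustration.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search. Returns {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: keep only the single items that meet the support threshold.
    singles = {frozenset([i]) for b in baskets for i in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent (k-1)-subset, without scanning the data for it.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data: five recorded transactions.
transactions = [
    {"bread", "fruit jam", "butter"},
    {"bread", "butter"},
    {"bread", "fruit jam"},
    {"milk"},
    {"bread", "fruit jam", "butter"},
]

frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
# Prints 7 itemsets; {milk} (support 0.2) falls below the threshold.
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

Because the search stops as soon as a level produces no frequent itemsets, larger itemsets are never examined once their smaller subsets have failed the threshold.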
In the Apriori algorithm, support, confidence, and lift are calculated to evaluate an association rule. For any itemset A, support can be calculated as:
support(A) = count(A)/N {the fraction of all transactions that contain A}
where,
count(A) is the number of transactions in which A is present,
N = total number of transactions recorded in the database
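The support formula can be checked on a small hypothetical dataset. This sketch assumes each transaction is a Python set of item names; the helper name `support` and the data are illustrative only.

```python
def support(itemset, transactions):
    """support(A) = count(A) / N."""
    items = set(itemset)
    # count(A): number of transactions that contain every item of A.
    count = sum(1 for t in transactions if items <= set(t))
    return count / len(transactions)  # N = total number of transactions

# Hypothetical toy data: N = 5 recorded transactions.
transactions = [
    {"bread", "fruit jam", "butter"},
    {"bread", "butter"},
    {"bread", "fruit jam"},
    {"milk"},
    {"bread", "fruit jam", "butter"},
]

print(support({"bread"}, transactions))               # 0.8 (4 of 5)
print(support({"bread", "fruit jam"}, transactions))  # 0.6 (3 of 5)
```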
confidence(A->B) = support(A,B)/support(A) {how often B is purchased in the transactions that contain A}
where,
support(A,B) is the support for transactions in which A and B both occur,
support(A) is the support for transactions in which A occurs.
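A short sketch of the confidence formula on the same kind of hypothetical data shows that confidence depends on the direction of the rule. The helper names `support` and `confidence` are illustrative, not from a specific library.

```python
def support(itemset, transactions):
    # Fraction of transactions containing every item of `itemset`.
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(a, b, transactions):
    """confidence(A->B) = support(A, B) / support(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

# Hypothetical toy data: five recorded transactions.
transactions = [
    {"bread", "fruit jam", "butter"},
    {"bread", "butter"},
    {"bread", "fruit jam"},
    {"milk"},
    {"bread", "fruit jam", "butter"},
]

print(round(confidence({"bread"}, {"butter"}, transactions), 2))  # 0.75
print(round(confidence({"butter"}, {"bread"}, transactions), 2))  # 1.0
```

Every transaction with butter also contains bread, but only three of the four bread transactions contain butter, so the two directions give different values.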
lift(X->Y) = confidence(X->Y)/support(Y)
Please note that lift(X->Y) = lift(Y->X), because both equal support(X,Y)/(support(X)*support(Y)). This symmetry does not hold for confidence.
Lift is a very important metric in association rule mining because it measures how much more likely item Y is to be purchased when X is purchased, relative to Y's baseline purchase rate. A lift greater than 1 means X and Y occur together more often than they would if they were independent, a lift of 1 indicates independence, and a lift below 1 indicates a negative association. A high lift value therefore tells an analyst that the rule reflects a strong connection between its items.
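The lift formula and its symmetry can be verified with a small sketch. It assumes the same hypothetical set-based transactions as the other examples; `support` and `lift` are illustrative helper names.

```python
def support(itemset, transactions):
    # Fraction of transactions containing every item of `itemset`.
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def lift(x, y, transactions):
    """lift(X->Y) = confidence(X->Y) / support(Y)."""
    conf = support(set(x) | set(y), transactions) / support(x, transactions)
    return conf / support(y, transactions)

# Hypothetical toy data: five recorded transactions.
transactions = [
    {"bread", "fruit jam", "butter"},
    {"bread", "butter"},
    {"bread", "fruit jam"},
    {"milk"},
    {"bread", "fruit jam", "butter"},
]

print(round(lift({"bread"}, {"butter"}, transactions), 3))               # 1.25
print(round(lift({"butter"}, {"bread"}, transactions), 3))               # 1.25 (symmetric)
print(round(lift({"bread", "fruit jam"}, {"butter"}, transactions), 3))  # 1.111
```

Both directions of the bread and butter rule give the same lift even though their confidences differ, and the value above 1 indicates that bread and butter are bought together more often than independence would predict.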
> Confidence measures the reliability of an association rule: the fraction of transactions containing A that also contain B.
> A rule is said to be strong if it has both high support and high confidence. The Apriori algorithm uses these strong rules to reduce the set of associations to one that a person can understand. Its first step is to identify all the frequent itemsets. Once these frequent itemsets have been identified, rules are created from them and kept only if they meet a chosen confidence threshold.
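The two phases described above, finding frequent itemsets and then keeping only the rules that meet a confidence threshold, can be sketched in plain Python. For brevity this sketch finds frequent itemsets by brute-force enumeration rather than Apriori's pruned search; the function names, the toy data, and the thresholds (minimum support 0.4, minimum confidence 0.7) are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for the frequent-itemset phase."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted({i for b in baskets for i in b})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for b in baskets if set(combo) <= b) / len(baskets)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent and keep
    the rules whose confidence meets the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are frequent, so their support is known.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules

# Hypothetical toy data: five recorded transactions.
transactions = [
    {"bread", "fruit jam", "butter"},
    {"bread", "butter"},
    {"bread", "fruit jam"},
    {"milk"},
    {"bread", "fruit jam", "butter"},
]

frequent = frequent_itemsets(transactions, min_support=0.4)
rules = generate_rules(frequent, min_confidence=0.7)
for a, c, sup, conf in rules:
    print(sorted(a), "->", sorted(c), f"support={sup:.2f} confidence={conf:.2f}")
```

With these thresholds five rules survive. The rule {bread, fruit jam} -> {butter} has confidence of about 0.67 on this data, so it is dropped; lowering the confidence threshold would keep it.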
Basic Steps in the Apriori Algorithm
- Data collection and exploration
- Data preparation for further processing
- Creating a sparse matrix data structure for the transactions (each row of the sparse matrix represents one transaction). The arules library in R can create it quickly.
- Visualising the data to understand it in depth, for example with item frequency charts. The sparse matrix itself can also be plotted to get further insight into the transactional data.
- Training the model using the Apriori algorithm: in R, use the apriori function of the arules library; in Python, the apriori function of the mlxtend library can be used in the same way to find frequent itemsets.
- Evaluating and improving the model's performance. The resulting association rules can be applied to a business for marketing and recommendation purposes.
- Setting a minimum threshold and extracting the subset of association rules from the previous step that meets it.
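The sparse-matrix and item-frequency steps in the list above can be illustrated without R. This is a plain-Python stand-in, not the data structure arules actually uses: each row stores only the column indices of the items a transaction contains, and the relative item frequencies are the values an item frequency chart would plot. The variable names and the toy data are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: five recorded transactions.
transactions = [
    {"bread", "fruit jam", "butter"},
    {"bread", "butter"},
    {"bread", "fruit jam"},
    {"milk"},
    {"bread", "fruit jam", "butter"},
]

# One column per distinct item, in alphabetical order.
items = sorted({item for t in transactions for item in t})
col = {item: j for j, item in enumerate(items)}

# Sparse storage: each row keeps only the column indices of items present.
sparse_rows = [sorted(col[item] for item in t) for t in transactions]

# Dense 0/1 view of the same transaction matrix, for comparison.
dense = [[1 if j in row else 0 for j in range(len(items))] for row in sparse_rows]

# Relative item frequencies: the data behind an item frequency chart.
counts = Counter(item for t in transactions for item in t)
item_frequency = {item: counts[item] / len(transactions) for item in items}

print(items)           # ['bread', 'butter', 'fruit jam', 'milk']
print(sparse_rows)     # [[0, 1, 2], [0, 1], [0, 2], [3], [0, 1, 2]]
print(item_frequency)  # {'bread': 0.8, 'butter': 0.6, 'fruit jam': 0.6, 'milk': 0.2}
```

Real retail data has thousands of items but only a handful per basket, which is why storing only the present items saves so much space compared with the dense view.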
As we have seen, association rules are very important for market basket analysis. They have been, and still are, used by many companies in their marketing campaigns. The technique is especially well suited to large retail chains and other businesses that have large quantities of transactional data available.