Association Rules
Machine Learning
Jesus A. Gonzalez
February 1, 2020
Data Mining with Association Rules
- Find associations or correlations between the elements or objects of transactional or relational databases, or data warehouses
Applications
- Support for decision making
- Alarm diagnosis and prediction in telecommunications
- Sales information analysis
- Catalog design
- Merchandise distribution in stores
- Customer segmentation based on buying patterns
Association Rules
- Similar to classification rules
- Also found with a covering procedure
- Main difference
- On the right-hand side of the rule we can find any attribute-value pair (or several pairs)
- In order to find this type of rule
- Consider each possible combination of attribute-value pairs in the RHS
Association Rules
- After that we prune
- Using coverage
- The number of instances the rule correctly predicts
- Using precision
- The proportion of instances to which the rule applies that it predicts correctly
Example
- Find the association rules \(X \& Y \rightarrow Z\) with the restrictions of having a minimum confidence and support
- Rules with
- Minimum support of 50%
- Minimum confidence of 50%
- \(A \rightarrow C\) (50%, 66%)
- \(C \rightarrow A\) (50%, 100%)
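These two rules can be checked numerically. The transaction table from the original slide is not reproduced here, so the sketch below uses a hypothetical 4-transaction database constructed to yield exactly the stated support and confidence values; `support` and `confidence` are illustrative helpers, not library functions.

```python
# Hypothetical transactions chosen so that the two rules above get
# exactly the stated values: A -> C (50%, 66%), C -> A (50%, 100%).
transactions = [
    {"A", "C"},
    {"A", "C"},
    {"A", "B"},
    {"B", "D"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every element of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """P(rhs | lhs): support of lhs ∪ rhs divided by support of lhs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"A", "C"}, transactions))       # 0.5   -> 50% support
print(confidence({"A"}, {"C"}, transactions))  # 0.666... -> ~66% confidence
print(confidence({"C"}, {"A"}, transactions))  # 1.0   -> 100% confidence
```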
Association Rules
- An association rule is an expression of the form \(X \rightarrow Z\) where \(X\) and \(Z\) are sets of elements
- Intuitive meaning
- Transactions in the database that contain \(X\) tend to contain \(Z\)
- A minimum level of confidence and support is required
Definitions
- \(I = \{i_1, i_2, i_3, \dots, i_m\} \rightarrow\) a set of literals (attributes, also called items)
- \(D \rightarrow\) a set of transactions \(T\), where each \(T \subseteq I\)
- \(TID \rightarrow\) an identifier associated with each transaction
- \(X \rightarrow\) a set of elements, \(X \subset I\)
- An association rule is an implication:
- \(X \rightarrow Z\)
- \(X \subset I\)
- \(Z \subset I\)
- \(X \cap Z = \emptyset\)
Definitions
- Support (or coverage), \(s\), is the probability that a transaction contains \(X \cup Z\)
- Confidence (or accuracy), \(c\), is the conditional probability that a transaction containing \(X\) also contains \(Z\)
Rule Evaluation
- We evaluate the rules according to their support and confidence
- In association rules, coverage is known as support
- Precision is known as confidence
Rule Evaluation
- Can be read as
- \(support(X \rightarrow Z) = P(X \cup Z) =\)
- \(\frac{\#\text{transactions containing both } X \text{ and } Z}{\#\text{total transactions}}\)
Rule Evaluation
- Can be read as
- \(confidence(X \rightarrow Z) = P(Z | X)\)
Rule Evaluation
- Can be read as
- \(confidence(X \rightarrow Z) =\)
- \(\frac{support(X \cup Z)}{support(X)} =\)
- \(\frac{\#\text{transactions containing both } X \text{ and } Z}{\#\text{transactions containing } X}\)
Rule Evaluation
- We want rules with a minimum support and confidence
- support \(\geq\) min_supp
- confidence \(\geq\) min_conf
- We look for (independently of the side on which they appear)
- Attribute-value pairs that cover a large number of instances
- The sets of attribute-value pairs are called itemsets
- Each attribute-value pair is called an item
Example
- A typical example of association rules
- Market basket analysis
- Find associations among the products that customers buy
- May impact marketing strategies
- After generating all frequent itemsets
- Transform them into rules
- With the minimum confidence required
- Some itemsets produce more than one rule and others produce none
Example
- Following the weather example
- Itemset: \(humidity = normal\), \(wind = no\), \(class = P\)
- Produces the following rules
- If \(humidity = normal\) & \(wind = no\) Then \(Class = P\) 4/4
- If \(humidity = normal\) & \(class = P\) Then \(Wind = no\) 4/6
- If \(wind = no\) & \(class = P\) Then \(humidity = normal\) 4/6
- If \(humidity = normal\) Then \(wind = no\) & \(class = P\) 4/7
- If \(wind = no\) Then \(class = P\) & \(humidity = normal\) 4/8
- If \(class = P\) Then \(wind = no\) & \(humidity = normal\) 4/9
- If \(true\) Then \(humidity = normal\) & \(wind = no\) & \(class = P\) 4/12
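The enumeration above (every split of an itemset into antecedent and consequent) can be sketched in Python. Since the weather table itself is not reproduced here, the example below uses a small hypothetical transaction list, and `rules_from_itemset` is an illustrative helper, not a library function.

```python
from itertools import combinations

def rules_from_itemset(itemset, transactions):
    """Enumerate every rule LHS -> RHS obtained by splitting the itemset.
    Accuracy is count(itemset) / count(LHS); an empty LHS gives 'If true'."""
    itemset = frozenset(itemset)
    hits = sum(itemset <= t for t in transactions)  # transactions with the whole itemset
    rules = []
    for k in range(len(itemset)):                   # antecedent sizes 0 .. n-1
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            covered = sum(lhs <= t for t in transactions)
            rules.append((set(lhs), set(itemset - lhs), hits, covered))
    return rules

# Hypothetical transactions standing in for rows of the weather table
transactions = [
    {"humidity=normal", "wind=no", "class=P"},
    {"humidity=normal", "wind=no", "class=P"},
    {"humidity=normal", "class=P"},
    {"wind=no"},
]
for lhs, rhs, hits, covered in rules_from_itemset(
        {"humidity=normal", "wind=no", "class=P"}, transactions):
    print(f"If {sorted(lhs) or 'true'} Then {sorted(rhs)}  {hits}/{covered}")
```

A 3-element itemset yields \(2^3 - 1 = 7\) rules, matching the seven rules listed above.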
Example
- If we require 100% accuracy, only the first rule complies
- Considering the whole table, there are 58 rules that cover 2 examples with 100% accuracy
Algorithm
- The algorithm follows 2 steps
- Apriori (Agrawal et al., 1994)
- Generate all itemsets with one element
- Use these to generate the itemsets with 2 elements, and so on
- Keep all candidates that reach the minimum support
- This allows pruning many possible combinations
- Generate rules, verifying that they have the minimum confidence
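The level-wise first step can be sketched as follows. This is a minimal, hypothetical version: it uses a naive join (unions of frequent itemsets) for candidate generation and re-scans the transaction list to count support, on a toy basket database.

```python
def apriori(transactions, min_supp):
    """Minimal level-wise frequent-itemset search (step 1 of the algorithm).
    Candidates of size k are built from frequent itemsets of size k-1."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items
    level = {frozenset([i]) for i in items
             if sum(i in t for t in transactions) / n >= min_supp}
    frequent = set(level)
    k = 2
    while level:
        # Naive join: unions of frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Keep only candidates with minimum support (re-scan to count)
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) / n >= min_supp}
        frequent |= level
        k += 1
    return frequent

transactions = [{"bread", "milk"}, {"bread", "beer", "eggs"},
                {"milk", "beer", "bread"}, {"bread", "milk", "beer"}]
print(apriori(transactions, min_supp=0.5))
```

With min_supp = 0.5, `eggs` is infrequent (1/4), while every combination of `bread`, `milk`, and `beer` survives.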
Algorithm
- AprioriGen generates candidates \(C_k\) (of size \(k\)) from the frequent itemsets of size \(k-1\)
- Subset determines which of the candidate itemsets are actually frequent in each pass (re-scans the database and counts support)
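The join-and-prune step of AprioriGen can be sketched as below. This is a simplified, hypothetical version: it joins by taking unions of size \(k\) (the usual formulation joins itemsets sharing a \(k-2\) prefix for efficiency), then applies the Apriori property: every \((k-1)\)-subset of a surviving candidate must itself be frequent.

```python
def apriori_gen(frequent_prev, k):
    """Candidate generation: join frequent (k-1)-itemsets, then prune any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            c = a | b
            if len(c) == k:
                # Prune: every (k-1)-subset must itself be frequent
                if all(c - {item} in frequent_prev for item in c):
                    candidates.add(c)
    return candidates

# Hypothetical frequent 2-itemsets: {A,B}, {A,C}, {B,C}, {B,D}
f2 = {frozenset(s) for s in ({"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"})}
print(apriori_gen(f2, 3))  # only {A,B,C}; {A,B,D} and {B,C,D} are pruned
```

`{A,B,D}` is pruned because its subset `{A,D}` is not frequent, and `{B,C,D}` because `{C,D}` is not.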
Example
- Consider a table with shopping lists
- Process with the data of the previous table
- Candidate generation by levels
Enhancements (for efficiency)
- Some enhancements to the basic Apriori algorithm make it more efficient
- Use hash tables to reduce the number of itemsets when generating candidates
- An Effective Hash-based Algorithm for Mining Association Rules
- Eliminate transactions (elements of the database) that cannot contribute to any of the supersets still under consideration
- Divide the transactions into disjoint partitions, evaluate the local itemsets, and then, based on those results, estimate the global ones
Enhancements (for efficiency)
- Create approximations by taking samples of the transactions, avoiding reading all the data
- Avoid generating candidates using alternative data structures
- FP-trees (Frequent Pattern tree)
Extensions
- Finding association rules at different levels of abstraction
- Usually starts with upper level classes
- Results can be used to filter lower level classes
- e.g., association rules over computers and printers
- Then over laptops and workstations on one side and laser and inkjet printers on the other, and so on
- When proceeding to the subclasses we may consider
- A uniform support criterion
- Reducing the criterion for the subclasses
- Consider all subclasses independently of the support criterion
- When finding association rules at different levels of abstraction
- It’s common to generate redundant rules
- Rules that don’t add knowledge (i.e., a more general rule already conveys the same information)
- It’s necessary to incorporate filtering mechanisms
Extensions
- Finding association rules by combining information from multiple tables or multi-dimensional association rules
- Datacubes may be used to find multi-dimensional association rules
Extensions
- Originally, association rules work with discrete attributes
- Mechanisms to deal with continuous attributes have been created
- Discretize before mining, using ranges from possible predefined hierarchies
- Discretize dynamically during the process, trying to maximize some confidence criterion or reduction of the size of the rules
- e.g., ARCS (Association Rule Clustering System)
- Maps quantitative attributes into a mesh and then uses clustering
- First assign data to containers delimited by ranges (could be changed at a later time)
- More common schemes: containers of the same size, with the same number of elements, with elements uniformly distributed
- After that, find association rules using the containers
- Then, rules may be grouped if they form larger rectangles in the mesh
- Discretize using semantic information
- i.e. create groups with elements that are close to each other (i.e. with clustering over the attributes)
- Once clusters have been found, find association rules from those clusters
- Based on similarity measures
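The simplest of the binning schemes mentioned above, containers of the same size (equal-width ranges), can be sketched as follows. `equal_width_bins` and the age values are hypothetical, for illustration only.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a range label, using n_bins
    containers of equal width over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # Clamp so max(values) falls inside the last bin
        b = min(int((v - lo) / width), n_bins - 1)
        left, right = lo + b * width, lo + (b + 1) * width
        labels.append(f"[{left:g}, {right:g})")
    return labels

# Hypothetical ages, discretized before mining rules such as
# age=[22,37) -> buys=yes
ages = [22, 25, 31, 38, 45, 52, 58, 67]
print(equal_width_bins(ages, 3))
```

Equal-frequency containers (same number of elements per bin) would instead split at quantiles of the sorted values.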
Association versus Correlation
- Finding an association rule doesn’t necessarily mean it will be useful
- e.g., we analyze 10,000 shopping tickets
- 6,000 of them include videogames
- 7,500 include DVDs
- 4,000 include both
- We might generate the rule: buy videogames \(\rightarrow\) buy DVDs
- [supp = 4,000/10,000 = 40%, conf = 4,000/6,000 = 66.6%]
- But since 75% of all tickets include DVDs, buying videogames actually reduces the probability of buying DVDs (66.6% < 75%)
Association versus Correlation
- The occurrence of an itemset \(A\) is independent of another itemset \(B\) if \(P(AB) = P(A)P(B)\)
- Otherwise there exists a dependence or correlation
- Correlation between two events:
- \(corr_{A,B} = \frac{P(AB)}{P(A)P(B)}\)
- A value smaller than 1 indicates that the occurrence of one decreases the probability of occurrence of the other
- A value larger than 1 indicates that the occurrence of one increases the probability of occurrence of the other
- A value of exactly 1 indicates no relation at all (they are independent)
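The shopping-ticket example above can be checked with this measure; the computation below uses the counts given in the text.

```python
# Ticket example: 10,000 tickets, 6,000 with videogames,
# 7,500 with DVDs, 4,000 with both.
n, n_games, n_dvds, n_both = 10_000, 6_000, 7_500, 4_000

p_games, p_dvds, p_both = n_games / n, n_dvds / n, n_both / n
corr = p_both / (p_games * p_dvds)  # corr of videogames and DVDs
print(corr)  # 0.888... < 1: buying videogames lowers the chance of buying DVDs
```

Since \(corr \approx 0.89 < 1\), the rule has negative correlation despite its 40% support and 66.6% confidence.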
Association versus Correlation
- Now we can find correlated association rules
- We may estimate whether that correlation is significant with a statistical \(\chi^2\) test
- If a set of elements is correlated, any superset of this will be as well
- This may help to find the minimum correlated sets and build their supersets from them
Use of Restrictions
- We may use restrictions over the types of data, the hierarchies, or the possible forms of the rules in order to reduce the search space
- Restrictions may be
- Antimonotonic
- If a set doesn’t satisfy a condition, then its supersets won’t satisfy it either
- Monotone
- If a set satisfies a restriction, then all its supersets will also satisfy it
- Succinct
- We can enumerate all the sets that satisfy a restriction
- Convertible
- We can convert a restriction to one of the previous classes
- Non-convertible
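The antimonotonic case is exactly the property Apriori exploits with minimum support: support can only shrink (or stay equal) as an itemset grows, so once an itemset fails the support threshold, every superset fails it too. A quick check on a hypothetical toy database:

```python
def support_count(itemset, transactions):
    """Number of transactions containing every element of the itemset."""
    return sum(set(itemset) <= t for t in transactions)

transactions = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]

# Adding items never increases the support count, which is why
# "support >= min_supp" is an antimonotonic restriction.
assert support_count({"A", "B"}, transactions) <= support_count({"A"}, transactions)
assert support_count({"A", "B", "C"}, transactions) <= support_count({"A", "B"}, transactions)
print("antimonotonicity holds on this toy database")
```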
Association Rules vs Classification Rules vs Decision Trees
- Association rules versus Classification rules
- Dependencies exploration vs Focused prediction
- Different combinations of dependent or independent attributes vs Prediction of an attribute (class) from other attributes
- Complete search (all rules found) vs Heuristic search (finds a subset of rules)
Association Rules vs Classification Rules vs Decision Trees
- Trees use an evaluation heuristic over an attribute
- Based on splitting
- Usually overfit first and then prune
- Classification rules use an evaluation heuristic over an attribute-value pair condition
- Based on covering
- Using a stop criterion (sometimes overfitting and then pruning)
- Association rules are based on support and confidence measures
- Consider any set of attributes with any other set of attributes