Basic Concepts and Algorithms of Association Analysis
Association analysis expects transactions to be in a particular format. The input data should be a binomial table in which each column represents an item and each row represents a transaction. A transaction ID column, if present, can be tagged as a special attribute or ignored. If a dataset is in another format, it should be converted to this transactional format, which can be done with data transformation operators. The output of the numerical-to-binomial conversion is then connected to the FP-Growth operator, which generates the frequent item sets (Addi et al., 2015).
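The conversion to a binomial table can be illustrated with a minimal Python sketch. It is not the operator-based workflow described above; the function name to_binomial and the sample baskets are hypothetical and serve only to show the target layout, with one column per item and one row per transaction.

```python
def to_binomial(transactions):
    """Turn a list of item collections into a True/False table.

    Returns the column names (items) and one row of booleans per transaction.
    """
    # Column order: every distinct item, sorted so the layout is reproducible.
    items = sorted({item for basket in transactions for item in basket})
    # A cell is True when the transaction (row) contains the item (column).
    rows = [[item in basket for item in items] for basket in transactions]
    return items, rows


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
]
items, rows = to_binomial(baskets)
```

Here items holds the column headers and each entry of rows is one transaction, which is the shape a frequent item set algorithm would typically consume.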
Basic association analysis deals only with the co-occurrence of one item with another. Other analyses may also take into account the quantity, sequence, or price of the items, among other attributes. Finding association rules through data mining involves three steps (Addi et al., 2015). The first step is to prepare the data in the transactional format, since association algorithms require their input in that specific format. The second step is to short-list the item sets that occur frequently; association algorithms limit the analysis to these frequently occurring item sets. The last step is to generate the relevant association rules from the item sets, with the algorithms filtering the rules based on an interest measure (Yuan, 2017).
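The second step, short-listing frequent item sets, can be sketched as a simple level-wise search in the spirit of Apriori. This is an illustrative assumption rather than the FP-Growth method mentioned earlier, and the function name frequent_itemsets, the min_support threshold, and the sample data are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every item set meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        return {}
    result = {}
    # Level 1 candidates: every single item seen in the data.
    current = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    while current:
        # Keep only the candidates whose relative frequency is high enough.
        level = {}
        for cand in current:
            support = sum(cand <= b for b in baskets) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join surviving item sets into candidates that are one item larger.
        survivors = list(level)
        size = len(survivors[0]) + 1 if survivors else 0
        current = list(
            {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        )
    return result


# Hypothetical transactions, for illustration only.
data = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]
frequent = frequent_itemsets(data, min_support=0.5)
```

Raising min_support shrinks the short-list, which is how the threshold limits the item sets passed on to rule generation.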
An item set can appear in either the antecedent or the consequent portion of a rule. The two sets must be disjoint, meaning that no item appears on both sides of the rule. Item sets with more than one item give rise to several candidate rules, because the number of possible antecedent and consequent permutations grows with the size of the set, so the strength of each relationship should be tested. The strength of an association rule is quantified by its support and confidence measures; lift and conviction can also be used. All of these measures are derived from how often a particular item set occurs in the transactions (Yuan, 2017).
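To make the growth in candidate rules concrete, the following Python sketch enumerates every way of splitting one item set into a disjoint antecedent and consequent. The function name candidate_rules and the example items are hypothetical and used only for illustration.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of an item set.

    Both sides are non-empty and share no item, so each pair is a valid rule.
    """
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            # Everything not in the antecedent goes to the consequent.
            consequent = tuple(i for i in items if i not in antecedent)
            yield frozenset(antecedent), frozenset(consequent)


# Hypothetical three-item set, for illustration only.
rules = list(candidate_rules({"bread", "milk", "diapers"}))
```

An item set with k items yields 2^k - 2 such rules, so three items give six candidate rules and four items give fourteen.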
The support of an item is its relative frequency of occurrence in the transaction set. The support of a rule likewise measures how often all the items in the rule appear together in the transactions. Support shows which items occur most often, and so uncovers the patterns that are worth investigating. The confidence of a rule measures the likelihood that the consequent item set occurs in transactions that contain the antecedent, and thus indicates how reliable the rule is (Addi et al., 2015).
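As a worked illustration, the measures can be computed directly from transaction counts using their standard definitions: confidence is the support of the whole rule divided by the support of the antecedent, lift is confidence divided by the support of the consequent, and conviction is (1 - support of the consequent) / (1 - confidence). The function names and sample data below are hypothetical, and the sketch assumes that both the antecedent and the consequent occur in at least one transaction.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= frozenset(t) for t in transactions) / len(transactions)


def rule_measures(antecedent, consequent, transactions):
    """Support, confidence, lift and conviction of antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    supp_rule = support(a | c, transactions)
    supp_a = support(a, transactions)  # assumed to be greater than zero
    supp_c = support(c, transactions)  # assumed to be greater than zero
    confidence = supp_rule / supp_a
    lift = confidence / supp_c
    # Conviction is unbounded when the rule never fails (confidence of 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - supp_c) / (1 - confidence)
    return {
        "support": supp_rule,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


# Hypothetical transactions, for illustration only.
data = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]
measures = rule_measures({"bread"}, {"milk"}, data)
```

For the rule bread -> milk in this sample, the rule support is 0.5 and the confidence is roughly 0.67; the lift is below 1, which suggests that bread and milk co-occur slightly less often than independence would predict.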
References
Addi, A. M., Tarik, A., & Fatima, G. (2015, May). Comparative survey of association rule mining algorithms based on multiple-criteria decision analysis approach. In 2015 3rd International Conference on Control, Engineering & Information Technology (CEIT) (pp. 1-6). IEEE.
Yuan, X. (2017, March). An improved Apriori algorithm for mining association rules. In AIP Conference Proceedings (Vol. 1820, No. 1, p. 080005). AIP Publishing LLC.