
For Multiple-level Association Rules problems, rules must be formed between items at the same level!

- closed cares about whether some superset has the same support,
- maximal cares only about whether any frequent superset exists.
Why can we safely prune supersets of an infrequent itemset?
Any subset of a frequent itemset must also be frequent, so if an itemset is infrequent, none of its supersets can be frequent. This downward closure (anti-monotone) property is the reason.
1. What is the Apriori property?
If an itemset is frequent, all of its subsets must also be frequent.
If there is any infrequent itemset, its superset should not be generated/tested.
2. Why is Apriori inefficient?
Because it generates a large number of candidate itemsets and repeatedly scans the database.
Key mechanism - prune any candidate whose subset is infrequent.
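Below is a rough Python sketch of this pruning step in Apriori's candidate generation; the function name and the toy itemsets are mine, not from the lecture.

```python
from itertools import combinations

def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune any
    candidate that has an infrequent k-subset (the Apriori property)."""
    frequent_set = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == k + 1:
                # keep the candidate only if every k-subset is frequent
                if all(frozenset(sub) in frequent_set
                       for sub in combinations(union, k)):
                    candidates.add(union)
    return candidates

# toy example: frequent 2-itemsets
L2 = [frozenset(s) for s in [{'A', 'B'}, {'A', 'C'}, {'B', 'C'}, {'B', 'D'}]]
print(generate_candidates(L2, 2))
# only {A,B,C} survives; e.g. {A,B,D} is pruned because {A,D} is infrequent
```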
DIC (Dynamic Itemset Counting)
DIC reduces the number of scans by starting support counting for larger itemsets as soon as enough evidence becomes available. The slides illustrate that once both A and D are found frequent, counting for AD can begin; similarly, once all size-2 subsets of BCD are frequent, counting for BCD can begin.
Main benefit: fewer database scans.
Partition: every globally frequent pattern must be frequent in at least one partition, so the union of the locally frequent itemsets is a complete candidate set.
Main benefit: only two scans of the DB.
Important caution: local frequent does not necessarily mean globally frequent, so verification is required.
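A small sketch of the two-scan idea, assuming the DB is a list of frozensets; the brute-force local pass and the toy data are only for illustration.

```python
from itertools import combinations

def local_frequent(partition, min_count):
    """Brute-force local pass: all itemsets frequent within one partition."""
    items = set().union(*partition)
    local = set()
    for k in range(1, len(items) + 1):
        for cand in combinations(sorted(items), k):
            cand = frozenset(cand)
            if sum(cand <= t for t in partition) >= min_count:
                local.add(cand)
    return local

def partition_mine(db, n_partitions, min_sup_ratio):
    # Scan 1: mine each partition with a proportionally scaled support count.
    size = len(db) // n_partitions
    candidates = set()
    for i in range(n_partitions):
        part = db[i * size:] if i == n_partitions - 1 else db[i * size:(i + 1) * size]
        candidates |= local_frequent(part, max(1, int(min_sup_ratio * len(part))))
    # Scan 2: verify global support (locally frequent != globally frequent).
    min_count = min_sup_ratio * len(db)
    return {c for c in candidates if sum(c <= t for t in db) >= min_count}

db = [frozenset(t) for t in [{'A', 'B'}, {'A', 'C'}, {'A', 'B', 'C'}, {'B', 'D'}]]
print(partition_mine(db, 2, 0.5))
# globally frequent: {A}, {B}, {C}, {A,B}, {A,C}; locally frequent {B,D} etc. are rejected
```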
Sampling mines patterns from a sample DB using a reduced minimum support.
problems: some patterns frequent in the sample are not truly frequent globally => this is the same issue as with Partition above,
some truly frequent global patterns may be missed if they are absent from the sample => two more scans for verification, checking the set of mined frequent itemsets together with their negative border (NB).
Main benefit: smaller working data.
Main risk: missed patterns or false positives.
DHP (Direct Hashing and Pruning)
DHP hashes (k+1)-itemsets into buckets during the scan that counts Lk. Then, when forming (k+1)-candidates, it checks the hash bucket counts first: if a candidate's bucket count is below min_sup, the candidate cannot be frequent and is pruned.
Main benefit: reduces the number of candidates.
3. How does DHP improve Apriori?
DHP uses hashing to prune candidate itemsets early, reducing the number of candidates.
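A toy sketch of the DHP hash table; the simple deterministic hash function and the toy DB are chosen just for the example.

```python
from itertools import combinations
from collections import Counter

def bucket(pair, n_buckets=7):
    """A simple deterministic hash function for the illustration."""
    return sum(ord(ch) for item in pair for ch in item) % n_buckets

def dhp_first_pass(db, min_sup):
    """While counting 1-itemsets, also hash every 2-itemset of each
    transaction into a bucket counter (the DHP hash table)."""
    item_counts, bucket_counts = Counter(), Counter()
    for t in db:
        item_counts.update(t)
        for pair in combinations(sorted(t), 2):
            bucket_counts[bucket(pair)] += 1
    L1 = {i for i, c in item_counts.items() if c >= min_sup}
    return L1, bucket_counts

def dhp_candidates_2(L1, bucket_counts, min_sup):
    """Generate C2, pruning any pair whose bucket count is below min_sup:
    the bucket count is an upper bound on the pair's true support."""
    return {p for p in combinations(sorted(L1), 2)
            if bucket_counts[bucket(p)] >= min_sup}

db = [{'A', 'B', 'C'}, {'A', 'B'}, {'A', 'C'}, {'B', 'D'}, {'C', 'D'}]
L1, buckets = dhp_first_pass(db, min_sup=2)
print(dhp_candidates_2(L1, buckets, min_sup=2))
# only ('A','B') and ('A','C') remain; the other four pairs of L1 items are pruned
```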
4. Why is FP-Growth more efficient than Apriori?
- No candidate generation
- Compresses database using FP-tree
- Only scans database twice
The main idea is:
- grow long patterns from short ones using local frequent items,
- construct a global FP-tree,
- then use divide-and-conquer to recursively grow patterns through conditional pattern bases and conditional FP-trees.
global FP-tree construction in three steps:
- scan the DB once to find frequent 1-itemsets,
- sort them by descending frequency to get the F-list,
- scan the DB again, reorder each transaction by the F-list, and build the FP-tree.
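A compact sketch of this two-scan construction; the class and variable names are my own, and the toy transaction DB is only for illustration.

```python
from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}

def build_fp_tree(db, min_sup):
    # Scan 1: find frequent 1-itemsets, sort by descending frequency (F-list).
    counts = Counter(item for t in db for item in t)
    f_list = [i for i, c in counts.most_common() if c >= min_sup]
    order = {item: rank for rank, item in enumerate(f_list)}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> nodes holding that item (node links)
    # Scan 2: keep only frequent items, reorder each transaction by the F-list, insert.
    for t in db:
        node = root
        for item in sorted((i for i in t if i in order), key=order.get):
            if item in node.children:
                node.children[item].count += 1   # shared prefix: just increment
            else:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header, f_list

db = [{'f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'},
      {'a', 'b', 'c', 'f', 'l', 'm', 'o'},
      {'b', 'f', 'h', 'j', 'o'},
      {'b', 'c', 'k', 's', 'p'},
      {'a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'}]
root, header, f_list = build_fp_tree(db, min_sup=3)
print(f_list)  # ['f', 'c', 'a', 'b', 'm', 'p'] up to the order of ties
```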
5. What is a conditional pattern base?
A collection of prefix paths leading to a given item in the FP-tree.
6. How does FP-Growth use divide-and-conquer?
It recursively constructs conditional FP-trees for each item, solving smaller subproblems.
- completeness: preserves complete information for frequent pattern mining,
- compactness: removes infrequent items and shares common prefixes.
7. What is a closed pattern?
A frequent itemset that has no proper superset with the same support.
8. What is a maximal pattern?
A frequent itemset that has no frequent superset.
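A tiny sketch that applies both definitions to a dictionary of frequent itemsets and their supports; the helper name and the toy counts are mine.

```python
def closed_and_maximal(freq):
    """freq: dict mapping frozenset -> support, assumed to contain ALL frequent itemsets."""
    closed, maximal = set(), set()
    for itemset, sup in freq.items():
        supersets = [s for s in freq if itemset < s]   # frequent proper supersets
        if not any(freq[s] == sup for s in supersets):
            closed.add(itemset)     # closed: no superset with the same support
        if not supersets:
            maximal.add(itemset)    # maximal: no frequent superset at all
    return closed, maximal

# toy counts consistent with DB = [{a,b}, {a,b,c}, {a}], min_sup = 1
freq = {frozenset('a'): 3, frozenset('b'): 2, frozenset('c'): 1,
        frozenset('ab'): 2, frozenset('ac'): 1, frozenset('bc'): 1,
        frozenset('abc'): 1}
closed, maximal = closed_and_maximal(freq)
print(closed)   # {a}, {a,b}, {a,b,c}
print(maximal)  # {a,b,c} only; every maximal itemset here is also closed
```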
9. What is the main idea of CLOSET?
To mine only closed itemsets by merging itemsets with identical support.
CLOSET mines closed patterns while running FP-Growth. FP-Growth naturally supports closed pattern mining. Instead of outputting every frequent itemset, CLOSET focuses on the closed ones.
When useful: when we want lossless compression of frequent patterns.
10. What is the key idea of MaxMiner?
It finds maximal patterns by testing large itemsets first and pruning their subsets.
MaxMiner is based on Apriori, but it aims to find only max-patterns, not all frequent patterns. It tries to reduce candidates aggressively by searching potential max-patterns and using a set-enumeration tree.
When useful: when we want a very compact summary and do not need all frequent subsets.
CHARM uses a vertical data format: each itemset is represented by the set of transaction IDs containing it. Frequency is then just the length of the TID list, and larger candidates can be obtained by intersection of TID sets.
Why it works well: intersection is simple, and vertical format can be efficient for support counting.
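A minimal sketch of the vertical format and TID-list intersection; the toy DB is made up.

```python
from collections import defaultdict

def to_vertical(db):
    """Convert a horizontal DB into the vertical format: item -> set of TIDs."""
    tids = defaultdict(set)
    for tid, t in enumerate(db):
        for item in t:
            tids[item].add(tid)
    return tids

db = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
v = to_vertical(db)

# support of an itemset = size of the intersection of its TID sets
tid_AC = v['A'] & v['C']
print(len(tid_AC))           # support(A,C) = 2 (transactions 0 and 2)
tid_ACE = tid_AC & v['E']
print(len(tid_ACE))          # support(A,C,E) = 1 (transaction 2)
```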
Lower-level items should often use lower (reduced) support thresholds. This is called a flexible support setting. The goal is to make lower-level and higher-level items similarly likely to qualify as frequent.
Redundancy filtering: some rules are redundant due to ancestor relationships, e.g., a rule on "2% milk" is redundant if its support and confidence are close to what we would expect from the corresponding ancestor rule on "milk".
Sometimes confidence is high simply because the consequent is very common overall. Lift solves this problem by comparing the confidence with the consequent's overall support.
11. Why can confidence be misleading?
Because it does not consider the base probability of the consequent.
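A quick numeric check with made-up counts of how lift corrects a misleading confidence.

```python
# N transactions; rule A => B
N, n_A, n_B, n_AB = 10000, 6000, 7500, 4000

support    = n_AB / N                # P(A and B)       = 0.40
confidence = n_AB / n_A              # P(B | A)         = 0.667
lift       = confidence / (n_B / N)  # P(B | A) / P(B)  = 0.889

# Confidence looks high (66.7%), but B already occurs in 75% of all
# transactions, so A actually makes B slightly *less* likely: lift < 1.
print(support, confidence, lift)
```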
12. What are the main steps in data preparation?
- Data cleaning : to reduce noise and handle missing values
- Feature selection : to remove irrelevant or redundant attributes
- Data transformation including generalization and normalization
Evaluation points include:
- accuracy
- speed (training and testing time)
- robustness
- scalability
- interpretability
Decision Tree
A decision tree is a flowchart-like model of decisions under conditions:
- each internal (branch) node represents a test on an attribute,
- each leaf node represents a class decision.
The induction process is:
- greedy,
- top-down,
- recursive,
- divide-and-conquer.
At each step:
- choose the best feature,
- split the data,
- stop if the node is pure or no attributes remain,
- otherwise recurse.
the stopping conditions:
- all examples at a node belong to the same class,
- or there are no remaining features, so majority voting is used.
13. What is the goal of information gain in ID3?
Information gain measures the reduction in entropy after splitting the data using a feature. The feature with the highest information gain is selected.
14. What does entropy represent?
Entropy measures the impurity or randomness of a dataset. A higher entropy indicates more mixed classes.
the best feature is the one giving the largest reduction in entropy.
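A small sketch of entropy and information gain on a toy table; the attribute names and data are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy H = -sum p_i * log2(p_i) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy when splitting on attribute `attr`."""
    n = len(labels)
    value_counts = Counter(row[attr] for row in rows)
    remainder = 0.0
    for value, count in value_counts.items():
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += (count / n) * entropy(subset)
    return entropy(labels) - remainder

# toy data: does "outlook" separate the classes better than "windy"?
rows = [{'outlook': 'sunny', 'windy': False}, {'outlook': 'sunny', 'windy': True},
        {'outlook': 'rain',  'windy': False}, {'outlook': 'rain',  'windy': True}]
labels = ['no', 'no', 'yes', 'yes']
print(information_gain(rows, labels, 'outlook'))  # 1.0 (perfect split)
print(information_gain(rows, labels, 'windy'))    # 0.0 (no information)
```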
15. When does decision tree splitting stop?
- All instances belong to the same class
- No remaining features → use majority voting
16. Why do we use gain ratio instead of information gain?
Because information gain is biased toward attributes with many values. Gain ratio normalizes this using split information: GainRatio(A) = Gain(A) / SplitInfo(A), where SplitInfo measures the entropy of the split itself.
Overfitting:
- if you keep splitting until the training data is perfectly classified,
- you can reach 100% training accuracy,
- but that may be overfitting. => tree pruning: the point of pruning is not to improve training accuracy; it is to improve generalization by removing overly specific branches.
Random Forest is an ensemble of 500 to 10,000 decision trees. Each tree is trained on a bootstrap sample, and during tree construction a small random subset of candidate features is considered at each split.
17. What are the two sources of randomness in Random Forest?
- randomness in the training data through bootstrap sampling,
- randomness in the feature subset considered at each split.
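A minimal scikit-learn illustration of both sources of randomness; the dataset and hyperparameter values here are chosen just for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,     # number of trees in the ensemble
    bootstrap=True,       # randomness 1: bootstrap sample per tree
    max_features='sqrt',  # randomness 2: random feature subset at each split
    random_state=0,
)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))  # test accuracy
```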
18. Can an itemset be maximal but not closed?
No. A maximal itemset is frequent and has no frequent superset at all. If some superset had the same support, that superset would also be frequent, contradicting maximality. Therefore, every maximal itemset is also closed.
19. In Random Forest, why does feature randomness improve performance?
It reduces correlation between trees, increasing diversity and improving generalization.
20. What happens if entropy is zero?
All instances belong to the same class → no further splitting needed.
21. Why do we discretize quantitative attributes?
Because association rule mining requires categorical data, and discretization groups numeric values into intervals.

Rule-Based Classification => uses IF–THEN rules
if multiple rules fire, conflict resolution is needed.
- size ordering,
- class-based ordering,
- rule-based ordering / decision list.
rules extracted from a decision tree are mutually exclusive and exhaustive
rules obtained from association rule mining are not necessarily mutually exclusive
Associative Classification : It relies on support and confidence thresholds to find strong associations between feature-value conjunctions and the class label.
- benefit: can explore highly confident multi-attribute associations,
- limit: may suffer from low coverage depending on min_sup and min_conf.

KNN is the classic lazy learner:
- find the k nearest neighbors,
- classify by majority vote or weighted voting.
- lazy learning needs less training time, but more prediction time.
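A minimal scikit-learn KNN example showing the lazy-learning behavior and weighted voting; the dataset and k are chosen arbitrarily.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# "Training" only stores the data (lazy learning); the real work happens at prediction time.
knn = KNeighborsClassifier(n_neighbors=5, weights='distance')  # distance-weighted voting
knn.fit(X_tr, y_tr)
print(knn.score(X_te, y_te))
```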

holdout method:
- randomly split data into training and test sets,
- repeat holdout multiple times,
- average the results.
Bagging
- repeatedly sample training sets with replacement,
- train one classifier on each bootstrap sample,
- final prediction is by majority vote.
Boosting
- begin with equal weights on data points,
- after each model, increase attention to previously misclassified points,
- final voting weights depend on classifier accuracy.
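A short scikit-learn sketch contrasting bagging and boosting; the dataset and n_estimators values are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: each tree sees a bootstrap sample; predictions are combined by voting.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: models are built sequentially, reweighting misclassified points;
# each model's vote is weighted by its accuracy.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())
```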