Data Mining and Knowledge Discovery


Textbooks:

TSK = Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley, 2006.

TM = Tom Mitchell, Machine Learning. McGraw Hill, 1997.

WF = Ian Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2000.

PC = R. Duda, P. Hart, D. Stork. Pattern Classification, Wiley, 2001.

HK = J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.

MS = D. Hand, H. Mannila, P. Smyth, Principles of Data Mining, MIT Press, 2001.

HA = S. Haykin, Neural Networks, Prentice Hall, 1999.

PY = D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999.

Schedule:

Date Topics Material Instructor Further reading
16 January 2007 Introduction. Characterization of knowledge discovery as a process. Slides [pdf]

Lecture notes [pdf]

Manco HK, ch. 1; MS, ch. 1
U. Fayyad and others, "From Data Mining to Knowledge Discovery in Databases".
Data Mining applications
S. Chaudhuri, U. Dayal, V. Ganti. "Database technology for Decision Support Systems".
17 January 2007 Data Preprocessing (Data Selection, Information Gathering, Data Visualization).
Slides [pdf]

Locane HK, ch. 2-3; MS, ch. 2-3; PY.
Exploratory Data Analysis at NIST.
Codd, "Providing OLAP to the user analyst: An IT mandate".
S. Chaudhuri, U. Dayal, "An Overview of Data warehouse and OLAP technology".
H. Galhardas and others, "Declarative data cleaning: Language, model, and algorithms".
M. Hernandez, S. Stolfo, "Real-world data is dirty".
H. Lee and others, "Cleansing data for mining and warehousing".
18 January 2007 Introduction to classification. Concept learning

Slides [pdf]

 

Manco

TM, ch. 2.

H. Hirsh. "Polynomial-Time Learning with Version Spaces".

23 January 2007 Data Cleaning, Data Reduction, WEKA.
Slides [pdf]

Locane Weka is available from this site.
24 January 2007 Discretization (ChiMerge), WEKA.

Slides [pdf]

ChiMerge algorithm [java]

Locane

H. Liu and others, "Discretization: an enabling technique".
J. Dougherty, R. Kohavi, M. Sahami, "Supervised and Unsupervised discretization of Continuous features".

25 January 2007 Decision Trees: the ID3 algorithm Slides [pdf]

Lecture notes [pdf]

Manco

TSK, ch. 4.

R. Holte, "Very simple classification rules perform well on most commonly used datasets".

HK, ch. 7; TM, ch. 3.
M. Mehta, R. Agrawal, J. Rissanen, "SLIQ: A scalable Decision-Tree classifier for Data Mining".
Y. Freund, L. Mason, "The alternating decision tree learning algorithm".
L. Breiman, "Random Forests".
J. Gehrke, R. Ramakrishnan, V. Ganti, "RainForest: A Framework for Fast Decision Tree Construction of Large Datasets".
Lim, Loh, Shih, "An Empirical Comparison of Decision Trees and Other Classification Methods".

30 January 2007 Decision Trees: the CHAID and CART algorithms

Slides [pdf]

Lecture notes [pdf]

CHAID algorithm [java]

CART algorithm [java]

Manco  
31 January 2007 Decision trees: extensions, evaluation Slides [pdf] Manco
E. Frank et al. "Using Model Trees for Regression".
A. Moore, M. Lee, "Efficient Algorithms for minimizing Cross-Validation Error". 
1 February 2007 Classification Rules

Slides [pdf]

Manco  
6 February 2007 Probability and statistics Slides [ppt] Locane
7 February 2007 Statistics. Distributions, density functions Slides [ppt] Locane
8 February 2007 Other classification methods. Bayesian classification Slides [pdf] Manco

C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition".

TM, ch. 3; HA, ch. 3-4; PC, ch. 5.1-5.5, 6.1-6.8.
J. Elder, J. Pregibon, "A statistical Perspective on Knowledge Discovery in Databases".
G. John, P. Langley, "Estimating Continuous Distributions in Bayesian Classifiers".
A. McCallum, K. Nigam, "A Comparison of Event Models for Naive Bayes Text Classification".

13 February 2007 Bayesian classification Slides [pdf] Manco P. Langley et al. "An Analysis of Bayesian Classifiers".
Webb, Boughton, Wang, "Not so Naive Bayes".
J. Provost, "Naive Bayes vs Rule Learning for E-mail classification".
R. Kohavi, "Scaling up the accuracy of naive-Bayes classifiers: a decision tree hybrid".
14 February 2007 Model evaluation Slides [pdf] Locane T. Fawcett, "ROC Graphs: Notes and practical Considerations for data mining researchers".
C. Ferri, P. Flach, J-H. Orallo, "Learning Decision Trees using the area under the ROC Curve".
14 February 2007 Exercise session on classification Exercises [zip] Locane
21 February 2007 Introduction to Clustering Slides [pdf] Manco

HK, ch. 9.

TSK, ch. 8.


A. K. Jain and R. C. Dubes. "Data Clustering: A review". 
L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.
R. Ng and J. Han. Efficient and effective clustering method for spatial data mining.
U. Fayyad, C. Reina, P. S. Bradley. "Initialization of Iterative Refinement Clustering Algorithms".
A. Strehl, J. Ghosh, R. Mooney, "Impact of Similarity Measures on Web Document Clustering".

22 February 2007 Exercise session Exercises [zip] Locane
27 February 2007 Clustering. Density-Based Clustering Slides [pdf] Manco HK, ch. 9.
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases.
M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. "OPTICS: Ordering points to identify the clustering structure".
D. Fisher. "Knowledge acquisition via incremental conceptual clustering".


28 February 2007 Hierarchical clustering. Exercise session on clustering

Slides [pdf]

Exercises [zip]

Locane

S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases.

S. Guha, R. Rastogi, and K. Shim. "ROCK: A robust clustering algorithm for categorical Data".

T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases.

1 March 2007 The Weka system Slides [pdf] Scordio Weka wiki.
6 March 2007 Association Rules. The Apriori algorithm Slides [pdf] Manco

HK, ch. 6; SL, ch. 14

TSK, ch. 6.

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases.
H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules.
A. Savasere, E. Omiecinski, S. B. Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases.
J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules.
H. Toivonen. Sampling large databases for association rules. (citeseer)
R. Srikant and R. Agrawal. Mining generalized association rules.
R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables.
S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations.
D. Tsur and others. Query flocks: A generalization of association-rule mining. (citeseer)
Y. Aumann and Y. Lindell. A Statistical Theory for Quantitative Association Rules.
A. Savasere, E. Omiecinski, S. B. Navathe. Mining for Strong Negative Associations in a Large Database of Customer Transactions.
E. Omiecinski. Alternative Interest Measures for Mining Associations.
Y. Xu, J. X. Yu, G. Liu, H. Lu. From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns.
G. Liu, H. Lu, W. Lou, J. X. Yu. On Computing, Storing and Querying Frequent Patterns.
B. Goethals, M. Zaki: FIMI: Workshop on Frequent Itemset Mining Implementations (An Introduction).  

7 March 2007 Exercise session on association rules and clustering

Exercises [zip]

Locane  
13 March 2007 Association Rules: extensions, the FP-Growth algorithm Slides [pdf] Manco

J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation.

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. (citeseer)

J. Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining Top-K Frequent Closed Patterns without Minimum Support”.

R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained association rules.

R. J. Bayardo. Efficiently mining long patterns from databases. (citeseer)

M. Zaki and C.-J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining.

14 March 2007 Exercise session on association rules

Exercises [zip]

Worked solutions [zip]

Locane