Reading List

Choose one of the papers listed below and email me your selection.

Association Rule and Frequent Pattern Mining

Beyond Market Baskets: Generalizing Association Rules to Correlations, Craig Silverstein, Sergey Brin, Rajeev Motwani, Data Mining and Knowledge Discovery, 2, 1998, pp. 39-68
H-Mine: Hyper-structure Mining of Frequent Patterns in Large Databases, J. Pei, J. Han, H. Lu, S. Tang, and D. Yang, Proc. of the 2001 IEEE International Conference on Data Mining (ICDM'01), San Jose, California, November 29-December 2, 2001.
J. Pei and J. Han, Constrained Frequent Pattern Mining: A Pattern-Growth View, ACM SIGKDD Explorations (Special Issue on Constraints in Data Mining), June 2002.
Scalable Techniques for Mining Causal Structures, Craig Silverstein, Rajeev Motwani, Sergey Brin, and Jeff D. Ullman, Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), 1998
H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hätönen, H. Mannila, Pruning and Grouping Discovered Association Rules, Proceedings of the First International Conference on Knowledge Discovery in Databases (KDD'95), Montreal, Canada, 1995.
Brian Lent, Arun Swami and Jennifer Widom, Clustering Association Rules, Proceedings of ICDE'97, Birmingham, England, 1997.
Qian Wan and Aijun An, An Efficient Approach to Mining Indirect Associations, Journal of Intelligent Information Systems (JIIS), Kluwer Academic Publishers, Vol.27, No.2, 2006.
Xindong Wu, Chengqi Zhang and Shichao Zhang, Efficient Mining of Both Positive and Negative Association Rules. ACM Transactions on Information Systems, 22(2004), 3: 381-405. (SCI).
Guozhu Dong and Jinyan Li, Efficient Mining of Emerging Patterns: Discovering Trends and Differences, KDD 1999: 43-52.

Spatial Association Rule Mining

Koperski, K., and Han, J., Discovery of Spatial Association Rules in Geographic Information Databases, Proc. 4th Int. Symp. Advances in Spatial Databases, 1995.
Shekhar, S. and Huang, Y., Discovering Spatial Co-location Patterns: A Summary of Results, 2001.
Malerba, D., Esposito, F. and Lisi, F., Mining Spatial Association Rules in Census Data, 2002.

Data Stream Mining

Frequent Sequence Mining

Approximate Frequency Counts over Data Streams, by Gurmeet Singh Manku, Rajeev Motwani, in the International Conference on Very Large Data Bases (VLDB) 2002.
M.J. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences, Machine Learning, Vol.42, No.1/2, 2001.
Jiong Yang, Wei Wang, Philip S. Yu: Infominer: mining surprising periodic patterns. KDD 2001: 395-400
R. C. Agarwal, C. C. Aggarwal, and V. Prasad. Depth first generation of long patterns. In SIGKDD, 2000.
J. Pei, J. Han, and W. Wang. Mining Sequential Patterns with Constraints in Large Databases, Proc. of the 11th International Conference on Information and Knowledge Management (CIKM'02), McLean, VA, November 4-9, 2002.
CloSpan: Mining Closed Sequential Patterns in Large Databases, Xifeng Yan, Jiawei Han, Ramin Afshar, Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA, May, 2003.
Finding Recent Frequent Itemsets Adaptively over Online Data Streams, by Joong Hyuk Chang, Won Suk Lee, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2003.

Classification

On Demand Classification of Data Streams, Aggarwal, Han, Wang, and Yu, KDD'04.
Mining Time-Changing Data Streams, by Geoff Hulten, Laurie Spencer, Pedro Domingos, in the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) 2001.

Clustering

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu, A Framework for Clustering Evolving Data Streams, Proceedings of the International Conference on Very Large Data Bases (VLDB) 2003.
Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta, Distance Measures for Effective Clustering of ARIMA Time Series, ICDM'01.

Graph Mining

Efficiently mining frequent trees in a forest, Mohammed J. Zaki, KDD 2002.
Frequent Subgraph Discovery, Michihiro Kuramochi and George Karypis, ICDM, 2001.
Substructure Similarity Search in Graph Databases. Xifeng Yan, Philip Yu, Jiawei Han, SIGMOD'05.
Frequent Subtree Mining - An Overview, Yun Chi, Siegfried Nijssen, Richard Muntz, Joost Kok, Fundamenta Informaticae Special Issue on Graph and Tree Mining, 2005.

Decision Tree Learning

RainForest: A framework for fast decision tree construction of large datasets, J. Gehrke, R. Ramakrishnan, and V. Ganti, In VLDB'98, pp. 416-427, New York, NY, 1998.
Bagging, Boosting, and C4.5, J. R. Quinlan, AAAI'96, pp. 725-730.
Learning Trees and Rules with Set-valued Features, William W. Cohen, Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 1996.
Cesar Ferri, Peter Flach and Jose Hernandez-Orallo, Learning Decision Trees Using the Area Under the ROC Curve, Proceedings of the 19th International Conference on Machine Learning, Morgan Kaufmann, July 2002, pp.139-146.

Decision Rule Learning

Linyan Wang and Aijun An, Fast counting with AV-Space for Efficient Rule Induction, Proceedings of the SIAM International Conference on Data Mining (SDM'07), Minneapolis, Minnesota, April 26-28, 2007.

Learning from Imbalanced Datasets

PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection), Ramesh Agarwal and Mahesh V. Joshi, 2001.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., SMOTE: Synthetic Minority Over-sampling TEchnique, Journal of Artificial Intelligence Research, 16, 2002, 321-357.

Clustering

CACTUS-Clustering Categorical Data Using Summaries, Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan, Proc. 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), 1999 Aug, pp. 73-83.
Clustering Large Datasets in Arbitrary Metric Spaces, Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French, Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 502-511.
ROCK: A Robust Clustering Algorithm for Categorical Attributes, Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, Proceedings of the 15th International Conference on Data Engineering, 23-26 March 1999, Sydney, Australia, IEEE CS Press, 1999, pp. 512-521.
BIRCH: an efficient data clustering method for very large databases, Tian Zhang, Raghu Ramakrishnan, Miron Livny, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, 1996, pp. 103-114.
CURE: An Efficient Clustering Algorithm for Large Databases, Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, Proceedings of the ACM SIGMOD Conference, 1998.
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, M. Ester, H.-P. Kriegel, J. Sander, X. Xu, Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, 1996, pp. 226-231.

Mining XML Documents

J. W. W. Wan and G. Dobbie, Mining association rules from XML documents using XQuery, In Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation.
Winkler and Spiliopoulou, Extraction of Semantic XML DTDs from Texts Using Data Mining Techniques, K-CAP 2001 Workshop on Knowledge Markup and semantic annotation, Victoria, B.C., Canada, 2001.
Braga, Campi, Ceri, Klemettinen, and Lanzi, A Tool for Extracting XML Association Rules from XML Documents, ICTAI'02.

Web Mining

Larry Page, Sergey Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical Report, Computer Science Department, Stanford University, 1998.
J. Kleinberg, Authoritative sources in a hyperlinked environment, In Proc. Ninth Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668-677, ACM Press, New York, 1998.
Data mining of user navigation patterns, J. Borges and M. Levene, In Web Usage Analysis and User Profiling, pp. 92-111. Published by Springer-Verlag as Lecture Notes in Computer Science, Vol. 1836, 2000.
A Framework for Collaborative, Content-Based and Demographic Filtering, Michael J. Pazzani, Artificial Intelligence Review.
Learning to Extract Symbolic Knowledge from the World Wide Web, M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery, Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), pp. 509-516, Madison, WI. AAAI Press.

Privacy Preserving Data Mining

Privacy Preserving Mining of Association Rules, by A. Evfimievski, R. Srikant, R. Agrawal and J. Gehrke, KDD 2002.
Using Randomized Response Techniques for Privacy-Preserving Data Mining, by Wenliang Du and Zhijun Zhan, SIGKDD 2003.
Privacy-Preserving K-Means Clustering over Vertically Partitioned Data, by Jaideep Vaidya and Chris Clifton, SIGKDD 2003.
Collaborative Filtering with Privacy, by John Canny, IEEE S&P 2002.