Research interest

My research interest is the empirical study of machine learning, especially the problem of supervised learning aided by unlabeled data. I am currently working on co-training, active learning, and their applications to text categorization. Discovering knowledge from unlabeled data raises many interesting research problems, which have recently attracted considerable interest in natural language processing, image recognition and classification, and hypertext categorization.

Related Work:

  1. Conferences and Workshops

    NIPS*99 Workshop on Using unlabeled data for supervised learning
    NIPS 2000 Workshop: Unlabeled Data Supervised Learning Competition
    Unlabeled Data 2001: It's time to put-up or shut-up!
    COLT - Annual Conference on Computational Learning Theory
    ICML - International Conference on Machine Learning
    NIPS - Annual Conference on Neural Information Processing Systems
    EUROCOLT - European Conference on Computational Learning Theory
    ALT - Annual Conference on Algorithmic Learning Theory
    AAAI - The National Conference on Artificial Intelligence
    UAI - Uncertainty in Artificial Intelligence
    KDD - Knowledge Discovery in Data and Data Mining

  2. Related Publications

    Semi-Supervised Learning

    Matthias Seeger. Learning with Labeled and Unlabeled Data. Technical Report, Edinburgh University.

    A. Blum and S. Chawla. Learning from Labeled and Unlabeled Data using Graph Mincuts. ICML, 2001.

    A. Blum and T. Mitchell. Combining Labeled and Unlabeled Data with Co-Training. COLT, pages 92--100, 1998.

    Kamal Nigam and Rayid Ghani. Analyzing the Effectiveness and Applicability of Co-training. The Ninth International Conference on Information and Knowledge Management (CIKM-2000). 2000.

    Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, 39(2/3). pp. 103-134. 2000.

    Kamal Nigam and Rayid Ghani. Understanding the Behavior of Co-training. In KDD-2000 Workshop on Text Mining. 2000.

    Yuri Ivanov, Bruce Blumberg, Alex Pentland. Expectation Maximization for Weakly Labeled Data. ICML, 2001.

    S. Baluja. Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data. Neural Information Processing Systems (NIPS '98).

    T. Mitchell. The Role of Unlabeled Data in Supervised Learning. Proceedings of the Sixth International Colloquium on Cognitive Science, San Sebastian, Spain, 1999 (invited paper).

    Kristin P. Bennett, A. Demiriz. 1999. Semi-Supervised Support Vector Machines. Advances in Neural Information Processing Systems, 12.

    G. Fung and O. L. Mangasarian. Semi-Supervised Support Vector Machines for Unlabeled Data Classification. Data Mining Institute Technical Report 99-05, October 1999.

    Ratsaby, J., & Venkatesh, S. S. Learning from a mixture of labeled and unlabeled examples with parametric side information. In Proceedings of the Eighth Annual Conference on Computational Learning Theory, pp. 412-417.

    D. Miller and Uyar. A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data. Advances in Neural Information Processing Systems (NIPS-9), pp. 571-577.

    T. Zhang and F. Oles. A Probability Analysis on the Value of Unlabeled Data for Classification Problems. International Conference on Machine Learning; June, 2000.



    Active Learning

    D. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4, 129-145, 1996.

    David D. Lewis, William A. Gale. A Sequential Algorithm for Training Text Classifiers. Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval (pp. 3-12).
    David D. Lewis, Jason Catlett. Heterogeneous Uncertainty Sampling for Supervised Learning. Proc. 11th International Conference on Machine Learning.
    H. S. Seung, M. Opper, H. Sompolinsky. Query by Committee. Proc. 5th Annu. Workshop on Comput. Learning Theory (pp. 287-294).
    Yoav Freund, H. Sebastian Seung, Eli Shamir, Naftali Tishby. Selective Sampling Using the Query by Committee Algorithm. Machine Learning, 28, 133-168.
    Nick Roy and Andrew McCallum. Toward Optimal Active Learning through Sampling Estimation of Error Reduction. ICML-2001 (pp. 441-448).
    Greg Schohn, David Cohn. Less is More: Active Learning with Support Vector Machines. Proc. 17th International Conf. on Machine Learning, 2000.
    Simon Tong, Daphne Koller. Support Vector Machine Active Learning with Applications to Text Classification. To appear in Machine Learning Journal, 2001.





    Transductive Learning

    Thorsten Joachims. Transductive Inference for Text Classification using Support Vector Machines. International Conference on Machine Learning (ICML), 1999.





    Text Categorization

    Andrew McCallum, Kamal Nigam. Employing EM and Pool-Based Active Learning for Text Classification. Proc. 15th International Conf. on Machine Learning, 1998.

    Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom Mitchell. Learning to Classify Text from Labeled and Unlabeled Documents. Proceedings of AAAI-98, 15th Conference of the American Association for Artificial Intelligence.

    Andrew McCallum, Kamal Nigam. 1999. Text Classification by Bootstrapping with Keywords, EM and Shrinkage.

    Rosie Jones, Andrew McCallum, Kamal Nigam, Ellen Riloff. Bootstrapping for Text Learning Tasks. IJCAI-99 Workshop on Text Mining: Foundations, Techniques and Applications, 1999.

    Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom Mitchell. 1998. Using EM to Classify Text from Labeled and Unlabeled Documents.

    Kamal Nigam, John Lafferty, Andrew McCallum. 1999. Using Maximum Entropy for Text Classification.



Machine Learning Resources

KDD Research.org
MLC++ - Machine Learning Library in C++
UCI Machine Learning Repository
Online Machine Learning Resources
Machine Learning Resources (David Aha)
Machine Learning and Case-Based Reasoning (people in ML and CBR)
Machine Learning Index
Machine Learning Slides
Machine Learning in Practice
Java Programs for Machine Learning
Data Engineering for Inductive Learning
Tom Mitchell's ML Lecture
Machine Learning at Purdue
Machine Learning at MIT
Papers and Talks by Wray Buntine

Machine Learning Tools

DIAMOND and Ice: Visual Exploratory Data Analysis Tools

Artificial Intelligence

AI Software for Linux
JPL Robotics
Robotics Resources
AI Subjects
Informative AI
Journal of AI Research

Networking Tools

Network Simulator
Tutorial for the Network Simulator "ns"
NS Documentation
NS Source Files

Recent Publications

Master's Thesis

Neural Network Learning from Incomplete Data