CS 529A: Data Mining (and Applications to Computational Biology) - Spring 2004

Instructor:  Weixiong Zhang
Prereq:  CS241 and SSM 326A (or Math 320), or their equivalent, or permission of the instructor
Textbook:

Reference books:

Location: Whitaker Hall 216
Time: Monday and Wednesday, 1:00 pm - 2:30 pm
Office hours: Monday and Wednesday, 2:30 - 3:30 pm, Jolley Hall 506, or by appointment.


Brief description:  Many scientific computing problems are, by nature, statistical.  Such problems appear in many domains, such as text analysis, data mining on the web, computational biology, and various medical applications.  Another source of their statistical nature is the lack of sufficient information about the problem domain and about the specific problem at hand.  What is typically available for an application is a set of data from observations or experiments.  The main objective of this course is to gain experience with statistical data analysis by studying statistical methods that can be used to make sense of data, by reading and reviewing the literature, and by working on a specific statistical problem in a selected application domain.


Syllabus
Reading materials on biology
Policies on homework, project, and grading
Collaboration Policy
Homework assignments


Syllabus  (Note: This syllabus will be adjusted as needed to meet the needs of the course.)

******** Introduction and background ********

******** Main topics: Probability/statistical background, statistical and Bayesian learning ********

******** Main topics: Optimization and combinatorial search ******** 

******** Main topic: Classification ********

3/8 spring break - no class
3/10 spring break - no class

******** Main topic: Regression ******** 

******** Main topic: Clustering ******** 

******** Main topics:  Structure-based methods ********

******** Course projects ********

Supplemental reading materials on biology

If you need basic knowledge of cell biology, microarray technology, and microarray data analysis, the following links will be useful.

Specific reading materials can be found in the reading lists in the Syllabus.

Policies on homework, project and grading

In addition to the presentation slides, every student must submit a final report on his/her project.  The report must cover the following items in detail: problem description, data and methods used, algorithm details, algorithm analysis (if the second option is chosen), result analysis, and future work.
The following scale will be applied to compute the final grade from your total points earned:

Policy on collaboration

When solving your homework problems and working on your project, you may discuss HIGH-LEVEL approaches to the homework problems with your classmates. HOWEVER, you are to work out all details of any solutions discussed and write up the solution completely on your own. In particular, when working with another student on an assigned homework problem, you should do so verbally; nothing should be written. Remember to keep your discussion at a high level so that everyone can work out the details on their own. You must also clearly acknowledge anyone (except the instructor) with whom you discussed any problem and say briefly what you discussed.

Please keep any discussions you have with other students to a small group of no more than 3 students, and be sure that each of you is equally involved. If you just listen in and are then able to understand and write up the solution, you have missed at least half of the benefit of the homework. It is really important to work through the process of recognizing when you are heading the wrong way and learning how to work through the problem-solving process.

Violations of any of the above rules will be dealt with harshly! The homework problems and projects are designed to help you learn the material being taught. Being told the solution and understanding it is VERY different from working through the process of actually finding a solution. If you do not take an active role in solving the homework problems and project, you won't get much out of them, and hence you won't learn the material.

Homework assignments

Created by Weixiong Zhang, Jan. 2004.  Last modified by Weixiong Zhang, Feb. 2004.