CSE 515A - Fall 2005 - Intelligent Data Analysis
Instructor: Weixiong Zhang
Location: Crow 205
Time: Monday and Wednesday, 1:00 - 2:30 pm
Prereq: CS241 (or CS514A or CS501 and CS502) and SSM 326A (or Math 320),
or their equivalent, or permission of the instructor
Textbook: R.E. Neapolitan, Learning Bayesian Networks, Prentice Hall,
2003, ISBN: 0130125342
Reference book: C. Borgelt and R. Kruse, Graphical Models: Methods for
Data Analysis and Mining, John Wiley & Sons, 2002, ISBN: 0470843373
Newsgroup: tba (course-related announcements and discussions will be
posted there)
Instructor office hours: Friday 3-4pm, Jolley
Hall 506, or by appointment
Because we will cover many topics that are in neither the textbook nor
the reference book, it is critical to attend every lecture.
Brief description (in course listings)
Lecture and topic schedule
Additional information (project and final grade)
Collaboration policy
Brief description:
We often cry out for knowledge while immersed in huge amounts of data.
Finding the models intrinsic to the production of the data we collect,
and the patterns characteristic of the observations we make, is of
fundamental and practical importance. In this course, we study
various advanced techniques (e.g., graphical models and spectral graph
theory) from computer science, artificial intelligence and statistics
for analyzing large quantities of data. We consider applications in
selected domains, such as computational biology and text mining on the
web.
Note: 1) This semester, we will consider problems in genomics and
biology. 2) This course can be considered a continuation and extension
of CSE 514A Datamining. It will be very helpful if you have taken that
course first, although it is not a prerequisite.
Lecture and topic schedule (Note: this is just an outline; details
to be added later.)
******** Uncertainty and independency ********
- Basic concepts: uncertainty, independency, causal relationships;
probability distributions, Bayes' method; some basic biology concepts
- Number of lectures: ~4
******** Graphical models of uncertainty and independency ********
- Basic concepts: Markov networks and Bayesian networks
- Number of lectures: ~5
******** Learning graphical models ********
- Basic concepts: Learning Markov and Bayesian networks
- Applications and Projects: Learning and building network
representations from biological data
- Number of lectures: ~10
******** Feature extraction, feature selection and model building ********
- Basic concepts: curse of dimensionality; dimension reduction using
principal component analysis, singular value decomposition and other
methods; other feature extraction and selection methods; building
discriminative models
- Applications: microarray
feature selection; finding regulatory motif modules
- Number of lectures: ~5
******** Time series analysis ********
- Basic concepts: time
course data; basic methods
- Applications: time series
microarray data analysis and finding cell cycle related genes
- Number of lectures: ~4
Information on project and grading
- Homework Assignments: There will be small sets of homework
problems, designed to help you understand the basic concepts and
methods. The problems will not be graded rigorously and will carry
only 15% of the total grade. Assignments are due on the given due
date, either in the instructor's office in Jolley 506 before class or
in class at the beginning of the class. Any homework submitted in class
after 1:30pm will be given a 15 point late penalty. If you arrive at
class after 1:30, wait until the end of class to hand in your homework.
It is VERY disruptive to have students walking in late and coming to
the front of the class to submit their homework. No assignments will
be accepted after the instructor leaves the classroom.
- Project: There will be no exam for the course. Instead, we
will have a course project for everyone. A project will have two parts:
- Reading some papers on a particular datamining topic that we do not
have time to cover, or do not cover in detail, in class, and then
giving an in-depth presentation that describes and discusses the
problem, objectives, data used, data analysis procedure and
techniques adopted, your critique of the existing work, and your
thoughts and suggestions on possible future research.
- Designing an algorithm/method for a particular problem, most
likely one from the paper(s) you read.
In addition to the presentation slides, every student must submit a
final report on his/her project. The following items must be covered
in detail: problem description, data and methods used, details of
existing algorithms, design and implementation of your own algorithm,
algorithm analysis and comparison, result analysis, and future work.
- Computation of the Final Grade: The following elements and
scores will go into the final grade:
- Homework: 15 points total.
- Project: 85 points total. (details to come later.)
The following scale will be applied to
compute the final grade from your total points earned:
- A: 85-100
- B: 70-84
- C: 60-69
- D: 50-59
- F: < 50
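As a worked illustration of the scale above, here is a minimal Python
sketch that adds the two components and maps the total to a letter.
The function name letter_grade is hypothetical, and two points are
assumptions rather than course policy: fractional totals fall into the
band whose lower bound they reach, and the late penalty is not modeled.

```python
def letter_grade(homework: float, project: float) -> str:
    """Map homework (0-15) and project (0-85) points to a letter grade."""
    if not (0 <= homework <= 15 and 0 <= project <= 85):
        raise ValueError("homework must be in [0, 15] and project in [0, 85]")
    total = homework + project
    # Lower bound of each band, taken from the scale above.
    # Assumption: a fractional total such as 84.5 stays in the lower band.
    for lower_bound, letter in ((85, "A"), (70, "B"), (60, "C"), (50, "D")):
        if total >= lower_bound:
            return letter
    return "F"


# Example: 12 homework points plus 75 project points give a total of 87.
print(letter_grade(12, 75))
```

Checking bands from the highest lower bound downward keeps the mapping
in one place, so a change to the scale only touches the tuple of bounds.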
Policy on collaboration
When solving your homework problems and working on your project, you
may discuss HIGH-LEVEL approaches to the homework problems with
your classmates. HOWEVER, you are to work out all details of any
solutions discussed and write up the solution completely on your own.
In particular, when working with another student on an assigned
homework problem you should do so verbally; nothing should be written.
Remember to keep your discussion at a high level so that everyone can
work out the details on their own. You must also clearly
acknowledge anyone (except the instructor) with whom you discussed any
problem and say briefly what you discussed.
Please keep any discussions you have with other students to a small
group of no more than 3 students and be sure that each of you is
equally involved. If you just listen in and are then able to understand
and write up the solution, you have missed at least half of the benefit
of the homework. It is really important to work through the process of
recognizing when you are heading the wrong way and learning how to work
through the problem-solving process.
Violations of any of the above rules will be dealt with harshly!
The homework problems and projects are designed to help you learn the
material being taught. Being told the solution and understanding it is
VERY different from working through the process of actually finding a
solution. If you do not take an active role in solving the homework
problems and project, you won't get much out of them, and hence you
won't learn the material.
Created by Weixiong Zhang, August 2005.