HTML document prepared by Sean Waters.

What is Machine Learning? Here is the definition given by Tom Mitchell.

Machine Learning: Any computer program that improves its performance P at some task T through experience E.

 

Example:  Learn to play checkers

            T: play checkers & win

            P: % games won in tournament

            E: play against self

 

Example: Character recognition

            T: recognize & classify hand-written characters

            P: % of characters correctly classified

            E: a database of hand-written characters with given classifications
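
The performance measure P in the character recognition example can be sketched in a few lines. This is a minimal illustration only: the function name percent_correct and the tiny label lists are made up, not taken from any real database.

```python
def percent_correct(predicted, actual):
    """P: percentage of characters classified correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

# Hypothetical classifications for four hand-written characters.
actual = ["a", "b", "c", "d"]
predicted = ["a", "b", "x", "d"]
score = percent_correct(predicted, actual)  # three of four correct
```

A learner "improves with experience E" if this score tends to rise as it is given more labelled characters.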

 

Three Scenarios for Machine Learning

 

More on Supervised Learning

 

Given: Training examples <x,f(x)> for some unknown function f

Find: a good approximation to f
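
As a rough sketch of "given examples, find an approximation", the code below fits a straight line to a handful of <x,f(x)> pairs by least squares. The data points and the names fit_line and h are illustrative assumptions; a line is only one possible form for the approximation.

```python
# Hypothetical training examples <x, f(x)>; the learner never sees f itself.
training = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def fit_line(examples):
    """Return (slope, intercept) of the least-squares line through the examples."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(training)

def h(x):
    """The learned approximation to the unknown f."""
    return slope * x + intercept
```

Here h can be evaluated at an x that was not in the training set, e.g. h(4.0), which is the point of learning an approximation rather than memorizing the examples.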

 

Example Applications

        x: properties of customer and proposed purchase

        f(x): approve purchase or not

        x: bitmap of hair image

        f(x): match or not

        x: economic indicators (S&P, Dow Jones, interest rates, …)

        f(x): market will go up or down

        x: bitmap of road surface in front of car

        f(x): degrees to turn steering wheel

 

What types of data are there?

    Feature Types:

        discrete: e.g. counties in St. Louis

        continuous: e.g. automobile speed

        ordinal: e.g. age

        relational: e.g. brother-of

 

    How are data points x1,x2,… related?

        independent: e.g. each is a different student

        time series: e.g. xi represents the financial indicators on the ith day of the year

 

        We also need to think about independence of the features (the components of x,
        such as the individual financial indicators).  For example, financial indicators
        are not independent of one another.  Neither are a person’s height and weight.
        However, a person’s shoe size and IQ are independent.
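
One quick, partial check for dependence between two features is the Pearson correlation coefficient. The sketch below uses made-up height and weight values purely for illustration. Note that correlation only detects linear dependence: a value near zero does not prove that two features are independent.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither feature is constant, so the denominator is non-zero.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for five people (cm and kg).
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 68, 77, 90]
r = pearson(heights, weights)  # close to +1: strongly dependent features
```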

 

    How does data get selected?

 

Applications for Supervised Learning

 

Terminology

    Training example (instance):  <x,f(x)>

    Target function: true function f

    Hypothesis: proposed function h believed to approximate f

    Hypothesis space: the set of all hypotheses that can, in principle, be output by the
                                  learning algorithm.
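
To make the terminology concrete, here is a small sketch in which the hypothesis space is a set of five threshold rules h(x) = (x >= t) and the learner returns the one with the fewest errors on the training examples. The examples, the candidate thresholds, and the function names are all assumptions chosen for illustration.

```python
# Training examples <x, f(x)>; the labels are made up.
examples = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]

def make_hypothesis(threshold):
    """Return the hypothesis h(x) = (x >= threshold)."""
    return lambda x: x >= threshold

# Hypothesis space: one threshold rule per candidate value.
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5]

def training_errors(threshold):
    """Number of training examples the corresponding hypothesis gets wrong."""
    h = make_hypothesis(threshold)
    return sum(h(x) != label for x, label in examples)

# The learner outputs the hypothesis that best agrees with the examples.
best_threshold = min(thresholds, key=training_errors)
h = make_hypothesis(best_threshold)
```

The target function f is never available directly; the learner can only compare hypotheses against the training examples.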

Characterizing the Learning Process

    Search Procedure (search for target in hypothesis space)

        Direct Computation: solve for hypothesis directly

        Local Search: start with an initial hypothesis and make small improvements until a
                               local optimum is reached

        Constructive Search: start with an empty hypothesis and gradually add structure
                                           until a local optimum is reached.
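
The difference between direct computation and local search can be sketched on a one-parameter problem: choose w so that h(x) = w*x fits some points with least squared error. The data, the step size, and the function names are illustrative assumptions, and constructive search is not shown.

```python
# Hypothetical training data, roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

def loss(w):
    """Sum of squared errors of the hypothesis h(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in data)

def direct():
    """Direct computation: closed-form least-squares solution for w."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def local_search(w=0.0, step=0.01):
    """Local search: try small moves, stop when neither neighbour improves."""
    while True:
        best = min([w - step, w + step], key=loss)
        if loss(best) >= loss(w):
            return w  # local optimum at this step size
        w = best

w_direct = direct()
w_local = local_search()
```

Because this loss has a single minimum, the local search ends within one step of the directly computed answer. With a loss that has several local optima, it could stop at a worse one.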

 

    On-line vs. Batch

        on-line: analyze each example as it is presented and make a prediction right away

        batch: collect all examples, analyze them, and output a hypothesis
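
A minimal sketch of the two modes, assuming the "hypothesis" is simply the mean of the values seen so far. The class name OnlineMean and the stream of numbers are made up for illustration.

```python
def batch_mean(values):
    """Batch: look at all examples at once, then output one hypothesis."""
    return sum(values) / len(values)

class OnlineMean:
    """On-line: predict before each example arrives, then update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def predict(self):
        return self.mean

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count  # incremental mean

stream = [2.0, 4.0, 6.0, 8.0]
learner = OnlineMean()
predictions = []
for value in stream:
    predictions.append(learner.predict())  # prediction made as presented
    learner.update(value)                  # then learn from the example
```

After the whole stream has been seen, the on-line learner's mean agrees with batch_mean(stream), but it also had to commit to a prediction at every step along the way.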

 

    Eager vs. Lazy

        eager: analyze training data and construct an explicit hypothesis

        lazy: store training data, wait until a test point is presented, and then construct
                 an ad hoc hypothesis to classify that one data point

    Active vs. Passive: Can the learner control the environment or run experiments, or does it just passively observe?

    Training Data vs. Test Data: Are these drawn from the same distribution?
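
The eager vs. lazy distinction above can be sketched with two simple classifiers on one-dimensional data. The eager learner here is a nearest-class-mean rule and the lazy one is 1-nearest-neighbour; both choices, and the data, are illustrative assumptions rather than the only possibilities.

```python
# Hypothetical labelled training data.
train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

def eager_fit(examples):
    """Eager: analyze the data once and return an explicit hypothesis."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    # The hypothesis only needs the class means, not the training data.
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def lazy_classify(examples, x):
    """Lazy: keep the data and do all the work when a query arrives."""
    nearest = min(examples, key=lambda example: abs(example[0] - x))
    return nearest[1]

h = eager_fit(train)                     # work done before any query
eager_answer = h(3.0)
lazy_answer = lazy_classify(train, 7.0)  # work done per query
```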

 

Characterizing Hypothesis Spaces

    Size: Is the size fixed or variable?  That is, are all hypotheses the same size?  Fixed
             sizes are easier to understand.  Variable sizes allow more flexibility but
             introduce the problem of overfitting, e.g. when fitting an nth-degree polynomial
             to a set of points.
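
A small sketch of the polynomial example: four noisy points that roughly follow f(x) = x are fitted with a straight line and with the cubic that passes through every point (built by Lagrange interpolation). The data and the held-out point are made up for illustration.

```python
# Hypothetical noisy training points, roughly following f(x) = x.
train = [(0.0, 0.0), (1.0, 1.5), (2.0, 1.5), (3.0, 3.5)]
held_out = (4.0, 4.0)  # an unseen point, assumed to follow f(x) = x

def fit_line(points):
    """Least-squares line: a small, fixed-size hypothesis."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return lambda x: slope * x + (mean_y - slope * mean_x)

def interpolate(points):
    """Lagrange polynomial of degree len(points) - 1 through every point."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

line, cubic = fit_line(train), interpolate(train)
line_error = abs(line(held_out[0]) - held_out[1])
cubic_error = abs(cubic(held_out[0]) - held_out[1])
```

The cubic has zero error on the training points but a much larger error than the line on the held-out point, which is the overfitting problem in miniature.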

 

    Is the hypothesis deterministic or stochastic (involving randomness)?

    e.g. weather prediction: "tomorrow it will rain / not rain" vs. "tomorrow there is an
           80% chance of rain"
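
The weather example can be sketched as two hypotheses over the same input. The humidity threshold of 80 and the history of observations are made-up assumptions used only to show the difference in output type.

```python
# Hypothetical past observations: (humidity in percent, rained the next day?)
history = [(90, True), (85, True), (88, False),
           (40, False), (35, False), (92, True)]

def deterministic_h(humidity):
    """Outputs a hard decision: True for rain, False for no rain."""
    return humidity >= 80

def stochastic_h(humidity):
    """Outputs a probability of rain: rain frequency among similar past days."""
    humid = humidity >= 80
    similar = [rained for h, rained in history if (h >= 80) == humid]
    return sum(similar) / len(similar)

decision = deterministic_h(87)  # rain / no rain
chance = stochastic_h(87)       # estimated chance of rain
```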

   

    Parameters: Is each hypothesis described by a set of parameters or via a symbolic
                        language?

 

Some issues in Machine Learning