What is Machine Learning? Here is the definition given by Tom Mitchell.
Machine Learning: Any computer program that improves its performance P at some task
T through experience E.
Example: Learn to play checkers
T: play checkers & win
P: % games won in tournament
E: play against self
Example: Character recognition
T: recognize & classify hand-written characters
P: % of characters correctly classified
E: a database of hand-written characters with given classifications
Given: Training examples <x,f(x)> for some unknown function f
Find: a good approximation to f
x: properties of customer and proposed purchase
f(x): approve purchase or not
x: bitmap of hair image
f(x): match or not
x: economic indicators (S&P, Dow Jones, interest rates, …)
f(x): market will go up or down
x: bitmap of road surface in front of car
f(x): degrees to turn steering wheel
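The "Given / Find" setup above can be sketched in a few lines. This is only an illustration under stated assumptions: the target f (approve any purchase of at most 500), the purchase amounts, and the helper name learn_threshold are all hypothetical and not part of the notes. The learner sees only the pairs <x,f(x)> and picks the threshold rule that makes the fewest mistakes on them.

```python
# Hypothetical target function f, unknown to the learner:
# approve a purchase if the amount is at most 500.
def f(x):
    return x <= 500

# Training examples <x, f(x)> (purchase amounts are made up).
training = [(x, f(x)) for x in (50, 120, 300, 480, 520, 700, 950)]

def learn_threshold(examples):
    """Return the threshold t, chosen among the observed x values, whose
    rule h(x) = (x <= t) makes the fewest mistakes on the examples."""
    best_t, best_errors = None, None
    for t, _ in examples:
        errors = sum((x <= t) != y for x, y in examples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

t = learn_threshold(training)

# The hypothesis h is the learner's approximation to f.
def h(x):
    return x <= t
```

Here h agrees with f on every training example, but it can still differ from f elsewhere (for instance at x = 490), which is why h is only an approximation to f.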
Feature Types:
discrete: e.g. counties in St. Louis
continuous: e.g. automobile speed
ordinal: e.g. age
relational: e.g. brother-of
How are data points x1,x2,… related?
independent: e.g. each is a different student
time series: e.g. xi represents the financial indicators of the ith day of the year
We also need to think about the independence of the features (the components of x,
such as the individual financial indicators). For example, financial indicators are
not independent of one another. Neither are a person’s height and weight. However, a
person’s shoe size and IQ are independent.
How does data get selected?
Training Ex (instance): <x,f(x)>
Target function: true function f
Hypothesis: proposed function h believed to approximate f
Hypothesis space: Set of all hypotheses that can, in principle, be output by the
learning algorithm.
Search Procedure (search for target in hypothesis space)
Direct Computation: solve for hypothesis directly
Local Search: start with an initial hypothesis, make small improvements until a local
optimum is reached
Constructive Search: start with empty hypothesis and gradually add structure until a
local optimum is reached.
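To make the first two search procedures concrete, here is a minimal sketch that assumes a one-parameter hypothesis space h(x) = w*x and a squared-error loss. The data, the loss, and the function names are illustrative assumptions, not taken from the notes. Direct computation solves for the best w in closed form; local search starts from an initial w and takes small improving steps until none is left.

```python
# Hypothetical training examples generated by f(x) = 2x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def loss(w):
    """Squared error of the hypothesis h(x) = w * x on the examples."""
    return sum((w * x - y) ** 2 for x, y in examples)

def direct_computation():
    """Solve for the least-squares w directly: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in examples) / sum(x * x for x, _ in examples)

def local_search(w=0.0, step=0.1, tol=1e-6):
    """Hill climbing: move w by +/- step while that lowers the loss;
    halve the step when neither neighbor is an improvement."""
    while step > tol:
        for candidate in (w + step, w - step):
            if loss(candidate) < loss(w):
                w = candidate
                break
        else:
            step /= 2  # no improving neighbor at this step size
    return w
```

Because this loss has a single minimum, both procedures end up at essentially the same w. With a more complicated loss, local search may stop at a local optimum that is not the best hypothesis.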
On-line vs. Batch
on-line: analyze each example and make prediction as presented
batch: collect all examples, analyze them and output hypothesis
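A rough sketch of the difference, assuming a very simple learner whose hypothesis is just "predict the mean of the values seen so far". The data stream and function names are hypothetical. The batch learner waits for all examples; the on-line learner makes a prediction for each example as it is presented and then updates.

```python
# Hypothetical stream of observed target values.
stream = [3.0, 5.0, 4.0, 6.0]

def batch_learn(examples):
    """Collect all examples first, then output one hypothesis (the mean)."""
    return sum(examples) / len(examples)

def online_learn(examples):
    """Predict each example as it arrives, then update the running mean."""
    estimate, n, predictions = 0.0, 0, []
    for y in examples:
        predictions.append(estimate)    # prediction made before y is seen
        n += 1
        estimate += (y - estimate) / n  # incremental mean update
    return estimate, predictions

batch_hypothesis = batch_learn(stream)
online_hypothesis, online_predictions = online_learn(stream)
```

On this stream both learners finish with the same estimate, but only the on-line learner has produced a prediction at every step along the way.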
Eager vs. Lazy
eager: analyze training data and construct an explicit hypothesis
lazy: store training data and wait until a test point is presented and then construct an
ad hoc hypothesis to classify that one data point
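One way to picture the contrast, as a sketch only: the eager learner below summarizes the training data into one centroid per class up front, while the lazy learner keeps the raw data and does its work at query time (a 1-nearest-neighbor rule). The data, class names, and choice of these two methods are assumptions made for illustration.

```python
# Hypothetical one-dimensional training data with labels "a" and "b".
train = [(1.0, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b")]

class EagerCentroid:
    """Eager: build an explicit hypothesis (one centroid per class)."""
    def __init__(self, examples):
        sums = {}
        for x, label in examples:
            total, count = sums.get(label, (0.0, 0))
            sums[label] = (total + x, count + 1)
        self.centroids = {label: total / count
                          for label, (total, count) in sums.items()}

    def predict(self, x):
        # Only the centroids are consulted; the training data is not kept.
        return min(self.centroids,
                   key=lambda label: abs(x - self.centroids[label]))

class LazyNearestNeighbor:
    """Lazy: store the data; decide only when a test point arrives."""
    def __init__(self, examples):
        self.examples = list(examples)

    def predict(self, x):
        # The "hypothesis" is built on the spot for this one query.
        _, label = min(self.examples, key=lambda ex: abs(x - ex[0]))
        return label

eager = EagerCentroid(train)
lazy = LazyNearestNeighbor(train)
```

The eager learner pays its cost at training time and answers queries cheaply; the lazy learner trains instantly but must scan the stored data for every query.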
Active vs. Passive: Can the learner control the environment or run experiments, or does it just passively observe?
Training Data vs. Test Data: Are these drawn from the same distribution?
Size: Is size fixed or variable? That is, are all hypotheses the same size? Fixed sizes
are easier to understand. Variable sizes allow more flexibility but introduce the
problem of overfitting. Ex. fitting an nth-degree polynomial to a set of points.
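The polynomial example can be sketched as follows. The five training points are hypothetical (roughly f(x) = x with noise of about 0.5), as is the held-out point. A fixed-size hypothesis (a straight line, fitted by least squares) is compared with a polynomial whose degree grows with the data (the degree-4 polynomial through all five points, evaluated with the Lagrange formula).

```python
# Hypothetical noisy samples of f(x) = x, plus one held-out point.
points = [(0.0, 0.0), (1.0, 1.5), (2.0, 1.5), (3.0, 3.5), (4.0, 3.5)]
held_out = (5.0, 5.0)

def fit_line(pts):
    """Least-squares straight line: a small, fixed-size hypothesis."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pts)
             / sum((x - mean_x) ** 2 for x, _ in pts))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_interpolating_polynomial(pts):
    """Polynomial of degree len(pts) - 1 through every point (Lagrange form)."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(pts):
            basis = 1.0
            for j, (xj, _) in enumerate(pts):
                if i != j:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return h

def squared_error(h, pts):
    return sum((h(x) - y) ** 2 for x, y in pts)

line = fit_line(points)
poly = fit_interpolating_polynomial(points)
```

On this toy data the polynomial has zero training error, yet at x = 5 it predicts about -10, while the line predicts about 4.7. The more flexible hypothesis has fitted the noise rather than the trend.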
Is hypothesis deterministic or stochastic (involves randomness)?
e.g. weather prediction – tomorrow will rain/not rain vs. tomorrow there is 80% chance
of rain
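As a small illustration of the two kinds of output, the sketch below assumes a hypothetical record of past days (sky condition, whether it rained the next day). The stochastic hypothesis returns an estimated probability of rain; the deterministic one commits to rain or no rain. The data and function names are made up.

```python
# Hypothetical history: (sky condition today, did it rain tomorrow?)
history = [
    ("cloudy", True), ("cloudy", True), ("cloudy", True),
    ("cloudy", True), ("cloudy", False),
    ("clear", False), ("clear", False), ("clear", False), ("clear", True),
]

def stochastic_h(sky):
    """Stochastic hypothesis: estimated chance of rain given the sky."""
    outcomes = [rained for condition, rained in history if condition == sky]
    return sum(outcomes) / len(outcomes)

def deterministic_h(sky):
    """Deterministic hypothesis: a single rain / no-rain answer."""
    return stochastic_h(sky) >= 0.5
```

For a cloudy day, stochastic_h gives 0.8 (an 80% chance of rain), while deterministic_h simply answers that it will rain.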
Parameters: Is each hypothesis described by a set of parameters or via a symbolic
language?