Estimating Hypothesis Accuracy
Assume the following setting:
X - set of all possible instances (or examples), often specified by a set of attributes.
Given the example of learning who will purchase skis for marketing purposes, X would be all people, and possible attributes might be age, home city, weight, number of days skied per year, etc.
Ð - arbitrary probability distribution over X that represents how often each instance occurs in nature. Ð is independent of the target concept.
Pr_{x element-of Ð}(E(x)) - the probability that event E(x) is true for an x element-of X drawn randomly from Ð
H - set of possible hypotheses
C - set of possible target concepts (may
be unknown)
f element-of C - the target concept
Ex. Suppose you want to learn the target concept "people who plan to purchase skis next year" (Ð is different for the different options):
option 1 - Survey people entering a ski resort
option 2 - Do a phone survey of "random people"
For option 1: Ð specifies, for each person x element-of X, the probability that they will be the next person arriving at the ski resort.
f: X -> {0,1} classifies each person as to whether they will buy skis next year.
Key Assumption
Labeled data set S = {<x_1, f(x_1)>, <x_2, f(x_2)>, ..., <x_n, f(x_n)>} is obtained by drawing each x_i independently from Ð and properly labeling it by f.
You can use similar ideas to model noise in the sample.
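The key assumption above can be sketched in a few lines of Python. This is only an illustration: the four-instance distribution `toy_dist`, the target `toy_f`, and the helper name `draw_labeled_sample` are hypothetical and do not come from the notes, and a fixed seed is used so that the draw is repeatable.

```python
import random


def draw_labeled_sample(dist, f, n, seed=0):
    """Draw n instances independently from a discrete distribution and label each by f.

    dist maps each instance x to its probability; f is the target concept.
    """
    rng = random.Random(seed)  # fixed seed: repeatable illustration only
    instances = list(dist.keys())
    weights = [dist[x] for x in instances]
    # choices() samples with replacement, so the draws are independent
    xs = rng.choices(instances, weights=weights, k=n)
    return [(x, f(x)) for x in xs]


# Hypothetical world: an instance is "days skied per year".
toy_dist = {0: 0.4, 2: 0.3, 5: 0.2, 12: 0.1}


def toy_f(days):
    """Assumed target concept: buys skis next year iff skis at least 2 days."""
    return 1 if days >= 2 else 0


S = draw_labeled_sample(toy_dist, toy_f, n=10)
print(S)
```

Each pair in `S` is one `<x_i, f(x_i)>`; changing `toy_dist` corresponds to changing how the survey is run.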
Two questions we want to answer:
1. Given h and a labeled sample of n examples, what is the best estimate of the accuracy of h over future examples drawn from Ð?
2. What is the possible error in this accuracy estimate?
Defs
Sample Error:
error_S(h) = (1/n) Sum_{x element-of S} [f(x) xor h(x)]
where S is the sample; h is the hypothesis; n = |S|; f(x) is the target; and f(x) xor h(x) is 1 if f(x) != h(x) and 0 otherwise
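A minimal Python sketch of the sample error follows. The function name `sample_error`, the toy sample, and the threshold hypothesis `toy_h` are illustrative assumptions, not part of the notes.

```python
def sample_error(h, sample):
    """Fraction of labeled examples (x, f(x)) in the sample that h misclassifies."""
    if not sample:
        raise ValueError("sample must contain at least one example")
    # f(x) xor h(x) contributes 1 exactly when the label and the prediction differ
    mistakes = sum(1 for x, label in sample if h(x) != label)
    return mistakes / len(sample)


# Hypothetical data: x is days skied per year, label is 1 if the person buys skis.
toy_sample = [(0, 0), (2, 0), (5, 1), (12, 1), (1, 1)]


def toy_h(days):
    """Hypothetical hypothesis: predict a purchase iff at least 3 ski days."""
    return 1 if days >= 3 else 0


print(sample_error(toy_h, toy_sample))  # 1 mistake out of 5 examples -> 0.2
```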
True Error (often called generalization error):
error_Ð(h) = Pr_{x element-of Ð}[f(x) != h(x)]
Note that if Ð is discrete then this is just the sum of the probability weights of the x element-of X that are misclassified.
Observe that this is the probability that a random x from Ð is incorrectly labeled by h.
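For a discrete distribution, the true error can be computed directly as that sum of probability weights. The sketch below assumes the distribution is given as a dictionary from instance to probability; `true_error`, `toy_dist`, `toy_f`, and `toy_h` are hypothetical names chosen for illustration.

```python
def true_error(h, f, dist):
    """Sum of the probability weights of the instances on which h and f disagree.

    dist maps each instance x to its probability under a discrete distribution.
    """
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p for x, p in dist.items() if h(x) != f(x))


# Hypothetical 4-instance world: x is days skied per year.
toy_dist = {0: 0.4, 2: 0.3, 5: 0.2, 12: 0.1}


def toy_f(days):
    """Assumed target concept."""
    return 1 if days >= 2 else 0


def toy_h(days):
    """Hypothesis being evaluated."""
    return 1 if days >= 3 else 0


# h and f disagree only on x = 2, which has probability weight 0.3
print(true_error(toy_h, toy_f, toy_dist))
```

In practice the distribution is unknown, which is why the true error has to be estimated from a sample.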
How good an estimate of error_Ð(h) is error_S(h)?
Let's apply some standard probability theory to our problem
                      | Given a coin with some bias,   | Want to estimate the prob that,
                      | want to estimate the prob p    | for a random x from Ð,
                      | that you'll get a Head         | h(x) != f(x)
----------------------------------------------------------------------------------------------
sample space          | {Head, Tail}                   | {h(x) != f(x), h(x) = f(x)}
(possible outcomes)   |   H     T                      |       1             0
----------------------------------------------------------------------------------------------
Event E               | coin lands heads up            | h(x) != f(x)
----------------------------------------------------------------------------------------------
iid sample            | each flip from the same coin   | each example drawn from Ð
(independent,         | with prob p of heads           | and labeled by f
identically           |                                |
distributed)          |                                |
----------------------------------------------------------------------------------------------
                      | compute estimate p-hat for p   | p-hat - like sample error
                      |                                | p - like true error
Underlying Justification
Binomial Distribution: gives the prob of observing r successful trials in n independent trials, where each trial has prob p of success
P(r) = (nCr) p^r (1-p)^(n-r) = [n!/(r!(n-r)!)] p^r (1-p)^(n-r)
Let x be the # of successes
E(x) = np
Var(x) = np(1-p)
σ = sqrt(np(1-p))
When np(1-p) >= 5 the binomial is closely approximated by a normal distribution.
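The binomial formulas above can be checked numerically. This is a small sketch using only the standard library; the function names are illustrative, and the values n = 40 and p = .3 are simply chosen for the demonstration.

```python
from math import comb, sqrt


def binomial_pmf(r, n, p):
    """P(r) = (nCr) p^r (1-p)^(n-r): prob of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binomial_summary(n, p):
    """Mean, variance and standard deviation of the number of successes."""
    mean = n * p
    var = n * p * (1 - p)
    return mean, var, sqrt(var)


n, p = 40, 0.3
mean, var, std = binomial_summary(n, p)
total = sum(binomial_pmf(r, n, p) for r in range(n + 1))  # should be 1
normal_ok = var >= 5  # rule of thumb from the notes: np(1-p) >= 5

print(mean, var, std, total, normal_ok)
```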
For the binomial distribution, where r is the number of the n sample examples that h misclassifies:
error_S(h) = r/n
error_Ð(h) = p
estimation bias - for an estimator Y of a parameter p, the estimation bias is E[Y] - p
error_S(h) is an unbiased estimator for error_Ð(h):
E[r] = np so E[r/n] = p
E[error_S(h)] - error_Ð(h) = 0
σ_{error_S(h)} = σ_r / n = sqrt[p(1-p)/n]
This can be approximated by using r/n = error_S(h) in place of p.
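The unbiasedness claim and the standard deviation sqrt[p(1-p)/n] can be verified exactly, without simulation, by summing over the binomial distribution of r. The sketch below is an illustration under the assumption that r is Binomial(n, p); the function names and the values n = 40, p = .3 are chosen for the example.

```python
from math import comb, sqrt


def pmf(r, n, p):
    """Binomial probability of exactly r misclassified examples out of n."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def estimator_moments(n, p):
    """Exact mean and standard deviation of the estimator r/n."""
    mean = sum((r / n) * pmf(r, n, p) for r in range(n + 1))
    var = sum((r / n - mean) ** 2 * pmf(r, n, p) for r in range(n + 1))
    return mean, sqrt(var)


n, p = 40, 0.3
mean, std = estimator_moments(n, p)
bias = mean - p                      # E[r/n] - p, zero up to rounding
formula_std = sqrt(p * (1 - p) / n)  # sqrt[p(1-p)/n]

print(bias, std, formula_std)
```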
Central Limit Theorem
The sum of a large number of iid random variables approximately follows a Normal Distribution.
[Figure: Normal curves with different standard deviations (std-dev = .5 and std-dev = 1.5), with the 95% confidence interval [l,h] marked]
The area under the curve is 1 (it's a prob dist). Look at the portion (centered around the mean) that contains 95% of the area.
Let p = prob event E occurs (e.g. the coin lands heads, or h(x) != f(x))
Let p-hat = estimate for p = (# of trials when E occurred)/(total # trials)
Prob(l <= p-hat <= h) = .95, where l and h here are the low and high endpoints of the interval (not the hypothesis h)
Note: if σ is smaller (so the sample is bigger), h-l is small compared to when σ is larger
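To see how well the normal picture matches the underlying binomial, one can compute exactly how much probability p-hat places within 1.96 standard deviations of p. This is a sketch under the assumption that the number of occurrences of E is Binomial(n, p); `coverage` and `interval_width` are hypothetical helper names, and 1.96 is the usual two-sided constant for 95%.

```python
from math import comb, sqrt


def pmf(r, n, p):
    """Binomial probability of exactly r occurrences of E in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def interval_width(n, p, z=1.96):
    """Width h - l of the band p +/- z * sigma, with sigma = sqrt(p(1-p)/n)."""
    return 2 * z * sqrt(p * (1 - p) / n)


def coverage(n, p, z=1.96):
    """Exact Prob(l <= p-hat <= h) under the binomial, for the band p +/- z * sigma."""
    sigma = sqrt(p * (1 - p) / n)
    low, high = p - z * sigma, p + z * sigma
    return sum(pmf(r, n, p) for r in range(n + 1) if low <= r / n <= high)


print(coverage(40, 0.3))             # close to, but not exactly, .95
print(interval_width(40, 0.3))       # wider band for the smaller sample
print(interval_width(160, 0.3))      # 4x the sample -> half the width
```

Because p-hat can only take the values r/n, the exact coverage differs slightly from .95; the gap shrinks as n grows.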
The best estimate for error_Ð(h) is error_S(h), where error_Ð(h) is p and error_S(h) is p-hat.
For an N% confidence interval we have, with approximately N% probability:
error_S(h) - Z_N S <= error_Ð(h) <= error_S(h) + Z_N S
where S is the estimate for σ computed from the sample (this S is a standard deviation, not the sample set): S = sqrt[error_S(h)(1 - error_S(h))/n]
and Z_N is given by:
confidence level N%:  50% |  68% |  80% |  90% |  95% |  98% |  99%
---------------------------------------------------------------------
constant Z_N:         .67 | 1.00 | 1.28 | 1.64 | 1.96 | 2.33 | 2.58
This approximation to the area under the normal curve works well as long as:
n * error_S(h)(1 - error_S(h)) >= 5
Ex. n = 40, 12 errors
error_S(h) = 12/40 = .3
S = sqrt(.3*.7/40) =~ .07
So for the 95% confidence interval (Z_N = 1.96):
.30 - .14 <= error_Ð(h) <= .30 + .14
For the 68% confidence interval (Z_N = 1.00):
.23 <= error_Ð(h) <= .37
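The interval computation above can be sketched in Python. Instead of looking up the constant in the table, this version derives it from the standard normal with `statistics.NormalDist` (Python 3.8+); the function names are illustrative assumptions, and the computed constants can differ from the rounded table entries in the last digit (for 68% the computed value is about .99 rather than 1.00).

```python
from math import sqrt
from statistics import NormalDist


def z_constant(confidence):
    """Two-sided constant: the central `confidence` mass of a standard normal
    lies within +/- this many standard deviations of the mean."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def error_confidence_interval(num_errors, n, confidence=0.95):
    """Approximate confidence interval for the true error from n test examples."""
    err = num_errors / n  # sample error
    if n * err * (1 - err) < 5:
        # rule of thumb from the notes for when the normal approximation is reliable
        raise ValueError("normal approximation unreliable: n*err*(1-err) < 5")
    s = sqrt(err * (1 - err) / n)  # estimated standard deviation of the sample error
    z = z_constant(confidence)
    return err - z * s, err + z * s


# The worked example: n = 40 examples, 12 errors, 95% confidence.
low, high = error_confidence_interval(12, 40, 0.95)
print(round(low, 2), round(high, 2))
```

For the example this gives an interval of roughly .16 to .44, matching .30 +/- .14.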