Seventh Class

  Toolbox so far
  Prospects for automating statistical reasoning
    Bayes nets
    Dempster/Shafer
    Kyburg (his seventieth birthday is friday!)

Today's readings. <>, (!), (?), caps are mine.
  From Kyburg, Epistemology and Inference (collected papers)

Lecture:

Toolbox:

RULE-BASED:
  logics
    horn/EXPERT SYSTEM SHELL/prolog   MEDICAL INFORMATICS GROUP
    fuzzy
    defeasible/nonmonotonic/taxonomical trees/inheritance   LOUI
    theorem-proving
  procedurally encoded knowledge
  decision calculi
    MEU (maximal expected utility)   SANDHOLM
    mean-risk
    Allais (applicable theory?)
    Prospect theory (full theory?)
    arguing for decisions/practical reasoning   LOUI
  probability calculi
    applied probability modeling   KAMBEROVA
    bayes nets
    maximum entropy (objective bayes)

SEARCH-BASED:
  game-based
    heuristic adversarial   CHESS
    admissible adversarial
  objective-based
    heuristic
      simulated annealing   MARK FRANKLIN
      gradient methods
      Monte Carlo methods
      heuristic modeling for exact methods   RODIN
    optimization: SSM381
      linear programming
      combinatorial optimization
      dynamic programming
      Kuhn-Tucker
    optimizing models of search   SANDHOLM
  situation-based
    planning
    reacting
    game theory   SANDHOLM
  constraint-satisfaction   SALLY GOLDMAN

LEARNING-BASED:
  computational regression techniques (coming to curve-fit)
    genetic algorithms
    neural nets
      feed-forward (standard)
        backprop learning   KIMURA PNP GROUP
      recurrent (almost standard)   KALMAN-KWASNY
      symmetric (hopfield -- constraint sat)   PINKAS
      boltzmann (probabilistic simulation)
  statistical regression techniques (optimal curve-fit)
    linear
    non-linear
    bayesian classification   SHANNON (med school)
    non-bayesian non-classical statistical
      dempster/shafer
      kyburg with clopper-pearson intervals/prob arguments   LOUI
  decision trees
    ID3
    C4.5   SANDHOLM
  theoretical results   SALLY GOLDMAN

Bayes nets:
  use conditional independence to reduce the storage size of the joint probability distribution.
  note that this is an excellent example of the tradeoff between representation and reasoning (space and time).
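a minimal sketch of the storage comparison, assuming a hypothetical four-variable chain A <-- B <-- C <-- D with 2, 3, 4 and 5 values. the dictionaries and function names are illustrative only. it counts every table entry, including the 5-entry prior on the root D, so the net total comes to 43 (38 for the three conditional tables plus 5) against 120 for the full joint.

```python
from math import prod

# hypothetical chain A <-- B <-- C <-- D; number of values per variable
cardinality = {"A": 2, "B": 3, "C": 4, "D": 5}
# parents[X] lists the variables X is conditioned on (D is the root, so none)
parents = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}


def joint_table_size(cardinality):
    """Entries in the full joint table: one per combination of values."""
    return prod(cardinality.values())


def bayes_net_table_size(cardinality, parents):
    """Total entries over all tables Pr(X | parents(X))."""
    total = 0
    for var, pars in parents.items():
        # number of parent configurations; prod of an empty sequence is 1
        parent_configs = prod(cardinality[p] for p in pars)
        total += cardinality[var] * parent_configs
    return total


joint_size = joint_table_size(cardinality)
net_size = bayes_net_table_size(cardinality, parents)
print("joint:", joint_size, "bayes net:", net_size)
```

the count is of raw table entries; since each distribution sums to 1, the number of free parameters is somewhat smaller in both cases.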
  cond indep: Pr(A | B,C) = Pr(A | B)
    so we say that B screens A from C
    (ex. A = dies from lung cancer, B = smoker for ten years, C = cool teen)
    actually, it must be that Pr(A=ai | B=bj, C=ck) = Pr(A=ai | B=bj) for all values that A, B, and C can take on!
  give example of net and representational savings.
    for example, with A (2 values), B (3 values), C (4 values), D (5 values) and a linear topology A <-- B <-- C <-- D, the joint is 2x3x4x5 = 120. the bayes net requires 2x3 + 3x4 + 4x5 = 38 conditional-table values, plus 5 for the prior on D, so 43 in all.
    in general, combinatorial savings.
  is this knowledge really available?
    yes, in well-understood domains, or more precisely, those for which there is a causal or quasi-causal theory of the variables' interaction.

Statistical tool box (SSM325/326):
  order statistics
  time series analysis
  linear regression (l.s.e.)
  neyman-pearson theory
    hypothesis testing (reject H0 --> accept H1?)
    confidence interval estimation (width vs. confidence)
    u.m.p. / m.l. estimation (power vs. likelihood)

  neyman-pearson had competitors: fisher, good, lindley; we have entrenched np-theory in the social sciences. this is history, not philosophy or mathematics.

  but np-theory doesn't solve the reference class problem: sampling theory can address sample size and the significance of a subpopulation, but cannot say which of a combinatorial number of possible sampling classes to use.

  also, the narrowest intervals have to be justified (on power, under the assumption that each estimate has the required likelihood). one could use all sorts of confidence curves if one wanted merely to satisfy np-theory. bernoulli example.

  statisticians are mad: it sometimes suffices to know how to come to know what you are doing even if you don't know what you are doing.

dempster-shafer
  1. interval representation, which doesn't sum to 1. map example.
  2. normalized orthogonal sum rule of combination. example.

kyburg
  1. interval representation, inherited from clopper-pearson (yes, arbitrary, but this is DESIGNED to fill in the np-theory's gaps)
  2. rule for choosing the reference class needs to be supplemented with XP classes in order to work well. disagreement of intervals is a cheap significance test.
  3. implementation in rcstat

dempster vs. kyburg.
  when [.1, .1] vs. [.9, .9] and neither is a subclass, kyburg gets [0, 1] and dempster (under the usual interpretation) gets [.4, .6] (approximately).
  when [.7, .7] and [.8, .8] and neither is a subclass, kyburg gets [.8, .9] (approximately), and dempster gets [.74, .76] (i am guessing).

  so the choice of calculus depends on how one views "combination" and "ignorance" -- is [.5, .5] how you want to represent ignorance? any bayesian would choose this. but lots of people want [0, 1] to show ignorance.

  also, when two independent reports are "positive", do you want the system to become more positive, or to average the positive reports?

  the question really is what the designer wants the system to do, and how one frames problems. no theory of induction is a magic box that can always tell you what to believe. they are really just canonical forms of statistical argument for what one ought to believe.

Discussion:
  simple neomonopoly approaches
    accept loans that look good
      what is a good loan?
    don't let your balance drop too low
      how low is low?
    take a loan before buying when you have no cash

Problem:
  None this week. Please program.
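
a rough sketch of the dempster-shafer items from the lecture (interval representation and the normalized orthogonal sum), for anyone who wants to experiment. assumptions, none of them from the notes: a two-element frame {a, not_a}, mass functions stored as dicts keyed by frozenset, and a point report [p, p] read as mass p on a and 1-p on not_a. the names combine, interval and point_report are made up for this sketch. this is not kyburg's method and not rcstat.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: normalized orthogonal sum of two mass functions.

    Each mass function maps a frozenset (subset of the frame) to its mass.
    """
    raw = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            raw[inter] = raw.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2  # mass falling on the empty set
    if not raw:
        raise ValueError("total conflict: combination undefined")
    # renormalize by the non-conflicting mass
    return {s: w / (1.0 - conflict) for s, w in raw.items()}


def interval(m, hypothesis):
    """Return (belief, plausibility) of a hypothesis under mass function m."""
    belief = sum(w for s, w in m.items() if s <= hypothesis)
    plausibility = sum(w for s, w in m.items() if s & hypothesis)
    return belief, plausibility


A = frozenset({"a"})
NOT_A = frozenset({"not_a"})
FRAME = A | NOT_A


def point_report(p):
    """Assumed reading of [p, p]: mass p on a, 1 - p on not_a."""
    return {A: p, NOT_A: 1.0 - p}


# ignorance as a vacuous mass function: all mass on the whole frame
print(interval({FRAME: 1.0}, A))

# the two pairs of conflicting reports discussed in the lecture
print(interval(combine(point_report(0.1), point_report(0.9)), A))
print(interval(combine(point_report(0.7), point_report(0.8)), A))
```

under this reading the vacuous mass gives [0, 1], the first pair combines to 0.5, and the second to about 0.90 (two positive reports reinforce each other). that last figure is higher than the guessed [.74, .76], which may rest on a different reading of the reports.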