COMMON AI ARCHITECTURES HARD-CODED/EMBEDDED (IMPLICIT) LOGIC PROGRAM CONTROL PROGRAM RULE-BASED REASONING PROGRAM PROLOG PROGRAM CASE-BASED REASONING PROGRAM LEARNING PROGRAM NEURAL NET PROGRAM INDUCTIVELY REASONING PROGRAM PROBABILISTICALLY REASONING PROGRAM EXPECTED UTILITY & PROBABILISTICALLY REASONING PROGRAM EXPLICITLY LOGICALLY-DEDUCING PROGRAM FUZZY LOGIC PROGRAM CONSTRAINT SATISFACTION PROGRAM PLANNING PROGRAM OPTIMIZING PROGRAM HEURISTICALLY OPTIMIZING PROGRAM HEURISTICALLY OPTIMIZING THROUGH TREE-SEARCH PROGRAM HEURISTICALLY OPTIMIZING THROUGH GRADIENT-ASCENT PROGRAM HEURISTICALLY OPTIMIZING THROUGH GRADIENT-ASCENT+SIMULATED ANNEALING PROGRAM GENETIC PROGRAM GRAMMAR/GENERATIVE PROGRAM PREFERENCE a > b, "a is preferred to b" UTILITY what makes it possible to measure the intensity of preference? u(a) a real number? why not an integer? what is it that makes something extensive? are there other scales that are unique up to affine transformation? lotteries. independence. substitutivity. totality. is u a function of one variable? is it convex? strictly monotonic? how is u elicited/specified? PROBABILITY for A = {a,b,c}, mutually exclusive & exhaustive, if you tell me p(i), i in A, i can fill in p(s) s in 2^A, also do conditional p. what if you tell me some p's, e.g., p({a,b}) = .5, what is p({b,c})? or p(foo|bar) = .9, p(bar|moo) = .9, what is p(foo|moo)? use maximum entropy? what if p(foo|bar,moo) = p(foo|bar)? statistical inference from data: maximimum likelihood estimators? HEURISTIC OPTIMIZATION an objective function f(x), f(x,y), or f(x1, ..., xn). what is max f? what is argmax f? how long does f take to evaluate? .000001 sec? or 10 hrs? f could be a simulation with parameters x1, ..., xn could be booleans 1. random evaluation for (i=1; i<=1M; i++) { x = rand(); savebest(f(x)) } 2. gradient ascent x = rand() for (i=1; i<=1M; i++) foreach (x' in {x+delta, x-delta}) x = savebest(f(x')) 3. gradient ascent with monte carlo starts for (j=1; j<=1M; j++) { x = rand() for (i=1; i<=1M; i++) foreach (x' in {x+delta, x-delta}) x = savebest(f(x')) } 4. simulated annealing x = rand() for (i=1; i<=1M; i++) if (rand() < 1/e^-k(i)) foreach (x' in {x+delta, x-delta}) x = savebest(f(x')) else x' = rand({x+delta, x-delta}) 5. hybrid simulated annealing for (d=1; d<=n; d++) x(d) = rand() for (i=1; i<=1M; i++) if (rand() < 1/e^-k(i)) { S = Union(d'=1..n)({x+delta(d'), x-delta(d')}) S' = randomsample(i,S) foreach (x' in S') x = savebest(f(x')) } else { d' = rand(1..n) x' = rand( {x+delta(d'), x-delta(d')} ) } how much better is 5 than 4? (one recent paper: much better!)