1988 Extended Abstract for Society of Exact Philosophy DEFEASIBLE SPECIFICATION OF UTILITIES Abstract. This paper suggests non-monotonic reasoning about the utilities of outcome states in decision analysis, for two reasons: first, it allows decision theory to be used on the kind of problem spaces to which AI planning is accustomed; second, it allows incremental refinement of the decision models. In addition to calculating an expectation to determine the utility of a state, we can reason about the extent to which a state satisfies goals. This reasoning is defeasible. As more aspects of the state are considered, prima facie determinations of utility may be corrected. The fact that we can determine (defeasibly) a utility for every state, no matter how incompletely it is described, implies that decision trees can be bounded arbitrarily, as computational resource limitations may require. Presumably, in the long run decision theory will be integrated with . . . planners. Before that happens, how probability and utility estimates are arrived at in realistic circumstances will have to be solved. -- Charniak and McDermott, Introduction to AI [1] 1. Background. 1.1. Utility in Decision Theory and Goals in Planning. Decision analysis is the leading candidate for deciding among competing actions and conflicting goals in uncertain worlds. It is normally thought to require probabilities, utilities, and a set of alternative courses of action. Presumably, work on evidential reasoning will deliver probabilities suitable for AI's use, and planning has already discovered various ways of representing and composing sequences of primitive actions. This leaves the specification of utilities of outcome states. Operations research devotees can tell us how to assess utilities, when agents are unable to specify them directly and explicitly, but I claim that there is more to be said about the representation of utilities even when extraction of the information is easy, i.e., even when assessment is trivial. So I claim there is more to be said than "u(recovery) = 15", whether we have to arrive at the value, 15, through extensive interviewing using certainty equivalents and standard lotteries, or whether our agent is simply willing to assert that recovery would seem to be worth 15 utils. I will be concerned with the descriptions of the states of the world, such as "recovery": such descriptions are inherently incomplete, and we should examine whether utilities can be usefully assigned to incompletely described states of the world. But some preliminary remarks contrasting decision theory and planning are in order. AI's paradigmatic work on planning concerns deterministic worlds and incompletely described states or situations. A state is terminal and has value whenever it satisfies the sententially described goals which it is the object of the plan to guarantee. Computation consists of search through the tree of action sequences for a single path from initial to goal state: the fewer the paths considered, the better. Decision theory concerns indeterministic worlds. There is an important distinction between a completely described terminal state, and a lottery among such states. The probability of each state is known. All states have some value, and are ordered by relative desirability, represented in the reals. Action rarely guarantees specific desiderata, and every action has some probability of achieving the desiderata. Computation consists of evaluating the value of all nodes but the root, in the entire tree of action and event sequences. The value of a node is determined by the values of all of its children. The more paths in the decision tree, the more valid the analysis. In the worst scenario, all past work on deterministic planning will be inapplicable to decision-theoretic planning because of the differences in representation and procedure. This paper is a first step toward avoiding that worst-case scenario. Presumably, existing goal-achieving planning methods will be useful in heuristic evaluation of large decision trees, especially if there were some way of bounding risk in low-probability sub-trees (i.e., restoring some determinism), and some way of identifying properties of states that contribute significantly to utility valuation (i.e., identifying properties that can function as goals). My proposed representation of utility of states should make these heuristics possible. 1.2. Savage's Problem of Small Worlds and Abstraction. AI planning wants to describe states of the world by asserting sentences known to be true of the state. It sometimes leaves undetermined the truth value of sentences that are relevant to the valuation of a state. Assigning utility to a possible world described by a maximally strong sentence in a first-order language is no special problem; this is what Richard Jeffrey's theory of decision does [2]. Assigning utility to a set of possible worlds, W , described by a sentence, p , is also no problem, so long as we understand it as the weighted average of the utilities of each possible world, u(w(i)) , for w(i) in W , where the weight of u(w(i)) is prob(w(i)) . States are usually taken to be sets of possible worlds. However, decision theory can make no sense of the claim that the utility of a state, s , is some value x , **pending consideration of whether I know p to be true in s or not**. Suppose you know p is true in s , e.g., that (loaded gun) is true in the state allegedly described by (fired gun) . And suppose p is relevant to the valuation of s , i.e., it matters whether (loaded gun) . Then on the standard view, you should describe the state by (and (loaded gun) (fired gun)) and use the utility of this state, x(loaded) , instead of x , the utility of (fired gun) . Meanwhile, on the standard view, if you don't know whether p or ~p is true in s , you should assess the probabilities of p and ~p respectively, e.g., of (loaded gun) and (not (loaded gun)) , and use these probabilities to weigh x(loaded) and x(~loaded) , where x(~loaded) is the utility of the state described by (and (not (loaded gun)) (fired gun)) . You should definitely not use x unless x is this weighted average. In short, you cannot ignore relevant knowledge. Sometimes, however, we want to ignore relevant knowledge at various levels of abstraction. It should be possible to make utility claims under various abstractions. Above, we claim the utility of s is x at a granularity that ignores p's truth. This claim is defeasible; we will revise it at a finer granularity in which we consider the truth or falsity of p . It would be nice if x , above, indeed were the appropriately weighted average, prob( p | s ) x(loaded) + prob( ~p | s ) x(~loaded) . But I claim that it need not be, in order for a formalism to be useful. To insist that it be is to confuse incompleteness due to ignorance or indeterminacy (an epistemic matter) with incompleteness due to abstraction (a computational matter). Moreover, what often prevents us from assigning an x to be the utility of an s is the fear that x will not equal the appropriate weighted average, once we consider all of the uncertainties about s , and their ramifications. So we are forced to deepen our decision trees even for the most casual decisions. We are forced to reach for leaves that are quiescent states, wherein most of the relevant possibilities have been decided one way or the other. The problem is that we cannot use x for the utility of s unless we are prepared to equate it with the expectation, which requires further analysis. My finesse of this problem will be to allow use of x defeasibly. Leonard Savage, the foundational decision theorist, spoke of the problem of small worlds, which is the same abstraction problem here. Savage starts with the *world*, "the object about which the person is concerned," and a *state* (of the world), "a description of the world, **leaving no relevant aspect undescribed**" (emphasis added). He writes [3] In application of the theory, the question will arise as to which world to use . . .. If the person is interested in the only brown egg in a dozen, should that egg or the whole dozen be taken as the world? It will be seen . . . that in principle no harm is done by taking the larger of two worlds as a model of the situation. But computational harm is done by taking worlds too large. It can be painful to specify utilities for 2^12 states of eggs in a dozen, instead of 2^2 states of a brown egg, and also costly to manipulate decision trees over such states. In the problems that AI planning addresses, every set of possible worlds is a state, and the combinatorics are enormously worse. 2. Defeasible Calculation. 2.1. Defeasible Specification of Utility. The shrewd will give a terse specification of utilities: if a state of the dozen eggs entails that the brown egg is broken, then its utility is 0; otherwise, its utility is 1. Those who have access to a non-monotonic representational language will say: defeasibly u(x) = 0; if x |-- ~broken-brown-egg then u(x) = 1 . The latter specifies implicit exceptions to the first rule. This defeasible version is desirable for the same reasons we write bird(x) >-- flies(x) (i.e., if x is a bird then defeasibly, x flies); ~flies(Tweety) instead of ( bird(x) & ~(x = Tweety) ) --> flies(x) ~flies(Tweety) . In general, we can try to discover structure among our utility assessments. Defeasibility allows the economical use of structural principles that are only approximate, i.e., that admit of exceptions. One possibility is additive structure. The multi-attribute model of utility supposes that valuation can be split among attributes whose individual contributions are additive: u() = SUM(i) { k(i) u(w(i)) } where is an n-vector measuring the state in each of n relevant attributes, and k(i) / k(j) reflects the relative importance of the ith to the jth attribute. Likewise, we can suppose that the contributions to a state's utility of properties holding in that state will be the sum of their individual contributions. For some conjunctions of properties, the utility contributed is not equal to the sum of each individual contribution, but these will be exceptions handled by the defeasible reasoning. There will be a payoff for using the additive model if the exceptions are few. Each of (flat tennis-balls) , (busy tennis-courts) , (cold day) , (grouchy partner) , and (poor string-tension) may decrease by 10 utils the valuation of a state in which they hold, so that the utility of a state in which (and (busy courts) (flat tennis-balls)) holds is apparently -20. But in conjunction, (and (flat tennis-balls) (poor string-tension)) may be only as detrimental as a loss of 15 utils, so that a state satisfying (and (busy courts) (flat tennis-balls) (poor string-tension)) is apparently valued at -25 utils. We can continue to express exception upon exception, with the right representational language. It might be that if all of the five properties above hold, the attempt to play tennis is so absurd it's enjoyable, and worth 10 utils, instead of the -45 that we might defeasibly calculate. A non-monotonic logic with specificity defeaters allows this to be expressed economically. I will use my own defeasible reasoning system [4], though others could also be used. The required mechanism is that if a >-- c a & b >-- ~c then ~c can be inferred on a & b (assuming a & b entails a but not vice versa). Let there be a stock of properties P1 , . . ., Pn , relevant to the valuation of a state, and let a utility scale be determined (i.e., it is meaningful to talk of 5 utils). Any individual contributions and conjunctive contributions may be asserted, e.g. uconst(P1) = 12 ; uconst(P2) = 20 ; uconst(P1 & P2) = 3 ; uconst(P5) = 14 . We use two defeasible rule schemata: (1) 'uconst(P) = x & uconst(Q) = y' >-- 'uconst(P & Q) = x + y' (2) 'holds(P, s) & uconst(P) = x' >-- 'u(s) = x' . The first says that the utility of a state in which P and Q are true is the sum of the respective utilities in which just P and just Q hold, all things being equal. The second says that if P & Q holds in s , then the utility of s is defeasibly the utility of a state in which just P & Q contribute to the utility valuation. It is defeated if (i) P & Q & R are known to hold in s , (ii) we can compute uconst(P & Q & R) , and (iii) uconst(P & Q & R) differs from uconst(P & Q) . Example. Suppose holds(P1 & P2 & P5, s0) . What is the utility of s0 ? There are several competing arguments, three of which are <>: A1: 1. uconst(P1 & P2) = 3 2. 'holds(P1 & P2, s) & uconst(P1 & P2) = 3' >-- 'u(s) = 3' 3. holds(P1 & P2, s0) 4. therefore, u(s0) = 3 . A2: 1. uconst(P1) = 12 & uconst(P2) = 20 2. 'uconst(P1) = x & uconst(P2) = y' >-- 'uconst(P1 & P2) = x + y' 3. therefore, uconst(P1 & P2) = 32 4. uconst(P5) = 14 5. 'uconst(P1 & P2) = x & uconst(P5) = y' >-- 'uconst(P1 & P2 & P5) = x + y' 6. therefore uconst(P1 & P2 & P5) = 46 7. 'holds(P1 & P2 & P5, s) & uconst(P1 & P2 & P5) = 46' >-- 'u(s) = 46' 8. holds(P1 & P2 & P5, s0) 9. therefore u(s0) = 46 . A3: 1. uconst(P1 & P2) = 3 & uconst(P5) = 14 2. 'uconst(P1 & P2) = x & uconst(P5) = y' >-- 'uconst(P1 & P2 & P5) = x + y' 3. therefore uconst(P1 & P2 & P5) = 17 4. 'holds(P1 & P2 & P5, s) & uconst(P1 & P2 & P5) = 17' >-- 'u(s) = 17' 5. holds(P1 & P2 & P5, s0) 6. therefore u(s0) = 17 . Argument A3 defeats both A1 and A2 . A3 defeats A1 because its rule A3<4> is more specific than the rule A1<2> . A3 also uses more evidence than A1 . A3 defeats A2 because its conclusion A3<1> is more directly related to evidence than the conflicting conclusion A2<3> . More simply, A<1> is indefeasible while A2<3> is defeasible, so the former prevails. Thus, on my account of how conflicting defeasible arguments interact, A3 dominates A1 and A2 ; A3's conclusion is apparently justified, and apparently u(s0) = 17 . The beauty of this approach, which I will elaborate below, is that I have not said what other relevant properties might hold of s0 . If P3 is considered next, and found to hold of s0 , a different calculation might be in order. Nevertheless, until that calculation is performed, found to conflict with and thus supersede the current calculation, decision analysis can identify the optimal act under the provisional value of u(s0) = 17 . The beauty is that we can always do a quick calculation with the best information at hand. 2.2. Normative and Conventional Rules. So far I have not touched on the idea of expected utility for the valuation of lotteries. Expected utility calculations have been presumed so far. The utility of a state, s0 , which has an undetermined event E , has utility prob(E | s0) u(s0 & E) + prob(~E | s0) u(s0 & ~E) , when the respective probabilities and utilities of E in s0 and ~E in s are known. As calculations of u(s0 & E) and u(s0 & ~E) are defeated by better reasoning, based on more specific evidence as above, so too is the calculation of s0's expected utility improved. There are also deep reasons why expected utility calculations need not be preserved. The structural regularities expressed by defeasible additivity of attributes is not supposed to be a norm or obligation imposed by rationality. It is supposed to have descriptive validity. Or it is supposed to be an arbitrary convention adopted by the specifier of utilities for the sake of convenience, as a shorthand. In contrast, expected utility is supposed to be a norm. However, there are philosophical reasons why the distinction between norm and linguistic convention is untenable (this point is associated with Quine [5]). So we might take the expected utility rule for utilities of lotteries to be defeasible: 'u(s(E)) = x & u(s(~E)) = y & s = { s(E) / prob(E) ; s(~E) / prob(~E) }' >-- 'u(s) = prob(E) x + prob(~E) y' . This introduces some ambiguity, however. Sometimes u(s0) will be calculated by summing contributions of its attributes; sometimes it will be calculated by taking an expectation. There will have to be some way of indicating when one calculation defeats the other. 3. Advantages. 3.1. Tractable Specification. The representation presumes that there is structure to utility assignments. I exhibited an additive model; it could as easily have been multiplicative. Defeasibility contributes to easy specification only inasmuch as it allows nearly universal structure to be used in the specification of utility, with exceptions, and exceptions to exceptions, stated inexpensively. Both classical decision theory and the present view allow the energetic knowledge engineer to specify the utility of each and every state. Of course, that never happens. For state spaces as large as we want planners to consider, assuming structure is the only practical course. Increasing the ability to do utility calculation decreases the need for explicit specification. Taking the expected utility rule to be defeasible also permits recovery from incoherence, i.e., from violations of the classical axioms for preference. If we assert that u(s(E)) = 10 , u(s(~E)) = 0 , prob(E) = .5 , and u( { s(E) / .5 ; s(~E) / .5 } ) = 6 , this is a violation of the expected utility rule, since .5(10) + 0 = 5 is not 6. We can simply claim that we do not accept the classical axioms for this particular calculation and take the value to be 6. This is more of philosophical than practical interest: we will generally want our systems to conform to the classical axioms; we will want to add structure to our existing conception of preference, not delete from it. Moreover, if we want to deviate from the classical axioms, there are better ways of doing so (e.g., using alternative axioms, or formalizing the effects noted by psychologists). 3.2. Flexible Procedure. The major advantage of the representation is that it permits flexibility in the formulation of decision problems. A. Refinement on Demand. The logic of defeasible reasoning permits problem refinements to supersede original formulations. We may start with a very small world, in Savage's sense, in our evaluation of outcome states in our decision tree. For example, of relevant properties P1 , . . ., Pn , we might evaluate each state using just the contribution of P1 . <> If time permits a more detailed analysis, the world can be enlarged so that both P1 and P2 are relevant to the description of states, i.e., are used in the valuation of states. Figure 4 shows the valuation of states based first on {P1} , then on {P1, P2} , for uconst(P1) = 5 and uconst(P2) = 3 . Refinement need not be done uniformly. In figure 4, u(s(1, E)) can be evaluated on {P1, P2} while u(s2, ~E) continues to be evaluated on {P1} . This leads to unorderability of some competing utility calculations. Let prob(E) = .5 . Then u(a1 | s0) = .5(8) + .5(0) = 4 ; u(a2 | s0) = .5(5) + .5(0) = 2.5 . So apparently a1 >> a2 . On a different non-uniform refinement, evaluate u(s(1, E)) on {P1} and u(s(2, ~E)) on {P1, P2} . Then u(a1 | s0) = .5(5) + .5(0) = 2.5 ; u(a2 | s0) = .5(5) + .5(3) = 4 . So apparently a2 >> a1 . This conflict is settled by moving to a calculation more refined than each of the previous two. Of course, we try not to evaluate decisions on insufficiently fine granularities, but we acknowledge that we cannot always commit resources to the analysis at maximally fine granularity; instead, we iteratively improve our reasoning about utilities of states and actions. A different non-comparability arises when one reasons uniformly with the world {P1, P2} , first, then uniformly with the world {P1, P3} . Any disagreement could be resolved with the common refinement, {P1, P2, P3} . B. Bounding on Demand. The ability to calculate prima facie utilities of non-terminal states allows decision trees to be bounded at any leaf, to be defoliated. In the decision tree above, I took s(1, E) to be an outcome state, though perhaps some event F was undecided in s(1, E) , and F is relevant to valuation of states, i.e., ~( u( s(1, E & F) ) = u( s(1, E & ~F) ) ) . If this is true, technically s(1, E) is not a *state*, but is instead a *lottery*, { s(1, E & F) / prob(E & F) ; s(1, E & ~F) / prob(E & ~F) } whose utility is given by the appropriate expectation. However, we often want to calculate u(s(1, E)) as if it were a terminal state, purposefully ignoring for the moment the ramifications of F versus ~F . We do this defeasibly, too; expected utility calculations that explore the ramifications of F supersede the defeasible attribute-adding calculation that was done while pretending s(1, E) was a terminal state. This shortcut is especially convenient when prob(E & F) is unavailable for some reason, e.g., expense of calculation. It is especially plausible when attributes used in the non-probabilistic evaluation of u(s(1, E)) include the fact that "significant-chance-of-F" holds in s(1, E) . This allows us to bound a decision tree wherever desired. As with abstraction, bounding can be done non-uniformly and arbitrarily; there is no reason for all paths from root to leaf to have the same length. Again, there can be non-comparability: when trees bound in different places mandate different decisions, it cannot in general be said which provides the better mandate. Disagreement is resolved by taking a common more extensive tree. <> 4. Future Directions. I have proposed a computationally attractive methodology for representing and manipulating decision trees on spaces large enough to support non-trivial planning. It can be based on any of a number of non-monotonic reasoners that have specificity defeaters, and it requires only two axiom schemata. The explored advantages were 1. practical specification of utilities; 2. abstraction on demand; and 3. bounding on demand. With a little imagination, it should be possible to formalize heuristics in this representation for bounding low-risk paths and guessing which attributes are most significant in determining utilty. This proposal is not related to the satisficing idea of Simon [6], or the keep-things-simple idea of Harman [7]. Ideas from each should lead to improvements of this proposal. It should also be possible to give a qualitative version of defeasible calculation of preference. And it should be possible to reason defeasibly about probabilities and utilities simultaneously. This proposal does avoid the meta-decision regress of deciding how much resource should optimally be committed to the formulation and analysis of a decison problem. Reasoning is simply completed at any level, then iteratively improved with superior reasoning -- based on superior refinement and decision tree exfoliation. If the agent starts with an unreasoned inclination toward some act, then if *no* reasoning, however shallow, is completed by decision time, to defeat this inclination, there is at least an answer to the question of what to do. I see now no reason why researchers in planning and decision theorists cannot compete together. Let the games begin! 5. Bibliography. [1] Charniak, E. and McDermott, D. Introduction to AI, Addison-Wesley, 1984. [2] Jeffrey, R. The Logic of Decision, Chicago, 1965. [3] Savage, L. The Foundations of Statistics, Dover, 1954. [4] Loui, R. "A System of Defeasible Inference," Computational Intelligence 3:3, 1987. [5] Quine, W.V.O. From a Logical Point of View, Harvard, 1953. [6] Simon, H. The Sciences of the Artificial, MIT Press, 1969?. [6] Harman, G. Change in View, MIT Press, 1986.