Heuristic Search Planners
2
USC INFORMATION SCIENCES INSTITUTE
Planning as heuristic search
Use standard search techniques, e.g. A*, best-first, hill-climbing etc. Attempt to extract heuristic state evaluator automatically from the Strips encoding of the domain Here, generate relaxed problem by assuming action preconditions are independent3
USC INFORMATION SCIENCES INSTITUTE
Recap: A* search
Best-first search using node evaluationf(n) = g(n) + h(n)
where
g(n) = accumulated cost
h(n) = estimate of future cost
For A*, h(.) should never overestimate the cost. In this case, the solution will be optimal. Then h is called an admissible heuristic.4
USC INFORMATION SCIENCES INSTITUTE
Derive cost estimate from a relaxed planning problem
Ignore the deletes on actions BUT 00 still NP-hard, so approximate: For individual propositions p:d(s, p) = 0 if p is true in s
= 1 + min(d(s, pre(a))) otherwise
[min over actions a that add p]
5
USC INFORMATION SCIENCES INSTITUTE
Cost of a conjunction
How to compute d(s,pre(a)) or d(s,G) ? Different options: Additive: d(s, P) = sum d(s, p) over p in P Max: d(s, P) = max d(s, p) Then h(s) = d(s, G) Can compute d(.,.) in polynomial time6
USC INFORMATION SCIENCES INSTITUTE
Admissibility and information
Is h+ (additive version) admissible? How about h-max?7
USC INFORMATION SCIENCES INSTITUTE
Admissibility and information II
If h+ is not admissible, why would we use it rather than h-max?8
USC INFORMATION SCIENCES INSTITUTE
HSP algorithm overview
Hill-climbing search + restarts if plateau for too long Some ad hoc choices for the planning competition Hill-climbing search is not complete9
USC INFORMATION SCIENCES INSTITUTE
HSP2 overview
Best-first search, using h+ Based on WA* - weighted A*:f(n) = g(n) + W * h(n).
If W = 1, it00 A* (with admissible h).
If W > 1, it00 a little greedy 00generally finds solutions faster, but not optimal.
In HSP2, W = 510
USC INFORMATION SCIENCES INSTITUTE
Experiments
Does ok compared with IPP (Graphplan derivative) and Blackbox.11
USC INFORMATION SCIENCES INSTITUTE
Regression search
Motivation for HSPr HSP and HSP2 spend up to 80% of their time computing the evaluation function. Slow to generate nodes compared to other heuristic search systems. Search backwards from goal, then re-use cost estimates from s0 to the goal, since we always have a single start state s0. Common wisdom: regression planning is good because the branching factor is much lower12
USC INFORMATION SCIENCES INSTITUTE
HSPr problem space
States are sets of atoms (correspond to sets of states in original space) initial state s0 is the goal G Goal states are those that are true in s0 Still use h+. h+(s) = sum g(s0, p)13
USC INFORMATION SCIENCES INSTITUTE
Mutexes in HSPr
Problem: many of the regressed goal states are 00mpossible0000prune them with mutexes E.g in blocksworld (on(c,d), on(a,d), ..) is probably unreachable.14
USC INFORMATION SCIENCES INSTITUTE
Mutexes in HSPr
First definition:A set M of pairs R = {p, q} is a mutex set if
(1) R is not true in s0
(2) for every op o that adds p,
o deletes q
Sound, but too weak.15
USC INFORMATION SCIENCES INSTITUTE
Mutexes in HSPr, take 2
Better definition:A set M of pairs R = {p, q} is a mutex set if
(1) R is not true in s0
(2) for every op o that adds p,
either o deletes q
or o does not add q, and for some precond r of o,
{r, q} is in M.
Recursive definition allows for some interaction of the operators
16
USC INFORMATION SCIENCES INSTITUTE
Computing mutex sets
Start with some set of potential mutex pairs Delete any that don00 satisfy (1) and (2) above Keep going until you don00 delete any more Initial set? 00could be all pairs (usually too expensive)17
USC INFORMATION SCIENCES INSTITUTE
Initial set of potential mutexes
Ma = { {p, q} | some action adds p and deletes q} Mb = { {r, q} | {p, q} is in Ma, some action adds p,and has r in the precondition}
Initial set = Ma u Mb Mutex set derived from Ma u Mb is M*18
USC INFORMATION SCIENCES INSTITUTE
HSPr algorithm
WA* search using h+(s0) and M* W = 5 as before Prune states that contain pairs in M*19
USC INFORMATION SCIENCES INSTITUTE
Experiments comparing HSP2 and HSPr
Sometimes HSPr does better, sometimes HSP2 does better. Why? Two reasons (per B & G): Still have spurious states Since HSP2 recomputes the estimate in each state, it actually has more information20
USC INFORMATION SCIENCES INSTITUTE
Evidence for spurious states
Re-run HSPr using mutex set derived from all possible pairs. No difference in most domains Improvement in tire-world domain (with complex interactions) Slows down in logistics domain21
USC INFORMATION SCIENCES INSTITUTE
Branching factor
Varies widely from instance to instance. (Always seems greater in forward chaining though) Performance of HSP2 vs HSPr doesn00 seem to correlate with branching factor Other factors dominate, e.g. informedness of heuristic22
USC INFORMATION SCIENCES INSTITUTE
Derivation of heuristics
h+ has problems when there are positive or negative interactions Can efficient heuristics better capture the interactions? H^2 00 use the cost of the most expensive pair of goals Still admissible, more informative than hmax, still cheap Room for domain-dependent options?23
USC INFORMATION SCIENCES INSTITUTE
Comparing HSPr and Graphplan
Both search forwards in relaxed space, then backwards Planning graph encodes an admissible heuristic: hg(s) = j if j is the first level where s appears without a mutex Graphplan encodes IDA* efficiently as solution extraction 00but this makes it hard to use other search algorithms.24
USC INFORMATION SCIENCES INSTITUTE
Overall
Planning as heuristic search: HSP family are elegant, quite efficient for domain-independent, and use clear principles of search Simple algorithms and relatively thorough analysis 00make it easy to consider lots of extensions25
USC INFORMATION SCIENCES INSTITUTE
Ways to extend
Improving automatically generated heuristics More flexible action representations Probably easier to encode in forwards than backwards search Principles and format for encoding domain-dependent heuristics Both the estimate function and other control