CSCE 625 Homework Assignment #1 due: Tues, 9/15/09 This homework will give you some experience implementing search algorithms based on Ch. 3. This is primarily a programming assignment. You may use any language you like. You will have to turn in to the TA both a copy of your code, and a short written report describing your results (showing examples, and reporting some statistics). --------------------------- 1) Write a program to find longest-paths in random directed acyclic graphs (DAGs). Shortest-paths are a well-known problem in Computer Science. In fact, Dijkstra's single-source shortest-path algorithm can be thought of as a uniform-cost search in a weighted undirected graph, and the Floyd-Warshall algorithm can solve the all-pairs shortest path problem in polynomial time using dynamic programming. In contrast, the longest-path problem is NP-complete for undirected graphs, implying that a complete and tractable algorithm does not exist. However, for directed graphs, a variant of uniform-cost (selecting for increasing path length) will work. An example where you might want to identify longest paths is the contig assembly problem in DNA sequencing. Imagine that a long chromosome (~million nucleotides) is broken into overlapping fragments of ~500 bases long. Consider a graph where the nodes are the fragments, and edges between them represent significant overlaps (where the downstream end of one fragment has the same sequence as the upstream end of another fragment). The "contig" assembly problem is the goal to the longest connectable sequences of fragments, representing large contiguous portions of the chromosome. In this problem, you will implement a simple program that generates random DAGs and then determine some properties of their paths. A DAG can be generated by starting with a sequence of N ordered nodes (e.g. 1..N), and then randomly picking M edges (i,j) such that 1<=i python longest_path.py usage: python longest_path.py 2 sun> python longest_path.py 10 10 5 0 True (N-1 is reachable from 0) edges: [(4, 8), (8, 9), (0, 4), (1, 4), (4, 8), (2, 5), (6, 7), (3, 6), (7, 8), (5, 7)] shortest path: [0, 4, 8, 9] longest path: [0, 4, 8, 9] 1 True edges: [(3, 9), (3, 9), (5, 9), (4, 5), (7, 9), (0, 8), (5, 9), (8, 9), (4, 9), (6, 7)] shortest path: [0, 8, 9] longest path: [0, 8, 9] 2 False (N-1 is not reachable from 0) edges: [(5, 8), (2, 9), (2, 8), (1, 7), (8, 9), (5, 8), (2, 5), (6, 7), (6, 8), (5, 7)] 3 True edges: [(5, 6), (2, 4), (5, 9), (4, 9), (0, 8), (6, 9), (8, 9), (7, 8), (0, 7), (7, 9)] shortest path: [0, 8, 9] longest path: [0, 7, 8, 9] 4 False edges: [(3, 6), (3, 7), (1, 3), (0, 1), (4, 7), (1, 3), (5, 7), (1, 5), (3, 8), (2, 7)] nstates=10, nedges=10, reachable=3/5=0.60 avg len of shortest path: 3.33, stdev=0.47 avg len of longest path: 3.67, stdev=0.47 --------------------------- 2) Implement a search algorithm to solve the 8-puzzle (as described in the textbook). Start by choosing a representation for states, such as a linear array of length 9, or a 3x3 array. Next implement a successor() function, which generates the set of all legal moves, by sliding an adjacent tile into the current empty space. Note that there are 2-4 successors of every state. Estimate the size of the search space (a simple upper-bound is fine, although the textbook describes how not all states are reachable, and in fact can be broken into two disjoint subsets). goal state: 1 2 3 4 5 6 7 8 The successor function can be used to generate random problems by starting with the goal state and "scrambling" it by applying a sequence of random moves. In fact, the number of random moves can be used to control problem difficulty (puzzles generated by 5 random moves require at most 5 moves to solve them, though often less; puzzles generated by 50 random moves can take up to 25-30 moves to solve). Start by writing a simple BFS search algorithm using a queue. Test the algorithm on randomly generated puzzles and calculate statistics on the average length of solution, the maximum queue size, and the number of goal-tests performed. Next a simple modification of the BFS procedure will produce a DFS search. Use DFS to solve a sample of random problems and compare the statistics listed above to those for BFS. How does the optimal solution length scale up with the number of scambling steps? An example output is attached. You will probably find that the maximum queue size is smaller for DFS than BFS, but the solutions are much longer. In fact, there may be some problems that cannot be solved in a reasonable amount of time (or memory) by either method. Next, implement iterative-deepening search. Do this by writing a simple depth-limited search, and writing a wrapper around it to progressively increase the depth limit. See if you can solve even more complex problems (with deeper solutions) than by BFS or DFS. In this problem, checking for visited states will be crucial, since may alternative sequences of moves can produce the same state (in addition, operators are reversible). You will probably want to use a hash table to keep track of visited states efficiently. Also, remember that you have to keep track of path information in order to print out the final solution (sequence of moves) that solves the puzzle (returns to goal state) and measure its length. The checking for visited states must be done at the right time; it can be done prior to putting nodes in the queue, and also when they are de-queued. Even with this, there may be some problems that cannot be solved in a reasonable amount of time or space. You will probably need to implement some limits, such as when the queue or number of goal tests gets too large, at which point the program will be force to fail. Be careful not to let a runaway process eat up all the memory on your computer. ### Example output ### 3 sun> python tiles.py usage: python tiles.py 4 sun> python tiles.py BFS 20 goal: 1 2 3 4 0 5 6 7 8 init: 1 2 3 7 0 6 4 8 5 path: 1 2 3 7 0 6 4 8 5 1 2 3 7 6 0 4 8 5 1 2 3 7 6 5 4 8 0 1 2 3 7 6 5 4 0 8 1 2 3 7 0 5 4 6 8 1 2 3 0 7 5 4 6 8 1 2 3 4 7 5 0 6 8 1 2 3 4 7 5 6 0 8 1 2 3 4 0 5 6 7 8 solution depth: 8 num goal tests: 326 max queue size: 203