[PDF][PDF] Multiple Sequence Alignment Using Anytime A*.

R Zhou, EA Hansen - AAAI/IAAI, 2002 - cdn.aaai.org
R Zhou, EA Hansen
AAAI/IAAI, 2002cdn.aaai.org
Alignment of multiple DNA or protein sequences is a central problem in computational
biology. To create an alignment, gaps are inserted into sequences to shift characters to
matching positions and a scoring function is used to rank the biological plausibility of
alignments. Multiple sequence alignments are used to identify homologies among different
species that reveal evolutionary history from a common ancestor. They are also used to
discover genetic causes of certain diseases and to predict protein structure, which has …
Alignment of multiple DNA or protein sequences is a central problem in computational biology. To create an alignment, gaps are inserted into sequences to shift characters to matching positions and a scoring function is used to rank the biological plausibility of alignments. Multiple sequence alignments are used to identify homologies among different species that reveal evolutionary history from a common ancestor. They are also used to discover genetic causes of certain diseases and to predict protein structure, which has significant importance in the design of drugs. The multiple sequence alignment problem can be formalized as a shortest-path problem through a d-dimensional lattice, where d is the number of sequences to be aligned (Gusfield 1997). Dynamic programming is the traditional approach to constructing optimal alignments. Improved performance has recently been achieved using A*. However, the multiple alignment problem presents a difficulty for the classic A* algorithm. Its branching factor of 2d− 1 is so large that the size of the open list dramatically exceeds the number of nodes A* must expand to find an optimal solution.
Two solutions to this problem have been proposed in the literature. Yoshizumi et al.(2000) describe an extension of A*, called A* with Partial Expansion (PEA*). Instead of generating all successors of a node when it is expanded, PEA* inserts only the most promising successors into the open list. The “partially expanded” node is re-inserted into the open list with a revised f-cost equal to the least f-cost of its unexpanded successors, so that it can be re-expanded later. Use of this technique dramatically reduces the size of the open list, and PEA* can solve larger multiple sequence alignment problems than A*. Unfortunately, the reduced space complexity of PEA* is achieved at the cost of node re-expansion overhead. The tradeoff between space and time complexity is adjusted by setting a “cutoff value” C, which determines which successor nodes to add to the open list. Another way to reduce the size of the open list is to prune nodes from the open list if their f-cost is equal to or greater than a previously established upper bound, since such nodes will never be expanded by A*. This approach was first proposed by Ikeda and Imai (1999), who called it enhanced A*(EA*). One way to obtain an upper bound is to use the solu-
cdn.aaai.org
Showing the best result for this search. See all results