Fall 04

Computational biology is the application of computational techniques to problems in biology, many of which involve DNA and proteins. Traditionally, people from various disciplines, such as computer science, mathematics and statistics, formulated and addressed these problems within their own disciplines. More recently, multidisciplinary collaborations that include biologists and biochemists have become the norm.

The main purpose of the course is to expose students to various active research areas in computational biology. Anyone interested in computational biology is encouraged to take the course. For most topics, considerable time will be spent on presenting the latest research ideas, mostly from the computer science point of view. Emphasis will be placed on problem formulation, where many problems in genomics and proteomics will be seen as graph-theoretic or optimization problems. In each area, the lectures briefly describe the classical approaches and then focus on the newest computational approaches from research papers.

- Approaches for DNA and EST sequence assembly, its formulation as the shortest common superstring problem, and other heuristic approaches.
- Computational formulations and algorithms for biological sequence comparison problems, including the longest common subsequence formulation, pairwise and multiple sequence alignment approaches, and techniques for biological database search.
- Combinatorial and statistical approaches to motif finding and its application to finding regulatory sites, including statistical optimization techniques, clique-based graph-theoretic formulations, tree-based branch-and-bound techniques, and the random projection technique.
- Computational approaches to gene finding and gene structure prediction, including *ab-initio* and similarity-based approaches.
- Scalable algorithms for comparative genomics and whole genome comparisons.
- Study of genome rearrangements as mathematical operations on permutations and inferring evolutionary relationships as phylogenetic trees.
- RNA and protein structure prediction and techniques for studying protein folding pathways with or without a known native state.
- Probe selection problem for microarray design and approaches for clustering microarray expression data.
- Computational proteomics and finding similar substructures in biological networks by graph-based methods.
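
As a small illustration of the kind of problem formulation listed above, the longest common subsequence (LCS) view of sequence comparison can be sketched with the classical dynamic program. This is a minimal sketch for orientation only and is not taken from the course materials; the function name `lcs` and the example strings are assumptions chosen for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Trace back from dp[m][n] to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    # Hypothetical toy input, not real sequence data.
    print(lcs("AGGTAB", "GXTXAYB"))
```

The table takes O(mn) time and space for inputs of lengths m and n, which is why database-scale search relies on the heuristic techniques mentioned in the same topic.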

Homework Assignments (40%)

- Consists of short written assignments handed out every one or two weeks. These exercises will emphasize creativity in problem solving.

Presentation (40%)

- Towards the end of the semester, each student will give a short presentation either on a paper of interest or on a survey of a research area.

Final Exam (20%)

Computational Biology books

- Baldi P. and Brunak S. (2001) *Bioinformatics: The Machine Learning Approach, Second Edition*. The MIT Press.
- Clote P. and Backofen R. (2000) *Computational Molecular Biology: An Introduction*. John Wiley & Sons.
- Durbin R., Eddy S., Krogh A., and Mitchison G. (1998) *Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids*. Cambridge University Press.
- Gusfield D. (1997) *Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology*. Cambridge University Press.
- Pevzner P.A. (2000) *Computational Molecular Biology: An Algorithmic Approach*. The MIT Press.
- Setubal J.C. and Meidanis J. (1997) *Introduction to Computational Molecular Biology*. PWS Publishing Company.
- Waterman M.S. (1995) *Introduction to Computational Biology: Maps, Sequences and Genomes*. Chapman & Hall.

Computer Science books

- Cormen T.H., Leiserson C.E., Rivest R.L., and Stein C. (2001) *Introduction to Algorithms, Second Edition*. The MIT Press.

Biology books

- Lodish H., Berk A., Zipursky S.L., Matsudaira P., Baltimore D., and Darnell J. (2000) *Molecular Cell Biology, Fourth Edition*. W.H. Freeman.