18.417: Introduction to Computational Molecular Biology
With the availability of genomic, expression, and structural data, math
and computer science have changed the face of modern biology. This course
introduces the basic computational methods used to understand the cell on a
molecular level. We first focus on sequence alignment algorithms: dynamic
programming, hashing, suffix trees, Gibbs sampling. We then cover
computational approaches to: genetic and physical mapping, DNA sequencing,
genome assembly and gene annotation; RNA expression and secondary
structure; protein structure and folding; and molecular interactions and
dynamics.
This page will contain lectures and handouts from Fall 2001. Lecture notes
from previous offerings can be found online for the Fall of 1999 and
the Spring of
1998.
Because the content of this class will differ markedly from the last time
it was offered, the new lecture notes this year will appear some time after
the material has been covered. Even the lecture notes on similar material
are being updated. However, some of you may want to be able to look at the
lecture notes from the last time the course was offered - this may be a
valuable addition to your own notes for the purposes of doing the homework.
Schedule
(class days in red, holidays in blue, reg/add/drop dates in green)
September October November December
Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
1 1 2 3 4 5 6 1 2 3 1
2 3 4 5 6 7 8 7 8 9 10 11 12 13 4 5 6 7 8 9 10 2 3 4 5 6 7 8
9 10 11 12 13 14 15 14 15 16 17 18 19 20 11 12 13 14 15 16 17 9 10 11 12 13 14 15
16 17 18 19 20 21 22 21 22 23 24 25 26 27 18 19 20 21 22 23 24 16 17 18 19 20 21 22
23 24 25 26 27 28 29 28 29 30 31 25 26 27 28 29 30 23 24 25 26 27 28 29
30 30 31
Lectures
Introduction
Sequence Alignments
- Lecture 2 (9/13/01) (Slides, Scribe Notes): Global alignments (Needleman-Wunsh).
- Lecture 3 (9/18/01): Local alignments (Smith-Waterman).
(Slides,
ps,
pdf)
- Lecture 4 (9/20/01): k-mer based methods (BLAST).
(Slides,
ps,
pdf)
- Lecture 5 (9/25/01): Advanced alignment methods (Gibbs sampling, suffix trees).
(Slides,
ps,
pdf)
Genome
- Lecture 6 (9/27/01): NOVA on genomics. (optional)
(Online material.
- Lecture 7 (10/2/01) : Genetic mapping.
(Slides,
ps,
pdf)
- Lecture 8 (10/4/01) : Physical mapping.
(Slides,
ps,
pdf)
- Lecture 9 (10/11/01): Genome I: Recombinant DNA and Sequencing technologies.
(Slides,
ps,
pdf)
- Lecture 10 (10/16/01): Genome II: Whole-genome shotgun (Arachne) and clone-by-clone sequencing (Walking).
(Slides,
ps,
pdf)
- Lecture 11 (10/18/01): Genome III: Population genomics, SNP discovery, disease mapping.
(Slides,
ps,
pdf)
- Lecture 12 (10/23/01): Genome IV: Gene recognition (Genscan) and cross-annotation (Rosetta).
(Slides,
Slides1,
Slides2,
ps,
pdf)
Transcriptome and Evolution
- Lecture 13 (10/25/01): Regulation I: Transcription regulation, microarray technology, expression clustering.
(Slides,
ps,
pdf)
- Lecture 14 (10/30/01): Regulation II: DNA binding sites, location analysis, regulatory motif prediction.
(Slides,
ps,
pdf)
- Lecture 15 (11/1/01): Ribozymes, RNA World, RNA secondary structure, non-coding RNAs.
(Slides,
ps,
pdf)
- Lecture 16 (11/6/01): Evolution: RNA world, multiple alignments, phylogeny.
(Slides,
ps,
pdf)
Protein Structure
- Lecture 17 (11/8/01): Introduction to protein structure.
(Slides,
ps,
pdf)
- Lecture 18 (11/13/01): Protein motifs I: hidden Markov models for MSA.
(Slides,
ps,
pdf)
- Lecture 19 (11/15/01): Protein motifs II: prediction (coiled-coil and beta-helix motifs).
(Slides,
ps,
pdf)
- Lecture 20 (11/20/01): Threading.
(Slides,
ps,
pdf)
Protein Dynamics
- Lecture 21 (11/27/01): Molecular mechanics.
(Slides,
ps,
pdf)
- Lecture 22 (11/29/01): Side-chain packing.
(Slides,
ps,
pdf)
- Lecture 23 (12/4/01): Drug discovery tools.
(Slides,
ps,
pdf)
- Lecture 24 (12/6/01): Lattice models for protein folding.
(Slides,
ps,
pdf)
- Lecture 25 (12/11/01): Simulating virus shell assembly.
(Slides,
ps,
pdf)
Handouts
Homework and Grading
Problem Sets:
-
Bi-weekly problem sets will be due for each section
listed above. The homeworks are meant for students to get more practice
with the material, use existing bioinformatics tools, implement algorithms
presented in class, think more on theoretical questions. The problem sets
will be handed out at the end of class before each section and due at the
beginning of class before the next section.
Final Project:
-
The first five problem sets will be on specific topics, and the last one
will be more open-ended in form of a final project. The final project can
be studying a paper in depth, surveying a field, implementing/extending an
algorithm, or proving theoretical results on a problem of the student's
choice.
Groups:
-
Working in groups of two is encouraged, especially among students of
different backgrounds (biology or computation). Each student will have to
submit their own write-up of the solutions. Final projects can be done in
larger groups if justified.
Scribes and Editors:
-
In addition to the five problem sets, each student is responsible for
scribing one lecture and editing one lecture. If you are interested in
scribing specific topics, let us know by email, otherwise, volunteers will
be chosen at the beginning of each lecture. It's thanks to the scribe and
editor volunteers that lecture notes are made available online. Details on
scribing will be distributed at the first lecture.
Further Reading: Reference Books
No book is mandatory for the class, and no book will treat all the
topics we cover. Since people ask us for book advice though, here's a
list of books you can consult for your personnal interest, and for further
information on individual topics.
- Introduction to Computational Molecular Biology,
Setubal, Meidanis
- Biological Sequence Analysis, Durbin, Eddy, Krogh,
Mitchison
- Algorithms on Strings, Trees and Sequences, Dan Gusfield
- Molecular Cell Biology, Lodish, Berk, Zipursky,
Matsudaira, Baltimore, Darnell
- Introduction to Protein Structure, Branden, Tooze
Bonnie Berger (bab@mit.edu) and Manolis Kamvysselis (manoli@mit.edu)