Compressive Genomics

CaBLASTP is a suite of homology search tools, powered by compressively-accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate to all known state-of-the-art tools including HHblits, DELTA-BLAST, and PSI-BLAST.
RQS (Read Quality-score Sparsifier) is an efficient de novo quality score compression tool based on traversing the k-mer landscape of NGS read datasets.
CAST is a set of tools that compress data in a way that allows direct computation on the compressed data. This approach reduces the computational task of operating on many highly similar genomes to only slightly more than that of operating on just one. We demonstrate this compressive architecture by implementing accelerated versions of both BLAST and BLAT.

Systems Biology

The Struct2Net program predicts protein-protein interactions (PPI) by integrating structure-based information with other functional annotations, such as GO, co-expression, and co-localization. The structure-based protein interaction prediction is conducted using the protein threading server RAPTOR plus logistic regression.
IsoRank is an algorithm for global alignment of multiple protein-protein interaction (PPI) networks. The intuition is that a protein in one PPI network is a good match for a protein in another network if the former's neighbors are good matches for the latter's neighbors.
Concordia allows you to upload an Affymetrix HGU-133 Plus 2.0 CEL file to obtain the Unified Medical Language System (UMLS) concepts that it is most enriched for based on its similarity with the microarray samples in the Concordia database.
t-sample is an online algorithm for time-series experiments that allows an experimenter to determine which biological samples should be hybridized to arrays to recover expression profiles within a given error bound.
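
The IsoRank intuition described above (two proteins match well if their neighbors match well) can be illustrated with a small fixed-point iteration. This is only a minimal sketch under stated assumptions, not the IsoRank implementation: the function name `neighbor_match_scores`, the damping weight `alpha`, and the uniform prior are illustrative choices made here so that the iteration converges on toy graphs, and the networks are assumed to be undirected with no isolated nodes.

```python
def neighbor_match_scores(adj_a, adj_b, alpha=0.8, iterations=100):
    """Score every cross-network node pair by how well its neighbors match.

    adj_a, adj_b: dict mapping each node to the set of its neighbors
    (assumed undirected). alpha and the uniform prior are assumptions
    of this sketch, added so the iteration settles to a fixed point.
    """
    pairs = [(i, j) for i in adj_a for j in adj_b]
    prior = 1.0 / len(pairs)
    scores = {pair: prior for pair in pairs}
    for _ in range(iterations):
        updated = {}
        for i, j in pairs:
            # Each neighbor pair (u, v) passes on its score, split evenly
            # among all pairs of neighbors of u and neighbors of v.
            support = sum(
                scores[(u, v)] / (len(adj_a[u]) * len(adj_b[v]))
                for u in adj_a[i]
                for v in adj_b[j]
            )
            updated[(i, j)] = alpha * support + (1 - alpha) * prior
        scores = updated
    total = sum(scores.values())
    return {pair: s / total for pair, s in scores.items()}


# Toy example: two star-shaped networks, each with one hub and three leaves.
net_a = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
net_b = {"H": {"p", "q", "r"}, "p": {"H"}, "q": {"H"}, "r": {"H"}}
scores = neighbor_match_scores(net_a, net_b)
best_pair = max(scores, key=scores.get)
print(best_pair)
```

In this toy case the two hubs receive the highest score, because every neighbor of one hub can be paired with a neighbor of the other. Turning such pairwise scores into an actual network alignment requires a separate matching step, which this sketch does not attempt.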


HapTree is a polyploid haplotype assembly tool based on a statistical framework.
ARACHNE is a program for assembling data from whole genome shotgun sequencing experiments. It was designed for long reads from Sanger sequencing technology, and has been used extensively to assemble many genomes, including many that are large and highly repetitive.
GLASS aligns large orthologous genomic regions using an iterative global alignment system. Rosetta identifies genes based on conservation of exonic features in sequences aligned by GLASS.
RNAiCut - Automated Detection of Significant Genes from Functional Genomic Screens.
MinoTar - Predict microRNA Targets in Coding Sequence.

Structural Bioinformatics

SMURFLite builds on SMURF, generalizing the Markov random field to potentially any beta-structural protein by simplifying the MRF topology while augmenting its sensitivity using a model of "simulated evolution." SMURFLite can detect homologous beta-structural proteins in superfamilies with adequate structural representatives.
Multicoil2 predicts the location and oligomerization state (two vs. three helices) of coiled coils in protein sequences. Multicoil2 combines Multicoil's pairwise correlations with an HMM, resulting in a Markov random field. Multicoil2 significantly improves coiled-coil detection and dimer vs. trimer prediction over the original Multicoil. The original Multicoil program is still available for use.
SMURF (Structural Motifs Using Random Fields) detects homologous beta-propeller proteins. It extends profile HMMs to Markov random fields in order to model non-local hydrogen-bond interactions in the beta-propeller folds.
MATT is a multiple protein structure alignment program. It uses local geometry to align segments of two sets of proteins, allowing limited bends in the backbones between the segments.
The BetaWrap program detects the right-handed parallel beta-helix super-secondary structural motif in primary amino acid sequences by using beta-strand interactions learned from non-beta-helix structures.
Wrap-and-pack detects beta-trefoils in protein sequences by using both pairwise beta-strand interactions and 3-D energetic packing information.
The BetaWrapPro program predicts right-handed beta-helices and beta-trefoils by using both sequence profiles and pairwise beta-strand interactions, and returns coordinates for the structure.
The MSARi program identifies conserved RNA secondary structure in non-coding RNA genes and mRNAs by searching multiple sequence alignments of a large set of candidate catalogs for correlated arrangements of reverse-complementary regions.
The Paircoil2 program predicts coiled-coil domains in protein sequences by using pairwise residue correlations obtained from a coiled-coil database. The original Paircoil program is still available for use.
The LearnCoil Histidine Kinase program uses an iterative learning algorithm to detect possible coiled-coil domains in histidine kinase receptors.
The LearnCoil-VMF program uses an iterative learning algorithm to detect coiled-coil-like regions in viral membrane-fusion proteins.
The Trilogy program discovers novel sequence-structure patterns in proteins by exhaustively searching through three-residue motifs using both sequence and structure information.
The ChainTweak program efficiently samples from the neighborhood of a given base configuration by iteratively modifying a conformation using a dihedral angle representation.
The TreePack program uses a tree-decomposition based algorithm to solve the side-chain packing problem more efficiently. This algorithm is more efficient than SCWRL 3.0 while maintaining the same level of accuracy.
PartiFold: Ensemble prediction of transmembrane protein structures. Using statistical mechanics principles, partiFold computes residue contact probabilities and samples super-secondary structures from sequence only.
tFolder: Prediction of beta sheet folding pathways. Predict a coarse-grained representation of the folding pathway of beta sheet proteins in a couple of minutes.
RNAmutants: Algorithms for exploring the RNA mutational landscape. Predict the effect of mutations on structures and reciprocally the influence of structures on mutations. A tool for molecular evolution studies and RNA design.
AmyloidMutants is a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability.

Population Genetics

PC-select calculates GWAS association statistics using a data-adaptive GRM that improves power over standard mixed models while simultaneously avoiding confounding from population stratification.
MixMapper is an efficient, interactive method for constructing phylogenetic trees, including admixture events, using SNP data. MixMapper uses a two-phase approach by first building an unadmixed scaffold tree and then adding admixed populations by looking at allele frequency divergences. MixMapper expresses branch lengths in easily interpretable drift units.
ALDER provides a linkage-disequilibrium-based three-population test for admixture. ALDER extends ROLLOFF to provide a new form of weighted LD statistic, a statistical test for admixture, determination of minimum genetic distance at which to start curve fitting, and improved weighted LD curve fitting.