
The Ensemble Protein Database


A public resource housed at the University of Pittsburgh,
Department of Computational Biology
Curated by Prof. Daniel M. Zuckerman
Co-created by Prof. F. Marty Ytreberg (University of Idaho)

Skip the verbiage – take me right to the structures (link)

What is the “Ensemble Protein Database”?  It is a recently begun (July 2006) database of ensembles, i.e., sets, of protein structures generated by computer simulation.  It does not, however, contain ordinary molecular dynamics trajectories.

NEW!  Statistical rotamer libraries of side-chain configurations (link).

Who needs ensembles?  For starters, biology needs ensembles.  If proteins remained frozen in unique structures (such as those found in the Protein Data Bank), there would be no life.  By and large, proteins are little machines that function by moving.  Ensembles of structures are a good way of describing such motions.  We hope these ensembles will also be useful in computational drug design (to represent receptor flexibility) and in basic biochemical endeavors, perhaps in understanding allostery.

But what about NMR ensembles?  In fact, this database was largely motivated by doubts surrounding NMR ‘ensembles.’  NMR ensembles are not representative of the true range of solution motions because the computational techniques used to generate them are biased toward homogeneity, i.e., toward artificial similarity among the generated structures.  NMR ensembles typically are generated via simulated annealing (a non-statistical protocol) followed by selection of an arbitrary number of arbitrarily low-energy structures.  This is not a recipe for equilibrium sampling.

What is meant by “ensemble”?  In statistical physics, a field whose principles describe the way proteins fluctuate in solution, the word ensemble has a quite precise meaning, but it is used less carefully throughout structural biology.  The EPDB will eventually include two types of ensembles: (1) the Boltzmann-weighted ensembles expected based on statistical physics (what this means) and (2) ad hoc ensembles, which are not Boltzmann-weighted but nevertheless include a broad range of motions and a very crude approximation to Boltzmann weighting (explained here).
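
For readers unfamiliar with the term, the following is a minimal sketch of what Boltzmann weighting means for a small, discrete set of structures.  It is an illustration only and not the procedure used to build the EPDB ensembles: the function name, the example energies, and the choice of kcal/mol units at 300 K are all assumptions made for this example.  Each structure with energy E receives a relative weight proportional to exp(-E / kT), and the weights are normalized to sum to one.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
K_B_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies_kcal_per_mol, temperature_k=300.0):
    """Return normalized Boltzmann weights for a list of energies.

    Hypothetical helper for illustration: weight_i is proportional to
    exp(-E_i / kT), normalized so that all weights sum to 1.
    """
    if not energies_kcal_per_mol:
        raise ValueError("at least one energy is required")
    beta = 1.0 / (K_B_KCAL_PER_MOL_K * temperature_k)
    # Subtract the minimum energy before exponentiating for numerical stability;
    # this does not change the normalized weights.
    e_min = min(energies_kcal_per_mol)
    factors = [math.exp(-beta * (e - e_min)) for e in energies_kcal_per_mol]
    total = sum(factors)
    return [f / total for f in factors]


if __name__ == "__main__":
    # Made-up energies for three hypothetical structures.
    example_energies = [0.0, 1.0, 2.5]
    for energy, weight in zip(example_energies, boltzmann_weights(example_energies)):
        print(f"E = {energy:4.1f} kcal/mol  weight = {weight:.4f}")
```

In an ensemble that is already Boltzmann-distributed, each structure appears with a frequency proportional to this weight, so the stored structures can be treated as equally weighted samples.  An ad hoc ensemble, by contrast, carries no such guarantee.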

Which proteins are included?  Initially, only small proteins (< 200 residues) and only ensembles generated in the Zuckerman and Ytreberg groups will be included.

To the ensembles!

Supported by the National Science Foundation and National Institutes of Health