As this blog reflects an intersection of my interests in technology and general science, I figured an ideal topic to start off with would be the Protein Folding Problem.
You’ve probably heard about the protein folding problem, although you may not have realized it. The most common references online to protein folding usually involve Folding@home (a distributed computing approach to the problem similar to SETI@home). Usually these references are entitled “Use your PS3 to cure cancer” or some other such overstatement. While there are no doubt real implications to understanding protein folding (including possible cancer therapeutics), such overstatements aren’t helpful in understanding exactly what you’re making your PS3 do.
So what is the protein folding problem? In the simplest terms, it’s the question of how a protein adopts its 3D conformation. If you can remember back to basic high school biology, you might recall that the central dogma of biology is this:
DNA->RNA->protein
DNA is the set of instructions for making everything in a cell. RNA is an intermediate: essentially a copy of a specific subset of the instructions in your DNA, which is then read to assemble a protein (the biologists amongst you may take exception to this oversimplification, but for now we’ll stick with it). You may know of proteins as they pertain to your diet, but in actuality proteins are much more: they are the molecular workhorses of the cell. Most of the chemistry and processes carried out in a cell are done by proteins. These include things like breaking down your food for energy, or recognizing viral particles. In the cell, proteins get stuff done.
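To make the DNA->RNA->protein flow concrete, here is a minimal Python sketch of it. The DNA string is a made-up example, the codon table is deliberately partial (only the entries this example needs, taken from the standard genetic code), and the function names are my own. The output uses the one-letter amino acid abbreviations common in biology (D = aspartic acid, Y = tyrosine, K = lysine). Real transcription and translation involve far more machinery than a string replace and a dictionary lookup.

```python
# Partial codon table: only the codons this example needs (standard genetic code).
CODON_TABLE = {
    "GAU": "D", "GAC": "D",               # aspartic acid
    "UAU": "Y", "UAC": "Y",               # tyrosine
    "AAA": "K", "AAG": "K",               # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",   # stop codons
}


def transcribe(dna: str) -> str:
    """DNA coding strand -> mRNA: the same letters, with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three letters (one codon) at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence, chosen only for illustration.
dna = "GACTACAAAGACGATGACGACAAGTAA"
print(translate(transcribe(dna)))  # DYKDDDDK
```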
Ok, so what’s this whole folding business? Well, proteins are composed of 20 basic building blocks called amino acids. String these amino acids together in a linear chain and you get a protein. In biology, these amino acids are represented by single letters; for instance, DYKDDDDK represents an 8 amino acid protein starting with D and ending with K. Because amino acids are (essentially) the only components of proteins, every protein can be represented by a sequence of letters. Some proteins are very large (hundreds or even thousands of letters) while others are small (like the 8 letter protein above). Given that there are 20 standard amino acid possibilities at each position (so 20^n possible sequences of length n) and no fixed limit on the length of the sequence, there is an infinite number of potential protein sequences.
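If it helps to see the string representation in action, here is a small Python sketch that checks whether a string is a valid protein sequence and counts how many sequences of a given length are possible. The function names are my own, and I'm only considering the 20 standard amino acids.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_protein(seq: str) -> bool:
    """True if every letter in the string is one of the 20 standard codes."""
    return len(seq) > 0 and all(ch in AMINO_ACIDS for ch in seq.upper())


def count_sequences(length: int) -> int:
    """Number of distinct sequences of a given length: 20 choices per position."""
    return len(AMINO_ACIDS) ** length


print(is_valid_protein("DYKDDDDK"))  # True
print(is_valid_protein("DYKDDDDX"))  # False (X is not a standard amino acid)
print(count_sequences(8))            # 25600000000
```

Even for the tiny 8 letter example there are over 25 billion possible sequences, and the number grows by a factor of 20 with every letter added.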
Now of course the human genome is finite, and you might recall we’ve gone ahead and sequenced the whole thing. We’ve also gotten pretty good at determining which parts of the DNA are actually turned into protein. So essentially we know what almost all the proteins in your cells are, at least on the level of these string-like representations. However, your proteins do stuff based on their actual 3D structure. That is, when I write DYKDDDDK, that doesn’t mean there’s a happy trail of letters in your body wandering about the cell and carrying out their business. Rather, each letter represents a specific chemical structure. D, for instance, is aspartic acid, which looks like this:
Y is Tyrosine which looks like this:
You can string them together and see what DY looks like in 2D, but of course the world is actually in 3D, and each of those oxygens, carbons, etc. occupies a position in space relative to the others.
And therein lies the protein folding problem. How do you translate an amino acid sequence (i.e. DYKDDDDK) into a three dimensional structure? Well, it’s not an easy problem, as there isn’t a clear set of rules. The position of each part of each amino acid can be influenced by a whole host of factors, including the amino acids neighboring it, the presence of water, and many other things. You can start to see that the problem gets very complex, much too complex for a mere human to work through by hand.
Enter the computers. Looking at 3D structures of proteins that people have determined experimentally, there’s no clear set of rules (although some trends become evident). However, this is science, and there is in theory some set of first principles to start from (yes, even in biology). Things like the electric charge of these amino acids are known: things that are positive will be attracted to things that are negative, and uncharged, nonpolar things will want to pack in the interior and hide from the polar water molecules. These properties can be used to calculate the energy of specific conformations. You can do multiple iterations, find the most stable (i.e. lowest energy) conformation, and use it as your predicted protein structure. This is essentially what Folding@home is doing (at least this is one approach to the problem, and the one I believe that Folding@home is taking). You can even do this for proteins whose structures are already known and see how good your algorithm is at returning the known structure.
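To give a feel for the "try conformations, score their energy, keep the lowest" idea, here is a toy Python sketch. It is not Folding@home's code or method. It uses a well-known classroom simplification (the 2D "HP" lattice model), in which each amino acid is reduced to either H (hydrophobic, wants to hide from water) or P (polar), the chain can only bend at right angles on a grid, and the energy drops by 1 for every pair of H residues that sit next to each other on the grid without being neighbors in the chain. All names and sequences here are made up for illustration.

```python
from itertools import product

# Grid steps the chain can take from one residue to the next.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def build_path(directions):
    """Place residues on a 2D grid; return None if the chain crosses itself."""
    pos = (0, 0)
    path = [pos]
    occupied = {pos}
    for d in directions:
        dx, dy = MOVES[d]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in occupied:
            return None
        occupied.add(pos)
        path.append(pos)
    return path


def energy(seq, path):
    """-1 for each H-H pair adjacent on the grid but not adjacent in the chain."""
    index = {p: i for i, p in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        # Look only right and up so that each pair is counted exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                total -= 1
    return total


def fold(seq):
    """Brute force: try every conformation and keep the lowest-energy one."""
    best_energy, best_path = None, None
    for directions in product("UDLR", repeat=len(seq) - 1):
        path = build_path(directions)
        if path is None:
            continue
        e = energy(seq, path)
        if best_energy is None or e < best_energy:
            best_energy, best_path = e, path
    return best_energy, best_path


best_energy, best_path = fold("HHPPHH")  # made-up 6 residue chain
print(best_energy)  # -2
print(best_path)
```

Brute force only works here because the chain is tiny: a chain of n residues has 4^(n-1) candidate step sequences to check on this grid, which is exactly why real proteins need smarter search methods and a lot of computing power.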
So why does anyone care what a protein looks like in 3D? Well, the 3D structure is important for understanding the function of the protein, and that’s a very important thing to understand. If your protein is a drug target (say an HIV protein), knowing its 3D structure could help you design a drug that binds a specific structure on the protein and thereby inhibits its function (i.e. viral replication). A number of diseases result from misfolded proteins (cystic fibrosis, Mad Cow Disease), so understanding the folding process itself is interesting in its own right.
So there you have it: a basic introduction to the protein folding problem and one computational approach to it. There are other approaches as well, and I’ve definitely oversimplified things. If you’re curious, I suggest the Folding@home website, Wikipedia, or a simple Google search to explore further.
August 2, 2008 at 7:02 pm
Thanks !
August 9, 2008 at 3:59 am
Thanx Doc
I was looking around the web searching for a simplistic explanation of the process and found this