Human Growth Hormone, UCSF, Genentech and a Whole Lotta Money

December 16, 2007

So the Mitchell Report is out, casting steroids and human growth hormone (HGH) into day-to-day discourse. Although it would be interesting to talk about what HGH does and how it can enhance athletic performance, I’m willing to bet there are others out there who are already doing that. Instead I thought I’d talk a little bit about another dark side of HGH that you probably haven’t heard about - the patent dispute between the University of California, San Francisco (UCSF), and Genentech, Inc.

Before I begin, I should disclose that I am a graduate student at UCSF and have heard various versions of this story in different amounts of detail from faculty and other potentially biased sources, but I have attempted to research the issue as best I can, and everything here is based entirely on reporting in Science, The Washington Post, and publications of equivalent believability. This is, to the best of my knowledge, the history of HGH.

Growth hormone is essentially a protein which acts as a signaling molecule - HGH is the human version of this hormone. When the protein binds its receptor, signaling pathways are activated that stimulate the growth of the cell (this is, of course, a vast oversimplification, but should serve for the purposes of this story). Because of its ability to trigger these pathways, growth hormone has many therapeutic uses, including treatment of growth hormone deficiency in children.

Recall from my earlier posts that the central dogma of biology is that protein is encoded by RNA which is in turn encoded by DNA. Thus there is a DNA sequence which tells a cell to build growth hormone. One common technique in molecular biology is to “copy” a specific piece of DNA (in this case the DNA coding for HGH) and place it into bacteria. The rationale for this is that you can grow lots and lots of bacteria and then extract and purify the protein, cheaply and quickly.

In 1977, Dr. Peter Seeburg, then in a postdoctoral position in Dr. Howard Goodman’s lab at UCSF, successfully copied the DNA that coded for HGH, and UCSF was awarded the patent for the gene. Incidentally, UCSF’s version of the copied DNA included the DNA sequence that encodes HGH but with an additional 48 nucleotides (i.e. 48 more characters) added on at the end (these nucleotides are present in the human genome, but don’t actually serve any purpose in producing the protein) - an unimportant scientific distinction, but one which would be important legally years later.

In 1978, Seeburg left UCSF for Genentech, where his job would focus on expressing that DNA in bacteria, so that Genentech could grow lots and lots of the bacteria, which would then make lots and lots of HGH, which Genentech would then use to make lots and lots of money. In fact Genentech eventually did just that, producing the drug Protropin, which went on to produce over $2 billion in sales.

In 1990, UCSF sued Genentech for $400 million for infringing their HGH patent. Genentech’s response was that they had developed their HGH from DNA that was independent from the original UCSF DNA. This could have actually been entirely possible, except that, at trial, Seeburg would testify that he actually HAD copied the UCSF DNA.

When Seeburg left UCSF, Robert Swanson, then president of Genentech, sent a letter to Seeburg’s old boss, Dr. Goodman, asking for the DNA that Seeburg had worked on. Goodman apparently refused, so Seeburg, fresh out of his postdoc, visited his old lab at midnight on New Year’s Eve and took copies of the DNA with him. Seeburg justified this “midnight raid” by claiming that it was customary for scientists to take work they had produced with them to their next positions (it is actually quite common, if not quite proper, to do so within academia), and that he had gone at the late hour merely to avoid Dr. Goodman, with whom he was no longer on friendly terms. UC later found out about this incident and in 1980 settled for $2 million with Genentech, but still retained full patent rights over their DNA construct.

Genentech, not wishing to use any of UC’s intellectual property, decided to duplicate the old work and isolate the DNA for HGH themselves and then introduce it into bacteria. Except that they couldn’t get it to work. So Seeburg, with, allegedly, the knowledge of his Genentech coworker Dr. David Goeddel, decided to just use the UC construct to make their bacteria. They even published a Nature paper with results from their supposedly novel construct. At trial, Seeburg, no longer with Genentech and testifying on behalf of UC, stated that much of the data in the Nature paper on the all-new “UC-free construct” was fudged. Goeddel denied knowing that Seeburg had cheated, and Seeburg’s coauthors on the Nature paper denied Seeburg’s account of falsified data (although the “midnight raid” and lifting of UC property was corroborated).

In the legal battle, Genentech argued that because the UC sequence contained those extra base pairs, the Genentech DNA, which contained only the HGH DNA, was substantially different, and an entirely new invention (many patent lawyers actually thought Genentech had a decent case, and that Seeburg’s testimony, although indicative of intent to copy, had no bearing on the final legal issue of whether the Genentech construct was sufficiently different). In 1999, a nine-member jury ruled 8 to 1 in favor of UC; however, the split decision saved Genentech from damages that could have been as high as $1.2 billion. Rather than risk losing on appeal, and with UC having already invested $20 million in legal fees, the two sides settled for $200 million. $50 million went to building my home away from home, Genentech Hall - the first building at the new UCSF Mission Bay campus:

[Image: Genentech Hall. Picture taken from http://www.pbase.com/klaorman/image/11306539]

$30 million went to the UC general fund, $35 million went to research at UCSF, and the remaining $85 million was split amongst the original inventors and collaborators, including Dr. Seeburg, whose individual share ended up as $17 million.

I’ve glossed over a lot of the science and legal issues involved here, partly because I’m no lawyer and I don’t know enough to say what really happened here, but it’s clear that HGH has had a colorful history that continues today.


Folding 2.0?

November 30, 2007

I got to listen to a talk today given by Dave Baker, one of the big names in the protein folding field, and I was not disappointed. There’s a lot of potential material from today’s talk that would be good subject matter for this blog, and I’ll no doubt post more about folding in the future, but the highlight of the talk was FoldIt!

Before I get into too much detail about FoldIt specifically, there are a couple of things about protein folding I glossed over in the last post. In computer modeling of protein folding, there are usually two major problem areas - sampling and scoring.

For a given protein, we know the fundamental building blocks (amino acids) that compose it and the order in which they are connected - the whole problem of protein folding is how that linear chain “curls up” (i.e. folds) in 3D space. If we were to take every potential conformation and assign it a score (sometimes called the energy score) we could build a gigantic scoring landscape. Imagine the Grand Canyon: every potential spot you could stand represents a particular conformation (i.e. amino acid A is a certain distance and certain angle relative to B) and your elevation represents how good that conformation’s score is (let’s say the lower you are the better). Any step you took in one direction would represent slightly changing the conformation (maybe you pull a certain bond a little further apart), and if you had to walk uphill to that new position your changes would be bad, while walking downhill would be good. A good folding algorithm will try to walk down the canyon as far as possible until it finds a position where every possible next step would take it uphill (once again I’m oversimplifying for clarity, but this is the general idea).
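To make the canyon walk concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the one-dimensional `elevation` function stands in for a real energy score, and `walk_downhill` is a made-up name, not part of any folding program. It simply keeps stepping left or right for as long as a step lowers the elevation.

```python
def elevation(x):
    """Hypothetical 1D 'canyon': lower is better, and the floor is at x = 3."""
    return (x - 3.0) ** 2


def walk_downhill(start, step=0.5, max_steps=1000):
    """Step left or right for as long as doing so lowers the elevation."""
    position = start
    for _ in range(max_steps):
        neighbors = [position - step, position + step]
        best = min(neighbors, key=elevation)
        if elevation(best) >= elevation(position):
            break  # every possible next step is uphill (or flat), so stop
        position = best
    return position


final = walk_downhill(start=-4.0)
print(final, elevation(final))
```

A real protein has thousands of coordinates rather than one, so "left or right" becomes an enormous number of possible directions, but the stopping rule is the same idea.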

What are sampling and scoring in this analogy? Scoring is basically your ability to recapitulate the Grand Canyon. That is, if you built a computer model of the canyon, your map is going to be of limited resolution (depending on the manner in which it was built) and will not quite recapitulate the real Grand Canyon. I’m not really going to talk much about scoring in this post, so even if it doesn’t make sense, read on. Sampling is easier to think about - it’s basically how much of the Grand Canyon you are able to visit. If you walk across the whole canyon you can be sure that your lowest point is the actual lowest point in the entire canyon, since you walked across the whole damn canyon. Had you walked only 50% of it, you may have found a place in the canyon that’s pretty low, but the unwalked portion may contain an even lower point. Sampling is typically a computationally intense process - as fast as computers are today, it takes a long time to calculate the score for every position. Furthermore you can always take smaller steps between points (think of someone with a large stride as compared to someone with a small stride…the large strider may “stride” past and miss a pathway the small strider will see), so there’s really an infinite number of points to sample.
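The large strider versus small strider point can be sketched in a few lines of Python. The landscape here is invented for the example: a broad, shallow valley near x = 2 plus a narrow, deep crevice near x = 7.3. The function names (`elevation`, `survey`) and all the numbers are assumptions, not anything taken from rosetta.

```python
def elevation(x):
    """Hypothetical landscape: a broad valley plus a narrow, deep crevice."""
    broad_valley = (x - 2.0) ** 2 / 10.0
    crevice = -5.0 if abs(x - 7.3) < 0.15 else 0.0
    return broad_valley + crevice


def survey(stride, start=0.0, end=10.0):
    """Score evenly spaced points and report the lowest one found."""
    steps = round((end - start) / stride)
    points = [start + i * stride for i in range(steps + 1)]
    lowest = min(points, key=elevation)
    return lowest, elevation(lowest), len(points)


coarse = survey(stride=1.0)  # large strider: few points, steps over the crevice
fine = survey(stride=0.1)    # small strider: many more points, lands in the crevice
print("coarse:", coarse)
print("fine:  ", fine)
```

The fine survey finds the lower point, but it had to score roughly ten times as many positions to do it, which is the sampling trade-off in miniature.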

Now one way to approach the problem of sampling is by increasing computing power. The most famous examples of this are SETI@home and Folding@home, which I mentioned in the last post. There’s a similar program called rosetta@home, which is developed by the Baker lab. “rosetta” refers to the program which makes the calculation to assign a score for a given conformation. Simply put, rosetta@home is a screen saver which uses sophisticated algorithms to move around amino acid side chains and try to pack the 3D protein structure into its lowest energy (i.e. most stable) state. By installing rosetta@home you donate your computer’s idle time to performing these calculations, which are then communicated back to the Baker lab.
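The general shape of this kind of distributed computing can be sketched as follows. This is only an illustration of the idea of splitting work into units and combining the reports; it is not rosetta@home’s actual protocol, and `score`, `make_work_units` and `volunteer` are all hypothetical names. Here the "volunteers" are just function calls on one machine.

```python
def score(conformation):
    """Hypothetical stand-in for an energy score: lower is better."""
    return (conformation - 42) ** 2


def make_work_units(candidates, unit_size):
    """Split the full list of candidates into slices to hand out."""
    return [candidates[i:i + unit_size]
            for i in range(0, len(candidates), unit_size)]


def volunteer(work_unit):
    """What one donated computer would do: score its slice, report the best."""
    best = min(work_unit, key=score)
    return best, score(best)


candidates = list(range(100))
units = make_work_units(candidates, unit_size=25)
reports = [volunteer(unit) for unit in units]  # in reality, separate machines
overall_best, overall_score = min(reports, key=lambda report: report[1])
print(overall_best, overall_score)
```

Each volunteer only ever sees its own slice, and the lab only has to compare a handful of reports rather than every score.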

This is all well and good, but the Baker lab went even one step further and developed FoldIt! - a computer game similar to rosetta@home. I like to think of FoldIt as the Web 2.0 approach to protein folding (Web 2.0 is a poorly defined keyword first introduced by Tim O’Reilly but which generally refers to harnessing the power of the collective to accomplish tasks - think of wikipedia, digg, or youtube). FoldIt! is a computer game that allows you to change the 3D structure of the protein by moving parts of it around, while rosetta scores your conformation on the fly. Basically instead of having the computer decide which path to take down the canyon, you’re able to run haphazardly around and try to find the lowest point on your own. The great thing is you don’t need to understand anything about protein folding; all you need to do is understand that you need to move things around and watch your score go up. You’ll quickly realize certain obvious things - having parts of the protein overlap is bad (steric clash in scientific terms) and fitting things into empty space is good, but all you honestly need to do is look at the score and try to make it go up by whatever means necessary. Your scores are submitted back to the Baker lab website and compared against all other players’ scores. Whereas rosetta@home uses the idle time of thousands of computers, FoldIt! uses the idle time of thousands of people and their computers, making it potentially even more powerful.

In an email to students and faculty before his talk, Dave described FoldIt like this:

“We are developing a multiplayer interactive protein folding and design game for both education and research - our hope is that large groups of people interacting with computers and with each other through the multiplayer game may be able to solve hard optimization problems that neither computers nor people can solve alone. Please help us to test and improve the game!”

If we go back to the canyon analogy, rosetta@home is like having thousands of people walking the canyon so you can accomplish the task faster, but the manner in which they walk is still pretty systematic and similar. That is, if there are valleys that look a lot like the bottom of the canyon but aren’t actually (local minima), the similar nature of everyone’s walking manner will cause people to still end up there. In FoldIt! people are running all over the canyon - some are flapping their arms, some are walking backwards - a lot of energy and effort is wasted, but the naivety of the approach frees it from harmful biases and allows hidden paths that were never known before to be discovered. It’s entirely possible that analysis of human generated structures will reveal a few key rules that algorithms were missing entirely - these rules could be folded into new algorithms and would hopefully allow rosetta to generate a better model in the same amount of sampling.
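Here is a small Python sketch of the local minimum trap. The list of elevations is made up: it has a shallow dip at position 2 and the true bottom at position 7. The walkers all follow the same downhill rule, and the only thing that differs is where they start, which is a crude stand-in for "walking in a similar manner" versus "running all over the canyon". None of this is how rosetta or FoldIt actually work internally.

```python
# Hypothetical canyon profile: index is position, value is elevation.
ELEVATIONS = [9, 7, 4, 6, 8, 5, 3, 1, 2, 6]


def walk_downhill(start):
    """Move to the lower neighbor until both neighbors are higher."""
    position = start
    while True:
        neighbors = [p for p in (position - 1, position + 1)
                     if 0 <= p < len(ELEVATIONS)]
        best = min(neighbors, key=lambda p: ELEVATIONS[p])
        if ELEVATIONS[best] >= ELEVATIONS[position]:
            return position
        position = best


# Walkers who all start from the same rim end up in the same shallow dip.
similar_walkers = {walk_downhill(start) for start in (0, 1, 2)}

# Walkers scattered everywhere find both the dip and the true bottom.
scattered_walkers = {walk_downhill(start) for start in range(len(ELEVATIONS))}
lowest_found = min(scattered_walkers, key=lambda p: ELEVATIONS[p])

print(similar_walkers, scattered_walkers, lowest_found)
```

Most of the scattered walkers are redundant, which is the wasted effort mentioned above, but only the scattered group ever sees position 7.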

I think FoldIt! is a great idea, and it’ll be interesting to see if it leads to any new ways of thinking about the folding problem. I think this sort of approach is good for areas where we’re still fundamentally unsure of the best ways to approach the problem - the utility might be more limited in fields where we have a pretty good idea of what we’re doing.

I should note that Dave did not present this as a 2.0 approach and was much more bullish on its education prospects (although he was hopeful for its research utility), and that this “naive 2.0 approach” is my own interpretation of his project, but I think the idea itself is something that might have real application in other fields.

To download FoldIt!, go here. The game is not yet live (they’re aiming for early next year), but there are playable puzzles and a real-time leaderboard.


What exactly is Folding@home doing? The Protein Folding Problem

November 28, 2007

As this blog reflects an intersection of my interests in technology and general science, I figured an ideal topic to start off with would be the Protein Folding Problem.

You’ve probably heard about the protein folding problem, although you may not have realized it. The most common references online to protein folding usually involve Folding@home (a distributed computing approach to the problem similar to SETI@home). Usually these references are entitled “Use your PS3 to cure cancer” or some other such overstatement. While there are no doubt real implications to understanding protein folding (including possible cancer therapeutics), such overstatements aren’t helpful in understanding exactly what you’re making your PS3 do.

So what is the protein folding problem? In the simplest terms it’s the manner in which a protein adopts its 3D conformation. If you can remember back to basic high school biology you might recall that the central dogma of biology is this:

DNA->RNA->protein

DNA is the set of instructions for making everything in a cell. RNA is an intermediate, essentially a specific subset of the instructions in your DNA that is then assembled into a protein (the biologists amongst you may take exception to this oversimplification, but for now we’ll stick with it). You may know of proteins as they pertain to your diet, but in actuality proteins are much more - they are the molecular workhorses of the cell. Most of the chemistry and processes carried out in a cell are done by proteins. These include things like breaking down your food into energy, or recognizing viral particles. In the cell, proteins get stuff done.
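For the programmers out there, the DNA->RNA->protein flow can be sketched as two string operations in Python. This is a toy under loud assumptions: the codon table below contains only the five codons the example needs (the real table has 64), transcription is reduced to swapping T for U on the coding strand, and the function names are made up for this post.

```python
# Deliberately tiny codon table: only the codons used in the example below.
# None marks a stop codon.
CODON_TABLE = {"AUG": "M", "GAU": "D", "UAC": "Y", "AAA": "K", "UAA": None}


def transcribe(dna):
    """DNA -> RNA, simplified to replacing T with U on the coding strand."""
    return dna.replace("T", "U")


def translate(rna):
    """RNA -> protein, reading three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGATTACAAATAA"
rna = transcribe(dna)
protein = translate(rna)
print(rna, protein)
```

Each one-letter code in the result (M, D, Y, K) stands for one amino acid in the finished chain.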

Ok so what’s this whole folding business? Well proteins are composed of 20 basic building blocks called amino acids. String these amino acids together in a linear chain and you get a protein. In biology, these amino acids are represented by single letters; for instance DYKDDDDK represents an 8 amino acid protein starting with D and ending with K. Because amino acids are (essentially) the only components of proteins, every protein can be represented by a sequence of letters. Some proteins are very large (hundreds of letters) while others are small (like the 8 letter protein above) - given that there are 20 naturally occurring amino acid possibilities at each point and the unbounded size of the sequence, there is an infinite number of potential protein sequences.
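Since a protein is just a string over a 20-letter alphabet, the counting argument is easy to check in Python. The one-letter codes below are the standard ones; the helper names `is_valid_protein` and `count_sequences` are invented for this sketch.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_protein(sequence):
    """True if the string is non-empty and uses only the 20 standard letters."""
    return len(sequence) > 0 and all(letter in AMINO_ACIDS for letter in sequence)


def count_sequences(length):
    """Number of distinct sequences of a given length: 20 choices per position."""
    return len(AMINO_ACIDS) ** length


print(is_valid_protein("DYKDDDDK"))
print(count_sequences(8))
print(count_sequences(100))
```

Even at only 8 letters there are 20^8 = 25,600,000,000 possible sequences, and the count grows by a factor of 20 with every added letter.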

Now of course the human genome is finite, and you might recall we’ve gone ahead and sequenced the whole thing - and we’ve gotten pretty good at determining what parts of the DNA are actually turned into protein. So essentially we know what almost all the proteins in your cells are - at least on the level of these string-like representations. However your proteins do stuff based on their actual 3D structure. That is, when I write DYKDDDDK that doesn’t mean there’s a happy trail of letters in your body wandering about the cell and carrying out their business. Rather each letter represents a specific chemical structure. D for instance is Aspartic acid which looks like this:

[Image: chemical structure of aspartic acid]

Y is Tyrosine which looks like this:

[Image: chemical structure of tyrosine]

You can string them together and see what DY looks like in 2D but of course the world is actually in 3D, and each of those oxygens, carbons, etc occupy a position in space relative to one another.

And therein lies the protein folding problem. How do you translate an amino acid sequence (i.e. DYKDDDDK) into a three dimensional structure? Well it’s not an easy problem, as there isn’t a clear set of rules - the position of each part of each amino acid can be influenced by a whole host of factors including the amino acids neighboring it, the presence of water, and many other things. You can start to see that the problem begins to get very complex, much too complex for a mere human to think about.

Enter the computers. Looking at 3D structures of proteins that people have determined experimentally, there’s no clear set of rules (although some trends become evident) - however this is science and there is theoretically some set of axioms we started with (yes, even in biology). Things like the electric charge of these amino acids (things that are positive will be attracted to things that are negative, neutral things will want to pack in the interior and hide from charged water molecules) are known and can be used to calculate energy maps for specific conformations. You can do multiple iterations and find the most stable (i.e. lowest energy) conformation and use it as your predicted protein structure. This is essentially what Folding@home is doing (at least this is one approach to the problem, and the one I believe that Folding@home is taking). You can even do this for known structures and see how good your algorithm is at returning the known structure.
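As a rough illustration of scoring conformations by charge, here is a toy Python sketch. It is emphatically not Folding@home’s (or anyone’s) real energy function: each amino acid is collapsed to a single charged point, the energy is just the sum of charge products divided by distance in arbitrary units, water is ignored, and the two candidate conformations are invented coordinates. The side-chain charges used for D (negative), K (positive) and Y (neutral) are the usual textbook values.

```python
import math

# Simplified side-chain charges: D is negative, K is positive, Y is neutral.
CHARGE = {"D": -1.0, "K": +1.0, "Y": 0.0}


def electrostatic_energy(sequence, positions):
    """Sum q_i * q_j / distance over every pair of residues (arbitrary units)."""
    energy = 0.0
    for i in range(len(sequence)):
        for j in range(i + 1, len(sequence)):
            distance = math.dist(positions[i], positions[j])
            energy += CHARGE[sequence[i]] * CHARGE[sequence[j]] / distance
    return energy


sequence = "DKD"
# Two hypothetical conformations of the same three-residue chain.
conformations = {
    "extended": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    "bent": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0)],
}

energies = {name: electrostatic_energy(sequence, coords)
            for name, coords in conformations.items()}
predicted = min(energies, key=energies.get)  # lowest energy = most stable
print(energies, predicted)
```

In this toy the extended shape wins because it keeps the two negatively charged D residues farther apart. A real calculation has to weigh many more competing terms across a vastly larger set of conformations, which is why it needs so much computing power.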

So why does anyone care what a protein looks like in 3D? Well the 3D structure is important for understanding the function of the protein and that’s a very important thing to understand. If your protein is a drug target (say an HIV protein), knowing its 3D structure could help you design a drug that binds a specific structure on the protein and thereby inhibits its function (i.e. viral replication). A lot of diseases are the results of misfolded proteins (cystic fibrosis, Mad Cow Disease) so understanding the folding process itself is an interesting thing.

So there you have it: a basic introduction to the protein folding problem and one computational approach to it. There are other approaches as well, and I’ve definitely oversimplified things. If you’re curious I suggest the Folding@home website, wikipedia, or a simple google search to explore further.


Inner Life of the Cell Explained

November 27, 2007

If you’ve spent a lot of time on the internet and have a passing interest in biology, you’ve probably run across this video, “The Inner Life of the Cell”. While fellow science nerds and I have had fun identifying things like ribosomes translating RNA or microtubule dissociation, the MCB department at Harvard (which developed the video) has a less widely distributed version, which includes explanations of each animation. You can view the explanation video here by clicking on the “Inner Life: View the Animation” link (the first picture). Flash or Quicktime is required.


Welcome

November 25, 2007

This is my second venture into the online blogopolis, although the first one lasted the length of the post. The idea behind this blog is to post on some issues I find interesting - mainly basic biological research (e.g. biochemistry, recent Science or Nature papers) and general technology issues (e.g. Web 2.0, p2p, etc). While this may seem an odd combination, I hope that there are people out there who hold similar interests, as these are both important developing fields that actually do face some similar challenges (e.g. patent reform), and about which there is a lot of misinformation. I don’t pretend to be an expert on all these issues, but I will endeavor to post well thought out and supported ideas without reiterating other, better written blogs. My first real post should be coming soon!