The Human Genome Project

Scientific Prison. Sydney Brenner remarked that the task of mapping human DNA could be like a vast prison for scientists. He joked that offenders could be allocated a "stretch" of DNA to map, its length depending on the severity of the crime. Minor offenses such as borrowing a colleague's bottles of enzyme without asking would merit a kilobase of DNA, while scientific fraud would attract a punishment of a whole chromosome to analyze. (p. 62 in S. Aldridge 1996. The Thread of Life. Cambridge Univ. Press.)

The Human Genome Project is best understood as the 20th century's version of the discovery and consolidation of the periodic table (Lander 1996). In the period from 1869 to 1889 chemists realized that it was possible to systematically enumerate all atoms and to arrange them in an array that captured their similarities and differences. The building blocks of chemistry were rendered finite, and the predictability of matter gave rise to the chemical industry and the theory of quantum mechanics.

The Human Genome Project aims to produce biology's periodic table: not 100 elements, but 100,000 genes; not a rectangle reflecting electron valences, but a tree structure depicting ancestral and functional affinities among the human genes. The biological periodic table will make it possible to define a unique "signature" for each building block. Just as chemists can recognize atoms by mass and charge alone, biologists will be able to build detectors that allow each gene to be recognized from 20 well-chosen nucleotides or each protein from a distinctive fragment. As Lander (1996) states, "We live in a time of breathtaking transitions in the biological sciences. Molecular genetics has spawned a new revolution every decade and has now brought us to the brink of a global vista on life."
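The claim that 20 nucleotides can be enough to recognize a gene rests on simple counting: with four bases there are 4^20 possible 20-base sequences, far more than there are positions in the genome. The short Python sketch below works through that arithmetic. It assumes a genome of roughly 3 billion base pairs (the figure used later in this handout) and treats the sequence as if it were random; real genomes contain repeats, which is why the nucleotides must be "well-chosen". The function names are illustrative only.

```python
# Rough counting argument for why a 20-nucleotide "signature" can be unique.
# Assumption: the genome behaves like a random sequence (real genomes have repeats).

BASES = 4                      # A, C, G, T
GENOME_SIZE = 3_000_000_000    # approximate number of base pairs in the human genome


def distinct_sequences(length: int) -> int:
    """Number of different sequences of the given length."""
    return BASES ** length


def shortest_unique_length(genome_size: int) -> int:
    """Smallest length whose sequence count exceeds the number of genome positions."""
    length = 1
    while distinct_sequences(length) <= genome_size:
        length += 1
    return length


print(distinct_sequences(20))                 # 1099511627776, about 1.1 trillion
print(distinct_sequences(20) // GENOME_SIZE)  # 366 possible sequences per genome position
print(shortest_unique_length(GENOME_SIZE))    # 16
```

Under this idealized model, 16 bases would already outnumber the positions in the genome; using 20 leaves a comfortable margin.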

Fundamental Principles of Genetics and Inheritance

  1. Living organisms are made of proteins. Humans have around 50,000 different proteins that can be classified into three groups: structural proteins (e.g. in muscle, blood, connective tissue, skin, liver, etc.), enzymes (e.g. lactase), and hormones (e.g. estrogen, testosterone).

  2. Proteins consist of amino acids. The sequence, arrangement, and types of the 20 amino acids determine the type of protein. Just like there are only 26 letters of the alphabet from which thousands of words are created, different combinations and sequences of the amino acids form the large number of proteins. The average protein is made up of around 500 amino acids.

  3. Genes determine the type of protein synthesized. Genes are the unit of heredity encoding the information needed to specify the amino acid sequence of proteins and hence of particular traits. Humans have around 50,000 to 100,000 genes.

  4. DNA (deoxyribonucleic acid) contains the genes of an organism. A gene is a segment of DNA located in a particular place on a chromosome, the structure upon which genes are carried in cells. Humans have 23 pairs (46 total) of chromosomes. Like a language code such as the Morse code, in which dots and dashes are used to encode letters, genes consist of a combination of bases, guanine (G), cytosine (C), adenine (A) and thymine (T), that encode amino acids and ultimately proteins. DNA is used to code RNA (two types: messenger and transfer RNA), which is used to plug amino acids into the appropriate sites of a protein. The base uracil (U) is substituted for thymine (T) in RNA.

    Identification and Evaluation of Genes

  5. The universal genetic code: triplets of bases. An RNA strand consists of a sequence of 3-base groups or triplets that indicate: (i) when to START a new protein (this is the first triplet, referred to as a codon, in the sequence and designates the start of a new gene), (ii) the type and order of amino acids to plug in, and (iii) when to STOP (end of gene). There are a total of 64 three-base combinations of G, C, A and U (U takes the place of T in RNA), the so-called triplets, such as UUU, CAU, UAA and so forth. Each of these triplets codes for an amino acid or a STOP signal, with some redundancy. For example, UUU codes for the amino acid phenylalanine, CAU codes for histidine, and UAA is a STOP code. It is estimated that the human genome, the sum total of DNA, consists of 3 billion base pairs. One of the most profound arguments that all organisms evolved from a common ancestor is that all species use the same genetic code. Examples of the codes include the following:
    UUU Phenylalanine
    UUA Leucine
    UUG Leucine
    AUU Isoleucine
    AUG START
    UAA STOP
    AAG Lysine
    GAU Aspartic acid
    GAG Glutamic acid
    UGA STOP

    Note that UUA and UUG are redundant codes: they both encode the amino acid leucine.

  6. Mutation. An insertion, a deletion or a change in any base in a specific piece of DNA (a mutation) can disrupt the production of the protein coded for, stop the protein from being produced, or upset the normal function of the protein produced.

  7. Genomic abnormalities include chromosomal and genetic. Chromosomal abnormalities are of two main types:

    1. numerical--can be either an addition or a reduction in the normal chromosome number. Down syndrome is caused by three copies of chromosome 21 being present instead of the usual two.

    2. structural--changes in chromosomal structure.

    Genetic abnormalities are either monogenic or polygenic. The former are single-gene diseases (e.g. Tay-Sachs, sickle-cell anemia) and the latter are diseases caused by multiple genes (e.g. most cancers). About 3,000 genetic disorders have been identified (Dawson 1996). Genetic tests for diseases such as cystic fibrosis, breast cancer, colon cancer, and sickle-cell anemia are being developed or are already in use.
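The triplet code (item 5) and the effect of a mutation (item 6) can be illustrated with a short Python sketch. It uses only the codons listed in this handout, so the table is deliberately partial; the example DNA sequence and the function names are made up for illustration. Two simplifications are assumed: transcription is modeled by simply replacing T with U in the coding strand, and AUG is treated purely as a START signal, following the table above (in cells AUG also encodes the amino acid methionine).

```python
from itertools import product

# Partial codon table: only the codons listed in this handout.
CODON_TABLE = {
    "UUU": "Phenylalanine",
    "UUA": "Leucine",
    "UUG": "Leucine",
    "AUU": "Isoleucine",
    "AUG": "START",
    "CAU": "Histidine",
    "AAG": "Lysine",
    "GAU": "Aspartic acid",
    "GAG": "Glutamic acid",
    "UAA": "STOP",
    "UGA": "STOP",
}


def transcribe(dna_coding_strand: str) -> str:
    """Simplified transcription: RNA uses U where DNA has T."""
    return dna_coding_strand.replace("T", "U")


def translate(mrna: str) -> list:
    """Read triplets after the first AUG until a STOP codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    amino_acids = []
    for i in range(start + 3, len(mrna) - 2, 3):
        meaning = CODON_TABLE.get(mrna[i:i + 3], "unknown")
        if meaning == "STOP":
            break
        amino_acids.append(meaning)
    return amino_acids


def substitute(sequence: str, index: int, base: str) -> str:
    """Return a copy of the sequence with one base changed (a point mutation)."""
    return sequence[:index] + base + sequence[index + 1:]


# Four bases taken three at a time give 4 x 4 x 4 = 64 possible triplets.
ALL_CODONS = ["".join(triplet) for triplet in product("GCAU", repeat=3)]
print(len(ALL_CODONS))  # 64

DNA = "ATGTTTTTAAAGGATTGA"   # hypothetical gene: START, 4 amino acids, STOP
MRNA = transcribe(DNA)       # AUG UUU UUA AAG GAU UGA

print(translate(MRNA))
# ['Phenylalanine', 'Leucine', 'Lysine', 'Aspartic acid']

# UUA -> UUG: redundant code, so the protein is unchanged.
print(translate(substitute(MRNA, 8, "G")))
# ['Phenylalanine', 'Leucine', 'Lysine', 'Aspartic acid']

# GAU -> GAG: one wrong amino acid in the chain.
print(translate(substitute(MRNA, 14, "G")))
# ['Phenylalanine', 'Leucine', 'Lysine', 'Glutamic acid']

# UUA -> UAA: a premature STOP cuts the protein short.
print(translate(substitute(MRNA, 7, "A")))
# ['Phenylalanine']
```

The three substitutions show why some single-base changes are harmless while others alter or truncate the protein.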

There has been a rush of biogenetic firms to market genetic tests for an ever-increasing number of gene-related disorders. In the future a diagnostician will turn to, say, page 250 of Chapter 10 and, comparing it to your genomic readout, see how a misspelling or typographic error in your DNA might give your family HCM (hypertrophic cardiomyopathy--thickening of the heart muscle). Your DNA is divided into introns, which are noncoding stretches of filler, and exons, which are the protein-coding regions. Within the exons are control elements, special DNA sequences that modulate the duration, the amplitude and the area (tissue site, such as the liver, brain or heart) of protein expression. Also within the exons are the codons--sequential triplets of DNA bases that specify which of the body's 20 amino acids will be added next to the long chain of amino acids that make up different proteins. Patients who have misspellings at codon 403 get one wrong amino acid in the chain of thousands. That is what a mutation is. Each genome is filled with misalignments. Some never manifest themselves, some cause allergies. But a misspelling at 403 can kill you, misshaping the head of the myosin protein in such a way as to impair the interaction of the main heart proteins--myosin and actin--causing the heart, second by second of every day over the course of a life, to improperly contract and thicken. (from C. Siebert, New York Times Sunday Magazine, September 17, 1995).

MOLECULAR BIOLOGY

Molecular biology has emerged during the last decade as one of the most profound developments in the biological and biomedical sciences. Study of life at the molecular level is creating a basic epistemological shift in biological research from an approach that is hypothesis-driven to one that is discovery-driven. Broad acceptance of this new strategy is having a major impact on how scientific research is funded and conducted at national and international levels. The fundamental approach of molecular biology is the study of life through genomics that is leading to the creation of a universal periodic table of life that will reflect common genetic properties and patterns of ancestral and functional affinities among the genes of both plants and animals, thereby unlocking the record of 3.5 billion years of evolutionary innovation. Comparison of related organisms will reveal regulatory regions and key architectural features of proteins that can be used as Rosetta stones for translating and understanding informational pathways and for deciphering biological complexity. Ultimately the field is leading to the creation of global tools of genomics that will revolutionize the medical, health, environmental and agricultural sciences.

The transition of molecular biology from a research "tool" to a transformational concept began in the mid-1980s with the creation of the Human Genome Project (HGP). Although the original objective of the HGP was to create a sequence map for humans, it quickly became clear that the same approach could benefit from knowledge of the genomic sequences of model organisms such as bacteria, yeast, nematodes, Drosophila and mice. The field of informatics began to emerge as the demands for storing, processing and analyzing the enormous amounts of sequence data rapidly increased.

Genomics research is reframing biology as an informational science concerned with how to decipher and manipulate information classified into one of three types: (1) the one-dimensional digital code of genes and chromosomes, the new science of which is referred to as genomics, the study of many genes; (2) three-dimensional proteins (the so-called folding problem), which catalyze life as well as give it shape and form, the new science of which is called proteomics, the study of many proteins; and (3) four-dimensional complex systems and networks such as the brain, which involve emergent properties of memory, consciousness and the ability to learn. This new area is referred to as systems biology and is concerned with identifying elements, determining function, correlating expression, and defining subsystems. Deciphering information classified according to this scheme provides a framework for intervention through one of three types of genomic controls: (1) transcriptional control between the genomic sequence and mRNA; (2) translational control between mRNA and the protein product; and (3) post-translational control between the protein product and the functional protein product. The importance of this framework is that it underscores the concept of information flow from genome through system and provides epistemological continuity for designing experiments and analyzing data from a wide range of model species. Indeed, the concept of comparative genomics is based on the unity of life: the informational pathways of different organisms have shared processes, genes, regulatory regions, and even chromosome functions. For example, Stanford biologists discovered that a set of highly conserved proteins is encoded by a minority of genes in two organisms whose genomes have been completely sequenced, the nematode and yeast. These genes carry out the core biological processes shared by these two eukaryotes, including intermediary metabolism, DNA and RNA metabolism, protein folding and degradation. It is likely that these same core processes are conserved in mammals, including humans.

IMPORTANCE OF MOLECULAR BIOLOGY AND GENOMICS

Both the short- and long-term importance of sequence-based biological research is profound. First, the agriculture of the future will be based on precisely honed genetic fitness of domestic ungulates (sheep, goats, cattle, pigs), poultry (chickens, turkeys), and the main grasses (wheat, rice, maize), crucifers and legumes. Knowledge gained in a few major crops, ungulates or poultry species can be pooled and applied across the board. This is because the order of genes in most of the related species is conserved. Thus genetic engineering in one species for resistance to disease, to parasite and/or insect attack, for rapid growth, or for quantity and quality of product can be used in the other related ones. Advances in one species will have a multiplier effect.

Second, the conventional quantitative genetic or artificial selection approach to plant and animal breeding will be replaced by the use of genetic engineering and cloning. This will change both the quantitative and qualitative traits of domestic plants and animals. Quantitative traits can be perpetuated through cloning (dairy cows with the highest milk production, beef cattle with the greatest rates of gain, and sheep with the most prolific growth of wool), creating herds of hyper-producers. But no species of plant or animal possesses genes to produce all proteins or amino acids. Thus qualitatively different traits can be genetically engineered, such as blue roses, rice varieties with maize-type proteins, and cows that produce goat's milk. These types of plants and animals could never be created through selection since their genomes do not have genes to produce these proteins or traits.

Third, new biomedical and agricultural technologies are emerging, many of which did not even exist in concept several decades ago. These include the use of transgenic domestic animals such as cows, sheep, and goats as bioreactors (pharming): living factories that produce therapeutic human proteins using a promoter which directs the expression of these proteins to the mammary glands for milk-derived proteins (e.g. insulin, hepatitis B vaccine) or to their blood (e.g. haemoglobin). Another emerging area is xenotransplantation, the harvesting of organs (kidneys, hearts) from transgenic animals such as pigs for transplantation into humans. Molecular approaches are being used in an attempt to discover a protein which will "cloak" the foreign organ in the human body to prevent rejection.

Fourth, comparative biology will help to identify enhancements for humans as well as for domestic plants and animals that lie within the realm of possibility through their existence in other living creatures. The enhancements for humans might include hearing (bats), olfaction (canines), vision (falcons), and oxygen efficiency (crocodiles) and for domestic animals could include the incorporation of genes from selected wild species for rapid growth, high speed, milk and eggs with high protein content, large or small size, docile or aggressive behavior, intelligence or search specificity. Once genomic sequences are known and gene functions specified, genetic engineering for a host of enhancements may become routine.

Fifth, future ethical, legal and social efforts will require acute scientific vision to anticipate the problems and propose safeguards. For example, individuals will be faced with the choice of whether to obtain global views of their own genomes and the need to interpret the information. Genomics research has implications for the genetics of intelligence, propensities for substance abuse and addiction, for homosexuality, for risk taking and for impulsive and violent behaviors. Issues surrounding the privacy of genetic information, genetic counseling such as for preimplantation diagnosis of embryos, for therapeutic abortion, and for germline engineering will pose important ethical and legal problems for social scientists and family planning specialists. Economists will be increasingly concerned with the implications from natural resource economics, to marketing and development and international trade, textile scientists with designer fabrics, and environmental scientists with toxin-eating bacteria. In short, all aspects of academia will be impacted by this new biology.

Future Goals of Genomics (after Lander 1996)

The current goals of the Human Genome Project include: (1) developing a physical map of the chromosomes; (2) sequencing; and (3) determining gene function. The future goals include the following:

  1. Identify Animal Model-Human Sequence Similarities. Often the putative function of a newly isolated human disease gene is revealed by its sequence similarity to a well-studied gene in another organism. These similarities are used in the creation of transgenic animals with defective genes. Animals that develop sickle cell anaemia, haemophilia and Alzheimer's disease need to be created in order to test gene therapies for these diseases.

  2. Systematic identification of all common variants in human genes. The human population has vast genetic diversity. Yet human diversity is also quite limited in that most genes have only a handful of common variants in their coding regions. Examples suggest that common variants may hold the secret to many disease susceptibilities. This is to be distinguished from the Human Genome Diversity Project which seeks to identify rare variants in far-flung populations in order to reconstruct human evolution and migration.

  3. Rapid de novo sequencing from other organisms. Comparative DNA sequencing will unlock the record of 3.5 billion years of evolutionary experimentation. It will not only reveal the precise branches in the tree of life, but will elucidate the timing and character of major evolutionary innovation. Comparison of related organisms will directly reveal regulatory regions and key architectural features of proteins. Sequence differences hold the key to understanding how nature generates such a diversity of form and function with such an economy of genes, producing elephants, gazelles, mice, and humans from the same basic mammalian repertoire of genes.

  4. Increased attention to ethical, legal and social issues. As genetic readouts increase in power and decrease in costs, the potential for intrusive applications will skyrocket. Future ethical, legal and social efforts will require acute scientific vision to anticipate the problems and propose safeguards. Individuals will be faced with the choice of whether to obtain global views of their own genomes and the need to interpret the information.

Issues arising

  1. Genetic testing. This will enable us to determine whether people (or fetuses) are likely to develop any of a wide range of genetic illnesses. Is there any limit that ought to be placed on the treatment of genetic conditions? When should "good" genes be used to replace "bad" genes? For example, gene replacement to treat a "bubble baby" (a child with severe immunodeficiency) is good. However, what about genes for height or intelligence?

    "A genetic revolution has been occurring in biology for several decades, and it is rapidly affecting the population at large. The development of the Human Genome project is fueling the revolution, accelerating the discovery of new genetic connections to old human problems. First, the targets will be the genetics of disease, then the genetics of deviant behavior, and then, as many feel is likely, the genetics of human enhancement." (from Conrad, P. 1996. Growing concerns, Science 274, 1147).

  2. Screening adult-onset diseases. Consider screening in utero for diseases that do not manifest until many years after birth. Should we be screening for Huntington's disease in utero? Is it fair for the parents to make a decision about aborting a fetus because that fetus may have a crippling and ultimately fatal disease 45 years after its birth? Do you want the child to possess this potentially explosive information? How does it affect a person to know that he or she will suffer from this cruel disease sometime later in life?

  3. Liability. There are a great many potential liability issues that arise out of the availability of genetic testing. First, physicians who know of the existence of genetic screening and do not offer it to the patient may be legally liable, just as they would be liable for failing to provide a nongenetic diagnostic tool. Second, physicians could be liable for revealing confidential genetic information, or for not revealing it. A genetic screen may indicate that the patient's siblings are at high risk of some genetic condition. Should the siblings be informed of that risk even though that would be a breach of confidentiality? Or if the physician does not tell the third party, is this a breach of obligation?

  4. Genetic Screening by Employers. Employers may seek to do a genetic screen of their employees to ensure that those who are hyper-susceptible to some risk do not suffer exposure. However, screens could be used by employers to hire only persons unlikely to be subject to any illness. Ultimately, preemployment genetic screening will effectively eliminate some people from the job market.

  5. Educational Screening. Some skills (e.g. mathematical ability) are partly (and perhaps largely) genetically based. Should genetic information be used to track students, for example into trade school versus university? How is the use of genetic information any different from the use of IQ? What is the proper genetic make-up of a doctor? Do we want more scientific brilliance or compassion? That is not a scientific question but a social policy question. Might we track the wrong people into medical school?

  6. Criminal Justice. Genetic information could be valuable in the criminal justice system beyond DNA fingerprinting. Some criminal defendants may have a genetic propensity to commit certain kinds of crime. Is the genetic predisposition of a person admissible in an individual prosecution? Even if genetic evidence were not admissible at the guilt stage, it may make sense to introduce it at the sentencing phase. But should the fact that one is genetically disposed to commit a crime result in a longer sentence (because the convict cannot be rehabilitated) or in a shorter sentence (because the genetically driven convict is less culpable and thus less deserving of punishment)? A genetic propensity might be used as a basis for denying bail. If genetic tools make such predictions easier, should society be more willing to allow for pre-crime detention? If we know someone is likely to commit a crime, why not arrest that person before he or she actually commits the deed?

  7. Life Insurance. Insurance is a bet and, to be fair, it requires that both sides have the same information. If the American health insurance system is not really a system to insure against unpredictable and catastrophic risks, but rather a method for prepaying medical expenses, then treating those with potentially costly genetic conditions less favorably may be inconsistent with the cost-leveling function of the system.

  8. Big Brother. Should government be permitted to maintain a bank of genetic profiles as it does now with fingerprints? This is a powerful form of identification. However, this extraordinary amount of information about each one of us also gives the government, and whoever else has access to it, an extraordinary amount of power over us.

LITERATURE

Anderson, W. F. 1998. Human gene therapy. Nature, 392 (supp):25-30.

Dawson, K. 1996. Genetics: a scientific sketch. Pp. 5-12 in: Birth to Death. D. C. Thomasma and T. Kushner (Eds.). Cambridge University Press, Cambridge.

Schwartz, R. 1996. Genetic knowledge: some legal and ethical questions. Pp. 21-34 in: Birth to Death. D. C. Thomasma and T. Kushner (Eds.). Cambridge University Press, Cambridge.

Lander, E. S. 1996. The new genomics: global views of biology. Science 274:536-539.

Schuler, G. D. et al. 1996. A gene map of the human genome. Science 274:540-546.

Table 1. Number of megabases (mb; mega = million) in each of the 22 (plus X and Y) human chromosomes (Chr) and example diseases or traits associated with genes on each (from Schuler, G. D. et al. 1996). For example, chromosome 1 has approximately 236 million bases (i.e. adenine, cytosine, guanine or thymine).

Chr   mb   Example

1    236   Gaucher disease; Alzheimer's
2    255   some colon cancers; Waardenburg syndrome (deafness, etc.)
3    214   one type of lung cancer
4    203   Huntington's disease; Ellis-van Creveld syndrome (6-fingered dwarfism)
5    194   diastrophic dysplasia; plant homolog of human steroid
6    183   juvenile onset diabetes
7    171   association with cystic fibrosis
8    155   Werner's syndrome (premature aging); Burkitt lymphoma
9    145   melanoma associated with mutation; tuberous sclerosis
10   144   multiple endocrine neoplasia; gyrate atrophy of eye retina
11   144   multi-disease system w/predisposition to cancers; cardiac arrhythmia
12   143   Zellweger syndrome; susceptibility to phenylketonuria
13   114   Wilson's disease (basal ganglia of brain); breast cancer
14   109   Alzheimer's disease associations
15   110   Marfan syndrome
16   131   adult polycystic kidney disease
17    92   many mutations associated with early-onset breast and ovarian cancer
18    85   loss of DPC4 gene causes pancreatic cancer to grow rapidly
19    67   myotonic dystrophy; coronary artery disease
20    72   severe immunodeficiency caused by missing enzyme, adenosine deaminase
21    60   Lou Gehrig's disease
22    56   Neurofibromatosis; DiGeorge syndrome
X    164   mental retardation; Duchenne muscular dystrophy
Y     59   testis-determining factor
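
As a quick consistency check, the chromosome sizes in Table 1 can be added up and compared with the figure of roughly 3 billion base pairs quoted earlier for the whole genome. The Python sketch below does this; the numbers are copied from the table, and the variable names are illustrative only.

```python
# Chromosome sizes in megabases, copied from Table 1.
CHROMOSOME_MB = {
    "1": 236, "2": 255, "3": 214, "4": 203, "5": 194, "6": 183,
    "7": 171, "8": 155, "9": 145, "10": 144, "11": 144, "12": 143,
    "13": 114, "14": 109, "15": 110, "16": 131, "17": 92, "18": 85,
    "19": 67, "20": 72, "21": 60, "22": 56, "X": 164, "Y": 59,
}

total_mb = sum(CHROMOSOME_MB.values())
largest = max(CHROMOSOME_MB, key=CHROMOSOME_MB.get)
smallest = min(CHROMOSOME_MB, key=CHROMOSOME_MB.get)

print(total_mb)              # 3306 megabases
print(total_mb * 1_000_000)  # 3306000000 bases, i.e. about 3.3 billion
print(largest, smallest)     # 2 22
```

The total of 3,306 megabases agrees with the rounded estimate of 3 billion base pairs, with chromosome 2 the largest entry in the table and chromosome 22 the smallest.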
