Human Genome Project

Why was the Human Genome Project started?

The Human Genome Project (HGP) was an international scientific research effort to determine the sequence of base pairs that make up human DNA and to identify and map the genes of the human genome, both physically and functionally. It remains the world's largest collaborative biological project. It was formally launched in 1990 and declared complete on April 14, 2003. The "complete genome" level was reached only in May 2021.

The project was a 13-year effort coordinated by the US Department of Energy and the National Institutes of Health. In its early years the Wellcome Trust (UK) became a major partner, and additional contributions came from Japan, France, Germany, and China, among others.

Objectives of the Human Genome Project

i) Identify all of the approximately 20,000-25,000 genes in human DNA.

ii) Determine the sequences of the 3 billion chemical base pairs that make up human DNA.

iii) Store this information in databases and improve the tools used for data analysis.

iv) Transfer the related technologies to other sectors, such as industry.

v) Address the ethical, legal, and social issues that might arise from the project.

How can DNA sequencing help?

Knowledge of DNA sequences can provide insight into an organism's intrinsic capabilities. That insight can be applied to problems in health care, agriculture, energy production, and environmental remediation, and it also provides clues to human biology.

How was the sequencing of DNA done for the Human Genome Project?

The Human Genome Project used two broad approaches:

1. Expressed Sequence Tags (ESTs): This approach focused on identifying all the genes that are expressed as RNA.

2. Random Sequencing: The alternative approach was to sequence the entire genome, coding and non-coding regions alike, and then assign functions to the different regions of the sequence, a step referred to as Sequence Annotation.
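To make the idea of assigning function to a region more concrete, here is a toy sketch of one elementary annotation step: scanning a sequence for open reading frames (a start codon ATG followed in the same frame by a stop codon). This is only an illustration. Real annotation is far more involved (human genes are interrupted by introns, and both strands must be examined), and nothing here is claimed to be the method the project used. The function name find_orfs and the min_codons threshold are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, orf) tuples for ATG...stop runs on the forward strand.

    min_codons is the minimum number of codons before the stop codon
    (an arbitrary cut-off chosen for this illustration).
    """
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                    # look for the next ORF
    return orfs

example = "CCATGAAATTTGGGTAACC"
orfs = find_orfs(example)
```

For the short example string, the only open reading frame found starts at position 2 and ends just after the TAA stop codon.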

How does DNA Sequencing work?

The total DNA of a cell was isolated and broken into random fragments of comparatively small size, which were then cloned into suitable hosts using specialised vectors. This was necessary because DNA is a very long polymer, and sequencing very long stretches of DNA is technically difficult.

Cloning amplified each DNA fragment, so that it could be sequenced more easily.
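The fragmentation step can be imitated on a short string. The sketch below is a simplified model, not the laboratory procedure: it picks random start positions and lengths, so the resulting pieces overlap one another at random. The function name shotgun_fragments, the toy sequence, and all parameter values are assumptions made for illustration.

```python
import random

def shotgun_fragments(genome, n_fragments, min_len, max_len, seed=0):
    """Cut n_fragments random pieces, each between min_len and max_len bases long."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(genome) - length)
        fragments.append(genome[start:start + length])
    return fragments

genome = "ATGCGTACGTTAGCCGATAGGCTTAACGGATCC"   # toy 33-base "genome"
fragments = shotgun_fragments(genome, n_fragments=8, min_len=8, max_len=12)
```

Each fragment is a short substring of the original sequence. In the real project the fragments were vastly longer and far more numerous, and their positions in the genome were not known in advance.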

Bacteria and yeast were the most commonly used hosts, and the corresponding vectors were called BACs (bacterial artificial chromosomes) and YACs (yeast artificial chromosomes). The fragments were sequenced on automated DNA sequencers that worked on the principle of a method developed by Frederick Sanger. Sanger is also credited with developing a method for determining the amino acid sequences of proteins.

These sequences were then arranged in order on the basis of the overlapping regions they shared. This required generating overlapping fragments for sequencing, and the alignment itself was carried out by highly specialised computer programmes developed for the purpose.
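The principle of ordering fragments by their overlaps can be shown with a small greedy merge: repeatedly join the two reads that share the longest suffix-to-prefix overlap. This is a minimal sketch under simplifying assumptions (error-free reads, no repeated regions, forward strand only), not the software used by the project. The names overlap and greedy_assemble and the min_len threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by largest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))   # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:                  # nothing left to join
            break
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

contigs = greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])
```

Here the three reads share 4-base overlaps and merge into the single sequence ATGCGTACGTTAGC. Repetitive DNA easily misleads a greedy strategy like this one, which is one reason genome assembly needs much more sophisticated programmes.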

The sequences were then annotated and assigned to individual chromosomes. The sequence of chromosome 1 was completed only in May 2006; it was the last of the 24 human chromosomes (22 autosomes plus X and Y) to be sequenced.

Another difficult task was constructing the genetic and physical maps of the genome. These maps were generated using information on polymorphisms in restriction endonuclease recognition sites and on microsatellites, which are short repetitive DNA sequences.
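As a rough illustration of what a microsatellite looks like in sequence data, the sketch below uses a regular expression to find short units repeated back to back. The unit lengths (2 to 6 bases) and the minimum of 4 repeats are assumptions chosen for the example, not thresholds taken from the project, and find_microsatellites is a hypothetical name.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (position, unit, repeat_count) for each tandem repeat found."""
    # A unit of min_unit..max_unit bases (shortest tried first), followed by
    # at least min_repeats - 1 further copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]

hits = find_microsatellites("GGTCACACACACATTGC")
```

In the example string, the unit CA is repeated five times starting at position 3. Because the number of repeats at such a site varies between individuals, these sites are useful as markers when building maps.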

Findings of the Human Genome Project

The human genome contains 3164.7 million nucleotide bases.
The average gene consists of 3000 bases, but sizes vary greatly; dystrophin, the largest known human gene, spans 2.4 million bases.
The total number of genes is estimated at 30,000, far fewer than earlier estimates of 80,000 to 140,000 genes.
Almost all nucleotide bases (99.9 per cent) are exactly the same in all people.
The functions of more than half of the discovered genes are unknown.
Less than 2 per cent of the genome codes for proteins.
Repetitive sequences comprise a sizable portion of the human genome. Repeated sequences are DNA sequences that occur multiple times, sometimes hundreds or thousands of times. Although they are not thought to have direct coding functions, they provide information about the structure, dynamics, and evolution of chromosomes.
Chromosome 1 has the greatest number of genes (2968), while chromosome Y has the fewest (231).
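A little arithmetic puts these figures in proportion. The sketch below takes the numbers quoted above at face value and uses integer arithmetic to avoid rounding noise; the variable names are chosen only for this illustration.

```python
GENOME_BASES = 3_164_700_000       # 3164.7 million bases, as quoted above
AVERAGE_GENE_BASES = 3_000
DYSTROPHIN_BASES = 2_400_000

# 99.9 per cent identical means about 1 base in 1000 differs between two people.
differing_bases = GENOME_BASES // 1000

# "Less than 2 per cent" coding gives an upper bound on protein-coding bases.
coding_upper_bound = GENOME_BASES * 2 // 100

# How many average-sized genes would fit inside dystrophin?
dystrophin_vs_average = DYSTROPHIN_BASES // AVERAGE_GENE_BASES

print(differing_bases, coding_upper_bound, dystrophin_vs_average)
```

So a 0.1 per cent difference still amounts to roughly 3.2 million bases, protein-coding sequence accounts for at most about 63 million bases, and dystrophin is 800 times the length of an average gene.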

What are some future applications of the Human Genome Project?

Deriving meaningful knowledge from DNA sequences will drive future research and lead to a greater understanding of biological systems. This enormous task will require the expertise and creativity of tens of thousands of scientists from a variety of disciplines, in both the public and private sectors worldwide.

Having the human genome sequence makes possible a whole new approach to biological research.

Previously, researchers studied one or a few genes at a time. With whole-genome sequences and new high-throughput technologies, they can now approach questions systematically and on a much larger scale.

They can study all of the genes in a genome, or all of the transcripts in a particular tissue, organ, or tumour, or examine how tens of thousands of genes and proteins work together in interconnected networks to orchestrate the chemistry of life.

The Human Genome Project also paved the way for other genome projects, such as HGP-Write, the HapMap Project, the 100,000 Genomes Project, and the Genome India Project.