Bioinformatics
By: David • Essay • 1,655 Words • November 18, 2009 • 1,309 Views
The completion of a "working draft" of the human genome, an important milestone in the Human Genome Project, was announced in June 2000 at a White House press conference and published in the February 15, 2001 issue of the journal Nature.
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.
What Is a Biological Database?
A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and often, literature citations associated with the sequence.
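As a rough illustration of the "single file containing many records" idea, the sketch below models one record with the fields listed above and stores each record as one tab-separated line. The class name, field names, file layout, and sample values are all assumptions made for this example; they do not reflect the actual format of any real sequence database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceRecord:
    """One record; every record carries the same set of fields."""
    contact_name: str
    sequence: str
    molecule_type: str
    organism: str
    citations: List[str] = field(default_factory=list)


def to_line(rec: SequenceRecord) -> str:
    """Serialize a record as one tab-separated line (hypothetical layout)."""
    return "\t".join([rec.contact_name, rec.sequence, rec.molecule_type,
                      rec.organism, ";".join(rec.citations)])


def from_line(line: str) -> SequenceRecord:
    """Parse one tab-separated line back into a record."""
    contact, seq, mol, org, cites = line.rstrip("\n").split("\t")
    return SequenceRecord(contact, seq, mol, org,
                          cites.split(";") if cites else [])


# Placeholder records, invented purely for illustration.
records = [
    SequenceRecord("A. Researcher", "ATGGCCATTGTAATGGGCCGCTGA", "DNA",
                   "Homo sapiens", ["Placeholder citation"]),
    SequenceRecord("B. Researcher", "AUGGCUUAA", "mRNA", "Mus musculus"),
]

# The whole "database" is simply the lines of a single text file.
database_text = "\n".join(to_line(r) for r in records)
restored = [from_line(line) for line in database_text.split("\n")]
```

Real databases add indexing, validation, and update mechanisms on top of this kind of record structure; the sketch only shows the uniform-record idea.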
For researchers to benefit from the data stored in a database, two additional requirements must be met:
* easy access to the information
* a method for extracting only that information needed to answer a specific biological question
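The second requirement, pulling out only the information needed for a specific question, can be sketched as a small query function over in-memory records. The `query` helper, the field names, and the sample records here are hypothetical and serve only to illustrate the idea of filtering records and selecting fields.

```python
# Placeholder records, invented for illustration only.
records = [
    {"id": "rec1", "organism": "Homo sapiens", "molecule_type": "DNA",
     "sequence": "ATGGCC"},
    {"id": "rec2", "organism": "Mus musculus", "molecule_type": "DNA",
     "sequence": "ATGTTT"},
    {"id": "rec3", "organism": "Homo sapiens", "molecule_type": "mRNA",
     "sequence": "AUGGCU"},
]


def query(records, fields, **criteria):
    """Return only the requested fields of the records matching all criteria."""
    return [
        {name: rec[name] for name in fields}
        for rec in records
        if all(rec.get(key) == value for key, value in criteria.items())
    ]


# "Which human sequences do we hold, and of what molecule type?"
human = query(records, ["id", "molecule_type"], organism="Homo sapiens")
```

A linear scan like this is adequate for a handful of records; large databases rely on indexes so that a question can be answered without reading every record.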
The data in GenBank are made available in a variety of ways, each tailored to a particular use, such as data submission or sequence searching.
At NCBI, many databases are linked through a search and retrieval system called Entrez. Entrez (pronounced ahn' tray) allows a user not only to access and retrieve specific information from a single database but also to access integrated information from many NCBI databases. For example, the Entrez Protein database is cross-linked to the Entrez Taxonomy database. This allows a researcher to find taxonomic information (taxonomy is the branch of the natural sciences that deals with the classification of organisms) for the species from which a protein sequence was derived.
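The cross-linking idea can be pictured as one database storing a key that points into another. The sketch below is a toy model only, not the Entrez interface: the two dictionaries, the `taxonomy_for_protein` function, and the protein identifiers are hypothetical, and the lineages are abbreviated. The taxonomy identifiers 9606 (Homo sapiens) and 10090 (Mus musculus) are the ones used by the NCBI Taxonomy database.

```python
# Toy "protein database": each entry stores a link (taxon_id) into the
# taxonomy database. Protein IDs and descriptions are made up.
protein_db = {
    "PROT_0001": {"description": "example protein A", "taxon_id": 9606},
    "PROT_0002": {"description": "example protein B", "taxon_id": 10090},
}

# Toy "taxonomy database", keyed by taxonomy ID (lineages abbreviated).
taxonomy_db = {
    9606: {"scientific_name": "Homo sapiens",
           "lineage": ["Eukaryota", "Metazoa", "Chordata", "Mammalia",
                       "Primates"]},
    10090: {"scientific_name": "Mus musculus",
            "lineage": ["Eukaryota", "Metazoa", "Chordata", "Mammalia",
                        "Rodentia"]},
}


def taxonomy_for_protein(protein_id):
    """Follow the cross-link from a protein record to its taxonomy record."""
    taxon_id = protein_db[protein_id]["taxon_id"]
    return taxonomy_db[taxon_id]


source_species = taxonomy_for_protein("PROT_0001")
```

Storing only the identifier in the protein record, rather than a copy of the taxonomic data, means the classification has to be maintained in one place only.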
What Is Bioinformatics?
Biology in the 21st century is being transformed from a purely lab-based science to an information science as well.
Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned. At the beginning of the "genomic revolution", a central bioinformatics concern was the creation and maintenance of databases to store biological information, such as nucleotide and amino acid sequences. Developing this type of database involved not only design issues but also the construction of complex interfaces through which researchers could both access existing data and submit new or revised data.
Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:
* the development and implementation of tools that enable efficient access to, and use and management of, various types of information
* the development of new algorithms (step-by-step computational procedures) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences
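To give a flavor of what "locating a gene within a sequence" can mean at its very simplest, the sketch below scans the three forward reading frames of a DNA string for open reading frames (ORFs): stretches that begin with the start codon ATG and run to the first in-frame stop codon. This is a deliberately naive illustration, not a real gene-finding method; it ignores the reverse strand, introns, and all statistical evidence, and the function name `find_orfs` and its `min_codons` parameter are assumptions of this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    Each ORF starts at an ATG and ends just after the first in-frame stop
    codon; `end` is exclusive, so seq[start:end] is the ORF itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                       # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):  # step one codon at a time
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                     # close it and keep scanning
    return orfs


example = "CCATGGCCTAAGG"
found = find_orfs(example)
orf_sequences = [example[s:e] for s, e in found]
```

In the example string, the only ORF lies in the third reading frame and spans the codons ATG, GCC, and TAA.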
Why