
sumeet.1889 · 08-16-2017, 10:17 PM

BIOINFORMATICS

ABSTRACT

Bioinformatics is a newly emerging interdisciplinary research area, defined as the interface between the biological and computational sciences. Because bioinformatics itself is not a well-defined term, we can say that it deals with the computational management of all kinds of biological information, whether about genes and their products, whole organisms or ecological systems. Bioinformatics gathers, stores, analyzes and integrates biological and genetic information, which is then applied to developmental biology, evolutionary biology or gene-based drug discovery.

In the last few decades, advances in molecular biology have allowed increasingly rapid sequencing of large portions of the genomes of several species, such as baker's yeast. The Human Genome Project, designed to sequence all 24 human chromosomes, is also progressing, and sequence databases such as GenBank and EMBL are growing at exponential rates. The most pressing task in bioinformatics is the analysis of sequence information; this process is called computational biology. The set of genetic instructions that makes up an organism, its genome, is contained in long, thread-like DNA molecules packed into chromosomes. The sequence of chemical units in the DNA is a kind of code that specifies the structure of protein molecules, which carry out most of the functions of living cells.

DNA sequencing, sequence alignment, biological databases and retrieval systems are discussed in the following sections. Another aspect of bioinformatics is DNA computation and the formation of biochips. Guinness World Records recently (2003) recognized a computer that performs 66 billion operations per second with 99.8% accuracy. A volume of 1 cc can contain 10 trillion DNA molecules, and with this amount of DNA a computer could hold 10 terabytes of data and perform 10 trillion calculations at a time.

AN INTRODUCTION TO BIOINFORMATICS

Biology in the 21st century is being transformed from a purely lab-based science into an information science as well.

Bioinformatics is a new global wave. Simply speaking, bioinformatics is the management, analysis and interpretation of biological data. It is a convergence of plant and animal science, mathematics and information technology: the application of computer science, mathematical and statistical tools to the life sciences.

Thus bioinformatics is the field of science in which biology, computer science and information technology merge to form a single discipline. The ultimate goal of the field is to enable a global perspective from which unifying principles in biology can be discerned. At the beginning of the genomic revolution, the main bioinformatics concern was the creation and maintenance of databases to store biological information, such as nucleotide and amino acid sequences. Developing this type of database involved not only design issues but also the development of complex interfaces through which researchers could both access existing data and submit new or revised data.

Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities, so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology.

Important sub-disciplines within bioinformatics and computational biology include:
- The development and implementation of tools that enable efficient access to, and use and management of, various types of information; and
- The development of new algorithms (step-by-step mathematical procedures) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, to predict protein structure and/or function, and to cluster protein sequences into families of related sequences.

Guinness World Records recently (2003) recognized a computer that can perform 66 billion operations a second with 99.8% accuracy, more than 100,000 times the speed of the fastest PC, as the smallest computing device ever constructed. This remarkable speed was made possible by DNA.

DNA COMPUTATION: THE ENIGMA
DNA computing, in the literal sense, is the use of DNA (deoxyribonucleic acid), the molecule that encodes genetic information for all living beings, in computers. It is carried out in a solution of DNA, where certain combinations of DNA molecules are interpreted as the result of a problem encoded in the molecules originally present. DNA computation relies on devising algorithms that solve problems using information encoded in the sequence of nucleotides that make up DNA's double helix, the bases adenine, guanine, cytosine and thymine (A, G, C and T, respectively), and then breaking and making bonds between them to reach the answer. DNA computing is one of the fastest-growing fields in both computer science and biology, and its future looks extremely promising.

At the forefront of this field are Israeli scientists from the Weizmann Institute of Science in Rehovot, who developed this molecular computing machine composed of enzymes and DNA molecules instead of silicon microchips. The DNA computer is also self-powered: the new design is both made of and fuelled by DNA. The scientists discovered that a single DNA molecule can provide all the energy needed to run a computation. The machine is so small that a tiny droplet could hold up to three trillion of these DNA computers, together performing 66 billion operations a second.
Significance of DNA computing:
The computer became the first programmable, autonomous computing machine in which the input, output, software and hardware were all made of biological molecules (DNA and enzymes), and it can work without an external energy source.

Conventional electronic computers process information as electrical impulses through circuits etched onto silicon chips, but the technology is approaching the physical limits of miniaturization. Technologists expect that sometime between 2010 and 2015 the long march of Moore's Law, which states that computing power doubles every 18 months or so, will come to a sudden halt. This is where DNA computers will play a vital role. The new computer design uses naturally occurring enzymes as the hardware. Each computational step requires two complementary DNA molecules: one that plays the role of input and another that plays the role of software. DNA computers have the potential to take computing to new levels, picking up where Moore's Law leaves off.
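The Weizmann device is commonly described as a finite automaton with two states and a two-symbol input alphabet, with the input and the "software" (transition rules) supplied as DNA molecules. The Python sketch below simulates such an automaton in ordinary software; the rule table (accept inputs containing an even number of `b` symbols) and the state names `S0`/`S1` are hypothetical choices for illustration, not the published molecular program.

```python
# Hypothetical "software": each rule maps (state, symbol) -> next state.
# This program accepts inputs that contain an even number of "b" symbols.
EVEN_B_RULES = {
    ("S0", "a"): "S0",
    ("S0", "b"): "S1",
    ("S1", "a"): "S1",
    ("S1", "b"): "S0",
}


def run_automaton(rules, start, accepting, symbols):
    """Consume the input one symbol at a time and report acceptance."""
    state = start
    for symbol in symbols:
        key = (state, symbol)
        # In the molecular device a software molecule matching the current
        # state and symbol performs this step; here a dictionary lookup does.
        if key not in rules:
            raise ValueError(f"no transition rule for {key}")
        state = rules[key]
    return state in accepting
```

For example, `run_automaton(EVEN_B_RULES, "S0", {"S0"}, "abba")` returns `True` (two `b`s), while the input `"ab"` is rejected.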

Advantages of using DNA instead of silicon:
As long as there are cellular organisms, there will always be a supply of DNA, which makes it a cheap resource. Unlike the toxic materials used to make traditional microprocessors, DNA biochips can be made cleanly. DNA's key advantage is that it will make computers smaller than any that have come before them, while at the same time holding more data. Other advantages are speed (100 times faster than a fast supercomputer) and energy efficiency (computers built by humans waste about a billion times more energy per operation). More than ten trillion DNA molecules can fit into a volume no larger than one cubic centimeter (about 0.06 cubic inches). With this small amount of DNA, a computer would be able to hold ten terabytes of data and perform ten trillion calculations at a time.
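The capacity figures above can be checked with simple arithmetic. Assuming decimal terabytes (10^12 bytes) and two bits per base (four possible bases give log2(4) = 2 bits), ten terabytes spread over ten trillion molecules works out to one byte, i.e. four bases, per molecule:

```python
MOLECULES = 10 * 10**12       # "ten trillion DNA molecules"
CAPACITY_BYTES = 10 * 10**12  # "ten terabytes", assuming decimal terabytes
BITS_PER_BASE = 2             # four bases -> log2(4) = 2 bits per base

bytes_per_molecule = CAPACITY_BYTES / MOLECULES
bases_per_molecule = bytes_per_molecule * 8 / BITS_PER_BASE
print(bytes_per_molecule, bases_per_molecule)  # 1.0 4.0
```

This is only a rough check of the stated figures under those assumptions, not a measured storage density.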

THE DOMAIN OF COMPUTATIONAL BIOLOGY
Dealing with biological entities represented in mathematical and computational terms, bioinformatics has made the job easier for scientists and researchers. It has opened up new dimensions whose limits are hard even to imagine. Bioinformatics is an excellent tool in the development of biology. The Internet is also important here, since it makes many of these resources available. In very simple words, Internet-based tools have eased much of the hard labor of scientists.

Sequencing: -
A genome is simply the sequence of all of an organism's DNA, i.e. the complete, determined DNA sequence of the genetic material of a particular organism. Several tools and techniques are used to study genomes, and each is based on some standard principles and fundamental precepts.
DNA sequencing means determining the order of nucleotides in a DNA strand. In one approach, the strand is labeled at one end and then cleaved at one of the nucleotides; the resulting fragments are separated by gel electrophoresis, which reveals their lengths and hence the position of each nucleotide. Automated gel-based sequencing technology using fluorescent dyes is also available for sequencing chromosomes, and this detection process is faster, quite accurate and economical. Read as one continuous line, the nucleotide sequence carries the information that specifies the amino acid sequence of the encoded protein.
Manual (dideoxy chain-termination) sequencing is done by the following steps:-
1. A single strand of the DNA to be sequenced is prepared.
2. The template DNA is supplied with
a. DNA polymerase and a mixture of all four deoxynucleotides (dNTPs) in fixed quantities;
b. a mixture of all four dideoxynucleotides (ddNTPs). The final mixture is therefore DNA polymerase + dNTPs + ddNTPs for A, G, T and C.
3. Chain elongation proceeds normally until, by chance, DNA polymerase inserts a dideoxynucleotide instead of a deoxynucleotide, which stops further elongation. If the ratio of deoxynucleotides to dideoxynucleotides is high, some DNA strands add several hundred nucleotides before the process comes to a halt.
4. After the reaction period, the fragments are separated by length, from longest to shortest, by gel electrophoresis.
The resolution is so high that a difference of one nucleotide is enough to separate a strand from the next shorter or next longer strand.
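The sequencing steps above can be simulated in a few lines of Python. This is a sketch under simplifying assumptions: the template is written 3' to 5', so the new strand is its base-by-base complement, and among the many copies in the tube a dideoxynucleotide ends synthesis at every position at least once. Ordering the terminated fragments by length and reading their final bases recovers the synthesized strand, which is what reading the gel does.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template: str) -> list[tuple[int, str]]:
    """All (length, last base) fragments from a dideoxy sequencing reaction.

    Assumes `template` is written 3'->5' and that a ddNTP terminates
    synthesis at every position in at least one copy.
    """
    new_strand = "".join(PAIRING[base] for base in template)
    # A copy that stopped after i+1 bases ends in the ddNTP for new_strand[i].
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length (shortest first) and read their last bases."""
    return "".join(base for _, base in sorted(fragments))
```

For example, `read_gel(chain_termination_fragments("TACG"))` returns `"ATGC"`, the new strand read from its 5' end, regardless of the order in which the fragments are listed.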


It is important for bioinformatics professionals to know how to identify the similarities and differences between sequences. There are several ways to determine the similarity and homology of sequences.
HOMOLOGY versus SIMILARITY
Homology:
1. A statement about evolutionary history.
2. Two or more sequences share a common ancestor.
3. When sequences are compared, the result has profound (evolutionary) implications.
Similarity:
1. Not a statement about evolutionary history.
2. Two or more sequences are similar by some criterion.
3. Sequences are simply compared by some method; the statement is logically weaker.

Alignment: -
An alignment is a hypothesis of positional homology between bases or amino acids. It represents the evolutionary relationship between sequences by placing them side by side. Start and stop codons mark the beginning and end of a coding sequence. An alignment is correct if it represents the events of the historical past; a correct alignment matches the codons with those of the ancestral protein. From correct alignments, the ancestral sequences can be reconstructed and the substitutions, insertions and deletions identified.
In practice, alignment is a standardized approximation that we check in several ways. A mutation in a gene can change the gene, or the protein it encodes, in several ways:

1. Deletion of part of a gene produces a GAP in the aligned sequence.
2. Introduction of new DNA produces a longer sequence; we call this an INSERTION.
3. Substitution of a nucleotide changes the sequence at that position.
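The three kinds of change can be counted directly from a pairwise alignment. In this Python sketch gaps are written as `-`, and the first sequence is treated as the reference, so a gap in the second sequence counts as a deletion and a gap in the first as an insertion; that convention is an assumption, since without the ancestral sequence the two cases cannot really be told apart.

```python
def classify_columns(seq1: str, seq2: str) -> dict[str, int]:
    """Count matches, substitutions, insertions and deletions in an alignment.

    Gaps are '-'. seq1 is taken as the reference (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for a, b in zip(seq1, seq2):
        if a == "-" and b == "-":
            continue  # gap-only column carries no information
        if a == "-":
            counts["insertion"] += 1
        elif b == "-":
            counts["deletion"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts
```

For the alignment `GATT-ACA` over `GACTTA-A`, this reports 5 matches, 1 substitution (T/C), 1 insertion and 1 deletion.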

Optimal alignment:
There is no universal rule for what makes an alignment optimal, so there is no such thing as a single best alignment: an alignment is optimal only with respect to the scoring scheme chosen. That scheme forms the baseline on which the sequences are compared. We can categorize alignments into two forms:
1) Global alignment
2) Local alignment.
Global alignment assumes that two proteins or genes are similar over the entire length of their sequences, from one end to the other, and aligns them end to end.
Local alignment makes no such assumption. The entire sequences of the corresponding proteins or genes are searched for matching segments; no attempt is made to force the whole sequences into alignment, and only the parts with good similarity are compared.
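The difference between global and local alignment shows up clearly in the standard dynamic-programming formulations: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The Python sketch below computes only the best score (no traceback), using a hypothetical scoring scheme of +1 for a match, -1 for a mismatch and -2 per gap position; real tools use substitution matrices and usually affine gap penalties.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def pair_score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def alignment_score(s: str, t: str, local: bool = False) -> int:
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(s) + 1, len(t) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized, so the edges accumulate gap costs.
        for i in range(1, rows):
            dp[i][0] = i * GAP
        for j in range(1, cols):
            dp[0][j] = j * GAP
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            cell = max(
                dp[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),  # (mis)match
                dp[i - 1][j] + GAP,  # gap in t
                dp[i][j - 1] + GAP,  # gap in s
            )
            if local:
                cell = max(cell, 0)  # a local segment may start anywhere
                best = max(best, cell)
            dp[i][j] = cell
    return best if local else dp[rows - 1][cols - 1]
```

For `GGACGT` against `ACGT`, the global score is 0 because the two leading gaps must be paid for, while the local score is 4 because only the matching `ACGT` segment is scored.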

Multiple sequence alignment is a mathematical model that is used for the following reasons:
1) Generation of concise information
2) Illustration of dissimilarity between groups of segments.
A multiple alignment arranges a set of sequences so that positions believed to be homologous are written in a common column, and all the sequences can be compared in a single figure or table.
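Writing homologous positions in a common column makes it easy to summarize a multiple alignment column by column. The Python sketch below computes a simple majority-rule consensus (one of several possible summaries) and, for each column, the fraction of sequences sharing the most common character; the alignment is assumed to be given already, since computing it is the hard part.

```python
from collections import Counter


def column_consensus(alignment: list[str]) -> tuple[str, list[float]]:
    """Majority-rule consensus and per-column conservation of an alignment.

    Each string is one aligned sequence; '-' marks a gap.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus, conservation = [], []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation
```

For the three aligned sequences `ACGT`, `ACGA` and `AC-T`, the consensus is `ACGT`, with the first two columns fully conserved and the last two shared by two of the three sequences.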

An important biostatistical tool that approximates biological events with mathematical formulae and models is the substitution matrix. Three types are widely used: a) PAM (Point Accepted Mutation) matrices, b) BLOSUM (BLOcks SUbstitution Matrix) matrices and c) Gonnet matrices.
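A substitution matrix is used as a lookup table: each aligned pair of residues is scored from the matrix and the scores are summed. PAM, BLOSUM and Gonnet matrices are for amino acids and should be taken from their published tables; the Python sketch below instead uses a hypothetical toy DNA matrix (+2 for identity, -1 for a transition, -2 for a transversion) just to show the mechanics on an ungapped alignment.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")


def toy_score(a: str, b: str) -> int:
    """Hypothetical DNA scores, NOT taken from PAM, BLOSUM or Gonnet."""
    if a == b:
        return 2   # identity
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return -1  # transition: A<->G or C<->T
    return -2      # transversion: purine <-> pyrimidine


# The matrix itself: a score for every ordered pair of bases.
TOY_MATRIX = {(a, b): toy_score(a, b) for a in "ACGT" for b in "ACGT"}


def ungapped_score(s: str, t: str, matrix=TOY_MATRIX) -> int:
    """Sum matrix scores over an ungapped alignment of equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(s, t))
```

With this toy matrix, `ACGT` against `GCGT` (one transition) scores 5, while `ACGT` against `CCGT` (one transversion) scores 4.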


All biological sequence databases rely on similarity tables, either explicitly or implicitly, and make use of a number of biological parameters: an evolutionary model, structural properties, and chemical properties such as charge, polarity and shape. The main outputs of sequence comparison, and their uses, are:
1. Alignment of pairs of sequences: shows how closely each pair is related.
2. Alignment of a group of sequences: shows which positions correspond to one another in the sequences.
3. List of alignment scores and changes at each position: used to predict phylogenetic relationships among the sequences.
4. List of common sequence changes: used to improve sequence alignment.
However, each algorithm and model is specific in its application, utility and environment, so one needs to exercise judgment in choosing the algorithm and model.
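How closely pairs of sequences are related, and how that feeds phylogenetic prediction, can be illustrated with one of the simplest measures, the p-distance: the proportion of aligned sites at which two sequences differ. The Python sketch below skips gap columns (a common but not universal choice) and builds a pairwise distance table of the kind that tree-building methods such as UPGMA or neighbour-joining take as input; the tree-building step itself is not shown, and the sequence names are made up.

```python
from itertools import combinations


def p_distance(s: str, t: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(s) != len(t):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(s, t) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(named_seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {
        (x, y): p_distance(named_seqs[x], named_seqs[y])
        for x, y in combinations(sorted(named_seqs), 2)
    }
```

For the hypothetical aligned sequences `a = ACGTACGT`, `b = ACGTACGA` and `c = TCGTACGA`, the pairs a-b and b-c differ at one site in eight (0.125) and a-c at two (0.25), suggesting that b sits between a and c.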

Biological Databases:
A biological database consists of a large volume of biological data organized in a consistent pattern. A database comprises one or more files; each file has many records, and each record has the same set of data fields containing the same type of information. There are several types of biological database, each with its own purpose, data elements and features. From the researcher's point of view, the stored data need to be easily accessible through standard means, and the results obtained should be free of unwanted data. Most databases organize their data by gene sequence, although this varies. The classification below (Classification of Biological Databases) lists a few databases and depicts one method of classifying biological databases:

1) Fundamental type of biological data:
a) Nucleotide sequences b) Protein sequences
c) Protein sequence patterns d) Macromolecular 3D structures
e) Gene expression data f) Metabolic pathways.

2) Primary / Derived data:
a) Primary databases-
Experimental results are fed directly into the database and stored without being verified for authenticity.
b) Secondary databases-
Data elements are stored in the database only after their authenticity has been verified.
c) Aggregates of many databases-
Links to other data items, combination of data and consolidation of data.

3) Technical design:
a) Flat files
b) Relational database (SQL)
c) Object-oriented database
d) Exchange/publication technologies (FTP, HTML, CORBA, XML).
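Many sequence databases distribute their data as flat files, FASTA being one of the simplest such formats: each entry starts with a `>` header line followed by lines of sequence. A minimal Python parser is sketched below; it assumes the first word of the header is the entry's identifier, which is a common convention but not a rule, since header layouts differ between databases. The example record names are made up.

```python
def parse_fasta(text: str) -> list[dict[str, str]]:
    """Parse FASTA-formatted text into a list of records.

    Each record starts with a '>' header line; the first word of the header
    is taken as the identifier and the rest as a free-text description.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, _, description = line[1:].partition(" ")
            records.append({"id": name, "description": description, "sequence": ""})
        elif records:
            records[-1]["sequence"] += line  # sequences may span many lines
        else:
            raise ValueError("sequence data found before the first '>' header")
    return records
```

For example, the text `">seq1 example protein\nMKV\nLLA\n>seq2\nACGT\n"` yields two records, the first with identifier `seq1` and sequence `MKVLLA`.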

Retrieval system:
The retrieval system is a very important component of a database. It is a user interface that helps users retrieve data from the database through user-friendly, easily understandable query formats. The Sequence Retrieval System (SRS) and Entrez are examples of such systems. They are web-oriented systems with well-defined web interfaces for integrating heterogeneous databases. In many cases, biological databases are multitudes of interlinked heterogeneous databases. Search and retrieval of data in these databases is based on pre-built indexes: sets of all the relevant items found in the entries of the database.
There are various approaches to searching for data:
a) Using keywords
b) Through accession numbers and identifiers
c) Through reference of literature.

Data in bioinformatics are basically in the form of records having a number of fields. The very aim of storing data in computer databases is to facilitate searching. Typically, every record has two names, each following a particular set of rules:
a) Accession code: essentially a number, made up of a combination of digits and letters.
b) Identifier: a string of letters and digits that a human can easily interpret in some meaningful way, for instance a recognizable abbreviation of the full protein name or the name of a gene.
c) Sequence submission
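Pre-built indexes and searching by accession code or keyword can be sketched in Python as two lookup tables built once and then queried. The records, accession codes (`DEMO0001` and so on) and identifiers below are hypothetical, and real retrieval systems such as SRS and Entrez index far more fields.

```python
from collections import defaultdict

# Hypothetical records; accession codes and identifiers are made up.
RECORDS = [
    {"accession": "DEMO0001", "identifier": "HBA_DEMO", "description": "hemoglobin alpha chain"},
    {"accession": "DEMO0002", "identifier": "HBB_DEMO", "description": "hemoglobin beta chain"},
    {"accession": "DEMO0003", "identifier": "INS_DEMO", "description": "insulin precursor"},
]


def build_indexes(records):
    """Pre-build lookups: accession -> record, and keyword -> accessions."""
    by_accession = {rec["accession"]: rec for rec in records}
    by_keyword = defaultdict(set)
    for rec in records:
        for word in rec["description"].lower().split():
            by_keyword[word].add(rec["accession"])
    return by_accession, by_keyword


def search_keywords(by_keyword, query: str) -> set[str]:
    """Accessions whose description contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(by_keyword.get(words[0], set()))  # copy, so the index is untouched
    for word in words[1:]:
        hits &= by_keyword.get(word, set())
    return hits
```

Here a keyword query for `hemoglobin` returns both hemoglobin entries, `hemoglobin beta` narrows it to `DEMO0002`, and an accession lookup goes straight to the stored record without scanning.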
CONCLUSION

The year 2003 marks the 50th anniversary of the discovery of the double-helix structure of DNA. Standing in this era of computers, we can say that it is the dusk of the conventional computer era, which awaits the dawn of the DNA era. From the previous discussion we can conclude that the advances made in microbiology, biotechnology and genetics, the use of computational techniques in managing all kinds of biological information, and the storage, organization and indexing of sequence information have led to the formation of the interdisciplinary research area known as bioinformatics. We can also infer that bioinformatics has made studies involving sequences easier. One needs to understand the methods by which biological entities, events, tests and the like are represented in mathematical and computational terms. Bioinformatics is the new global wave. Drug firms such as SmithKline Beecham and Pfizer are positive about the contribution of bioinformatics to drug research. Agricultural companies such as Monsanto and Cargill Seeds are using it in the plant sciences domain. IT companies such as IBM, Sun Microsystems and Wipro anticipate that bioinformatics will be an important revenue domain for them.


nithin balagopal.a · 08-16-2017, 10:17 PM


Content
What is bioinformatics.
What is computational Biology.
Data mining.
Application of data mining.
Net accessible resources.
Sequence Analysis.
What can be done with sequence analysis?
Identification of protein primary sequence from DNA sequence.
Tips for searching Database.
The process of Evolution.
Principles and their importance.
Conclusion.

What is Bioinformatics.
Bioinformatics describes any use of computers to handle biological information. In practice, the definition used by most people is narrower; to them, bioinformatics is a synonym for computational molecular biology: the use of computers to characterize the molecular components of living things.
What is data mining.
Data mining is the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest, by identifying similar sequences in better-characterized organisms.
Applications of data mining:-
Applications include fraud detection, credit card scoring and personal profile marketing. Skillful interpretation of data can enhance customer relations, direct marketing, trend analysis, financial market forecasting and international criminal investigations.

Net accessible resources:-
Two main World Wide Web sites provide information on data mining:-
The Data Mine: this includes pointers to FTP-able papers and two large data mining bibliographies. It attempts to link to as much of the data mining information available on the net as possible. It is run by Pryke at the University of Birmingham.
Knowledge Discovery Mine: this has the KDD FAQ, a comprehensive catalog of tools for discovery in data, and back issues of the KDD-Nugget mailing list. It is run by leading KDD researcher Gregory Piatetsky-Shapiro.
What is sequence analysis.
Sequence analysis is the process of trying to find out something about a nucleotide or amino acid sequence using in silico biology techniques. You may have sequenced a gene yourself and wish to learn what the long string of letters representing bases actually codes for. You may want to confirm that you have indeed cloned a gene successfully, or you might want to learn about a sequence of DNA that you know absolutely nothing about. You may want to know whether a worm has a protein similar to a human one.
What can be done now with sequence analysis
Given the pessimistic view of sequence analysis presented in the previous section, why do we even bother with it? In the first place, the attempt to find methods for successful sequence analysis is a research goal in its own right, one whose potential rewards are so vast as to make it of the first importance. In the second place, although there are many things that sequence analysis cannot yet do, there are many worthwhile things that can currently be done with it, and these are summarized in this section.
Identification of protein sequence from DNA sequence
The computer programs used to infer protein sequence from DNA sequence provide information that can help approach a solution. For example, if you are trying to find out where in a DNA sequence a protein is encoded, it is very useful to know what peptides would be encoded by all six reading frames. A stretch containing many stop codons is a poor candidate for encoding a protein. This will not tell you exactly where the protein sequence starts and stops, but it will help you guess where that might occur. Programs exist for doing this. In fact, there are many factors you can use to guess where in a DNA sequence a protein sequence might reside: the expected codon bias, the presence of characteristic sequences representing regulatory signals in the DNA, and so forth. One family of programs integrates a variety of these approaches and, using either explicit algorithms or trained neural nets, makes a prediction.
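The six-frame idea can be sketched in Python using the standard genetic code: translate the sequence in three frames on the given strand and three on its reverse complement, then count stop codons (`*`) in each frame. This is a minimal illustration, not one of the prediction programs mentioned above; it ignores codon bias and regulatory signals, and marks any codon containing a letter other than A, C, G or T as `X`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): codons enumerated in
# TCAG order with the first base varying slowest; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Peptides encoded by the three forward and three reverse reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def stop_counts(frames: dict[str, str]) -> dict[str, int]:
    """Stop codons per frame; a frame full of stops is a poor coding candidate."""
    return {frame: peptide.count("*") for frame, peptide in frames.items()}
```

For `ATGGCCTAA`, frame +1 reads `MA*` (a start codon, one amino acid and a stop), while frame -1 on the reverse strand reads `LGH`.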
Tips for searching databases.
Use the latest database version
Use BLAST first, then a finer tool (FASTA, search, blitz, sweep, block et al.)
Search both strands when using FASTA; this is done automatically in the GCG program.
Translate the sequence where relevant
Search the 6-frame translation of the DNA database
E-value < 0.05 is statistically significant and usually biologically interesting
Also check 0.05 < E-value < 10, as you might find interesting hits
Pay attention to abnormal composition; it causes biased scoring
Split large queries (if > 1000 for DNA, > 200 for protein)
If the query has repeated segments, delete them and repeat the search
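The E-value thresholds in these tips refer to the number of hits with at least a given score expected by chance alone. For BLAST-style local alignments this is usually given by the Karlin-Altschul formula below, where m and n are the lengths of the query and of the database, S is the raw alignment score, and K and λ are constants that depend on the scoring system and residue composition (one reason abnormal composition can bias scoring). The same quantity can be written using the normalized bit score S':

```latex
E = K\,m\,n\,e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m\,n\,2^{-S'}
```

Because E grows linearly with the search space m n, the same raw score becomes less significant as the database grows.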
The process of evolution.
Indeed, homologous proteins arise from mutations in a common ancestral coding gene. Through the process of gene divergence, some gene mutations have been accepted by natural selection because they preserved the folding and function of the encoded protein. This can be represented by a schematic tree in which several genes descend from a common ancestor gene.
Principles and their importance
Sensitivity Versus Specificity
There are different ways to estimate similarity between two sequences, which allow us to adjust the sensitivity and specificity of the results when performing a sequence database search with a query sequence. If the sensitivity is high, more distantly related sequences, such as the S. griseus protease, will be retrieved.
However, unrelated sequences, such as the endochitinase, will also be returned. On the other hand, if the specificity is high, only closely related sequences will be returned, but in this case distantly related ones will be missed. Thus a researcher has to know how to manage this trade-off, and this is one additional reason why biologists should not treat software as a black box.
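Sensitivity and specificity have standard definitions in terms of true and false positives and negatives. The Python sketch below uses hypothetical counts for an imagined database of 10 related and 90 unrelated sequences to show the trade-off just described: a lenient search finds more relatives but admits more unrelated hits, and a strict search does the reverse.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly related sequences that the search retrieved."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unrelated sequences that the search correctly left out."""
    return true_neg / (true_neg + false_pos)


# Hypothetical outcomes on 10 related and 90 unrelated database entries.
lenient = {"tp": 9, "fn": 1, "tn": 75, "fp": 15}
strict = {"tp": 5, "fn": 5, "tn": 89, "fp": 1}

for name, r in (("lenient", lenient), ("strict", strict)):
    print(name,
          round(sensitivity(r["tp"], r["fn"]), 3),
          round(specificity(r["tn"], r["fp"]), 3))
# lenient 0.9 0.833
# strict 0.5 0.989
```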

Window approaches
In particular, to compare two sequences, a dot matrix can be used in which one sequence is written out horizontally and the other vertically. A dot is placed at the intersection of a row and a column for each matched pair of letters. If the frequency of matched letters between two sequences is high, particularly in DNA sequences, which are composed of only four building blocks, the background noise is high. To reduce the noise, one can place a dot only when several consecutive letters match. The number of consecutive letters evaluated together is called the window size.
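The dot-matrix method and the effect of the window size can be sketched in Python. In this minimal version a dot is placed at row i, column j when every letter in a window of `window` consecutive positions starting there matches; the optional `threshold` parameter, which lets a window count with some mismatches, is an assumption, since programs differ in how they score windows.

```python
def dot_matrix(seq_h: str, seq_v: str, window: int = 1, threshold=None):
    """Set of (row, col) cells that receive a dot.

    seq_h runs along the columns and seq_v down the rows. A dot goes at
    (i, j) when at least `threshold` of the `window` letters starting at
    seq_v[i] and seq_h[j] match (default: all of them).
    """
    if threshold is None:
        threshold = window
    dots = set()
    for i in range(len(seq_v) - window + 1):
        for j in range(len(seq_h) - window + 1):
            matches = sum(seq_v[i + k] == seq_h[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_h: str, seq_v: str, dots) -> str:
    """Text picture of the matrix: '*' for a dot, '.' otherwise."""
    lines = ["  " + seq_h]
    for i, letter in enumerate(seq_v):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_h)))
        lines.append(letter + " " + row)
    return "\n".join(lines)
```

On a repetitive sequence such as `AAAA` compared with itself, a window of 1 fills all 16 cells, while a window of 3 leaves only 4: larger windows suppress background noise (higher specificity), and smaller ones pick up shorter, weaker matches (higher sensitivity).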
Efficient use of program
When performing a database search, a researcher should know that he can improve his results. If he knows the principles, such as the use of windows, he can increase the sensitivity by decreasing the window size parameter. This improves the ability of the program to recognize distantly related sequences. Alternatively, he can increase the specificity by increasing the window size parameter.
conclusion
It is important for a researcher who wants to use the programs available for sequence analysis to acquire reliable knowledge of biocomputing. Knowing the capabilities and drawbacks of the programs helps us use them more accurately and efficiently.

sanjovincent · 08-16-2017, 10:17 PM

To get information about the topic bioinformatics (full report, ppt and related topics), refer to the page links below

http://seminarsprojects.net/Thread-bioin...s-projects

http://seminarsprojects.net/Thread-bioin...ull-report

http://seminarsprojects.net/Thread-data-...nformatics

http://seminarsprojects.net/Thread-bioin...-computing

http://seminarsprojects.net/Thread-beyon...er-centric

http://seminarsprojects.net/Thread-semin...nformatics

[email protected] · 08-16-2017, 10:17 PM


Bioinformatics - a brief overview

Dr. Arun G.Ingale

ASSOCIATE PROFESSOR
Department of Biotechnology
School of Life Sciences
North Maharashtra University, Jalgaon

What is bioinformatics?

Application of information technology to the storage, management and analysis of biological information
Facilitated by the use of computers
