Data Analytics in Bioinformatics. Group of authors
to high emission of harmful carbon dioxide gas, which is a major cause of global climate change.
Studying microorganisms that use carbon dioxide as their main carbon source can help to reduce atmospheric carbon dioxide levels.
Biotechnology
In bioinformatics, biotechnology is used to identify organisms and micro-organisms that are useful to the dairy industry and to food manufacturers. For example, micro-organisms such as Lactococcus lactis are used in the dairy industry to manufacture buttermilk, cheese, yogurt, etc.
Crop Improvement
Bioinformatics studies of plant genomes cover DNA and RNA sequences and the prediction of protein structure and function.
Genetic knowledge of plants has revealed how their genes are organised. This knowledge is used to produce improved, insect-resistant crops and to make plants more productive, and protein models help to improve plant genes.
Insect Resistance
Soil-borne bacteria such as Bacillus thuringiensis make proteins that are toxic to some insects.
Genes of these soil-borne bacteria have been studied and successfully transferred to cotton, potatoes and maize to control many serious pests [4, 5].
Plants carrying these genes are better able to repel insect attack, so studying the proteins the bacteria produce can reduce the use of insecticides on crops and, in turn, improve the nutritional content of the plants.
Development of Drought-Resistant Varieties
Genetic knowledge of plants helps to develop crop varieties that tolerate soil alkalinity and iron toxicity well and can grow with reduced water. This also allows crops to be grown in poorer soil, which creates more agricultural land and increases crop production [7].
Comparative Studies
To understand the functions of genes, the mechanisms of inherited diseases and the evolution of species, we need to analyze and compare the genetic material of different species.
Bioinformatics tools are also applied to make comparisons between the numbers, locations and biochemical functions of genes in different organisms [5, 6].
Bioinformatics has a wide range of applications in diagnosis, medicine, agriculture and biotechnology. Studying and using the different bioinformatics tools allows researchers to extend knowledge far more efficiently and effectively through data analysis and experiments. This will speed up major discoveries and make them more accurate.
3.1.3 Issues with Bioinformatics
Section 3.1.1 discusses different applications of bioinformatics. These applications come with many challenges, which arise from issues with the data or with the devices used to collect or analyze it. Addressing and analyzing these issues is required for proper execution and effective results. The subsections below discuss the issues faced when a biological study is conducted.
3.1.3.1 Issues Related to Structure
The study of DNA and proteins includes problems such as protein structure prediction. Because these molecules are represented as 3D data, structure prediction, alignment and analysis become difficult tasks. Predicting the three-dimensional structure of a protein from its sequence can be approached with artificial neural networks (ANN).
Most biological networks, such as protein–protein interaction networks and gene regulatory networks, are difficult to build and interpret because of the complexity of biological systems. Graph-theoretic methods are used to represent these massive networks as graphs, which makes classification very difficult with traditional methods.
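As a minimal sketch of the graph representation mentioned above, the following Python code stores a protein–protein interaction network as an undirected graph and computes two basic graph-theoretic quantities: node degree and connected components. The protein names P1 to P5 and their interactions are hypothetical toy values, not real data, and the function names are chosen only for illustration.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected graph (node -> set of neighbours) from protein pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degrees(graph):
    """Number of interaction partners for every protein."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def connected_components(graph):
    """Group proteins that are linked directly or indirectly (depth-first search)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical toy interactions (assumption, not taken from any database).
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P5")]
graph = build_network(interactions)
print(degrees(graph))
print(connected_components(graph))
```

Real interaction networks contain many thousands of nodes, so a dedicated graph library would usually replace this hand-written version; the sketch only shows the data structure.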
3.1.3.2 Sequence Analysis
Classifying DNA, RNA and protein sequences is a challenge because of the differences and similarities among the many organisms.
Issue with Genome Sequence
A genome is the complete set of chromosomes of an organism, consisting of DNA. Genome sequencing is a way of mapping out or ordering DNA so that the sequences can be organized, processed and interpreted, which in turn requires improvements in sequencing strategies. Each DNA sequencing effort faces challenges in searching for sequence patterns and in designing, analyzing and interpreting the data.
In gene finding and genome annotation: Gene finding refers to predicting nucleotide sequence features such as introns and exons in DNA sequence segments, whereas genome annotation is the process of identifying the gene-coding regions of a sequenced genome in order to analyze the protein sequences [8]. It also involves studying repetitive DNA within the genome, which is made up of copies of the same or nearly the same sequence.
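To make the idea of locating candidate coding regions concrete, here is a deliberately simplified sketch: a scan for open reading frames (ORFs) on the forward strand only. This is a much simpler stand-in for real gene finding, which must also handle introns, exons and the reverse strand. The function name, the `min_codons` parameter and the example sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF here starts at an ATG and ends at the first in-frame stop codon
    (stop included, end index exclusive). min_codons is the minimum number
    of codons before the stop codon, counting the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return sorted(orfs)


# Hypothetical toy sequence, not real genomic data.
for start, end, seq in find_orfs("CCATGGCTTGATAAATGAAATAG"):
    print(start, end, seq)
```

On the toy sequence the scan reports two short ORFs, both in the third reading frame. Practical gene finders replace this rule-based scan with statistical or learned models.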
In sequence comparison: Sequence comparison is the process of comparing two or more sequences. The large number of sequences available in genomic databases requires proper categorization of DNA and protein sequences. Sequence comparison helps to assign a hypothetical structure and function to a sequence for its identification, design and interpretation [8].
Analyzing DNA sequences is an important task because it helps to detect individual genes that are associated with a disease. When a disease affects an individual, their proteins or genes are altered, which changes the gene sequence. Detecting these genes is therefore very important for finding a cure. Traditional methods of gene detection were based on trial and error. Advances in data mining and machine learning, such as neural networks (NN), now allow more precise study of genes and their sequences, which simplifies the task [9]. Many machine learning algorithms are used to classify normal and abnormal genes with high accuracy.
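One common way to compare two sequences is global alignment by dynamic programming (the Needleman-Wunsch approach). The sketch below computes only the optimal alignment score, not the alignment itself. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration, and the text does not prescribe this particular method.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b.

    Uses the Needleman-Wunsch recurrence and keeps only two rows of the
    dynamic-programming table to save memory.
    """
    # Row 0: aligning the empty prefix of a with prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a aligned against gaps only.
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned with a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned with a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # identical sequences: 7
print(global_alignment_score("ACGT", "ACT"))         # three matches, one gap: 1
```

A higher score indicates greater similarity under the chosen scoring scheme, so a query can be ranked against known sequences to suggest a likely function. Protein comparisons normally use substitution matrices rather than a single match and mismatch value.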
Solving the problems above involves the following steps:
Collect the biological data.
Build a computational model.
Analyze and solve problems of the computational model.
Test the computational algorithm.
Evaluate the performance of the model.
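The steps above can be sketched end to end on toy data. The example below is only an illustration under stated assumptions: the labelled sequences are invented, the features are 2-mer frequencies, and the model is a nearest-centroid classifier, used here as a simple substitute for the neural networks mentioned in the text. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_vector(seq, k=2):
    """Normalized k-mer frequencies in a fixed order over the alphabet ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)
    return [counts[m] / total for m in kmers]


def train_centroids(samples, k=2):
    """Build the model: the mean feature vector of each class."""
    sums, sizes = {}, Counter()
    for seq, label in samples:
        vec = kmer_vector(seq, k)
        previous = sums.get(label, [0.0] * len(vec))
        sums[label] = [s + v for s, v in zip(previous, vec)]
        sizes[label] += 1
    return {label: [s / sizes[label] for s in total] for label, total in sums.items()}


def predict(centroids, seq, k=2):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    vec = kmer_vector(seq, k)
    return min(
        centroids,
        key=lambda label: sum((x - c) ** 2 for x, c in zip(vec, centroids[label])),
    )


def accuracy(centroids, samples, k=2):
    """Evaluate the model: fraction of samples classified correctly."""
    correct = sum(predict(centroids, seq, k) == label for seq, label in samples)
    return correct / len(samples)


# Step 1: collect data (hypothetical toy sequences, not real biological data).
train = [("GCGCGGCCGC", "gc_rich"), ("CCGGCGCGCC", "gc_rich"),
         ("ATATTAAATT", "at_rich"), ("TTAATATAAA", "at_rich")]
test = [("GGCCGCGGCG", "gc_rich"), ("AATTATATTA", "at_rich")]

# Steps 2 to 5: build the model, test it on held-out data, evaluate it.
model = train_centroids(train)
print(accuracy(model, test))
```

Keeping the test sequences separate from the training sequences is what makes the final accuracy a meaningful estimate; evaluating on the training data alone would overstate performance.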
3.2 Biological Datasets
Bioinformatics deals with various biological datasets collected at different levels of omics data, such as:
Genomic Sequence data
Protein Sequence data
Microarray data
Structure data (Structure of RNA and protein)
Chemical data
Disease data.
Based on the type of data, biological databases can be divided into two categories:
a. Primary Database
These kinds of databases are archival in nature because they are created from the experimental results submitted directly by researchers. These databases are populated with protein sequences, nucleotide sequences, macromolecular structures, etc. [10].
Example: Protein Data Bank (PDB), GenBank, DNA Data Bank of Japan (DDBJ), Gene Expression Omnibus (GEO).
b. Secondary Database
These databases are either created manually or derived from analysis of the results in primary databases, in order to create more structured records for easy retrieval of data [10].
Example: Swiss-Prot (a protein sequence database maintained by the Swiss Institute of Bioinformatics, Switzerland, and the European Bioinformatics Institute), UniProt Knowledgebase.