Genotyping by Sequencing for Crop Improvement. Group of authors
used to screen for allele diversity using PCR from ten genotypes, and the amplicons were sequenced, followed by sequence comparison to identify SNPs. SNPs were also identified by mining the large number of EST sequences in EST databases, which were generated through improved sequencing technologies (Soleimani et al. 2003). These SNPs were further validated using PCR (Batley et al. 2003). These approaches allowed the identification of mainly gene‐based SNPs, but their frequency is generally low. Additionally, SNPs located in low‐copy noncoding regions and intergenic spaces could not be identified.
Several assays have been developed for genotyping identified SNPs, including allele‐specific hybridization, primer extension, oligonucleotide ligation, and invasive cleavage (Sobrino et al. 2005). In addition, DNA chips, allele‐specific PCR, and primer extension were attractive options because they are suitable for automation and can be used to develop dense genetic maps. Allele‐specific hybridization was used to identify polymorphisms in 570 soybean genotypes (Coryell et al. 1999).
1.5 Recent Advances in Molecular Marker Technologies
The improvement of Sanger sequencing technology in the 1990s, combined with the start of EST and genome sequencing projects in model plants, led to a surge in the identification of variation at single‐base resolution (Wang et al. 1998). From 2005 onward, the emergence of NGS platforms such as Roche 454, Illumina HiSeq2500, ABI 5500xl SOLiD, Ion Torrent, PacBio RS, and Oxford Nanopore, together with advances in bioinformatics tools, simplified the identification of genome‐wide SNPs and changed the face of molecular marker technology. NGS‐based genotyping platforms such as genotyping‐by‐sequencing (GBS) and whole‐genome resequencing (WGR), along with high‐density SNP arrays, made it possible to type thousands of SNPs in a single reaction in hundreds of individuals.
1.5.1 Genotyping‐by‐Sequencing (GBS)
GBS is an NGS‐based reduced representation sequencing technique for identifying genome‐wide SNPs and genotyping large populations (Bhatia et al. 2013). It combines marker discovery and genotyping in a single step. It is a complexity reduction procedure in which a combination of restriction enzymes (REs) is used to separate low‐copy sequences from high‐copy repetitive regions. In general, GBS involves sequencing, on an NGS platform, the fragments generated by restriction digestion of the genome. In this process, the DNA of each individual in the population is digested with REs, followed by ligation of RE‐specific adaptors containing genotype‐specific barcode sequences and binding sites for PCR and sequencing primers (Figure 1.1). The resulting fragments are PCR amplified, and equal volumes of PCR product from different individuals are pooled in a tube. The fragments in the pool can then be selected by size and sequenced on the NGS platform. The choice of restriction enzymes depends on the complexity and size of the genome. At present, different versions of GBS are available, including RAD‐seq (restriction site‐associated DNA sequencing), ddRAD‐seq (double‐digest restriction site‐associated DNA sequencing), SLAF‐seq (specific‐locus amplified fragment sequencing), Rest‐seq (restriction DNA sequencing), and Skim GBS (skim‐based GBS) (Bhatia 2020). These versions differ in fragment size selection, the extent of complexity reduction, and genome coverage. Because GBS is applied to whole populations, low‐depth sequencing is adopted to keep it cost‐effective, which causes a high rate of missing data. Low‐depth sequencing also makes GBS ineffective for genotyping heterozygous populations, where both alleles at a locus may not be sampled. In addition, GBS has low genome coverage because of reduced representation sequencing.
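The digestion and size‐selection steps described above can be illustrated with a small in silico digest. This is a minimal sketch under stated assumptions, not the protocol of any particular GBS version: the enzyme pair (PstI and MspI), the function names, and the rule that only fragments flanked by one site of each enzyme are retained are choices made here for illustration, and each cut is placed at the start of the recognition site rather than at the enzyme's true cleavage offset.

```python
import re

# Recognition sequences of two enzymes often paired in two-enzyme protocols.
# The pairing is an assumption for this example; the text names no enzymes.
PSTI = "CTGCAG"  # PstI, a 6-bp (rare) cutter
MSPI = "CCGG"    # MspI, a 4-bp (frequent) cutter


def cut_positions(genome, site):
    """Return the start position of every occurrence of a recognition site."""
    # The lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?={site})", genome)]


def double_digest(genome, site_a=PSTI, site_b=MSPI):
    """Split the sequence at both sites.

    Returns a list of (fragment, left_enzyme, right_enzyme) tuples, where the
    enzyme labels are "A" or "B". The pieces before the first cut and after
    the last cut are ignored because they lack a site on one side.
    """
    cuts = sorted(
        [(pos, "A") for pos in cut_positions(genome, site_a)]
        + [(pos, "B") for pos in cut_positions(genome, site_b)]
    )
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        fragments.append((genome[start:end], left, right))
    return fragments


def select_library(fragments, min_len, max_len):
    """Keep fragments with one end from each enzyme and a length in the window."""
    return [
        frag
        for frag, left, right in fragments
        if left != right and min_len <= len(frag) <= max_len
    ]


if __name__ == "__main__":
    toy_genome = "AAAA" + PSTI + "T" * 10 + MSPI + "A" * 30 + MSPI + "TT"
    library = select_library(double_digest(toy_genome), min_len=10, max_len=50)
    print(len(library), [len(f) for f in library])
```

Narrowing the size window or switching to a rarer cutter reduces the number of retained fragments, which is the sense in which enzyme choice and size selection control the extent of complexity reduction.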
Figure 1.1 An example of GBS and GBS data analysis workflow for identification of SNP markers.
GBS is widely used to capture SNPs and other marker variation by NGS. It has overtaken conventional genotyping procedures based on traditional markers such as RAPD, AFLP, SSR, and many others in terms of the time, labor, and cost involved. For example, GBS can generate data for thousands of markers in a large population in a week, and these data can be analyzed in a month (Bhatia et al. 2018). The approach has been used to map several economically important traits in a number of crop plants (Poland and Rife 2012). Most developing countries have in‐house computational facilities that are used for GBS analysis. A few online servers are also available where GBS analysis can be done using built‐in pipelines, such as CyVerse (www.cyverse.org); however, these are unable to analyze large datasets, and the speed of analysis depends on internet speed. Aligning NGS reads and calling SNPs and indels are the two major steps in GBS analysis, for which several pipelines are publicly available, such as Stacks, IGST, GB‐eaSy, TASSEL‐GBS, Fast‐GBS, and UNEAK (Wickland et al. 2017).
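To make the link between read depth, missing data, and heterozygote calls concrete, the following sketch shows a deliberately naive genotype call from allele read counts at one SNP. It is not the algorithm of any pipeline named above: the function names and the `min_depth` threshold are hypothetical, and the probability calculation assumes that each read samples either allele with probability 0.5 and that there are no sequencing errors.

```python
def call_genotype(ref_count, alt_count, min_depth=2):
    """Naive genotype call from reference and alternate read counts.

    Returns "0/0", "0/1", "1/1", or "./." (missing) when the total depth
    is below min_depth. Real callers use likelihood models instead.
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        return "./."
    if alt_count == 0:
        return "0/0"
    if ref_count == 0:
        return "1/1"
    return "0/1"


def prob_het_detected(depth):
    """Probability that both alleles of a true heterozygote are seen.

    With `depth` error-free reads, each drawn from either allele with
    probability 0.5, all reads show the same allele with probability
    2 * 0.5**depth, so both alleles appear with probability
    1 - 2 * 0.5**depth.
    """
    if depth < 1:
        return 0.0
    return 1.0 - 2.0 * 0.5 ** depth


def missing_rate(calls):
    """Fraction of genotype calls that are missing ("./.")."""
    return sum(call == "./." for call in calls) / len(calls)


if __name__ == "__main__":
    # (ref_count, alt_count) for one SNP in five hypothetical individuals.
    counts = [(3, 0), (1, 0), (2, 2), (0, 4), (0, 0)]
    calls = [call_genotype(ref, alt) for ref, alt in counts]
    print(calls, missing_rate(calls))
    for depth in (1, 2, 4, 10):
        print(depth, prob_het_detected(depth))
```

Under these assumptions a heterozygote covered by two reads shows both alleles only half of the time, and by four reads 87.5% of the time, which is why low‐depth data tend to undercall heterozygotes.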
Another widely used pipeline for NGS data analysis is dDocent (www.dDocent.com), a simple bash wrapper to quality‐filter, assemble, map, and call SNPs from almost any kind of RAD sequencing data (Puritz et al. 2014). However, most of these pipelines are hard to use for a student with little bioinformatics background, and they differ in the genome complexity they can handle and the computational space they require. In addition, several bioinformatics tools, such as BWA, Bowtie2, SAMtools, GATK, and BCFtools, along with a set of Perl utility scripts (Kagale et al. 2016), can be used for GBS data analysis, but using them properly requires knowledge of their installation and usage. With the advancements in NGS approaches, GBS has become a widely used approach in plant breeding and genetics, particularly for understanding complex quantitative traits.
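As a rough illustration of how the stand‐alone tools fit together, the sketch below assembles the shell commands for one common arrangement: index the reference and align with BWA, sort and index with SAMtools, then call variants with BCFtools. It only builds and prints the command strings; it does not run them. The function name, file names, and output naming scheme are hypothetical, the reads are assumed to be already demultiplexed into one FASTQ file per sample, and the options shown should be checked against the installed versions of each tool.

```python
import shlex


def gbs_variant_calling_commands(reference, fastq_by_sample, out_vcf="variants.vcf.gz"):
    """Return shell command strings for a basic align-and-call workflow.

    reference: path to the reference FASTA.
    fastq_by_sample: dict mapping sample name to its single-end FASTQ path.
    out_vcf: path of the compressed VCF to be written by bcftools.
    """
    q = shlex.quote  # protects paths that contain spaces or shell characters
    commands = [f"bwa index {q(reference)}"]

    bam_files = []
    for sample, fastq in fastq_by_sample.items():
        bam = f"{sample}.sorted.bam"
        # Align reads, then sort the alignments by coordinate.
        commands.append(
            f"bwa mem {q(reference)} {q(fastq)} | samtools sort -o {q(bam)} -"
        )
        commands.append(f"samtools index {q(bam)}")
        bam_files.append(bam)

    # Pile up all samples together and call SNPs and indels jointly.
    bam_args = " ".join(q(bam) for bam in bam_files)
    commands.append(
        f"bcftools mpileup -f {q(reference)} {bam_args} "
        f"| bcftools call -mv -Oz -o {q(out_vcf)}"
    )
    return commands


if __name__ == "__main__":
    for command in gbs_variant_calling_commands(
        "ref.fa", {"line1": "line1.fq", "line2": "line2.fq"}
    ):
        print(command)
```

Printing the commands rather than executing them keeps the example independent of any installed software and makes it easy to inspect each step before running it.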
DArT‐seq GBS (https://www.diversityarrays.com/technology-and-resources/dartseq/) partly overcomes the limitation of missing data points. The technique is an extension of traditional DArT technology in which DArT representations are sequenced on an NGS platform. Sequencing the fragments dramatically increases the number of genomic fragments analyzed and the number of reported markers, making the technology more cost‐effective than the original DArT method.
1.5.2 Whole‐Genome Resequencing (WGR)
WGR with high coverage and depth overcomes the limitations of GBS arising from missing data points and heterozygous calls. In general, WGR involves sequencing enough DNA fragments (>5×–20× depth) to cover the whole genome of an organism. Because of sequencing cost, the technique is suited to crop plants with smaller genomes, such as rice. In such cases, GBS can be replaced by resequencing a larger population at 5–6× depth. However, WGR of a few samples can be done at a much higher read depth of 10–20×, as in the BSA‐seq approach (Nguyen et al. 2019). One important BSA‐seq‐based approach is quantitative trait loci (QTL)‐seq, developed by Takagi et al. (2015) in rice and since widely used in several crop plants. Takagi et al. (2015) also developed a pipeline for analyzing the whole‐genome sequences of bulks and identifying causative variants. WGR has been used in several studies for identification of genome‐wide SNPs, genotyping of mapping populations for construction of high‐density linkage maps and QTL mapping, linkage and genome‐wide association studies (GWASs), reference genome improvement, and genomic selection (Poland and Rife 2012; Bhatia et al. 2013; Chung et al. 2017; Nguyen et al. 2019).
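Two calculations underlie the paragraph above and can be sketched briefly: the relation between read number and mean depth, and the SNP‐index comparison used when two bulks are contrasted in QTL‐seq. This is an illustrative sketch rather than the published pipeline. The function names are hypothetical, the 400 Mb genome size is a round figure assumed for a rice‐sized genome, and the order of subtraction between the two bulks is a convention chosen here.

```python
import math


def mean_depth(n_reads, read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_depth(target_depth, read_length, genome_size):
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_depth * genome_size / read_length)


def snp_index(alt_reads, total_reads):
    """Proportion of reads at a SNP that carry the non-reference allele."""
    return alt_reads / total_reads


def delta_snp_index(bulk_a, bulk_b):
    """Difference in SNP-index between two bulks at the same SNP.

    Each bulk is an (alt_reads, total_reads) tuple. Values near zero suggest
    no association with the trait; values near +1 or -1 suggest that the two
    bulks are fixed for different alleles at that position.
    """
    return snp_index(*bulk_a) - snp_index(*bulk_b)


if __name__ == "__main__":
    genome_size = 400_000_000  # assumed round figure for a rice-sized genome
    print(reads_for_depth(5, 150, genome_size))          # reads for 5x depth
    print(mean_depth(20_000_000, 150, genome_size))      # depth from 20 M reads
    print(delta_snp_index((18, 20), (4, 20)))            # roughly 0.7
```

With 150‐base reads and the assumed genome size, about 13.3 million reads per individual give 5× depth, which shows how quickly the read requirement grows with both population size and genome size.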
1.5.3 SNP Arrays
Along with GBS, high‐density DNA array‐based SNP chips or SNP arrays have become a widely used SNP detection platform for high multiplex genotyping. SNP arrays work by hybridization of DNA fragments with allele‐specific oligonucleotide probes (SNP probes) and fluorescence‐based detection of signals. In general, SNP arrays can be roughly categorized into two types based