Genetic Analysis of Complex Disease. Group of authors
Chapters 3 and 4.
Develop Study Design
Developing your study design and delineating the phenotype are not independent steps. Review of the available data may indicate that a trait as originally defined has little or no evidence of a genetic component. However, there may be strong evidence that a subset of cases has a substantial genetic component. For example, the role of genetics in Alzheimer’s disease was debated for many years. Over time it became increasingly clear that a subset of individuals with onset of Alzheimer’s disease before age 65 clustered strongly in families with apparent autosomal dominant inheritance. Within each of these families, Alzheimer’s disease appeared to be caused by a single gene. By restricting sample collection and genetic analysis to these types of families, researchers identified three genes (APP, PSEN1, and PSEN2) in which mutations cause early‐onset Alzheimer’s disease (Goate et al. 1991; Levy‐Lahad et al. 1995; Rogaev et al. 1995; Sherrington et al. 1995).
The exact approach to the disease gene discovery process should be outlined as completely as possible before the project gets underway. With the clinical phenotype in hand, it is possible to determine what type of dataset to collect. Participant recruitment is perhaps the longest and most labor‐intensive step in the entire process. It is imperative that the enrollment of participants (particularly when studying multiple members of the same family) proceeds with careful consideration of the wishes and norms of the participating individuals, families, and communities. The right of individuals to participate or to refuse participation should receive careful consideration, and the informed consent process should adequately explain the study and answer any questions. Critically, confidentiality must be carefully protected. These issues are outlined in detail in Chapter 5.
Determination of the study design (case–control, cohort, case series, family‐based) is based on the characteristics of the phenotype, the estimated genetic model, and the research objective. For example, the existence of large families with apparent Mendelian segregation suggests that a single major gene could be detected, and a family‐based study would be appropriate. A phenotype with weaker estimated heritability, a pattern of recurrence risks suggesting many genes of small effect, and little familial aggregation would suggest that a case–control study design is most feasible. The process of selecting a study design to answer a research question is reviewed in Chapter 4.
It is also important to have some sense of the sample size required to identify the genes being sought. When pedigree structures are already available in family‐based studies of single‐gene disorders, power is easily calculated with high confidence for specific genetic models using computer simulation programs. For complex traits, however, genetic models are not as easily specified in advance, and computer simulations often must consider a range of parameter values for the genetic model to describe the power across several competing alternatives. Chapter 12 provides an overview of the available approaches and tools for sample size, power estimation, and genetic simulations.
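The idea of describing power across several competing genetic models can be illustrated with a small simulation. The sketch below is only an illustration, not one of the tools reviewed in Chapter 12. It assumes a case–control design with equal numbers of cases and controls, a single biallelic variant, a multiplicative allelic odds ratio, Hardy–Weinberg proportions (so alleles can be drawn independently), and a control allele frequency that approximates the population frequency. The function names, sample sizes, and parameter grid are hypothetical choices made for the example.

```python
import math
import random


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_chi2_pvalue(case_alt, control_alt, n_alleles):
    """1-df chi-square test on a 2x2 allele-count table (equal group sizes)."""
    a, b = case_alt, n_alleles - case_alt
    c, d = control_alt, n_alleles - control_alt
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # allele absent or fixed in the sample: nothing to test
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def simulate_power(n_per_group, control_freq, odds_ratio,
                   alpha=0.05, reps=200, seed=1):
    """Fraction of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    n_alleles = 2 * n_per_group  # two alleles per diploid individual
    case_freq = case_allele_freq(control_freq, odds_ratio)
    hits = 0
    for _ in range(reps):
        case_alt = sum(rng.random() < case_freq for _ in range(n_alleles))
        control_alt = sum(rng.random() < control_freq for _ in range(n_alleles))
        if allelic_chi2_pvalue(case_alt, control_alt, n_alleles) < alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical grid of competing models: allele frequency x odds ratio.
    for freq in (0.1, 0.3):
        for oratio in (1.0, 1.3, 2.0):
            power = simulate_power(200, freq, oratio)
            print(f"freq={freq} OR={oratio} power={power:.2f}")
```

An odds ratio of 1.0 is included in the grid as a check: under that null model the estimated "power" should stay close to the chosen alpha. For a genome‐wide screen a much stricter alpha would be passed in, and many more replicates would be needed to estimate power reliably.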
Family‐Based Studies
Family‐based studies include large extended families, smaller multi‐case families (often affected sibpairs or other affected relative pairs), and discordant sibpair studies. Depending on family structure and number of individuals collected, these families may be used in linkage analyses (as discussed in Chapter 6) or association studies (Chapter 8). Depending on the genetic architecture of the trait and the frequency of the disease‐associated alleles being sought, this design may offer increased power over population‐based designs.
Population‐Based Studies
Several types of observational designs may be considered for population‐based studies, including case series, case–control, and longitudinal cohort designs. The possible sampling frames for these types of studies include simple random samples of a defined geographical area, clinic‐ or hospital‐based samples, convenience samples such as voluntary registries or biobanks, or hybrids of these (e.g. health‐system‐based biobanks linked to longitudinal electronic health records). These designs became much more common with the advent of high‐throughput genotyping technologies, which enabled the efficient study of very large samples of unrelated individuals through GWAS (Chapter 9), an approach with substantially greater power than a similarly sized family‐based study.
Approaches for Gene Discovery
There are two general, but not mutually exclusive, ways to approach gene discovery for complex traits. The first is to take a genome‐wide screening approach. Genomic screening can aim to identify areas of genetic linkage in family‐based designs (Chapter 6) or areas of association in either family‐ or population‐based designs (Chapters 8 and 9). A good genomic screen will attempt to cover the entire human genome using markers evenly spaced across the genome. Current high‐throughput genotyping technologies enable genotyping of hundreds of thousands to millions of single nucleotide polymorphisms in a rapid, inexpensive manner for use in linkage or association studies. More recently, high‐throughput sequencing technology has been used to screen the entire coding sequence of the genome (WES) or the entire genome (WGS) for trait‐associated variants, without first conducting genome‐wide linkage or association studies. As sequencing costs continue to decline, a shift to “genotyping by sequencing” is likely, in which results from WGS might be used to conduct a genome‐wide screen and follow‐up in a single molecular experiment. These same high‐throughput genotyping and sequencing technologies allow large‐scale examination of gene expression (through gene expression microarrays or RNA‐Seq) and epigenetic changes (through methylation arrays or Methyl‐Seq) in trait‐relevant tissues. The results of such experiments are often used in conjunction with genome‐wide screens to identify high‐priority candidate genes for follow‐up studies. These technologies and their application to genomic studies are discussed in Chapter 10.
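The notion of covering the genome with evenly spaced markers, mentioned above for genomic screens, can be made concrete with a small sketch. It assumes the markers on one chromosome are given as a sorted list of coordinates (base pairs or centimorgans) and picks the marker nearest to each point of a regular grid. The function name and the example coordinates are hypothetical, and real marker panels are designed with additional criteria, such as informativeness, that this sketch ignores.

```python
from bisect import bisect_left


def pick_evenly_spaced(positions, spacing):
    """Pick, for each grid point `spacing` apart, the nearest marker.

    positions: sorted marker coordinates on a single chromosome.
    Returns the chosen markers in order, without duplicates.
    """
    if not positions or spacing <= 0:
        return []
    chosen = []
    target = positions[0]
    while target <= positions[-1]:
        i = bisect_left(positions, target)
        # Candidates: the first marker at or after the target,
        # and the marker just before it (if there is one).
        candidates = positions[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda pos: abs(pos - target))
        if not chosen or nearest != chosen[-1]:
            chosen.append(nearest)
        target += spacing
    return chosen


if __name__ == "__main__":
    # Hypothetical marker map with clusters and gaps.
    markers = [0, 1, 2, 10, 11, 19, 20, 31]
    print(pick_evenly_spaced(markers, 10))
```

Choosing the nearest marker to each grid point, rather than every marker, thins dense clusters while still placing a marker close to each interval where one is available.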
In contrast to the genomic screening approach, a directed screening approach may be used. This approach, sometimes termed a “candidate‐gene” approach, focuses on an area of the genome selected for examination based on prior information. This prior information could come from many sources, including results from a previous genome‐wide screen, results from gene expression studies, genes suggested by pathophysiology, or candidate genes identified in model systems. For example, multiple sclerosis is an autoimmune disease in which the myelin sheaths around nerves are attacked and often destroyed. This information suggests that certain genes, such as the human leukocyte antigen genes, T‐cell receptor genes, and the myelin basic protein gene, are prime candidates for analysis. Both the strength and the weakness of this approach lie in how much confidence can be placed in the role of these genes. If the evidence is strong that a gene plays a direct role, only a few such genes may need to be tested to find a trait‐associated variant. If the evidence is more circumstantial, then many genes may have equal justification for being studied, and not much is gained over conducting a genome‐wide screen. Such studies are now most often conducted as follow‐up of prior genomic screens or other hypothesis‐generating experiments.
Analysis
Genomic Analysis
Generally, genome‐wide genotyping or sequencing is the first