Genetic Analysis of Complex Disease. Group of authors

is the recent analysis of 50 000 individuals in the MyCode Community Health Initiative, which successfully identified rare variants underlying cardiovascular traits and lipid levels (Dewey et al. 2016). The rapid and continuing decrease in whole‐genome sequencing (WGS) costs suggests that within a few years, it will be possible (and perhaps commonplace) to test the CDRV hypothesis using WGS in large sample sizes – essentially performing genome‐wide association for common and rare variants with direct genotype determination via sequencing.

      Study design, laboratory methods, and analytic approaches differ by trait type (Mendelian or complex) and hypothesis being tested (rare disease‐rare variant, Mendelian positional cloning; CDCV [GWAS]; CDRV [WES or WGS and individual variant or set‐based association]). These approaches are described in the following sections.


      This section discusses the steps in Figure 1.2, providing an overview of each component and a guide to the chapter(s) providing more detail on these points.

      Define Disease Phenotype

      The first step in any disease gene discovery process is to know what phenotype is being studied. This may sound obvious, but specifying the exact measures that will be used to reliably and validly determine the phenotype is often overlooked in the rush to move forward. There are three aspects that need to be considered: clinical definition, determining that a trait has a genetic component, and identification of datasets that can be studied.

      Clinical Definition

      The phenotype assignment must be done in a rigorously consistent fashion. Even a small rate of phenotype error might alter analytic results – in some cases leading to false‐positive results and in others to false‐negative results. Thus, which data will be used to assign the trait status must be carefully determined. Must detailed clinical records of an examination specifically addressing the phenotype be obtained and reviewed for consistency on every participant? Is the self‐report of a participant or a participant’s relative sufficient? Is a note documenting a diagnosis (but no examination findings) from a medical record adequate? Or is direct examination of every participant using a standardized research protocol required? Additionally, investigators must consider whether to collect additional biomarker data (e.g. antibody titers, protein assays) or clinical tests (e.g. electroencephalogram, electrocardiogram, magnetic resonance imaging) that might correlate with the trait of interest. The goal of the phenotyping protocol is to standardize procedures, minimize error in determining the phenotype, and maximize the power of the dataset to detect genes underlying the trait.
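      As a rough illustration of why even a small rate of phenotype error matters, the sketch below shows one direction of the problem: nondifferential mislabeling pulling an observed odds ratio toward 1, which can contribute to a false‐negative result. The carrier frequencies, error rates, and function names are hypothetical values chosen for illustration, not figures from the text, and the model assumes that mislabeled participants carry the variant at the frequency of their true group.

```python
# Minimal sketch: how phenotype mislabeling can attenuate an odds ratio.
# All numbers and function names are illustrative assumptions.

def observed_carrier_freq(freq_true_group, freq_other_group, mislabel_rate):
    """Carrier frequency in a labeled group that contains a fraction
    `mislabel_rate` of participants who truly belong to the other group."""
    return (1.0 - mislabel_rate) * freq_true_group + mislabel_rate * freq_other_group


def odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying the variant, cases versus controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


def attenuated_odds_ratio(freq_cases, freq_controls, case_error, control_error):
    """Odds ratio computed from labeled groups after mislabeling.

    case_error: fraction of labeled cases who are truly unaffected.
    control_error: fraction of labeled controls who are truly affected.
    """
    obs_cases = observed_carrier_freq(freq_cases, freq_controls, case_error)
    obs_controls = observed_carrier_freq(freq_controls, freq_cases, control_error)
    return odds_ratio(obs_cases, obs_controls)


# Hypothetical carrier frequencies: 30% in true cases, 20% in true controls.
true_or = odds_ratio(0.30, 0.20)
observed_or = attenuated_odds_ratio(0.30, 0.20, case_error=0.10, control_error=0.05)
print(f"true OR = {true_or:.3f}, observed OR = {observed_or:.3f}")
```

      Under these assumptions the observed odds ratio is smaller than the true one, so a study sized to detect the true effect would be underpowered. Error that differs systematically between groups can instead bias results in either direction, which this sketch does not model.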

      Determining that a Trait Has a Genetic Component

      It is critical that as much as possible be known about the genetic basis of a complex trait prior to determining the most appropriate study design for gene identification. That a trait “runs in families” is insufficient evidence, since this phenomenon can occur for several reasons other than shared genetic susceptibility, including shared environmental exposure and biased ascertainment. As outlined in Chapter 3, there are numerous lines of evidence that can be examined, including family studies, segregation analysis, twin studies, adoption studies, heritability studies, and population‐based risks to relatives of probands (the initially identified individuals with disease). For most traits being contemplated, some such data already exist in the literature. A thorough review of this literature may provide most of the necessary information and point out any missing data. The data may not only indicate the strength of the genetic effect on the trait but also give some indication of the underlying genetic model. For example, there may be obvious evidence of a single “major” gene, such as in Huntington’s disease, or multiple genes interacting in complex ways, such as in multiple sclerosis (Sadovnick et al. 1996).
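      Two of these lines of evidence reduce to simple summary statistics that are often reported in the literature. The sketch below computes them under stated assumptions: Falconer's approximation for heritability from twin correlations, and the recurrence risk ratio for relatives of probands. The input values and function names are hypothetical, and neither statistic on its own separates genetic effects from shared environment.

```python
# Minimal sketch of two common summary statistics for familial aggregation.
# Input values are hypothetical and for illustration only.

def falconer_heritability(r_mz, r_dz):
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ), where r_MZ and
    r_DZ are trait correlations in monozygotic and dizygotic twin pairs."""
    return 2.0 * (r_mz - r_dz)


def recurrence_risk_ratio(risk_in_relatives, population_prevalence):
    """Risk to a given class of relatives of probands divided by the
    population prevalence (often written lambda_R)."""
    if population_prevalence <= 0.0:
        raise ValueError("population prevalence must be positive")
    return risk_in_relatives / population_prevalence


# Hypothetical twin correlations and risks.
h2 = falconer_heritability(r_mz=0.80, r_dz=0.50)
lambda_sib = recurrence_risk_ratio(risk_in_relatives=0.03, population_prevalence=0.001)
print(f"approximate heritability = {h2:.2f}")
print(f"sibling recurrence risk ratio = {lambda_sib:.1f}")
```

      A recurrence risk ratio well above 1, or a monozygotic correlation clearly exceeding the dizygotic one, is consistent with a genetic component but should be read alongside the other study types listed above.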

      Identification of Datasets

      It is helpful to identify early on what potential datasets exist or can be collected. Do large families exist or are most cases apparently sporadic? Are large cohort or case–control studies available? Are there repositories of multiplex families with associated clinical data available? Are there existing clinical networks or large specialty clinics available? Is the necessary phenotype data available in a biobank linked to an existing electronic health record? The answers to

