Network Medicine: Complex Systems in Human Disease and Therapeutics
Using different -omics data types has the potential to provide mechanistic insights regarding how the impact of genetic variants is biologically transduced to cause disease.
Our knowledge of the molecular networks influenced by disease-related genetic variants is typically fragmentary.
Reconstruction of context-specific gene regulatory networks may be necessary to understand genetic predisposition to complex diseases.
Chen and colleagues performed multiple -omics analysis in blood samples from a single individual repeatedly over more than 1 year of observation. In addition to performing whole-genome DNA sequencing, they obtained repeated assessments of transcriptomics (using RNA-seq), proteomics (including autoantibody profiles), and metabolomics, thus creating an integrative personal -omics profile (iPOP) (Chen, Mias, et al. 2012).
Future complex disease studies will need to consider the value of longitudinal multiple -omics assessments in uncovering the etiologies of those diseases.
One of the key goals of network medicine is to create more meaningful classification systems for complex diseases based on etiology.
Key ongoing challenges in applying such genetic networks include addressing the impact of linkage disequilibrium between SNPs as well as population stratification on network structure and subtype identification.
Pharmacogenetics may assist in identifying individuals likely to benefit from specific pharmacological treatment and in avoiding treatment of individuals at high risk for adverse events.
Early hopes that identifying alterations in the DNA sequence (genetic variants) would lead us quickly to the root cause of human disease or that simply looking at patterns of gene expression could inform us about the functional underpinnings of the phenotypes we observe were quickly dashed.
Encoded within the human genome are approximately 20,000 protein-coding genes, something on the order of fivefold more isoforms, more than 1000 microRNAs, and multiple noncoding epigenomic states, all of which can affect the functioning of the cell.
Much of the progress we have made in understanding disease phenotypes has come from analyzing gene transcriptional data—making static measurements of the abundance of RNA levels for different cellular states and using these data to develop network models representing the dynamical processes driving biological systems.
Fortunately, new DNA-sequencing technologies are allowing the generation of increasingly large and complex datasets comprising multiple -omic assays from individual samples, including genome-sequence data, transcriptomic data, and genome-wide data on patterns of epigenetic modification.
Microarrays quantify the amount of mRNA that is captured, or bound, to a set of complementary sequences (probes) that are themselves attached to a solid substrate.
As new sequencing technologies have become more robust and cost-efficient, the sequencing of RNA (or RNA-seq) has begun to replace microarrays as a means of assessing gene-transcript levels.
While there are advantages and disadvantages to both microarrays and RNA-seq, and the analysis of data from each requires careful preprocessing to eliminate artifacts in order to estimate gene expression levels accurately, both have been widely used in transcriptomic network modeling.
For gene regulatory networks, transcription is the output of an underlying process wherein the concentration of mRNA in a cell or population of cells is mediated by the context-specific behavior of a variety of controlling factors.
A common assumption in this analysis is that genes whose expression was highly correlated across samples were under common regulatory control and hence “coregulated.”
Since these correlation-based similarity matrices are symmetric across the diagonal, networks generated using these measures are generally undirected. They also include information relating every pair of genes, instead of just relationships between TFs and target genes; thus, regulatory relationships are confounded with coregulatory correlations.
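
A minimal sketch of a correlation-based co-expression network may help make the symmetry point concrete. The gene names, expression values, and the 0.8 threshold are illustrative assumptions, not taken from the book. Because the Pearson correlation of (a, b) equals that of (b, a), each edge carries no direction, and two targets of the same TF end up linked to each other as strongly as to the TF itself.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(expr, threshold=0.8):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold."""
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:  # each unordered pair is visited once
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                edges.append((g, h, r))
    return edges

# Hypothetical toy data: TF1 drives targets T1 and T2; U is unrelated.
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "T1": [2.1, 3.9, 6.2, 8.0, 9.8],
    "T2": [0.9, 2.2, 2.9, 4.1, 5.0],
    "U": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = coexpression_edges(expr)
pairs = {frozenset((a, b)) for a, b, _ in edges}
```

In this toy case the T1-T2 edge appears alongside the TF1-T1 and TF1-T2 edges, which is the confounding of regulatory and coregulatory relationships described above.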
It soon became obvious that networks created in this way did not accurately represent the underlying regulatory processes. Highly correlated pairs of genes, which were the most common associations in the networks, were likely to be commonly targeted by an upstream TF rather than to regulate each other.
Furthermore, co-expression networks estimated from the Pearson correlation did not retain many of the properties that were already beginning to be associated with biological networks, such as a scale-free degree distribution.
It is important to note that WGCNA is specifically tuned for finding sets of co-expressed genes with greater accuracy rather than modeling the regulatory network connecting those genes.
While linear correlation is a useful measure of relatedness, some scientists recognized that biological interactions may be nonlinear and that these would be missed by simple linear measures such as Pearson correlation.
Linear measures, such as the Pearson correlation, can easily capture the relationship between A and B in the top plot, but for nonlinear relationships, such as the one shown in the bottom plot, a measure such as mutual information (MI) is more informative.
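
The contrast between the two measures can be reproduced numerically. The sketch below is an illustration only: it assumes a simple equal-width histogram (plug-in) estimator of mutual information with 8 bins, which is one of several possible estimators and not necessarily the one used in the methods discussed here. The data are synthetic: a straight line and a parabola over the same symmetric range.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def binned(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=8):
    """Plug-in MI estimate in bits from a 2D equal-width histogram."""
    bx, by = binned(x, bins), binned(y, bins)
    n = len(x)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )

# Synthetic profiles: A on a symmetric grid, B linear or quadratic in A.
x = [-1 + 2 * i / 200 for i in range(201)]
y_linear = [2 * v + 1 for v in x]
y_parabola = [v * v for v in x]

r_linear = pearson(x, y_linear)
r_parabola = pearson(x, y_parabola)
mi_linear = mutual_information(x, y_linear)
mi_parabola = mutual_information(x, y_parabola)
```

For the parabola the Pearson correlation is essentially zero because the positive and negative halves cancel, while the mutual information estimate stays clearly positive, since knowing A still narrows down B.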
Two methods that use mutual information as a starting point to infer gene regulatory networks are Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) and Context Likelihood of Relatedness (CLR).
ARACNe seeks to address this issue by evaluating all such “triads” of nodes in a network, and removes the edge in this triad for which there is the least evidence of direct regulation.
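
The triad-pruning idea can be sketched as follows. This is a simplified illustration, not ARACNe's actual implementation: the function name `dpi_prune`, the `tolerance` parameter, and the toy mutual information values are assumptions made for the example. Every fully connected triad is checked against the original mutual information table, and the weakest of its three edges is marked as likely indirect.

```python
from itertools import combinations

def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected triad.

    `mi` maps frozenset({a, b}) -> mutual information. Edges are marked
    against the original table first and removed only at the end, so the
    result does not depend on the order in which triads are visited.
    """
    nodes = sorted({n for pair in mi for n in pair})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        triad = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
        if not all(e in mi for e in triad):
            continue  # not a closed triad, nothing to compare
        weakest = min(triad, key=lambda e: mi[e])
        others = [e for e in triad if e != weakest]
        # Hypothetical tolerance: require a clear gap before pruning.
        if mi[weakest] < min(mi[e] for e in others) * (1 - tolerance):
            to_remove.add(weakest)
    return {e: w for e, w in mi.items() if e not in to_remove}

# Toy example: TF regulates T1 and T2; the T1-T2 association is indirect.
mi = {
    frozenset({"TF", "T1"}): 0.9,
    frozenset({"TF", "T2"}): 0.8,
    frozenset({"T1", "T2"}): 0.5,
}
pruned = dpi_prune(mi)
```

With a tolerance of 0 and no ties, no closed triangle survives this filter, so any genuine three-node loop in the underlying biology would lose one of its edges.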
Unfortunately, ARACNe’s ability to reconstruct useful networks in other contexts has been limited. One reason may be that the algorithm breaks every triad in the network by removing one of its edges, even though triads are an important structure in feedback and feed-forward loops.
Rather than pruning specific edges by comparing triads, CLR instead prunes edges based on local structure in the mutual information by normalizing the mutual information matrix by recasting it into z-score units.
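
The z-score recasting can be sketched as below. The sketch follows the commonly cited formulation of CLR, in which each mutual information value is compared with the background distribution of both genes, negative z-scores are clipped to zero, and the two are combined as the square root of the sum of squares. Excluding the diagonal from the background and the toy matrix values are assumptions of this example.

```python
import math

def clr_scores(mi):
    """Convert a symmetric MI matrix (list of lists) into CLR-style scores."""
    n = len(mi)
    means, stds = [], []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]  # background for gene i
        mu = sum(row) / len(row)
        sd = math.sqrt(sum((v - mu) ** 2 for v in row) / len(row))
        means.append(mu)
        stds.append(sd)

    def z(i, j):
        # z-score of MI(i, j) against gene i's background, clipped at zero
        if stds[i] == 0:
            return 0.0
        return max(0.0, (mi[i][j] - means[i]) / stds[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scores[i][j] = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores

# Hypothetical MI matrix: gene 0 has high MI with every other gene.
mi = [
    [0.0, 0.9, 0.8, 0.85],
    [0.9, 0.0, 0.1, 0.1],
    [0.8, 0.1, 0.0, 0.6],
    [0.85, 0.1, 0.6, 0.0],
]
scores = clr_scores(mi)
```

Pairs whose mutual information falls below both genes' backgrounds receive a score of zero, so an edge is kept only when it stands out locally rather than because one gene is broadly associated with everything.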
In their paper, Faith and colleagues demonstrated that CLR outperformed ARACNe in a benchmark Escherichia coli gene expression dataset.
Despite their limitations, both CLR and ARACNe have been applied to the reconstruction of networks in many varied systems and remain well cited in the field of transcriptomics.
Co-expression networks may well capture direct regulatory relationships, but these cannot be distinguished from indirect associations based on similarity of expression patterns. The result is often a series of many-to-many associations between genes with correlated expression in which the strongest associations are not necessarily those that are most relevant to understanding regulatory processes.
Statistical methods have also been adapted for use in the reconstruction of gene regulatory network models. One of the main motivations for using these is that the score predicted for each edge in the network has a probabilistic interpretation with weights and errors.
Statistical approaches for modeling gene regulatory networks generally fall into two main classes.
The first frames network inference as a series of regression problems wherein the expression level of each target gene is predicted by a combination of the expression levels across a set of potential upstream regulators. The second casts the problem of finding regulators as a classification problem in which new targets of a TF are predicted by comparing each potential target gene’s expression profile to the profiles of known “true” and “false” targets.
Regression approaches generally employ a resampling scheme, such as bootstrapping, to determine a score for each regulatory interaction that assesses the probability that the coefficients in the regression equation wt are nonzero.
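
One way to picture such a resampling scheme is sketched below. The passage does not name a specific regression method, so this example assumes a LASSO penalty fitted by coordinate descent; the penalty value, the number of resamples, and the simulated data are all illustrative choices. Each candidate regulator is scored by the fraction of bootstrap resamples in which its coefficient is nonzero.

```python
import random

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent LASSO on centered data (rows of X are samples)."""
    p = len(X[0])
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for row, target in zip(X, y):
                partial = sum(row[k] * w[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
            # Soft-thresholding: small effects are set exactly to zero.
            if rho > lam:
                w[j] = (rho - lam) / col_sq[j]
            elif rho < -lam:
                w[j] = (rho + lam) / col_sq[j]
            else:
                w[j] = 0.0
    return w

def center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def bootstrap_edge_scores(X, y, lam, n_boot=50, seed=0):
    """Fraction of resamples in which each regulator has a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        cols = [center([X[i][j] for i in idx]) for j in range(p)]
        Xb = [[cols[j][r] for j in range(p)] for r in range(n)]
        yb = center([y[i] for i in idx])
        w = lasso_cd(Xb, yb, lam)
        for j in range(p):
            counts[j] += w[j] != 0.0
    return [c / n_boot for c in counts]

# Simulated data: regulator 0 drives the target; regulators 1 and 2 are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [2.0 * row[0] + rng.gauss(0, 0.5) for row in X]
scores = bootstrap_edge_scores(X, y, lam=15.0)
```

The resulting frequencies are not calibrated probabilities, but they give each candidate edge a weight that reflects how stable the inferred regulator is under resampling.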
In contrast to regression-based approaches that try to predict the regulators of each gene, classification approaches look at the problem from the opposite direction and try to predict the targets of each TF by conceptualizing regulatory network reconstruction as a feature-selection model.
Classification methods rely heavily on the “prior” information used to build regulatory network predictions.
Further, as explicated below, despite the large increase in genomic information over the past decade, only a subset of all known TF regulators have high-quality, condition-specific, validated regulatory interactions.
One limitation of both regression-based and classification-based approaches is that after predicting each gene’s regulators, or each TF’s targets, it is necessary to perform a postprocessing step to stitch together these sets of predictions into a global network.
Bayesian networks represent an alternative approach to network modeling that requires edges to be directed. Formally, a Bayesian network is a directed acyclic graph (DAG) whose vertices are random variables X1, …, Xn that are probabilistic, can be discrete or continuous, and describe variation across conditions. In this context, each variable has a conditional distribution given its parents P(Xi|Parents(Xi)) and is independent of its nondescendants given its parents. Consequently, Bayesian networks allow only dependencies between a node and its parents, and conditional independence statements ...
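
The conditional distributions described above combine into the standard factorization of the joint distribution over a DAG. The second line is a hypothetical three-gene chain A → B → C, added only as a worked instance.

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)

P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid B)
```

In the chain, C is independent of A once B is known, which is the kind of conditional independence statement the graph structure encodes.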
Part of the attraction of these models is that the edges do not necessarily represent direct interactions but can represent the influence of a number of undetected genes, proteins, or metabolites that, in many ways, allow us to overcome the imperfect knowledge of the relationships that exist in the systems we study and incompleteness in the experimental data.
However, application of Bayesian network analysis to more “realistic” datasets (such as tumor vs. normal, treated vs. control) failed to provide similarly useful insights and, as a result, is rarely used in analysis of expression profiling data.
The most significant reason for this is the computational complexity of learning the structure of the networks, a problem that has been shown to be nondeterministic polynomial time (NP)–hard (Chickering 1996), implying that no efficient exact algorithm is known and that an exact solution is impractical for networks of realistic size.
For example, Wolpert and Macready (1997) noted that the use of domain-specific knowledge can provide a useful bias that leads to near-optimal solutions in exploring the state space of a particular problem.
Although Bayesian networks allow high resolution of correlation structure in large datasets, they are fundamentally acyclic graphs and therefore cannot include feedback loops that are important for many biological processes, including the cell-cycle processes that Friedman and colleagues first studied.
It has become increasingly clear that inferring regulatory networks from gene expression data alone results in, at best, an incomplete model.
Other methods include gene expression information when doing enhancer mapping in an attempt to incorporate even more distal enhancers that may be regulating a target gene; these complex methods are much more computationally intensive and do not lead to a significant improvement in functional predictions based on validation experiments.
PANDA (Passing Attributes between Networks for Data Assimilation; Figure 8–7) (Glass, Huttenhower, et al. 2013) is a promising new method that borrows an idea called message passing (or affinity propagation) from communication theory (Frey and Dueck 2007) to integrate diverse sources of genomic data and to model the flow of information in complex regulatory networks.
One key feature of PANDA is its emphasis on agreement between data elements in a network neighborhood.
Because PANDA considers multiple types of relationships between both regulators and their targets, the method can incorporate multiple independent data sources.
Although a wealth of gene expression data has been generated over the past decade, most biological inference has been based on statistical tests at the level of individual genes (with high rates of spurious associations) followed by functional meta-analysis using gene set enrichment techniques.