Med One. 2017 Apr 25; 2:e170005. DOI:10.20900/mo.20170005.
1 Department of Hematology, Ruijin Hospital affiliated to Shanghai Jiao Tong University, School of Medicine, Shanghai, China, 200025;
2 National Cancer Institute, 9000 Rockville Pike, Bethesda, Maryland, 20892.
Correspondence to: Lei Xu, Ph.D, National Cancer Institute, 9000 Rockville Pike, Bethesda, Maryland, 20892. Email: email@example.com; Tel: 1-301-402-5535.
Background: B-cell chronic lymphocytic leukemia (B-CLL) is the most common type of leukemia in adults, yet its underlying mechanisms remain unclear. The aim of this study is to investigate novel genetic risk factors for B-CLL by systematically reviewing the published literature and performing a meta-analysis.
Methods: A comprehensive search of electronic databases was completed using Illumina BioEngine. Twenty-one B-CLL case/control bio-sets from four different studies were selected, including 195 B-CLL cases and 31 controls. The top B-CLL risk genes selected were further analyzed by integrating an online open-source B-CLL genetic database. Pathway enrichment analysis (PEA) and network connectivity analysis (NCA) were conducted to identify potential functional associations between target genes and B-CLL.
Results: One novel gene (NRIP1) and two known genes (INPP5F and LEF1) were identified through the meta-analysis as top target genes for B-CLL. These genes play important roles within multiple B-CLL genetic pathways and are tightly related to known B-CLL target genes. NCA results also revealed strong functional associations between these genes and B-CLL.
Conclusion: This study identified known as well as novel B-CLL target genes and the functional pathways through which they are involved in B-CLL pathogenesis. Our results may provide new insights into the genetic mechanisms of B-CLL.
Chronic lymphocytic leukemia (CLL) is the most frequent B cell leukemia in elderly patients; onset mostly occurs after age 50, and few cases occur in children[2-4]. The cellular origin of CLL is still debated, although this information is critical to understanding its pathogenesis. It has been hypothesized that both environmental and genetic factors play important roles in the development of CLL[5-7].
A large number of genetic studies of B-CLL, both case-control and family-based, have been conducted to explore candidate genes for the disease[5, 6, 8-12]. Many studies have shown an increased familial risk for CLL, with an ~8.5-fold increased relative risk in first-degree relatives. In addition, genome-wide association (GWA) studies identified multiple CLL susceptibility loci, as well as novel genetic variants that are present in familial CLL but not seen in sporadic CLL[16,17]. Furthermore, multiple-modality genetic data from peripheral blood samples were employed to identify B-CLL genetic determinants[8-11]. These previous studies built a solid background for B-CLL genetic research, which could be leveraged for the discovery and evaluation of novel risk genes.
However, the risk estimates from individual studies often lack statistical power because of limited sample sizes and sample-specific phenotype characteristics. It is also difficult to reach a consistent conclusion when results are spread over a large number of independent studies. Therefore, a meta-analysis of multiple studies could provide a higher-powered assessment of the genetic risk factors of B-CLL.
In this study, a meta-analysis was performed on four recent studies (2004-2012). The top genes from the meta-analysis were further analyzed by integrating a curated B-CLL genetic database (B-CLL_GD), which was constructed from a large-scale literature knowledge database, the Pathway Studio (PS) ResNet database. In recent years, the PS ResNet database has been widely used to study modeled relationships between proteins, genes, complexes, cells, tissues and diseases (http://pathwaystudio.gousinfo.com/Mendeley.html). Our study identified novel B-CLL genes and evaluated the effectiveness of integrating meta-analysis with the PS ResNet database to identify and evaluate novel B-CLL risk genes.
A systematic search of electronic databases was conducted using the Illumina BaseSpace Correlation Engine (http://www.illumina.com). Figure 1 presents the diagram for the data selection. A search for ‘B-cell chronic lymphocytic leukemia’ identified 28 B-CLL studies. Further filter criteria were: (1) the organism is Homo sapiens; (2) the data type is RNA expression; (3) the study is a B-CLL case vs. healthy control study (or includes case/control bio-sets). In total, 21 bio-sets (B-CLL case/control comparisons) from four studies satisfied the study selection criteria and were included in this systematic review and meta-analysis.
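The three filter criteria above amount to a simple conjunctive filter over study metadata. The following is a minimal sketch of that logic, not the Correlation Engine's own interface; the `Study` record, its field names, and the example entries are all hypothetical and do not reproduce the actual search results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical metadata record for one candidate study."""
    accession: str
    organism: str
    data_type: str
    has_case_control: bool


def passes_selection(study: Study) -> bool:
    """Return True only if all three selection criteria are met."""
    return (
        study.organism == "Homo sapiens"          # criterion (1)
        and study.data_type == "RNA expression"   # criterion (2)
        and study.has_case_control                # criterion (3)
    )


# Placeholder records for illustration only (not real search hits).
candidates = [
    Study("EXAMPLE_A", "Homo sapiens", "RNA expression", True),
    Study("EXAMPLE_B", "Mus musculus", "RNA expression", True),
    Study("EXAMPLE_C", "Homo sapiens", "DNA methylation", True),
    Study("EXAMPLE_D", "Homo sapiens", "RNA expression", False),
]

selected = [s.accession for s in candidates if passes_selection(s)]
print(selected)
```

Only the record that satisfies all three criteria is retained; each of the other placeholder records fails exactly one criterion.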
The B-CLL_GD is a B-CLL-targeted knowledge database available online at the ‘Bioinformatics Database’ (http://database.gousinfo.com/). The database is updated monthly or upon request. The current version of B-CLL_GD is composed of 753 B-CLL target genes (B-CLL_GD→Related Genes), 125 pathways (B-CLL_GD→Related Pathways), and 159 related diseases (B-CLL_GD→Related Diseases). The database also provides supporting references for each B-CLL-Gene relation, including the titles and the sentences where the relation has been identified (B-CLL_GD→Ref for Related Genes). This information could be used to locate a detailed description of how a candidate gene/drug is related to B-CLL.
Using B-CLL_GD, further analysis of the B-CLL target genes from the meta-analysis was conducted, including identification of their related B-CLL pathways (B-CLL_GD→Related Pathways) and genes (B-CLL_GD→Related Genes). Here we defined two genes as functionally related if they play roles within the same genetic pathway. Pathway enrichment analysis (PEA) was conducted using Pathway Studio to identify genetic pathways potentially linked to B-CLL. The gene-disease relationships were identified using the network building module of Pathway Studio.
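The definition of functional relatedness used here (two genes are related if they share at least one pathway) can be illustrated with a short sketch. This is an assumption-laden illustration rather than Pathway Studio's implementation: the function name `related_genes` and the toy pathway and gene labels are hypothetical.

```python
def related_genes(pathways: dict[str, set[str]], target: str) -> set[str]:
    """Return all genes that share at least one pathway with `target`."""
    related: set[str] = set()
    for members in pathways.values():
        if target in members:
            related |= members
    related.discard(target)  # a gene is not counted as related to itself
    return related


# Hypothetical toy pathways; labels are placeholders, not B-CLL_GD content.
toy_pathways = {
    "pathway_1": {"GENE_A", "GENE_B", "GENE_C"},
    "pathway_2": {"GENE_A", "GENE_D"},
    "pathway_3": {"GENE_E", "GENE_F"},
}

partners = related_genes(toy_pathways, "GENE_A")
connectivity = len(partners)  # count of functionally related genes
print(sorted(partners), connectivity)
```

Under this definition, the size of the returned set gives a simple connectivity count for the target gene.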
After screening by the selection criteria, 21 B-CLL case/control comparison bio-sets from four independent studies were retrieved and assessed (see B-CLL_Meta→Selected Datasets). Only one of the four datasets contained a single case/control study (GSE19147). In this study, researchers analyzed CD3+ T-cells isolated from patients with B-CLL, providing insights into the role of T-cells in B-CLL. The other three datasets, each containing several separate case/control studies, are available at NCBI GEO (IDs: GSE2466, GSE26725 and GSE36907).
Dataset GSE26725 was designed to study the relationship between MYB (v-myb myeloblastosis viral oncogene homolog) and the miR-155 host gene in B-CLL, and contains 4 separate case/control studies: (1) B-CLL of diseased patients vs. normal CD19+ PBMCs (peripheral blood mononuclear cells of healthy subjects); (2) B-CLL of high-risk Rai stage 3-4 vs. normal CD19+ PBMCs; (3) B-CLL of intermediate-risk Rai stage 1-2 vs. normal CD19+ PBMCs; (4) B-CLL of low-risk Rai stage 0 vs. normal CD19+ PBMCs. Another dataset (GSE36907) contained two separate case/control studies designed to detect the cellular origin and pathogenesis of CLL: (1) B-CLL with mutated IgV vs. normal naive CD27- IgD+ cells; (2) B-CLL with wildtype IgV vs. normal naive CD27- IgD+ cells. The remaining dataset (GSE2466) contained 14 separate case/control studies, which found that a gene dosage effect may exert a pathogenic role in B-CLL and that the genomic signature for the VH mutational status might be sex-related. All the bio-set studies are available through NCBI GEO (e.g., http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE2466). Statistics of the included bio-sets are presented in Table 1.
Note: c-c refers to case vs. control.
The meta-analysis results were deposited into the ‘Bioinformatics Database’ (http://database.gousinfo.com) under the name B-CLL_Meta. The top three genes (Score > 90) from the meta-analysis appear in Table 2, with more detailed statistics presented in B-CLL_Meta→Top 3 Genes. The full meta-analysis results are presented in B-CLL_Meta→Full Gene List. A gene's score is defined by the Illumina BaseSpace Correlation Engine (http://www.illumina.com) and is based on the statistical significance and consistency of the gene across the queried bio-sets. The higher the score, the greater the importance of the gene for the case/control comparison.
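The exact scoring formula of the Correlation Engine is not given in this article. As a rough stand-in that illustrates the two ingredients named above (statistical significance and consistency across bio-sets), the sketch below combines per-bio-set p-values with Fisher's method and reports the fraction of bio-sets that agree on the direction of regulation. This is explicitly not the engine's algorithm; the function names are hypothetical, and the inputs in the usage lines are arbitrary illustrative values rather than study data.

```python
import math


def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null hypothesis. For an even number of degrees of
    freedom, the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def direction_consistency(directions: list[int]) -> float:
    """Fraction of bio-sets that agree with the majority direction.

    Each entry is +1 (up-regulated) or -1 (down-regulated).
    """
    up = sum(1 for d in directions if d > 0)
    down = sum(1 for d in directions if d < 0)
    return max(up, down) / len(directions)


# Arbitrary illustrative inputs, not values from the included bio-sets.
combined = fisher_combined_p([0.5, 0.5])
consistency = direction_consistency([1, 1, 1, -1])
print(combined, consistency)
```

A gene that is significant in many bio-sets and regulated in the same direction in most of them would rank highly under any score built from these two quantities, which is the intuition behind the Score and Specificity columns of Table 2.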
Score: A gene's score is based on the statistical significance and consistency of the gene across the queried bio-sets. Specificity: A gene's specificity is the number of bio-sets in which the direction of a gene's regulation matches the selected filter. Associated Pathway: The known B-CLL related Pathways (B-CLL_Meta→Related Pathways) that contain the gene. GO ID is provided if any. Gene Connectivity: The number of known B-CLL related genes (B-CLL_GD→Related Genes) that connect with the target gene.
Of the three genes listed in Table 2, only one (NRIP1) was not included in the database B-CLL_GD, which suggests that it may be a novel B-CLL risk gene. Further study using B-CLL_GD showed that this novel gene was enriched within multiple B-CLL target pathways and was connected to many other genes that were linked to B-CLL (Table 2, B-CLL_Meta → Related Pathways). Fig. 2 shows the 14 B-CLL pathways that include these three genes. Of note, two of these 14 pathways were among the top 10 B-CLL pathways (B-CLL_Meta → Related Pathways): positive regulation of cell proliferation (0008284) and negative regulation of apoptotic process (0006916), as shown in Fig. 2.
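The novelty check described above is a set difference between the top meta-analysis genes and the curated gene list. The sketch below assumes the curated list is available as a plain set of gene symbols; `known_bcll_genes` is a stand-in that contains only the two entries confirmed in the text, not the full 753-gene B-CLL_GD list.

```python
# Top genes from the meta-analysis (Score > 90), as reported in Table 2.
top_genes = {"NRIP1", "INPP5F", "LEF1"}

# Stand-in for the B-CLL_GD gene list: only the two members that the text
# confirms are shown here; the real database holds 753 genes.
known_bcll_genes = {"INPP5F", "LEF1"}

novel = sorted(top_genes - known_bcll_genes)   # not yet in the database
known = sorted(top_genes & known_bcll_genes)   # already curated

print("novel:", novel)
print("known:", known)
```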
The weight of an edge between two nodes is the number of genes shared by the two pathways; the larger the size and the brighter the color of a node, the larger the number of B-CLL candidate pathways including the gene.
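The edge weights and membership counts described in this legend can be computed directly from pathway gene sets. The following is a minimal sketch under the assumption that each pathway is represented as a set of gene symbols; the function names and the toy pathways are hypothetical and do not reproduce Fig. 2.

```python
from collections import Counter
from itertools import combinations


def pathway_edges(pathways: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Edge weight between two pathways = number of genes they share."""
    edges = {}
    for a, b in combinations(sorted(pathways), 2):
        shared = len(pathways[a] & pathways[b])
        if shared:  # omit pairs with no genes in common
            edges[(a, b)] = shared
    return edges


def pathway_membership(pathways: dict[str, set[str]]) -> Counter:
    """Number of pathways that include each gene."""
    return Counter(gene for genes in pathways.values() for gene in genes)


# Hypothetical toy pathways for illustration only.
toy = {
    "P1": {"GENE_A", "GENE_B", "GENE_C"},
    "P2": {"GENE_B", "GENE_C", "GENE_D"},
    "P3": {"GENE_E"},
}

edges = pathway_edges(toy)
membership = pathway_membership(toy)
print(edges)
print(membership["GENE_B"])
```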
Additional functional network connectivity analysis (NCA) using PS showed that the novel gene from this meta-analysis (NRIP1) presents a strong functional association with B-CLL. These genes influence the pathogenic development of B-CLL through multiple pathways (Fig. 3). Each relation (arrow) was supported by one or more references (see B-CLL_Meta → NRIP1).
Although many previous genetic studies have been conducted to discover genetic risk factors for B-CLL, combining the results from these separate studies by meta-analysis could lead to higher statistical power and a more robust point estimate for the disease. In this study, meta-analysis was performed on 21 B-CLL case/control bio-sets extracted from four recent studies. The B-CLL target genes from the meta-analysis were sorted by gene score, which is based on the statistical significance and consistency of the gene across the queried bio-sets. The meta-analysis results suggested three top risk genes (INPP5F, NRIP1 and LEF1) for B-CLL (Score > 90), one of which is novel according to the recently updated database B-CLL_GD. Further analyses were conducted to study the possible correlation between B-CLL and these three genes, especially the novel gene.
Analysis using B-CLL_GD showed that the two known B-CLL target genes, INPP5F and LEF1, are among the top B-CLL_GD genes, with support from multiple independent studies (see B-CLL_GD → Related Genes). Results from PEA showed that these two known genes and the one novel gene (NRIP1) are enriched within multiple B-CLL pathways (B-CLL_Meta → Related Pathways) and linked to hundreds of other B-CLL genes. These results support the relationship between these genes and B-CLL.
Additional network connectivity analysis (NCA) revealed multiple possible functional associations between B-CLL and the novel gene (Fig. 3). It has been shown that overexpression of NRIP1 could increase the mRNA levels of TNF-α, while TNF-α plays an important role in the progression of B-CLL. TNF-α promotes the proliferation of malignant cell clones; therefore, inhibition of TNF may have therapeutic potential in CLL. This suggests that NRIP1 may play a role in the development of B-CLL through an NRIP1 → TNF → B-CLL pathway.
Many studies have indicated that INPP5F and LEF1 may have a potential role in therapeutic strategies for CLL[22,23]. More potential connections between these genes and B-CLL could be identified from the B-CLL_Meta database (see B-CLL_Meta → NRIP1, INPP5F and LEF1), which is available in the open-source ‘Bioinformatics Database’ (http://database.gousinfo.com).
Nevertheless, this meta-analysis has several limitations. The numbers of B-CLL patients and healthy controls were not well matched (195 B-CLL cases and 31 controls), and this unbalanced case/control comparison may influence the accuracy of the results. Additionally, because of space limitations, we mainly focused on the most significant genes (Gene Score > 90). Genes with lower significance in this meta-analysis may also have potential links to B-CLL. Readers can refer to the full list of the 100 top genes from this meta-analysis, which is presented in the database B-CLL_Meta.
In summary, this meta-analysis supported the correlation between two genes (INPP5F and LEF1) and B-CLL, and revealed one novel potential risk gene (NRIP1) for the disease. Network analysis supported the meta-analysis results and identified potential functional pathways and mechanisms through which these genes play important roles in B-CLL. The findings of this study provide new insights into current genetics research on B-CLL.
The authors declare no conflict of interest.