Med One. 2016 Oct 25;1:e160024. DOI:10.20900/mo.20160024.

Article

Network based genetic marker evaluation for Lewis lung carcinoma

Xiangyao Lian1, Dexiang Yang2, Shaolong Cao3*

1Department of Oncology, the Affiliated Hospital of Chengde Medical University, Chengde 067000, Hebei Province, P.R. China;

2Department of Respiratory, the People's Hospital of Tongling 244000, Anhui, P.R. China.

3Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston 77054, USA.

Correspondence: Shaolong Cao, Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston 77054, USA, Email: scao@mdanderson.org; Tel: 504-453-5259.

Published: 10/25/2016 18:33:41 PM

ABSTRACT

Background: Lung cancer is the most common cancer among males and females. In past years, many studies have sought genetic factors associated with lung cancer, creating the need for a systematic study of the genetic network underlying the Lewis lung carcinoma (LLC) model.

Methods: LLC-Gene relation data were extracted from the ResNet 11 Mammalian database, containing 175 LLC candidate genes (nodes). Pathway Enrichment Analysis, Sub-Network Enrichment Analysis, Network Connectivity Analysis and Network Metrics Analysis were conducted to study network attributes and select the top nodes (genes). Additionally, LLC-Drug and Drug-Gene relation data were employed to study the LLC-Gene relation at the small molecule level.

Results: 166/175 genes are enriched in 134 LLC candidate pathways (p < 1e-07), demonstrating strong gene-gene interactions. Metrics analysis revealed 4 genes (IL6, TNF, VEGFA and HIF1A) as top candidates for LLC in terms of replication frequency, network centrality and functional diversity. Additionally, 169 out of 175 LLC candidate genes demonstrated strong interaction with 213 out of 255 LLC effective drugs, supporting the identified LLC-gene relationship.

Conclusion: Our results suggested that the genetic causes of LLC were linked to a genetic network composed of a large group of genes. The gene network, together with the literature and enrichment metrics provided in this study, laid the groundwork for further biological/genetic studies in the field.

INTRODUCTION

Lewis lung carcinoma (LLC) is a tumor discovered by Dr. Margaret R. Lewis of the Wistar Institute in 1951. Worldwide, lung cancer is the most common cancer among males and females in terms of both incidence and mortality [1]. LLC cells have been used in experiments to elucidate the mechanisms underlying drug-mediated antitumor activity [2], and the causes of the disease remain an active area of research.

In recent years, an increasing number of articles have reported over a hundred genes whose altered activity is related to the disease. For example, the TNF-α-TNFR1 pathway has been proposed as a target for therapeutic intervention in lymphatic metastasis that promotes tumor growth and invasion [3]. Significantly decreased expression levels of HIF1A have been associated with LLC in many independent studies [4]. Additionally, the IL-6 family is highly up-regulated in many cancers and is considered one of the most important cytokine families during tumorigenesis and metastasis [5]. Modulating the balance between VEGFA and TSP-1 can transiently improve anti-tumor efficacy in LLC [6]. Observations from these previous studies are valuable for studying the genetic basis of the pathogenic development of the disease. Meanwhile, dozens of new LLC risk genes are reported every year, increasing the need for a systematic evaluation of each gene’s potential pathogenic significance to the disease.

Nevertheless, no systematic analysis has evaluated the quality and strength of these reported genes as a functional network/group to study the underlying biological processes of LLC. In this study, instead of focusing on a specific gene, we attempted to discover the comprehensive genetic network underlying the pathogenic development of the disease. We hypothesized that candidate LLC genes are functionally linked to each other, acting as a network through multiple pathways that influence the pathogenic development of LLC.

MATERIALS AND METHODS

The study was laid out as follows: 1) Disease-Gene relation data analysis to identify LLC candidate genes; 2) Enrichment analysis on the identified nodes to study their pathogenic significance with LLC and identify candidate LLC pathways; 3) Gene-Gene Interaction (GGI) analysis to test the functional association between these reported genes; 4) Metrics analysis to generate node attributes and identify top nodes; 5) LLC-Drug-Gene relation data to study the LLC-Gene relation at the small molecule level.

Acquisition of LLC-gene relation data

The LLC-gene relation data were acquired from the Pathway Studio ResNet 11 Mammalian database (updated August 1st, 2016). The genes identified were used as the candidate network nodes. The ResNet® Mammalian database is one part of the PS ResNet Databases, a group of continuously updated network databases that include curated signaling, cellular process and metabolic pathways, ontologies and annotations, as well as molecular interactions and functional relationships extracted from the 35M+ references covering all PubMed abstracts and Elsevier full-text journals. The databases can be used for data mining and pathway building. The full ResNet Databases also include a Plant database and a Targeted database.

Updated weekly, the ResNet® Mammalian database contains information on over 6,600,000 functional relationships for human, rat, and mouse, linked to all of their original literature sources. Entities in the database include: 1) 142,110 proteins; 2) 106,732 small molecules; 3) 8,863 cell processes; 4) 15,911 diseases; 5) 5,038 functional classes; 6) 4,387 clinical parameters; 7) 559 complexes; and 8) 767 cells. For more information about the PS ResNet Mammalian databases, please refer to http://pathwaystudio.gousinfo.com/ResNetDatabase.html.

Identification of LLC candidate pathways

To better understand the underlying functional profile and the pathogenic significance of the reported genes, a pathway enrichment analysis (PEA) was performed, through which candidate LLC pathways were identified. In PEA, Fisher's exact test is used to test the null hypothesis that a random gene set with the same number of genes would reach the same or a higher overlap with the pathway tested. The PEA was performed against a pathway database of over 2,000 manually curated PS genetic pathways, including Cell Processes Pathways, Expression Targets Pathways, Immunological Pathways, Inflammation Pathways, Metabolic Pathways, Nociception Pathways, Signaling Pathways and Toxicity Pathways. In addition, we compared the gene set with the entire GO database and the Pathway Studio Ontology database. A sub-network enrichment analysis (SNEA) was also conducted [7]. The SNEA approach is similar to that of PEA, except that a given gene set is compared to the sub-networks pre-defined within the Pathway Studio ResNet Database. In this study, we performed SNEA using all LLC candidate genes against disease-related sub-networks, with the purpose of identifying diseases that share a genetic basis with LLC.
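The enrichment test described above can be illustrated with a minimal sketch. It assumes the one-tailed Fisher's exact test is computed as the upper tail of the hypergeometric distribution; the function name, parameter names and toy numbers are hypothetical and are not taken from Pathway Studio.

```python
from math import comb


def enrichment_p_value(overlap, gene_set_size, pathway_size, background_size):
    """One-tailed Fisher exact (hypergeometric upper-tail) p-value.

    Probability that a random gene set of `gene_set_size` genes, drawn from a
    background of `background_size` genes, shares at least `overlap` genes
    with a pathway containing `pathway_size` genes.
    """
    max_overlap = min(gene_set_size, pathway_size)
    total = comb(background_size, gene_set_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, gene_set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background genes, a pathway of 5 genes,
# a candidate set of 4 genes, 3 of which fall in the pathway.
p_toy = enrichment_p_value(overlap=3, gene_set_size=4,
                           pathway_size=5, background_size=20)
```

A small p-value means that an overlap this large is unlikely for a random gene set of the same size, which is the criterion used to call a pathway enriched.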

Gene-Gene Interaction analysis

Both literature and pathway based GGI were conducted to study the associations between the LLC candidate genes. The literature based GGI (LGGI) was performed using Pathway Studio, which identified connectivity between given genes/proteins. The weight of an edge from LGGI is the number of scientific references underlying a reported gene-gene relation. The pathway based GGI (PGGI) analysis was conducted using the identified candidate pathways. The weight of a PGGI edge is the number of pathways that include both nodes/genes.
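The pathway based edge weight can be sketched as a co-membership count. The following is an illustrative example only: the function name, the pathway names and the gene memberships are hypothetical, and the actual PGGI computation in Pathway Studio may differ.

```python
from collections import Counter
from itertools import combinations


def pathway_edge_weights(pathways):
    """Count, for each gene pair, the number of pathways containing both genes."""
    weights = Counter()
    for genes in pathways.values():
        # Sort so that each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(genes)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy input; memberships are illustrative, not study data.
toy_pathways = {
    "pathway_A": ["IL6", "TNF", "VEGFA"],
    "pathway_B": ["IL6", "TNF"],
    "pathway_C": ["TNF", "HIF1A"],
}
edge_weights = pathway_edge_weights(toy_pathways)
```

Gene pairs that never share a pathway have weight 0 and therefore no edge, which is how unconnected nodes arise in such a network.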

Metrics analysis

For the gene-gene network built through the aforementioned steps, we proposed 4 attributes for each node, including 2 literature based metric scores (RScore and AScore), and 2 enrichment based metric scores (PScore and SScore). The proposal of these metrics was based on the logic that if a gene satisfies the following conditions, it is linked to LLC with high probability: (1) the gene has been frequently observed in independent studies to be associated with LLC (high RScore); (2) the gene plays roles within multiple pathways associated with LLC (high PScore); (3) the gene demonstrates strong functional linkage to many other genes that were associated with LLC (high SScore). Additionally, we proposed the AScore to represent the history of each LLC-gene relation and discover novel genes (e.g., AScore = 1 for the genes identified in this year, 2016).

Two literature metrics

We define the number of references underlying a gene-disease relationship as the gene’s reference score (RScore), as shown in Eq. (1).
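As a rough illustration of the two literature metrics, the sketch below takes the RScore as the count of supporting references and assumes the AScore is the age in years of the earliest supporting reference, counted so that a gene first reported in 2016 gets AScore = 1. The AScore formula is an assumption inferred from that example, and the function name and inputs are hypothetical.

```python
def literature_scores(publication_years, current_year=2016):
    """Return (RScore, AScore) for one gene from its references' publication years."""
    if not publication_years:
        raise ValueError("a candidate gene needs at least one supporting reference")
    # RScore: number of references supporting the gene-disease relation.
    rscore = len(publication_years)
    # AScore (assumed form): years since the earliest report, starting at 1.
    ascore = current_year - min(publication_years) + 1
    return rscore, ascore


# Hypothetical examples: a gene first reported in 2016, and an older gene
# supported by two references.
novel_gene = literature_scores([2016])
older_gene = literature_scores([2004, 2010])
```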

Two enrichment metrics

where w is the weighted adjacency matrix. The cell w_ij is greater than 0 if node i is connected to node j, and its value represents the weight of the tie. Note: for network edges built by PNCA, CDW ∈ [0, N*M], where M is the total number of candidate pathways.

In Eq. (5), when 0 < α < 1, both high degree and strong ties are favorably measured, whereas for values of α greater than 1, lower degrees and stronger ties are favorably measured [8]. In this study, we set α = 0.5, such that the node degree and node strength are equally weighted.
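To make the role of α concrete, the sketch below assumes a tuning-parameter centrality of the form k_i^(1-α) × s_i^α, where k_i is the node degree and s_i is the node strength (the sum of its edge weights). This form is an assumption chosen to match the behavior described above; the function name and the toy matrix are hypothetical.

```python
def generalized_degree_centrality(weights, alpha=0.5):
    """Compute k_i**(1 - alpha) * s_i**alpha for each node.

    `weights` is a symmetric weighted adjacency matrix given as a list of
    rows; an entry greater than 0 means the two nodes are connected.
    """
    scores = []
    for i, row in enumerate(weights):
        ties = [w for j, w in enumerate(row) if j != i and w > 0]
        degree = len(ties)    # k_i: number of connected nodes
        strength = sum(ties)  # s_i: sum of adjacent edge weights
        if degree == 0:
            scores.append(0.0)  # unconnected nodes score 0
        else:
            scores.append(degree ** (1 - alpha) * strength ** alpha)
    return scores


# Hypothetical 3-node network: node 0 is tied to node 1 (weight 4)
# and to node 2 (weight 1).
toy_w = [
    [0, 4, 1],
    [4, 0, 0],
    [1, 0, 0],
]
toy_scores = generalized_degree_centrality(toy_w)
```

With α = 0.5 the score is the geometric mean of degree and strength, so a node with many weak ties and a node with a few strong ties can receive comparable scores.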

LLC-Drug-Gene relation data

We hypothesized that regulation of significant LLC candidate genes contributes to the treatment of the disease, and therefore these candidate genes should present upstream regulation relations with drugs that are effective in treating LLC.

To test the potential relationship between LLC candidate genes and LLC effective drugs, LLC-Drug and Drug-Gene relation data were extracted from the ResNet 11 Mammalian database and analyzed. All drugs within the relation data sets have been shown to be effective in treating LLC, and all genes have been identified as LLC candidate genes through the LLC-Gene relation data. Related supporting data are provided in Supplementary Tables S6a-S6e.

RESULTS

LLC-gene relation data with literature metrics

Analysis of the LLC-Gene relation data identified 175 LLC candidate genes, supported by 256 references. The full gene list and related information, including metric scores and related pathways, can be found in Supplementary Table S1a, while all 256 supporting references are listed in Supplementary Table S1b, including LLC-gene relation types, reference titles and the sentences where an LLC-gene relationship was identified. Note that genes prefixed with ‘m_*’ and ‘r_*’ were identified in mouse and rat, respectively.

For the 175 genes associated with LLC, 103 (58.86 %) genes presented Regulation relationship to the disease, 23 (13.14 %) with Genetic Change, 43 (24.57 %) with Quantitative Change and 7 (4.00 %) with State Change. To note, 15 (8.57 %) genes have been reported to have multiple relationships with the disease. Specifically, 160 (91.43 %) genes presented 1 type of relationship to the disease, and 15 (8.57 %) with 2 (Fig 1 (a)). For detailed definition and description of these relation types mentioned above, please refer to the ‘Relations: Definitions and Annotations’ section at http://pathwaystudio.gousinfo.com/ResNetDatabase.html.

Publication date distributions of the underlying 256 articles supporting the gene-LLC relationships were presented in Fig 1 (b), with novel genes reported in each year in Fig 1 (c). To note, these articles have an average publication age of only 6.3 years, indicating that most of the articles were published in recent years. Additionally, our analysis showed that the publication date distributions of the articles underlying each of the 175 genes were similar to that presented in Fig 1 (b).

FIGURE 1
Fig 1 Histogram of the publications reporting gene-disease relationships between LLC and 175 genes. (a) Number of genes for different relation types; (b) number of article publications by year; (c) number of novel genes identified in each year.

Among these 175 genes, 11 novel genes were reported in 2016 with AScore = 1; they are listed in Table 1. For comparison purposes, Table 1 also lists the top 11 genes with the highest RScore (in descending order). Full results are provided in Supplementary Table S1a.

TABLE 1
Table 1 Top 11 genes with reported associations with LLC, ranked by different scores

Enrichment analysis

Enrichment analysis on all 175 genes

In this section, we presented the Pathway enrichment analysis (PEA) and sub-network enrichment analysis (SNEA) results for all 175 genes. The full list of 134 pathways/gene sets enriched with 166/175 genes (p-values < 1e-07) has been listed in Supplementary Table S2a. Among these enriched 134 pathways/gene sets, we identified 2 pathways that are related to the immune system (with 55 unique genes), 5 pathways to cell apoptosis (71 unique genes), 11 pathways to cell growth and proliferation (88 unique genes), 4 pathways to protein phosphorylation (37 unique genes) and 5 pathways to protein kinase (38 unique genes). In addition, we identified one ontology term related to aging (26 unique genes) [11-18].

Besides PEA, we also performed a SNEA using Pathway Studio with the purpose of identifying the pathogenic significance of the reported genes to other disorders potentially related to LLC. A list of the top 126 results (p-value < 1e-50; FDR corrected, q = 0.005) is provided in Supplementary Table S3a. In Table 2, we present the top 10 PEA results and the top 10 SNEA results for the 175 genes.

TABLE 2
Table 2 Top 10 results of PEA and SNEA by the 175 genes reported

Note: The p-value for each pathway/gene set in the table was calculated (q = 0.005 for FDR correction) using a one-tailed Fisher's exact test against the hypothesis that the gene set tested was not associated with the corresponding pathway/gene set.
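The FDR correction method is not named in the text. As one common possibility, the sketch below applies the Benjamini-Hochberg step-up procedure at level q; treating this as the method used here is an assumption, and the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.005):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four pathways.
toy_rejected = benjamini_hochberg([1e-8, 0.004, 0.5, 0.0001], q=0.005)
```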

Gene-Gene interaction

Both LGGI and PGGI were performed to generate two types of networks with the same nodes but different types of weighted edges. To note, LGGI is literature based while PGGI is pathway based. In this study, we used LGGI to test possible gene-gene interactions with literature reports. Building the pathway score (PScore) and the pathway based network significance score (SScore) was based on the PGGI.

LGGI analysis

LGGI was performed on the top 11 genes with the highest RScores and AScores (from Table 1) to generate gene-gene interaction networks. As shown in Fig 2 (a), there were 83 connections among all 11 genes from RScore group. In contrast, genes within the AScore group demonstrated only 44 relations among 11 genes (Fig 2(b)). This observation was consistent with the PEA and SNEA, suggesting that genes with the smallest AScore were not as functionally close to each other as were those from the RScore group.

FIGURE 2
Fig 2 Connectivity networks built by 11 genes from different groups. The networks were generated using Pathway Studio.

PGGI analysis

PGGI showed that, among the 175 genes (network nodes), there were 6,886 edges connecting 166 genes, with 9 genes connected to no other nodes, as shown in Supplementary Table S4. The average node strength (sum of the adjacent edge weights) of the network was 122.28, and the node strength of the 9 unconnected genes was set to 0. Fig. 3(a) and Supplementary Table S5 present the network adjacency matrix of the genetic network built using the 166/175 genes.

FIGURE 3
Fig 3. Comparison of different metrics ranking the 175 genes. (a) Adjacency matrix of the genetic network built with the 166/175 genes as nodes and PGGI generated weights; (b) A Venn diagram of top 11 genes selected by different metrics.

Enrichment metric analysis results

Using the network built by PGGI, we generated two biological metrics, pathway score (PScore) and network significance score (SScore), for each gene (See Supplementary Table S1a). The value of a PScore represented how many LLC candidate pathways involved the node, and a SScore showed how significant the node was to the network.

To study the relationship between the two enrichment metrics and the two literature metrics, we conducted a cross-analysis of the top 11 genes selected using different scores, and presented a Venn diagram in Fig. 3(b) (Oliveros, 2007-2015). There was a strong overlap between the PScore group and the SScore group (11/11). These 11 genes are the ones related to the largest number of significantly enriched pathways. Among these 11 genes, 4 genes (IL6, TNF, VEGFA and HIF1A) were in the overlap of the SScore, PScore and RScore groups, with RScore = 6.25 ± 2.63 references, PScore = 50.75 ± 10.69 pathways, SScore = 2.64 ± 0.33.
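The cross-analysis of top-ranked groups amounts to set operations on the top-k genes under each metric. The sketch below illustrates this with hypothetical gene names and scores that are not values from the study; the helper name `top_k` is also an assumption.

```python
def top_k(scores, k):
    """Return the set of k genes with the highest score (ties broken by name)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:k])


# Hypothetical scores for illustration only.
rscore = {"G1": 9, "G2": 7, "G3": 1, "G4": 2}
pscore = {"G1": 50, "G2": 10, "G3": 45, "G4": 40}
sscore = {"G1": 2.9, "G2": 1.1, "G3": 2.5, "G4": 2.4}

k = 2
# Genes ranked in the top k by all three metrics.
shared_all = top_k(rscore, k) & top_k(pscore, k) & top_k(sscore, k)
# Genes in the top k by both enrichment metrics but not by RScore.
enrichment_only = (top_k(pscore, k) & top_k(sscore, k)) - top_k(rscore, k)
```

The first set corresponds to the central region of a three-way Venn diagram, and the second to genes that are well connected in the network but rarely replicated in the literature.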

On the other hand, 7 genes were observed in both the PScore group and the SScore group, but not in the RScore group: TGFB1, CCL5, CCL2, PDGFB, PTGS2, SERPINE1 and MMP9. These genes play roles within many pathways significantly associated with the disease (42.43 ± 5.97 pathways) and demonstrated strong network centrality (SScore = 2.38 ± 0.18). Although they were old (AScore: 5.14 ± 3.76 years) and were not frequently replicated (1.29 ± 0.49 references), our results suggest that they are worthy of further study.

Notably, there was no overlap between the AScore group and any other group, suggesting that these novel LLC candidate genes were supported by fewer studies and linked to fewer LLC candidate pathways and genes. This observation was consistent with the PEA and SNEA.

Supports from LLC drug study

Analysis of the LLC-Drug relation data identified 255 drugs that have been shown to be effective in treating LLC, supported by 422 studies (see Supplementary Table S6a).

The Gene-Drug relation study showed that there were over 2000 relations connecting 169 out of 175 LLC candidate genes and 213 out of 255 LLC drugs, supported by more than 20K references (see Supplementary Table S6b). The 6 unrelated genes are MIR511, MIR545, IL27RA, CCDC88A, ND6 and DESI2. Additionally, we observed that the 11 genes from the RScore group presented over 1000 relations with 191/255 LLC drugs, supported by more than 15K references (see Supplementary Table S6c). Meanwhile, the 11 genes from the AScore group presented 539 relations with 151/255 LLC drugs, supported by more than 5,798 references (see Supplementary Table S6d). We present the top 1,000 Gene-Drug relations in Fig. 4.

FIGURE 4
Fig. 4 The LLC effective drugs and their relation with the candidate LLC genes. (a) The top 1,000 relations between the 191 drugs and 11 genes from RScore group; (b) The 539 relations between the 151 drugs and 11 genes from AScore group.

DISCUSSION

The risk of lung cancer is believed to be linked to a large genetic network. In many studies, LLC cells were utilized in experiments to elucidate the underlying mechanisms of drug-mediated antitumor activity in lung cancer. Results from this study revealed a complex genetic network underlying the pathogenic development of LLC. Network node (gene) and edge (gene-gene interaction) attributes were studied and presented, with small-molecule-level support identified from the analysis of LLC-Drug and Drug-Gene relation data acquired from the ResNet 11 Mammalian database.

PEA results showed that most LLC candidate genes identified were included in pathways previously implicated in LLC (Supplementary Table S2a), including 2 pathways related to the immune system (with 55 unique genes), 5 pathways to drug effects (62 unique genes), 5 pathways to cell apoptosis (71 unique genes), 11 pathways to cell growth and proliferation (88 unique genes), 4 pathways to protein phosphorylation (37 unique genes), 5 pathways to protein kinase (38 unique genes) and one ontology term related to aging (26 unique genes) [11-18]. Although there may be false positives from the separate studies, we hypothesized that the majority of these literature-reported genes, especially the ones identified from significantly enriched pathways, should be functionally linked to LLC.

Moreover, the LGGI analysis showed that many of the frequently reported genes related to LLC were functionally associated with one another (Fig 2), supported by thousands of scientific reports. PGGI results confirmed the observation and showed that 166/175 nodes presented strong connectivity with each other (average node degree: 41.48 edges). The results indicated that these functionally linked genes were more likely to be true discoveries than noise (false positives). It was less likely that the gene network as a whole was falsely perturbed [19].

In addition to PEA, we performed a SNEA, which provided high levels of confidence when interpreting experimentally-derived genetic data against the background of previously published results (Pathway Studio Web Help). SNEA results demonstrated that over 70 % of the 175 genes were also identified as causal genes for other disorders associated with lung cancer. For example, atherosclerosis is strongly associated with lung cancer and has been studied in LLC cell experiments [20], which supports the hypothesis that these genes were functionally associated with LLC.

For each node of the LLC genetic network, we proposed 4 metric scores as node attributes to evaluate their significance in terms of: 1) publication frequency (RScore), 2) novelty (AScore), 3) number of associated LLC candidate pathways (PScore), and 4) network centrality (SScore). Using the proposed quality metric scores, one can rank the genes according to different needs/significance and pick the top ones for further analysis (see Supplementary Table S1a). Specifically, we observed that 4 frequently replicated genes (with high RScore) also demonstrated high SScore and PScore: IL6, TNF, VEGFA and HIF1A (see Fig. 3(b)). These genes have an average support of 6.25 ± 2.63 references, and were connected to multiple significantly enriched LLC candidate pathways (50.75 ± 10.69 pathways). Moreover, these genes presented the highest network centralities (SScore = 2.64 ± 0.33), suggesting that they are important nodes for the whole disease network and are likely to be biologically significant.

In contrast, 7 genes were observed in both the PScore group and the SScore group, but not in the RScore group. Although these genes were old in terms of AScore (5.14 ± 3.76 years) and were not frequently replicated (1.29 ± 0.49 references), they played roles within multiple LLC candidate pathways (42.43 ± 5.97 pathways) and demonstrated high centrality for the whole network (SScore = 2.38 ± 0.18). For example, the gene PTGS2 (SScore = 2.30), although reported 13 years ago and thus far supported by only 2 references for its relation with LLC, was linked to 166/175 genes and played roles within 35 significantly enriched LLC candidate pathways, many of which have been implicated in LLC: angiogenesis, inflammatory response, ROS in Triggering Vascular Inflammation, response to lipopolysaccharide, positive/negative regulation of cell proliferation, positive regulation of apoptotic process, IL1A Expression Targets, IL1B Expression Targets, F2 -> STAT1/NF-kB Expression Targets, PAF Expression Targets, TLR4/AP-1 Expression Targets, CXCL12 Expression Targets, HMGB1 Expression Targets, TNF/AP-1 Expression Targets, CD40LG/NF-kB/ELK-SRF/CREB/NFATC Expression Targets, positive regulation of nitric oxide biosynthetic process [11-17][21-29]. The observations suggested that these genes may play significant roles in the pathogenic development of LLC and were thereby worthy of further study.

Through the LLC-drug relation data study (see Supplementary Table S6a), we showed that the majority of these drugs (213/255) regulate most of the LLC candidate genes (169/175), supported by over 20 thousand references (Supplementary Table S6c). Moreover, we observed that the 11 genes from the RScore group presented over 1000 relations with 191/255 LLC drugs, supported by more than 16,145 references (see Supplementary Table S6d), and the 11 genes from the AScore group presented 539 relations with 151/255 LLC drugs, supported by more than 5,798 references (see Supplementary Table S6e).

Our results help to understand the underlying mechanisms and biological processes of LLC and support the hypothesis that the majority of the 175 LLC candidate genes identified play a role in the pathogenic development of lung cancer. For example, Mitomycin C has shown an anti-LLC effect through mechanisms that are not fully understood [30,31]. Our study showed that Mitomycin C presented 34 relations with 34/175 genes that were related to LLC (see Supplementary Table S7), while these 34 genes regulate the pathogenic development of LLC. Our results provide supporting information for understanding the anti-tumor effect of Mitomycin C.

Nevertheless, this study has several limitations that should be considered in future work. Although the 175 gene-LLC relations were supported by 256 articles, it is still possible that some gene-LLC associations were missed. Additionally, although the proposed metrics help in ranking the genes and selecting the top ones with specific significance, further network analysis using more complex algorithms (e.g., graph theory) and more data sets may extract additional meaningful features to identify genes that are biologically significant to the disease.

We conclude that LLC is a complex disease whose genetic causes are linked to a network composed of a large group of genes. This study provided a comprehensive weighted genetic network with node attributes for LLC, which could be used as groundwork for further biological/genetic studies in the area.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

FUNDING

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

REFERENCES

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

25.

26.

27.

28.

29.

30.

31.
