

    Keywords: Cancer driver gene, Cancer genomics, Transcriptional regulatory network, Gene expression data, Influence maximization
    1. Introduction
    1.1 Importance of CDG discovery
    The search for key mutated genes related to cancer has been the subject of several recent studies [1] [2] [3]. The main idea behind current cancer driver gene (CDG) discovery methods is the assumption that frequent mutations in certain genes cause cancer. However, not all mutations in a cancer genome are related to cancer. Consequently, various computational methods are used to distinguish cancer driver mutations from passenger mutations.
    The available methods for CDG discovery mainly rely on genomic and/or transcriptomic data. For example, DawnRank [4] finds driver genes using mutational and transcriptome data together with molecular interaction network information. ActiveDriver [3] exploits data on post-translational modification sites of proteins that are mutated in cancer genomes. Likewise, e-Driver [5] looks for biased mutation rates in functional regions of a protein. Similarly, the method proposed by Youn and Simon [6] accounts for the functional impact of mutations on proteins when finding CDGs. Moreover, OncodriveFM [7] and OncodriveCLUST [8] rely on assessing the functional impact of cancer genome variants on proteins. Dendrix [9], MeMo [10], MSEA [11] and CoMDP [12] use mutation profiles to identify cancer driver “pathways”. On the other hand, DriverNet [13] finds driver mutations by evaluating their effect on the transcriptome. In contrast, NetBox [14] considers both protein-protein interactions and signaling pathways in order to find “network modules” that are affected by mutations. MutsigCV [15] exploits exome sequences to detect heterogeneities in cancer datasets, and then determines CDGs based on their mutation frequency in different cancer types. Finally, MDPFinder [16] and iPAC [17] combine gene expression and mutational data to identify CDGs.
    The currently available methods have several limitations and shortcomings. They have a high rate of false-positive CDGs, resulting in low precision and F-measure. Furthermore, all of these methods rely heavily on mutation data, which is noisy and may not always be available at the desired quality. Moreover, most of the CDGs found by each of these methods are also found by the other methods. Due to these limitations, we propose a new network-based driver gene prediction method. Our approach relies on the structure of the gene regulatory network and does not require mutation data. Because it exploits an independent source of information, we show that it is able to find many new CDGs and can therefore be considered an orthogonal CDG prediction method. Finally, we show that the performance of our method is better than that of many existing approaches.
    The transcriptional regulatory network (TRN) is the fundamental network for controlling cellular processes. Gene regulation controls the activity of genes at the transcription level. Transcription factors (TFs) are key components of the cell that orchestrate this regulation. In other words, a TRN shows how each TF regulates the expression of other TFs and genes. Most diseases, including cancer, are related to some dysfunction of TFs, which indicates the importance of analyzing TFs in biomedical research [18].
    In this study, we propose iMaxDriver, a novel network-based method for finding CDGs by applying an influence maximization approach to a cancer-specific TRN. We show that iMaxDriver improves the accuracy of CDG prediction and correctly finds many CDGs that are not detectable by the current sequence-based approaches.
    1.2 Theoretical Background
    Influence maximization (IM) is a concept widely used in social network analysis to explain the diffusion of information in a network [19]. The IM problem is to find a minimal subset of nodes (i.e., the seed set) that has the greatest influence on the other nodes. This problem is known to be an NP-hard optimization problem [20]. The IM problem is typically modeled by one of two main approaches, namely “independent cascade” or “linear threshold”. In these models, each node in the network is labeled as “active” or “inactive”. At the beginning of the algorithm, a set of nodes is assumed to be active, while all others are considered inactive. Each active node can activate its inactive successor nodes based on a certain criterion, which depends on the IM approach. The independent cascade model uses only the probability values of the edges to decide on node activation, while the linear threshold model additionally requires a threshold value for each node.
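
    The two diffusion models described above can be illustrated with a minimal Python sketch. This is not the iMaxDriver implementation: the graph representation (a dictionary mapping each node to a list of (successor, edge value) pairs), the function names, and the rule that a node becomes active once the summed weight of its active predecessors reaches its threshold are assumptions made for illustration only.

```python
import random


def independent_cascade(graph, seeds, rng):
    """One stochastic run of the independent cascade model.

    graph: dict mapping node -> list of (successor, activation probability).
    Each newly activated node gets a single chance to activate each of its
    inactive successors, succeeding with the probability on that edge.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def linear_threshold(graph, seeds, thresholds):
    """Deterministic linear threshold diffusion (assumed activation rule).

    graph: dict mapping node -> list of (successor, edge weight).
    thresholds: dict mapping node -> threshold value.
    A node becomes active when the summed weight of edges from its active
    predecessors reaches its threshold.
    """
    # Index incoming edges so each node can sum over its predecessors.
    incoming = {}
    for u, edges in graph.items():
        for v, w in edges:
            incoming.setdefault(v, []).append((u, w))

    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node can be activated
        changed = False
        for v, preds in incoming.items():
            if v in active:
                continue
            total = sum(w for u, w in preds if u in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active


# Hypothetical toy network: two transcription factors regulating two genes.
toy_graph = {
    "TF1": [("g1", 0.6)],
    "TF2": [("g1", 0.5)],
    "g1": [("g2", 1.0)],
}
toy_thresholds = {"g1": 1.0, "g2": 0.5}

ic_result = independent_cascade(toy_graph, ["TF1", "TF2"], random.Random(0))
lt_result = linear_threshold(toy_graph, ["TF1", "TF2"], toy_thresholds)
```

    In this sketch the independent cascade run depends only on the edge probabilities and a random draw per edge, whereas the linear threshold run also needs the per-node thresholds, which mirrors the distinction drawn in the text. An IM procedure would call such a simulation repeatedly to compare candidate seed sets.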