
    2019-10-29


    Informatics in Medicine Unlocked
    journal homepage: www.elsevier.com/locate/imu
    A novel SPDP cancer gene-based biomedical document feature ranking and clustering model 
    Thulasi Bikku a,∗, Radhika Paturi b
    a Department of Computer Science and Engineering, Vignan's Nirula Institute of Technology and Science for Women, Palakaluru, Guntur, India
    b Vignan's Nirula Institute of Technology and Science for Women, Palakaluru, Andhra Pradesh, India
    Keywords:
    Somatic cancer
    Genomes
    Bioinformatics
    Feature selection
    Feature ranking
    Fuzzy clustering
    Document clustering 
    Background: As the volume of somatic genome data in biomedical repositories grows, it is essential to predict cancer-related document sets using machine learning models. Most traditional gene-based somatic cancer mining models omit somatic gene ranking and feature extraction because of their high computational and memory cost on large datasets. A wide range of feature selection and feature extraction strategies exist, and they are widely used in various areas. Each of these strategies aims to remove repetitive and irrelevant features from the training datasets so that new document data can be classified more accurately. Data extraction is the activity of providing relevant data, according to an information need, from a large collection of data resources.
    Results: Ranking consists of sorting the retrieved information according to some criterion, so that the “best” results appear at the top of the returned list. Mapping somatic genomes and their equivalent terms, such as synonyms, to biomedical document rankings is intricate on vast biomedical document datasets. To overcome these limitations, a novel feature-ranking-based fuzzy clustering framework is designed and implemented on large biomedical databases.
    Conclusion: Experiments were simulated with different cluster sizes and gene features for somatic document clustering. The results show that the proposed model achieves a high cluster quality rate with document ranking for somatic gene-based document indexing.
    1. Introduction
    Machine learning (ML) is a subset of Artificial Intelligence based on computational statistics and algorithms used for making predictions; it allows computers to learn automatically [1]. ML involves learning from past experience and then using that experience to predict the correct outcomes later on. Stated in computational terms: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”. Machine learning refers to algorithms that let a computer learn from many examples. The basic idea is to extract a formal statistical model from the given examples and use it to predict the value or class of the target variable for an unseen example. If the value of the target variable is known for each example, the task is called supervised machine learning [2]. Further, if the predicted variable is categorical, the task is known as classification, and if the predicted variable is continuous, the task is known as regression. If there are only two possible outcomes, the task is called binary classification; if there are more than two, it is termed a multi-class classification problem. The primary objective of a machine learning model is to classify an instance correctly. However, in many problem domains such as medicine, classification is followed by critical decision making [27]. For example, take the case of using machine learning to predict whether a tumour is malignant or benign. In this problem, both types of misclassification, false positives as well as false negatives, are hazardous.
    ∗ Corresponding author.
    E-mail address: [email protected] (T. Bikku).
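To make the two kinds of misclassification in the malignant-versus-benign example concrete, the following minimal Python sketch counts them for a binary classifier. It is an illustration only, not the paper's method: the function names `confusion_counts` and `error_rates`, the choice of "malignant" as the positive class, and the six toy labels are all assumptions made here.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="malignant"):
    """Count true/false positives and negatives for a binary task.

    `positive` names the class treated as the positive outcome
    (an assumption of this sketch: "malignant").
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct only if the truth is positive too.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a miss if the truth is actually positive.
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def error_rates(counts):
    """Return (false positive rate, false negative rate)."""
    negatives = counts["fp"] + counts["tn"]
    positives = counts["fn"] + counts["tp"]
    fpr = counts["fp"] / negatives if negatives else 0.0
    fnr = counts["fn"] / positives if positives else 0.0
    return fpr, fnr


# Hypothetical toy labels, invented purely for illustration.
y_true = ["malignant", "benign", "malignant", "benign", "benign", "malignant"]
y_pred = ["malignant", "malignant", "benign", "benign", "benign", "malignant"]

counts = confusion_counts(y_true, y_pred)
fpr, fnr = error_rates(counts)
print(dict(counts), fpr, fnr)
```

Here a false positive is a benign tumour flagged as malignant, and a false negative is a malignant tumour reported as benign. Reporting the two rates separately, rather than a single accuracy figure, keeps both hazards visible when the classification feeds into a later decision.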