  • need help

    I was reading a patent that describes training a neural network. The training set consisted of a set of genes and a set of non-genes. Could you give me some information about non-genes, that is:
    1. What are non-genes?
    2. How can I get their sequences?
    3. What kinds of organisms have a high number of non-genes?
    4. Are pseudogenes the same as non-genes?


    Patent and paragraph details:
    Publication number: US 2005/0136480 A1
    Publication date: 23 June 2005
    Paragraph no.: [0081]

    Paragraph text:
    "The training set consists of 1610 E. coli K-12
    NCBI listed protein coding genes and 3000 E. coli K-12
    ORFs (a stretch of sequence of length more than 20 amino
    acids and having start codon, stop codon in the same frame)
    which have not been reported as genes (non-genes). The
    validation set has 1000 known genes and 1000 non-genes
    from E. coli K-12 distinct from those used in the training set.
    The test set contains another 1000 genes and 1000 non-genes
    from the same organism. For training of the ANN,
    genes and the non-genes are assigned a probability value of
    1 and 0 respectively."

    Can anybody explain to me what this paragraph means, i.e., where can I get these 3000 non-genes?
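Question 2 asks how to get the non-gene sequences. Going by the quoted paragraph's own definition (an ORF longer than 20 amino acids with a start and a stop codon in the same frame, not reported as a gene), a minimal Python sketch of scanning a genome for such ORFs and then dropping the annotated genes might look like the following. Several points are assumptions, not taken from the patent: only ATG is treated as a start codon (E. coli also uses GTG and TTG), both strands are scanned, "more than 20 amino acids" is read as at least 21 codons before the stop, and the annotated-gene coordinate set passed to the hypothetical `non_gene_orfs` helper would have to come from a GenBank annotation of the K-12 genome that you supply yourself.

```python
from typing import List, NamedTuple, Set, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: ATG only; E. coli also uses GTG/TTG
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str     # "+" or "-"
    start: int      # 0-based, inclusive, forward-strand coordinate
    end: int        # 0-based, exclusive, includes the stop codon
    aa_length: int  # amino acids encoded (stop codon not counted)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def _scan_strand(seq: str, min_aa: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, aa_length) for ATG...stop ORFs in the 3 frames of seq."""
    hits = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == START_CODON:
                orf_start = i  # first ATG after the previous in-frame stop
            elif orf_start is not None and codon in STOP_CODONS:
                aa_length = (i - orf_start) // 3
                if aa_length >= min_aa:
                    hits.append((orf_start, i + 3, aa_length))
                orf_start = None
    return hits


def find_orfs(seq: str, min_aa: int = 21) -> List[Orf]:
    """Find ORFs of at least min_aa amino acids on both strands."""
    seq = seq.upper()
    n = len(seq)
    orfs = [Orf("+", s, e, aa) for s, e, aa in _scan_strand(seq, min_aa)]
    for s, e, aa in _scan_strand(reverse_complement(seq), min_aa):
        # map reverse-complement coordinates back onto the forward strand
        orfs.append(Orf("-", n - e, n - s, aa))
    return sorted(orfs)


def non_gene_orfs(orfs: List[Orf], annotated: Set[Tuple[str, int, int]]) -> List[Orf]:
    """Hypothetical filter: keep ORFs whose (strand, start, end) is not an annotated gene."""
    return [o for o in orfs if (o.strand, o.start, o.end) not in annotated]


toy = "ATG" + "GCT" * 21 + "TAA"
print(find_orfs(toy))
```

Each ORF is reported once per stop codon, starting at the first ATG after the previous in-frame stop, so shorter ORFs nested inside it are not listed separately; the patent does not say whether its 3000 ORFs were counted this way.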

  • #2
    I think the paragraph means that the listed protein-coding genes are considered genes and the listed ORFs are considered non-genes, in both training and testing. The genes and non-genes come from NCBI GenBank, and whether a sequence counts as a gene or a non-gene is based on the genome annotation. But my question is this: there are 1610 genes in total in the analysis and the first 1000 genes were used as the training set, so 1610 - 1000 = 610 genes remain. How did the authors get the other 390 genes for testing?
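The question above turns on how the set sizes add up. A quick Python tally of the counts exactly as the quoted paragraph [0081] states them, assuming (as the paragraph says) that the training, validation, and test sets are distinct from one another:

```python
# Counts as stated in the quoted patent paragraph [0081];
# the sets are assumed disjoint, as the paragraph describes them.
splits = {
    "training":   {"genes": 1610, "non_genes": 3000},
    "validation": {"genes": 1000, "non_genes": 1000},
    "test":       {"genes": 1000, "non_genes": 1000},
}

total_genes = sum(s["genes"] for s in splits.values())          # label 1 for the ANN
total_non_genes = sum(s["non_genes"] for s in splits.values())  # label 0 for the ANN
print(total_genes, total_non_genes)  # 3610 5000
```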



    • #3
      Thanks for the reply. I don't know the answer to your question, but I will post it as soon as I find out; we are trying to contact the authors of the patent. Please do reply if you find anything else.
      Thanks
