  • Multiple genes, one fpkm value

    Hi everyone,

    My apologies if this question has been asked before, but my searches on the forum came up with nothing.

    I got sequencing results back from mouse RNA (75 bp PE, 50 million reads) and ran them through a Tophat-Cufflinks-Cuffmerge-Cuffdiff pipeline. This results in a list of differentially transcribed genes. So far, so good.

    However, some genes seem to have been 'combined' somewhere along the pipeline; that is, there are multiple gene symbols on one line but only one FPKM value, chromosomal location, etc. Below are two examples:

    XLOC_002706 XLOC_002706 Neurod4,Vmn2r84,Vmn2r85,Vmn2r86,Vmn2r87 chr10:130268058-130542669 SC4 SC2 OK 889.112 524.056 -0.762645 -256.757 5,00E-05 0.00061898 yes

    XLOC_003158 XLOC_003158 2410006H16Rik,Snord49a,Snord49b chr11:62601222-62670908 SC4 SC2 OK 327.662 373.866 0.19031 0.453688 0.3605 0.742145 no

    As you can see, this happens both when there is a significant change and when there is not.

    Has anyone had this problem before? If so, what is the best solution? Should the reads be trimmed to avoid them overlapping multiple genes and if so, how much trimming is recommended?

    Many thanks for your input.

  • #2
    These are most probably overlapping genes, for which the read counter can't determine which feature the reads belong to. There is likely no perfect solution.


    • #3
      ^As mentioned above, they are overlapping genes. I would suggest taking a look at them in IGV (the Integrative Genomics Viewer) to get a better idea of how things are.

      Also check whether they might be isoforms, if you are trying to find novel transcripts of the genes.

      There are some programs, like MISO and IUTA, that might help with isoform detection.
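To decide which loci are worth inspecting in a genome browser, it can help to list every row that carries more than one gene symbol. The following is a minimal sketch, not part of the Cufflinks tooling: `MergedLocus` and `find_merged_loci` are hypothetical names, and the column order (locus ID first, gene symbols third, `chr:start-end` fourth) is assumed from the two example rows in the first post. It splits on any whitespace, so it should work on both the pasted rows and a tab-delimited file.

```python
from typing import Iterable, List, NamedTuple


class MergedLocus(NamedTuple):
    """One row whose gene column lists several gene symbols (hypothetical helper type)."""
    locus_id: str
    genes: List[str]
    chrom: str
    start: int
    end: int


def find_merged_loci(lines: Iterable[str]) -> List[MergedLocus]:
    """Return rows that carry two or more comma-separated gene symbols.

    Assumes the column order seen in the example rows:
    field 0 = locus ID, field 2 = gene symbols, field 3 = chr:start-end.
    """
    merged = []
    for line in lines:
        fields = line.split()
        # Skip blank/short lines and a header row, if one is present.
        if len(fields) < 4 or fields[0] == "test_id":
            continue
        genes = fields[2].split(",")
        if len(genes) < 2:
            continue  # a single gene symbol: nothing was combined
        chrom, span = fields[3].rsplit(":", 1)
        start, end = (int(x) for x in span.split("-"))
        merged.append(MergedLocus(fields[0], genes, chrom, start, end))
    return merged


# The two rows quoted in the first post.
rows = [
    "XLOC_002706 XLOC_002706 Neurod4,Vmn2r84,Vmn2r85,Vmn2r86,Vmn2r87 "
    "chr10:130268058-130542669 SC4 SC2 OK 889.112 524.056 -0.762645 "
    "-256.757 5,00E-05 0.00061898 yes",
    "XLOC_003158 XLOC_003158 2410006H16Rik,Snord49a,Snord49b "
    "chr11:62601222-62670908 SC4 SC2 OK 327.662 373.866 0.19031 "
    "0.453688 0.3605 0.742145 no",
]

for locus in find_merged_loci(rows):
    span_kb = (locus.end - locus.start) / 1000
    print(f"{locus.locus_id}: {len(locus.genes)} genes over {span_kb:.1f} kb "
          f"({locus.chrom}:{locus.start}-{locus.end})")
```

The printed `chr:start-end` strings can be pasted into a genome browser's locus box to see whether the listed genes really overlap or were joined by a long merged transcript.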
