  • What do I do with output files from tophat/cufflinks

    Hi, I am a beginner with RNA-seq. I used TopHat to align RNA-seq reads from GEUVADIS to hg19 from UCSC. I gave TopHat the reference transcripts and then used the accepted_hits.bam file from its output as the input file for Cufflinks.

    I ran Cufflinks both with and without the reference transcripts and have the outputs for both runs. So now I am stuck... What exactly can I do now? I have the isoform and gene FPKM files with their values, but how should I approach analyzing them in general? I am not doing a project; I just want to know what kinds of analyses I can do with these files, as well as with the transcripts.gtf file.

    Also, what does an FPKM value of 0 mean? Some other forums said it means that none of the reads mapped to the reference, so I wrote a simple script to filter all of these values out of the isoforms.fpkm_tracking file. Is this OK?

    Lastly, how can I compare the isoform/transcript files that Cufflinks produced with and without the reference annotation?

    Thank you so much for the help in advance!!!

    -Charlie

  • #2
    These are vast questions.
    I don't have time to answer them fully, but here are some tips, which I hope you will find helpful.

    If you're willing to use some R commands, you might want to try CummeRbund for the downstream analysis.

    It's not the greatest software, but it does make it easier to extract more information out of all the data.

    I'm not sure why you want to remove isoforms with an FPKM of 0. An FPKM of 0 means that the isoform is either not expressed, or so lowly expressed that it cannot be detected at this sequencing depth. This is useful information, so I would not remove it.
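As a rough illustration of keeping the zero-FPKM rows instead of deleting them, here is a minimal Python sketch that splits an fpkm_tracking table into expressed and unexpressed IDs. It assumes a tab-separated file with a header row containing `tracking_id` and `FPKM` columns; the function name and the tiny example table are made up for illustration, so check the column names against your own file.

```python
import csv
import io


def split_by_expression(handle, fpkm_column="FPKM", id_column="tracking_id"):
    """Return (expressed, unexpressed) ID lists from a tab-separated table.

    Assumes the first row is a header naming the ID and FPKM columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    expressed, unexpressed = [], []
    for row in reader:
        # FPKM == 0 is kept and labelled, not thrown away.
        target = expressed if float(row[fpkm_column]) > 0 else unexpressed
        target.append(row[id_column])
    return expressed, unexpressed


# Made-up example rows (not real data).
example = (
    "tracking_id\tgene_id\tFPKM\tFPKM_status\n"
    "tx_a\tgene_1\t12.5\tOK\n"
    "tx_b\tgene_1\t0\tOK\n"
    "tx_c\tgene_2\t0.0\tOK\n"
)

expressed, unexpressed = split_by_expression(io.StringIO(example))
print("expressed:", expressed)
print("not detected:", unexpressed)
```

For a real file you would pass `open("isoforms.fpkm_tracking")` instead of the `io.StringIO` object. Keeping the "not detected" list around lets you ask later which genes are silent in one sample but expressed in another.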


    • #3
      Thanks, blancha, for your advice, especially on the FPKM values! Yeah, it makes sense to keep them, since I can use them to identify genes that are not expressed.

      Sorry for the really broad questions. Essentially I just needed some advice on what to do next.

      One question I mentioned above was about running Cufflinks with and without the reference. How can I view the novel transcripts that Cufflinks found without the reference (de novo mode) compared to the output from the run that used the reference?

      Lastly, does anyone know a good way to identify SNVs from the data? I wasn't sure how to approach this either. Thanks!
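One possible approach, not confirmed by anyone in this thread: the Cufflinks suite includes cuffcompare, which compares assembled transcripts against a reference annotation and tags each transcript with a class code (for example `=` for a complete intron-chain match, `j` for a potentially novel isoform, `u` for an unknown intergenic transcript). The sketch below tallies those codes from GTF lines. It assumes the GTF carries `transcript_id` and `class_code` attributes in the ninth column; the function name and example records are hypothetical.

```python
import re
from collections import Counter

# Matches GTF attribute pairs of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def class_codes_by_transcript(lines):
    """Map each transcript_id to its class_code, one entry per transcript."""
    codes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attrs and "class_code" in attrs:
            # Several exon lines share one transcript; store it once.
            codes[attrs["transcript_id"]] = attrs["class_code"]
    return codes


# Made-up records for illustration only.
example = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; class_code "=";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; class_code "=";',
    'chr1\tCufflinks\texon\t100\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; class_code "j";',
    'chr2\tCufflinks\texon\t500\t900\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; class_code "u";',
]

codes = class_codes_by_transcript(example)
counts = Counter(codes.values())
print(counts)
```

Transcripts tagged `j` or `u` would be the first candidates to inspect in a genome browser as possible novel isoforms or novel loci.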


      • #4
        For the SNP calling, I would recommend reading the Broad Institute Best Practices Workflow.


        • #5
          Alright, cool! Yeah, I heard that GATK is useful for SNP calling, so I will definitely read through the protocol.

          Thanks Again!


          • #6
            There are also several ways of analyzing the biological significance of the data.

            goseq: R package for gene ontology analysis. Corrects for length bias in RNA-Seq. Cumbersome to use. Default output is incomplete, e.g. it lists the ontology terms but not the input genes associated with each term.

            DAVID: Very easy to use. Biologists can do it. Does not correct for length bias. Algorithm rather mysterious. Interactive and informative output. Very easy to play with.

            GSEA: Different algorithm. You can pick the gene sets. However, you must choose a criterion for ranking the genes, and there is no perfect ranking. Ranking by fold change and ranking by adjusted p-value both have their disadvantages.
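To make the ranking problem concrete, here is a small Python sketch of one commonly used compromise: score each gene by the sign of its log2 fold change times -log10 of its p-value, then sort. This is only one heuristic among several and is not something the post above endorses; the function names, the p-value floor, and the example numbers are all illustrative assumptions.

```python
import math


def signed_significance(log2_fold_change, p_value, floor=1e-300):
    """Direction of change times -log10(p); floor avoids log10(0)."""
    p = max(p_value, floor)
    sign = 1.0 if log2_fold_change >= 0 else -1.0
    return sign * -math.log10(p)


def rank_genes(results):
    """Sort (gene, log2FC, p) tuples from most up- to most down-regulated."""
    scored = [(gene, signed_significance(lfc, p)) for gene, lfc, p in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up differential expression results: (gene, log2 fold change, p-value).
results = [
    ("geneA", 2.0, 0.001),
    ("geneB", -3.0, 0.0001),
    ("geneC", 0.5, 0.5),
]

ranked = rank_genes(results)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

The score still inherits the weaknesses of both inputs: a gene with a tiny but very consistent change can outrank one with a large, noisier change, so it is worth checking how sensitive the enrichment results are to the ranking you pick.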
