
  • What do I do with output files from tophat/cufflinks

    Hi, I am a beginner with RNA-sequencing. I used tophat to align RNA-seq reads from geuvadis to hg19 from UCSC. I gave tophat the reference transcripts and then used the accepted_hits.bam file from its output as the input file for cufflinks.

    I ran cufflinks both with and without the reference transcripts and have the outputs for both runs. So now I am stuck... What exactly can I do now? I have the isoform and gene FPKM files with their values, but how should I approach analyzing them in general? I am not doing a project; I just want to know what kinds of analysis I can do with these files, as well as with the transcripts.gtf file.

    Also, what does an FPKM value of 0 mean? Some other forums mentioned that it means none of the reads mapped to that transcript, so I wrote a simple script to filter all of these values out of the isoforms.fpkm_tracking file. Is this ok?

    Lastly, what can I do to compare both the isoforms/transcripts files from cufflinks with and without the reference annotation?

    Thank you so much for the help in advance!!!

    -Charlie

  • #2
    These are vast questions.
    I don't have time to answer them fully, but here are some tips, which I hope you will find helpful.

    If you're willing to use some R commands, you might want to try CummeRbund for the downstream analysis.

    It's not the greatest software, but it does make it easier to extract more information out of all the data.

    I'm not sure why you want to remove isoforms with an FPKM of 0. An FPKM of 0 means that the isoform is either not expressed, or so lowly expressed that it cannot be detected at this sequencing depth. This is useful information, so I would not remove it.
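
As a rough illustration of keeping the zero-FPKM isoforms rather than deleting them, here is a minimal Python sketch that splits the isoforms into "detected" and "not detected" lists. It assumes the tracking file is tab-delimited with a header row containing `tracking_id` and `FPKM` columns; the function name and the example rows are made up for illustration.

```python
import csv
import io


def split_by_expression(tracking_file):
    """Split isoform IDs into (detected, not_detected) by FPKM.

    tracking_file is any open text file object whose first line is a
    tab-delimited header with 'tracking_id' and 'FPKM' columns.
    """
    reader = csv.DictReader(tracking_file, delimiter="\t")
    detected, not_detected = [], []
    for row in reader:
        if float(row["FPKM"]) > 0:
            detected.append(row["tracking_id"])
        else:
            # Kept on purpose: FPKM 0 means "not seen at this depth".
            not_detected.append(row["tracking_id"])
    return detected, not_detected


# Hypothetical rows, only to show the call.
example = io.StringIO(
    "tracking_id\tgene_id\tFPKM\n"
    "iso_A\tgene_1\t12.5\n"
    "iso_B\tgene_1\t0\n"
    "iso_C\tgene_2\t0.3\n"
)
detected, not_detected = split_by_expression(example)
print(detected)
print(not_detected)
```

With a real file you would pass `open("isoforms.fpkm_tracking")` instead of the in-memory example. Both lists stay available, so the unexpressed isoforms can still be reported.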

    • #3
      Thanks blancha for your advice, especially on the FPKM values! Yeah, it makes sense to keep them since I can identify genes that are not expressed.

      Sorry for the really broad questions. Essentially I just needed some advice on what to do next.

      One question I mentioned above was about running cufflinks with and without the reference. How can I view the novel transcripts that cufflinks found without the reference (de novo mode) and compare them to the output from the run that used the reference?
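
One possible way to look at this, sketched under assumptions: if the de novo transcripts.gtf is first compared against the reference annotation with cuffcompare, each transcript in the resulting GTF carries a `class_code` attribute (for example `=` for a match to a reference transcript, `j` for a potentially novel isoform, `u` for an unknown intergenic transcript). The Python below simply tallies those codes, counting each `transcript_id` once. The function name and example lines are hypothetical, and it assumes the attributes appear in column 9 in the usual `key "value";` form.

```python
import re
from collections import Counter

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')
CLASS_CODE_RE = re.compile(r'class_code "([^"]+)"')


def count_class_codes(gtf_lines):
    """Tally class_code values, one count per transcript_id."""
    counts = Counter()
    seen = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        tid = TRANSCRIPT_ID_RE.search(fields[8])
        code = CLASS_CODE_RE.search(fields[8])
        if tid is None or code is None:
            continue
        if tid.group(1) in seen:
            continue  # later exon of a transcript already counted
        seen.add(tid.group(1))
        counts[code.group(1)] += 1
    return counts


# Hypothetical records, only to show the call.
example_lines = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; class_code "=";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; class_code "=";',
    'chr2\tCufflinks\texon\t500\t900\t.\t-\t.\t'
    'gene_id "G2"; transcript_id "T2"; class_code "u";',
]
print(count_class_codes(example_lines))
```

Transcripts with codes such as `j` or `u` would be the candidates to inspect as novel, for instance in a genome browser alongside the reference annotation.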

      Lastly, does anyone know a good way to identify SNVs from the data? I wasn't sure how to approach this either. Thanks!

      • #4
        For the SNP calling, I would recommend reading the Broad Institute Best Practices Workflow.

        • #5
          Alright Cool! Yeah I heard that GATK is useful in SNP calling so I will definitely read through the protocol.

          Thanks Again!

          • #6
            There are also several ways of analyzing the biological significance of the data.

            goseq: R package to do gene ontology analysis. Corrects for length bias in RNA-Seq. Cumbersome to use. Default output not complete, e.g. ontology terms but not the genes inputted that are associated with the terms.

            DAVID: Very easy to use. Biologists can do it. Does not correct for length bias. Algorithm rather mysterious. Interactive and informative output. Very easy to play with.

            GSEA: Different algorithm. Can pick gene sets. Criteria must be chosen to rank genes however. There is no perfect ranking. Ranking by fold changes or adjusted p-values both have their disadvantages.
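
To give a feel for what an over-representation test of a gene ontology term computes, here is a minimal Python sketch of a plain one-sided hypergeometric test. This is not the algorithm of goseq, DAVID, or GSEA, and it does not correct for length bias; the function name and the numbers are illustrative assumptions only.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) under a hypergeometric null.

    n_background: genes in the background set
    n_in_term:    background genes annotated with the term
    n_selected:   genes in the list of interest (e.g. differentially expressed)
    n_overlap:    selected genes that carry the term
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up numbers: 10 background genes, 5 annotated with the term,
# 5 selected genes, all 5 of which carry the term.
print(hypergeom_enrichment_p(10, 5, 5, 5))
```

A small p-value suggests the term shows up in the gene list more often than random draws from the background would give. In RNA-Seq, longer genes are more likely to be called differentially expressed, which is the length bias mentioned above and which this simple test ignores.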
