SEQanswers

Old 07-04-2014, 08:30 AM   #1
thickrick99
Member
 
Location: Washington

Join Date: Jul 2014
Posts: 21
What do I do with output files from tophat/cufflinks

Hi, I am a beginner with RNA sequencing. I used TopHat to align RNA-seq reads from Geuvadis to hg19 from UCSC. In TopHat, I provided the reference transcripts and then used the accepted_hits.bam file from the output as the input file for Cufflinks.

I ran Cufflinks both with and without the reference transcripts and have the outputs for both runs. So now I am stuck... What exactly can I do now? I have the isoform and gene FPKM files with their values, but how should I approach analyzing them in general? I am not doing a project; I just want to know about the different analyses I can do with these files as well as the transcripts.gtf file.

Also, what does an FPKM value of 0 mean? Some other forum posts said it means that none of the reads mapped to the reference, so I wrote a simple script to filter all of these rows out of the isoforms.fpkm_tracking file. Is this OK?

Lastly, what can I do to compare both the isoforms/transcripts files from cufflinks with and without the reference annotation?

Thank you so much for the help in advance!!!

-Charlie
Old 07-04-2014, 10:38 AM   #2
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

These are vast questions.
I don't have time to answer them fully, but here are some tips, which I hope you will find helpful.

If you're willing to use some R commands, you might want to try CummeRbund for the downstream analysis.
http://compbio.mit.edu/cummeRbund/manual_2_0.html
It's not the greatest software, but it does make it easier to extract more information out of all the data.

I'm not sure why you want to remove isoforms with an FPKM of 0. An FPKM of 0 means that the isoform is either not expressed, or so lowly expressed that it cannot be detected at this sequencing depth. This is useful information, so I would not remove it.
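To make the point about keeping zero-FPKM isoforms concrete, here is a minimal Python sketch that counts them instead of deleting them. It assumes the usual tab-separated isoforms.fpkm_tracking layout with `tracking_id` and `FPKM` columns; the function name `summarize_fpkm` and the two example rows are made up for illustration and are not real data.

```python
import csv
import io

def summarize_fpkm(handle):
    """Count zero-FPKM isoforms in a tracking file without discarding any rows."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    # Keep every row; only record which isoforms have FPKM == 0.
    zero_ids = [r["tracking_id"] for r in rows if float(r["FPKM"]) == 0.0]
    return {"total": len(rows), "zero_fpkm": len(zero_ids), "zero_ids": zero_ids}

# Hypothetical two-row example in the isoforms.fpkm_tracking layout (assumed columns).
example = (
    "tracking_id\tclass_code\tnearest_ref_id\tgene_id\tgene_short_name\ttss_id\t"
    "locus\tlength\tcoverage\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\tFPKM_status\n"
    "tx_A\t-\t-\tgene_1\tG1\t-\tchr1:100-900\t800\t12.5\t3.2\t2.1\t4.3\tOK\n"
    "tx_B\t-\t-\tgene_1\tG1\t-\tchr1:100-900\t650\t0\t0\t0\t0\tOK\n"
)

summary = summarize_fpkm(io.StringIO(example))
print(summary)
```

For a real file you would pass `open("isoforms.fpkm_tracking")` instead of the in-memory example. Keeping the rows and only flagging them means the "not detected" isoforms are still available later, for example when comparing samples.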
Old 07-04-2014, 11:01 AM   #3
thickrick99
Member
 
Location: Washington

Join Date: Jul 2014
Posts: 21

Thanks blancha for your advice, especially on the FPKM values! Yeah, it makes sense to keep them, since they let me identify genes that are not expressed.

Sorry for the really broad questions. Essentially I just needed some advice on what to do next.

One question, which I believe I mentioned above, was about running Cufflinks with and without the reference. How can I view the novel transcripts that Cufflinks found in de novo mode (without the reference) compared to the output from the run using the reference?
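One rough way to approach this comparison, sketched in Python under some assumptions: read the `exon` lines of both transcripts.gtf files, collect each transcript's exon coordinates, and list the de novo transcripts whose exact exon structure does not appear in the reference-guided output. The function names and the toy GTF lines below are hypothetical. Exact exon matching is stricter than what Cuffcompare (the comparison tool shipped with Cufflinks) does, so treat this only as a first look; assembled transcript ends often differ by a few bases.

```python
import re
from collections import defaultdict

def exon_structures(gtf_lines):
    """Map each transcript_id to (chrom, strand, sorted exon coordinates)."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (match.group(1), fields[0], fields[6])
        exons[key].append((int(fields[3]), int(fields[4])))
    return {tid: (chrom, strand, tuple(sorted(coords)))
            for (tid, chrom, strand), coords in exons.items()}

def novel_transcripts(de_novo_lines, guided_lines):
    """Return de novo transcript IDs whose exon structure is absent from the guided set."""
    known = set(exon_structures(guided_lines).values())
    de_novo = exon_structures(de_novo_lines)
    return sorted(tid for tid, structure in de_novo.items() if structure not in known)

# Toy GTF lines (made up) standing in for the two transcripts.gtf files.
guided = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
de_novo = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
    'chr1\tCufflinks\texon\t350\t400\t.\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
]

novel = novel_transcripts(de_novo, guided)
print(novel)
```

With real data you would pass open file handles for the two GTF files in place of the lists. The candidate IDs it reports can then be inspected in a genome browser alongside the reference annotation.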

Lastly, does anyone know a good way to identify SNVs from the data? I wasn't sure how to approach this either. Thanks!
Old 07-04-2014, 11:12 AM   #4
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

For the SNP calling, I would recommend reading the Broad Institute Best Practices Workflow.

http://gatkforums.broadinstitute.org...ants-in-rnaseq
Old 07-04-2014, 11:17 AM   #5
thickrick99
Member
 
Location: Washington

Join Date: Jul 2014
Posts: 21

Alright, cool! Yeah, I heard that GATK is useful for SNP calling, so I will definitely read through the protocol.

Thanks Again!
Old 07-05-2014, 03:36 AM   #6
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

There are also several ways of analyzing the biological significance of the data.

goseq: R package to do gene ontology analysis. Corrects for length bias in RNA-Seq. Cumbersome to use. Default output is incomplete, e.g. it lists the ontology terms but not which of the input genes are associated with each term.

DAVID: Very easy to use. Biologists can do it. Does not correct for length bias. Algorithm rather mysterious. Interactive and informative output. Very easy to play with.

GSEA: Different algorithm. Can pick gene sets. However, a criterion must be chosen to rank the genes, and there is no perfect ranking. Ranking by fold changes or by adjusted p-values both have their disadvantages.
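To illustrate the ranking problem, here is a small Python sketch of one commonly used compromise: score each gene by the sign of its log2 fold change times -log10 of its p-value, so that both direction and significance contribute. This is an assumption for illustration, not a ranking recommended above, and the gene names and numbers are invented.

```python
import math

def rank_genes(results):
    """Rank genes by sign(log2 fold change) * -log10(p-value), highest score first."""
    scored = []
    for gene, log2fc, pvalue in results:
        p = max(pvalue, 1e-300)  # guard against log10(0)
        score = math.copysign(-math.log10(p), log2fc)
        scored.append((gene, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical differential expression results: (gene, log2 fold change, p-value).
results = [
    ("geneA", 2.0, 0.001),
    ("geneB", -1.5, 0.0001),
    ("geneC", 0.1, 0.5),
]

ranked = rank_genes(results)
for gene, score in ranked:
    print(gene, round(score, 3))
```

The trade-off is visible even in this toy case: geneB has the smallest p-value but lands at the bottom because it is down-regulated, while geneC sits in the middle despite a negligible fold change. Any single score hides part of the picture.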

Tags
cufflink, fpkm, rna-seq, tophat
