
**blancha** · 07-04-2014, 10:38 AM

These are vast questions.
I don't have time to answer them fully, but here are some tips, which I hope you will find helpful.

If you're willing to use some R commands, you might want to try CummeRbund for the downstream analysis.

CummeRbund - An R package for persistent storage, analysis, and visualization of RNA-Seq from cufflinks output

http://compbio.mit.edu/cummeRbund/manual_2_0.html

Open source tools for exploration, analysis and visualization of high-throughput RNA-Seq data

It's not the greatest software, but it does make it easier to extract more information out of all the data.

I'm not sure why you want to remove isoforms with an FPKM of 0. An FPKM of 0 means that the isoform is either not expressed, or expressed at such a low level that it cannot be detected at this sequencing depth. This is useful information, so I would not remove it.
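A minimal sketch of that idea, assuming a tab-delimited table laid out like a Cufflinks `isoforms.fpkm_tracking` file: the isoforms with an FPKM of 0 are set aside as "not detected" rather than deleted. The sample rows, IDs, and the helper name `split_by_expression` are made up for illustration, and only a reduced set of columns is shown.

```python
import csv
import io

# Hypothetical excerpt in the style of isoforms.fpkm_tracking
# (IDs and values are invented; a real file has more columns).
SAMPLE = (
    "tracking_id\tgene_id\tgene_short_name\tlength\tFPKM\tFPKM_status\n"
    "TCONS_00000001\tXLOC_000001\tGeneA\t1520\t12.75\tOK\n"
    "TCONS_00000002\tXLOC_000001\tGeneA\t980\t0\tOK\n"
    "TCONS_00000003\tXLOC_000002\tGeneB\t2210\t0.42\tOK\n"
)


def split_by_expression(handle):
    """Return (detected, undetected) lists of isoform rows; no row is discarded."""
    detected, undetected = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["FPKM"] = float(row["FPKM"])
        # FPKM == 0: not expressed, or below the detection limit at this depth.
        (detected if row["FPKM"] > 0 else undetected).append(row)
    return detected, undetected


detected, undetected = split_by_expression(io.StringIO(SAMPLE))
print("detected:", [r["tracking_id"] for r in detected])
print("not detected (FPKM = 0):", [r["tracking_id"] for r in undetected])
```

To try it on a real file, an open file handle could be passed in place of the `io.StringIO` object. Columns are looked up by name, so extra columns should not matter.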

**thickrick99** · 07-04-2014, 11:01 AM

Thanks blancha for your advice, especially on the FPKM values! Yeah, it makes sense to keep them, since I can then identify genes that are not expressed.

Sorry for the really broad questions. Essentially I just needed some advice on what to do next.

One question, which I believe I mentioned above, was about running cufflinks with and without the reference. How can I view the novel transcripts that cufflinks found without the reference (in de novo mode), compared to the output file produced using the reference?

Lastly, does anyone know a good way to identify SNVs from the data? I wasn't sure how to approach this either. Thanks!

**blancha** · 07-04-2014, 11:12 AM

For the SNP calling, I would recommend reading the Broad Institute Best Practices Workflow.

http://gatkforums.broadinstitute.org/discussion/3891/calling-variants-in-rnaseq

**thickrick99** · 07-04-2014, 11:17 AM

Alright, cool! Yeah, I heard that GATK is useful for SNP calling, so I will definitely read through the protocol.

Thanks Again!

**blancha** · 07-05-2014, 03:36 AM

There are also several ways of analyzing the biological significance of the data.

goseq: R package to do gene ontology analysis. Corrects for length bias in RNA-Seq. Cumbersome to use. Default output is incomplete, e.g. it lists the ontology terms but not the input genes associated with each term.

DAVID: Very easy to use. Biologists can do it. Does not correct for length bias. Algorithm rather mysterious. Interactive and informative output. Very easy to play with.

GSEA: Different algorithm. Can pick gene sets. However, a criterion must be chosen to rank the genes, and there is no perfect ranking. Ranking by fold change and ranking by adjusted p-value both have their disadvantages.
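To make the ranking issue concrete, here is a minimal sketch showing that the same results can be ordered quite differently depending on the criterion. The gene names and numbers are invented, and the "signed significance" score (the sign of the fold change times -log10 of the adjusted p-value) is just one commonly used choice, assumed here for illustration; it is not something GSEA prescribes.

```python
import math

# Hypothetical differential-expression results:
# (gene, log2 fold change, adjusted p-value). All values are invented.
RESULTS = [
    ("geneA", 3.0, 0.04),
    ("geneB", 0.8, 0.0001),
    ("geneC", -2.5, 0.20),
    ("geneD", -1.0, 0.001),
]


def rank_by_fold_change(results):
    """Order genes from most up-regulated to most down-regulated."""
    ordered = sorted(results, key=lambda r: r[1], reverse=True)
    return [gene for gene, _, _ in ordered]


def rank_by_signed_significance(results):
    """Order genes by sign(log2FC) * -log10(adjusted p); needs p > 0."""
    def score(r):
        _, lfc, padj = r
        return math.copysign(-math.log10(padj), lfc)

    ordered = sorted(results, key=score, reverse=True)
    return [gene for gene, _, _ in ordered]


print("by fold change:        ", rank_by_fold_change(RESULTS))
print("by signed significance:", rank_by_signed_significance(RESULTS))
```

In this toy table, geneA tops the fold-change ranking with only a borderline p-value, while geneB tops the significance ranking with a small fold change. That is the trade-off between the two criteria.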

What do I do with output files from tophat/cufflinks
