SEQanswers

Old 04-25-2013, 11:21 AM   #61
EduEyras
Member
 
Location: Barcelona, Spain

Join Date: Dec 2012
Posts: 17
Default

Great work. We have put the pre-print of the paper in the arXiv: http://arxiv.org/abs/1304.5952

I hope it is useful.

E
Old 05-16-2013, 01:16 AM   #62
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116
Default

Is there a method to detect differentially expressed junctions? The input would be junction information, such as the output of TopHat, and the output would be the differentially expressed junctions, analogous to what existing tools report for isoforms and exons.

Thank you.
Old 05-16-2013, 03:11 PM   #63
shi
Wei Shi
 
Location: Australia

Join Date: Feb 2010
Posts: 235
Default

Dear @pengchy,

One way to do this is to run a differential expression analysis on junctions, in the same way you would for genes (using, for example, edgeR or limma voom). To do that, however, you need the read count for each junction in each condition.

The subjunc program detects exon-exon junctions and outputs the number of supporting reads for each junction. It outputs a bed file in which the first three columns give the location information of the discovered exon-exon junctions and the fifth column gives the number of supporting reads for each junction.
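As a rough illustration of the step between junction detection and a count-based DE tool, here is a minimal Python sketch that merges per-sample junction BED files into one count matrix. It assumes only what is described above (first three columns give the location, fifth column gives the supporting read count); the function names and the toy sample data are hypothetical and are not part of subjunc.

```python
import csv
import io


def read_junction_counts(handle):
    """Return {(chrom, start, end): count} from a BED-like junction file."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and track/browser/comment header lines.
        if not row or row[0].startswith(("track", "browser", "#")):
            continue
        chrom, start, end = row[0], int(row[1]), int(row[2])
        # Assumption from the post: column 5 holds the supporting read count.
        counts[(chrom, start, end)] = int(row[4])
    return counts


def build_count_matrix(samples):
    """samples: {sample_name: file handle}. Returns (sample_names, rows).

    Each row is (junction, [count per sample]); a junction that was not
    reported in a sample gets a count of 0 there.
    """
    per_sample = {name: read_junction_counts(h) for name, h in samples.items()}
    names = sorted(per_sample)
    junctions = sorted(set().union(*(c.keys() for c in per_sample.values())))
    rows = [(j, [per_sample[n].get(j, 0) for n in names]) for j in junctions]
    return names, rows


# Toy data standing in for two real BED files (hypothetical values).
ctrl = io.StringIO("chr1\t100\t200\tJ1\t12\nchr1\t300\t400\tJ2\t3\n")
treat = io.StringIO("chr1\t100\t200\tJ1\t30\nchr2\t50\t90\tJ3\t7\n")
names, rows = build_count_matrix({"ctrl": ctrl, "treat": treat})
for junction, counts in rows:
    print(junction, dict(zip(names, counts)))
```

The resulting matrix (junctions as rows, samples as columns) is the same shape as a gene count table, so it could be written out and passed to edgeR or limma voom like any other count data.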

If you are interested in using it, have a look at this short tutorial - http://bioinf.wehi.edu.au/subjunc

Hope this is helpful.

Best wishes,
Wei
Old 05-29-2013, 08:58 AM   #64
pravee1216
Member
 
Location: India

Join Date: Aug 2010
Posts: 35
Default

Based on experience, can anyone suggest a better tool for studying spliced isoform expression between two conditions (using single-end data without replicates)? Setting up ALEXA-seq is very complex, as there is no pre-built database for C. elegans and the package has not been updated in a long time. Please share your experience with other tools and the accuracy of their results.

Thanks
Raj
Old 05-29-2013, 10:14 AM   #65
EduEyras
Member
 
Location: Barcelona, Spain

Join Date: Dec 2012
Posts: 17
Default

Without replicates MISO works reasonably well, but you need a pre-calculated set of events. It is not difficult to build one anew, but it requires some work.

Using replicates, we've seen that requiring |deltaPSI| > 0.25 and BF > 2 gives a false positive rate below 1%.
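A minimal sketch of that filter in Python, for anyone who wants to apply the two thresholds to a table of events. The dictionary keys (`delta_psi`, `bayes_factor`) and the toy events are hypothetical; map them onto whatever columns your comparison output actually uses.

```python
def significant_events(events, min_abs_delta_psi=0.25, min_bayes_factor=2.0):
    """Keep events whose |delta PSI| and Bayes factor both exceed the thresholds."""
    return [
        e for e in events
        if abs(e["delta_psi"]) > min_abs_delta_psi
        and e["bayes_factor"] > min_bayes_factor
    ]


# Hypothetical events, only to show the behaviour of the filter.
events = [
    {"event": "SE_1", "delta_psi": 0.40, "bayes_factor": 15.0},   # passes both
    {"event": "SE_2", "delta_psi": -0.30, "bayes_factor": 1.5},   # BF too low
    {"event": "SE_3", "delta_psi": 0.10, "bayes_factor": 50.0},   # change too small
    {"event": "SE_4", "delta_psi": -0.60, "bayes_factor": 8.0},   # passes both
]
for e in significant_events(events):
    print(e["event"], e["delta_psi"], e["bayes_factor"])
```

Note that the absolute value matters: an event with a large negative deltaPSI is as interesting as one with a large positive deltaPSI.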

Cufflinks can also work well with just one replicate, provided that you estimate a lower bound on FPKM above which you consider an isoform expressed. If you don't use TopHat for mapping, be careful with the data format in the BAM file. Also, make sure to use the "RABT" option (reference annotation based transcript assembly, -g/--GTF-guide), which quantifies known and novel isoforms.

Have a look at http://arxiv.org/abs/1304.5952 for further methods. Let me know if I can help with anything else.

Good luck

E.
Old 05-29-2013, 01:52 PM   #66
skdhanraj
Bioinformatician
 
Location: Norway

Join Date: May 2010
Posts: 4
Default

Nice compilation. Thank you all
Old 05-30-2013, 10:12 AM   #67
pravee1216
Member
 
Location: India

Join Date: Aug 2010
Posts: 35
Default

Thanks, E. It's a good article.

Thanks for reminding me of the RABT option. With it, Cufflinks produced a transcripts.gtf file 400 times bigger.

What is the suggested value of --min-isoform-fraction (lower bound FPKM)? By default it is 10% (0.1).
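For context, here is a small Python sketch of how a relative isoform-fraction cutoff differs from an absolute FPKM lower bound. It is not Cufflinks code; it assumes, as I understand the option, that the fraction is taken relative to the most abundant isoform of the same gene. The function name and the toy abundances are hypothetical, and whether the real comparison is strict or inclusive is also an assumption.

```python
def filter_minor_isoforms(isoform_abundance, min_isoform_fraction=0.1):
    """Drop isoforms below a fraction of the gene's most abundant isoform.

    isoform_abundance: {isoform_id: abundance} for a single gene.
    """
    if not isoform_abundance:
        return {}
    cutoff = max(isoform_abundance.values()) * min_isoform_fraction
    return {iso: ab for iso, ab in isoform_abundance.items() if ab >= cutoff}


# Highly expressed gene: the cutoff is 10% of 50 = 5, so isoform C is dropped.
print(filter_minor_isoforms({"A": 50.0, "B": 6.0, "C": 4.0}))

# Lowly expressed gene: the cutoff is 10% of 2 = 0.2, so B (0.5) is kept even
# though an absolute FPKM lower bound of, say, 1 would have removed it.
print(filter_minor_isoforms({"A": 2.0, "B": 0.5}))
```

So the two filters answer different questions: the fraction removes isoforms that are minor within their gene, while an FPKM lower bound removes isoforms that are weakly expressed overall, and you may want both.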

Raj
Old 07-07-2013, 10:11 AM   #68
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
Default

Quote:
Originally Posted by krespim View Post
Nice list.



As far as I know, and I have been using MISO regularly, it does not give information on isoforms. It is very much "exon-centric".




I also feel the same. When I choose a tool I always look for information on validation rates, that is, were the predictions reproduced at the experimental level? It really does not matter if it uses Bayesian inference or a binomial distribution if the predictions are not validated in the "real data". I also take into account ease of use (very often compiling the tools is a nightmare), and whether the output is understandable.

I know these are practical/trivial considerations, but IMO they are worth considering.
Hi @krespim:

I had a question on MISO and posted here. Can you help me with that? Thanks! ;-)
Old 08-27-2013, 05:21 PM   #69
elsagc
Junior Member
 
Location: East Lansing, MI, US

Join Date: Jun 2010
Posts: 1
Default

Quote:
Originally Posted by EduEyras View Post
Without replicates MISO works reasonably well, but you need a pre-calculated set of events. It is not difficult to build one anew, but it requires some work.

Using replicates we've seen that with |deltaPSI| > 0.25 and BF > 2, if you had replicates, that would give you a less than 1% False positive rate.

Cufflinks can also work well with just one replicate, provided that you estimate a lower bound for FPKM to know an isoform is expressed. If you don't use tophat for mapping, be careful with the data format in BAM. Also, make sure to use the option for "RABT", which quantifies known and novel isoforms.

Have a look at http://arxiv.org/abs/1304.5952 for further methods. Let me know if I can help with anything else.

Good luck

E.
Hi E,
I have found your article very useful. Thanks for sharing. As you mentioned, MISO requires a pre-calculated set of events. I was wondering if you could share what tool/method you use to calculate the splicing events?
Thanks,
Elsa
Old 10-10-2013, 09:05 AM   #70
Tomnl
Junior Member
 
Location: Leicester

Join Date: Jun 2013
Posts: 6
Default

Hi all

I have found this thread very useful. Thank you everyone!
(In particular for the comparison paper http://arxiv.org/abs/1304.5952 and post http://seqanswers.com/forums/showpos...1&postcount=60.)

I also found the paper's description of the differences between tools for differential splicing and differential isoform expression very informative.

Based on what I have read and what my data is (paired end, multiple samples, 2 conditions), I decided to compare BitSeq/RSEM/Cuffdiff for differential expression of isoforms.

I have also decided to compare rDiff, DiffSplice and Cuffdiff for differential splicing.

However, I am currently having a bit of trouble with rDiff, http://seqanswers.com/forums/showthr...509#post118509, has anybody tried rDiff here?

Cheers
Tom
Old 10-24-2013, 05:08 PM   #71
Maayanster
Member
 
Location: Vancouver, BC

Join Date: Dec 2012
Posts: 30
Default

I haven't tried rDiff.


So I just looked at the RNA-seq blog, and there's like 8 new transcript assembly/quantification tools since the summer. Sigh. There's no way I'm keeping up, it's too much work to get these programs running, and so often things crap out on the last step.
http://www.rna-seqblog.com/category/...ression-tools/

Just wanted to mention that I'm still updating post #60 which has all the summaries together, since I've been running things on the same datasets in an organized manner, and still finding out new things.
Old 11-07-2013, 02:04 PM   #72
Maayanster
Member
 
Location: Vancouver, BC

Join Date: Dec 2012
Posts: 30
Default

I've been searching for a while for a decent set of transcript-specific qPCR validations to compare these tools against ground truth, and recent reading has yielded some info. Unfortunately, a lot of the info on these is buried deep in the supplementary material. Here's a summary.

The ALEXA paper http://www.nature.com/nmeth/journal/...meth.1503.html
1. transcript expression
- qpcr on two colorectal cancer cell lines, MIP101 and MIP5FU. 192 amplicons in 152 genes representing various event types (skipped exons, etc.)
qpcr validation results are here: http://www.alexaplatform.org/alexa_s...ltsPackage.zip

the cuffdiff2 paper http://www.nature.com/nbt/journal/v3.../nbt.2450.html (mostly in the supplementary)
1. gene expression:
- MAQC qpcr dataset (Brain and stratagene UHR as treatment and control)

2. transcript expression:
- ALEXA q-pcr dataset
- simulated data using their own protocol

For the genes, they compare both the quantification and fold-changes between the two treatments
For the transcripts, they have plots that separate the fold-change comparisons into deciles by the level of expression.

The SailFish paper http://arxiv.org/pdf/1308.3700.pdf
1. gene expression:
- they also use the MAQC data for gene-level expression. They only use the brain data, without doing any DE.

2. transcript expression
- data simulated using the Flux Simulator

For both types of comparisons to the "ground truth" they use four statistics: Pearson, Spearman, RMSE, and MedPE, which evaluate different types of variations.
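For anyone wanting to run the same kind of comparison on their own estimates, here is a minimal pure-Python sketch of the four statistics. The MedPE definition used here (median of the signed percentage errors relative to the true value) is my assumption and may differ in detail from the paper; the toy numbers are made up.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(truth, est):
    """Root mean squared error between estimates and ground truth."""
    return math.sqrt(mean((e - t) ** 2 for t, e in zip(truth, est)))


def med_pe(truth, est):
    """Median percentage error (assumed definition); skips truth == 0."""
    return median(100.0 * (e - t) / t for t, e in zip(truth, est) if t != 0)


# Made-up ground truth and estimates for five transcripts.
truth = [1.0, 5.0, 10.0, 50.0, 100.0]
est = [1.5, 4.0, 12.0, 45.0, 130.0]
print(pearson(truth, est), spearman(truth, est), rmse(truth, est), med_pe(truth, est))
```

The four numbers are worth reporting together because they respond to different problems: Pearson is dominated by the highly expressed transcripts, Spearman only looks at ordering, RMSE is in absolute units, and MedPE is relative and robust to a few bad outliers.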

The nature methods paper (Steijger et al) also in the supplementary http://www.nature.com/nmeth/journal/...meth.2714.html
1. transcript expression
- a custom assay for 109 alternatively spliced genes using the NanoString nCounter in order to compare transcript quantification (no DE)

The statistic they use for comparison is Pearson correlation.

The MATS paper http://nar.oxfordjournals.org/content/40/8/e61.full
the two treatments were a human breast cancer cell line (MDA-MB-231) with ectopic expression of the epithelial-specific splicing factor ESRP1 and an empty vector (EV) control
1. transcript expression
- RT-PCR for 164 exons that are known to be regulated by ESRP1

The RT-PCR data is available but one would have to ask the authors for sequenced libraries. This same data was also used in the recent rSeqDiff paper

So it seems that the MAQC dataset (which was developed for microarray validations) is only useful for gene level evaluations. This made sense when I dug into the actual data. More info on that dataset here: http://www.biostars.org/p/85219/
For transcript level evaluations we have simulations, the alexa q-pcr data, and the new nanoString validation data from the nature methods paper.

I've also been looking at the various RNA simulation methods available, but maybe I'll leave that for another thread.

I've been using the Alexa validation data myself, which is very helpful for looking at pairwise comparisons, but not useful for comparing groups of libraries.

Two notes from the blogosphere:
1. People might like to take a look at the ****storm on Lior Pachter's blog over GTEx's isoform analysis choices. http://liorpachter.wordpress.com/201...of-their-data/
2. Article on Getting Genetics Done blog featuring eXpress. It's also copied on the RNA-seq blog. http://gettinggeneticsdone.blogspot....h-express.html

Last edited by Maayanster; 12-04-2013 at 03:49 PM.
Old 11-13-2013, 04:23 AM   #73
magnusr
Junior Member
 
Location: Manchester

Join Date: Sep 2012
Posts: 1
Default

Hi, this is a useful overview, thanks for putting it together.

I'm one of the BitSeq authors. You mentioned that you hit a bug in stage 2, but Peter Glaus (the main BitSeq developer) doesn't have a record of a bug report. Could you email me or Peter with the details?

Best wishes,
Magnus
Old 11-13-2013, 02:58 PM   #74
Maayanster
Member
 
Location: Vancouver, BC

Join Date: Dec 2012
Posts: 30
Default

Hi Magnus,
Thanks for this! I did find the email. I will re-send it.
Old 12-16-2013, 03:44 PM   #75
Maayanster
Member
 
Location: Vancouver, BC

Join Date: Dec 2012
Posts: 30
Default

I'm posting some data from various tools compared to wet lab validated data. There are 3 datasets for 3 different types of analysis:
  • Pairwise DE - ALEXA seq data. In house libraries for two cell lines: MIP101 and MIP5FU
    validation data: q-pcr from Griffith et al. (in Alexa-seq paper). Expression values for each library, as well as fold changes provided. http://www.nature.com/nmeth/journal/...meth.1503.html
  • Group DE - 6 in house libraries from MAGIC project group3 and group4 (3 libs each)
    validation data: Microarray expression results from first MAGIC paper (Northcott et al 2010). http://jco.ascopubs.org/content/earl....4324.abstract This data is a list of differentially expressed genes between groups 3 and 4, not a quantification on the isoform level. It doesn't provide a real "ground truth" set, but rather just a subset of genes and transcripts that may be biologically interesting to look at.

The first two datasets have wet-lab experimental transcript expression values. The third dataset doesn't have actual validated transcript expression values to compare to, so I just did pairwise comparisons between the tools for a subset of transcripts from interesting genes.

The results are in this google document. I didn't spend any time making it look nice, so you just need to zoom in to see the plots.
https://docs.google.com/document/d/1...it?usp=sharing

For single-library quantification, SailFish wins since it's the fastest and is qualitatively equal to the next best option, eXpress. For pairwise DE it's less clear. For group DE we can't really draw any conclusions because there's no ground truth.

** note: cufflinks was sometimes run twice, with different alignments: "cuff gsc" means cufflinks was run on in-house spliced alignments, "cuff tophat" means cufflinks was run on tophat alignments.
Tags
expression, isoform, rna seq, tools, transcript
