Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
RNA-Seq: Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Newsbot! Literature Watch 3 07-31-2011 08:08 PM
RNA seq analysis with one replicate per biological sample anle Bioinformatics 2 06-03-2011 11:16 AM
RNA-Seq: Transcript amplification from single bacterium for transcriptome analysis. Newsbot! Literature Watch 0 05-04-2011 03:50 AM
Strand SI introduces Avadis NGS. NGS analysis for the rest of us! Strand SI Vendor Forum 0 02-14-2011 11:19 AM
Sample prep for RNA-seq from low abundance RNA chadn737 RNA Sequencing 7 06-15-2010 07:49 AM

Thread Tools
Old 04-25-2012, 10:48 PM   #1
Senior Member
Location: Pittsburgh

Join Date: Feb 2010
Posts: 151
Default RNA-seq two sample single and rest PE analysis

I have 5 time points of RNA-seq data which is paired end. In addition I have two samples (single end) one of which is replicate of 5 time points and One is separate additional time point. (I know it is an unusual situation- I have two samples in single read rest PE- long story short part of study was done by other grp). tMy questions are:
1. Should I keep separate PE and single reads and analyze separately, in that case how I will combine one of them as replicate.
2. Will that be OK if I combine both types of reads in that case will not I loose information if I follow single end.
3. Is there any rule which tool may be best to analyze such data, I was planning to use TopHat> Cufflink. Alternatively, I can think about DEseq.
My aim is to find differential expression transcripts an displacing events.

Thanks for your help and attention.
honey is offline   Reply With Quote
Old 04-25-2012, 11:12 PM   #2
Senior Member
Location: Vienna

Join Date: Mar 2010
Posts: 107
Default one option

you could try:
1) map (e.g. with RUM)
2) count reads with HTseq-count
3) analyse with R-packages limma/edgeR (voom()-function) and an appropriate contrast matrix
dietmar13 is offline   Reply With Quote
Old 04-26-2012, 12:55 PM   #3
I like code
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

I've had to run some comparisons between single-end and paired-end data myself. Me and the researchers came to the conclusion that regardless of the sequencing method we should align each sample in whatever way we can to get the most complete set of alignments for that sample. This is because in each case the kit is designed to produce some type of reads and we should align them in the way it was intended starting from the kit used to prepare the samples.

The touchy thing would probably come down to normalization between samples and I think tools like DESeq and edgeR do a good job of that.

So my version of the above post would be:

1) align with tophat
2) count reads with htseq-count
3) perform pairwise de tests with DESeq
4) with results of pairwise tests...
sdriscoll is offline   Reply With Quote
Old 04-27-2012, 05:00 AM   #4
Genome Informatics Facility
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
Default normalization PE and SE

I have not seen much in the literature the explicitly states the best protocols for RNA-Seq analysis using PE reads let alone mixing PE and SE.

So here are the ideas to consider.

PE and SE reads are taken from one fragment in your library. If the idea is to count the number of fragments that are suppose to be represented of the RNA in your sample then after you do a paired end alignment the PE read counts for a particular feature (gene exon etc) should be divided by 2.

The other option would be to take only the forward read of the PE and run your analysis so everything is equal. I would liken this method to taking the first 36 bases of a 50 base read so that it matches the read length of other libraries you have. You will lose information but libraries will be the same (SE, read length).

If you align the PE you will get more uniquely mappable alignments. So there will be some bias in the SE reads mapping to more locations.
severin is offline   Reply With Quote
Old 04-27-2012, 05:52 AM   #5
Senior Member
Location: Charlottesville, VA

Join Date: May 2011
Posts: 112

Thanks Andrew. So, the next logical question - what are the methods that we're all using every day (HTSeq-count + DESeq, DEXSeq, BayesSeq, EdgeR, Cuffdiff) doing about counting with paired-end reads?
turnersd is offline   Reply With Quote
Old 04-27-2012, 05:59 AM   #6
Genome Informatics Facility
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
Default bwa

BWA manual appears to suggest it maps only PE reads that are concordant (map in the proper orientation and are within the specified fragment size distance from each other)

GSNAP accounts for all the possible combinations of alignments and outputs half PE reads where only 1 read maps and the other does not. These can be output into separate files but I haven't seen any methods for combining them.

In other words the question of how to normalize and count PE aligned reads is important in performing RNA-Seq differential gene expression analysis.
severin is offline   Reply With Quote
Old 04-27-2012, 08:08 AM   #7
Genome Informatics Facility
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
Default Potential source of false positives?

I was thinking about this a little more and this doubling could lead to false positives too if not corrected for.

Consider the following made up example:
condition1 condition2
gene1 5 9 corrected for PE double count
gene1 10 18 uncorrected for PE double count

For low read counts the doubling could result in the appearance of a more significant difference than actually exists unless I am missing something fundamental here not to mention the havoc it will likely have on over dispersion between biological replicates.

Thoughts anyone?
severin is offline   Reply With Quote
Old 04-29-2012, 04:47 AM   #8
Simon Anders
Senior Member
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994

You could use my htsep-count script, which does not double-count paired-end reads. Rather, a read pair is counted once for a gene, if both ends map to the same gene, and is discarded otherwise.

Furthermore, have a look the vignette of DESeq, where we present an example of a mixed paired-end/single-end data set.

However, if you have confounding between treatment and library type, you should better discard the second mates to avoid bias.
Simon Anders is offline   Reply With Quote
Old 08-02-2012, 01:00 PM   #9
Location: Dresden

Join Date: Jul 2012
Posts: 49
Default Similar problem

I have a similar problem as the OP. I've downloaded RNA-seq data (Hs) from a published paper and to my surprise the control condition is PE and the 2 conditions are SE. No replicates.

They've used this data to determine differential exon usage between control vs condition 1 and control vs condition 2 using a "home-brewed" analysis after mapping. To answer the particular biological problem I am interested in cufflinks (and cuffdiff) is ideal because it should allow me to extract exactly the information I am interested.

The question is: is it even possible to use cufflinks (+cuffdiff) to analyse control (PE) vs condition (SE)?
krespim is offline   Reply With Quote
Old 08-03-2012, 06:26 AM   #10
Location: SF Bay Area

Join Date: Feb 2012
Posts: 62

Originally Posted by turnersd View Post
Thanks Andrew. So, the next logical question - what are the methods that we're all using every day (HTSeq-count + DESeq, DEXSeq, BayesSeq, EdgeR, Cuffdiff) doing about counting with paired-end reads?
I'd like to second this question. Even delving deep into the documentation for some of these doesn't find an answer.

I am quite interested in the effect of changing the counting scheme on gene expression estimates. It seems to me that with paired-ends, you need to be counting fragments, not reads, and you need to be normalizing for fragment length as well. Shouldn't a transcript that gets chopped into 2x250bp fragments should count the same as a transcript that gets chopped into 5x100bp fragments?
jparsons is offline   Reply With Quote

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 10:50 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO