#1 |
Senior Member
Location: Pittsburgh Join Date: Feb 2010
Posts: 151
I have RNA-seq data for 5 time points, all paired-end (PE). In addition, I have two single-end (SE) samples: one is a replicate of one of the 5 time points, and the other is a separate, additional time point. (I know this is an unusual situation; long story short, part of the study was done by another group.) My questions are:
1. Should I keep the PE and SE reads separate and analyze them separately? In that case, how would I include the SE sample as a replicate?
2. Would it be OK to combine both types of reads? In that case, won't I lose information if I treat everything as single-end?
3. Is there any rule for which tool is best for analyzing such data? I was planning to use TopHat > Cufflinks. Alternatively, I could use DESeq.
My aim is to find differentially expressed transcripts and splicing events. Thanks for your help and attention.
#2 |
Senior Member
Location: Vienna Join Date: Mar 2010
Posts: 107
You could try:
1) map (e.g. with RUM)
2) count reads with htseq-count
3) analyse with the R packages limma/edgeR (voom() function) and an appropriate contrast matrix
#3 |
I like code
Location: San Diego, CA, USA Join Date: Sep 2009
Posts: 438
I've had to run some comparisons between single-end and paired-end data myself. The researchers and I came to the conclusion that, regardless of the sequencing method, we should align each sample in whatever way gives the most complete set of alignments for that sample. This is because in each case the kit is designed to produce a certain type of read, and we should align the reads in the way that was intended, starting from the kit used to prepare the samples.
The touchy part would probably come down to normalization between samples, and I think tools like DESeq and edgeR do a good job of that. So my version of the above post would be:
1) align with tophat
2) count reads with htseq-count
3) perform pairwise DE tests with DESeq
4) ...play with results of pairwise tests...
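To make the normalization point a bit more concrete, here is a minimal sketch of median-of-ratios size factors, which is the idea behind DESeq's library-size normalization. It is plain Python written for illustration, not DESeq's own code; the function name `size_factors`, the sample names, and the toy counts are all made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of per-gene counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples.
    # Genes with a zero count in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median ratio of a sample's counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Toy data: the hypothetical PE sample has exactly twice the counts of the SE one.
counts = {
    "pe_sample": [20, 200, 2000, 40],
    "se_sample": [10, 100, 1000, 20],
}
factors = size_factors(counts)
normalized = {s: [c / factors[s] for c in counts[s]] for s in counts}
```

In this toy case the PE sample's size factor comes out twice the SE sample's, so a uniform doubling of counts is absorbed by the normalization. It does nothing for effects that differ from gene to gene.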
#4 |
Genome Informatics Facility
Location: Iowa @isugif Join Date: Sep 2009
Posts: 105
I have not seen much in the literature that explicitly states the best protocols for RNA-Seq analysis using PE reads, let alone mixing PE and SE.
So here are some ideas to consider. PE and SE reads are each taken from one fragment in your library. If the idea is to count the number of fragments that are supposed to represent the RNA in your sample, then after you do a paired-end alignment, the PE read counts for a particular feature (gene, exon, etc.) should be divided by 2. The other option would be to take only the forward read of each pair and run your analysis so that everything is equal. I would liken this method to taking the first 36 bases of a 50-base read so that it matches the read length of the other libraries you have. You will lose information, but the libraries will be the same (SE, read length). If you align the PE reads you will get more uniquely mappable alignments, so there will be some bias from the SE reads mapping to more locations.
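A minimal sketch of the first option (halving PE read counts so that every library is counted in fragments). It assumes both mates of a pair were counted for the same feature; the function name `to_fragment_counts` and the example numbers are hypothetical.

```python
def to_fragment_counts(read_counts, library_type):
    """Convert per-feature read counts to approximate fragment counts.

    read_counts:  dict mapping feature name -> number of aligned reads
    library_type: "PE" (each fragment contributes two reads) or "SE"
    """
    if library_type == "PE":
        # Assumption: both mates were counted, so each fragment appears twice.
        return {feature: n / 2 for feature, n in read_counts.items()}
    if library_type == "SE":
        # One read per fragment already.
        return dict(read_counts)
    raise ValueError(f"unknown library type: {library_type!r}")


pe_reads = {"gene1": 10, "gene2": 18}
se_reads = {"gene1": 6, "gene2": 7}

pe_fragments = to_fragment_counts(pe_reads, "PE")
se_fragments = to_fragment_counts(se_reads, "SE")
```

Note that halving can give non-integer values when only one mate of a pair was counted for a feature, and count-based tools generally expect integers. That is one reason to count pairs directly instead of dividing afterwards.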
#5 |
Senior Member
Location: Charlottesville, VA Join Date: May 2011
Posts: 112
Thanks Andrew. So, the next logical question: what are the methods that we're all using every day (htseq-count + DESeq, DEXSeq, baySeq, edgeR, Cuffdiff) doing about counting with paired-end reads?
#6 |
Genome Informatics Facility
Location: Iowa @isugif Join Date: Sep 2009
Posts: 105
The BWA manual appears to suggest that it maps only PE reads that are concordant (they map in the proper orientation and are within the specified fragment-size distance of each other).
GSNAP accounts for all the possible combinations of alignments and outputs half-mapped PE reads, where only one read of the pair maps and the other does not. These can be output into separate files, but I haven't seen any methods for combining them. In other words, the question of how to normalize and count PE aligned reads is important in performing RNA-Seq differential gene expression analysis.
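For illustration, here is a simplified notion of "concordant" for a forward/reverse library: same chromosome, opposite strands, mates facing each other, and within a maximum fragment size. This is only a sketch of the idea described above; real aligners each apply their own definition, and the `Mate` class, `is_concordant` function, and `max_fragment` parameter are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    chrom: str
    start: int   # leftmost aligned position (1-based, inclusive)
    end: int     # rightmost aligned position (1-based, inclusive)
    strand: str  # "+" or "-"


def is_concordant(m1, m2, max_fragment):
    """Return True if the two mates form a concordant pair (simplified)."""
    if m1.chrom != m2.chrom:
        return False
    if m1.strand == m2.strand:
        return False  # mates must be on opposite strands
    fwd, rev = (m1, m2) if m1.strand == "+" else (m2, m1)
    if fwd.start > rev.start:
        return False  # mates point away from each other
    # Outer distance spanned by the pair must fit the expected fragment size.
    return rev.end - fwd.start + 1 <= max_fragment


good = is_concordant(Mate("chr1", 100, 149, "+"), Mate("chr1", 300, 349, "-"), 500)
same_strand = is_concordant(Mate("chr1", 100, 149, "+"), Mate("chr1", 300, 349, "+"), 500)
too_far = is_concordant(Mate("chr1", 100, 149, "+"), Mate("chr1", 5000, 5049, "-"), 500)
```

For RNA-Seq the distance check is cruder than it looks, since a pair spanning an intron can be far apart on the genome while coming from a short fragment.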
#7 |
Genome Informatics Facility
Location: Iowa @isugif Join Date: Sep 2009
Posts: 105
I was thinking about this a little more, and this doubling could lead to false positives too if it is not corrected for.
Consider the following made-up example:

        condition1  condition2
gene1   5           9           corrected for PE double count
gene1   10          18          uncorrected for PE double count

For low read counts, the doubling could make a difference appear more significant than it actually is (unless I am missing something fundamental here), not to mention the havoc it will likely wreak on overdispersion between biological replicates. Thoughts, anyone?
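A quick way to see the effect in the made-up example is a simple two-sided exact binomial test on the two counts. This is a stand-in chosen for simplicity, assuming equal library sizes and no biological variability; it is not the negative binomial test that DESeq or edgeR actually use. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Two-sided exact binomial p-value under p = 0.5.

    Computed by doubling the smaller tail and capping at 1.
    k: count in condition 1, n: total count over both conditions.
    """
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# gene1 with fragments counted once (corrected) vs. reads counted twice (uncorrected)
p_corrected = binomial_two_sided_p(5, 5 + 9)
p_doubled = binomial_two_sided_p(10, 10 + 18)
```

Here the p-value drops from about 0.42 to about 0.18 purely because the same ratio is backed by twice as many "observations", even though no additional fragments were sequenced.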
#8 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
You could use my htseq-count script, which does not double-count paired-end reads. Rather, a read pair is counted once for a gene if both ends map to the same gene, and is discarded otherwise.
Furthermore, have a look at the vignette of DESeq, where we present an example of a mixed paired-end/single-end data set. However, if you have confounding between treatment and library type, you had better discard the second mates to avoid bias.
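The counting rule described above can be sketched in a few lines. This only illustrates the rule as stated in this post; it is not htseq-count's actual code, and the input format (one tuple of gene assignments per read pair, with `None` for a mate that hits no gene) is an assumption made for the example.

```python
from collections import Counter


def count_pairs(pairs):
    """Count each read pair once, only if both mates map to the same gene.

    pairs: iterable of (gene_of_mate1, gene_of_mate2); None means the mate
           did not map to any gene.
    Returns (per-gene counts, number of discarded pairs).
    """
    counts = Counter()
    discarded = 0
    for gene1, gene2 in pairs:
        if gene1 is not None and gene1 == gene2:
            counts[gene1] += 1  # one count per fragment, not per read
        else:
            discarded += 1      # mates disagree, or at least one is unassigned
    return counts, discarded


pairs = [
    ("geneA", "geneA"),  # counted once for geneA
    ("geneA", "geneB"),  # discarded: mates hit different genes
    ("geneB", "geneB"),  # counted once for geneB
    ("geneA", None),     # discarded: second mate unassigned
]
counts, discarded = count_pairs(pairs)
```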
#9 |
Member
Location: Dresden Join Date: Jul 2012
Posts: 49
I have a similar problem to the OP. I've downloaded RNA-seq data (Hs) from a published paper and, to my surprise, the control condition is PE and the 2 conditions are SE. No replicates.
They used this data to determine differential exon usage between control vs condition 1 and control vs condition 2 using a "home-brewed" analysis after mapping. To answer the particular biological problem I am interested in, cufflinks (and cuffdiff) is ideal because it should allow me to extract exactly the information I am interested in. The question is: is it even possible to use cufflinks (+cuffdiff) to analyse control (PE) vs condition (SE)?
#10 | |
Member
Location: SF Bay Area Join Date: Feb 2012
Posts: 62
Quote:
I am quite interested in the effect of changing the counting scheme on gene expression estimates. It seems to me that with paired ends, you need to be counting fragments, not reads, and you need to be normalizing for fragment length as well. Shouldn't a transcript that gets chopped into 2x250bp fragments count the same as a transcript that gets chopped into 5x100bp fragments?
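One way to read the arithmetic behind that question: raw fragment counts differ (2 vs 5), but fragment count times fragment length gives the same number of sequenced bases. The sketch below only illustrates that arithmetic; it is not a normalization method taken from any of the tools mentioned in this thread, and the function name is made up.

```python
def covered_bases(n_fragments, fragment_length):
    """Total bases represented by a set of equal-length fragments."""
    return n_fragments * fragment_length


# The same 500 bp of transcript, chopped two different ways.
library_a = covered_bases(2, 250)  # 2 fragments of 250 bp
library_b = covered_bases(5, 100)  # 5 fragments of 100 bp

same_coverage = library_a == library_b  # True: both cover 500 bases
count_ratio = 5 / 2                     # but raw fragment counts differ 2.5-fold
```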