SEQanswers

Old 07-23-2014, 11:19 AM   #1
rdsqc22
Junior Member
 
Location: Rochester

Join Date: Nov 2013
Posts: 7
My reads are mapping to the wrong strand?

Hello,

So, I'm doing some downstream analysis on a published RNA-seq data set for yeast: http://downloads.yeastgenome.org/pub...3390610/fastq/

However, after mapping them to the yeast genome, I noticed using Samtools that far more reads map to the complement of genes than to the genes themselves. That is, if a gene is on the (+) strand between nucleotides 500 and 1000 (for example), far more RNA-seq reads map to that location on the (-) strand than on the (+) strand. Only ~800 genes map in a 'canonical' fashion, with more reads on the gene's own strand than on the complementary strand, while ~5800 map in a non-canonical way, with more reads complementary to the gene than within it.
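For reference, the per-gene tally described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used: it assumes plain-text SAM input, assigns a read to a gene by its leftmost mapped position only (ignoring read length and splicing), and the helper names `read_strand`, `tally_gene`, and `fake_sam` are made up for the example.

```python
from collections import Counter

def read_strand(flag):
    """Return '+' or '-' for a mapped read, or None if the unmapped bit (0x4) is set."""
    if flag & 0x4:
        return None
    # SAM flag bit 0x10 means the read aligned to the reverse strand.
    return '-' if flag & 0x10 else '+'

def tally_gene(sam_lines, chrom, start, end, gene_strand):
    """Count reads whose leftmost position lies in [start, end] (1-based),
    split into reads on the gene's strand and reads on the opposite strand."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith('@'):          # skip SAM header lines
            continue
        fields = line.rstrip('\n').split('\t')
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        strand = read_strand(flag)
        if strand is None or rname != chrom or not (start <= pos <= end):
            continue
        counts['same' if strand == gene_strand else 'opposite'] += 1
    return counts['same'], counts['opposite']

def fake_sam(name, flag, chrom, pos):
    """Build a made-up 11-column SAM record for the demonstration below."""
    return '\t'.join([name, str(flag), chrom, str(pos),
                      '255', '36M', '*', '0', '0', '*', '*'])

# Hypothetical reads around a (+) strand gene at chrI:500-1000.
example = [
    '@HD\tVN:1.0',
    fake_sam('r1', 0, 'chrI', 600),    # forward strand, inside the gene
    fake_sam('r2', 16, 'chrI', 700),   # reverse strand, inside the gene
    fake_sam('r3', 16, 'chrI', 900),   # reverse strand, inside the gene
    fake_sam('r4', 4, '*', 0),         # unmapped, ignored
    fake_sam('r5', 0, 'chrI', 5000),   # outside the gene, ignored
]
same, opposite = tally_gene(example, 'chrI', 500, 1000, '+')
```

A gene counts as 'canonical' in the sense above when `same > opposite`.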

I tested the script I wrote to make these measurements on other RNA-seq datasets and did not see the same pattern. What could be wrong with my yeast dataset?

I have performed the alignment with both SHRiMP and TopHat; both programs gave the same numbers. Changing the library type in TopHat did not affect the outcome.

Thanks for any help!
Old 07-23-2014, 12:03 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Try reverse-complementing the reads prior to mapping.

....just kidding! Actually, assuming you are using a stranded protocol, the strand that reads map to is NOT affected by the library type flag you give Tophat. That only affects downstream processing using Cufflinks/Cuffdiff. One of the library types is supposed to have read1 mapping to the 'wrong' strand.

On the other hand, if your protocol was unstranded, it doesn't matter either way. The strand bias in that case is probably an artifact of the number of PCR cycles or some kind of 3'/5' binding affinity difference (just a guess).

Last edited by Brian Bushnell; 07-23-2014 at 12:11 PM.
Old 07-23-2014, 01:33 PM   #3
rdsqc22
Junior Member
 
Location: Rochester

Join Date: Nov 2013
Posts: 7

Quote:
Originally Posted by Brian Bushnell View Post
Actually, assuming you are using a stranded protocol, the strand that reads map to is NOT affected by the library type flag you give Tophat. That only affects downstream processing using Cufflinks/Cuffdiff. One of the library types is supposed to have read1 mapping to the 'wrong' strand.

On the other hand, if your protocol was unstranded, it doesn't matter either way. The strand bias in that case is probably an artifact of the number of PCR cycles or some kind of 3'/5' binding affinity difference (just a guess).
This makes sense. However, the manufacturer documentation for the sequencing platform (Illumina GA IIx) claims that it is strand-specific. So I would have to look at the Cuffdiff results to see whether, for most genes, most of my reads are being discarded, most are being used, or all are considered regardless of strand?

Based on my understanding, for stranded data, firststrand means that the read that comes out is equivalent to the original mRNA, and therefore will map to the opposite strand from the gene's location (as I am seeing in my data), whereas secondstrand means that the complement to the cDNA is sequenced, and the read is equivalent to the original gene, which is where it maps on the genome.

Would I be correct, then, to think that this data is probably a firststrand library, which will be clear once I run cuffdiff (on the data I generated from aligning with the firststrand argument in tophat)?
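One way to check this before running cuffdiff is to summarize the per-gene counts directly. The following is only a rough sketch: it assumes each gene has already been reduced to a pair of counts (reads on the gene's strand, reads on the opposite strand), and the function name `summarize_strandedness` and the `min_reads` and `margin` thresholds are arbitrary choices of mine, not part of any tool.

```python
def summarize_strandedness(gene_counts, min_reads=10, margin=0.8):
    """Classify a library from per-gene (same_strand, opposite_strand) read counts.

    Returns 'same' if at least `margin` of the informative genes have more
    reads on their own strand, 'opposite' if at least `margin` have more
    reads on the other strand, 'mixed' if neither holds, and 'undetermined'
    if no gene has enough reads to be informative.
    """
    same_genes = 0
    opposite_genes = 0
    for same, opposite in gene_counts:
        if same + opposite < min_reads:   # too few reads to say anything
            continue
        if same > opposite:
            same_genes += 1
        elif opposite > same:
            opposite_genes += 1
    informative = same_genes + opposite_genes
    if informative == 0:
        return 'undetermined'
    if opposite_genes / informative >= margin:
        return 'opposite'
    if same_genes / informative >= margin:
        return 'same'
    return 'mixed'

# Made-up counts mimicking the ~800 vs ~5800 gene split described above.
counts = [(50, 5)] * 800 + [(5, 50)] * 5800
verdict = summarize_strandedness(counts)
```

With that split, about 88% of genes have more reads on the opposite strand, so the sketch reports 'opposite'. A result of 'mixed' would instead suggest an unstranded library.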
Old 07-23-2014, 01:37 PM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

I don't do library prep, but my understanding is that machines are not inherently strand-specific; rather, some machines offer the possibility of using a strand-specific protocol. That does not ensure that your specific library was, in fact, sequenced using a strand-specific protocol; you'd have to check with the people who made it.

Just to be clear, is your data single-ended or paired?
Old 07-23-2014, 01:42 PM   #5
rdsqc22
Junior Member
 
Location: Rochester

Join Date: Nov 2013
Posts: 7

It is single-end.
I just checked the protocol accompanying the data- it confirms that the reads are indeed strand-specific.

By the way, thank you so much for all of your help so far.
Old 07-23-2014, 02:43 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by rdsqc22 View Post
It is single-end.
I just checked the protocol accompanying the data- it confirms that the reads are indeed strand-specific.

By the way, thank you so much for all of your help so far.
You're welcome. As for 'firststrand' vs 'secondstrand', the documentation in Tophat is confusing, but I eventually concluded that for firststrand, read1 gets the SAM tag "XS:A:+" if it maps to the plus strand and "XS:A:-" if it maps to the minus strand. This gives results concordant with Tophat, anyway, so I consider it empirically correct. According to the Tophat manual:

Quote:
Note the use of the custom tag XS. This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from.
So... with 'firststrand', a plus-mapped read will get "XS:A:+", which by my reading indicates that its template RNA was minus strand, which indicates the gene is on the plus strand. But the description is vague so I'm not sure.
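Since the description is vague, the safest thing is to check empirically on your own alignments. Here is a minimal sketch that cross-tabulates the mapped strand of each read (from the SAM flag) against its "XS:A" tag. It assumes plain-text SAM lines as input (for example from `samtools view`), and the function names and sample records are made up for illustration.

```python
from collections import Counter

def xs_tag(fields):
    """Return the value of the XS:A tag ('+' or '-') from the optional SAM
    columns (column 12 onward), or None if the read has no such tag."""
    for field in fields[11:]:
        if field.startswith('XS:A:'):
            return field[len('XS:A:'):]
    return None

def crosstab_xs(sam_lines):
    """Count (mapped_strand, XS_value) pairs over all mapped reads with an XS:A tag."""
    table = Counter()
    for line in sam_lines:
        if line.startswith('@'):          # skip SAM header lines
            continue
        fields = line.rstrip('\n').split('\t')
        flag = int(fields[1])
        if flag & 0x4:                    # unmapped read
            continue
        mapped = '-' if flag & 0x10 else '+'
        xs = xs_tag(fields)
        if xs is not None:
            table[(mapped, xs)] += 1
    return table

# Made-up records, only to show the input and output shapes.
records = [
    'r1\t0\tchrI\t100\t50\t36M\t*\t0\t0\tACGT\tIIII\tXS:A:+',
    'r2\t16\tchrI\t200\t50\t36M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tXS:A:+',
    'r3\t0\tchrI\t300\t50\t36M\t*\t0\t0\tACGT\tIIII\tNH:i:1',
]
table = crosstab_xs(records)
```

If nearly all of the counts fall in two of the four cells, that shows which mapped strand goes with which XS value for the library type that was given.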