**dpryan** · 04-01-2014, 08:34 AM

Can you post a reproducible example? The simplest way to do this would be to:

Code:

samtools sort -@ 4 namesorted_sample.bam sorted_sample
samtools index sorted_sample.bam
samtools view sorted_sample.bam 19:53864080-53867080 > subset.sam

and then post "subset.sam" somewhere (an attachment here, copy.com, Dropbox, wherever). That region should correspond to exonic_bin number 19 (at least it does on GRCm38.71). Then we can look at exactly why things are wonky.

**areyes** · 04-01-2014, 11:44 PM

Hi kajot,

I have the feeling that this could be related to the protocol that you are using. In the most recent protocol for Illumina runs (at least the latest one used by the sequencing facility at EMBL), you get strand-specific data, but the reads map opposite to the strand they come from. For example, for a gene on the forward strand, you will get all the reads mapping to the reverse strand. This is a bit confusing!

So, what I recommend is that in your IGV browser you colour your reads by the strand they map to, and then see whether what I describe above is the case. If so, I think the "-s reverse" parameter of the dexseq_count.py script should do the trick.
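
The same check can also be done outside IGV by tallying mapped strands directly from SAM records. What follows is only a minimal sketch: the function name `tally_strands` and the toy records are made up for illustration, and it looks only at FLAG bit 0x10 (read aligned to the minus strand), ignoring whether a read is mate 1 or mate 2.

Code:

```python
from collections import Counter

def tally_strands(sam_lines):
    """Count mapped reads on the '+' and '-' reference strands."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):       # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:            # skip blank or malformed lines
            continue
        flag = int(fields[1])
        if flag & 0x4:                 # skip unmapped reads
            continue
        counts["-" if flag & 0x10 else "+"] += 1
    return counts

# Toy single-end records (hypothetical, not taken from this thread's data)
demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t16\t19\t53864100\t50\t50M\t*\t0\t0\t*\t*",
    "r2\t16\t19\t53864200\t50\t50M\t*\t0\t0\t*\t*",
    "r3\t0\t19\t53864300\t50\t50M\t*\t0\t0\t*\t*",
]
print(tally_strands(demo))
```

For a single-end library of the kind described above, a gene on the forward strand would be expected to put nearly all of its reads in the '-' bin.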

Alejandro

**kmcarr** · 04-02-2014, 06:56 AM

Originally posted by areyes

...For example, for a gene on the forward strand, you will get all the reads mapping to the reverse strand. This is a bit confusing!

So, what I recommend is that in your IGV browser you colour your reads by the strand they map to, and then see whether what I describe above is the case. If so, I think the "-s reverse" parameter of the dexseq_count.py script should do the trick.

Alejandro

More accurately, for paired-end reads, read 1 will map to the anti-sense strand while read 2 maps to the sense strand, so in this case you will get reads mapping to both strands. You need to distinguish both the mapping strand and the read (1 or 2). The attached image from IGV shows the mapping of TruSeq Stranded, paired-end reads. In this case the gene is transcribed from the forward strand of the reference (left to right). Reads are colored by strand, blue == minus, red == plus (in this highly zoomed-in view the read glyphs also have arrows indicating their direction). The alignments are then grouped by "first-in-pair". The top group is read 1 and the bottom group is read 2.

Alejandro is correct that for TruSeq Stranded Kit libraries you need to specify "-s reverse" for dexseq_count.py, or for htseq-count if you are counting reads to genes for DESeq(2).
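
The read 1 / read 2 logic above can be written down as a small rule on the SAM FLAG. This is a sketch only, not how dexseq_count.py or htseq-count implement it; the function name `transcript_strand` and its `library` argument are invented here, with "reverse" standing for the TruSeq Stranded case (read 1 anti-sense) and "yes" for the opposite convention.

Code:

```python
def transcript_strand(flag, library="reverse"):
    """Infer the strand of the originating transcript from a SAM FLAG.

    library="reverse": read 1 maps anti-sense to the transcript.
    library="yes":     read 1 maps sense to the transcript.
    Single-end reads carry no 0x80 bit and are treated like read 1.
    """
    if library not in ("yes", "reverse"):
        raise ValueError("library must be 'yes' or 'reverse'")
    mapped_minus = bool(flag & 0x10)   # read aligned to the minus strand
    is_read2 = bool(flag & 0x80)       # second read of the pair
    # Read 2 lies on the opposite strand from read 1, so flip once for
    # read 2 and once more when the library is reverse-stranded.
    flip = is_read2 ^ (library == "reverse")
    return "-" if (mapped_minus ^ flip) else "+"

# Common proper-pair FLAG values:
#   83  = read 1, minus strand    163 = read 2, plus strand
#   99  = read 1, plus strand     147 = read 2, minus strand
for flag in (83, 163, 99, 147):
    print(flag, transcript_strand(flag))
```

Under the "reverse" setting, both mates of a pair are assigned to the same transcript strand even though they align to opposite reference strands, which is the pattern visible in the IGV panel.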

Attached Files

igv_panel.png (18.1 KB, 67 views)

**kajot** · 04-02-2014, 08:21 AM

Thank you for all your rapid replies! I was swamped today with regular wet-lab work, so I had little time to check all your suggestions. I will try to re-run the counting script with -s reverse tomorrow morning.

I only managed to get the subset of reads mapping to the aforementioned exon 14 of Rbm20. This was the exon that was covered by multiple reads in IGV but received only around 50 counts in DEXSeq.

The subset for exon 14 is here:

https://www.dropbox.com/s/simbcsdzyvk8js0/subset.sam

I looked at IGV and displayed reads joined together when they are a pair. What I can see is that for Rbm20, which is on the + strand, I have read 1 mapped to the - strand and read 2 mapped to the + strand, pair orientation F2R1. As I understand it, this means I have the "Illumina protocol bug" and have to run the counting with the -s reverse option?

--------------------

Update: I just ran the counting script with -s reverse and it all seems to work. I now have almost 6 million reads that are empty, 1.5 million ambiguous, and the rest (around 72.5 million) counted. Since I specified fr-firststrand in Tophat, isn't that somehow affecting my alignment? Shouldn't it be fr-secondstrand then? Or am I mixing something up here?

**kmcarr** · 04-02-2014, 09:17 AM

Originally posted by kajot

Update: I just ran the counting script with -s reverse and it all seems to work. I now have almost 6 million reads that are empty, 1.5 million ambiguous, and the rest (around 72.5 million) counted. Since I specified fr-firststrand in Tophat, isn't that somehow affecting my alignment? Shouldn't it be fr-secondstrand then? Or am I mixing something up here?

fr-firststrand is the correct setting to use in Tophat for TruSeq Stranded libraries, so you are fine. What 'fr-firststrand' means is that the first read of the pair (or the only read, for single-end reads) matches the 'first strand of cDNA synthesized', which is anti-sense to the RNA.
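
To keep the two naming schemes straight, the correspondence between Tophat's library types and the "-s" value used by htseq-count and dexseq_count.py can be kept in a small lookup table. This is just an illustrative sketch; the names `LIBRARY_TYPE_TO_STRANDED` and `stranded_option` are made up here and are not part of either tool.

Code:

```python
# Tophat --library-type  ->  "-s" value for htseq-count / dexseq_count.py
LIBRARY_TYPE_TO_STRANDED = {
    "fr-unstranded": "no",         # no strand information retained
    "fr-firststrand": "reverse",   # read 1 anti-sense (e.g. TruSeq Stranded)
    "fr-secondstrand": "yes",      # read 1 sense
}

def stranded_option(library_type):
    """Return the counting-script "-s" value for a Tophat library type."""
    try:
        return LIBRARY_TYPE_TO_STRANDED[library_type]
    except KeyError:
        raise ValueError("unknown library type: %r" % library_type)

print(stranded_option("fr-firststrand"))
```

So fr-firststrand at the alignment step and "-s reverse" at the counting step describe the same library, not conflicting settings.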

**fchen** · 08-03-2018, 11:04 PM

Originally posted by areyes

I have the feeling that this could be related to the protocol that you are using. In the most recent protocol for Illumina runs (at least the latest one used by the sequencing facility at EMBL), you get strand-specific data, but the reads map opposite to the strand they come from. For example, for a gene on the forward strand, you will get all the reads mapping to the reverse strand. This is a bit confusing!

So, what I recommend is that in your IGV browser you colour your reads by the strand they map to, and then see whether what I describe above is the case. If so, I think the "-s reverse" parameter of the dexseq_count.py script should do the trick.

Alejandro

Thank you!

DEXseq - very low numbers of counts