03-20-2013, 03:53 AM   #37
NicoBxl

A little bump for this interesting thread.

I have a problem with my strand-specific data and htseq-count. I aligned my data (2x50 bp, dUTP method) with STAR. After that, I counted the reads per gene with htseq-count:

htseq-count -s yes gtf.gtf data.sam > htseq.txt

But I only get a read count of 9 for the gene shown in the figure, and many other genes also have very low counts.

With -s no, the read counts seem OK.

Here is a read (and its mate) in SAM format that aligns to this gene (cf. figure below):

Code:
HWI-ST1172:65:C0RN7ACXX:1:2316:4226:51105	99	chr15	44109457	255	51M	=	44109544	136	TGTAAACGCCGTAGCCGGGGGTCACTGGATGAATCCTCCTCCTGTTCCTCA	CCBFFFFFGHHHHJIJJJJJJ@EIIIJHGFHGFFFFDEEDEEEDCCDDDDD	NH:i:1	HI:i:1	AS:i:98	nM:i:0
HWI-ST1172:65:C0RN7ACXX:1:2316:4226:51105	147	chr15	44109544	255	49M2S	=	44109457	-136	TGAAATTCTTCATCCTCCTCATCTGAGGACTCCATAGGGGCATAGTCTGCN	EJJJJJIJIJJIJJIIJJJJJJJJJJIGDIIJJJJJJJHHGHHFFDD=4+#	NH:i:1	HI:i:1	AS:i:98	nM:i:0
So do I have to use -s reverse? But I don't understand: in the GTF file, the gene is annotated on the minus strand, and my reads also seem to align to the minus strand. I must be missing something.
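
To double-check the strands, the FLAG fields of the two records above (99 and 147) can be decoded. This is only a minimal sketch in Python, assuming the standard FLAG bit definitions from the SAM specification; decode_flag is a hypothetical helper name, not part of HTSeq or STAR.

```python
# Minimal sketch: decode the strand-related bits of a SAM FLAG value.
# Bit meanings follow the SAM specification; decode_flag is a
# hypothetical helper, not an HTSeq or STAR function.

def decode_flag(flag):
    """Return pairing and strand information encoded in a SAM FLAG."""
    if flag & 0x40:
        mate = 1          # first read in the pair
    elif flag & 0x80:
        mate = 2          # second read in the pair
    else:
        mate = None       # unpaired, or mate number not set
    return {
        "paired": bool(flag & 0x1),
        "proper_pair": bool(flag & 0x2),
        "strand": "-" if flag & 0x10 else "+",       # strand of this read
        "mate_strand": "-" if flag & 0x20 else "+",  # strand of its mate
        "mate": mate,
    }

if __name__ == "__main__":
    # The two FLAG values from the SAM records above.
    for flag in (99, 147):
        print(flag, decode_flag(flag))
```

If I decode this correctly, the first mate (flag 99) is on the plus strand and only the second mate (flag 147) is on the minus strand. As far as I understand the htseq-count documentation, -s yes expects the first mate on the same strand as the feature, so this pair would not be counted for a minus-strand gene with -s yes.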

Thanks

N.
