Hello,
I am trying to use HTSeq-0.5.3p9 with python2.7 to get the counts from genomic regions defined in a gff file.
My data are BAM files generated by MapSplice on (human) RNA-seq paired-end reads from Illumina HiSeq.
I sorted the BAM files by name with "samtools sort -n" and then converted them to SAM files, keeping only properly paired reads and discarding unmapped and mate-unmapped reads ("samtools view -F 12 -f 3").
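To make explicit what I expect "-f 3 -F 12" to keep, here is a minimal sketch of the same bit test in Python. The function name passes_filter is just my own illustration, not part of samtools or HTSeq: "-f" requires all the given bits to be set, "-F" requires all of them to be cleared.

```python
def passes_filter(flag, require=3, exclude=12):
    """Mimic 'samtools view -f <require> -F <exclude>' on a single FLAG value.

    require=3  -> 0x1 (paired) and 0x2 (properly paired) must both be set
    exclude=12 -> 0x4 (unmapped) and 0x8 (mate unmapped) must both be cleared
    """
    return (flag & require) == require and (flag & exclude) == 0


# The two FLAG values from the SAM lines quoted in this post, plus one
# value (73 = 0x1 + 0x8 + 0x40) whose mate is flagged as unmapped.
for flag in (403, 99, 73):
    print(flag, passes_filter(flag))
```

Both 403 and 99 pass this test, so the filter cannot remove records like the ones that trigger the error.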
But when I run HTSeq on these data, I get an error:
> python -m HTSeq.scripts.count -i ID -s reverse data.sam regions.gff > test.counts
Error occured in line 230375 of file data.sam.
Error: ("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 230375 of file data.sam')
Here are the lines 230375 and 230376 of the SAM file:
id1/1 403 chr14 21679407 255 29S19M * 0 0 seq1 JJJJJJJJJIJJJJJJJGIGGGGIJHIIJBC@FFFFFHHHHHJJJJJJ XF:Z:GTAG, ZF:Z:FUS_86736804_21679425(+-) RG:Z:id1 NM:i:1 XS:A:+
id1/1 99 chr11 86736776 255 29M19S * 0 0 seq1 JJJJJJJJJIJJJJJJJGIGGGGIJHIIJBC@FFFFFHHHHHJJJJJJ XF:Z:GTAG, ZF:Z:FUS_86736804_21679425(+-) RG:Z:id1 NM:i:1 XS:A:+
We can see that flag &0x0008 is indeed cleared, i.e. the mate is reported as mapped, but the MRNM (Mate Reference NaMe) is "*". In addition, no mate seems to be present in the SAM file (there is no "id1/2").
The flag &0x0008 thus seems to be set incorrectly by MapSplice. I guess the explanation is that this is a fusion alignment (http://www.netlab.uky.edu/p/bioinfo/...lignmentFormat).
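For reference, this is how I decoded the two FLAG values. It is a small sketch using the bit meanings from the SAM specification; the table name SAM_FLAG_BITS and the function decode_flag are my own, not an HTSeq API.

```python
# Bit meanings as defined in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (403, 99):
    print(flag, decode_flag(flag))
```

Neither 403 nor 99 contains "mate_unmapped", which is why a "*" in the mate reference field is treated as a contradiction.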
I read here https://stat.ethz.ch/pipermail/bioco...st/040614.html that this error used to be only a warning.
Is there a way I could deal with these MapSplice fusion alignments using HTSeq?
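One workaround I can imagine is to drop the offending records before running HTSeq. Below is a rough sketch of such a pre-filter. The function names are hypothetical, and the condition (paired, mate not flagged as unmapped, mate reference "*") is only my reading of the error message, not something taken from the HTSeq source. It assumes tab-separated SAM lines, and of course it throws the fusion alignments away rather than counting them.

```python
def mate_fields_inconsistent(sam_line):
    """True if a paired record claims a mapped mate but has '*' as mate reference."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    rnext = fields[6]       # column 7: mate reference name (MRNM / RNEXT)
    paired = bool(flag & 0x1)
    mate_unmapped = bool(flag & 0x8)
    return paired and not mate_unmapped and rnext == "*"


def drop_inconsistent(lines):
    """Yield header lines and all alignment lines whose mate fields are consistent."""
    for line in lines:
        if line.startswith("@") or not mate_fields_inconsistent(line):
            yield line
```

This could be applied by looping over sys.stdin and writing the kept lines to sys.stdout, with the result saved as a new SAM file. I do not know whether removing these records biases the counts, so I would prefer a solution within HTSeq if there is one.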
Best,
Anne