Hi all (and Simon & Alejandro, hopefully),
In my BAM files, the optional NH tag is set only on mapped reads (NH:i:n, n >= 1). Unmapped reads carry no NH tag at all; there is no NH:i:0. Here is part of the SAM (paired-end, sorted by read name):
I think dexseq_count.py doesn't like this format; it fails with the error message shown below:
The same BAM file works with htseq-count without any error. The HTSeq manual says "... If the aligner does not set this field, multiply aligned reads will be counted multiple times. ...", so I think HTSeq can handle my format.
Do you have any suggestions for avoiding this error in dexseq_count.py? What if I grep the SAM file for "NH" before running dexseq_count.py? When I did that there was no error, but I cannot figure out how it affects my results.
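A more targeted version of the grep idea can be sketched in Python. A plain text search for "NH" can also match those two letters inside a base-quality string, so an unmapped record may slip through. The sketch below instead looks only at the optional fields, which start at column 12 of a tab-separated SAM line. The function names `has_nh_tag` and `filter_sam_lines` are hypothetical, chosen for illustration; this is not part of DEXSeq or HTSeq.

```python
def has_nh_tag(sam_line):
    """Return True if a SAM alignment line carries an NH:i: optional field."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields start at column 12.
    return any(field.startswith("NH:i:") for field in fields[11:])


def filter_sam_lines(lines):
    """Yield header lines unchanged, plus alignment lines that carry an NH tag."""
    for line in lines:
        # Header lines start with '@'; a plain grep for "NH" would usually drop them.
        if line.startswith("@") or has_nh_tag(line):
            yield line
```

It could be driven from a pipe with `sys.stdout.writelines(filter_sam_lines(sys.stdin))`. Note that with paired-end data this drops an unmapped mate even when its partner is mapped; how dexseq_count.py treats the remaining mate is not covered by this sketch.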
Thanks in advance for your reply.
Kwangwoo
H13P7ADXX130822:1:1101:1122:27935 77 * 0 0 * * 0 0 CGGGNGGATGCCTTCTTTATCTTGGATCTTGGCNTTCACATTTTCGATGGTGTCACTGGGCTCCACCTCCAGAGTG CCCF#2ADHHHHHJJJJJJJJJJJJIJJJJJJJ#1CFHIIJJJJJJJJJJGHHIJJHGEGGGIGHIIIEHHGDB7? PG:Z:MarkDuplicates.1E RG:Z:H13P7.1
H13P7ADXX130822:1:1101:1122:27935 141 * 0 0 * * 0 0 NNNCNCCATCGNAAATGTGAAGGCCAAGATCCAGGATAAAGAAGGCATCCCTCCCGACCAGCAGAGGCTCATCTTT ###2#2:?@@@#4@@@@@@@@@@@@@@@@????8?????????????????1>?###################### PG:Z:MarkDuplicates.1E RG:Z:H13P7.1
H13P7ADXX130822:1:1101:1124:37458 1609 3 40500185 255 43M2670N33M = 40500185 0 AGTGNATGCAGCTCACTGATTTCATCCTCAAGTTTCCGCACAGTGCCCACCAGAAGTATGTCCGACAAGCCTGGCA <<<@#2@((=?>7=<<6::?=9@=>>@<9><98)9@@6???8?33;>?>9;7::>6.6>?6)8=?8?8??;=0=;7 PG:Z:MarkDuplicates.1E RG:Z:H13P7.1 NH:i:1 NM:i:1 UQ:i:2 XS:A:+
Traceback (most recent call last):
  File "/home/kkim/softwares/R_packages/DEXSeq/python_scripts/dexseq_count.py", line 225, in <module>
    rs = map_read_pair( af, ar )
  File "/home/kkim/softwares/R_packages/DEXSeq/python_scripts/dexseq_count.py", line 134, in map_read_pair
    if af != None and af.optional_field("NH") > 1:
  File "_HTSeq.pyx", line 1399, in HTSeq._HTSeq.SAM_Alignment.optional_field (src/_HTSeq.c:26483)
KeyError: 'SAM optional field tag NH not found'
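The traceback shows `optional_field("NH")` raising `KeyError` for a record that has no NH tag. One way such a check could tolerate a missing tag is sketched below. This is not the actual dexseq_count.py code and not an official fix: `StubAlignment` is a hypothetical stand-in for `HTSeq.SAM_Alignment` that only mimics the `KeyError` seen above, and `is_multiply_aligned` is an illustrative helper name. Treating a missing tag as "not multiply aligned" is an assumption, chosen to match the HTSeq behaviour quoted from the manual.

```python
class StubAlignment:
    """Hypothetical stand-in for HTSeq.SAM_Alignment, just enough for this sketch."""

    def __init__(self, optional_fields):
        self._fields = dict(optional_fields)

    def optional_field(self, tag):
        # Mimic the behaviour in the traceback: a missing tag raises KeyError.
        if tag not in self._fields:
            raise KeyError("SAM optional field tag %s not found" % tag)
        return self._fields[tag]


def is_multiply_aligned(aln):
    """Return True only if the alignment exists and has an NH value above 1."""
    if aln is None:
        return False
    try:
        return aln.optional_field("NH") > 1
    except KeyError:
        # Assumption: a record without an NH tag is treated as not multiply aligned.
        return False
```

With this pattern, `is_multiply_aligned(StubAlignment({"RG": "H13P7.1"}))` returns False instead of raising, which is the case the unmapped records in the SAM excerpt would hit.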