Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DEXSeq error: 'SAM optional field tag NH not found'

    Hi all (and Simon & Alejandro, hopefully),

    In my Bam files, optional NH tag was given to only mapped reads (NH:i:n; n>=1). There's no NH:i:0. Here is a part of the sam (PE; sorted by read name):

    H13P7ADXX130822:1:1101:1122:27935 77 * 0 0 * * 0 0 CGGGNGGATGCCTTCTTTATCTTGGATCTTGGCNTTCACATTTTCGATGGTGTCACTGGGCTCCACCTCCAGAGTG CCCF#2ADHHHH
    HJJJJJJJJJJJJIJJJJJJJ#1CFHIIJJJJJJJJJJGHHIJJHGEGGGIGHIIIEHHGDB7? PG:Z:MarkDuplicates.1E RG:Z:H13P7.1
    H13P7ADXX130822:1:1101:1122:27935 141 * 0 0 * * 0 0 NNNCNCCATCGNAAATGTGAAGGCCAAGATCCAGGATAAAGAAGGCATCCCTCCCGACCAGCAGAGGCTCATCTTT ###2#2:?@@@#
    4@@@@@@@@@@@@@@@@????8?????????????????1>?###################### PG:Z:MarkDuplicates.1E RG:Z:H13P7.1
    H13P7ADXX130822:1:1101:1124:37458 1609 3 40500185 255 43M2670N33M = 40500185 0 AGTGNATGCAGCTCACTGATTTCATCCTCAAGTTTCCGCACAGTGCCCACCAGAAGTATGTCCGACAA
    GCCTGGCA <<<@#2@((=?>7=<<6::?=9@=>>@<9><98)9@@6???8?33;>?>9;7::>6.6>?6)8=?8?8??;=0=;7 PG:Z:MarkDuplicates.1E RG:Z:H13P7.1 NH:i:1 NM:i:1 UQ:i:2 XS:A:+


    I think DEXSeq-count.py doesn't like this format showing an error message below:


    Traceback (most recent call last):
    File "/home/kkim/softwares/R_packages/DEXSeq/python_scripts/dexseq_count.py", line 225, in <module>
    rs = map_read_pair( af, ar )
    File "/home/kkim/softwares/R_packages/DEXSeq/python_scripts/dexseq_count.py", line 134, in map_read_pair
    if af != None and af.optional_field("NH") > 1:
    File "_HTSeq.pyx", line 1399, in HTSeq._HTSeq.SAM_Alignment.optional_field (src/_HTSeq.c:26483)
    KeyError: 'SAM optional field tag NH not found'
    Actually, this bam file has worked well with HTSeq-count without an error. The manual of HTSeq says ......If the aligner does not set this field, multiply aligned reads will be counted multiple times. ... I think HTSeq can handle my format.

    Do you have any suggestion for avoiding this error message by dexseq-count? What if I grep the sam file by "NH" before running dexseq-count?? There was no error but I cannot figure out how does it affect to my result.

    Thanks in advance for your reply.


    Kwangwoo
    Last edited by Kwangwoo; 10-31-2013, 11:44 AM.

  • #2
    Hi Kwangwoo,

    Thanks for your post. This is a recent change introduced to the python script to make sure to only consider uniquely mapped reads.

    Previously, the scripts were checking for a good alignment quality (e.g. more than 10) in the sam file, however we noticed that some aligners were giving very good alignment qualities to multiple mapping reads. If a read mapping multiple times should have a good quality score is debatable.

    Therefore we decided to introduce this change and instead check for the NH flag, and a read will only be considered if its value is 1... but I did not know it was an optional flag? What aligner are you using?

    A quick fix for you would be just to prefilter your reads for those with a unique alignment, e.g. NH:i:1. But I would first make sure that all the uniquely aligned reads have the NH flag!

    Alejandro

    Comment


    • #3
      NH not found

      Same error for me. We are using Soapsplice and as Alejandro was saying, nothing happened with the old dexseq_count.py.
      Jose



      Originally posted by areyes View Post
      Hi Kwangwoo,

      Thanks for your post. This is a recent change introduced to the python script to make sure to only consider uniquely mapped reads.

      Previously, the scripts were checking for a good alignment quality (e.g. more than 10) in the sam file, however we noticed that some aligners were giving very good alignment qualities to multiple mapping reads. If a read mapping multiple times should have a good quality score is debatable.

      Therefore we decided to introduce this change and instead check for the NH flag, and a read will only be considered if its value is 1... but I did not know it was an optional flag? What aligner are you using?

      A quick fix for you would be just to prefilter your reads for those with a unique alignment, e.g. NH:i:1. But I would first make sure that all the uniquely aligned reads have the NH flag!

      Alejandro

      Comment


      • #4
        where can i find the old python script?

        Comment


        • #5
          FYI: I run a custom program (getnh) to slap the NH tag on ...
          samtools view original.bam | ./getnh | python dexseq_count.py -f sam --paired=yes hg19.gff - outputcounts.txt 2> errfile

          This takes the bam file with missing nh tags and outputs as a sam file with new NH tags which gets piped into python dexseq_count.py

          /*
          how to compile : gcc -Wall -O2 -o getnh getnh.c
          */


          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          int main()
          {
          char *z;
          char s[5000];

          s[0] = (char)0;
          while (gets(s))
          {
          z = strstr(s,"NH:i");
          if (z)
          {
          printf("%s\n",s);
          s[0] = (char)0;
          continue;
          }
          printf("%s\tNH:i:1\n",s);
          s[0] = (char)0;
          }
          return 0;
          }

          Comment


          • #6
            where can i find the old python script?
            From old versions of DEXSeq, for instance:

            The package is focused on finding differential exon usage using RNA-seq exon counts between samples with different experimental designs. It provides functions that allows the user to make the necessary statistical tests based on a model that uses the negative binomial distribution to estimate the variance between biological replicates and generalized linear models for testing. The package also provides functions for the visualization and exploration of the results.


            But I would rather add the NH flag, rather than using old, not-mantained software!

            Comment


            • #7
              I aligned with bowtie2, and according to what I read it does not set this flag.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM
              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Today, 08:47 AM
              0 responses
              9 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              60 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              57 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 09:21 AM
              0 responses
              53 views
              0 likes
              Last Post seqadmin  
              Working...
              X