  • DEXSeq error with paired-end data

    Hi,

    I'm trying to run the Python script that counts reads for a DEXSeq analysis,
    but I keep getting a strange error I can't understand.

    Code:
    python ~/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py -p yes -s no MmusGRCm38.DEXSeq.gff -f bam -r pos ../STARmapping_allSamples/C23/C23.STAR.sorted.bam C23.DEXSeq.paired.unstranded.txt
    and the error is:
    Code:
    Traceback (most recent call last):
      File "/home/yeroslaviz/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py", line 236, in <module>
        for a in reader( sam_file ):
      File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
        yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
      File "_HTSeq.pyx", line 1247, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:24235)
      File "csamtools.pyx", line 2308, in csamtools.AlignedRead.tags.__get__ (lib/pysam/csamtools.c:19977)
    OverflowError: unsigned byte integer is less than minimum
    The BAM files were created by the STAR aligner, with multi-mapping allowed.

    Here is a sample from one of the BAM files. I have also attached its first 1000 lines as a test file.

    Code:
    samtools view ../STARmapping_allSamples/C23/C23.STAR.sorted.bam | head
    HISEQ:244:C492NACXX:2:1312:7663:90531   99      10      3138670 255     86M15S  =       3138670 86      
    HISEQ:244:C492NACXX:2:1312:7663:90531   147     10      3138670 255     15S86M  =       3138670 -86     
    HISEQ:244:C492NACXX:2:1314:12970:38730  99      10      3140537 255     90M11S  =       3140537 90      
    HISEQ:244:C492NACXX:2:1314:12970:38730  147     10      3140537 255     11S90M  =       3140537 -90     
    HISEQ:244:C492NACXX:2:2110:4692:21500   99      10      3147218 3       101M    =       3147234 117     
    HISEQ:244:C492NACXX:2:2110:4692:21500   147     10      3147234 3       101M    =       3147218 -117    
    HISEQ:244:C492NACXX:2:2110:14708:71286  99      10      3199864 3       101M    =       3199882 119     
    HISEQ:244:C492NACXX:2:2110:14708:71286  147     10      3199882 3       101M    =       3199864 -119    
    HISEQ:244:C492NACXX:2:1316:5341:98737   419     10      3238883 0       101M    =       3238903 121     
    HISEQ:244:C492NACXX:2:1316:5341:98737   339     10      3238903 0       101M    =       3238883 -121

  • #2
    My only guess is that things like jM:B:c,-1 aren't supported by htseq-count. Maybe you can remove them with awk?
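
    As an alternative to awk, the tag removal suggested above can be sketched in Python. This is a minimal sketch, not a tested fix for this data set: it assumes the offending optional fields are STAR's `jM` and `jI` attributes, and the function name `strip_tags` is made up for illustration.

```python
import sys

# Tags to drop. Assumption: jM and jI (STAR junction attributes) are the
# only optional fields that the parser rejects in these files.
TAGS_TO_DROP = ("jM", "jI")


def strip_tags(sam_line, tags=TAGS_TO_DROP):
    """Return sam_line without the optional fields whose tag is in `tags`."""
    if sam_line.startswith("@"):
        # Header lines are passed through unchanged.
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; everything after is TAG:TYPE:VALUE.
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f.split(":", 1)[0] not in tags]
    return "\t".join(mandatory + kept) + "\n"


if __name__ == "__main__":
    # Filter SAM text from stdin to stdout.
    for line in sys.stdin:
        sys.stdout.write(strip_tags(line))
```

    Saved under a hypothetical name such as `strip_tags.py`, it could sit in a pipe like `samtools view -h in.bam | python strip_tags.py | samtools view -b -o out.bam -` (file names are placeholders, and older samtools releases may also need `-S` to read SAM input).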



    • #3
      Yes, I thought this would be a problem. I have removed the "unusual" flags and it works perfectly.

      thanks again



      • #4
        Dear all,

        I have a problem very similar to frymor's when using the dexseq_count.py script.

        However, the error I get is this:

        Code:
        Traceback (most recent call last):
          File "/home/ibis/kinga.balazs/RNA-Seq/STAR_outputs/BAM_new/sorted/dexseq_count.py", line 239, in <module>
            for a in reader( sam_file ):
          File "/home/ibis/kinga.balazs/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
            yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
          File "_HTSeq.pyx", line 1233, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:23869)
          File "pysam/calignedsegment.pyx", line 2077, in pysam.calignedsegment.AlignedSegment.qual.__get__ (pysam/calignedsegment.c:22616)
          File "pysam/cutils.pyx", line 43, in pysam.cutils.array_to_qualitystring (pysam/cutils.c:1845)
        OverflowError: unsigned byte integer is greater than maximum
        I should mention that I have 36 samples and get this error for only 4 of them, which is very strange. For the rest it works perfectly. These are not the largest files, nor were they generated differently. I also checked whether my SAM files contain any of the flags mentioned by Devon Ryan, and they do not.

        For these samples the script does start running; for one of them, the error appears only after more than 24 million reads have been processed.

        I used STAR as the aligner, and my reads are also paired-end.

        I would be very thankful for any suggestions.
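
        Since the traceback ends in the quality-string conversion, one way to narrow this down is to scan the SAM text for records whose QUAL column looks malformed. The sketch below is only a diagnostic under that assumption; it checks what the SAM format requires of QUAL (either `*`, or printable characters `!` to `~` with the same length as SEQ) and does not claim to reproduce what pysam does internally. The function name `suspect_quality` is hypothetical.

```python
import sys


def suspect_quality(sam_line):
    """Return a reason string if the QUAL field looks malformed, else None."""
    if sam_line.startswith("@"):
        return None  # header line, nothing to check
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return "fewer than 11 mandatory fields"
    seq, qual = fields[9], fields[10]
    if qual == "*":
        return None  # quality not stored; allowed by the SAM format
    if seq != "*" and len(seq) != len(qual):
        return "QUAL length differs from SEQ length"
    if any(not ("!" <= ch <= "~") for ch in qual):
        return "QUAL character outside the '!'..'~' range"
    return None


if __name__ == "__main__":
    # Report the line number and read name of every suspicious record.
    for number, line in enumerate(sys.stdin, start=1):
        reason = suspect_quality(line)
        if reason:
            name = line.split("\t", 1)[0]
            sys.stdout.write("line %d: %s: %s\n" % (number, name, reason))
```

        It reads SAM text on standard input, for example the output of `samtools view` on one of the four failing files. An empty report would suggest the problem lies in how the BAM records are decoded rather than in the quality strings themselves.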



        • #5
          What versions of pysam and Cython do you have installed?
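
          A quick way to answer this is to ask each module for its `__version__` attribute. The snippet below is a small sketch; the helper name `installed_version` is made up, and it reports `None` for anything that cannot be imported.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Fall back to "unknown" for modules that do not expose __version__.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("pysam", "Cython", "HTSeq"):
        print("%s: %s" % (name, installed_version(name) or "not installed"))
```

          Run it with the same Python interpreter that runs dexseq_count.py, since different interpreters can see different installed packages.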



          • #6
            I have pysam 0.9.1.4, no Cython and Python 2.7.5



            • #7
              My only suggestion would be to try a different pysam version. Perhaps you'll get lucky with that.



              • #8
                After trying several things (installing an older pysam version did not solve the problem either), I tried using the SAM files instead of the BAMs, and it worked. I really don't know why this happens. As mentioned before, for 32 of the 36 samples the program works on both SAM and BAM files; for the remaining 4 it works only on the SAM files. I just wanted to let you know that this was the solution I found.
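
                The workaround above can be sketched as two commands: convert the BAM to SAM with `samtools view -h`, then point dexseq_count.py at the SAM file with `-f sam`. The sketch assumes the same counting options as in the first post (`-p yes -s no -r pos`), which may not match what was actually used here; the function names and all paths are placeholders.

```python
import subprocess


def bam_to_sam_command(bam_path, sam_path):
    """Command that writes bam_path as SAM text, header included (-h)."""
    return ["samtools", "view", "-h", "-o", sam_path, bam_path]


def dexseq_count_command(script, gff_path, sam_path, out_path):
    """Command that runs dexseq_count.py on a SAM file.

    Assumption: paired-end, unstranded, position-sorted input, as in the
    first post of this thread.
    """
    return ["python", script, "-p", "yes", "-s", "no", "-f", "sam",
            "-r", "pos", gff_path, sam_path, out_path]


def convert_and_count(script, gff_path, bam_path, sam_path, out_path):
    """Run both steps in order; raises CalledProcessError if either fails."""
    subprocess.check_call(bam_to_sam_command(bam_path, sam_path))
    subprocess.check_call(
        dexseq_count_command(script, gff_path, sam_path, out_path))
```

                Calling `convert_and_count("dexseq_count.py", "annotation.gff", "sample.bam", "sample.sam", "sample.counts.txt")` would run the two steps for one sample. SAM files are much larger than BAM, so it may be worth deleting each one after counting.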
