  • DEXSeq error with paired-end data

    Hi,

    I'm trying to run the Python script that counts reads for a DEXSeq analysis,
    but I keep getting a strange error I can't understand.

    Code:
    python ~/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py -p yes -s no MmusGRCm38.DEXSeq.gff -f bam -r pos ../STARmapping_allSamples/C23/C23.STAR.sorted.bam C23.DEXSeq.paired.unstranded.txt
    and the error is:
    Code:
    Traceback (most recent call last):
      File "/home/yeroslaviz/R/x86_64-pc-linux-gnu-library/3.1/DEXSeq/python_scripts/dexseq_count.py", line 236, in <module>
        for a in reader( sam_file ):
      File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.6.1p1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
        yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
      File "_HTSeq.pyx", line 1247, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:24235)
      File "csamtools.pyx", line 2308, in csamtools.AlignedRead.tags.__get__ (lib/pysam/csamtools.c:19977)
    OverflowError: unsigned byte integer is less than minimum
    The bam files were created by the STAR aligner and multiple mapping was allowed.

    Here is a sample from one of the BAM files. I have attached the first 1000 lines of the BAM file for testing.

    Code:
    samtools view ../STARmapping_allSamples/C23/C23.STAR.sorted.bam | head
    HISEQ:244:C492NACXX:2:1312:7663:90531   99      10      3138670 255     86M15S  =       3138670 86      
    HISEQ:244:C492NACXX:2:1312:7663:90531   147     10      3138670 255     15S86M  =       3138670 -86     
    HISEQ:244:C492NACXX:2:1314:12970:38730  99      10      3140537 255     90M11S  =       3140537 90      
    HISEQ:244:C492NACXX:2:1314:12970:38730  147     10      3140537 255     11S90M  =       3140537 -90     
    HISEQ:244:C492NACXX:2:2110:4692:21500   99      10      3147218 3       101M    =       3147234 117     
    HISEQ:244:C492NACXX:2:2110:4692:21500   147     10      3147234 3       101M    =       3147218 -117    
    HISEQ:244:C492NACXX:2:2110:14708:71286  99      10      3199864 3       101M    =       3199882 119     
    HISEQ:244:C492NACXX:2:2110:14708:71286  147     10      3199882 3       101M    =       3199864 -119    
    HISEQ:244:C492NACXX:2:1316:5341:98737   419     10      3238883 0       101M    =       3238903 121     
    HISEQ:244:C492NACXX:2:1316:5341:98737   339     10      3238903 0       101M    =       3238883 -121

  • #2
    My only guess is that things like jM:B:c,-1 aren't supported by htseq-count. Maybe you can remove them with awk?
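As an alternative to awk, here is a minimal Python sketch of the same idea: drop selected optional fields from each SAM record and leave everything else alone. The function names `strip_tags` and `filter_stream` are made up for this illustration, and the default tag list is an assumption (jM is the tag mentioned above; jI is the companion STAR attribute). It is not part of HTSeq or DEXSeq.

```python
import sys


def strip_tags(sam_line, tags=("jM", "jI")):
    """Return one SAM line with the named optional tags removed.

    The default tag list is an assumption; adjust it to whatever
    fields turn out to break the downstream parser.
    """
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; the rest are TAG:TYPE:VALUE
    mandatory, optional = fields[:11], fields[11:]
    kept = [f for f in optional if f.split(":", 1)[0] not in tags]
    return "\t".join(mandatory + kept) + "\n"


def filter_stream(src, dst, tags=("jM", "jI")):
    """Copy SAM text from src to dst, stripping the given tags from each record."""
    for line in src:
        dst.write(strip_tags(line, tags))
```

To use it as a filter, call `filter_stream(sys.stdin, sys.stdout)` at the end of the script and pipe the output of `samtools view -h` through it.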



    • #3
      Yes, I thought this might be the problem. I removed the "unusual" flags and it works perfectly.

      thanks again



      • #4
        Dear all,

        I have a problem very similar to frymor's when using the dexseq_count.py script.

        However, the error I get is this:

        Code:
        Traceback (most recent call last):
          File "/home/ibis/kinga.balazs/RNA-Seq/STAR_outputs/BAM_new/sorted/dexseq_count.py", line 239, in <module>
            for a in reader( sam_file ):
          File "/home/ibis/kinga.balazs/.local/lib/python2.7/site-packages/HTSeq-0.6.1-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 946, in __iter__
            yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
          File "_HTSeq.pyx", line 1233, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:23869)
          File "pysam/calignedsegment.pyx", line 2077, in pysam.calignedsegment.AlignedSegment.qual.__get__ (pysam/calignedsegment.c:22616)
          File "pysam/cutils.pyx", line 43, in pysam.cutils.array_to_qualitystring (pysam/cutils.c:1845)
        OverflowError: unsigned byte integer is greater than maximum
        I should mention that I have 36 samples and get this error for only 4 of them, which is very strange. For the rest it works perfectly. These are not the largest files, nor were they generated differently. I also checked whether my SAM files contain any of the flags mentioned by Devon Ryan, and they do not.

        Also, for these samples the script does start running; for one of them I get this error only after more than 24 million reads have been processed.

        I used STAR as the aligner, and I also have paired-end reads.

        I would be very thankful for any suggestions.



        • #5
          What versions of pysam and Cython do you have installed?



          • #6
            I have pysam 0.9.1.4, no Cython and Python 2.7.5



            • #7
              My only suggestion would be to try a different pysam version. Perhaps you'll get lucky with that.



              • #8
                After trying several things (installing an older pysam version did not solve the problem either), I also tried using the SAM files instead of the BAMs, and it worked. I really don't know why this happens. As mentioned before, for 32 of the 36 samples the program works on both SAM and BAM files, and for the remaining 4 it works only with the SAM files. I just wanted to let you know that this was the solution I found.
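For anyone who wants to script this workaround, the sketch below builds the two commands involved: converting a BAM to SAM with `samtools view -h`, then running dexseq_count.py with `-f sam`. The helper name `sam_fallback_commands` is hypothetical. The `-p yes -s no -r pos` options are copied from the first post in this thread and are assumptions that may not match your data. File names in the usage note are placeholders.

```python
def sam_fallback_commands(bam, gff, out_counts, script="dexseq_count.py"):
    """Build (convert, count) argument lists for the BAM -> SAM workaround.

    Nothing is executed here; the lists can be handed to subprocess.
    Option values (-p yes, -s no, -r pos) are assumptions taken from the
    first post in this thread and may need adjusting.
    """
    # Derive the SAM file name from the BAM file name
    sam = bam[:-4] + ".sam" if bam.endswith(".bam") else bam + ".sam"
    # -h keeps the header; -o writes the SAM output to a file
    convert = ["samtools", "view", "-h", "-o", sam, bam]
    count = ["python", script,
             "-p", "yes", "-s", "no", "-f", "sam", "-r", "pos",
             gff, sam, out_counts]
    return convert, count
```

A possible way to run it: `convert, count = sam_fallback_commands("sample.bam", "annotation.gff", "sample.counts.txt")`, followed by `subprocess.check_call(convert)` and `subprocess.check_call(count)`. This only reproduces the workaround; it does not explain why the BAM input fails.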
