Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stampy/BAM alignment error

    Hi all,

    I've been trying to run Stampy 1.0.20 on a BAM file (produced by BWA) that has aligned some Illumina paired-end reads, but it is getting an error. I ran the following commands to build the index, build the hash, and re-align the BAM file:

    python2.6 stampy.py --species=human -G hg37 human_g1k_v37.fasta
    python2.6 stampy.py -g hg37 -H hg37
    python2.6 stampy.py -g hg37 -h hg37 --solexa --bamkeepgoodreads -M xaa.bam > xaa.stampy.sam

    The first two steps seem to succeed, but on the last step I get the following error:

    stampy: Mapping...
    stampy: Internal problem: File "/Net/fs1/home/gerton/Progs/Mapper/stampy/Stampy/reader.py", line 1243, in generator
    stampy: Internal problem: ZeroDivisionError('float division',)
    stampy: Please report this problem to <[email protected]> mentioning the version number (1.0.20 (r1642)) and command line - thanks!


    The resulting output xaa.stampy.sam file is empty. It says I can contact Gerton Lunter, but before doing so, I wanted to check if I was doing anything obviously wrong. (I would also look into the reader.py file, but I'm not sure where to find it). If anyone has any thoughts, please let me know. Thanks!

    -Henry
    Last edited by henry2304; 12-12-2012, 03:43 PM.

  • #2
    Are you sure you should be using --solexa? I could imagine wrongly calibrated quality scores inducing a ZeroDivisionError.

    Comment


    • #3
      When I tried to align the original fastq reads, I was getting an error that told me to use the --solexa option.

      stampy: Error: (FastQReader Warning: Suspiciously high Q scores (70)
      on input (line 4). Perhaps you use FASTQ files with Illumina base-64
      Q scores but forgot the --solexa option? If Q score is correct,
      adjust --maxbasequal.

      Adding it seemed allow stampy to align the reads from scratch and generate a SAM file, although when I tried to convert the SAM file generated to BAM with samtools view -bS, I got the following error:

      Parse error at line 8008031: sequence and quality are inconsistent

      So then I tried the alternate way starting with a BAM file generated with BWA. I think trying without the --solexa also seemed to give the same error, but I'll double check and try again. The odd part is that stampy runs awhile on the 5GB bwa bam file, and only crashes with the divide by zero error after about an hour. Could there be one read that has some weird quality scores that is causing it to have problems, but maybe the rest of the reads are fine? I'm not sure why no data is written to the xaa.stampy.sam file for output, even after an hour, but maybe it's some quirk of piping the output to the file?

      -Henry

      Comment


      • #4
        It could be a bad read. The samtools error means that the read and the corresponding quality score do not have the same length.

        As for the sam output, try running it with a smaller test set like the first 10,000 reads.

        Comment


        • #5
          Ah, thanks for the suggestion! I'll try it out shortly.

          Comment


          • #6
            I ran on a smaller test set, and stampy seems to work fine when aligning from scratch from smaller fastq files, and while using the --solexa option. (I think the last time I ran it, it had issues with the cluster and didn't finish running).

            However, aligning on a smaller BAM file from BWA still seems to have some issues. With the smaller BAM file, it still seems to give the divide by zero error, when I run either with the default (--sanger) or --solexa options, and while using the --bamkeepgoodreads option. However, if I get rid of the --bamkeepgoodreads option and use the --solexa option, I do not get any errors (like the divide by zero error), and a non-empty SAM file is produced; however the SAM file only has a header and does not contain any reads aligned. This is the command that produces no divide by zero error, but only a sam file with only a header.

            python2.6 stampy.py -g hg37 -h hg37 -t2 --solexa -M xaa.short.bam > xaa.short.stampy.sam

            I'm not sure if I need to need to deal at all with the --logit option, but I think Illumina uses a phred score? But maybe I'll try to run some more tests. I'll let you know if I figure out anything. Thanks for the help!

            Comment


            • #7
              At this point I think it is worth asking Gerton.

              Comment


              • #8
                Ah ok, sounds good. Actually, one last thing I realized, was that when I created my smaller bam file, I just took the original bam file and converted it to sam. Then I took the first 1000 lines of the sam file, and converted it to bam format. This may not have preserved the paired-end reads, and I realized I have an odd number of reads in my new file. Would have caused any issues? I might try to re-run with an even number of paired reads. If that doesn't end up working, I'll contact Gerton. I also ran stampy with more verbose options, and here's more on the error, although I don't know if it's any more informative:

                stampy: Traceback:
                File "/Users/hcl2112/Desktop/stampy-1.0.20/stampy.py", line 762, in <module>
                main()
                File "/Users/hcl2112/Desktop/stampy-1.0.20/stampy.py", line 730, in main
                mapreads( settings, logger, actiondict['-M'], arguments )
                File "/Users/hcl2112/Desktop/stampy-1.0.20/stampy.py", line 533, in mapreads
                for output in formatgenerator: pass
                File "/Net/fs1/home/gerton/Progs/Mapper/stampy/Stampy/formatter.py", line 115, in formatter
                File "/Net/fs1/home/gerton/Progs/Mapper/stampy/Stampy/multithread.py", line 77, in generator
                File "/Net/fs1/home/gerton/Progs/Mapper/stampy/Stampy/singleendedmapper.py", line 932, in generator
                File "/Net/fs1/home/gerton/Progs/Mapper/stampy/Stampy/reader.py", line 1243, in generator
                stampy: Internal problem: ZeroDivisionError('float division',)
                stampy: Please report this problem to <[email protected]> mentioning the version number (1.0.20 (r1642)) and command line - thanks!

                Comment


                • #9
                  It also doesn't seem to work with just two paired reads, so I've emailed Gerton for help.

                  Comment


                  • #10
                    Gerton got back to me, and it turns out that I had an issue with my BAM file, which did not properly pair the reads. Since Stampy was having trouble finding each read's pair, it was not aligning any of the reads. Then when it tries to compute some statistics with 0 reads aligned, it gets a divide by zero error. I think the latest version of Stampy should fix this issue. Thanks for your help and advice, Aaron!

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Current Approaches to Protein Sequencing
                      by seqadmin


                      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                      04-04-2024, 04:25 PM
                    • seqadmin
                      Strategies for Sequencing Challenging Samples
                      by seqadmin


                      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                      03-22-2024, 06:39 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 04-11-2024, 12:08 PM
                    0 responses
                    18 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 10:19 PM
                    0 responses
                    22 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-10-2024, 09:21 AM
                    0 responses
                    17 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 04-04-2024, 09:00 AM
                    0 responses
                    49 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X