Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools mpileup segfault with positions list

    Hello,

    I am using samtools v 1.2 on a machine with Ubuntu 14.04.2 and I am trying
    to generate a SNP table from sorted bam files (generated from a GBS library
    prep) aligned to a reference. So far I have used bwa for the alignment,
    samtools view to convert sam to bam, samtools sort, and samtools index.
    Those are all working fine, but when I use samtools mpileup, I get a
    segmentation fault. I think it is because I am trying to pass mpileup a
    list of positions to include from the reference with the -l flag (I don't
    get a segfault if I leave out the list of positions):

    samtools mpileup -g -b ~/Data/alignments/bamfilenames.txt -f
    /Data/alignments/reference.fa -l ~/Data/alignments/positions_to_include.txt
    > fish_snps.bcf

    This returns "segmentation fault (core dumped)." It also segaults if I just
    pass it a single bam file, rather than a list of bam files. How does the
    list of positions file need to be formatted? Is this what could be causing
    the segfault?

    Thanks for any help,

    Sierra

  • #2
    What does your regions file look like?

    Comment


    • #3
      It's just a list of the chromosome names in one column and the position along the chromosome in the other, like so ...

      chrUn 1
      chrUn 2
      chrUn 3
      chrUn 4
      ...
      chr_Sex 353216
      chr_Sex 353217
      chr_Sex 353218
      chr_Sex 353219
      chr_Sex 353220
      chr_Sex 353221
      chr_Sex 353222

      I'm trying to exclude SNPs called in regions of my reference genome that are potential paralogs, which I've identified by read depth of the original short reads used to build the reference.

      Comment


      • #4
        I'm not sure what's going on.

        Looks like it needs a "bed file".


        Code says this for option 'l' ("el", not one "1") ... at line 846 in "bam_plcmd.c"


        case 'l':

        // In the original version the whole BAM was streamed which is inefficient
        // with few BED intervals and big BAMs. Todo: devise a heuristic to determine
        // best strategy, that is streaming or jumping.
        mplp.bed = bed_read(optarg);
        if (!mplp.bed) { print_error_errno("Could not read file \"%s\"", optarg); return 1; }
        break;

        _________

        If you know C or gdb, you can always debug using "printfs" or compile with -g (debug on) and step through using gdb ( linux debugger) .... a little advanced for most but very easy for linux wizards.
        Last edited by Richard Finney; 05-07-2015, 01:10 PM.

        Comment


        • #5
          Hmm, that might be it. Based on the vague man page, I thought the -l flag could take a list of positions or a BED file:

          -l FILE list of positions (chr pos) or regions (BED) [null]

          but possibly it needs to be BED formatted. I will look into that. Thanks!

          I'm very far from a linux wizard, but will enlist the help of the nice wizards down the hall to help me debug with printfs and gdb.

          Comment


          • #6
            You can find *where* it's segfaulting using printf (and an immediate fflush) very easily.

            figuring out *why* it segfaulted is another dimension.

            Comment


            • #7
              I realize that stackoverflow and other venues might be a more appropriate venue for this question, but what's the usage for printf() and fflush()? Right before my call to samtools? Inexperienced linux user here...

              Comment


              • #8
                samtools is written in C.
                The main author of samtools, Heng Li, provides the source which is *very* easy to compile.
                You can edit this code and type "make" and , if you didn't make a syntax error,
                have a new samtools executable with your modifications.

                the printf function is a convenient way to output strings.
                So, you can do things like ....

                printf("hey, i'm here before call to bed() function\n'); fflush(stdout);

                Output is not always done immediately, so the fflush forces the issue; before the next
                statement is executed.

                If you see your custom output, then the segfault, you know it happened *after* your printf call. If you had another printf expected to execute later, but you did not see it, then
                your program bombed *before* that statement.

                Strategic and refined placements or printfs can quickly tell you where the bomb blew up.
                It takes about 2 seconds to compile a modifed samtools, so this is often faster and easier thand cranking up gdb (a famous linux debugger). gdb can be used for tricky situations where more robust monitoring the states of variables is required.

                ____
                If you did ask on stackoverflow, your question would get promptly closed as "off topic" ... and you'd get ridiculed. The most frequently asked questions seem to suffer this fate over there.
                Last edited by Richard Finney; 05-08-2015, 04:50 AM.

                Comment


                • #9
                  Sweet, thanks!

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM
                  • seqadmin
                    The Impact of AI in Genomic Medicine
                    by seqadmin



                    Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                    02-26-2024, 02:07 PM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 03-14-2024, 06:13 AM
                  0 responses
                  34 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-08-2024, 08:03 AM
                  0 responses
                  72 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-07-2024, 08:13 AM
                  0 responses
                  81 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-06-2024, 09:51 AM
                  0 responses
                  68 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X