Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools mpileup segfault with positions list

    Hello,

    I am using samtools v 1.2 on a machine with Ubuntu 14.04.2 and I am trying
    to generate a SNP table from sorted bam files (generated from a GBS library
    prep) aligned to a reference. So far I have used bwa for the alignment,
    samtools view to convert sam to bam, samtools sort, and samtools index.
    Those are all working fine, but when I use samtools mpileup, I get a
    segmentation fault. I think it is because I am trying to pass mpileup a
    list of positions to include from the reference with the -l flag (I don't
    get a segfault if I leave out the list of positions):

    samtools mpileup -g -b ~/Data/alignments/bamfilenames.txt -f
    /Data/alignments/reference.fa -l ~/Data/alignments/positions_to_include.txt
    > fish_snps.bcf

    This returns "segmentation fault (core dumped)." It also segaults if I just
    pass it a single bam file, rather than a list of bam files. How does the
    list of positions file need to be formatted? Is this what could be causing
    the segfault?

    Thanks for any help,

    Sierra

  • #2
    What does your regions file look like?

    Comment


    • #3
      It's just a list of the chromosome names in one column and the position along the chromosome in the other, like so ...

      chrUn 1
      chrUn 2
      chrUn 3
      chrUn 4
      ...
      chr_Sex 353216
      chr_Sex 353217
      chr_Sex 353218
      chr_Sex 353219
      chr_Sex 353220
      chr_Sex 353221
      chr_Sex 353222

      I'm trying to exclude SNPs called in regions of my reference genome that are potential paralogs, which I've identified by read depth of the original short reads used to build the reference.

      Comment


      • #4
        I'm not sure what's going on.

        Looks like it needs a "bed file".


        Code says this for option 'l' ("el", not one "1") ... at line 846 in "bam_plcmd.c"


        case 'l':

        // In the original version the whole BAM was streamed which is inefficient
        // with few BED intervals and big BAMs. Todo: devise a heuristic to determine
        // best strategy, that is streaming or jumping.
        mplp.bed = bed_read(optarg);
        if (!mplp.bed) { print_error_errno("Could not read file \"%s\"", optarg); return 1; }
        break;

        _________

        If you know C or gdb, you can always debug using "printfs" or compile with -g (debug on) and step through using gdb ( linux debugger) .... a little advanced for most but very easy for linux wizards.
        Last edited by Richard Finney; 05-07-2015, 01:10 PM.

        Comment


        • #5
          Hmm, that might be it. Based on the vague man page, I thought the -l flag could take a list of positions or a BED file:

          -l FILE list of positions (chr pos) or regions (BED) [null]

          but possibly it needs to be BED formatted. I will look into that. Thanks!

          I'm very far from a linux wizard, but will enlist the help of the nice wizards down the hall to help me debug with printfs and gdb.

          Comment


          • #6
            You can find *where* it's segfaulting using printf (and an immediate fflush) very easily.

            figuring out *why* it segfaulted is another dimension.

            Comment


            • #7
              I realize that stackoverflow and other venues might be a more appropriate venue for this question, but what's the usage for printf() and fflush()? Right before my call to samtools? Inexperienced linux user here...

              Comment


              • #8
                samtools is written in C.
                The main author of samtools, Heng Li, provides the source which is *very* easy to compile.
                You can edit this code and type "make" and , if you didn't make a syntax error,
                have a new samtools executable with your modifications.

                the printf function is a convenient way to output strings.
                So, you can do things like ....

                printf("hey, i'm here before call to bed() function\n'); fflush(stdout);

                Output is not always done immediately, so the fflush forces the issue; before the next
                statement is executed.

                If you see your custom output, then the segfault, you know it happened *after* your printf call. If you had another printf expected to execute later, but you did not see it, then
                your program bombed *before* that statement.

                Strategic and refined placements or printfs can quickly tell you where the bomb blew up.
                It takes about 2 seconds to compile a modifed samtools, so this is often faster and easier thand cranking up gdb (a famous linux debugger). gdb can be used for tricky situations where more robust monitoring the states of variables is required.

                ____
                If you did ask on stackoverflow, your question would get promptly closed as "off topic" ... and you'd get ridiculed. The most frequently asked questions seem to suffer this fate over there.
                Last edited by Richard Finney; 05-08-2015, 04:50 AM.

                Comment


                • #9
                  Sweet, thanks!

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Essential Discoveries and Tools in Epitranscriptomics
                    by seqadmin




                    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                    Yesterday, 07:01 AM
                  • seqadmin
                    Current Approaches to Protein Sequencing
                    by seqadmin


                    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                    04-04-2024, 04:25 PM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 04-11-2024, 12:08 PM
                  0 responses
                  58 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 10:19 PM
                  0 responses
                  54 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 09:21 AM
                  0 responses
                  45 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-04-2024, 09:00 AM
                  0 responses
                  55 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X