  • Samtools mpileup thinks my sorted BAM files are not sorted

    Hi
    I am trying to run samtools mpileup on a large list of BAM files:
    Code:
    samtools mpileup -d 5000 -f /path/to/ref.fa \
    /path/to/first.bam \
    /path/to/second.bam \
    | gzip > output.piledup
    However, I get a log file containing a list of messages like these:
    Code:
    [bam_pileup_core] the input is not sorted (reads out of order)
    [bam_pileup_core] the input is not sorted (chromosomes out of order)
    These correspond to each of my original input BAM files, and they are followed by a further list of:
    Code:
    [bam_plp_destroy] memory leak: 2. Continue anyway.
    with one line for each input file.
    I have definitely used the samtools sort command to sort these files prior to using mpileup. However, if I use:
    Code:
    samtools view -H sorted.bam
    I still get a header line of @HD VN:1.0 SO:unsorted.
    So my questions are:
    • Are my "sorted" bam files actually sorted?
    • If they are not, how can I sort them if samtools sort doesn't seem to sort them?
    • If they are sorted, where else could the error be?

    These BAM files were initially aligned using bwa and converted from SAM to BAM with bwa.
    Thanks in advance for any help
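One way to answer the first question without trusting the @HD line is to check the records themselves. The following is a minimal sketch, not a samtools feature: check_sorted is a hypothetical shell function that reads SAM records (no header) on stdin and stops at the first record that breaks coordinate order. It assumes coordinate order means records grouped by reference name (column 3) with non-decreasing position (column 4) within each reference, and it only checks that a reference never reappears once another has started. It does not compare against the @SQ order in the header, which, as far as I can tell, is what the "chromosomes out of order" message is measured against.

```shell
# check_sorted (hypothetical helper): reads SAM records without a header on
# stdin, prints "sorted" or the first out-of-order record, and exits 1 if
# the records are not in coordinate order.
# Usage sketch: samtools view sorted.bam | check_sorted
check_sorted() {
    awk -F'\t' '
        # unplaced reads (RNAME "*") belong at the end; ignore them
        $3 == "*" { next }
        {
            if ($3 != prev_ref) {
                # a reference seen earlier must not come back later
                if ($3 in seen) {
                    printf "chromosomes out of order at line %d (%s)\n", NR, $1
                    bad = 1
                    exit
                }
                seen[$3] = 1
                prev_ref = $3
                prev_pos = 0
            }
            # within one reference, POS must never decrease
            if ($4 + 0 < prev_pos) {
                printf "reads out of order at line %d (%s)\n", NR, $1
                bad = 1
                exit
            }
            prev_pos = $4 + 0
        }
        END {
            if (!bad) print "sorted"
            exit bad
        }
    '
}
```

It stops at the first problem, so on a badly ordered file it answers quickly, but it has to read the whole file before it can print "sorted".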

  • #2
    You could try Picard's SortSam.

    HTH

    • #3
      What version of samtools do you have? Older versions of samtools never bothered to update the @HD line during 'samtools sort'.
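If the reads turn out to be in coordinate order and only the header is stale, the @HD line can be rewritten and put back with samtools reheader. Below is a minimal sketch assuming a POSIX shell and awk; fix_hd_sort_order is a hypothetical helper, and the VN:1.0 it writes when a header has no @HD line at all is an assumption copied from the header quoted in the question. As far as I can tell from the samtools pileup code, the "not sorted" messages come from comparing consecutive records rather than from reading the SO tag, so relabelling the header only keeps it honest for tools that do read SO; on its own it will not silence those messages.

```shell
# fix_hd_sort_order (hypothetical helper): reads a SAM header on stdin and
# writes it back with SO:coordinate on the @HD line. If the first line is
# not @HD, a new "@HD VN:1.0 SO:coordinate" line is emitted first.
# Usage sketch (samtools reheader writes the new BAM to stdout):
#   samtools view -H sorted.bam | fix_hd_sort_order > header.sam
#   samtools reheader header.sam sorted.bam > sorted.relabelled.bam
fix_hd_sort_order() {
    awk 'BEGIN { FS = OFS = "\t" }
        NR == 1 && $1 != "@HD" { print "@HD\tVN:1.0\tSO:coordinate" }
        $1 == "@HD" {
            has_so = 0
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { $i = "SO:coordinate"; has_so = 1 }
            # @HD without any SO tag: append one
            if (!has_so) $0 = $0 OFS "SO:coordinate"
            print
            next
        }
        { print }'
}
```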

      • #4
        Version 0.1.18.
        I found an old thread which suggested that samtools might leave the @HD line saying unsorted.
        Does mpileup depend entirely on this line saying sorted or unsorted?

        • #5
          For the benefit of closing this thread: the files sorted successfully with Picard SortSam. I have no idea why they wouldn't sort with samtools.
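For anyone who wants to stay with samtools, the usual route is to sort each file into a new BAM and then index it. A minimal sketch, not a diagnosis of what went wrong here: sort_and_index is a hypothetical helper, the .sorted.bam naming is an assumption, and it uses the samtools 1.x syntax in which -o names the output file. Under 0.1.18, the version used in this thread, samtools sort took an output prefix instead ("samtools sort in.bam in.sorted" writes in.sorted.bam), so giving it a name that already ends in .bam produces a file ending in .bam.bam, which is easy to overlook.

```shell
# sort_and_index (hypothetical helper): for each BAM given as an argument,
# writes NAME.sorted.bam beside it, indexes it, and prints the new path.
# Assumes samtools 1.x ("sort -o OUT IN"); 0.1.18 used "sort IN PREFIX".
# Usage sketch: sort_and_index /path/to/first.bam /path/to/second.bam
sort_and_index() {
    for bam in "$@"; do
        sorted="${bam%.bam}.sorted.bam"
        samtools sort -o "$sorted" "$bam" || return 1
        samtools index "$sorted" || return 1
        printf '%s\n' "$sorted"
    done
}
```

The index is only required for region queries such as mpileup -r, but building it costs little next to the sort.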

          • #6
            Hi,
            I am having a similar problem using samtools phase on sorted and indexed BAM files, and I keep getting these errors:

            [bam_pileup_core] the input is not sorted (reads out of order)
            [bam_plp_destroy] memory leak: 19. Continue anyway.

            My BAM file is sorted. I have also tried Picard SortSam, but I get the error:
            Error: Unable to access jarfile INPUT=Sample_Bbcap31_L002.bam

            Does anyone have advice on how to proceed? Thank you!
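The "Unable to access jarfile INPUT=..." message is java reporting that the argument after -jar is not a jar file it can open, which usually means the path to the Picard jar is missing from the command. A minimal sketch of a complete call, wrapped in a hypothetical helper picard_sort that checks the jar path before starting java. It assumes a newer Picard release with a single picard.jar that takes the tool name as its first argument; older releases shipped one jar per tool and were run as java -jar /path/to/SortSam.jar INPUT=in.bam OUTPUT=out.bam SORT_ORDER=coordinate.

```shell
# picard_sort (hypothetical helper): JAR IN.bam OUT.bam
# Assumes a single picard.jar that takes the tool name (SortSam) first.
# Usage sketch:
#   picard_sort /path/to/picard.jar Sample_Bbcap31_L002.bam Sample_Bbcap31_L002.sorted.bam
picard_sort() {
    jar=$1
    input=$2
    output=$3
    # fail early with a clear message instead of java's "Unable to access jarfile"
    if [ ! -f "$jar" ]; then
        echo "picard_sort: jar not found: $jar" >&2
        return 1
    fi
    java -jar "$jar" SortSam INPUT="$input" OUTPUT="$output" SORT_ORDER=coordinate
}
```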

            • #7
              I was having the same problem with mpileup not recognizing that my BAM file was sorted. I had tried sorting with samtools and with Picard tools, but neither seemed to solve the problem.

              In my case, the issue appears to be one read pair in which one read maps at the very beginning of the contig and its mate does not map. Both were on the reverse strand. As a result, these first two reads are shown with a coordinate of 0. I am not sure exactly what causes the error, but removing this pair of reads resolves it.
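The workaround described here, removing the offending pair, can be applied to the SAM text stream. A minimal sketch: drop_pos0_placed is a hypothetical helper that removes alignment records which name a reference (RNAME other than "*") but have POS 0, the pattern described above, while passing header lines through unchanged. It assumes both reads of the pair show POS 0, as in this post; a mate with a real position would be kept, and its mate fields would then point at a read that is no longer in the file.

```shell
# drop_pos0_placed (hypothetical helper): filters SAM text on stdin.
# Keeps header lines (starting with @) and every record except those with
# RNAME other than "*" and POS 0. Record order is unchanged.
# Usage sketch (0.1.18 needs -S to read SAM; 1.x detects it):
#   samtools view -h in.bam | drop_pos0_placed | samtools view -bS - > filtered.bam
drop_pos0_placed() {
    awk -F'\t' '/^@/ || !($3 != "*" && $4 == 0)'
}
```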

              Perhaps someone else has seen this as well?

              Version is 1.18
