  • samtools sort

    Hi,

    I just installed the latest samtools (0.1.19-44428cd), and now I have an issue with my SAM->BAM->BAM_Sorted pipeline using a Linux pipe. In samtools version 0.1.18 (r982:295) the following always worked well:
    Code:
    samtools view -bS -1 temp.sam | samtools sort - temp_sorted
    But with the new version I always get the following error:
    Code:
    [bam_header_read] EOF marker is absent. The input is probably truncated
    I also ran the pipeline with version 0.1.18 to check whether the resulting sorted BAM files are the same (regardless of the error message). The Linux diff command says they are not. So my first question: is the error message a problem?

    After some testing I realized that there is even a difference (also with version 0.1.18) between a sorted BAM built with the pipe (as in the command above) and one built without the pipe via:
    Code:
    samtools sort temp.bam temp_sorted
    So my second question: does anyone know what the difference is, and can it be problematic too?

    Sorry, this part was wrong; I made a stupid mistake. Sorting through the pipe and sorting directly give the same result!

    Since the error is not reported when sorting without the pipe, and the resulting file is identical to the one from the pipeline, the error message in version 0.1.19 is negligible. The only remaining question is the difference between the 0.1.18-sorted file and the 0.1.19-sorted file:
    Code:
    diff <(samtools view temp_sorted_pipe.bam) <(samtools view temp_sorted_old_pipe.bam) | head -20
    466a467
    > DJG6PNM1:223:D1GB7ACXX:2:1101:19332:59581     16      gi|555853|gb|U13369.1|HSU13369  3657    255     19M     *       0       0       TACCTGGTTGATCCTGCCA     HHIIIIHHHHHFFFFFCCC   XA:i:0  MD:Z:19 NM:i:0
    474,475d474
    < DJG6PNM1:223:D1GB7ACXX:2:1101:19332:59581     16      gi|555853|gb|U13369.1|HSU13369  3657    255     19M     *       0       0       TACCTGGTTGATCCTGCCA     HHIIIIHHHHHFFFFFCCC   XA:i:0  MD:Z:19 NM:i:0
    < DJG6PNM1:223:D1GB7ACXX:2:1101:14750:15107     0       gi|555853|gb|U13369.1|HSU13369  3660    255     16M     *       0       0       CTGGTTGATCCTGCCA        BCCFDFFFHHHHGIII      XA:i:0  MD:Z:16 NM:i:0
    477c476
    < DJG6PNM1:223:D1GB7ACXX:2:1101:15030:64473     0       gi|555853|gb|U13369.1|HSU13369  3661    255     26M     *       0       0       TGGTTGATCCTGCCAGTAGCATATGC      4114=?BDHHHGHIIIIIIIIIEIHI    XA:i:0  MD:Z:26 NM:i:0
    ---
    > DJG6PNM1:223:D1GB7ACXX:2:1101:14750:15107     0       gi|555853|gb|U13369.1|HSU13369  3660    255     16M     *       0       0       CTGGTTGATCCTGCCA        BCCFDFFFHHHHGIII      XA:i:0  MD:Z:16 NM:i:0
    478a478
    > DJG6PNM1:223:D1GB7ACXX:2:1101:15030:64473     0       gi|555853|gb|U13369.1|HSU13369  3661    255     26M     *       0       0       TGGTTGATCCTGCCAGTAGCATATGC      4114=?BDHHHGHIIIIIIIIIEIHI    XA:i:0  MD:Z:26 NM:i:0
    492d491
    < DJG6PNM1:223:D1GB7ACXX:2:1101:5749:82660      0       gi|555853|gb|U13369.1|HSU13369  3669    255     29M     *       0       0       CCTGCCAGTAGCATATGCTTGTCTCAAAG   CCCFFFFFHHHHHIIIIIIIIIIIIIIII XA:i:0  MD:Z:29 NM:i:0
    495,497c494
    < DJG6PNM1:223:D1GB7ACXX:2:1101:17420:15616     0       gi|555853|gb|U13369.1|HSU13369  3670    255     23M     *       0       0       CTGCCAGTAGCATATGCTTGTCT CCCFFFFFHHHHHIIIIIIIIII       XA:i:0  MD:Z:23 NM:i:0
    < DJG6PNM1:223:D1GB7ACXX:2:1101:6026:70596      0       gi|555853|gb|U13369.1|HSU13369  3670    255     23M     *       0       0       CTGCCAGTAGCATATGCTTGTCT CCCFFFFFHHHHHIIIIIIIIII       XA:i:0  MD:Z:23 NM:i:0
    < DJG6PNM1:223:D1GB7ACXX:2:1102:15933:7414      0       gi|555853|gb|U13369.1|HSU13369  3670    255     22M     *       0       0       CTGCCAGTAGCATATGCTTGTC  BCCFFFFFHHHHHIIIIIIIII        XA:i:0  MD:Z:22 NM:i:0
    ---
    > DJG6PNM1:223:D1GB7ACXX:2:1101:5749:82660      0       gi|555853|gb|U13369.1|HSU13369  3669    255     29M     *       0       0       CCTGCCAGTAGCATATGCTTGTCTCAAAG   CCCFFFFFHHHHHIIIIIIIIIIIIIIII XA:i:0  MD:Z:29 NM:i:0
    498a496
    temp_sorted_old_pipe.bam was built with the old samtools version (0.1.18).
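In the diff above, the same read (e.g. DJG6PNM1:223:D1GB7ACXX:2:1101:19332:59581) shows up once as an added line (>) and once as a deleted line (<), only at a different row, which suggests both files hold the same records in a different order. A minimal Python sketch to confirm that for the whole file, assuming each BAM has first been dumped to text with `samtools view`; `compare_dumps` is a hypothetical helper, not part of samtools:

```python
from collections import Counter
from typing import Iterable, Tuple


def compare_dumps(new_lines: Iterable[str], old_lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Compare two `samtools view` text dumps as multisets of lines.

    Returns (only_in_new, only_in_old). Both are empty when the two files
    contain exactly the same records and differ only in their order.
    """
    new = Counter(line.rstrip("\n") for line in new_lines)
    old = Counter(line.rstrip("\n") for line in old_lines)
    # Counter subtraction keeps only positive counts
    return new - old, old - new


# Hypothetical usage with text dumps made by `samtools view file.bam > file.sam`:
# with open("temp_sorted_pipe.sam") as a, open("temp_sorted_old_pipe.sam") as b:
#     only_new, only_old = compare_dumps(a, b)
```

If both returned counters are empty, the two sorted files differ only in record order, not in content.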
    Thank you very much
    Last edited by hanshart; 06-29-2013, 12:57 PM.

  • #2
    I've always assumed this is not a real issue, but maybe it is. You said you noticed a difference between the two methods of sorting. What difference did you notice? That is, were the output files different, or was the only difference whether or not you got that error message?



    • #3
      Originally posted by Heisman
      I've always assumed this is not a real issue but maybe it is. You said you noticed a difference between the two methods of sorting. What difference did you notice? Meaning, were the output files different or was the only difference whether or not you got that error message?
      Thank you for your answer, Heisman.
      Actually, I was wrong: there is no difference between the two ways of sorting (with or without the pipe). Sorry for the confusion; I have edited my first post.

      The difference between the two versions, however, is real. I attached the first part of the Linux diff output, but I'm not sure it is really helpful. So, in what way has the sorting changed? Could it cause any problems?
      Thanks again



      • #4
        Originally posted by hanshart
        I also ran the pipeline with version 0.1.18 to check whether the resulting sorted bam files are the same (regardless of the error message). Linux diff command said no. So my first question: Is the error message problematic?
        This is a known bug in samtools 0.1.19; see the issue originally titled "samtools sort from stdin shouldn't check BAM EOF":
        Consider this simplified example where I want to sort a BAM file supplied on stdin, $ cat test.bam | samtools sort - test_sorted [...


        In this case the warning is probably harmless, but in general it can be a sign of a truncated file.
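For context on what the warning checks: a complete BAM file ends with a fixed 28-byte empty BGZF block, the EOF marker defined in the SAM/BAM specification. Checking for it means reading the last 28 bytes, which works for a file on disk but not for a pipe, because stdin cannot be seeked to its end before it has been consumed. The Python sketch below illustrates such a check; `has_bgzf_eof` is a hypothetical helper, not samtools code, and it returns None ("cannot tell") for non-seekable streams instead of warning.

```python
import io
import os
from typing import BinaryIO, Optional

# Empty BGZF block used as the BAM end-of-file marker (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(fh: BinaryIO) -> Optional[bool]:
    """Return True/False depending on whether the stream ends with the
    BGZF EOF marker, or None if the stream cannot be seeked (e.g. stdin)."""
    if not fh.seekable():
        return None  # checking would require consuming the whole pipe first
    start = fh.tell()
    size = fh.seek(0, os.SEEK_END)
    try:
        if size < len(BGZF_EOF):
            return False
        fh.seek(size - len(BGZF_EOF))
        return fh.read(len(BGZF_EOF)) == BGZF_EOF
    finally:
        fh.seek(start)  # leave the stream where we found it


# In-memory streams as stand-ins for real files:
print(has_bgzf_eof(io.BytesIO(b"...data..." + BGZF_EOF)))  # True
print(has_bgzf_eof(io.BytesIO(b"truncated")))  # False
```

On a regular file opened with `open(path, "rb")` this gives a real answer; on a pipe it can only say "unknown", which is why a missing-EOF warning for stdin input says little about the data itself.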



        • #5
          Originally posted by maubp
          This is a known bug in samtools 0.1.19, [...]
          Thank you, maubp.
          About the different sort order in version 0.1.19 compared with version 0.1.18:
          I'm quite sure that in version 0.1.19, reads starting at the same position are now sorted by strand (first forward, then reverse strand), whereas in version 0.1.18 they were not sorted by strand:

          Code:
           diff <(samtools view temp_sorted_pipe.bam | cut -f2,4) <(samtools view temp_sorted_old_pipe.bam | cut -f2,4) -y | less -S
          ...
          0       3709                   0       3709
          0       3709                 <
          16      3709                   16      3709
          16      3709                   16      3709
          16      3709                   16      3709
          16      3709                   16      3709
                                       > 0       3709
                                       > 16      3710
          0       3710                   0       3710
          0       3710                   0       3710
          0       3710                   0       3710
          16      3710                 | 16      3711
          0       3711                   0       3711
          0       3711                   0       3711
          16      3711                   16      3711
          16      3711                 <
          0       3712                   0       3712
          0       3713                   0       3713
                                       > 16      3713
                                       > 16      3713
          0       3713                   0       3713
          0       3713                   0       3713
          0       3713                   0       3713
          16      3713                 <
          16      3713                 <
          0       3714                 <
          0       3714                   0       3714
          0       3714                   0       3714
          ...
          On the left (version 0.1.19) the reads are sorted by position and strand, whereas on the right (version 0.1.18) they are only sorted by position.
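If this reading is right, the change amounts to adding the strand as a tie-breaker to the coordinate sort key. The Python sketch below contrasts the two keys on a few reads modelled on the `cut -f2,4` output above (FLAG, POS). It is a simplified illustration, not samtools' actual comparator: it assumes a single reference sequence and looks only at the 0x10 (reverse strand) FLAG bit; `by_position` and `by_position_then_strand` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple

REVERSE_STRAND = 0x10  # SAM FLAG bit: read aligned to the reverse strand


class Read(NamedTuple):
    name: str
    flag: int
    pos: int  # 1-based leftmost position (SAM column 4)


def by_position(r: Read) -> int:
    # Position only: reads with equal POS keep their incoming order
    # (Python's sort is stable)
    return r.pos


def by_position_then_strand(r: Read) -> Tuple[int, int]:
    # Position first, then forward (0) before reverse (1)
    return (r.pos, 1 if r.flag & REVERSE_STRAND else 0)


def summary(reads: List[Read]) -> List[Tuple[int, int]]:
    # Same two columns as `cut -f2,4`: (FLAG, POS)
    return [(r.flag, r.pos) for r in reads]


reads = [
    Read("r1", 16, 3709), Read("r2", 0, 3709),
    Read("r3", 0, 3710), Read("r4", 16, 3710), Read("r5", 0, 3710),
]
print(summary(sorted(reads, key=by_position)))
# [(16, 3709), (0, 3709), (0, 3710), (16, 3710), (0, 3710)]
print(summary(sorted(reads, key=by_position_then_strand)))
# [(0, 3709), (16, 3709), (0, 3710), (0, 3710), (16, 3710)]
```

Both results are ordered by position; they differ only in how reads with the same POS are arranged, which matches the diff showing the same reads shuffled within a position rather than reads added or lost.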

          Am I right?
          Thanks
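To check the claim on the whole file rather than on the first lines of the diff, one could scan the FLAG, RNAME and POS columns (e.g. `samtools view temp_sorted_pipe.bam | cut -f2-4`) and look for a forward-strand read that follows a reverse-strand read at the same locus. A minimal Python sketch, assuming that tab-separated three-column input; `first_strand_violation` is a hypothetical helper:

```python
from typing import Iterable, Optional, Tuple

REVERSE_STRAND = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def first_strand_violation(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based line number of the first forward-strand read that
    follows a reverse-strand read at the same RNAME and POS, or None.

    Each line is expected to hold FLAG, RNAME and POS separated by tabs.
    """
    prev_locus: Optional[Tuple[str, int]] = None
    seen_reverse = False
    for n, line in enumerate(lines, start=1):
        flag_s, rname, pos_s = line.rstrip("\n").split("\t")[:3]
        locus = (rname, int(pos_s))
        if locus != prev_locus:
            # New locus: reset the strand state
            prev_locus, seen_reverse = locus, False
        if int(flag_s) & REVERSE_STRAND:
            seen_reverse = True
        elif seen_reverse:
            return n  # forward read after a reverse read at the same locus
    return None
```

Called on `sys.stdin` in a small script, it should return None for a file that is ordered forward-before-reverse at every position. Judging by the right-hand column of the diff above (0, 16, 16, 16, 16, 0 at position 3709), the 0.1.18 file would be expected to report a violation.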
