SEQanswers

Old 06-29-2013, 10:09 AM   #1
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
Default samtools sort

Hi,

I just installed the latest samtools (0.1.19-44428cd) and now I have an issue with my SAM->BAM->BAM_Sorted Pipeline using the Linux pipe. In samtools version 0.1.18 (r982:295) the following always worked well:
Code:
samtools view -bS -1 temp.sam | samtools sort - temp_sorted
But with the new version I always get the following error:
Code:
[bam_header_read] EOF marker is absent. The input is probably truncated
I also ran the pipeline with version 0.1.18 to check whether the resulting sorted BAM files are the same (regardless of the error message). The Linux diff command reports that they differ. So my first question: is the error message a problem?

After some testing I realized that there is even a difference (also for version 0.1.18) between a sorted BAM that was built with the pipe (as in the command above) and one that was built without the pipe via:
Code:
samtools sort temp.bam temp_sorted
So my second question is whether anyone knows the difference and if this can be problematic too?

Sorry, this part was wrong, I made a stupid mistake. Sorting through the pipe and sorting directly give the same result!

As the error is not reported in the non-pipeline version, and the resulting file is the same as that of the pipeline version, the error message in version 0.1.19 is negligible. The only question remaining now is the difference between the 0.1.18-sorted file and the 0.1.19-sorted file:
Code:
diff <(samtools view temp_sorted_pipe.bam) <(samtools view temp_sorted_old_pipe.bam) | head -20
466a467
> DJG6PNM1:223:D1GB7ACXX:2:1101:19332:59581     16      gi|555853|gb|U13369.1|HSU13369  3657    255     19M     *       0       0       TACCTGGTTGATCCTGCCA     HHIIIIHHHHHFFFFFCCC   XA:i:0  MD:Z:19 NM:i:0
474,475d474
< DJG6PNM1:223:D1GB7ACXX:2:1101:19332:59581     16      gi|555853|gb|U13369.1|HSU13369  3657    255     19M     *       0       0       TACCTGGTTGATCCTGCCA     HHIIIIHHHHHFFFFFCCC   XA:i:0  MD:Z:19 NM:i:0
< DJG6PNM1:223:D1GB7ACXX:2:1101:14750:15107     0       gi|555853|gb|U13369.1|HSU13369  3660    255     16M     *       0       0       CTGGTTGATCCTGCCA        BCCFDFFFHHHHGIII      XA:i:0  MD:Z:16 NM:i:0
477c476
< DJG6PNM1:223:D1GB7ACXX:2:1101:15030:64473     0       gi|555853|gb|U13369.1|HSU13369  3661    255     26M     *       0       0       TGGTTGATCCTGCCAGTAGCATATGC      4114=?BDHHHGHIIIIIIIIIEIHI    XA:i:0  MD:Z:26 NM:i:0
---
> DJG6PNM1:223:D1GB7ACXX:2:1101:14750:15107     0       gi|555853|gb|U13369.1|HSU13369  3660    255     16M     *       0       0       CTGGTTGATCCTGCCA        BCCFDFFFHHHHGIII      XA:i:0  MD:Z:16 NM:i:0
478a478
> DJG6PNM1:223:D1GB7ACXX:2:1101:15030:64473     0       gi|555853|gb|U13369.1|HSU13369  3661    255     26M     *       0       0       TGGTTGATCCTGCCAGTAGCATATGC      4114=?BDHHHGHIIIIIIIIIEIHI    XA:i:0  MD:Z:26 NM:i:0
492d491
< DJG6PNM1:223:D1GB7ACXX:2:1101:5749:82660      0       gi|555853|gb|U13369.1|HSU13369  3669    255     29M     *       0       0       CCTGCCAGTAGCATATGCTTGTCTCAAAG   CCCFFFFFHHHHHIIIIIIIIIIIIIIII XA:i:0  MD:Z:29 NM:i:0
495,497c494
< DJG6PNM1:223:D1GB7ACXX:2:1101:17420:15616     0       gi|555853|gb|U13369.1|HSU13369  3670    255     23M     *       0       0       CTGCCAGTAGCATATGCTTGTCT CCCFFFFFHHHHHIIIIIIIIII       XA:i:0  MD:Z:23 NM:i:0
< DJG6PNM1:223:D1GB7ACXX:2:1101:6026:70596      0       gi|555853|gb|U13369.1|HSU13369  3670    255     23M     *       0       0       CTGCCAGTAGCATATGCTTGTCT CCCFFFFFHHHHHIIIIIIIIII       XA:i:0  MD:Z:23 NM:i:0
< DJG6PNM1:223:D1GB7ACXX:2:1102:15933:7414      0       gi|555853|gb|U13369.1|HSU13369  3670    255     22M     *       0       0       CTGCCAGTAGCATATGCTTGTC  BCCFFFFFHHHHHIIIIIIIII        XA:i:0  MD:Z:22 NM:i:0
---
> DJG6PNM1:223:D1GB7ACXX:2:1101:5749:82660      0       gi|555853|gb|U13369.1|HSU13369  3669    255     29M     *       0       0       CCTGCCAGTAGCATATGCTTGTCTCAAAG   CCCFFFFFHHHHHIIIIIIIIIIIIIIII XA:i:0  MD:Z:29 NM:i:0
498a496
temp_sorted_old_pipe.bam was built using the old samtools version (0.1.18).
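
A plain diff like the one above cannot tell "same records in a different order" apart from "different records". The following is only a minimal sketch of an order-insensitive comparison, not something posted in the thread: the function name `same_records` and the file names `new.sam` and `old.sam` are made up for illustration, and it assumes both BAM files were first dumped to text with `samtools view`.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if two SAM text dumps contain
# exactly the same lines, ignoring the order in which they appear.
same_records() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || return 2
    # Byte-wise sort so the result does not depend on the locale.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    if cmp -s "$a_sorted" "$b_sorted"; then
        status=0    # identical once record order is ignored
    else
        status=1    # at least one record differs
    fi
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}
```

Possible usage, assuming the dumps were created with `samtools view temp_sorted_pipe.bam > new.sam` and `samtools view temp_sorted_old_pipe.bam > old.sam`: running `same_records new.sam old.sam && echo "same records"` would print the message only if the two versions differ purely in record order.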
Thank you very much

Last edited by hanshart; 06-29-2013 at 12:57 PM.
Old 06-29-2013, 10:46 AM   #2
Heisman
Senior Member
 
Location: St. Louis

Join Date: Dec 2010
Posts: 535
Default

I've always assumed this is not a real issue but maybe it is. You said you noticed a difference between the two methods of sorting. What difference did you notice? Meaning, were the output files different or was the only difference whether or not you got that error message?
Old 06-29-2013, 12:45 PM   #3
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
Default

Quote:
Originally Posted by Heisman View Post
I've always assumed this is not a real issue but maybe it is. You said you noticed a difference between the two methods of sorting. What difference did you notice? Meaning, were the output files different or was the only difference whether or not you got that error message?
Thank you for your answer Heisman,
actually I was wrong.
There is no difference in the way of sorting (either with or without the pipe). Sorry for the confusion, I edited my first post.

The difference between the two versions, however, is real. I attached the first part of the Linux "diff" output, but I'm not sure whether it is really helpful. So, in what way has the sorting changed? Could it cause any problems?
Thanks again
Old 06-29-2013, 09:01 PM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by hanshart View Post
Hi,

I just installed the latest samtools (0.1.19-44428cd) and now I have an issue with my SAM->BAM->BAM_Sorted Pipeline using the Linux pipe. In samtools version 0.1.18 (r982:295) the following always worked well:
Code:
samtools view -bS -1 temp.sam | samtools sort - temp_sorted
But with the new version I always get the following error:
Code:
[bam_header_read] EOF marker is absent. The input is probably truncated
I also ran the pipeline with version 0.1.18 to check whether the resulting sorted BAM files are the same (regardless of the error message). The Linux diff command reports that they differ. So my first question: is the error message a problem?
This is a known bug in samtools 0.1.19,
https://github.com/samtools/samtools/issues/18

In this case the warning is probably harmless, but in general it can be a sign of a truncated file.
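
For a BAM file on disk (not for data arriving through a pipe), one way to see whether the marker is really there is to look at the last 28 bytes: the SAM/BAM specification defines the BGZF end-of-file marker as a fixed, empty 28-byte block. The sketch below is an assumption-laden illustration, not part of samtools; the function name `has_bgzf_eof` is made up, and it relies only on the standard `tail`, `od` and `tr` utilities.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the file ends with the
# 28-byte BGZF end-of-file block described in the SAM/BAM specification.
has_bgzf_eof() {
    # Marker bytes in hex, written with spaces for readability.
    expected=$(printf '%s' "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00" | tr -d ' ')
    # Dump the last 28 bytes of the file as hex and strip all whitespace.
    actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$actual" = "$expected" ]
}
```

For example, `has_bgzf_eof temp_sorted.bam || echo "no EOF marker"` would flag a file whose final block is missing, which is the situation the warning is meant to catch.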
Old 07-01-2013, 07:45 AM   #5
hanshart
Member
 
Location: Germany

Join Date: Nov 2011
Posts: 27
Default

Quote:
Originally Posted by maubp View Post
This is a known bug in samtools 0.1.19,
https://github.com/samtools/samtools/issues/18

In this case the warning is probably harmless, but in general it can be a sign of a truncated file.
Thank you maubp.
About the different sorting in version 0.1.19 in contrast to version 0.1.18:
I'm quite sure that in version 0.1.19 reads beginning at the same position are now sorted by strand (first forward, then reverse strand), whereas in version 0.1.18 they were not sorted by strand:

Code:
 diff <(samtools view temp_sorted_pipe.bam | cut -f2,4) <(samtools view temp_sorted_old_pipe.bam | cut -f2,4) -y | less -S
...
0       3709                   0       3709
0       3709                 <
16      3709                   16      3709
16      3709                   16      3709
16      3709                   16      3709
16      3709                   16      3709
                             > 0       3709
                             > 16      3710
0       3710                   0       3710
0       3710                   0       3710
0       3710                   0       3710
16      3710                 | 16      3711
0       3711                   0       3711
0       3711                   0       3711
16      3711                   16      3711
16      3711                 <
0       3712                   0       3712
0       3713                   0       3713
                             > 16      3713
                             > 16      3713
0       3713                   0       3713
0       3713                   0       3713
0       3713                   0       3713
16      3713                 <
16      3713                 <
0       3714                 <
0       3714                   0       3714
0       3714                   0       3714
...
On the left (version 0.1.19) the reads are sorted by position and strand, whereas on the right (version 0.1.18) they are sorted by position only.
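
One way to test this hypothesis on the whole file rather than by eye is to count the places where a reverse-strand read is immediately followed by a forward-strand read at the same reference and position. This is only a sketch under assumptions: the function name `count_strand_inversions` is made up, and the input is expected to be the tab-separated FLAG, RNAME and POS columns, for example from `samtools view file.bam | cut -f2,3,4`.

```shell
#!/bin/sh
# Hypothetical helper: read "FLAG<TAB>RNAME<TAB>POS" lines from stdin and
# print how often a reverse-strand read directly precedes a forward-strand
# read with the same reference name and position.
count_strand_inversions() {
    awk -F '\t' '
        {
            reverse = int($1 / 16) % 2   # FLAG bit 0x10: read is on the reverse strand
            key = $2 SUBSEP $3           # reference name plus position
            if (key == prev_key && prev_reverse == 1 && reverse == 0)
                n++
            prev_key = key
            prev_reverse = reverse
        }
        END { print n + 0 }              # prints 0 when no inversion was seen
    '
}
```

If ties are consistently ordered forward before reverse, the count should be 0; a non-zero count means at least one group of reads at the same position is not ordered that way.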

Am I right?
Thanks