SEQanswers

Old 04-18-2013, 04:45 PM   #1
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
samtools sorting problem

Hi everyone,

I encountered a strange problem with samtools sorting. A 37.6 GB SAM file was generated by Stampy after mapping Illumina reads to hg19. It was converted to an 11.3 GB BAM file with samtools. Then I tried to sort it, and this is where I hit the problem. I first used the default settings (1 thread, 756MB/thread):
Code:
samtools sort input.sam out_sorted.bam
After a while samtools started writing chunks named out_sorted.bam000X.bam, each new one started after the previous one reached 130-160 MB; at the 21st chunk I killed it.
Then I increased memory:
Code:
samtools sort -m 10G input.sam out_sorted.bam
This time samtools wrote only 6 chunks of 1.5-2.5 GB, then started pouring binary gibberish to stdout, and eventually hung.
When I tried to run multithreaded sorting
Code:
samtools sort -@ 8 -m 3G input.sam out_sorted.bam
the behavior was the same, except that the chunks were written in multiples of 8 (and in multiples of 16 with -@ 16); every run eventually ended with binary gibberish on stdout.
I am using version 0.1.19-44428cd; the 2x4 cpu box has 96GB of memory, RHEL5.8.
Can anyone advise what is going on and why?
Old 04-18-2013, 06:01 PM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

A 37 gig sam file?

Why haven't you converted it to .bam?

I didn't think samtools would even work on a .sam file.

Also note that if your command line worked, it would make an output file called output.bam.bam

Why did you kill the sort function? It is supposed to make all those intermediate files, then it merges them.
Old 04-18-2013, 06:25 PM   #3
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205

Quote:
Originally Posted by swbarnes2 View Post
A 37 gig sam file?

Why haven't you converted it to .bam?

I didn't think samtools would even work on a .sam file.

Also note that if your command line worked, it would make an output file called output.bam.bam

Why did you kill the sort function? It is supposed to make all those intermediate files, then it merges them.
This is what Stampy produced; it was not my choice. I then converted it with samtools, which does accept SAM input with the -S option:
Code:
samtools view -bS -@ 16 infile.sam -o outfile.bam
If you read my previous post again, I got not output.bam.bam but output.bam000X.bam, where X was incremented for each new chunk. I killed it because it did not look right; it could have produced hundreds of intermediate files. Indeed, with more allocated memory the chunks become larger and fewer, yet every time I waited, instead of a combined file I got binary gibberish on stdout. The chunk size seemed to be roughly 1/5 of the allocated memory (150-160 MB at the default 756MB, 2-2.5 GB at 10 GB, and 11 GB at 48 GB). Finally I used just the one default thread but gave it 72 GB: no intermediate file outfile_sorted.bam0000.bam was formed, yet after a while the same binary gibberish was spilled to stdout.
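A rough sketch of the arithmetic behind those chunk counts, assuming the chunks together add up to about the size of the 11.3 GB BAM file from the first post (approximated here as 11300 MB). `estimate_chunks` is a hypothetical helper for illustration, not part of samtools:

```shell
#!/bin/sh
# Hypothetical helper: estimate how many temporary chunk files a sort
# would write, given the total BAM size and the chunk size (both in MB).
estimate_chunks() {
    total_mb=$1
    chunk_mb=$2
    # Integer ceiling division: a partial last chunk still counts as a file.
    echo $(( (total_mb + chunk_mb - 1) / chunk_mb ))
}

# Chunk sizes as reported in this thread.
estimate_chunks 11300 150     # default memory, ~150 MB chunks
estimate_chunks 11300 2000    # -m 10G, ~2 GB chunks
estimate_chunks 11300 11000   # 48 GB, ~11 GB chunks
```

With chunks of about 2 GB the estimate is 6, which matches the six chunks reported for -m 10G; at the default memory setting it suggests dozens of temporary files rather than hundreds.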
Old 04-19-2013, 12:32 AM   #4
syfo
Just a member
 
Location: Southern EU

Join Date: Nov 2012
Posts: 103

What about simply trying

Code:
samtools sort input.bam out_sorted
Old 04-19-2013, 05:19 AM   #5
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205

Quote:
Originally Posted by syfo View Post
What about simply trying

Code:
samtools sort input.bam out_sorted
Well, the only difference is that instead of hundreds of out_sorted.bam000X.bam I get hundreds of 150MB out_sorted000X.bam files and then binary gibberish if I wait long enough. Not an enticing reason to try...
Old 04-19-2013, 05:50 AM   #6
sBeier
Member
 
Location: Germany

Join Date: Jan 2013
Posts: 42

The temporary files are normal. And I guess the binary gibberish could actually be your sorted output BAM file.
I usually use bamtools sort, but have you tried
Code:
samtools sort input.bam > output.bam
Old 04-19-2013, 06:07 AM   #7
syfo
Just a member
 
Location: Southern EU

Join Date: Nov 2012
Posts: 103

Quote:
Originally Posted by yaximik View Post
Well, the only difference is that instead of hundreds of out_sorted.bam000X.bam I get hundreds of 150MB out_sorted000X.bam files and then binary gibberish if I wait long enough. Not an enticing reason to try...
How do you know? Have you tried to use it on a proper bam file?
According to the manual:

Code:
Usage:   samtools sort [options] <in.bam> <out.prefix>
I do not get the point of keeping huge sam files.
Old 04-19-2013, 08:34 AM   #8
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Quote:
Originally Posted by yaximik View Post
This is what Stampy produced, not my choice. Then I converted it with samtools, it does take sam files with -S option
Code:
samtools view -bS -@ 16 infile.sam -o outfile.bam
If you read my previous post again, I got not output.bam.bam but output.bam000X.bam, where X was incremented for each new chunk.
And this is totally in line with how samtools sort works.

Quote:
I killed it because it did not look right, it could have produced hundreds of intermediate files.
Probably not hundreds, but it is supposed to produce a lot of them. And then it deletes them after it merges them together.

And again, why do you still have a .sam file?

bwa outputs a .sam file too, but I don't leave them lying around. I make them into .bams as soon as possible, piping the program that makes the .sam file directly into samtools view, where possible, and I don't keep the .sams once I have .bams.
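A minimal sketch of that streaming approach, assuming samtools 0.1.x option syntax (`-b` for BAM output, `-S` for SAM input, `-` for stdin). The function name and file names are placeholders, and the aligner command in the usage comment is only an illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: read SAM text on stdin and write BAM to the given
# path, so the large intermediate .sam file never has to be kept on disk.
sam_stream_to_bam() {
    out_bam=$1
    # -b: write BAM; -S: input is SAM (needed in 0.1.x); -: read from stdin.
    samtools view -bS - > "$out_bam"
}

# Usage (aligner command and file names are placeholders):
#   bwa mem ref.fa reads.fq | sam_stream_to_bam aln.bam
```

The resulting BAM can then be passed to samtools sort without a SAM file ever being written.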

And I don't see any documentation for samtools sort that says it takes a .sam file as input. I'm using samtools 0.1.18, and it doesn't.
Old 04-19-2013, 09:41 AM   #9
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
[SOLVED] samtools sorting problem

Quote:
Originally Posted by swbarnes2 View Post
And this is totally in line with how samtools sort works.



Probably not hundreds, but it is supposed to produce a lot of them. And then it deletes them after it merges them together.

And again, why do you still have a .sam file?

bwa outputs a .sam file too, but I don't leave them lying around. I make them into .bams as soon as possible, piping the program that makes the .sam file directly into samtools view, where possible, and I don't keep the .sams once I have .bams.

And I don't see any documentation for samtools sort that says it takes a .sam file as input. I'm using samtools 0.1.18, and it doesn't.
Samtools 0.1.19 does accept a SAM file, and I did convert it to BAM once I got the output from Stampy, which produces SAM by default.

I got an answer from the samtools-help mailing list. Darn, that was so simple! In samtools sort, the option -o means "write the final output to stdout", so I was getting exactly what I asked for. I confused this with other programs, in which the option -o outfile does exactly the opposite. syfo was absolutely correct, and I apologize for brushing the correct advice off.
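For reference, a sketch of the invocations discussed in this thread, wrapped in hypothetical shell functions. The 0.1.x forms follow the usage line quoted earlier (`samtools sort [options] <in.bam> <out.prefix>`) and the `-o` behaviour described here; the last form assumes the changed syntax of samtools 1.3 and later, where `-o` names the output file. All file names are placeholders:

```shell
#!/bin/sh
# samtools 0.1.x: the second argument is a prefix; the result is <prefix>.bam.
sort_legacy_to_file() {
    samtools sort "$1" "$2"
}

# samtools 0.1.x with -o: the sorted BAM goes to stdout, so redirect it.
# The prefix ($2) is still required by the usage line; it is assumed here
# to be used only for naming the temporary chunk files.
sort_legacy_to_stdout() {
    samtools sort -o "$1" "$2" > "$3"
}

# samtools 1.3+ (assumed version): -o takes the output file name.
sort_modern() {
    samtools sort -o "$2" "$1"
}

# Usage:
#   sort_legacy_to_file   input.bam out_sorted           # writes out_sorted.bam
#   sort_legacy_to_stdout input.bam tmp_prefix out.bam
#   sort_modern           input.bam out_sorted.bam
```

Running the 0.1.x `-o` form without a redirect sends the BAM stream to the terminal, which is the "binary gibberish" seen above.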
Old 04-22-2013, 05:22 AM   #10
syfo
Just a member
 
Location: Southern EU

Join Date: Nov 2012
Posts: 103

Quote:
Originally Posted by yaximik View Post
Samtools 0.1.19 does accept a SAM file, and I did convert it to BAM once I got the output from Stampy, which produces SAM by default.

I got an answer from the samtools-help mailing list. Darn, that was so simple! In samtools sort, the option -o means "write the final output to stdout", so I was getting exactly what I asked for. I confused this with other programs, in which the option -o outfile does exactly the opposite. syfo was absolutely correct, and I apologize for brushing the correct advice off.
No problem. Thanks for the update; that could help others.