SEQanswers

03-12-2012, 08:36 AM   #1
vinay052003
Member
 
Location: Atlanta, US

Join Date: Jan 2010
Posts: 59
samtools sorting outfile is not as large as input file

I tried to sort a BAM file of paired-end genomic data using samtools sort. The BAM file is about 85 GB. I sorted it by read name instead of by chromosome coordinate. The output file is about 79 GB. Where did the other 6 GB of data from the input file go? Has anyone seen this kind of discrepancy before?

Thanks.
03-12-2012, 09:15 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

That's the magic of sorting a .bam: it often comes out smaller, because the reordered records compress better.

If you run samtools flagstat on the .bam before and after sorting, you should see that they have the same number of reads.
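
To illustrate why record order alone can change file size, here is a minimal Python sketch. It does not read BAM at all: it uses `zlib` (BAM's BGZF blocks are deflate-compressed) on made-up text records, and the record layout, read names, and counts are assumptions chosen only for the demonstration.

```python
import random
import zlib

# Toy records (hypothetical layout): a read name plus a short random sequence.
rng = random.Random(0)
records = [
    f"read{i:07d}\t{''.join(rng.choice('ACGT') for _ in range(20))}\n"
    for i in range(20000)
]

shuffled = records[:]
rng.shuffle(shuffled)        # stands in for the unsorted input file
by_name = sorted(shuffled)   # stands in for the name-sorted output file

# Same records in both cases; only the order differs.
size_shuffled = len(zlib.compress("".join(shuffled).encode()))
size_sorted = len(zlib.compress("".join(by_name).encode()))

print("records:", len(shuffled), "vs", len(by_name))
print("compressed bytes, shuffled order:", size_shuffled)
print("compressed bytes, sorted order:  ", size_sorted)
```

Both orderings hold exactly the same records, yet the compressed sizes differ, because neighbouring records that resemble each other give the compressor more to reuse. How large the effect is on a real BAM depends on the data and the sort order.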
03-12-2012, 09:53 AM   #3
adaptivegenome
Super Moderator
 
Location: US

Join Date: Nov 2009
Posts: 437

Definitely make sure you have the right number of reads and that the sort did not terminate prematurely.
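
One quick way to look for a truncated output is to check for the end-of-file marker, a fixed 28-byte empty BGZF block that the SAM/BAM specification defines. The sketch below is only a rough check: the function name `has_bgzf_eof` is made up for this example, very old samtools versions may not write the marker, and a present marker does not prove that every read made it into the file, so comparing read counts is still worthwhile. Newer samtools releases ship a `samtools quickcheck` subcommand that performs a similar test.

```python
import os

# 28-byte empty BGZF block used as the EOF marker (from the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

For example, `has_bgzf_eof("sorted.bam")` (a hypothetical file name) returning False would suggest the sort was interrupted before the file was closed.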
03-12-2012, 09:54 AM   #4
vinay052003
Member
 
Location: Atlanta, US

Join Date: Jan 2010
Posts: 59

Thanks a lot. How would I know if the sorted file is complete?
03-12-2012, 10:03 AM   #5
jflowers
Member
 
Location: New York, NY

Join Date: Oct 2011
Posts: 41

Use samtools flagstat, or use samtools view -c to count the reads.
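
If you want to script the comparison, something along these lines should work. It is a sketch that assumes `samtools` is on your PATH; the helper names (`parse_count`, `count_reads`, `same_read_count`) and the file names in the usage note are made up for illustration.

```python
import subprocess


def parse_count(stdout):
    """Turn the single-number output of `samtools view -c` into an int."""
    return int(stdout.strip())


def count_reads(bam_path):
    """Count the records in a BAM file by running `samtools view -c`."""
    result = subprocess.run(
        ["samtools", "view", "-c", bam_path],
        check=True,            # raise if samtools exits with an error
        capture_output=True,
        text=True,
    )
    return parse_count(result.stdout)


def same_read_count(before_bam, after_bam):
    """Print both counts and return True if they match."""
    before = count_reads(before_bam)
    after = count_reads(after_bam)
    print(f"{before_bam}: {before}")
    print(f"{after_bam}: {after}")
    return before == after
```

Calling `same_read_count("input.bam", "sorted.bam")` runs samtools once per file. Matching counts mean no records were lost in the sort, even though the files differ in size.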
Tags
samtools sort bam size

All times are GMT -8.