SEQanswers


vinay052003 03-12-2012 08:36 AM

samtools sorting outfile is not as large as input file
 
I sorted a BAM file of paired-end genomic data with samtools sort. The input BAM is about 85 GB. I sorted by read name rather than by chromosome coordinate, and the output file is about 79 GB. Where did the other 6 GB of data go? Has anyone seen this kind of discrepancy before?

Thanks.

swbarnes2 03-12-2012 09:15 AM

That's the magic of sorting a .bam: it comes out smaller because the sorted records compress better.

If you do flagstat on the .bam before and after sorting, you'll see that they have the same number of reads.

adaptivegenome 03-12-2012 09:53 AM

Definitely make sure you have the right number of reads and that the sort did not terminate prematurely.
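One quick way to spot a sort that stopped early is to look for the end-of-file marker. A complete BAM written by samtools ends with a fixed 28-byte empty BGZF block, and newer samtools releases check for it with samtools quickcheck. The Python sketch below does the same check by hand; the function name has_bgzf_eof is made up for illustration, and a present marker only shows that the file was closed cleanly, not that every read is in it.

```python
import os

# The 28-byte empty BGZF block that marks the end of a complete BAM file,
# as given in the SAM/BAM format specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "0000000000000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling has_bgzf_eof("sorted.bam") (the file name is a placeholder) returning False would suggest the output was truncated.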

vinay052003 03-12-2012 09:54 AM

Thanks a lot. How would I know if the sorted file is complete?

jflowers 03-12-2012 10:03 AM

Use samtools flagstat, or use samtools view -c to count the reads.
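If you want to script the comparison, something like the following Python sketch would do it. It assumes samtools is on your PATH; the helper names count_reads and same_read_count and the file names in the usage note are made up for illustration. Note that samtools view -c counts alignment records, so for this check the two numbers just need to match.

```python
import subprocess


def count_reads(bam_path):
    """Return the record count printed by `samtools view -c`."""
    result = subprocess.run(
        ["samtools", "view", "-c", bam_path],
        check=True,            # raise if samtools exits with an error
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def same_read_count(before_path, after_path, counter=count_reads):
    """Return True if both BAM files hold the same number of records.

    `counter` can be swapped out, e.g. for a stub when samtools is unavailable.
    """
    return counter(before_path) == counter(after_path)
```

For example, same_read_count("input.bam", "sorted.bam") returning True means the sorted file has as many records as the input, even though it is smaller on disk.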


All times are GMT -8.
