SEQanswers

02-19-2014, 01:57 AM   #1
Parashar
Junior Member
 
Location: India

Join Date: Feb 2014
Posts: 5
Sorted BAM files have increased size!!

BACKGROUND:
I have RNA-Seq data from an Illumina platform.
After QC, I aligned the reads against the hg19 Bowtie index using TopHat.
Now I want to get the raw count of reads mapped to each gene using htseq-count.

WHAT I DID:
However, HTSeq reports an error saying that it is unable to find the mate of a read, and asks whether the SAM file is properly sorted.
Hence, after reading posts on the problem, I sorted my BAM file using samtools:
samtools sort -n accepted_hits.bam accepted_hits_bam_sorted

CONCERN:
Now the sorted BAM file has increased in size, and I wish to know why this has happened.
It may not be a problem, but it seems highly unintuitive.
02-19-2014, 02:26 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

That's not surprising. Remember that BAM files are compressed, with the compression algorithm being more efficient the more similar neighbouring reads are (i.e., coordinate-sorted files should compress to a greater extent).
02-19-2014, 02:37 AM   #3
Parashar
Junior Member
 
Location: India

Join Date: Feb 2014
Posts: 5

Thanks, dpryan.
I just wanted to confirm that nothing had gone horribly wrong.
I'll now follow your earlier reply on a different post to run HTSeq.
02-19-2014, 11:16 PM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

You sorted by read name (-n). The default sort, by mapping position, puts similar reads next to each other, so their sequence data compresses well. That is why coordinate-sorted BAM files are usually smaller than name-sorted ones.
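
A minimal sketch of this effect, under stated assumptions: the reads are simulated from a random reference, only the sequences are compressed (no names or qualities), and plain zlib stands in for the BGZF block compression that BAM actually uses. The helper names `simulate_reads` and `compressed_size` are hypothetical, and the exact byte counts will not match a real BAM file.

```python
import random
import zlib


def simulate_reads(ref_len=200_000, n_reads=20_000, read_len=50, seed=0):
    """Return (name, position, sequence) tuples sampled from a random reference."""
    rng = random.Random(seed)
    reference = "".join(rng.choices("ACGT", k=ref_len))
    reads = []
    for i in range(n_reads):
        # Names follow generation order, so they are unrelated to position.
        pos = rng.randrange(ref_len - read_len + 1)
        reads.append((f"read{i:06d}", pos, reference[pos:pos + read_len]))
    return reads


def compressed_size(reads):
    """Size in bytes of the zlib-compressed read sequences, in the given order."""
    payload = "\n".join(seq for _, _, seq in reads).encode("ascii")
    return len(zlib.compress(payload, 6))


reads = simulate_reads()
by_name = sorted(reads, key=lambda r: r[0])        # analogous to `samtools sort -n`
by_coordinate = sorted(reads, key=lambda r: r[1])  # analogous to the default sort

name_size = compressed_size(by_name)
coord_size = compressed_size(by_coordinate)
print(f"name-sorted:       {name_size} bytes")
print(f"coordinate-sorted: {coord_size} bytes")
```

In coordinate order, overlapping reads sit next to each other, so the compressor can encode most of each read as a back-reference to the previous one. In name order, overlapping reads are usually too far apart to fall within the compressor's window, so the same data compresses less.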

Tags
bam, htseq, rna-seq, samtools, tophat
