SEQanswers

Old 01-04-2012, 03:25 PM   #1
kga1978
Senior Member
 
Location: Boston, MA

Join Date: Nov 2010
Posts: 100
Splitting a BAM based on # of reads?

Hi All,

I'm surprised I can't find anything on this topic, but I would like to take a BAM file of, say, 100 million reads and split it into multiple files containing 1 million reads each. Is there an easy way of doing that? (I looked at BamTools, samtools, GATK, and Picard, but none of them seems able to do this. These tools can only split based on RG, chromosome, etc.; I want to do it based on number of reads.)

Thanks very much in advance.
Old 01-04-2012, 03:55 PM   #2
adaptivegenome
Super Moderator
 
Location: US

Join Date: Nov 2009
Posts: 437
Default

Yes. Run "samtools view" on the BAM and pipe its output to the Unix command "split", which splits by number of lines. You will get a set of output files in SAM format, one alignment record per line. Next, run each of them back through samtools (view -b) and redirect the output to a file with a .bam extension. Now you have BAM files split by # of reads.
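
A minimal sketch of the splitting step described above, assuming samtools is on the PATH. The function names, in.bam, and the chunk_ prefix are placeholders chosen for illustration, not anything from the original post.

```shell
#!/bin/sh
# Sketch: split the alignment records of a BAM file into SAM chunks.
# Assumes samtools is installed; all file names here are placeholders.

split_sam_records() {
    # Reads header-less SAM records on stdin and writes files named
    # <prefix>aa, <prefix>ab, ... holding at most <lines> records each.
    # $1: records per chunk, $2: output prefix
    split -l "$1" - "$2"
}

bam_to_chunks() {
    # $1: input BAM, $2: records per chunk, $3: output prefix
    # "samtools view" without -h prints alignment records only (no header).
    samtools view "$1" | split_sam_records "$2" "$3"
}

# Example (not run here):
#   bam_to_chunks in.bam 1000000 chunk_
```

Note that the chunks contain no @ header lines, so each one needs a header (or a reference) supplied before it can be converted back to BAM. Also, "split -l" counts lines, that is alignment records, so a read with several alignments may end up spread over two chunks.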
Old 01-04-2012, 04:19 PM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

And reinsert the SAM header into each BAM file at the end if required using 'samtools reheader'
Old 01-04-2012, 04:32 PM   #4
adaptivegenome
Super Moderator
 
Location: US

Join Date: Nov 2009
Posts: 437
Default

Quote:
Originally Posted by maubp View Post
And reinsert the SAM header into each BAM file at the end if required using 'samtools reheader'
Yes, sorry for omitting this!
Old 01-05-2012, 05:40 AM   #5
kga1978
Senior Member
 
Location: Boston, MA

Join Date: Nov 2010
Posts: 100
Default

Ah, yes - that's elegant. Thanks very much for the response - so easy that nobody bothered making a script.
Old 01-10-2012, 03:24 AM   #6
CHRYSES
Member
 
Location: Netherlands

Join Date: Dec 2009
Posts: 13
Default

Quote:
Originally Posted by maubp View Post
And reinsert the SAM header into each BAM file at the end if required using 'samtools reheader'
I had to run the following on each split to get the header:
samtools view -Sb -T reference.fasta out.sam > out.bam

I don't think "samtools reheader" would have worked as described in this thread, as the splits themselves do not have any headers.
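
One alternative to rebuilding the header from a reference is to copy the original BAM's header onto each chunk before converting it. The sketch below assumes samtools 1.x, which detects SAM input automatically (older 0.1.x releases also need -S). The function names and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: give each header-less SAM chunk the original header, then
# convert it to BAM. Assumes samtools 1.x on PATH; names are placeholders.

sam_with_header() {
    # $1: file holding the @ header lines, $2: header-less SAM chunk.
    # Prints a complete SAM stream: header first, then the records.
    cat "$1" "$2"
}

chunks_to_bam() {
    # $1: source BAM; remaining arguments: SAM chunks to convert.
    src=$1
    shift
    hdr=$(mktemp)
    samtools view -H "$src" > "$hdr"    # -H prints the header only
    for chunk in "$@"; do
        sam_with_header "$hdr" "$chunk" | samtools view -b -o "$chunk.bam" -
    done
    rm -f "$hdr"
}

# Example (not run here):
#   chunks_to_bam in.bam chunk_??
```

Reusing the original header should also keep any @RG and @PG lines, which a header generated from the reference sequences alone would not contain.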
Old 05-02-2013, 10:55 AM   #7
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Is there any way to split by number of reads, and not lines?

Such that each read, even if it has 10 alignments, is contained in a single new BAM file, along with other reads?
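
One possible approach, sketched below, is to count distinct read names instead of lines. It assumes the records arrive grouped by read name (for example after "samtools sort -n", or straight from an aligner that writes all alignments of a read together). The function name and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: split header-less SAM records into chunks of N reads, keeping
# every alignment of a read (same QNAME, column 1) in the same chunk.
# Assumes the input is grouped by read name; names are placeholders.

split_by_read_name() {
    # stdin: header-less SAM records grouped by read name
    # $1: reads per chunk, $2: output prefix
    # Writes <prefix>0001.sam, <prefix>0002.sam, ...
    awk -v n="$1" -v prefix="$2" '
        BEGIN { FS = "\t"; reads = 0; part = 0; out = "" }
        $1 != last {
            # A new read name starts here; open a new chunk every n reads.
            if (reads % n == 0) {
                if (out != "") close(out)
                part++
                out = sprintf("%s%04d.sam", prefix, part)
            }
            reads++
            last = $1
        }
        { print > out }
    '
}

# Example (not run here):
#   samtools sort -n -o byname.bam in.bam
#   samtools view byname.bam | split_by_read_name 1000000 part_
```

Because both mates of a pair share the same read name in SAM, paired reads stay together as well. The chunks still lack a header, so they need one added before conversion back to BAM.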
All times are GMT -8.