#1 |
Senior Member
Location: Boston, MA Join Date: Nov 2010
Posts: 100
Hi All,
I'm surprised I can't find anything on this topic, but I would like to take a BAM file of, say, 100 million reads and split it into multiple files containing 1 million reads each. Is there an easy way of doing that? (I looked at BamTools, samtools, GATK, and Picard, but none of them seems able to do this: these tools can only split based on RG, chromosomes, etc., and I want to split based on the number of reads.) Thanks very much in advance.
#2 |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
Yes. Do a view with samtools and use the Unix command "split" to split the output by number of lines. You will get a set of output files in SAM format. Next, run each of them back through samtools (view -b) and redirect the output to a file with a .bam extension. Now you have BAM files split by number of reads.
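A minimal sketch of the first step described above, not a tested recipe. The function names, `input.bam`, and the `chunk_` prefix are placeholders introduced here for illustration. Note that `samtools view` without `-h` prints alignment records only, so the chunks it produces carry no header.

```shell
# Hypothetical helper: split header-less SAM from stdin into files of
# $1 alignment lines each, named ${2}aa, ${2}ab, ...
split_sam_lines() {
    split -l "$1" - "$2"
}

# Hypothetical wrapper around the pipeline from the post above.
# $1 = input BAM, $2 = lines per chunk, $3 = output prefix.
bam_to_sam_chunks() {
    samtools view "$1" | split_sam_lines "$2" "$3"
}

# Example call (placeholder names):
#   bam_to_sam_chunks input.bam 1000000 chunk_
```

With the default two-letter suffixes, `split` can name up to 676 chunks, which is enough for 100 files of 1 million lines each.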
#3 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
And reinsert the SAM header into each BAM file at the end, if required, using 'samtools reheader'.
#4 |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
#5 |
Senior Member
Location: Boston, MA Join Date: Nov 2010
Posts: 100
Ah, yes - that's elegant. Thanks very much for the response - so easy nobody bothered making a script.
#6 | |
Member
Location: Netherlands Join Date: Dec 2009
Posts: 13
Quote:
samtools view -Sb -T reference.fasta out.sam > out.bam to each split, to get the header... I don't think that the "samtools reheader" would have worked as you describe it in this thread, as the splits themselves do not have any headers.
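One way to address the missing-header problem raised above, sketched under assumptions: save the header of the original BAM once with `samtools view -H`, then prepend it to each header-less chunk before converting. The function names and the file layout (`${prefix}aa`, `${prefix}ab`, ... as produced by `split`) are illustrative, not something from the thread. With samtools 1.x the SAM input is detected automatically; older 0.1.x releases also need `-S`.

```shell
# Print a complete SAM stream: header file ($1) followed by a
# header-less chunk of alignment records ($2).
sam_with_header() {
    cat "$1" "$2"
}

# Hypothetical wrapper: $1 = original BAM, $2 = chunk prefix used by split.
chunks_to_bam() {
    bam=$1
    prefix=$2
    # -H prints only the header of the original file.
    samtools view -H "$bam" > "${prefix}header.sam"
    # Two-character suffixes match split's default names (aa, ab, ...).
    for chunk in "${prefix}"??; do
        sam_with_header "${prefix}header.sam" "$chunk" |
            samtools view -b -o "$chunk.bam" -
    done
}

# Example call (placeholder names):
#   chunks_to_bam input.bam chunk_
```

Reusing the original header keeps the @SQ, @RG, and @PG lines intact, whereas generating one from a reference with `-T` only supplies the sequence dictionary.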
#7 |
Senior Member
Location: Mexico Join Date: Mar 2011
Posts: 137
|
![]()
Is there any way to split by number of reads rather than by number of lines, so that each read, even if it has 10 alignments, ends up in a single new BAM file along with other reads?
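A possible approach to the question above, offered as a sketch rather than a known answer from the thread: if all records of a read are adjacent in the input (for example after name-sorting with `samtools sort -n`), an awk filter can start a new chunk only when the read name in column 1 changes. The function name, the chunk naming scheme, and the file names in the example call are assumptions made for illustration.

```shell
# Reads header-less SAM on stdin, with all records of a read name adjacent.
# $1 = read names per chunk, $2 = output prefix.
# Writes ${2}0000.sam, ${2}0001.sam, ...
split_sam_by_qname() {
    awk -F '\t' -v n="$1" -v prefix="$2" '
        $1 != last {                 # new read name (QNAME is column 1)
            if (seen % n == 0) {     # current chunk is full: start the next one
                if (out != "") close(out)
                out = sprintf("%s%04d.sam", prefix, seen / n)
            }
            seen++
            last = $1
        }
        { print > out }              # every record follows its read name
    '
}

# Example call with samtools 1.x (placeholder names):
#   samtools sort -n -o byname.bam input.bam
#   samtools view byname.bam | split_sam_by_qname 1000000 reads_
```

Because the counter advances per read name and not per line, chunks can differ in line count, and both mates of a pair stay together since they share a read name. The resulting files are still header-less SAM and need a header before conversion to BAM.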