07-03-2011, 07:39 PM   #6
history_of_robots
Junior Member
 
Location: Caltech

Join Date: May 2011
Posts: 9

Quote:
Originally Posted by raela
That would most likely be the issue! You need to combine the chromosomes into one file. On a *nix machine, say each is named chr##.fa (I know it's this way for the horse: chr1, chr2, chr3, ..., chrX). You would do
cat chr*.fa > genome.fa
This tells it to put the contents of all the files into the new file genome.fa. Then you index, but you probably want to use -a bwtsw. I believe not including that flag was my error. So you'd do
bwa index -p prefix -a bwtsw genome.fa
You are suggesting '-a bwtsw'. The BWA manual (http://bio-bwa.sourceforge.net/bwa.shtml) says: "BWA-SW can also be used to align ~100bp reads, but it is slower than the short-read algorithm." and "On low-error short queries, BWA-SW is slower and less accurate than the first algorithm [IS], but on long queries, it is better". So it seems that '-a is' is what is required when indexing a genome for subsequent short-read alignment. However, the same manual page says: 'IS is moderately fast, but does not work with database larger than 2GB'. A complete genome can be larger than that (for example, mm9.fa is 2.5GB). So I am wondering whether the chromosomes should be indexed separately. Unfortunately, it seems that in that case BWA would have to be run on each chromosome separately. Or is there another way to use IS on the whole genome?
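To make the size concern concrete, here is a rough sketch (mine, not from the manual) that picks the '-a' value from the size of the FASTA file. The helper name choose_bwa_algo is hypothetical, genome.fa and prefix are taken from the quoted commands, and reading the manual's "2GB" as 2*1024^3 bytes of file size is my assumption. It only prints the bwa index command it would use and does not run it; above the limit it falls back to the '-a bwtsw' option suggested in the quote.

```shell
# Hypothetical helper: choose the bwa index construction algorithm
# from the FASTA size in bytes.
# Assumption: the manual's "2GB" limit for IS is treated as 2*1024^3 bytes.
choose_bwa_algo() {
    size_bytes=$1
    limit=$((2 * 1024 * 1024 * 1024))
    if [ "$size_bytes" -lt "$limit" ]; then
        echo is        # small enough for IS according to the manual
    else
        echo bwtsw     # too large for IS, use the option from the quoted post
    fi
}

genome=genome.fa       # file name from the quoted cat command
if [ -f "$genome" ]; then
    # wc -c gives the size in bytes; tr strips padding some systems add
    size=$(wc -c < "$genome" | tr -d ' ')
    algo=$(choose_bwa_algo "$size")
    # Print the command instead of running it, so it can be checked first
    echo "bwa index -p prefix -a $algo $genome"
fi
```

For a 2.5GB file such as mm9.fa this would select bwtsw, which is exactly the case I am asking about.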
__________________
"Let's start with the three fundamental Rules of Robotics...."