Hi all,
I decided to check out the alignment package Mosaik to create an assembly of a bacterial genome that we are working on. Usually we just use Newbler to create de novo assemblies (and in fact we already have for these). We've sequenced 12 strains of the same species using 454 Titanium (not paired end). After assembly, we closed two of the genomes on the bench with PCR. I'd like to reduce the number of contigs in the other strains by using the closed genomes as reference sequences. I'd also like to get the assemblies into SAM format, since Newbler doesn't support that as output yet.
Mosaik is the first one I've been looking at, but I'm having an issue. I create the reference from one of the closed genomes (a FASTA file consisting of a single contig, no quality information) with this command:
./MosaikBuild -fr B475.fasta -oa B475.dat
Then I create the input file for the sequence fragments from one of our runs (with leading sequences such as MIDs already stripped):
./MosaikBuild -fr B476.fasta -st 454 -out B476.dat -fq B476.qual
Both of the above commands appear to work fine. However, running this command:
./MosaikAligner -in B476.dat -out B475_B476_aligned.dat -ia B475.dat
produces this problem (end of output):
Alignment statistics (mates):
===================================
# failed hash: 1774 ( 35.9 %)
# filtered out: 3169 ( 64.1 %)
-----------------------------------
total: 4943
total aligned: 0 ( 0.0 %)
MosaikAligner CPU time: 39.200 s, wall time: 40.548 s
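As a quick sanity check on those numbers, here is a minimal sketch that parses the counts from the statistics block and confirms that the two rejection categories account for every read. The `parse_stats` helper and the regex are my own illustration, not part of Mosaik, and assume the output format is exactly as pasted above.

```python
import re

# Statistics lines copied from the MosaikAligner output above.
STATS = """\
# failed hash: 1774 ( 35.9 %)
# filtered out: 3169 ( 64.1 %)
total: 4943
total aligned: 0 ( 0.0 %)
"""

def parse_stats(text):
    """Hypothetical helper: map each 'label: count' line to an integer."""
    counts = {}
    for line in text.splitlines():
        # Optional leading '#', then a label, a colon, and the count.
        m = re.match(r"^#?\s*([A-Za-z ]+?):\s*(\d+)", line)
        if m:
            counts[m.group(1).strip()] = int(m.group(2))
    return counts

counts = parse_stats(STATS)
rejected = counts["failed hash"] + counts["filtered out"]

# Every read is either "failed hash" or "filtered out"; none are left over.
print(rejected == counts["total"])
for label in ("failed hash", "filtered out"):
    print(label, round(100 * counts[label] / counts["total"], 1))
```

So no reads are unaccounted for: each one is rejected either at the hashing step or by the filter.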
If I make some of the alignment parameters more forgiving, i.e. add the flags:
-hs 12 -mm 10
then none of the sequences "failed hash", but they are still all filtered out. Am I doing something obviously wrong? The "Alignment statistics (mates)" title worries me, since these aren't mate-pair reads, just single-end reads. Ideas?
~josh