04-10-2015, 10:48 PM   #3
Junior Member
Location: alberta

Join Date: Feb 2015
Posts: 8

Thanks, Brian!

You should align your reads to the contigs, not the other way around.
That does make more sense.
However, I also just tried samtools view and got the same number of sequences as bowtie reported:
samtools view -c -F 4 <input.sam>
The catch, I think, is that the sequences in the SAM file are those of the reads, not the contigs. Aligning the other way around should give me the contig sequences, though.
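
To make the point about what is being counted concrete, here is a minimal Python sketch of what a count with -F 4 amounts to: it counts alignment records whose FLAG does not have the 0x4 (unmapped) bit set. Each record carries the sequence of the query (column 10), while the reference it hit is only named in column 3. The function name count_mapped and the toy records are made up for illustration; this is not how samtools is implemented.

```python
def count_mapped(sam_lines):
    """Count SAM records without the 0x4 (unmapped) flag bit.

    Returns the record count and the set of reference names that were hit.
    """
    mapped = 0
    refs_hit = set()
    for line in sam_lines:
        if line.startswith("@"):          # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])             # column 2: bitwise FLAG
        if flag & 0x4:                    # 0x4 set means "segment unmapped"
            continue
        mapped += 1
        refs_hit.add(fields[2])           # column 3: reference name (RNAME)
    return mapped, refs_hit


# Hypothetical toy data: three reads aligned against two contigs.
EXAMPLE_SAM = [
    "\t".join(["@SQ", "SN:contig_1", "LN:500"]),
    "\t".join(["@SQ", "SN:contig_2", "LN:300"]),
    "\t".join(["read_a", "0", "contig_1", "10", "42", "8M", "*", "0", "0", "ACGTACGT", "IIIIIIII"]),
    "\t".join(["read_b", "16", "contig_1", "40", "42", "8M", "*", "0", "0", "TTGCAAGC", "IIIIIIII"]),
    "\t".join(["read_c", "4", "*", "0", "0", "*", "*", "0", "0", "GGGGCCCC", "IIIIIIII"]),
]

if __name__ == "__main__":
    n_mapped, refs = count_mapped(EXAMPLE_SAM)
    print(n_mapped, sorted(refs))
```

In the toy data the count is a number of reads, and contig_2 never appears among the hits, which is why the count by itself does not say how many contigs are covered.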

But if I align the reads to the contigs as you suggested, should I use local alignment instead of end-to-end?
Also, how could I discard sequences that align with gaps? Do you think setting --gbar <int> to the length of the longest contig would work? (--gbar <int> disallows gaps within <int> positions of the beginning or end of the read.)
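
One way to drop gapped alignments that does not depend on --gbar is to filter the SAM output afterwards: an alignment with a gap has an insertion (I) or deletion (D) operation in its CIGAR string (column 6). The sketch below is only an illustration under that assumption; the names has_gap and ungapped_names are made up, and soft clipping (S), which local alignment can produce, is deliberately not treated as a gap here.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_gap(cigar):
    """Return True if the CIGAR string contains an insertion or deletion."""
    return any(op in "ID" for _length, op in CIGAR_OP.findall(cigar))


def ungapped_names(sam_lines):
    """Yield query names of mapped records whose alignment has no gap."""
    for line in sam_lines:
        if line.startswith("@"):              # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:                  # not a complete SAM record
            continue
        if int(fields[1]) & 0x4:              # skip unmapped records
            continue
        if not has_gap(fields[5]):            # column 6: CIGAR
            yield fields[0]


# Hypothetical toy records: one ungapped, one with an insertion,
# one with a deletion, one soft-clipped but ungapped.
EXAMPLE_RECORDS = [
    "\t".join(["read_a", "0", "contig_1", "10", "42", "8M", "*", "0", "0", "ACGTACGT", "IIIIIIII"]),
    "\t".join(["read_b", "0", "contig_1", "30", "42", "4M1I3M", "*", "0", "0", "ACGTACGT", "IIIIIIII"]),
    "\t".join(["read_c", "0", "contig_2", "5", "42", "4M2D4M", "*", "0", "0", "ACGTACGT", "IIIIIIII"]),
    "\t".join(["read_d", "0", "contig_2", "50", "42", "3S5M", "*", "0", "0", "ACGTACGT", "IIIIIIII"]),
]

if __name__ == "__main__":
    print(list(ungapped_names(EXAMPLE_RECORDS)))
```

Filtering after alignment keeps the aligner settings unchanged, so the gapped and ungapped counts can be compared on the same run.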

Generally, though, if you assembled the sequences together, most of the contigs will probably come from both datasets
I have read several times that the more reads you use, the better the assembly, but that is maybe another (complex) topic…
Alun3.1