I am assembling avian influenza sequences and have run into a strange problem. My reference set contains about 100 sequences totaling 150K.
If I use this full set, 2 million sequences map to a gene called N8. However, the assembled sequence has no homology to N8; it is a scrambled version of another gene, N7. N7 is in the reference set, yet 0 sequences map to it.
If I use just N7 and N8 as the reference, assembly works fine: 2 million sequences map properly to N8, and the assembled sequence has 93% nucleotide identity to N8.
Has anyone else seen this kind of behavior? I am using the beta 6 version of Bowtie 2.
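
In case it helps anyone reproduce this, here is a minimal sketch of how the per-reference mapping counts could be checked independently of the assembly step. It assumes the alignments are available as plain-text SAM; the function name `count_mapped_per_reference` and the file path passed on the command line are my own placeholders, not part of Bowtie 2.

```python
import sys
from collections import Counter


def count_mapped_per_reference(sam_lines):
    """Count primary mapped alignments per reference name in SAM text.

    Hypothetical helper for illustration; not part of any aligner.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary or supplementary alignment
        counts[fields[2]] += 1  # column 3 is the reference name
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The SAM path is supplied by the user on the command line.
    with open(sys.argv[1]) as handle:
        for name, n in count_mapped_per_reference(handle).most_common():
            print(name, n, sep="\t")
```

Running this on the SAM output from the full reference set and again on the N7/N8-only set would show whether the reads are really being assigned to N8 in both cases, or whether the reference names are getting mixed up somewhere downstream.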