Hi all,
I've been attempting a de novo assembly with 454 single reads and Illumina 2.5 kb mate pairs. After building an assembly from the 454 reads alone, I BLASTed some of the mate pairs against it and noticed that, for several of them, the two ends mapped only a few hundred bases apart. So I ran a much larger batch (1 million mate pairs) to get a better sense of what was going on. For most of the mate pairs (~68%), one or both ends didn't hit the assembly, so they were uninformative. For more than 20%, however, the ends mapped less than 500 bp apart (avg. 320 bp), with the two reads pointing inward toward each other, which is the typical paired-end arrangement. Only about 2.5% were clearly mate pairs (avg. span 2420 bp, ends pointing away from each other), although I expect that most of the uninformative pairs would also fall into this group. It appears that the selection of the biotinylated mate pair fragments wasn't very stringent, so a lot of non-mate-pair fragments came through.
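For anyone who wants to run the same check on their own library, here is a rough Python sketch of the classification described above. It is only an illustration, not the exact procedure used for the numbers in this post: the `Hit` record, the `classify_pair` and `tally` names, and the category labels are all made up for the example, and it assumes you have already reduced each read's best hit to a contig name, leftmost and rightmost coordinates, and a strand. The 500 bp cutoff is the one mentioned above.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Best hit of one read against the assembly (hypothetical record)."""
    contig: str
    start: int   # leftmost coordinate on the contig
    end: int     # rightmost coordinate on the contig
    strand: str  # "+" or "-"


def classify_pair(hit1: Optional[Hit], hit2: Optional[Hit],
                  max_pe_span: int = 500) -> str:
    """Label a read pair by the orientation and spacing of its two hits."""
    # One or both ends did not hit the assembly.
    if hit1 is None or hit2 is None:
        return "uninformative"
    # Different contigs or same strand: neither clean arrangement.
    if hit1.contig != hit2.contig or hit1.strand == hit2.strand:
        return "other"

    fwd, rev = (hit1, hit2) if hit1.strand == "+" else (hit2, hit1)
    span = max(hit1.end, hit2.end) - min(hit1.start, hit2.start) + 1

    # Forward-strand read on the left: the reads point inward (paired-end style).
    if fwd.start <= rev.start:
        return "paired_end" if span < max_pe_span else "other"
    # Reverse-strand read on the left: the reads point outward (mate-pair style).
    return "mate_pair"


def tally(pairs: Iterable[Tuple[Optional[Hit], Optional[Hit]]]) -> Counter:
    """Count how many pairs fall into each category."""
    return Counter(classify_pair(a, b) for a, b in pairs)
```

Calling `tally` on the list of hit pairs and dividing each count by the total gives percentages comparable to the ones quoted above. The sketch does not handle pairs whose reads overlap almost completely, or hits near contig ends, so those cases would need extra care.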
Questions: Has anyone else checked the level of paired-end fragments in their mate pair libraries? Any idea whether 20% is excessive? How do assemblers like MIRA and Velvet deal with this mixture, which is likely never completely clean? Would it help the assembly to remove the paired-end reads, or to separate them out and feed them in as a third type of sequence? Any thoughts would be appreciated.
Cheers,
Mike