When importing data into CLC Genomics Workbench for a de novo assembly (100 bp paired-end reads), you have to specify the read orientation and the minimum and maximum distance between the paired reads. How is this distance calculated?
Initially, I thought the range was based on the total size of the fragment. If my inserts were 200-220 bp plus 120 bp of adaptors, the fragment would be 320-340 bp. With a 100 bp read on each end, the distance between the two reads would then be 120-140 bp, i.e. a minimum distance of 120 bp. However, although the assembly using this logic was OK, the majority of the paired reads were "broken" in the analysis.
After asking around, I was told that the distance shouldn't include the adaptors. In that case, would the minimum distance between the two reads be around 0 bp, or should I enter the insert size of 200-220 bp?
When I tried this, though, the majority of the paired reads were still broken in the analysis, and my contigs were generally smaller.
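To make the arithmetic above explicit, here is a small Python sketch of the candidate min/max values under the conventions I've tried. It assumes 100 bp mates and treats "distance" either as the gap between the two mates (fragment length minus both reads) or as the outer span from the start of one read to the end of the other. Which of these CLC actually expects is exactly what I'm asking, so the function names are just my own labels, not anything from CLC.

```python
READ_LEN = 100  # bp per mate (100 bp paired-end reads)


def inner_gap(frag_min, frag_max, read_len=READ_LEN):
    """Hypothetical label: unsequenced gap between the mates.

    Computed as fragment length minus both reads; a negative value
    would mean the two mates overlap.
    """
    return frag_min - 2 * read_len, frag_max - 2 * read_len


def outer_span(frag_min, frag_max):
    """Hypothetical label: start of mate 1 to end of mate 2,
    which is just the fragment length itself."""
    return frag_min, frag_max


insert = (200, 220)  # insert size range, adaptors excluded
adaptors = 120       # total adaptor length added to each fragment
with_adaptors = (insert[0] + adaptors, insert[1] + adaptors)  # (320, 340)

# First attempt: adaptors counted, gap between mates -> (120, 140)
print("gap, adaptors counted: ", inner_gap(*with_adaptors))
# Second attempt: adaptors excluded, gap between mates -> (0, 20)
print("gap, adaptors excluded:", inner_gap(*insert))
# Alternative reading: outer span of the insert -> (200, 220)
print("outer span of insert:  ", outer_span(*insert))
```

With a 200 bp minimum insert and 100 bp reads, the mates just touch at the short end of the range, which is why the "gap" convention gives a minimum of around 0.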