Old 01-05-2012, 10:23 AM   #2
Location: Stockholm

Join Date: Aug 2009
Posts: 75

My strategy so far has been not to worry too much about the bases that get lost due to random matches. It depends on your data, but although 94613 looks large, each of those matches is only 3 bp long, so you lose “only” 3 bp per affected read, which may not be that bad.

However, the “count” column in your histogram decreases monotonically from length 3 to 32. This is different from what I see in my data. One explanation is that your adapter almost never appears partially: it is either fully there or not at all, and all matches of length 3 to 32 are, in fact, spurious. In that case, you can safely set --overlap to 33.

I'll probably change the output that cutadapt prints to make this all a bit clearer. It might help to print the number of bases removed and to give an estimate of how many of those were removed due to chance alone.
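To give an idea of the kind of estimate I mean, here is a rough Python sketch. It is not what cutadapt currently does. It assumes all four bases are equally likely and independent, allows no mismatches, and treats a chance match of length k as having probability 0.25^k per read. The function names, the read count and the histogram values are made up for illustration.

```python
def expected_random_matches(n_reads, length, p_match=0.25):
    """Expected number of reads in which `length` bases match the adapter
    purely by chance (assumes uniform, independent bases and no mismatches)."""
    return n_reads * p_match ** length


def chance_summary(histogram, n_reads):
    """Compare observed trimming counts with the chance expectation.

    `histogram` maps match length -> number of reads trimmed by that length.
    Returns tuples of (length, observed, expected_by_chance, bases_removed).
    """
    rows = []
    for length in sorted(histogram):
        observed = histogram[length]
        expected = expected_random_matches(n_reads, length)
        rows.append((length, observed, expected, observed * length))
    return rows


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any real run.
    n_reads = 1_000_000
    histogram = {3: 15000, 4: 4000, 5: 1000}
    for length, observed, expected, bases in chance_summary(histogram, n_reads):
        print(f"{length}\t{observed}\t{expected:.1f}\t{bases}")
```

If the observed count at a given length is close to the expected value, most matches of that length are probably spurious. If it is much higher, real partial adapters are likely present.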
mmartin