SEQanswers > Bioinformatics > Bioinformatics


Old 06-20-2018, 10:11 PM   #1
Location: Montpellier, France

Join Date: Dec 2009
Posts: 13
Merging PE reads taking into account minimum positional quality score

Hello everyone,

I am facing an unusual issue with merging Illumina paired-end reads while controlling the merge using individual base qualities. With your preferred merger (say PEAR or FLASH), you can of course weigh the quality-score difference between R1 and R2 at a given overlap position and decide whether the difference is large enough to keep the base with the higher value. That works fine in most cases. Here I have a different situation: I would like to do that, plus, if one of the two scores at a given position is below a threshold (say 20), keep the base from the other strand regardless of the score difference, as long as its own score is above 20. I have not been able to find that option in any PE read merger yet!

Any (verified) ideas, anyone?

Just to head off off-topic comments: 1) Yes, I already thought of soft-masking low-quality bases before merging, but I could not find any merger that uses this information either. 2) No, I cannot simply remove reads with low-quality bases before merging, since I cannot use a strategy based on the percentage of low-quality bases or on sliding windows: I really need the per-position quality profile for rare-variant calling. 3) Yes, using Illumina error-correction algorithms like DADA2 is an option I will also explore, but I would prefer to first explore, during merging, the solution I describe above.
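For what it is worth, the per-position rule I am after could be prototyped in a few lines before wrapping it into a merger. This is only a sketch of my own (the function name and the threshold of 20 are illustrative, not taken from any existing tool):

```python
def resolve_position(b1, q1, b2, q2, threshold=20):
    """Resolve one overlapping position between R1 and R2.

    b1/b2 are the called bases, q1/q2 their Phred quality scores.
    Returns (base, quality), or None when neither call is usable.
    """
    if b1 == b2:
        # Agreeing calls: keep the base with the higher quality score.
        return (b1, max(q1, q2))
    if q1 < threshold and q2 >= threshold:
        # R1 below threshold: keep R2 regardless of the score difference.
        return (b2, q2)
    if q2 < threshold and q1 >= threshold:
        # R2 below threshold: keep R1 regardless of the score difference.
        return (b1, q1)
    if q1 < threshold and q2 < threshold:
        # Both unreliable: mask the position (e.g. emit 'N' downstream).
        return None
    # Both pass the threshold: fall back to the usual highest-score rule.
    return (b1, q1) if q1 >= q2 else (b2, q2)
```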
Thanks, all!
martinjf is offline   Reply With Quote
Old 06-21-2018, 12:55 AM   #2
Senior Member
Location: Cambridge

Join Date: Sep 2010
Posts: 115

You can do custom trimming based on the different error profiles of R1 and R2, to take into account the higher chance of erroneous/random data at the end of the R2 read.

One can use Perl to prototype a read trimmer, or modify any of the open-source ones if you have a bit of C/C++ knowledge.
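As an illustration only (the function name and cutoff are mine, not from any published tool), here is a minimal sketch of such a 3'-end quality trimmer in Python; the same logic translates directly to Perl:

```python
def trim_3prime(seq, quals, cutoff=20):
    """Trim the 3' end of a read back to the last position whose
    Phred quality is >= cutoff.

    seq is the base string, quals the matching list of Phred scores.
    Returns the trimmed (sequence, qualities) pair.
    """
    keep = len(quals)
    # Walk back from the 3' end while qualities stay below the cutoff.
    while keep > 0 and quals[keep - 1] < cutoff:
        keep -= 1
    return seq[:keep], quals[:keep]
```

In practice you would run this with a stricter cutoff on R2 than on R1, reflecting the higher error rate at the end of R2.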

May I ask you a couple of questions:

1. What read length and platform did you use for your sequencing data acquisition?
Was it 4-channel or 2-channel imaging? The 2-channel chemistry has a 2x-10x higher error rate and is limited to 150 bp read lengths, so use a 4-channel platform.

2. What was your cluster density? If you used a MiSeq or a HiSeq 2500, I would undercluster to get a lower error rate when looking for rare variants: lower cluster density reduces the error rate by 3x-6x, while also lowering raw data yields.

3. Did you use a PCR-free library prep protocol? What were your average insert size and size distribution?

Last edited by Markiyan; 06-21-2018 at 12:56 AM. Reason: Typo fix.
Markiyan is offline   Reply With Quote
