Hi everyone,
I constructed libraries with the NuGEN WGA kit, followed by their double-stranding kit, and then took them into Agilent SureSelect.
I have found that a variable percentage of my unmapped reads (both R1 and R2 from Illumina 100x100 PE) contain some of the NuGEN adaptor at the beginning of the read. NuGEN ligates on their own adaptor for whole genome amplification, and it is supposed to be trimmed during the double-stranding step of library prep. However, incomplete trimming could mean that when I ligated on the Agilent SureSelect primers, they were ligated onto chimeric fragments containing both human gDNA and 1-20 bases of remaining NuGEN adaptor.
If the NuGEN adaptor sequence is known (FastQC flags it as an overrepresented sequence), is there a tool that will trim it from the beginning of the read, down to a remnant of 5 bases or so? For example, if I know the sequence is ACTGACTGACTGACTGACTG, would it trim:
ACTGACTGACTGACTGACTGNNNNNN
CTGACTGACTGACTGACTGNNNNN
TGACTGACTGACTGACTGNNNNN
GACTGACTGACTGACTGNNNNN
ACTGACTGACTGACTGNNNNN
etc., down to five bases of adaptor (or whatever number I specify), because below that length I would not know whether I'm trimming off genuine sequence or adaptor. I do not want to arbitrarily trim off a fixed number of bases from the beginning, since most of my reads (50-90%) do not contain adaptor.
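To make the behavior I'm after concrete, here is a minimal Python sketch of the logic. It is only an illustration under stated assumptions, not an existing tool: the function names `adaptor_prefix_length` and `trim_read` and the `min_overlap` parameter are made up for this example, and it only handles exact matches (no sequencing errors or mismatches in the adaptor remnant).

```python
def adaptor_prefix_length(read, adaptor, min_overlap=5):
    """Return how many leading bases of `read` are adaptor remnant.

    Incomplete trimming leaves the END of the adaptor at the START of the
    read, so we look for the longest suffix of `adaptor` that is also a
    prefix of `read`. Matches shorter than `min_overlap` are ignored,
    because they could just as well be genuine sequence.
    Assumption: exact matching only, no mismatches allowed.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(read), len(adaptor))
    # Try the longest possible remnant first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if read.startswith(adaptor[-k:]):
            return k
    return 0


def trim_read(seq, qual, adaptor, min_overlap=5):
    """Trim the adaptor remnant from a read and its quality string together."""
    k = adaptor_prefix_length(seq, adaptor, min_overlap)
    return seq[k:], qual[k:]
```

With the example adaptor ACTGACTGACTGACTGACTG, a read starting with the last five adaptor bases (GACTG) would lose those five bases, while a read starting with only four adaptor bases would be left untouched. Reads without adaptor are returned unchanged, which is the point of not trimming a fixed number of bases.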