Hi!
I'm working on a de novo transcriptome assembly, and I'm having some difficulty removing adapter contaminants from my 100 bp PE reads. According to FastQC, no more than ~1% of my reads contain overrepresented adapter sequences. For example:
90743 reads 0.4% of reads TruSeq Adapter, Index 6 (100% over 50bp)
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG
42478 reads 0.10% of reads TruSeq Adapter, Index 6 (100% over 50bp)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC
I've been trying to remove these sequences using fastq-mcf, since it seems to work well for PE reads.
However, I keep getting far more reads trimmed than FastQC tells me are contaminated. I've been playing around with the parameters, but without much improvement. I'm realizing now that the program trims partial adapter sequences from the read ends, possibly even when only a few bases match the adapter sequence. Is this generally what adapter trimming does? What if I'm only interested in trimming out the overrepresented sequences that FastQC reports (the full 50-65 bp adapter contaminants)?
There are so many parameters for this program, and I'm not sure how to set them to remove only what I need... and now I'm not sure what exactly I'm "supposed" to be removing (full or partial adapter sequence matches...)
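To make the question concrete, here is a toy Python sketch of the two behaviours I'm trying to tell apart. It is not how fastq-mcf works internally: the function names and the `min_overlap` parameter are made up for illustration, and it only does exact matching (no mismatches, no quality scores). The adapter string is the first one FastQC reported above.

```python
# Toy illustration only: exact matching, hypothetical function names.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG"


def contains_full_adapter(read, adapter=ADAPTER):
    """True only if the complete adapter string occurs inside the read."""
    return adapter in read


def trim_3prime_adapter(read, adapter=ADAPTER, min_overlap=1):
    """Clip the read where its remaining tail matches the start of the adapter.

    The tail may be a full adapter copy (followed by anything) or just a
    partial prefix of the adapter hanging off the 3' end of the read.
    """
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # remaining tails are only shorter, so stop here
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read


full = "TTTTCCCC" + ADAPTER + "GG"       # whole adapter present
partial = "TTTTCCCCGGGG" + ADAPTER[:7]   # only 7 adapter bases at the 3' end
chance = "TTTTCCCCA"                     # a single trailing base that happens to match

for name, read in [("full", full), ("partial", partial), ("chance", chance)]:
    print(name,
          contains_full_adapter(read),
          len(trim_3prime_adapter(read, min_overlap=1)),
          len(trim_3prime_adapter(read, min_overlap=10)))
```

With `min_overlap=1` all three reads get clipped, including the one that only shares a single trailing base with the adapter, while the full-match test flags just the first read. If real trimmers behave like the permissive case by default, that would explain why so many more of my reads are touched than FastQC reports.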
Cheers