Dear Bioinformaticians,
I'm sorry to admit that, after all the posts I've seen on the topic, I still have doubts about how to trim adapters correctly.
I have sequenced some unicellular parasite genomes on an Illumina MiSeq. My plan is to map the reads to a reference with BWA-MEM, remove duplicates with Picard, and finally call SNPs.
As a first step, I ran quality control with FastQC and found that some of the samples show adapter contamination of up to ~0.14%. I believe this happens because some fragments are too short, so the machine reads through the insert and into the adapter. I was planning to use Trimmomatic 0.33 for adapter trimming, but I noticed that the sequences in the TruSeq3-PE-2.fa file are 34 nucleotides long, while the sequences in the FastQC report are 50 nt and in some cases include the index.
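To make my understanding of the read-through concrete, here is a minimal Python sketch of what I think happens. It is only a toy model, not what Trimmomatic actually does: the function names `simulate_read` and `trim_adapter` are my own, the insert is made up, and the adapter string is just the common Illumina adapter prefix used for illustration.

```python
def simulate_read(insert: str, adapter: str, read_length: int) -> str:
    """Toy model: the sequencer reads the insert first and, if the insert
    is shorter than the read length, continues into the adapter."""
    template = insert + adapter
    return template[:read_length]


def trim_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove a full adapter anywhere in the read, or a partial adapter
    (at least min_overlap bases) at the 3' end. Exact matching only."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for the longest adapter prefix that matches the end of the read.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"          # illustrative adapter prefix (assumption)
    short_insert = "ACGTTGCAAGCTTAGGCTAA"  # hypothetical 20 nt fragment
    read = simulate_read(short_insert, adapter, read_length=30)
    print(read)
    print(trim_adapter(read, adapter))
```

In this model the adapter bases end up at the 3' end of the read, after the insert, and a fragment longer than the read length gives a read with no adapter at all.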
I wonder whether it would be more correct to create a new file with the specific 50 nt sequences to guide the adapter trimming, or whether I should add them to the existing FASTA file.
What is probably confusing me is that I initially thought the insert would be at the end of each affected read in the FASTQ file, while it is actually at the beginning of the read. Can you please advise me?
Thanks for your help,
Max