I wish to align samples sequenced on an Illumina HiSeq 4000 to the human genome. Most reads (single-end) are 20-40 bp long and free of adaptors. The base calls have an average quality of 38.
However, FastQC detects overrepresented repetitive k-mers (like AAAAAA, CCCCCC, AGCGGGG) at the end of some reads, mostly after the 35th base. I assume these come from sequencing errors at the end of the reaction.
I'm new to sequencing and wish to align with BWA on Galaxy main. The program offers a convenient option:
"-q quality threshold for read trimming down to 35bp"
Which alignment quality threshold should I choose to soft clip the unwanted k-mers at the end of some reads? Are there any other hints about soft clipping, or other program parameters I could fine-tune for this alignment? I'm currently using the defaults for these.