I am working with a bunch of FASTQ files from various sources, and I do not know anything about the sequencing type. I have found that some have not been trimmed: they have the adapter sequence at the front (in lowercase), but also low-quality sequence at the end (mostly in lowercase).
I can trim the adapter pretty easily (even though it has high quality scores). However, I noticed that if I try to trim the ends based on quality, the result is inconsistent with the lowercase designation. As an example, here is a read where I trimmed all bases with a Phred score below 20:
@HPYED3V01EWA6Z
gactacgtacacactCTGCAGGCGCAGCTGGCCGAGGCGCTGGTCGAGATCGACCTGCTGAGCGAACAGCCGCagtgtgtgcgtggtcggcgtctctcaaggcacacagggagt
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIBB==:41195>@668C?DCEIIDEAAEBDIIGED<===<===GHIIIIIIIIIIIIII;::?BIIIIIIIIIIIIIIIIEBBA:>
Yet whatever sequencing center produced this read designated the lowercase bases at the end as low quality, even though their quality scores are high.
1.) Should I just trim all bases that are lowercase, or should I only trim bases that have low quality scores?
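To make the mismatch concrete, here is a minimal sketch of how the two signals can be compared on the read above. It assumes the quality string is Phred+33 encoded (I have not confirmed this for these files), and the function names `lowercase_clip`, `phred_scores` and `low_quality_positions` are just my own helpers, not from any library.

```python
def lowercase_clip(seq):
    """Return (start, end) so that seq[start:end] is the read without its
    lowercase prefix and lowercase suffix."""
    start = 0
    while start < len(seq) and seq[start].islower():
        start += 1
    end = len(seq)
    while end > start and seq[end - 1].islower():
        end -= 1
    return start, end


def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string; offset=33 is an assumption (Phred+33)."""
    return [ord(c) - offset for c in qual]


def low_quality_positions(qual, threshold=20, offset=33):
    """0-based positions whose Phred score is strictly below the threshold."""
    return [i for i, q in enumerate(phred_scores(qual, offset)) if q < threshold]


# The read from this post, copied as-is.
seq = "gactacgtacacactCTGCAGGCGCAGCTGGCCGAGGCGCTGGTCGAGATCGACCTGCTGAGCGAACAGCCGCagtgtgtgcgtggtcggcgtctctcaaggcacacagggagt"
qual = "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIBB==:41195>@668C?DCEIIDEAAEBDIIGED<===<===GHIIIIIIIIIIIIII;::?BIIIIIIIIIIIIIIIIEBBA:>"

start, end = lowercase_clip(seq)
low = low_quality_positions(qual)

print("uppercase region:", start, "to", end)
print("positions below Q20:", low)
# Positions below Q20 that fall inside the region the lowercase mask would keep.
print("below Q20 but uppercase:", [i for i in low if start <= i < end])
```

Under the Phred+33 assumption, the only characters in this quality string that decode to less than 20 are `4` and `1`, and those sit inside the uppercase part of the read, while the lowercase tail is mostly `I` (Q40). So the lowercase mask and the quality scores really are marking different bases here.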