Hi all,
This is my first post and potentially a very obvious question so please be gentle!
I have had 12 samples sequenced on an Illumina GA machine, using triplexing.
I ran FastQC and noticed exactly the same pattern in the per-base sequence content and per-base GC content plots over the first 12 or 13 bases of every sample.
At first I thought it could be the barcodes used to identify the different samples in the multiplexing, but that wouldn't explain why the pattern is identical across all samples, since each sample should have a different barcode sequence.
Any idea what could be causing this? My guess is that it has something to do with the adaptor sequences. And is the best approach to look in the FASTQ file for reads that start with an overrepresented sequence and then delete that sequence?
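To make the question concrete, here is a rough sketch of the check I have in mind: count how often each leading 13-mer occurs across the reads and see whether any single one dominates. It assumes a plain four-line-per-record FASTQ file (no wrapped sequence lines); the function name `count_read_prefixes` and the tiny in-memory example are made up for illustration and are not from any library.

```python
from collections import Counter
import io


def count_read_prefixes(fastq_handle, prefix_len=13):
    """Count how often each leading k-mer occurs across reads in a FASTQ stream.

    Assumes exactly four lines per record, with the sequence on the second line.
    """
    counts = Counter()
    for line_no, line in enumerate(fastq_handle):
        # Line index 1, 5, 9, ... holds the sequence of each record.
        if line_no % 4 == 1:
            seq = line.strip()
            # Skip reads shorter than the prefix length.
            if len(seq) >= prefix_len:
                counts[seq[:prefix_len]] += 1
    return counts


# Tiny made-up example standing in for a real file; for real data one would
# pass open("sample.fastq") instead (hypothetical file name).
example = io.StringIO(
    "@r1\nACGTACGTACGTAGGG\n+\nIIIIIIIIIIIIIIII\n"
    "@r2\nACGTACGTACGTATTT\n+\nIIIIIIIIIIIIIIII\n"
    "@r3\nTTTTGGGGCCCCAAAA\n+\nIIIIIIIIIIIIIIII\n"
)
prefix_counts = count_read_prefixes(example, prefix_len=13)
for prefix, n in prefix_counts.most_common(5):
    print(prefix, n)
```

If one prefix accounted for a large share of the reads, trimming it would seem reasonable; if the counts are spread over many different prefixes, deleting a fixed sequence presumably would not address the pattern.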
Many thanks in advance for any advice you can give on the matter.
J