
**fkrueger** · 09-29-2010, 12:32 AM

We have seen different kinds of artefacts towards the ends of reads in BS data (especially for long Illumina reads); most often the number of Cs increases drastically, paralleled by a drop in Ts. The imbalance in base composition in BS reads is clearly affecting the way the Illumina pipeline calls bases in later cycles.

It is difficult to tell exactly what is going on without seeing the rest of the picture, such as the FastQC per base sequence quality plot. I suspect that the overall basecall quality decreases substantially after cycle 60 or so (which it always has in the BS-Seq datasets we have seen so far). Thus, your quality trimming script might reduce your sequences to varying lengths, leaving only a few reads at their original 75 bp read length. These few full-length reads would then account for a much larger share of the bases at the later positions than in the original untrimmed dataset, so you see the sequence bias increase rather than decrease after your trimming step. Could the insert size for some reads be too short, so that you start sequencing into the read_2 adapter, which happens to be rich in T and poor in A? (Normally there should be a correlation between T and C, but not between T and A.)
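To check whether variable-length trimming is what shifts the composition, it can help to compute the per-cycle base percentages yourself before and after trimming. The following is only a minimal sketch, not what FastQC does internally; the function name `per_cycle_composition` is made up for illustration, and reads are assumed to be plain sequence strings. Note that each cycle's percentages are computed only over the reads that still reach that cycle, which is why a few surviving full-length reads can dominate the 3' end.

```python
from collections import Counter


def per_cycle_composition(reads):
    """Return one dict per cycle mapping base -> percentage.

    The percentage at a cycle is taken over the reads that are long
    enough to cover that cycle, so trimmed reads drop out of the
    later cycles.
    """
    counts = []  # counts[i] tallies the bases observed at cycle i + 1
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1

    composition = []
    for cycle_counts in counts:
        total = sum(cycle_counts.values())
        composition.append(
            {base: 100.0 * n / total for base, n in cycle_counts.items()}
        )
    return composition


if __name__ == "__main__":
    # Toy reads of different lengths, as left behind by per-read trimming.
    toy_reads = ["ACGT", "AC", "TTGT"]
    for cycle, pct in enumerate(per_cycle_composition(toy_reads), start=1):
        print(cycle, {b: round(p, 1) for b, p in sorted(pct.items())})
```

Running this on the same file before and after trimming, and comparing the number of reads contributing to the last cycles, should show whether the %T increase comes from a small subset of full-length reads.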

What we normally do prior to aligning BS-treated reads with Bismark is trim all sequences to a length that still has good quality scores AND doesn't show any kind of weird sequence bias, normally down to 50 bp to be sure. 50 bp is plenty of sequence to do very good bisulfite mapping (normally 60-70% mapping efficiency), and in addition you have paired-end reads, which will further increase mapping efficiency by around 2-4%. (If you do paired-end reads and the read length is very long (75+), you might read an overlapping bit of sequence in the middle from both sides, which effectively doesn't give you any additional qualitative methylation information anyway.)
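Trimming every read to one fixed length, as described above, can be sketched in a few lines of Python. This is an illustrative sketch rather than the script we actually use: the function name `trim_fastq_records` is hypothetical, and it assumes a plain four-line-per-record FASTQ with no wrapped sequence lines.

```python
def trim_fastq_records(lines, length=50):
    """Yield FASTQ lines with sequence and quality cut to `length` bases.

    Assumes four lines per record (header, sequence, '+', quality).
    Reads shorter than `length` are passed through unchanged.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups lines into records.
    for header, seq, plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header
        yield seq[:length]
        yield plus
        yield qual[:length]


if __name__ == "__main__":
    record = ["@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIHHHHH\n"]
    for line in trim_fastq_records(record, length=5):
        print(line)
```

For a real file, the same generator can be fed an open file handle and its output written line by line to a new file, which is then passed to the aligner.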

I hope this helps, if I was unclear please contact me again.

Kind regards,
Felix

**brentp** · 09-29-2010, 06:10 AM

hi felix, thanks for the reply.
indeed, the quality does drop after 50. but there are still plenty of reads that extend to 76bp, so it's not sampling error. in addition, i see this same pattern for many of the _1 ends from BS-Seq on the short read archive. i hadn't thought about the adaptor being the cause, i'll look into it.

i also trim before using MethylCoder, but just per-read, haven't tried trimming all reads to a set length. maybe i'll set the max-length to 72 which would remove the portion with increased T.
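a rough sketch of what per-read trimming with a max-length cap could look like. this is not the actual MethylCoder trimming code; the function name `trim_read`, the quality threshold of 20, and the Phred+33 offset are all assumptions for illustration (older Illumina pipelines used an offset of 64).

```python
def trim_read(seq, qual, min_q=20, max_len=72, offset=33):
    """Cap a read at `max_len`, then trim low-quality bases from the 3' end.

    `qual` is the ASCII-encoded quality string; `offset` is the Phred
    encoding offset (33 assumed here, 64 for older Illumina output).
    """
    end = min(len(seq), max_len)
    # Walk back from the 3' end while the base quality is below threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # '#' encodes Phred 2 and 'I' encodes Phred 40 with offset 33.
    print(trim_read("ACGTAC", "IIII##", max_len=5))
```

the cap is applied first so the last few cycles are dropped even for reads whose quality stays high to the end.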


increase in 3' %T after filtering BS-Treated reads
