SEQanswers

Similar Threads
Overrepresented kmers at the start of reads (started by kentk, Bioinformatics, 20 replies, last post 07-23-2014 01:23 AM)
Where is FastQC? (started by sklages, General, 10 replies, last post 02-05-2012 11:46 PM)
fastQC (started by papori, RNA Sequencing, 3 replies, last post 02-04-2012 01:48 PM)
FastQC; overrepresented sequences versus a grep (started by mgg, Bioinformatics, 16 replies, last post 12-23-2011 01:51 AM)
interpretation of FASTQC Overrepresented Kmers (started by mattanswers, Bioinformatics, 1 reply, last post 09-20-2011 12:40 PM)

07-05-2011, 10:25 AM   #1
PFS
Member
Location: USA

Join Date: Mar 2010
Posts: 55
fastqc - overrepresented sequences

I have run a FastQC analysis and found that there are hundreds of over-represented sequences in my datasets. Some of these are Illumina PCR primers and some are single-end adapters.

Does this mean I have some contamination going on? What can I do about it? Should I simply remove the primer and adapter sequences, or is there something else I should do?

Thanks in advance.
07-05-2011, 11:46 AM   #2
BAMseek
Senior Member
Location: St. Louis, MO, USA

Join Date: Apr 2011
Posts: 124

If your data are microRNA, or anything else where the insert is shorter than the read length, then you may end up reading into the adapter or primer sequence. This would typically show up as over-represented k-mers toward the 3' end of the reads. In that case you might want to trim the adapters before alignment.
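
As a rough illustration of what 3' adapter trimming does, here is a minimal Python sketch using exact matching only (dedicated trimmers such as cutadapt also tolerate sequencing errors). The function name `trim_3prime_adapter` and its `min_overlap` parameter are hypothetical, and the adapter string `AGATCGGAAGAGC` is only an example assumption; check the adapter sequence for your own library prep.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim a 3' adapter from a read using exact matching (no mismatches).

    1. If the full adapter occurs in the read, cut from its first base onward.
    2. Otherwise, if the read ends with the first k bases of the adapter
       (k >= min_overlap), cut those k bases, preferring the longest k.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the whole adapter is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the start of the adapter fits before the read ends.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


if __name__ == "__main__":
    # Example adapter prefix (assumption); substitute the one for your kit.
    ADAPTER = "AGATCGGAAGAGC"
    print(trim_3prime_adapter("TGAGGTAGTAGGTTGTATAGTTAGATCGGAAGAGCACAC", ADAPTER))
```

Requiring at least `min_overlap` bases avoids trimming a read just because its last one or two bases happen to match the start of the adapter; the default of 3 is an arbitrary choice for this sketch.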
07-05-2011, 12:36 PM   #3
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871

If they're in the overrepresented sequences, then it probably means that your library was contaminated with primer dimers. If you were reading through into the adapter, you'd get slightly different sequences (the adapter would start at a different position in each read), so they would appear in the Kmer plot rather than in the overrepresented sequences.
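
To make the distinction concrete, here is a small Python sketch that counts both whole read sequences and k-mers. Identical primer-dimer reads pile up as one exact sequence, whereas read-through reads start the adapter at different positions, so they are distinct as whole reads but share the adapter's k-mers. The helper names and the 0.1% default cut-off are assumptions for this sketch, not FastQC's actual implementation.

```python
from collections import Counter


def overrepresented_sequences(reads, min_fraction=0.001):
    """Return {sequence: count} for whole reads above min_fraction of all reads."""
    total = len(reads)
    if total == 0:
        return {}
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n / total > min_fraction}


def kmer_counts(reads, k=5):
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts
```

With a few identical dimer reads plus a few read-through reads, only the dimer shows up in `overrepresented_sequences`, while an adapter k-mer such as `AGATC` is counted in both kinds of read by `kmer_counts`.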

There's not much you can do about the dimers for the data you've already collected. They're easy enough to filter out if you want to remove them, but they probably won't map to whichever genome you're using anyway.
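
If you do want to filter them out, a minimal Python sketch might look like the following. It assumes plain 4-line FASTQ records and treats a read as a primer dimer when an exact copy of the adapter starts within the first `min_insert` bases (i.e. there is little or no insert before it). The function names, the 15-base cut-off and the example adapter string are all hypothetical choices for illustration, not part of any particular tool.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def is_primer_dimer(seq: str, adapter: str, min_insert: int = 15) -> bool:
    """True if the adapter starts before min_insert bases of insert."""
    pos = seq.find(adapter)
    return pos != -1 and pos < min_insert


def filter_dimers(in_handle: TextIO, out_handle: TextIO, adapter: str,
                  min_insert: int = 15) -> int:
    """Copy non-dimer records to out_handle; return how many were dropped."""
    dropped = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        if is_primer_dimer(seq, adapter, min_insert):
            dropped += 1
            continue
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
    return dropped


if __name__ == "__main__":
    demo = ("@r1\nAGATCGGAAGAGCACAC\n+\n" + "I" * 17 + "\n"
            "@r2\nTGAGGTAGTAGGTTGTATAGTTAGATCGGAAGAGC\n+\n" + "I" * 35 + "\n")
    out = io.StringIO()
    print("dropped:", filter_dimers(io.StringIO(demo), out, "AGATCGGAAGAGC"))
    print(out.getvalue(), end="")
```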

You can normally spot this kind of contamination by doing a Bioanalyzer run on your library before sequencing. Others here are better qualified than me to advise on how you might avoid them in the first place.
07-05-2011, 06:18 PM   #4
DZhang
Senior Member
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Another source of over-represented sequences is reads from rRNA genes. Even with rRNA removal, you often still have a high level of rRNA reads.
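
One rough way to check how much of a library looks like rRNA is to count reads that share at least one k-mer with an rRNA reference. The Python sketch below assumes you supply the rRNA sequences for your organism yourself, uses an arbitrary default k of 21, and indexes both strands; the helper names are hypothetical, and this is only an illustration of the idea, not a substitute for aligning against an rRNA database.

```python
# Complement table for DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_rrna_like(reads, rrna_refs, k=21):
    """Fraction of reads sharing at least one k-mer with the rRNA references.

    Both strands of each reference are indexed, since reads may come from
    either strand depending on the library protocol.
    """
    ref_kmers = set()
    for ref in rrna_refs:
        ref = ref.upper()
        ref_kmers |= kmer_set(ref, k)
        ref_kmers |= kmer_set(revcomp(ref), k)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads
               if not kmer_set(read.upper(), k).isdisjoint(ref_kmers))
    return hits / len(reads)
```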

Douglas
www.contigexpress.com

Tags
adapter, fastqc, primer, rnaseq

All times are GMT -8.