SEQanswers

11-01-2014, 08:21 AM   #1
ysnapus
Member
 
Location: Mid-Atlantic

Join Date: Jun 2013
Posts: 22
MiSeq gDNA reads still fail "Kmer content" and "per base seq content" after trimming

I have genomic DNA that was paired-end (PE) sequenced on the MiSeq platform. I understand there must have been some adapter read-through because of the long read lengths. Even after trimming, I still get some enriched kmers and skewed GC content at both ends of both reads of each pair. Here are some Kmer content graphs: [image links not preserved]

Here are some examples of per base GC content: [image links not preserved]


I ran trimmomatic with
PE -phred33 ILLUMINACLIP:TruSeq2-PE.fa:2:20:7:2 LEADING:13 TRAILING:13 SLIDINGWINDOW:4:15 MINLEN:36
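
For readers unfamiliar with the colon-separated ILLUMINACLIP fields, here is a small sketch that labels them. The field order follows the Trimmomatic manual (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, then the optional minimum adapter length and keepBothReads). The helper name `parse_illuminaclip` and the dictionary keys are made up for illustration and are not part of Trimmomatic.

```python
# Hypothetical helper: label the fields of a Trimmomatic ILLUMINACLIP step.
# The field order is taken from the Trimmomatic manual; the key names are ours.
ILLUMINACLIP_FIELDS = (
    "adapter_fasta",
    "seed_mismatches",
    "palindrome_clip_threshold",
    "simple_clip_threshold",
    "min_adapter_length",  # optional
    "keep_both_reads",     # optional
)


def parse_illuminaclip(step):
    """Split an 'ILLUMINACLIP:...' step into a dict of named fields.

    Assumes the adapter file path itself contains no ':' character.
    """
    name, *values = step.split(":")
    if name != "ILLUMINACLIP":
        raise ValueError("not an ILLUMINACLIP step: " + step)
    if not 4 <= len(values) <= len(ILLUMINACLIP_FIELDS):
        raise ValueError("unexpected number of fields in: " + step)
    parsed = dict(zip(ILLUMINACLIP_FIELDS, values))
    # Numeric fields become ints; the file name and any flag stay as text.
    return {key: int(val) if val.isdigit() else val for key, val in parsed.items()}


settings = parse_illuminaclip("ILLUMINACLIP:TruSeq2-PE.fa:2:20:7:2")
for field, value in settings.items():
    print(field, "=", value)
```

Read this way, the command above uses a simple clip threshold of 7 and a minimum adapter length of 2 for palindrome mode.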

My adapter file
$ cat TruSeq2-PE.fa
>PrefixPE/1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>PCR_Primer1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
>PCR_Primer2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>PCR_Primer2_rc
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
>FlowCell1
TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC
>FlowCell2
TTTTTTTTTTCAAGCAGAAGACGGCATACGA
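
As a sanity check on the adapter file, the sketch below confirms that each `_rc` entry is the reverse complement of its primer and prints the bases the two `_rc` entries share at their 5' end. The sequences are copied from the file above; the function name `revcomp` is just an illustrative helper.

```python
import os

# Sequences copied from the TruSeq2-PE.fa listing above.
adapters = {
    "PCR_Primer1": "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PCR_Primer1_rc": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT",
    "PCR_Primer2": "CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT",
    "PCR_Primer2_rc": "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


# Each *_rc entry should equal the reverse complement of its primer.
for name in ("PCR_Primer1", "PCR_Primer2"):
    print(name, revcomp(adapters[name]) == adapters[name + "_rc"])

# Bases shared by the start of both *_rc entries.
shared_prefix = os.path.commonprefix(
    [adapters["PCR_Primer1_rc"], adapters["PCR_Primer2_rc"]]
)
print(shared_prefix)
```

When the insert is shorter than the read, the read runs past the insert into the reverse-complemented adapter, so the shared prefix is what read-through would be expected to look like at the 3' end of either read in a TruSeq library.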
11-01-2014, 08:55 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

First, do you know what kind of library prep was used? If it was Nextera, that would explain the biased sequence near the beginning, and also why some adapters are not being removed, since you're trimming for TruSeq sequences. But if it was in fact TruSeq, then I'm not really sure about the biased composition near the beginning.

Unfortunately, because of the way FastQC groups the base positions into bins after base 9, it's impossible to get a good idea of what's going on at the end of the read from those graphs. But note that typical adapter trimming will not remove adapters shorter than X bp at the very end of a read, because a fragment that short cannot be matched to the adapter sequence confidently (X is usually a parameter). However, BBDuk can still remove those very short adapter sequences from PE reads by overlapping the two reads of a pair to determine the insert size, so you might give that a try; just use the "tbo" flag.
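
To illustrate the idea of trimming by overlap, here is a minimal sketch. It is not BBDuk's implementation; the function names, the `min_insert` and `max_mismatches` parameters, and the example sequences are all assumptions made for illustration. It relies only on the geometry of a paired-end fragment: if the insert is `n` bases long and shorter than the reads, the first `n` bases of read 1 are the reverse complement of the first `n` bases of read 2, and anything past base `n` is adapter, however short.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def trim_by_overlap(read1, read2, min_insert=12, max_mismatches=1):
    """Trim both reads to the inferred insert length, if one is found.

    Hypothetical parameters: candidate inserts shorter than `min_insert`
    are ignored to avoid chance matches, and up to `max_mismatches`
    sequencing errors are tolerated in the overlap.
    """
    longest = min(len(read1), len(read2))
    # Try the longest candidate insert first, which trims the least.
    for n in range(longest, min_insert - 1, -1):
        if mismatches(read1[:n], revcomp(read2[:n])) <= max_mismatches:
            return read1[:n], read2[:n]
    return read1, read2  # no short insert detected; leave the pair alone


# Made-up 30 bp insert sequenced with 36 bp reads, so each read ends in
# 6 bases of adapter, too few for most sequence-matching trimmers.
insert = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
adapter_start = "AGATCGGAAGAGC"
read1 = (insert + adapter_start)[:36]
read2 = (revcomp(insert) + adapter_start)[:36]

trimmed1, trimmed2 = trim_by_overlap(read1, read2)
print(len(read1), "->", len(trimmed1))
print(len(read2), "->", len(trimmed2))
```

A real tool would also weigh base qualities and handle pairs whose inserts are longer than the reads; this sketch only covers the short-insert case described above.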
11-12-2014, 12:07 AM   #3
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

Just trim off the ends.
It's probably less of a headache than trying to figure out the problem.

For the high GC at the end: it seems that, in general, longer reads are more likely to end in GC than in AT.
So if your reads are of unequal length, you'll just get an increase in GC content at the end, because the AT is more likely to have been trimmed away.
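
One way to look into this is to tabulate, for each position, how many reads are still long enough to cover it alongside the GC fraction. The sketch below does that for a list of read strings; `gc_by_position` and the toy reads are invented for illustration, and the output only shows the bookkeeping, not the cause of any bias.

```python
def gc_by_position(reads):
    """Return a list of (reads_covering, gc_fraction), one entry per position.

    Positions near the 3' end are covered only by the reads that were
    trimmed the least, so their GC fraction describes a subset of the data.
    """
    max_len = max((len(r) for r in reads), default=0)
    profile = []
    for i in range(max_len):
        bases = [r[i] for r in reads if len(r) > i]
        gc = sum(b in "GC" for b in bases)
        profile.append((len(bases), gc / len(bases)))
    return profile


# Toy reads of unequal length, standing in for quality-trimmed reads.
toy_reads = ["ACGT", "ACG", "AC"]
for pos, (n_reads, gc) in enumerate(gc_by_position(toy_reads), start=1):
    print(pos, n_reads, round(gc, 2))
```

If the read count drops sharply over the last positions, a shift in composition there may reflect which reads survived trimming rather than a property of the library.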
11-12-2014, 02:56 AM   #4
avo
Member
 
Location: Germany

Join Date: Sep 2013
Posts: 14

Quote:
Originally Posted by ysnapus
I ran trimmomatic with
PE -phred33 ILLUMINACLIP:TruSeq2-PE.fa:2:20:7:2 LEADING:13 TRAILING:13 SLIDINGWINDOW:4:15 MINLEN:36

I agree with Brian. Are you sure it is a TruSeq2 library? We often see this kind of sequence content plot for Nextera libraries. In that case you should just use the NexteraPE-PE.fa adapter file.
11-12-2014, 07:25 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,169

Quote:
Originally Posted by avo
I agree with Brian. Are you sure it is a TruSeq2 library? We often see this kind of sequence content plot for Nextera libraries. In that case you should just use the NexteraPE-PE.fa adapter file.

It definitely looks like a TruSeq (or other mechanically fragmented) library to me. Nextera (tagmentase-fragmented) libraries have a very distinct and more exaggerated base composition bias at the 5' end. TruSeq or other libraries in which the input DNA is fragmented in a Covaris still show a slight bias in their 5' base composition, because base composition influences fragmentation sensitivity.
Tags
adapter, fastqc, illumina, miseq, trimmomatic

All times are GMT -8.