#1 |
Member
Location: Mid-Atlantic Join Date: Jun 2013
Posts: 22
I have genomic DNA that was PE sequenced on the MiSeq platform. I understand there must've been some adapter read-through due to the large read sizes. Even after trimming, I still get some enriched k-mers and skewed GC content on either end of both pairs of reads. Here are some k-mer content graphs:

[six k-mer content graphs; images not preserved]

Here are some examples of per base GC content:

[two per base GC content graphs; images not preserved]

I ran trimmomatic with

    PE -phred33 ILLUMINACLIP:TruSeq2-PE.fa:2:20:7:2 LEADING:13 TRAILING:13 SLIDINGWINDOW:4:15 MINLEN:36

My adapter file

    $ cat TruSeq2-PE.fa
    >PrefixPE/1
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
    >PrefixPE/2
    CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
    >PCR_Primer1
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
    >PCR_Primer1_rc
    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
    >PCR_Primer2
    CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
    >PCR_Primer2_rc
    AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
    >FlowCell1
    TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC
    >FlowCell2
    TTTTTTTTTTCAAGCAGAAGACGGCATACGA
#2 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
First, do you know what kind of library prep was used? If it was Nextera, that would explain the biased sequence near the beginning, and also why some adapters are not being removed, since you're trimming for TruSeq sequences. But if it was in fact TruSeq, then I'm not really sure about the biased composition near the beginning.
Unfortunately, because of the way FastQC compresses the base positions after base 9, it's impossible to get a good idea of what's going on at the end of the read from those graphs. But note that typical adapter trimming will not remove adapters shorter than X bp at the very end, because the remaining adapter fragment is too short to match confidently (X is usually a parameter). However, BBDuk can still remove those very short adapter sequences from PE reads by overlapping the two reads to determine the insert size, so you might give that a try; just use the "tbo" flag.
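
To make the overlap idea concrete, here is a minimal Python sketch. It is not BBDuk's actual algorithm; the function names and the `min_overlap` and `max_mismatch_rate` parameters are hypothetical choices for illustration. The assumption is that when the insert is shorter than the read length, read 1 starts with the insert and the reverse complement of read 2 ends with it, so lining those two up gives the insert size, and everything past that point is adapter no matter how short it is.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def insert_size_by_overlap(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Infer the insert size for a read-through pair, or return None.

    Only the read-through geometry is checked: the first `size` bases of
    read 1 are compared with the last `size` bases of reverse-complemented
    read 2. Candidate sizes are tried from longest to shortest.
    """
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    for size in range(longest, min_overlap - 1, -1):
        head = read1[:size]
        tail = rc2[len(rc2) - size:]
        mismatches = sum(a != b for a, b in zip(head, tail))
        # Hypothetical tolerance: allow a fixed fraction of mismatches.
        if mismatches <= int(max_mismatch_rate * size):
            return size
    return None


def trim_by_overlap(read1, read2, **kwargs):
    """Cut both reads back to the inferred insert size, if one is found."""
    size = insert_size_by_overlap(read1, read2, **kwargs)
    if size is None:
        return read1, read2
    return read1[:size], read2[:size]
```

Because the adapter is located from the pair geometry rather than by matching the adapter sequence itself, even a 1 to 6 bp adapter remnant at the 3' end can be removed this way. A real tool would also need to handle quality values and low-complexity sequence, which this sketch ignores.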
#3 |
Member
Location: Germany/Netherlands Join Date: Feb 2014
Posts: 98
Just trim off the ends.
It is probably less of a headache than trying to figure out the problem. For the high GC at the end: in general, longer reads seem to have a higher chance of ending in GC rather than AT. So if your reads are of unequal length, you'll just get an increase of GC content at the end, because the AT is more likely to have been removed.
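
For what "just trim off the ends" could look like in practice, here is a minimal Python sketch, assuming plain 4-line FASTQ records. `hard_trim` and `trim_fastq_lines` and their `head`/`tail` parameters are hypothetical names. Trimmomatic's own HEADCROP step (remove a fixed number of bases from the start) and CROP step (cut reads down to a fixed length) cover similar ground.

```python
def hard_trim(seq, qual, head=0, tail=0):
    """Remove `head` bases from the 5' end and `tail` bases from the 3' end."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq) - tail
    if end <= head:
        # Nothing is left after trimming.
        return "", ""
    return seq[head:end], qual[head:end]


def trim_fastq_lines(lines, head=0, tail=0):
    """Yield (name, seq, plus, qual) tuples with both ends hard-trimmed.

    `lines` is any iterable of FASTQ lines, four per record. No reads are
    dropped, so mates in a paired-end run stay in the same order.
    """
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        name, seq, plus, qual = records[i:i + 4]
        seq, qual = hard_trim(seq, qual, head, tail)
        yield name, seq, plus, qual
```

Since reads are only shortened and never discarded here, any minimum-length filtering would still have to be applied to both mates together afterwards.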
#4 |
Member
Location: Germany Join Date: Sep 2013
Posts: 14
I agree with Brian. Are you sure it is a TruSeq2 library? We often see this kind of sequence content plot for Nextera libraries. In that case you should just use the NexteraPE-PE.fa adapter file.
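
One quick way to check which prep was used is to count adapter hits in the untrimmed reads. The sketch below is a rough illustration under stated assumptions: the TruSeq motif is the shared start of the `_rc` entries in the adapter file posted above, the Nextera motif is the commonly used Nextera transposase trimming sequence, and the helper names and the `limit` parameter are hypothetical.

```python
from itertools import islice

# Short adapter motifs expected at the 3' end of read-through reads.
ADAPTER_MOTIFS = {
    "TruSeq": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATACACATCT",
}


def count_adapter_hits(reads, motifs=ADAPTER_MOTIFS):
    """Count how many reads contain each motif as an exact substring."""
    counts = {name: 0 for name in motifs}
    for read in reads:
        for name, motif in motifs.items():
            if motif in read:
                counts[name] += 1
    return counts


def read_sequences(fastq_path, limit=100000):
    """Yield up to `limit` sequences from an uncompressed 4-line FASTQ file."""
    with open(fastq_path) as handle:
        # The sequence is the second line of every 4-line record.
        for line in islice(handle, 1, limit * 4, 4):
            yield line.strip()
```

If one motif turns up in a clear fraction of the raw reads and the other in almost none, that points to the matching adapter file. Exact substring matching will miss adapters with sequencing errors, so the counts are a lower bound.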
#5 |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
It definitely looks like a TruSeq (or other mechanically fragmented) library to me. Nextera (tagmentase-fragmented) libraries have a very distinct and more exaggerated base composition bias at the 5' end. TruSeq or other libraries in which the input DNA is fragmented in a Covaris still show a slight bias in their 5' base composition, because base composition influences fragmentation sensitivity.
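
To look at the 5' (or 3') composition bias position by position, without any binning of base positions, per-position base fractions can be tallied directly from the reads. This is a minimal sketch; `base_composition` and its parameters are hypothetical names, and reads are assumed to be plain uppercase sequence strings.

```python
from collections import Counter


def base_composition(reads, n_positions=20, from_end=False):
    """Return one {base: fraction} dict per position.

    With from_end=False, position 0 is the first base of each read.
    With from_end=True, position 0 is the last base of each read, which
    keeps 3' ends aligned when reads have unequal lengths after trimming.
    """
    counters = [Counter() for _ in range(n_positions)]
    for read in reads:
        seq = read[::-1] if from_end else read
        for pos, base in enumerate(seq[:n_positions]):
            counters[pos][base] += 1
    result = []
    for counter in counters:
        total = sum(counter.values())
        if total:
            result.append({base: counter[base] / total for base in "ACGTN"})
        else:
            # No read was long enough to cover this position.
            result.append({})
    return result
```

Summing the `G` and `C` fractions at each position gives an unbinned per-base GC profile, and running it with `from_end=True` shows the last few bases of the reads individually.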
Tags: adapter, fastqc, illumina, miseq, trimmomatic