**GenoMax** · 07-09-2013, 03:23 AM

Are the inserts smaller than the read lengths (150 cycles in this case)? If this is a paired-end experiment, you can easily see whether the two reads overlap to a large degree by using a tool such as FLASH.

**kirstyn** · 07-10-2013, 02:20 AM

Thanks for your reply. The average size of my libraries is usually >350 bp, which I assumed was OK for 150 bp PE reads? Looking in Tablet, there is a large degree of overlap between paired reads. Is this indicative of the library being too short? I am mapping the reads to a reference sequence and looking for SNPs, so I am not worried about overlap in the read pairs. How would this affect the 3' end of my reads?

**GenoMax** · 07-10-2013, 03:23 AM

Originally posted by kirstyn:

> Thanks for your reply. The average size of my libraries is usually >350 bp, which I assumed was OK for 150 bp PE reads? Looking in Tablet, there is a large degree of overlap between paired reads. Is this indicative of the library being too short? I am mapping the reads to a reference sequence and looking for SNPs, so I am not worried about overlap in the read pairs. How would this affect the 3' end of my reads?

You can easily determine how big the inserts are by looking at the extent of overlap (it sounds like a fraction of your library is nowhere near the expected 350 bp size). If some of the inserts are smaller than 150 bp, then you will start reading into the adapter at the other end and beyond. If these reads are not aligning well at the 3' end, then you may need to trim them.
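To make the arithmetic behind that reasoning concrete, here is a minimal Python sketch. The function name `overlap_and_readthrough` is hypothetical, and it assumes both mates have the same length and that "insert size" means the fragment length without adapters.

```python
def overlap_and_readthrough(insert_size, read_length):
    """Return (overlap_bp, adapter_bp_per_read) for one paired-end fragment.

    Assumes both mates are read_length long and insert_size excludes adapters.
    """
    # The two mates start at opposite ends of the fragment. They overlap once
    # the fragment is shorter than 2 * read_length, and the overlap can never
    # exceed the fragment itself.
    overlap = max(0, min(insert_size, 2 * read_length - insert_size))
    # Once the fragment is shorter than a single read, the remaining cycles
    # run into the adapter at the far end.
    adapter = max(0, read_length - insert_size)
    return overlap, adapter


# 150 bp paired-end reads against three fragment sizes
for size in (350, 200, 120):
    print(size, overlap_and_readthrough(size, 150))
```

With 150 bp reads, a 350 bp fragment gives no overlap, a 200 bp fragment gives 100 bp of overlap with no adapter, and a 120 bp fragment is fully covered by both mates with 30 adapter bases at the 3' end of each read.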

**kmcarr** · 07-10-2013, 08:27 AM

Originally posted by kirstyn:

> The first 20 or so bases at the 5' end are due to Nextera, which I just trim off

I know this wasn't the question you were asking but it's not really necessary to trim off those 5' bases. The sequence is not incorrect. It simply represents the slight bias that the Nextera tagmentase has for certain sequence composition.

**kirstyn** · 07-12-2013, 01:24 AM

Originally posted by kmcarr:

> I know this wasn't the question you were asking but it's not really necessary to trim off those 5' bases. The sequence is not incorrect. It simply represents the slight bias that the Nextera tagmentase has for certain sequence composition.

Yes, thanks for that comment. I wasn't quite sure whether I should trim off the 5' bases, especially since I am mapping my reads, but I had read about both random hexamer and Nextera transposome bias, so I decided to trim! I think I will try it without trimming too!

**MU Core** · 12-19-2016, 07:15 AM

I observe the same 3' characteristic as previously reported in this thread. A representative plot is provided. The plot shown is a DNA library with insert size >350bp. However, we see this in all our FastQC plots. It is independent of library type, instrument (NextSeq or HiSeq), or read length (50, 75, or 100 bases). Therefore, I'm not inclined to see this as a library prep/chemistry issue. Has anyone also encountered this characteristic and identified a reason? Thank you in advance for comments.

Attached file: plot.jpg (23.5 KB)

**nucacidhunter** · 12-19-2016, 08:39 AM

It is either a library issue or possibly a demultiplexing issue. Could you post plots from other runs with similar pattern(s), along with the library electropherograms?

**Michael.Ante** · 12-19-2016, 08:45 AM

Do you also see this pattern with the "--nogroup" option?
Without that option the bases are binned, which can make the distribution look smoother than it is. The last base shown in your figure is just a single bin.
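As a rough illustration of why binning matters, the Python sketch below computes per-cycle base fractions from a handful of made-up reads and then averages two cycles into one bin. The function names and the toy reads are hypothetical. This is not how FastQC computes its plot, only the general idea.

```python
from collections import Counter


def per_position_composition(reads):
    """Return one dict per cycle mapping base -> fraction of reads."""
    counts = []
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return [{b: n / sum(c.values()) for b, n in c.items()} for c in counts]


def binned_fraction(composition, start, end, base):
    """Average the fraction of `base` over cycles start..end (end exclusive)."""
    window = composition[start:end]
    return sum(c.get(base, 0.0) for c in window) / len(window)


# Toy reads: balanced composition except for the last cycle, which is all A.
toy_reads = ["ACGTA", "CGTAA", "GTACA", "TACGA"]
comp = per_position_composition(toy_reads)
print(comp[4]["A"])                      # last cycle on its own
print(binned_fraction(comp, 3, 5, "A"))  # last two cycles averaged together
```

In this toy data the last cycle is 100% A on its own, but drops to 62.5% A when averaged with the cycle before it, so a spike confined to one unbinned position stands out more than the same spike inside a bin.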

**Brian Bushnell** · 12-19-2016, 08:49 AM

Originally posted by MU Core:

> I observe the same 3' characteristic as previously reported in this thread. A representative plot is provided. The plot shown is a DNA library with insert size >350bp. However, we see this in all our FastQC plots. It is independent of library type, instrument (NextSeq or HiSeq), or read length (50, 75, or 100 bases). Therefore, I'm not inclined to see this as a library prep/chemistry issue. Has anyone also encountered this characteristic and identified a reason? Thank you in advance for comments.

Are you using Nextera for fragmentation? That's been identified as the cause of severe bias on the 5' (left) end. However, yours looks very sedate, so if I were to guess, I would say this is NOT a Nextera library. Have you looked into the empirical error rates (from mapping) of the left end to see if there is a corresponding increase? That will indicate whether this is bias, or an actual base-calling/non-genomic sequence issue.

The 3' end is just showing the normal Illumina biased/low-quality last base due to a lack of a subsequent base call needed for calibration; I always trim the last base in 76/101/151/etc. runs.
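For anyone who wants to see what that last-base trim amounts to, here is a minimal Python sketch. The function name `trim_extra_cycle` and the set of nominal lengths are assumptions chosen for illustration. It only touches reads that are exactly one cycle longer than a nominal length.

```python
def trim_extra_cycle(seq, qual, nominal_lengths=(75, 100, 150)):
    """Drop the final base of a read that is one cycle longer than nominal.

    nominal_lengths is an assumed list; adjust it to the run configuration.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) - 1 in nominal_lengths:
        return seq[:-1], qual[:-1]
    return seq, qual


# A 76-cycle read loses its last base; a 75-cycle read is left alone.
seq76, qual76 = trim_extra_cycle("A" * 76, "I" * 76)
seq75, qual75 = trim_extra_cycle("A" * 75, "I" * 75)
print(len(seq76), len(seq75))
```

Reads that were already shortened by adapter or quality trimming are left untouched here, since their last base is no longer the final cycle of the run.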

**MU Core** · 12-19-2016, 10:11 AM

Here are a couple more examples. Sample M is that of a DNA PCR-free library sequenced on a HiSeq. Sample W is a TruSeq mRNA library sequenced on a NextSeq.

Brian's and Michael's suggestions both offer an explanation that I think accounts for these observations. It would also suggest that if the reads are trimmed before the FastQC report is generated, the bias at the 3' end will be removed. I'll give this a try and share the results.

Thank you again for your comments.

Attached Files

**Brian Bushnell** · 12-19-2016, 10:17 AM

Sample W looks very typical of Nextera. Trimming the 3' end is not recommended in these cases because the bases are correct. It will not change the bias, just hide the bias so that your FastQC report looks better.

**MU Core** · 12-19-2016, 10:42 AM

Brian, you were correct though that these libraries were not Nextera.

**Brian Bushnell** · 12-19-2016, 10:47 AM

Oh, that's odd, then. There are some other things like random-hexamer-primed libraries that also have similar issues. I think it would be worthwhile generating an error-rate histogram to verify whether the mismatch rate is increased in that region. You can do so with BBMap like this:

bbmap.sh in=reads.fq ref=ref.fa mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt

If the error rate is not increased, I recommend against trimming.
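One way to apply that rule of thumb is sketched below in Python. It assumes the per-position match and substitution counts have already been pulled out of a histogram such as the mhist file. The parsing step is left out because the column layout is not shown in this thread. The function names and the twofold threshold are arbitrary choices for illustration, not a recommendation from the posters.

```python
from statistics import median


def mismatch_rates(matches, substitutions):
    """Per-position substitution rate from parallel lists of counts."""
    rates = []
    for m, s in zip(matches, substitutions):
        total = m + s
        rates.append(s / total if total else 0.0)
    return rates


def elevated_positions(rates, fold=2.0):
    """Indices (0-based) whose rate exceeds `fold` times the median rate."""
    baseline = median(rates)
    return [i for i, r in enumerate(rates) if r > fold * baseline]


# Made-up counts for a 6-cycle read: only the last cycle has many mismatches.
matches = [990, 995, 996, 995, 994, 950]
subs = [10, 5, 4, 5, 6, 50]
rates = mismatch_rates(matches, subs)
print(elevated_positions(rates))
```

If the positions with skewed base composition do not show up in this list, the skew is more likely sequence bias than base-calling error, which is the case where trimming is not advised above.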

**nucacidhunter** · 12-19-2016, 10:00 PM

I have seen this pattern in low diversity amplicons only and their FastQC pattern matches the Data By Cycle (%Base) in SAV of run.

Strange fastqc per base sequence content 3'end

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News