SEQanswers

Old 11-25-2016, 10:05 AM   #1
hamcan
Member
 
Location: Toronto

Join Date: Nov 2016
Posts: 19
FastQC Report

I sequenced environmental samples on a HiSeq. The purpose of the run was to BLAST my sequences against the NCBI nr database to see which species my reads match. I am not doing a de novo assembly or genome assembly.

My FastQC report passes every module except per base sequence content, per sequence GC content and kmer content.
Should I be worried? How much should I rely on a fastqc report?

Thank you in advance!
Old 11-25-2016, 11:03 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

I would worry most about the per base sequence content, depending on what it looks like and why it didn't pass.
Old 11-25-2016, 01:32 PM   #3
hamcan
Member
 
Location: Toronto

Join Date: Nov 2016
Posts: 19
Default

Quote:
Originally Posted by mastal View Post
I would worry most about the per base sequence content, depending on what it looks like and why it didn't pass.
Below are the pictures of the FastQC failed reports:
Attached Images
File Type: png kmercontent.png (33.0 KB, 28 views)
File Type: png PersequenceGC.png (38.4 KB, 18 views)
File Type: png perbase.png (30.2 KB, 36 views)
Old 11-25-2016, 01:38 PM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

The per base sequence content looks OK, though you might need to trim the last few bases at the 3' ends of the reads. The kmer plot suggests there might also be adapters at the 3' ends of the reads.
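
To see which read positions drive a per base sequence content warning, the base fractions at each position can be tabulated directly. The following is only a minimal sketch on made-up reads, not FastQC's own implementation; the function name `per_base_composition` and the toy data are assumptions for illustration.

```python
from collections import Counter


def per_base_composition(reads):
    """Return one dict per read position giving the fraction of A/C/G/T/N."""
    if not reads:
        return []
    longest = max(len(read) for read in reads)
    counts = [Counter() for _ in range(longest)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            counts[pos][base] += 1
    profile = []
    for counter in counts:
        total = sum(counter.values())  # reads long enough to cover this position
        profile.append({base: counter[base] / total for base in "ACGTN"})
    return profile


# Toy reads, invented purely for illustration.
reads = ["ACGT", "ACGA", "TCGA", "ACGG"]
profile = per_base_composition(reads)
for pos, fractions in enumerate(profile, start=1):
    print(pos, fractions)
```

Positions where one base dominates at the 3' end of many reads are the ones worth inspecting for leftover adapter.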
Old 11-25-2016, 02:31 PM   #5
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

I would trim up the first 19 bps at the 5' end (which probably are the adapters) and trim the last 50 bps at the 3' end.

Also, I would suggest increasing the kmer length to k=10 in FastQC to get a better idea of how much to trim at the 3' end.

All the best with your project.

-Zapages
Old 11-25-2016, 03:01 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

Quote:
Originally Posted by Zapages View Post
I would trim up the first 19 bps at the 5' end (which probably are the adapters) and trim the last 50 bps at the 3' end.
I think not; Nextera libraries normally look like that at the beginning due to shearing bias, but the bases are correct. The 3' end looks like adapter sequence, though, and should be adapter-trimmed.
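
One rough way to check the suspicion about the 3' end is to count how many reads contain the adapter, either in full or as a truncated prefix at the very end of the read. This is a sketch only: it uses exact matching (real trimmers tolerate mismatches), the function names are hypothetical, and the adapter string is an assumption (the commonly published Nextera transposase sequence), since the thread does not state which adapters were actually used.

```python
# Assumption: Nextera transposase adapter; substitute the adapter for your kit.
ASSUMED_ADAPTER = "CTGTCTCTTATACACATCT"


def adapter_start(read, adapter, min_overlap=5):
    """Return the index where the adapter begins in the read, or -1 if absent.

    Looks for the full adapter anywhere, then for an adapter prefix of at
    least min_overlap bases sitting at the very 3' end of the read.
    """
    idx = read.find(adapter)
    if idx != -1:
        return idx
    longest = min(len(adapter) - 1, len(read))
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return len(read) - length
    return -1


def adapter_fraction(reads, adapter=ASSUMED_ADAPTER, min_overlap=5):
    """Fraction of reads with a detectable (exact-match) adapter."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if adapter_start(r, adapter, min_overlap) != -1)
    return hits / len(reads)


# Toy reads, invented for illustration.
toy_reads = [
    "ACGTACGT" + ASSUMED_ADAPTER,          # full adapter after 8 bases
    "ACGTACGTACGT" + ASSUMED_ADAPTER[:6],  # truncated adapter at the 3' end
    "ACGTACGTACGTACGT",                    # no adapter
]
print(adapter_fraction(toy_reads))
```

A high fraction on reads that were supposedly already trimmed would suggest the trimming step used a different adapter sequence or was incomplete.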
Old 11-25-2016, 04:42 PM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,961
Default

Quote:
Originally Posted by Zapages View Post
I would trim up the first 19 bps at the 5' end (which probably are the adapters) and trim the last 50 bps at the 3' end.

Also, I would suggest increasing the kmer length to k=10 in FastQC to get a better idea of how much to trim at the 3' end.

All the best with your project.

-Zapages
No trimming necessary. Refer to this post by Dr. Simon Andrews, author of FastQC.
Old 11-25-2016, 08:24 PM   #8
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

Quote:
Originally Posted by GenoMax View Post
No trimming necessary. Refer to this post by Dr. Simon Andrews, author of FastQC.
Very interesting development, and something I always wondered about when I was working on my data sets.

Quote:
Since the biased composition is created by the selection of sequencing fragments and not by base call errors the only effect of trimming would be to change from having a library which starts over biased positions, to having a library which starts slightly downstream of biased positions.

Thank you for sharing.

I did a lot of RNA-Seq analysis last year and earlier this year, and this was not known to me at the time. When I have free time, I will definitely go back and check some of my old results to see if there is any improvement in my differential expression results.

Quote:
Whilst the warnings generated by this problem reflect a real issue it's not something which can be fixed, and doesn't seem to have any serious consequences for downstream analysis. Ironically if you are producing RNA-Seq libraries it would make for better QC if you were to focus on libraries which didn't have this artefact in them, as they would be the ones which were truly suspicious.
I guess we should go with the more expensive PCR-free approaches: https://konradpaszkiewicz.wordpress....biased-genome/

Thoughts?

Would you recommend this approach for older data generated with TruSeq Library Prep kits, or for data with really messy 5' ends? Thinking back, I remember dealing with some pretty messy RNA-Seq data from Illumina HiSeq 2500 machines that had to be cleaned up. I will give my old results another look when I am free.
Old 11-27-2016, 06:03 PM   #9
hamcan
Member
 
Location: Toronto

Join Date: Nov 2016
Posts: 19
Default

Quote:
Originally Posted by Brian Bushnell View Post
I think not; Nextera libraries normally look like that at the beginning due to shearing bias, but the bases are correct. The 3' end looks like adapter sequence, though, and should be adapter-trimmed.
Hey, it was adapter-trimmed at the 3' end! So I'm not sure what is going on. Any suggestions?
Old 11-27-2016, 06:13 PM   #10
hamcan
Member
 
Location: Toronto

Join Date: Nov 2016
Posts: 19
Default

Quote:
Originally Posted by GenoMax View Post
No trimming necessary. Refer to this post by Dr. Simon Andrews, author of FastQC.
This would explain the 5' end, but what about the 3' end?
Old 11-28-2016, 09:28 AM   #11
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

You may have used the wrong adapter sequences, or simply had incomplete trimming. I suggest starting with the raw reads and performing adapter-trimming as in the post I linked, then looking at the results.
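
To make concrete what 3' adapter-trimming does to a raw read, here is a toy sketch that trims FASTQ records held in memory while keeping sequence and quality strings the same length. It is not a substitute for a dedicated trimmer: it matches exactly with no mismatch tolerance, the function names are hypothetical, and the adapter string is an assumed Nextera sequence rather than anything confirmed in this thread.

```python
# Assumption: Nextera transposase adapter; use the sequence for your own kit.
ASSUMED_ADAPTER = "CTGTCTCTTATACACATCT"


def trim_record(seq, qual, adapter, min_overlap=5):
    """Cut the adapter (full, or a prefix at the 3' end) and everything after it."""
    cut = seq.find(adapter)
    if cut == -1:
        longest = min(len(adapter) - 1, len(seq))
        for length in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:length]):
                cut = len(seq) - length
                break
    if cut == -1:
        return seq, qual  # nothing found, leave the read untouched
    return seq[:cut], qual[:cut]


def trim_fastq(lines, adapter=ASSUMED_ADAPTER, min_overlap=5):
    """Trim every 4-line FASTQ record in a list of lines; returns new lines."""
    records = [line.rstrip("\n") for line in lines]
    out = []
    for i in range(0, len(records) - 3, 4):
        header, seq, plus, qual = records[i:i + 4]
        seq, qual = trim_record(seq, qual, adapter, min_overlap)
        out.extend([header, seq, plus, qual])
    return out


# Two made-up records: one with adapter read-through, one without.
raw = [
    "@read1", "ACGTACGT" + ASSUMED_ADAPTER, "+", "I" * 27,
    "@read2", "ACGTACGTACGT", "+", "I" * 12,
]
trimmed = trim_fastq(raw)
print("\n".join(trimmed))
```

If the adapter passed to the trimmer does not match the one in the library, reads like `@read1` pass through unchanged, which would be consistent with adapter signal remaining in the kmer plot after trimming.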

Tags
fastqc, hiseq, illumina, kmer, sequencing analysis
