SEQanswers

Old 05-19-2017, 05:36 AM   #1
wb1016
Junior Member
 
Location: Uppsala

Join Date: Apr 2009
Posts: 5
FastQC Kmer, polyA, polyG, polyC, polyT...

Hi all,

I received Illumina reads from an NGS company for a bird genome. The FastQC quality-control report looks very odd: there are far too many homopolymer k-mers (see the attached image).

I have also run the assembly, and the result was an N50 of 150, which is essentially no assembly at all!

Could the sequencing reads themselves be the problem?

Thanks in advance!
Attached Images
File Type: png kmer_profiles.png (36.8 KB, 12 views)

Last edited by wb1016; 05-19-2017 at 06:12 AM.
Old 05-19-2017, 05:47 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

What do the other FastQC plots look like, the per base sequence quality, and the per base sequence content?

Have you done any adapter or quality trimming before doing the assembly?
Old 05-19-2017, 05:59 AM   #3
wb1016
Junior Member
 
Location: Uppsala

Join Date: Apr 2009
Posts: 5

The per base sequence content also seems problematic. The other plots are fine; the sequence quality in particular is very good.

I trimmed the adapters and removed the low-quality reads before assembly.


Quote:
Originally Posted by mastal View Post
What do the other FastQC plots look like, the per base sequence quality, and the per base sequence content?

Have you done any adapter or quality trimming before doing the assembly?
Attached Images
File Type: png per_base_sequence_content.png (13.6 KB, 10 views)

Last edited by wb1016; 05-19-2017 at 06:09 AM.
Old 05-19-2017, 06:19 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

That actually looks OK, as long as the %GC content is what you expect for that genome.
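
One way to check the %GC point above is to compute the GC content of the reads directly and compare it with the value expected for the genome. This is only a minimal sketch: the `expected` and `tolerance` values are hypothetical placeholders, not numbers from this thread, and the three example reads are made up.

```python
def gc_percent(sequences):
    """Percent of G/C among unambiguous (A, C, G, T) bases; N is ignored."""
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100 * gc / (gc + at)


# Made-up reads for illustration; in practice pass the sequence lines of a FASTQ file.
reads = ["ACGTACGT", "GGCCNNAT", "ATATATGC"]
observed = gc_percent(reads)

expected = 42.0   # hypothetical expected %GC, e.g. taken from a related species
tolerance = 5.0   # hypothetical acceptable deviation in percentage points
verdict = "as expected" if abs(observed - expected) <= tolerance else "unexpected"
print(f"observed GC = {observed:.1f}% ({verdict})")
```

FastQC reports an overall %GC in its Basic Statistics module, so the script is mainly useful for checking subsets of reads, for example before and after trimming.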
Old 05-19-2017, 06:26 AM   #5
wb1016
Junior Member
 
Location: Uppsala

Join Date: Apr 2009
Posts: 5

That's true. The GC content of bird genomes is always biased.

The only remaining issue is the poly k-mers. I don't know whether the sequencing run was problematic and whether that caused the assembly to fail.

Quote:
Originally Posted by mastal View Post
That actually looks OK, as long as the %GC content is what you expect for that genome.
Old 05-19-2017, 06:49 AM   #6
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

Can you try reference-guided assembly with a related bird genome?
Old 05-19-2017, 06:55 AM   #7
wb1016
Junior Member
 
Location: Uppsala

Join Date: Apr 2009
Posts: 5

I invested a lot of money in building seven genome libraries (including very large ones) for this project, and it was meant to be a de novo sequencing project. I am really upset about this problem...

Quote:
Originally Posted by mastal View Post
Can you try reference-guided assembly with a related bird genome?
Old 05-19-2017, 07:12 AM   #8
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

What assembler or assemblers have you used, and what coverage do you have? For some assemblers too high coverage also leads to problems.
Old 05-19-2017, 08:31 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,583

It is still strange that you have a lot of poly-N type reads. Have you checked what percentage of the total data they make up, both overall and per base (%A, %G, etc.)? What cutoff did you use for trimming "low quality" reads?
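
A rough way to get those percentages is to count the reads that are dominated by a single base. The sketch below assumes plain 4-line FASTQ records; the 90% threshold (`min_fraction`) and the in-memory example reads are arbitrary choices for illustration, not anything FastQC itself uses.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def homopolymer_fractions(sequences, min_fraction=0.9):
    """Return (total_reads, Counter of reads dominated by one base).

    A read counts as "poly-X" when a single base makes up at least
    min_fraction of its length (an arbitrary threshold for illustration).
    Empty reads are skipped and not counted in the total.
    """
    total = 0
    dominated = Counter()
    for seq in sequences:
        if not seq:
            continue
        total += 1
        base, count = Counter(seq).most_common(1)[0]
        if count / len(seq) >= min_fraction:
            dominated[base] += 1
    return total, dominated


# Tiny in-memory example; for real data use open("reads.fastq") instead
# ("reads.fastq" is a hypothetical file name).
example = io.StringIO(
    "@r1\nAAAAAAAAAA\n+\nIIIIIIIIII\n"
    "@r2\nGGGGGGGGGT\n+\nIIIIIIIIII\n"
    "@r3\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r4\nTTTTTTTTTT\n+\nIIIIIIIIII\n"
)
total, dominated = homopolymer_fractions(read_fastq_sequences(example))
for base in "ACGTN":
    print(f"poly-{base}: {100 * dominated[base] / total:.1f}% of {total} reads")
```

Running this separately on each library, and on the data before and after trimming, would show whether the poly-N reads are a small contaminating fraction or a large share of the data.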
Old 05-19-2017, 07:21 PM   #10
wb1016
Junior Member
 
Location: Uppsala

Join Date: Apr 2009
Posts: 5

Quote:
Originally Posted by mastal View Post
What assembler or assemblers have you used, and what coverage do you have? For some assemblers too high coverage also leads to problems.
SOAPdenovo2 was the assembler.

Since I have no reference to align to, the coverage remains unclear.
Old 05-20-2017, 03:16 AM   #11
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 662

You could get a rough estimate, although it could turn out wrong. Use the genome sizes of the most closely related bird species with known genomes as a guide.

You said you had made several libraries of varying sizes. Do all the data have this problem, or is this just one data set that has this problem?
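
The rough coverage estimate mentioned above is just total sequenced bases divided by genome size. A minimal sketch follows; every number in it is a placeholder rather than a value from this thread, and the genome size would have to be borrowed from a related species.

```python
def estimate_coverage(num_reads, read_length, genome_size):
    """Expected average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# All numbers below are hypothetical placeholders, not values from this thread.
num_reads = 400_000_000        # total reads, counting both mates of each pair
read_length = 150              # read length in bp
genome_size = 1_200_000_000    # genome size in bp, borrowed from a related species

coverage = estimate_coverage(num_reads, read_length, genome_size)
print(f"estimated coverage: ~{coverage:.0f}x")
```

The estimate assumes all reads are usable, so it is best computed on the trimmed data and treated as an upper bound.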