SEQanswers

Old 06-24-2011, 07:41 AM   #1
zippered_ohio
Junior Member
 
Location: USA

Join Date: Jun 2011
Posts: 6
[Galaxy] Strange QC Nucleotides Distribution Chart

Hi all,

I am new to the forum but excited about the wealth of knowledge available. I am working on an NGS project in Galaxy using data from an Illumina HiSeq 2000. The first step of my workflow uses the FASTQ Groomer to convert the raw, paired-end Illumina .fastq files into .fastqsanger. Then I run the FASTQ Summary Statistics tool and feed its output to "Draw nucleotides distribution chart". The resulting chart can be seen here.

Have any of you seen anything like this? My intuition is that the problem lies in the Illumina-to-Sanger grooming step, but I am very new to the field. If there is any other information I can provide to help diagnose the problem, please let me know.
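For context on what the grooming step does: Illumina pipelines 1.3 to 1.7 encode quality as Phred+64, while Sanger FASTQ uses Phred+33, so grooming shifts every quality character down by 31. Below is a minimal sketch of that shift plus a rough check of which encoding a file uses. The function names are made up for illustration and this is not Galaxy's implementation. The check relies on the fact that characters below ';' (ASCII 59) cannot occur in the older +64 encodings; a result of 64 is only a guess, since high-quality Phred+33 data can also lack low characters.

```python
def guess_offset(quality_strings):
    """Heuristic: return 33 if any character can only occur in Phred+33 data.

    Returning 64 is a guess, not proof (assumption: the sample contains
    at least some low-quality bases).
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    return 33 if lowest < 59 else 64


def illumina_to_sanger(quality):
    """Shift a Phred+64 quality string to Phred+33 (subtract 31 per character)."""
    return "".join(chr(ord(ch) - 64 + 33) for ch in quality)


if __name__ == "__main__":
    sample = ["hhhhB", "ggfeB"]            # made-up Phred+64 quality strings
    print(guess_offset(sample))
    print([illumina_to_sanger(q) for q in sample])
```

If the raw files already look like Phred+33, grooming them as "Illumina 1.3+" would be the wrong setting, which is one thing worth ruling out.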

Thanks.
Old 06-25-2011, 07:26 AM   #2
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Hi,

Please run the same check on the raw .fastq files and see whether the result looks the same. You could use FastQC, an easy-to-use tool that reports a lot of quality-related information.
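If you want to look at the raw files without any tool in between, the nucleotide distribution is just a count of each base at each read position. A minimal sketch, assuming plain four-line FASTQ records (no wrapped sequences); the function name is hypothetical and this is not how FastQC or Galaxy compute it internally:

```python
from collections import Counter


def base_counts_per_position(fastq_lines):
    """Count bases at each read position from an iterable of FASTQ lines.

    Assumes every record is exactly four lines, with the sequence on the
    second line of each record.
    """
    counts = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:          # keep only the sequence line of each record
            continue
        seq = line.strip().upper()
        while len(counts) < len(seq):
            counts.append(Counter())
        for pos, base in enumerate(seq):
            counts[pos][base] += 1
    return counts


if __name__ == "__main__":
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "AGGN", "+", "IIII"]
    for pos, c in enumerate(base_counts_per_position(demo), start=1):
        print(pos, dict(c))
```

Comparing these counts for the raw and the groomed file would show whether the odd distribution is already in the data or appears after conversion.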

Douglas
www.contigexpress.com
Old 06-25-2011, 03:18 PM   #3
zippered_ohio
Junior Member
 
Location: USA

Join Date: Jun 2011
Posts: 6

Hi Douglas,

I took the first 10,000 lines of one of the files and ran it through FastQC. Here is the result.
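For anyone repeating this: 10,000 lines is 2,500 reads, since each FASTQ record is four lines. A minimal sketch of taking such a subset, assuming four-line records; the function name and the file names in the usage part are placeholders:

```python
from itertools import islice


def head_lines(lines, n=10000):
    """Return the first n lines, requiring n to be a whole number of records."""
    if n % 4 != 0:
        raise ValueError("n must be a multiple of 4 (four lines per FASTQ record)")
    return list(islice(lines, n))


if __name__ == "__main__":
    import os

    src = "reads_1.fastq"          # placeholder input path
    if os.path.exists(src):
        with open(src) as fin, open("reads_1.head.fastq", "w") as fout:
            fout.writelines(head_lines(fin))
```

A subset this small only gives a rough picture, so it may be worth rerunning on a larger sample before drawing conclusions.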

Old 06-25-2011, 05:47 PM   #4
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

From the summary, the overall GC content is 50%, which looks fine. I do see warnings on per base GC content and per sequence GC content. I have never used FastQC inside Galaxy, only the standalone version, which generates plots for each module. Can you check the plots to see exactly what they show?
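To make the difference concrete: the overall figure is one number for the whole file, while the per sequence module looks at the GC percentage of each read separately, so a 50% overall value can still hide an odd distribution. A minimal sketch of the per-read value, assuming N calls are left out of the denominator (that choice is an assumption here, not necessarily what FastQC does):

```python
def gc_percent(seq):
    """GC percentage of one read, ignoring anything that is not A, C, G or T."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return 0.0              # read with no called bases
    return 100.0 * (seq.count("G") + seq.count("C")) / called


if __name__ == "__main__":
    reads = ["ACGT", "GGCC", "AATT", "GCNN"]   # made-up reads
    print([gc_percent(r) for r in reads])
```

A histogram of these per-read values is roughly what the per sequence GC content plot shows.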

Douglas
www.contigexpress.com
Old 06-28-2011, 01:24 PM   #5
zippered_ohio
Junior Member
 
Location: USA

Join Date: Jun 2011
Posts: 6

Here are the plots for the modules that FastQC flagged:






Edit: The increase in quality over the first 13 bases is apparently an artifact of the quality calculation algorithm used by Illumina, which now takes into account the preceding and following 13 bp.

Last edited by zippered_ohio; 06-28-2011 at 01:26 PM. Reason: add info
Old 06-28-2011, 11:43 PM   #6
tujchl
Member
 
Location: BEIJING, CHINA

Join Date: Sep 2009
Posts: 74

Judging from the FastQC output, I think your sequencing quality could be a serious problem. You should contact the staff at your sequencing facility to find out the reason.

BTW, you can see examples of good and bad quality distributions on the FastQC website (http://www.bioinformatics.bbsrc.ac.u...qc_report.html)
(http://www.bioinformatics.bbsrc.ac.u...qc_report.html)

Last edited by tujchl; 06-29-2011 at 12:04 AM.

Tags
distribution, galaxy, nucleotide, nucleotides
