SEQanswers

Old 01-08-2011, 05:56 PM   #1
harrike
Member
 
Location: St. Louis, MO

Join Date: Jun 2010
Posts: 29
Novel question---What's .qual file for?

I recently received my sequencing data from a SOLiD system. For each sample I got two .csfasta files labeled 1 and 2, and two .qual files, one for each csfasta file. Since I am quite new to high-throughput sequencing, I don't know how to use the .qual files in the analysis. I know the .csfasta files contain the sequence data in color-space format, and I am able to convert the color-space sequences to base space. So can I just use the csfasta files for the downstream analysis, such as adaptor trimming, alignment, and statistical analysis, without the .qual files?

And can the two .csfasta files (1 and 2) be pooled together for analysis?

Hope to get your help. Thank you in advance.
Old 01-09-2011, 05:32 AM   #2
zhidkov.ilia
Member
 
Location: Israel

Join Date: Dec 2010
Posts: 25

Hi harrike,
I'll try to give a short answer to your rather long question.
You should use the qual files to get a more reliable alignment (the per-base quality values help reduce false-positive calls).
You will probably want to start by building FASTQ files from the csfasta and qual files you received.
I strongly suggest that you do not convert the color space to bases before alignment or assembly; convert only after alignment.
Most probably you have paired reads; this is why you have two qual and two csfasta files.
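To make the FASTQ-building step concrete, here is a minimal Python sketch of pairing one csfasta file with its qual file. It is only an illustration under stated assumptions, not what Galaxy or any particular aligner does: the function names are made up for this example, the qualities are written as Phred+33 characters, negative values (missing calls) are clamped to 0, and a placeholder quality of 0 is added for the leading primer base so that both lines have the same length. Tools differ in how they expect color-space FASTQ to be laid out, so check your aligner's documentation before using anything like this.

```python
import io


def read_records(handle):
    """Yield (name, body) pairs from FASTA-like text, skipping '#' comment lines."""
    name, parts = None, []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, " ".join(parts)
            name, parts = line[1:], []
        else:
            parts.append(line)
    if name is not None:
        yield name, " ".join(parts)


def phred33(scores):
    """Encode integer qualities as Phred+33 characters (clamped to 0..93)."""
    return "".join(chr(min(max(q, 0), 93) + 33) for q in scores)


def csfasta_qual_to_fastq(csfasta, qual, out):
    """Write one FASTQ record per read; both inputs must list reads in the same order."""
    for (name, seq), (qname, qline) in zip(read_records(csfasta), read_records(qual)):
        if name != qname:
            raise ValueError(f"read name mismatch: {name} vs {qname}")
        seq = seq.replace(" ", "")
        scores = [int(x) for x in qline.split()]
        # Assumption of this sketch: the qual file has no value for the
        # primer base, so pad with a placeholder quality of 0.
        if len(seq) == len(scores) + 1:
            scores = [0] + scores
        if len(seq) != len(scores):
            raise ValueError(f"length mismatch for read {name}")
        out.write(f"@{name}\n{seq}\n+\n{phred33(scores)}\n")


# Tiny in-memory example (invented data, not real reads).
csfasta = io.StringIO("# comment line\n>1_2_3_F3\nT0123\n")
qual = io.StringIO(">1_2_3_F3\n30 20 -1 5\n")
fastq = io.StringIO()
csfasta_qual_to_fastq(csfasta, qual, fastq)
print(fastq.getvalue(), end="")
```

Run it once per csfasta/qual pair; the two resulting FASTQ files stay separate if the reads are paired.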
As a beginner in next-gen sequencing, please consider playing with your data in Galaxy:
http://main.g2.bx.psu.edu/

Look for "NGS Toolbox Beta" and work through the SOLiD mapping tutorial.

This should help you better understand the basics of how to deal with your data properly.

Best,

Ilia
Old 01-09-2011, 07:13 AM   #3
harrike
Member
 
Location: St. Louis, MO

Join Date: Jun 2010
Posts: 29

Ilia,

Thank you for your reply. Your answer is quite helpful to me.

Actually, my lab has bought the CLC workbench, which is not available to me now since I am out of the lab for a period. I just want to understand the data more before my analysis.

Yes, there is a lot I need to know. Thank you for your kind help.
Old 03-30-2012, 08:02 AM   #4
Tatanka
Junior Member
 
Location: Boulder, Colorado

Join Date: Mar 2012
Posts: 4

Bump - Hello, I just joined this forum, but I'd like to ask about QUAL files. I recently analyzed the data in my FASTA file without a QUAL file to go with it from the pyrosequencing facility. I am going to rerun the analysis now that I have received an accompanying QUAL file, but I wanted to know how much the two analyses may be expected to differ. Is the first analysis pretty much useless without a QUAL file? Is the second analysis going to yield radically different results from the first?
Old 03-30-2012, 08:53 AM   #5
twaddlac
Member
 
Location: Pittsburgh, PA

Join Date: Feb 2011
Posts: 49

That depends on the type of analysis and what you're looking for. Before you redo the entire analysis, you should plot the average quality score at each base position to see whether any regions would alter your result, e.g. if the fifth base of every read had a very low quality.
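As a rough illustration of that check, here is a small Python sketch that computes the mean quality at each read position; the resulting list can then be handed to whatever plotting tool you like. The function name and the example scores are made up, and it assumes the QUAL values have already been parsed into lists of integers, one list per read.

```python
def mean_quality_per_position(reads):
    """Return the mean quality at each read position across all reads.

    `reads` is an iterable of lists of integer quality scores. Reads may
    differ in length; each position is averaged over the reads that reach it.
    """
    totals, counts = [], []
    for scores in reads:
        for i, q in enumerate(scores):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Invented scores for three reads; position 3 is consistently poor.
example = [
    [30, 30, 2, 28],
    [32, 28, 4],
    [34, 32, 3, 30],
]
for pos, mean in enumerate(mean_quality_per_position(example), start=1):
    print(f"position {pos}: mean quality {mean:.1f}")
```

A sharp dip at one position, or a steady decline toward the 3' end, is the kind of pattern worth looking for.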

But to answer your question: I haven't seen a significant increase in performance when mapping, though enough miscalled or low-quality bases can affect your results and suggest something that is not really there. The assembly process, however, seems to be affected a lot, specifically de novo assembly. One commonly used trick is to filter out the low-quality reads (based on the quality score, of course) and then assemble, which has worked wonders.

So it really depends on your analysis. All in all, with 454 data, especially if you're interested in the 5' end of the reads (e.g. amplicons), you shouldn't see too much of a difference. I would be interested to hear your results if you happen to redo the entire analysis.
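A minimal Python sketch of that kind of filter, assuming each read is available as a (name, sequence, scores) tuple. The function name and the cutoff of 20 are arbitrary choices for this example, not a recommended setting, and real pipelines often trim low-quality ends as well rather than only dropping whole reads.

```python
def filter_by_mean_quality(records, min_mean=20.0):
    """Keep reads whose mean quality is at least `min_mean`.

    `records` is an iterable of (name, sequence, scores) tuples, where
    `scores` is a list of integer quality values. Reads without any
    scores are dropped.
    """
    kept = []
    for name, seq, scores in records:
        if scores and sum(scores) / len(scores) >= min_mean:
            kept.append((name, seq, scores))
    return kept


# Invented reads: r2 has a mean quality of 8 and is removed.
reads = [
    ("r1", "ACGT", [30, 30, 30, 30]),
    ("r2", "ACGT", [5, 10, 8, 9]),
    ("r3", "AC", [20, 20]),
]
for name, seq, _ in filter_by_mean_quality(reads):
    print(name, seq)
```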

I hope that helps!
Old 03-30-2012, 09:05 AM   #6
Tatanka
Junior Member
 
Location: Boulder, Colorado

Join Date: Mar 2012
Posts: 4

Thanks! Yeah, I'm wondering how much relative abundance of a certain taxon will be affected by QUAL scores. Sounds like it would not be very much, which is good, because I need to share the analysis I have now with a collaborator. I will certainly tell them about what is missing from this analysis, but hopefully the picture that emerges now will still be useful. I will rerun this with the QUAL scores soon, so I will let you know how it turns out.
Old 03-30-2012, 09:11 AM   #7
twaddlac
Member
 
Location: Pittsburgh, PA

Join Date: Feb 2011
Posts: 49

How divergent is this taxon from the rest of the population? How strict are your settings? With that type of analysis the stringency of the parameters at which you distinguish one taxon from another could change dramatically IF the quality values are poor. I think it would be most advantageous to plot the quality distribution and filter out bad reads if need be.

Please keep me posted on what you find out, thanks!

Last edited by twaddlac; 03-30-2012 at 09:12 AM. Reason: fix error
Old 04-09-2012, 05:49 AM   #8
Tatanka
Junior Member
 
Location: Boulder, Colorado

Join Date: Mar 2012
Posts: 4

Well, the results are in. I performed the analysis both with and without the QUAL data, and the final results were pretty similar. In both analyses, the relative abundances of the various taxa were about the same, giving the same overall picture for community composition.
Old 12-12-2012, 10:51 AM   #9
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137

Hello,

Are the values in the .qual files I get from SOLiD runs simply Phred scores using the standard equation, without encoding into ASCII?

Thanks a lot!

Carmen
Old 12-18-2012, 06:16 PM   #10
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137

Yes, they are.
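
For reference, the standard Phred equation is Q = -10 * log10(p), where p is the estimated probability that the call is wrong. Below is a short Python sketch of what plain-integer values look like in practice compared with the ASCII-encoded form used in FASTQ. The function names and the example line are made up for illustration, and the +33 offset is the common Sanger-style FASTQ convention rather than anything specific to SOLiD.

```python
import math


def error_probability(q):
    """Convert a Phred score to the estimated probability of a wrong call."""
    return 10 ** (-q / 10)


def phred_score(p):
    """Convert an error probability back to a Phred score."""
    return -10 * math.log10(p)


# A qual-file line is just whitespace-separated integers (invented example).
qual_line = "30 20 10"
scores = [int(x) for x in qual_line.split()]

for q in scores:
    print(f"Q{q}: error probability {error_probability(q):.3f}")

# The same scores as they would appear ASCII-encoded (Phred+33) in FASTQ.
print("".join(chr(q + 33) for q in scores))
```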