**GenoMax** · 09-21-2012, 08:23 AM

You can do this in more than one way. I will give you a couple below.

If you are comfortable with the command line in a UNIX environment, try fastx-toolkit from the Hannon lab: http://hannonlab.cshl.edu/fastx_toolkit/

If you would rather use a GUI, try Galaxy: https://main.g2.bx.psu.edu/ They have a metagenomics tutorial available here: https://main.g2.bx.psu.edu/u/james/p...e-metagenomics It is for 454 data, but it will give you an idea of how to use the tools.

There are also several video screencasts available for individual tools: http://wiki.g2.bx.psu.edu/Learn/Screencasts

**newBioinfo** · 09-27-2012, 08:15 AM

Thanks GenoMax for the help.
I tried fastx-toolkit, but I am getting an error. I used the command
fastx_quality_stats -i trial.fastq -o abc.txt
and the error I am getting is
Invalid quality score value (char '#' ord 35 quality value -29) on line 4

My input file looks like this:
@HWI-ST1035:115:C0RG7ACXX:5:1101:1216:2040 1:N:0:
NACAGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
#1=DDDFFCADHHIIIIIIIIGIIIIIIIGIIIIIIIFHIIIIICFHH#######################################################################################################
@HWI-ST1035:115:C0RG7ACXX:5:1101:1208:2076 1:N:0:
TACAGAGGTCTCAAGCGTTGTTCGGAATCACTGGGCGTAAAGCGTGCGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNN
+
@@CFFFFF2ACFDHIIIFHGIHHIIIIIIGHIIIIHIDHIHIII;FGI#######################################################################################################
@HWI-ST1035:115:C0RG7ACXX:5:1101:1168:2119 1:N:0:
TACGTAGGGTGCGAGCGTTGTCCGGAATTACTGGGCGTAAAGGGCTCGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNN
+
@@@FADDFH<CFHBIGHEHHIIIJJ<@GIHHIIJJIJ;FGIHJJHHGF#######################################################################################################
@HWI-ST1035:115:C0RG7ACXX:5:1101:1173:2185 1:N:0:
TACGTAGGGGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNN
+
?@@D=BDD+@DFAFIIIEFIACGFIFGICIBFFIIEI;FFIIEFFBCB#######################################################################################################
@HWI-ST1035:115:C0RG7ACXX:5:1101:1155:2196 1:N:0:
TACGTAGGGGGCAAGCGTTAATCGGAATTACTGGNCGNNNNNNNNGCGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNN
+
?@@DDDDD=)@DFBFGICGFE?BGIFBFIFEFII#####################################################################################################################
@HWI-ST1035:115:C0RG7ACXX:5:1101:1183:2201 1:N:0:
TACGGAGGGTGCGAGCGTTAATCGGAATAACTGGGCGTAAAGGGCACGNNNGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNN
+
@@@DDDDDF2AF1EGIIGIIIIGIIIII<FFFIIIIIAEFIFFFF?DD#######################################################################################################

Please help; I have been trying this for many days.

**GenoMax** · 09-27-2012, 08:32 AM

See this thread, where kmcarr suggests a new tool for trimming: http://seqanswers.com/forums/showthr...tx+toolkit+Q33

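The thread also mentions the option -Q 33, which tells fastx-toolkit that your reads use Sanger FASTQ quality encoding (Phred+33). Without it, fastx-toolkit assumes the older Phred+64 offset, so '#' (ASCII 35) decodes to 35 - 64 = -29, which is exactly the invalid value in your error.

Your command would become: fastx_quality_stats -Q 33 -i trial.fastq -o abc.txt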
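To see where the -29 comes from, here is a minimal Python sketch (not part of fastx-toolkit) that decodes quality characters with either offset. The function names `decode_qualities` and `guess_offset` are hypothetical, and the offset guess is only a heuristic: data in which every base is high quality can look like either encoding.

```python
from typing import Iterable, List


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string into Phred scores using the given ASCII offset."""
    return [ord(ch) - offset for ch in qual]


def guess_offset(quals: Iterable[str]) -> int:
    """Heuristically guess the quality offset (33 or 64) from sample quality strings.

    Characters below ';' (ASCII 59) cannot appear in Phred+64 / Solexa+64 data,
    so seeing one means Phred+33. If every character is >= ';', the data are
    ambiguous and 64 is returned only as a guess.
    Raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(ch) for q in quals for ch in q)
    return 33 if lowest < ord(";") else 64
```

For example, `decode_qualities("#", offset=64)` returns `[-29]`, the value in the error message, whereas the default offset of 33 returns `[2]`.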

**newBioinfo** · 09-27-2012, 10:00 AM

Thanks GenoMax,
I used the command
fastx_quality_stats -Q 33 -i lane5_NoIndex_L005_R1_001.fastq -o quality_score.txt
It has been running for more than an hour now without any error, but the output file, quality_score.txt, is still empty. Is there a problem?
Please help!

**newBioinfo** · 09-27-2012, 10:23 AM

Hi,
I finally got the result in my output file, but it has only 151 rows. I am confused: I thought this would give me the quality score of each read, but I am getting only 151 results (my read length is 151). So it seems to be giving me the quality score of each base position across all the reads. How can I use this result to filter out reads with poor quality scores?
I am really confused.

**dpryan** · 09-27-2012, 10:37 AM

fastx_quality_stats reports per-position metrics (i.e., each base position averaged across all reads), which is not what you want. There may be pre-written scripts that compute per-read quality (one would be easy to write), but I don't know of any. One alternative is simply to quality-trim and then discard reads shorter than some length (12 bp, or whatever minimum your aligner requires). At the end of the day, that's what you're really interested in anyway, since you have a lot of Ns due to short transcripts.
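The "easy to write" per-read script could look something like the Python sketch below. It is hypothetical (not an existing tool): it assumes simple 4-line FASTQ records with Phred+33 qualities and keeps reads whose mean quality is at least a chosen threshold (20 here, an arbitrary example value).

```python
from typing import Iterable, Iterator, TextIO, Tuple

# (header, sequence, quality) for one FASTQ record
Record = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse simple 4-line FASTQ records (no wrapped sequence lines)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            next(it)  # skip the '+' separator line
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}")
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Average Phred score over all bases of one read."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_by_mean_quality(records: Iterable[Record], min_mean: float = 20.0,
                           offset: int = 33) -> Iterator[Record]:
    """Yield only the reads whose mean quality is >= min_mean."""
    for rec in records:
        qual = rec[2]
        if qual and mean_quality(qual, offset) >= min_mean:
            yield rec


def write_fastq(records: Iterable[Record], out: TextIO) -> None:
    """Write records back out as 4-line FASTQ."""
    for header, seq, qual in records:
        out.write(f"{header}\n{seq}\n+\n{qual}\n")
```

With open input and output files, `write_fastq(filter_by_mean_quality(read_fastq(fin)), fout)` writes the passing reads. Note the caveat for data like the reads posted above: every posted read has well over 100 of its 151 positions at '#' (Q2), so its mean stays below 20 even though its first few dozen bases are good, and this filter would reject all of them. That is why trimming first is usually the more useful step.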

**GenoMax** · 09-27-2012, 11:31 AM

You probably do not want to focus on the "average" score across an entire read, but rather on individual base quality scores. As dpryan pointed out, you seem to have a lot of Ns (perhaps the snippet you posted is from the beginning of the file), so you would want to trim those bases off before trying alignments.
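As a sketch of the trim-then-length-filter approach described above, the Python below cuts each read back from its 3' end until it reaches a base that is not N and has quality of at least 20, then drops reads shorter than 12 bp (the example minimum mentioned earlier in the thread). The thresholds and function names are hypothetical, and qualities are assumed to be Phred+33. It only trims the 3' end, so a leading N (as in the first posted read) or an interior N is left in place.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def trim_3prime(seq: str, qual: str, min_q: int = 20,
                offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases that are N or have quality below min_q."""
    end = len(seq)
    while end > 0 and (seq[end - 1] in "Nn" or ord(qual[end - 1]) - offset < min_q):
        end -= 1
    return seq[:end], qual[:end]


def trim_and_filter(records: Iterable[Record], min_q: int = 20,
                    min_len: int = 12, offset: int = 33) -> Iterator[Record]:
    """Trim each read's 3' end, then discard reads shorter than min_len."""
    for header, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q, offset)
        if len(seq) >= min_len:
            yield header, seq, qual
```

Records here are (header, sequence, quality) tuples from any simple 4-line FASTQ reader.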

You can do the trimming with Trimmomatic (http://www.usadellab.org/cms/index.php?page=trimmomatic), the tool kmcarr suggested in that thread, which can take base quality into account.
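Trimmomatic's quality trimming is commonly done with its sliding-window step (SLIDINGWINDOW:&lt;windowSize&gt;:&lt;requiredQuality&gt;). The Python below is a simplified, hypothetical reimplementation of that idea for illustration only, not Trimmomatic's code: it assumes Phred+33 qualities, scans from the 5' end, and cuts the read at the start of the first window whose average quality falls below the threshold. Trimmomatic's exact cut point may differ.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Convert a quality string to Phred scores."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_avg: float = 20.0, offset: int = 33) -> Tuple[str, str]:
    """Cut the read at the first window (scanning 5'->3') whose mean quality < min_avg."""
    scores = phred_scores(qual, offset)
    # Reads shorter than the window produce an empty range and are kept whole.
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual  # no failing window: keep the whole read
```

The window size and threshold (4 and 20 here) are example values; a length filter would normally follow, as with any trimming step.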

