
**gringer** · 12-04-2011, 05:16 PM

Do you want something more than what fastx_quality_stats from fastx-tools can provide?

Code:

usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
...
   [-N]         = New output format (with more information per nucleotide/cycle).
...
The *NEW* output format:
        cycle (previously called 'column') = cycle number
        max-count
        For each nucleotide in the cycle (ALL/A/C/G/T/N):
                count   = number of bases found in this column.
                min     = Lowest quality score value found in this column.
                max     = Highest quality score value found in this column.
                sum     = Sum of quality score values for this column.
                mean    = Mean quality score value for this column.
                Q1      = 1st quartile quality score.
                med     = Median quality score.
                Q3      = 3rd quartile quality score.
                IQR     = Inter-Quartile range (Q3-Q1).
                lW      = 'Left-Whisker' value (for boxplotting).
                rW      = 'Right-Whisker' value (for boxplotting).
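If you want similar per-cycle numbers without installing FASTX-Toolkit, here is a minimal pure-Python sketch that computes them directly from a FASTQ file. It makes several assumptions. Records are 4-line and unwrapped, and qualities are Phred+33. Quartiles use `statistics.quantiles(..., method="inclusive")`. Whiskers follow the Tukey 1.5 * IQR rule clamped to the observed min/max. fastx's own quartile and whisker definitions may differ. The function names `parse_fastq` and `per_cycle_stats` are hypothetical.

```python
import statistics
from collections import defaultdict


def parse_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records (no line wrapping)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            next(it)  # the '+' separator line is ignored
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        yield header, seq, qual


def per_cycle_stats(quality_strings, offset=33):
    """Summarise quality scores at each cycle (1-based read position)."""
    by_cycle = defaultdict(list)
    for qual in quality_strings:
        for cycle, ch in enumerate(qual, start=1):
            by_cycle[cycle].append(ord(ch) - offset)  # ASCII character -> Phred score

    stats = {}
    for cycle in sorted(by_cycle):
        scores = sorted(by_cycle[cycle])
        if len(scores) > 1:
            q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            q1 = med = q3 = float(scores[0])  # quantiles() needs two or more points
        iqr = q3 - q1
        stats[cycle] = {
            "count": len(scores),
            "min": scores[0],
            "max": scores[-1],
            "sum": sum(scores),
            "mean": sum(scores) / len(scores),
            "Q1": q1,
            "med": med,
            "Q3": q3,
            "IQR": iqr,
            # Tukey-style whiskers clamped to the observed range (assumed, may differ from fastx)
            "lW": max(scores[0], q1 - 1.5 * iqr),
            "rW": min(scores[-1], q3 + 1.5 * iqr),
        }
    return stats
```

Typical use would be `per_cycle_stats(q for _, _, q in parse_fastq(open("reads.fastq")))`, where `reads.fastq` is a placeholder name. Reads of different lengths are handled naturally, because later cycles simply have smaller counts.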

**dgtnk** · 12-04-2011, 06:46 PM

Agree with gringer.

fastx_quality_stats from FASTX-Toolkit works well. It won't give you the raw list of quality scores, but it does report the quartile values of read quality at each read position, which you can use for boxplotting in R.
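The same precomputed quartiles can also be plotted outside R. The following minimal Python sketch rests on three assumptions. The input is the default (non `-N`) fastx_quality_stats output. That output is a tab-separated table with a header row. The header includes columns named `column`, `Q1`, `med`, `Q3`, `lW` and `rW`, with names taken from the usage text in the first post, so check them against your own file. Columns are looked up by header name rather than position. The output dicts use the keys that matplotlib's `Axes.bxp` accepts for precomputed box statistics. `boxplot_stats` is a hypothetical name.

```python
import csv


def boxplot_stats(lines, position_col="column"):
    """Turn fastx_quality_stats output (tab-separated, header row) into per-position box summaries.

    Keys follow the dict format accepted by matplotlib's Axes.bxp, so each box can be
    drawn from the precomputed quartiles and whiskers without the raw scores.
    """
    boxes = []
    for row in csv.DictReader(lines, delimiter="\t"):
        boxes.append({
            "label": row[position_col],
            "q1": float(row["Q1"]),
            "med": float(row["med"]),
            "q3": float(row["Q3"]),
            "whislo": float(row["lW"]),
            "whishi": float(row["rW"]),
            "fliers": [],  # individual outliers are not in the table, only whisker ends
        })
    return boxes
```

With matplotlib installed, something like `ax.bxp(boxplot_stats(open("stats.txt", newline="")), showfliers=False)` should draw one box per read position. Here `stats.txt` is a placeholder for your fastx_quality_stats output file.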

**Dario1984** · 12-04-2011, 07:00 PM

Try using QualityScore in ShortRead.

**Blahah404** · 12-05-2011, 12:17 AM

You can easily extract a .qual file containing per-base quality scores from a fastq file, for example using biopython:

Code:

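#!/usr/bin/env python

"""Usage: fastq2qual.py filename [fastq_format]
    where filename is a .fastq file name without the extension, and
    fastq_format is a Biopython format name: "fastq" (Phred+33, e.g. Sanger
    or Illumina 1.8+; the default) or "fastq-illumina" (Phred+64, Illumina 1.3-1.7).
    will produce: filename.qual
"""

import sys
from Bio import SeqIO

if len(sys.argv) not in (2, 3):
    sys.exit(__doc__)  # print the usage text and exit with an error status

file_name = sys.argv[1]
fastq_format = sys.argv[2] if len(sys.argv) == 3 else "fastq"

# SeqIO.convert returns the number of records written to the .qual file.
count = SeqIO.convert(file_name + ".fastq", fastq_format, file_name + ".qual", "qual")
print("Converted %i records" % count)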
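Once you have a .qual file, you can read the scores back and group them by read position for plotting. Here is a minimal pure-Python sketch. It assumes the standard QUAL layout: a `>` header line, followed by whitespace-separated integer scores that may wrap over several lines. The function names are hypothetical. The record id returned is simply the header text after `>`.

```python
from collections import defaultdict


def read_qual(lines):
    """Yield (record_id, scores) from QUAL-format lines; scores may wrap over several lines."""
    record_id, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, scores  # emit the previous record
            record_id, scores = line[1:].strip(), []
        else:
            scores.extend(int(tok) for tok in line.split())
    if record_id is not None:
        yield record_id, scores  # emit the final record


def scores_by_position(records):
    """Group scores by 1-based read position: {position: [score, score, ...]}."""
    by_pos = defaultdict(list)
    for _, scores in records:
        for pos, q in enumerate(scores, start=1):
            by_pos[pos].append(q)
    return dict(by_pos)
```

The resulting dict can be written out as one column per position for R, or summarised directly.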

**maubp** · 12-05-2011, 04:29 AM

Originally posted by Dario1984

Try using QualityScore in ShortRead.

+1

If you want to use R for the plotting and analysis, why not use R to read the FASTQ files as well?

**kwyattm** · 12-05-2011, 06:27 AM

The way I handled this was to write a Perl script that 1) parses qseq to FASTQ, 2) trims adaptors, and 3) writes the quality score data to a text file. The text file is then imported into R and graphed. I even get the graphs put into a PDF and e-mailed to me when everything is done!
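kwyattm's Perl script isn't shown. The following is a hypothetical Python sketch of step 1 only, converting one qseq line to a FASTQ record. It assumes the usual 11-column tab-separated qseq layout: machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag. It also assumes Phred+64 qualities and `.` for no-calls. The FASTQ header layout is one common convention, not necessarily the one CASAVA writes.

```python
def qseq_to_fastq(line, to_phred33=True):
    """Convert one qseq line into (fastq_record_text, passed_filter).

    Hypothetical helper: assumes 11 tab-separated qseq fields with Phred+64 qualities.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 qseq fields, got %d" % len(fields))
    machine, run, lane, tile, x, y, index, read_no, seq, qual, passed = fields
    # One common header layout; adjust to whatever your downstream tools expect.
    header = "@%s_%s:%s:%s:%s:%s#%s/%s" % (machine, run, lane, tile, x, y, index, read_no)
    seq = seq.replace(".", "N")  # qseq uses '.' for no-calls
    if to_phred33:
        qual = "".join(chr(ord(c) - 31) for c in qual)  # Phred+64 -> Phred+33
    return "%s\n%s\n+\n%s\n" % (header, seq, qual), passed == "1"
```

The second return value lets you drop reads that failed the chastity filter (flag `0`) before writing them out.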

**gringer** · 12-05-2011, 06:33 AM

qseq -> fastq is already done in CASAVA, most likely including the removal of any adaptor sequences. CASAVA 1.8+ processes the intensity files directly into fastq:

Illumina Raw output - SEQanswers

http://seqanswers.com/forums/showthread.php?t=13147


**kwyattm** · 12-05-2011, 06:36 AM

Yep!

Originally posted by gringer

qseq -> fastq is already done in CASAVA, most likely including the removal of any adaptor sequences. CASAVA 1.8+ processes the intensity files directly into fastq:

http://seqanswers.com/forums/showthread.php?t=13147

Thanks, gringer! Yeah, I knew about this; it's just an old script. Just passing along the information I had!

**brachysclereid** · 12-05-2011, 07:57 AM

Ideas on q scores

Thanks!

I used the biopython suggestion and now have the .qual files. This is what I wanted.

kwyattm,
Is there a tool that will take a random sample of the .qual file in R for plotting? I am curious about what you are using to make the plots.

Thanks again!

**gringer** · 12-05-2011, 08:44 AM

I used the biopython suggestion and now have the .qual files. This is what I wanted.

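Just as a word of caution, make sure you are using the correct quality offset (the ASCII value that represents a quality of 0). Different sequencers and pipeline versions have used different offsets to encode the same qualities: Phred+33 for Sanger and Illumina 1.8+, Phred+64 for Illumina 1.3 to 1.7, while older Solexa data used a different scale altogether.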
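One rough way to check which offset a file uses is to look at the lowest quality character present. The minimal Python sketch below implements that heuristic. It is not a standard tool, so sample a reasonable number of reads before trusting it. Characters below `;` (ASCII 59) can only come from Phred+33 data. A file whose lowest character is `@` (64) or higher is consistent with Phred+64, though a run of uniformly high-quality Phred+33 reads (all Q31 or above) would be misclassified. `guess_phred_offset` is a hypothetical name.

```python
def guess_phred_offset(quality_strings):
    """Rough guess of the FASTQ quality offset (33 or 64) from the characters seen.

    Returns None when the observed range does not settle the question.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:
        return 33  # characters below ';' cannot occur in Phred+64 or Solexa+64 data
    if lowest >= 64:
        return 64  # everything at or above '@': consistent with Phred+64 (likely, not certain)
    return None  # ';' to '?': could be Solexa+64 or high-quality Phred+33
```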

Is there a tool that will take a random sample of the .qual file in R for the purpose of plotting?

You can randomly sample data in R using the 'sample' function, but boxplot should cope with the full dataset. There's also a FASTX-Toolkit tool for plotting quality statistics (fastq_quality_boxplot_graph), in case you want something that someone else has already written.
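When a file is too large to load fully before calling R's `sample`, a streaming alternative is reservoir sampling (Algorithm R). This is a swapped-in technique, not something suggested in the thread. The minimal Python sketch below keeps k items chosen uniformly at random in a single pass, using memory proportional to k. `reservoir_sample` is a hypothetical name.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable of any length.

    Reservoir sampling (Algorithm R): one pass, memory proportional to k only.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = item  # keep item i with probability k / (i + 1)
    return sample
```

Any iterable of whole records works, such as a generator that yields one read's scores at a time. Passing a seeded `random.Random(seed)` as `rng` makes the sample reproducible.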

**Dario1984** · 12-05-2011, 02:00 PM

Since he is working in R, it seems much more straightforward to read it in R.

e.g.

library(ShortRead)
fastqs <- readFastq("/path/to/fastqs")
qualities <- quality(fastqs)

Ideas on collecting quality scores per base in an illumina fastq file
