SEQanswers

Old 01-27-2010, 08:15 AM   #1
bbimber
Member
 
Location: wisconsin

Join Date: Jan 2010
Posts: 12
Default Quality trimming / Mask low quality bases?

Does anyone know of a command line utility that can accept either FASTA/QUAL or FASTQ files and mask any bases below a given quality score (i.e., convert them to N)?

Does anyone have suggestions on the best utilities to perform quality score based end trimming?

Thank you for any help or suggestions.
Old 01-27-2010, 09:08 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Most people I've talked to use their own Perl or Python scripts for this, although EMBOSS are looking at adding this kind of tool to their suite.
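
For anyone who wants a starting point for such a script, here is a minimal Python sketch of the masking step. It is only an illustration, not the logic of any existing tool: the function names `mask_low_quality` and `mask_fastq` are made up for this example, and it assumes strict four-line FASTQ records (no wrapped sequence lines) with Sanger-style Phred+33 qualities unless a different `offset` is passed.

```python
import io


def mask_low_quality(seq, qual, min_q=20, offset=33):
    """Replace each base whose Phred score is below min_q with 'N'."""
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def mask_fastq(in_handle, out_handle, min_q=20, offset=33):
    """Mask bases in four-line FASTQ records; return the record count.

    Assumption: every record is exactly four lines (header, sequence,
    '+' line, qualities). Wrapped FASTQ is not handled.
    """
    count = 0
    while True:
        header = in_handle.readline().rstrip("\n")
        if not header:
            break  # end of input (or a blank line, which stops this sketch)
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline().rstrip("\n")
        qual = in_handle.readline().rstrip("\n")
        masked = mask_low_quality(seq, qual, min_q, offset)
        out_handle.write(f"{header}\n{masked}\n{plus}\n{qual}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo: '#' is Phred 2 under the +33 offset.
    demo = io.StringIO("@read1\nACGT\n+\nII#I\n")
    out = io.StringIO()
    mask_fastq(demo, out)
    print(out.getvalue(), end="")
```

The quality string is left unchanged so that downstream tools still see the original scores for the masked positions.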
Old 01-27-2010, 10:32 AM   #3
bbimber
Member
 
Location: wisconsin

Join Date: Jan 2010
Posts: 12
Default

OK, thanks for the reply. That's the impression I was getting from Google.

In case anyone else reads this, I did come across this:

http://hannonlab.cshl.edu/fastx_toolkit/index.html
Old 01-27-2010, 11:00 AM   #4
bioenvisage
Member
 
Location: it

Join Date: Oct 2009
Posts: 40
Default

Hi, is there any script to mask or remove low quality repeats and simple repeats? I would also like to know whether this will create problems in de novo assembly.
Old 01-28-2010, 12:44 PM   #5
Zigster
(Jeremy Leipzig)
 
Location: Philadelphia, PA

Join Date: May 2009
Posts: 116
Default

Quote:
Originally Posted by bbimber View Post
ok, thanks for the reply. that's the impression I
http://hannonlab.cshl.edu/fastx_toolkit/index.html
Is the FASTQ Quality Filter a variable trimmer?
Old 03-16-2010, 05:25 AM   #6
gabriel.lichtenstein
Junior Member
 
Location: Buenos Aires

Join Date: Dec 2009
Posts: 7
Default updates on this thread?

Could anyone suggest a Perl script for quality trimming Illumina 1.3 reads?

What do you think about the CLC trim sequences tool, and the ABySS -q option?
Old 03-16-2010, 05:34 AM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

I don't have any Perl examples, but there are some very simple Python examples in the Biopython Tutorial (search for FASTQ):
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
and here:
http://news.open-bio.org/news/2009/0...on-fast-fastq/
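
To give a rough idea of what such a script can look like without any dependencies, here is a plain-Python sketch of naive 3' end trimming for Illumina 1.3 style reads, which encode qualities as Phred+64. This is not the Biopython Tutorial code and not what the FASTX-Toolkit does; the function names `trim_3prime` and `to_sanger` and the default threshold of 20 are assumptions made for the example.

```python
def trim_3prime(seq, qual, min_q=20, offset=64):
    """Drop trailing bases until the last remaining base has Phred >= min_q.

    offset=64 matches the Illumina 1.3+ encoding; pass offset=33 for
    Sanger-style FASTQ. Returns the trimmed (sequence, quality) pair.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def to_sanger(qual, offset=64):
    """Re-encode a quality string from the given offset to Phred+33."""
    return "".join(chr(ord(c) - offset + 33) for c in qual)


if __name__ == "__main__":
    # 'h' is Phred 40 and 'B' is Phred 2 under the +64 offset.
    seq, qual = trim_3prime("ACGTAC", "hhhhBB")
    print(seq, qual, to_sanger(qual))
```

Note that this only looks at the 3' end and stops at the first base that passes, so a single good base near the end of a bad tail will halt the trimming. A sliding-window or running-sum approach is more robust if that matters for your data.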
Old 03-16-2010, 05:48 AM   #8
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

I am unconvinced that trimming low quality reads is necessary at all. After all, most aligners (e.g., Maq, Bowtie, BWA; but not Eland!) take into account the quality score and disregard or downweight low quality reads automatically.

Simon
Old 03-16-2010, 06:11 AM   #9
bbimber
Member
 
Location: wisconsin

Join Date: Jan 2010
Posts: 12
Default

If you are interested in Illumina, look into the FASTX-Toolkit (link above). There's a command line tool to do it. They might also have a web interface for it, but I'm not 100% positive. The logic behind their trimming is probably good for short reads, but less suitable for longer ones like 454.
Old 03-25-2010, 02:40 PM   #10
xuer
Member
 
Location: germany

Join Date: Sep 2008
Posts: 17
Default

Quote:
Originally Posted by mard View Post
Yes, it tells you the number of reads that have been marked as duplicates, as well as the total number of reads. But note that reads that Picard marks as duplicates do not necessarily have identical sequence; they just map to the same chromosomal location.
So it looks like Picard is not a good choice for that.
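
If what you want is a count of reads with exactly identical sequence, independent of mapping position, a few lines of Python can do that directly on the read sequences. This is only a sketch: the function name `count_sequence_duplicates` is made up, and it holds every distinct sequence in memory, which may not scale to a full lane.

```python
from collections import Counter


def count_sequence_duplicates(sequences):
    """Return (total reads, reads that repeat an already-seen sequence)."""
    counts = Counter(sequences)
    total = sum(counts.values())
    # Every occurrence after the first of each distinct sequence is a duplicate.
    duplicates = total - len(counts)
    return total, duplicates


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "TTTT", "ACGT"]
    print(count_sequence_duplicates(reads))
```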
All times are GMT -8. The time now is 01:44 AM.

