SEQanswers

07-24-2012, 04:57 PM   #1
danbrami
Junior Member
 
Location: Washington, DC

Join Date: Nov 2009
Posts: 6
Help me find a trimmer

Hi folks,

I need a command-line trimmer that will handle a seq/qual pair or an ab1 file (for Sanger reads, obviously).
I have tried Lucy and, unfortunately, I found that:
- it did not trim the resulting sequence; it only gave the clear range positions in the header
- it was not aggressive enough in removing low-quality sequence, ambiguous bases and vector contaminants.

I am trying TrimSeq but I am not sure it will do the job (I only want to look for a specific vector, not a slew of them, nor contaminants).

Ideally, you would direct me to something like an EMBOSS tool that does what I want quickly.

What do you folks use?

Thanks!

daniel
07-25-2012, 12:39 AM   #2
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

Trimmers for Sanger data generally didn't create modified versions of the file: the ab1/scf etc. files were pretty big binary files, so it didn't make much sense to duplicate them. The more usual approach was the one you mentioned: simply write out the range of usable sequence into a separate configuration file, which the assembler then reads to see which parts of the sequence to use. This also had the advantage that, since the trimmed sequence was only masked, you could always bring it back later on if it proved to be useful in your alignment.

For this kind of stuff we still use the pregap program, which is part of the Staden suite. It has all of the options you need for trimming either poor-quality sequence or vector, and if you're not intending to take the sequences forward for assembly it can write out the good-quality sequence to a FASTA file (but not to a trace file).
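
If all you have is a clear range and you want the trimmed sequence itself, applying the range is a small job. Here is a minimal sketch in Python, assuming the range is given as 1-based, inclusive left and right positions and that the qualities are already parsed into a list of integers. The function name `apply_clear_range` is made up for illustration, and the exact header layout your trimmer writes is not covered here, so check how it reports the positions before relying on this.

```python
def apply_clear_range(seq, quals, left, right):
    """Return the bases and qualities inside a clear range.

    Assumption: left and right are 1-based, inclusive positions,
    which is how clear ranges are commonly reported.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not (1 <= left <= right <= len(seq)):
        raise ValueError("clear range lies outside the read")
    # Convert the 1-based inclusive range to a Python slice.
    return seq[left - 1:right], quals[left - 1:right]


# Made-up read: two poor bases at each end, clear range 3..10.
seq = "NNACGTACGTNN"
quals = [2, 3, 30, 35, 40, 40, 38, 37, 36, 30, 4, 2]
trimmed_seq, trimmed_quals = apply_clear_range(seq, quals, 3, 10)
print(trimmed_seq)
print(trimmed_quals)
```

The original read is left untouched, so the masked ends can still be recovered later, in the same spirit as keeping the range in a separate file.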
08-22-2012, 02:39 AM   #3
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38

Try QTrim (hiv.sanbi.ac.za/software/qtrim).
It is a Python script and runs from the command line. It only does quality trimming and gives you the clean reads; it does no vector or adaptor trimming. The input must be a FASTQ file, or a FASTA file together with its qual file. The quality scores should be PHRED-based. The command I tried was:
python qtrim.py -fastq myfastqfile -m 30 -l 50 -o myoutputfile
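
To give an idea of what quality-only trimming on PHRED scores involves, here is a rough sketch in Python. It is not QTrim's algorithm: it simply strips bases below a quality threshold from both ends of a read and drops the read if what remains is too short. The function names, the Phred+33 encoding and the choice of 30 and 50 as defaults (borrowed from the numbers in the command above, without knowing exactly what QTrim's `-m` and `-l` mean) are all assumptions.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer PHRED scores.

    Assumption: Phred+33 encoding; pass offset=64 for older Illumina files.
    """
    return [ord(ch) - offset for ch in quality_line]


def trim_ends(seq, quality_line, min_qual=30, min_len=50, offset=33):
    """Strip low-quality bases from both ends of a read.

    Returns (sequence, quality string) for the kept region, or None when
    fewer than min_len bases survive.
    """
    scores = phred_scores(quality_line, offset)
    start, end = 0, len(scores)
    # Walk in from the 5' end past bases below the threshold.
    while start < end and scores[start] < min_qual:
        start += 1
    # Walk in from the 3' end past bases below the threshold.
    while end > start and scores[end - 1] < min_qual:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], quality_line[start:end]


# Made-up read: '#' is PHRED 2, 'I' is PHRED 40, '!' is PHRED 0.
print(trim_ends("ACGTACGTAC", "##IIIIII#!", min_qual=30, min_len=5))
print(trim_ends("ACGTACGTAC", "##IIIIII#!", min_qual=30, min_len=50))
```

Real trimmers usually do more than this (sliding windows, mean-quality filters), so treat it only as an illustration of the idea.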
12-18-2012, 07:42 PM   #4
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137

Himalaya,

I don't quite understand what the INTERCRAP parameter in the QTrim script is for. Do you?

Thanks,
Carmen
12-23-2012, 08:55 AM   #5
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38

Carmen,
Sorry, I don't get which parameter you are talking about. It works easily for me. If your input file has 454 short reads in FASTQ format, the command I posted earlier should work just as easily for you. Let me know if you are struggling and need any help.

Himalaya
12-23-2012, 10:15 PM   #6
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34

How about NGS QC Toolkit? It is implemented in Perl.
12-24-2012, 07:30 AM   #7
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38

Please check the NGS QC Toolkit paper (http://www.plosone.org/article/info:...l.pone.0030619). The QC steps are clearly described in it. Depending on your sequence reads and trimming requirements, you should be able to decide whether NGS QC Toolkit is good enough for trimming or not. NGS QC Toolkit is not my choice. You can also run the same sequence file through different quality trimming programs to see which one performs better quality control, in terms of the number of sequences output, the average length of the output reads and the number of bases with a quality score below your threshold.
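
The comparison described above can be scripted. Below is a small sketch in Python that computes those three numbers (reads output, average read length, bases below a quality threshold) for one FASTQ file; run it on the output of each trimmer and compare. It assumes plain four-line FASTQ records and Phred+33 encoding, and the names `read_fastq` and `summarize` are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (sequence, quality string) from a four-line-per-record FASTQ.

    Assumption: no wrapped sequence or quality lines.
    """
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file or trailing blank line
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def summarize(reads, min_qual=30, offset=33):
    """Count reads, mean read length and bases below min_qual."""
    n_reads = total_bases = low_bases = 0
    for seq, qual in reads:
        n_reads += 1
        total_bases += len(seq)
        low_bases += sum(1 for ch in qual if ord(ch) - offset < min_qual)
    mean_length = total_bases / n_reads if n_reads else 0.0
    return {
        "reads": n_reads,
        "mean_length": mean_length,
        "low_quality_bases": low_bases,
    }


# Made-up two-read example; for a real file use open("trimmed.fastq").
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nII#III\n")
print(summarize(read_fastq(example)))
```

A trimmer that keeps more reads and longer reads while leaving fewer low-quality bases is doing the better job on your data.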