SEQanswers

Old 06-19-2012, 12:39 AM   #1
DrMTB
Junior Member
 
Location: Scotland

Join Date: Jun 2012
Posts: 3
Fastx-Toolkit Newbie help: IUB codes in fastq causing error

I simply wanted to try out a few command-line alignments of ABI Sanger sequences. I converted the .phd.1 files to FASTQ using a BioPerl script (http://www.bioperl.org/wiki/HOWTO:SeqIO), then tried to trim them by quality score with the FASTX-Toolkit. The problem is that every tool I tried rejects the FASTQ nucleotide sequence with the error 'found invalid nucleotide sequence'.
A test file like the following works fine:
@test
GTGCGGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT
+
&$"'&&$&&&""+)(&(*,213+)*),,/4:=2-,&",.1.9-7884

But replacing any base with an IUB code brings the error back.

E.g. using this file (note the R at position 5):
@test
GTGCRGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT
+
&$"'&&$&&&""+)(&(*,213+)*),,/4:=2-,&",.1.9-7884

$ fastx_quality_stats -i test.fastq -o test.stats -Q33
fastx_quality_stats: found invalid nucleotide sequence (GTGCRGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT) on line 2

This must be the cause of the error, but why? Surely FASTQ files can contain IUB codes!?
Thanks for any help on this apparently simple question.

Using:
Ubuntu
FASTX Toolkit 0.0.13.1
Old 06-19-2012, 01:42 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

IUPAC ambiguity codes are valid in FASTQ, but typically you'll only see N (and ACGT). It seems that FASTX doesn't accept the other codes. You could report it to the developers.
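
To see which reads would trip the check before running any FASTX tool, something like the following could be used. It is only a sketch: it assumes plain 4-line FASTQ records (no wrapped sequence lines) and treats anything other than uppercase A, C, G, T and N as unexpected. The function name `find_unexpected_bases` is made up for this example and is not part of FASTX or any library.

```python
# Assumption: the only characters FASTX accepts are uppercase A, C, G, T and N.
ALLOWED = set("ACGTN")


def find_unexpected_bases(lines):
    """Yield (line_number, sequence, offending) for each sequence line that
    contains characters outside ALLOWED.

    `lines` is any iterable of FASTQ lines in strict 4-line record layout.
    `offending` is a list of (1-based position, character) pairs.
    """
    for index, line in enumerate(lines):
        if index % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.rstrip("\n")
        offending = [
            (pos + 1, base) for pos, base in enumerate(seq) if base not in ALLOWED
        ]
        if offending:
            yield index + 1, seq, offending


# Small in-memory record with an R at position 5.
record = ["@test\n", "GTGCRGTGG\n", "+\n", "IIIIIIIII\n"]
for line_no, seq, offending in find_unexpected_bases(record):
    print(line_no, offending)  # prints: 2 [(5, 'R')]
```

An open file handle can be passed in place of the list, since the function only iterates over lines.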
Old 06-21-2012, 03:50 AM   #3
DrMTB
Junior Member
 
Location: Scotland

Join Date: Jun 2012
Posts: 3

Thanks, maubp.
The FASTX-Toolkit author replied that these tools were designed to work with files from Illumina machines, which contain only ACGT and N, and will therefore fail on other kinds of input.
So that's that.
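
One possible workaround, not suggested by anyone in this thread, is to mask the ambiguity codes as N before handing the file to FASTX. The sketch below assumes strict 4-line FASTQ records and that losing the ambiguity information is acceptable for quality trimming. The function name `mask_ambiguity_codes` is made up for this example.

```python
# Assumption: these are the IUPAC ambiguity codes other than N; both cases are masked.
AMBIGUITY_CODES = "RYSWKMBDHV"
TO_N = str.maketrans(
    AMBIGUITY_CODES + AMBIGUITY_CODES.lower(),
    "N" * (2 * len(AMBIGUITY_CODES)),
)


def mask_ambiguity_codes(lines):
    """Yield FASTQ lines with ambiguity codes on sequence lines replaced by N.

    Only the 2nd line of each 4-line record is changed. Quality lines are left
    untouched because they may legitimately contain the same letters.
    """
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.translate(TO_N)
        else:
            yield line
```

Because each code is replaced one-for-one, the sequence and quality strings keep the same length. To apply it to a file, pass an open file handle as `lines` and write the result out with `writelines`.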
All times are GMT -8.