I simply wanted to try out a few command-line alignments of ABI Sanger sequences. I converted the .phd.1 files to FASTQ using a BioPerl script (http://www.bioperl.org/wiki/HOWTO:SeqIO). Then I wanted to trim these files according to their quality scores, so I have been using the FASTX-Toolkit. The problem is that every tool I tried rejected the FASTQ nucleotide sequences with the error 'found invalid nucleotide sequence'.
Using a test file as follows was OK:
@test
GTGCGGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT
+
&$"'&&$&&&""+)(&(*,213+)*),,/4:=2-,&",.1.9-7884
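For reference, the quality line of this record is Phred+33 (Sanger) encoded. That is the encoding the -Q33 option in the fastx_quality_stats command selects. Below is a minimal Python sketch of decoding that line and of a simple 3' quality trim. The function names and the threshold of 20 are illustrative assumptions. The trim is only similar in spirit to the FASTX trimming tools and does not reproduce their exact algorithm.

```python
from typing import List, Tuple

def phred33_scores(quality_line: str) -> List[int]:
    """Decode a Phred+33 quality string: score = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality_line]

def trim_3prime(seq: str, qual: str, min_q: int) -> Tuple[str, str]:
    """Hypothetical helper: drop trailing bases whose quality is below min_q."""
    scores = phred33_scores(qual)
    end = len(seq)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

seq = "GTGCGGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT"
qual = '&$"\'&&$&&&""+)(&(*,213+)*),,/4:=2-,&",.1.9-7884'

print(phred33_scores(qual)[:5])  # first five scores
trimmed_seq, trimmed_qual = trim_3prime(seq, qual, 20)
print(len(seq), len(trimmed_seq))
```

With a threshold of 20, only the final base (quality character '4', score 19) is removed, because the next base ('8', score 23) passes.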
But replacing any base with an IUB code brought the error back.
E.g. with an R at position 5:
@test
GTGCRGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT
+
&$"'&&$&&&""+)(&(*,213+)*),,/4:=2-,&",.1.9-7884
$ fastx_quality_stats -i test.fastq -o test.stats -Q33
fastx_quality_stats: found invalid nucleotide sequence (GTGCRGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT) on line 2
This must be the cause of the error, but why? Surely FASTQ files can contain IUB codes?
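One way to confirm which characters trigger the error is to scan the sequence lines yourself. Below is a minimal Python sketch. It assumes, without verification against the FASTX source, that the tools accept only A, C, G, T and N in sequence lines. It also assumes unwrapped four-line FASTQ records. It reports any other characters. As a possible workaround, it masks IUB ambiguity codes (R, Y, S, W, K, M, B, D, H, V) to N. All function names are hypothetical.

```python
# Assumption: the FASTX tools only accept these letters in a sequence line.
ALLOWED = set("ACGTN")
# IUB/IUPAC ambiguity codes other than N.
AMBIGUITY_CODES = set("RYSWKMBDHV")

def find_invalid_bases(seq):
    """Return (1-based position, character) for every base outside ALLOWED."""
    return [(i + 1, base) for i, base in enumerate(seq) if base.upper() not in ALLOWED]

def mask_ambiguity_codes(seq):
    """Replace each ambiguity code with N, leaving other characters unchanged."""
    return "".join("N" if base.upper() in AMBIGUITY_CODES else base for base in seq)

def mask_fastq(in_path, out_path):
    """Rewrite a FASTQ file with ambiguity codes masked.

    Assumes unwrapped 4-line records, so the sequence is every 4th line
    starting from the second. Quality lines are left untouched, so the
    sequence and quality lengths still match.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        for lineno, line in enumerate(src):
            if lineno % 4 == 1:
                line = mask_ambiguity_codes(line.rstrip("\n")) + "\n"
            dst.write(line)

bad = "GTGCRGTGGTGGAGACGCACTTGATAGTCCTTCTCCGCAGGTACTTT"
print(find_invalid_bases(bad))
print(mask_ambiguity_codes(bad)[:6])
```

For a real file, something like mask_fastq("test.fastq", "test.masked.fastq") would write a copy that fastx_quality_stats can then be pointed at. Masking to N discards the ambiguity information, so it is a trade-off rather than a fix.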
Thanks for any help on this apparently simple question.
Using:
Ubuntu
FASTX Toolkit 0.0.13.1