I am experiencing a peculiar problem with velvet version 0.7.45: the results for the same input data differ depending on how the file is supplied. I used live sequence data (file my_sequence.txt) from a next-generation shotgun sequencer; the data I received is in zip format (viz., my_sequence.txt.zip). The problem is as follows.
1. When I ran velveth on the zipped data with file format fastq.gz, the zip file was processed successfully, with 167555 sequences found in the file.
2. When I ran velveth on the same zipped file with file format fastq instead of fastq.gz, the file was processed without any error and the result was the same: 167555 sequences found. Since I had specified the wrong file type, I expected velveth to terminate with an error, but it processed the file successfully.
3. I then used the unzip utility to extract my_sequence.txt from my_sequence.txt.zip. By default, the unzipped file is stored in the current directory.
4. I ran velveth on the unzipped sequence file (my_sequence.txt), by mistake using file format fastq.gz instead of fastq. Surprisingly, velveth did not throw any error; it processed the unzipped file successfully with file format fastq.gz. More surprisingly, it now found 991278 sequences, about 6 times as many as in the previous runs on the zipped version of the same file.
5. Out of curiosity, I ran velveth on the unzipped file once more, this time with the correct file format, fastq. However, instead of running successfully, velveth threw the error "my_sequence.txt incomplete.: No such file or directory".
I would appreciate it if anyone could explain this seemingly inconsistent behavior. The complete sequence of commands and output is listed here.
asoke@asoke-laptop:~/velvet/velvet_0.7.45$ ./velveth sillyDirectory 21 -fastq -short ../../data/my_sequence.txt.zip
Reading FastQ file ../../data/my_sequence.txt.zip
167555 reads found.
Done
Reading read set file sillyDirectory/Sequences;
167555 sequences found
Done
167555 sequences in total.
Writing into roadmap file sillyDirectory/Roadmaps...
Inputting sequences...
Inputting sequence 0 / 167555
Inputting sequence 100000 / 167555
Done inputting sequences
Destroying splay table
Splay table destroyed
asoke@asoke-laptop:~/velvet/velvet_0.7.45$ ./velveth sillyDirectory 21 -fastq.gz -short ../../data/my_sequence.txt.zip
Reading FastQ file ../../data/my_sequence.txt.zip
167555 reads found.
Done
Reading read set file sillyDirectory/Sequences;
167555 sequences found
Done
167555 sequences in total.
Writing into roadmap file sillyDirectory/Roadmaps...
Inputting sequences...
Inputting sequence 0 / 167555
Inputting sequence 100000 / 167555
Done inputting sequences
Destroying splay table
Splay table destroyed
asoke@asoke-laptop:~/velvet/velvet_0.7.45$ unzip ../../data/my_sequence.txt.zip
Archive: ../../data/my_sequence.txt.zip
inflating: my_sequence.txt
asoke@asoke-laptop:~/velvet/velvet_0.7.45$ ./velveth sillyDirectory 21 -fastq.gz -short my_sequence.txt
Reading FastQ file my_sequence.txt
991278 reads found.
Done
Reading read set file sillyDirectory/Sequences;
991278 sequences found
Done
991278 sequences in total.
Writing into roadmap file sillyDirectory/Roadmaps...
Inputting sequences...
Inputting sequence 0 / 991278
Inputting sequence 100000 / 991278
Inputting sequence 200000 / 991278
Inputting sequence 300000 / 991278
Inputting sequence 400000 / 991278
Inputting sequence 500000 / 991278
Inputting sequence 600000 / 991278
Inputting sequence 700000 / 991278
Inputting sequence 800000 / 991278
Inputting sequence 900000 / 991278
Done inputting sequences
Destroying splay table
Splay table destroyed
asoke@asoke-laptop:~/velvet/velvet_0.7.45$ ./velveth sillyDirectory 21 -fastq -short my_sequence.txt
Reading FastQ file my_sequence.txt
velveth: my_sequence.txt incomplete.: No such file or directory
asoke@asoke-laptop:~/velvet/velvet_0.7.45$
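To check independently which of the two counts (167555 or 991278) is plausible, the file can be inspected outside velveth. The following is only a minimal sketch, not velveth's own parser: the function names `sniff_container` and `count_fastq_records` are made up for illustration, it assumes strict four-line FASTQ records (no wrapped sequence lines), and for a zip archive it assumes the first member is the sequence file.

```python
import gzip
import zipfile


def sniff_container(path):
    """Guess the container type from the leading magic bytes.

    Zip archives start with b"PK\x03\x04" and gzip streams with b"\x1f\x8b";
    anything else is treated as plain (uncompressed) text.
    """
    with open(path, "rb") as handle:
        magic = handle.read(4)
    if magic == b"PK\x03\x04":
        return "zip"
    if magic[:2] == b"\x1f\x8b":
        return "gzip"
    return "plain"


def count_fastq_records(path):
    """Return (container, n_records), assuming 4 lines per FASTQ record."""
    kind = sniff_container(path)
    if kind == "zip":
        with zipfile.ZipFile(path) as archive:
            # Assumption: the first archive member is the sequence file.
            member = archive.namelist()[0]
            with archive.open(member) as handle:
                n_lines = sum(1 for _ in handle)
    elif kind == "gzip":
        with gzip.open(path, "rb") as handle:
            n_lines = sum(1 for _ in handle)
    else:
        with open(path, "rb") as handle:
            n_lines = sum(1 for _ in handle)
    return kind, n_lines // 4
```

Calling `count_fastq_records("my_sequence.txt")` and `count_fastq_records("../../data/my_sequence.txt.zip")` should give the same record count if the archive holds only that one file; comparing that number with the two velveth results would show which run read the data as intended.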