SEQanswers

12-31-2012, 11:27 AM   #1
tonybert
Member
 
Location: seattle

Join Date: Aug 2012
Posts: 38
fastx_quality_stats error with paired end sequences

Greetings, I have just received an Illumina HiSeq run (paired end, 72bp) of several genomes and metagenomes.

I am trying to get quality statistics for the demultiplexed samples after combining the two paired end .fastq files with shuffleSeqs.pl. When I run fastx_quality_stats on the resulting combined file, I get the following error:

fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 5. Is this a valid FASTQ file?
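A quick way to find the first line that breaks the expected layout is a small structural check. The sketch below is a hypothetical Python helper (`first_fastq_layout_error` is not part of FASTX-Toolkit) and assumes unwrapped FASTQ, i.e. exactly four lines per record: an '@' header, the sequence, a '+' line and the quality string. It may be stricter than fastx_quality_stats, so it can flag an earlier line than the tool's own message does.

```python
from typing import Iterable, Optional


def first_fastq_layout_error(lines: Iterable[str]) -> Optional[str]:
    """Describe the first problem in a strict 4-line FASTQ layout, or return None.

    Assumption: unwrapped FASTQ, every record is exactly four lines
    (@header, sequence, '+' line, quality string).
    """
    lineno = 0
    seq_len = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        position = (lineno - 1) % 4  # 0=header, 1=sequence, 2=plus, 3=quality
        if position == 0 and not line.startswith("@"):
            return f"expecting '@' on line {lineno}"
        if position == 1:
            seq_len = len(line)
        if position == 2 and not line.startswith("+"):
            return f"expecting '+' on line {lineno}"
        if position == 3 and len(line) != seq_len:
            return f"quality length {len(line)} != sequence length {seq_len} on line {lineno}"
    # A file that stops partway through a record is also malformed.
    if lineno % 4 != 0:
        return f"truncated record: file ends after line {lineno}"
    return None
```

For example, `first_fastq_layout_error(open("shuffled.fastq"))` should return None for a well-formed file, or a message naming the first offending line.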

I went back and tried using fastx_quality_stats on both of the paired end samples independently, and it worked just fine.

Has anyone else run into a similar problem when combining paired end sequence data? I would welcome any advice or a solution. I am fairly certain the combining step is what introduces the problem.

shuffleSeqs.pl was downloaded from the following website:
http://pathogenomics.bham.ac.uk/blog...nome-assembly/

I am fairly certain, though, that this is also part of the Velvet package.

Thanks,

-Tony

12-31-2012, 01:48 PM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

Well, what was line 5 of the file? Perhaps you could show us the output of the command 'head -n 10 example.fastq' or similar? Use the [ code ] and [ /code ] tags to ensure the forum displays the output nicely (available as a button in the advanced view editor).

12-31-2012, 03:00 PM   #3
tonybert
Member
 
Location: seattle

Join Date: Aug 2012
Posts: 38

Below is your requested output (maubp):
head -n 10 shuffled.fastq
@HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACACG
@HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
GGTTTCGAAAAGAGGGGGGGGGGGGGAGAGGGGGGGAAACCGGTGGGGCCCCCCCCCAANAAAAAAAAAAAAAAAA
+
@@@DDDDDHFFDHGIID<C?<BHG<GGGDCBBHB;?FGHI9??BBFH@>GCF;A.-==@C@C36@;@D########
+
############################################################################
@HWI-ST700693:263:C0K6DACXX:3:1101:2339:2112
CATGTAGTGAACCATATGCTCCAGTAATACCTTGAACAATGACTCCTTTATTTTCATAATCAGAATCCTCTGGTTT

12-31-2012, 03:26 PM   #4
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

That FASTQ file is certainly messed up. My guess is you used a FASTA interleaving script which assumed 2 lines per record... while FASTQ files usually have 4 lines per record.
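To make the two-lines-versus-four-lines explanation concrete, here is a Python sketch (not the script used in this thread) that interleaves two lists of lines, taking `lines_per_record` lines from each mate in turn. With 2 it reproduces the pattern in the head output above (header, sequence, header, sequence, '+', quality, '+', quality), so line 5 is a '+' where a FASTQ reader expects an '@' header. With 4 it keeps each FASTQ record intact. The read names and sequences are made up.

```python
from typing import List


def interleave(lines_a: List[str], lines_b: List[str], lines_per_record: int) -> List[str]:
    """Alternate blocks of `lines_per_record` lines from file A and file B.

    No validation is done, mirroring the naive approach being diagnosed.
    """
    out: List[str] = []
    for start in range(0, len(lines_a), lines_per_record):
        out.extend(lines_a[start:start + lines_per_record])
        out.extend(lines_b[start:start + lines_per_record])
    return out


# One made-up FASTQ record per mate (4 lines each).
mate1 = ["@read1", "ACGT", "+", "IIII"]
mate2 = ["@read1", "TTGA", "+", "JJJJ"]

mangled = interleave(mate1, mate2, lines_per_record=2)
correct = interleave(mate1, mate2, lines_per_record=4)
print(mangled[4])  # prints "+": the 5th line is not an '@' header
```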

Which script exactly did you use? There is no shuffleSequences.pl script on Nick's blog post - it just mentions using Velvet’s bundled Perl script of that name.

12-31-2012, 03:32 PM   #5
tonybert
Member
 
Location: seattle

Join Date: Aug 2012
Posts: 38

Thanks for the prompt reply! Below is the script I used:

$ cat shuffleSequences.pl
#!/usr/bin/perl

$filenameA = $ARGV[0];
$filenameB = $ARGV[1];
$filenameOut = $ARGV[2];

open $FILEA, "< $filenameA";
open $FILEB, "< $filenameB";

open $OUTFILE, "> $filenameOut";

while(<$FILEA>) {
    print $OUTFILE $_;
    $_ = <$FILEA>;
    print $OUTFILE $_;

    $_ = <$FILEB>;
    print $OUTFILE $_;
    $_ = <$FILEB>;
    print $OUTFILE $_;
}

12-31-2012, 03:33 PM   #6
tonybert
Member
 
Location: seattle

Join Date: Aug 2012
Posts: 38

As well, this script was not actually on Nick Loman's blog, though it was mentioned in the text. I copied it from the following website:

http://code.google.com/p/velvet-rese...leSequences.pl

12-31-2012, 03:42 PM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

I really can't recommend running random Perl scripts found online like that - it doesn't even have a comment at the start telling you what it should be doing. However, from my limited Perl knowledge, I think it is doing a very simple interleaving process assuming 2 lines per record, which would be OK for short read FASTA files with no line wrapping, but it does absolutely no error checking - thus it mangled your data without warning.
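The two-lines-per-record loop is fine for unwrapped FASTA; the real gap is the missing error checking, since even a single header check would have stopped it on FASTQ input. Below is a hypothetical Python sketch of that idea (not a port of any Velvet script), assuming each input record is exactly one '>' header line followed by one sequence line.

```python
from typing import Iterable, Iterator


def interleave_unwrapped_fasta(lines_a: Iterable[str], lines_b: Iterable[str]) -> Iterator[str]:
    """Interleave two unwrapped FASTA inputs (2 lines per record).

    Raises ValueError instead of silently writing garbage when a record is not
    a '>' header plus one sequence line, or when the inputs differ in length.
    """
    it_a, it_b = iter(lines_a), iter(lines_b)
    for header_a in it_a:
        seq_a = next(it_a, None)
        header_b = next(it_b, None)
        seq_b = next(it_b, None)
        for label, header, seq in (("A", header_a, seq_a), ("B", header_b, seq_b)):
            if header is None or seq is None:
                raise ValueError(f"input {label} ran out of lines mid-pair")
            # FASTQ input starts with '@', so it is rejected on the first record.
            if not header.startswith(">"):
                raise ValueError(f"input {label}: expected a '>' header, got {header.strip()!r}")
        yield from (header_a, seq_a, header_b, seq_b)
    if next(it_b, None) is not None:
        raise ValueError("input B has more records than input A")
```

Because it is a generator, the errors surface while the output is being consumed, for example inside `list(...)` or `writelines(...)`.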

If you look at the actual Velvet repository, it has some more clearly labelled Perl scripts, with a version for FASTA and another for FASTQ:
https://github.com/dzerbino/velvet/t...equences_fasta

(They still need a bit of documentation, and in my personal view, error handling)
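For FASTQ the same idea needs four lines per record, and it is worth checking that the two files stay in step. The sketch below is a hypothetical Python helper, not Velvet's FASTQ script. It assumes unwrapped four-line records, and that mate names match once any description after the first space and any old-style `/1` or `/2` suffix are removed (in the head output above both mates share the same name). It reads by position rather than by searching for '@', because a quality string can itself begin with '@', as the `@@@DDDDD...` line above does.

```python
from typing import Iterable, Iterator, List, Optional


def read_fastq_record(it: Iterator[str], label: str) -> Optional[List[str]]:
    """Read one 4-line FASTQ record, or return None at a clean end of input."""
    header = next(it, None)
    if header is None:
        return None
    record = [header] + [next(it, None) for _ in range(3)]
    if any(line is None for line in record):
        raise ValueError(f"{label}: truncated record starting {header.strip()!r}")
    seq, plus, qual = (line.rstrip("\r\n") for line in record[1:])
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError(f"{label}: not a 4-line FASTQ record at {header.strip()!r}")
    if len(seq) != len(qual):
        raise ValueError(f"{label}: sequence/quality length mismatch at {header.strip()!r}")
    return record


def mate_id(header: str) -> str:
    """Read name without '@', any description, or an old-style /1 or /2 suffix."""
    rid = (header[1:].split() or [""])[0]
    return rid[:-2] if rid.endswith(("/1", "/2")) else rid


def interleave_fastq(lines_a: Iterable[str], lines_b: Iterable[str]) -> Iterator[str]:
    """Yield mate A then mate B for each pair, failing loudly on any mismatch."""
    it_a, it_b = iter(lines_a), iter(lines_b)
    while True:
        rec_a = read_fastq_record(it_a, "file A")
        rec_b = read_fastq_record(it_b, "file B")
        if rec_a is None and rec_b is None:
            return
        if rec_a is None or rec_b is None:
            raise ValueError("files contain different numbers of records")
        if mate_id(rec_a[0]) != mate_id(rec_b[0]):
            raise ValueError(f"mate names differ: {rec_a[0].strip()!r} vs {rec_b[0].strip()!r}")
        yield from rec_a
        yield from rec_b
```

To write an interleaved file, something like `out.writelines(interleave_fastq(a, b))`, with `a` and `b` opened on the two mate files, should work. Any of the `ValueError`s stops the run instead of continuing silently, although records written before the error remain in the output.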

Tags
combining paired end runs, fastx_quality_stats, hiseq paired end

All times are GMT -8.