
**maubp** · 12-31-2012, 02:48 PM

Well, what was line 5 of the file? Perhaps you could show us the output of the command 'head -n 10 example.fastq' or similar? Use the [ code ] and [ /code ] tags to ensure the forum displays the output nicely (available as a button in the advanced view editor).

**tonybert** · 12-31-2012, 04:00 PM

Below is your requested output (maubp):
```
head -n 10 shuffled.fastq
@HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACACG
@HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
GGTTTCGAAAAGAGGGGGGGGGGGGGAGAGGGGGGGAAACCGGTGGGGCCCCCCCCCAANAAAAAAAAAAAAAAAA
+
@@@DDDDDHFFDHGIID<C?<BHG<GGGDCBBHB;?FGHI9??BBFH@>GCF;A.-==@C@C36@;@D########
+
############################################################################
@HWI-ST700693:263:C0K6DACXX:3:1101:2339:2112
CATGTAGTGAACCATATGCTCCAGTAATACCTTGAACAATGACTCCTTTATTTTCATAATCAGAATCCTCTGGTTT
```

**maubp** · 12-31-2012, 04:26 PM

That FASTQ file is certainly messed up. My guess is you used a FASTA interleaving script which assumed 2 lines per record... while FASTQ files usually have 4 lines per record.
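To make the 2-versus-4 line point concrete, here is a minimal sketch of a structural check, written in Python rather than Perl. It assumes the simple unwrapped layout of exactly 4 lines per record (header starting with `@`, sequence, separator starting with `+`, quality string of the same length as the sequence). The function name `first_malformed_record` is made up for illustration and is not part of FASTX or Velvet.

```python
def first_malformed_record(lines):
    """Return (line_number, reason) for the first structural problem, or None.

    Assumes unwrapped FASTQ with exactly 4 lines per record.
    Line numbers are 1-based, as 'head' or an editor would show them.
    """
    lines = [line.rstrip("\r\n") for line in lines]
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        # Only the first line of each 4-line record is tested for '@',
        # because quality strings may legitimately begin with '@' too.
        if not record[0].startswith("@"):
            return start + 1, "header does not start with '@'"
        if len(record) < 4:
            return start + len(record), "record is truncated"
        if not record[2].startswith("+"):
            return start + 3, "separator line does not start with '+'"
        if len(record[1]) != len(record[3]):
            return start + 4, "sequence and quality lengths differ"
    return None
```

Given a file where two header/sequence pairs come before the first `+` line, as in the output above, this kind of check would stop at line 3, where a separator is expected but a second `@` header is found.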

Which script exactly did you use? There is no shuffleSequences.pl script on Nick's blog post - it just mentions using Velvet’s bundled Perl script of that name.

**tonybert** · 12-31-2012, 04:32 PM

Thanks for the prompt reply! Below is the script I used:

```
$ cat shuffleSequences.pl
#!/usr/bin/perl

$filenameA = $ARGV[0];
$filenameB = $ARGV[1];
$filenameOut = $ARGV[2];

open $FILEA, "< $filenameA";
open $FILEB, "< $filenameB";

open $OUTFILE, "> $filenameOut";

while(<$FILEA>) {
    print $OUTFILE $_;
    $_ = <$FILEA>;
    print $OUTFILE $_;

    $_ = <$FILEB>;
    print $OUTFILE $_;
    $_ = <$FILEB>;
    print $OUTFILE $_;
}
```

**tonybert** · 12-31-2012, 04:33 PM

Also, this script was not actually on Nick Loman's blog, although it was mentioned in the text. I copied it from the following website:


http://code.google.com/p/velvet-research/source/browse/trunk/shuffleSequences.pl

**maubp** · 12-31-2012, 04:42 PM

I really can't recommend running random Perl scripts found online like that - it doesn't even have a comment at the start telling you what it is supposed to do. However, from my limited Perl knowledge, I think it does a very simple interleaving, copying 2 lines at a time from each input file. That would be OK for short read FASTA files with no line wrapping, but it does absolutely no error checking, so it mangled your 4-line FASTQ records without warning.

If you look at the actual Velvet repository, it has some more clearly labelled Perl scripts, with a version for FASTA and another for FASTQ:

velvet/contrib/shuffleSequences_fasta at master · dzerbino/velvet

https://github.com/dzerbino/velvet/tree/master/contrib/shuffleSequences_fasta


(They still need a bit of documentation and, in my personal view, error handling.)
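As an illustration of the kind of error handling meant here, below is a rough sketch of FASTQ interleaving that reads 4 lines per record and refuses to continue on malformed input. It is written in Python rather than Perl and is not the Velvet contrib script. The names `read_fastq` and `interleave_fastq` are hypothetical, and it assumes unwrapped 4-line records with both files listing the mates in the same order. It does not check that the read names of each pair match.

```python
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from an unwrapped FASTQ handle."""
    while True:
        first = handle.readline()
        if not first:
            return  # clean end of file
        raw = [first] + [handle.readline() for _ in range(3)]
        header, seq, sep, qual = (line.rstrip("\r\n") for line in raw)
        # A blank or truncated record fails these checks instead of passing silently.
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"not a 4-line FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, sep, qual


def interleave_fastq(handle_a, handle_b, out):
    """Write records alternately from the two inputs; return the number of pairs."""
    pairs = 0
    for rec_a, rec_b in zip_longest(read_fastq(handle_a), read_fastq(handle_b)):
        if rec_a is None or rec_b is None:
            raise ValueError("input files contain different numbers of records")
        out.write("\n".join(rec_a + rec_b) + "\n")
        pairs += 1
    return pairs
```

With checks like these, feeding a FASTA file to the FASTQ version (or the reverse) raises an error on the first record instead of writing a scrambled output file.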


fastx_quality_stats error with paired end sequences
