  • fastx_quality_stats error with paired end sequences

    Greetings, I have just recently received a HiSeq Illumina run (paired end, 72bp) of several genomes and metagenomes.

    I am currently trying to retrieve quality stat info for the demultiplexed samples after combining the two paired end .fastq files using shuffleSeqs.pl. When using fastx_quality_stats on the resulting combined file, I receive the following error:

    fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 5. Is this a valid FASTQ file?

    I went back and tried using fastx_quality_stats on both of the paired end samples independently, and it worked just fine.

    Just curious if anyone else has run into a similar problem when trying to combine paired end sequence data, and if they would be willing to offer advice or a solution. I am fairly certain the combination step is the part of the process that is introducing the problem.

    shuffleSeqs.pl was downloaded from the following website:


    Although I am fairly certain this is a part of the velvet package as well.

    Thanks,

    -Tony

  • #2
    Well, what was line 5 of the file? Perhaps you could show us the output of the command 'head -n 10 example.fastq' or similar? Use the [ code ] and [ /code ] tags to ensure the forum displays the output nicely (available as a button in the advanced view editor).

    • #3
      Below is your requested output (maubp):
      head -n 10 shuffled.fastq
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACACG
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GGTTTCGAAAAGAGGGGGGGGGGGGGAGAGGGGGGGAAACCGGTGGGGCCCCCCCCCAANAAAAAAAAAAAAAAAA
      +
      @@@DDDDDHFFDHGIID<C?<BHG<GGGDCBBHB;?FGHI9??BBFH@>GCF;A.-==@C@C36@;@D########
      +
      ############################################################################
      @HWI-ST700693:263:C0K6DACXX:3:1101:2339:2112
      CATGTAGTGAACCATATGCTCCAGTAATACCTTGAACAATGACTCCTTTATTTTCATAATCAGAATCCTCTGGTTT

      • #4
        That FASTQ file is certainly messed up. My guess is you used a FASTA interleaving script which assumed 2 lines per record... while FASTQ files usually have 4 lines per record.

        Which script exactly did you use? There is no shuffleSequences.pl script on Nick's blog post - it just mentions using Velvet’s bundled Perl script of that name.
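
To make the guess above concrete, here is a minimal Python sketch (not the script used in this thread) that mimics a 2-lines-per-record interleaver on two tiny made-up FASTQ records, then reports the first record-start line that lacks the '@' prefix. The function names `interleave_two_lines` and `first_bad_header` and the sample reads are hypothetical, and the check assumes unwrapped 4-line FASTQ records.

```python
def interleave_two_lines(lines_a, lines_b):
    """Mimic a FASTA-style interleaver: copy 2 lines from A, then 2 lines from B."""
    out = []
    for i in range(0, len(lines_a), 2):
        out.extend(lines_a[i:i + 2])
        out.extend(lines_b[i:i + 2])
    return out


def first_bad_header(lines):
    """Return the 1-based number of the first record-start line without '@', else None.

    Only every 4th line is inspected, because a quality line may itself
    legitimately begin with '@'.
    """
    for i in range(0, len(lines), 4):
        if not lines[i].startswith("@"):
            return i + 1
    return None


# Two made-up single-record FASTQ files (one forward read, one reverse read).
read1 = ["@read1", "ACGT", "+", "IIII"]
read2 = ["@read1", "TTGA", "+", "####"]

mangled = interleave_two_lines(read1, read2)
print("\n".join(mangled))
print("first bad record start:", first_bad_header(mangled))
```

With 4-line records, the 2-line interleaver emits header, sequence, header, sequence and only then the '+' and quality lines, which is the same layout as the `head` output posted above, and the first line that fails the '@' check is line 5.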

        • #5
          Thanks for the prompt reply! Below is the script I used:

          $ cat shuffleSequences.pl
          #!/usr/bin/perl

          $filenameA = $ARGV[0];
          $filenameB = $ARGV[1];
          $filenameOut = $ARGV[2];

          open $FILEA, "< $filenameA";
          open $FILEB, "< $filenameB";

          open $OUTFILE, "> $filenameOut";

          while(<$FILEA>) {
          print $OUTFILE $_;
          $_ = <$FILEA>;
          print $OUTFILE $_;

          $_ = <$FILEB>;
          print $OUTFILE $_;
          $_ = <$FILEB>;
          print $OUTFILE $_;
          }

          • #6
            Also, this script was not actually on Nick Loman's blog; it was only mentioned in the text. I copied it from the following website:

            • #7
              I really can't recommend running random Perl scripts found online like that - it doesn't even have a comment at the start telling you what it should be doing. However, from my limited Perl knowledge, I think it is doing a very simple interleaving process assuming 2 lines per record, which would be OK for short read FASTA files with no line wrapping, but it does absolutely no error checking - thus it mangled your data without warning.

              If you look at the actual Velvet repository, it has some more clearly labelled Perl scripts, with a version for FASTA and another for FASTQ:
              Short read de novo assembler using de Bruijn graphs, as published in: D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 1...


              (They still need a bit of documentation, and in my personal view, error handling)
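
For illustration only, here is a rough Python sketch of what a FASTQ-aware interleaver with basic error handling could look like. It is not the Velvet script and makes no claim to match its behaviour. It assumes unwrapped FASTQ with exactly 4 lines per record, and the names `read_record` and `interleave_fastq` are hypothetical. It does not check that the read names of each pair actually match.

```python
import io


def read_record(handle, label):
    """Read one 4-line FASTQ record; return None at a clean end of file."""
    raw = [handle.readline() for _ in range(4)]
    if raw[0] == "":
        return None  # nothing left to read
    if "" in raw:
        raise ValueError(f"{label}: file ends in the middle of a record")
    header, seq, plus, qual = (line.rstrip("\r\n") for line in raw)
    if not header.startswith("@"):
        raise ValueError(f"{label}: expected '@' header, got {header!r}")
    if not plus.startswith("+"):
        raise ValueError(f"{label}: expected '+' separator, got {plus!r}")
    if len(seq) != len(qual):
        raise ValueError(f"{label}: sequence and quality lengths differ in {header!r}")
    return [header, seq, plus, qual]


def interleave_fastq(handle_a, handle_b, out):
    """Write records alternately from A and B; return the number of pairs written."""
    pairs = 0
    while True:
        rec_a = read_record(handle_a, "file A")
        rec_b = read_record(handle_b, "file B")
        if rec_a is None and rec_b is None:
            return pairs
        if rec_a is None or rec_b is None:
            raise ValueError("input files contain different numbers of records")
        for record in (rec_a, rec_b):
            out.write("\n".join(record) + "\n")
        pairs += 1


# Tiny in-memory demonstration with made-up reads instead of real files.
forward = io.StringIO("@pair1/1\nACGT\n+\nIIII\n")
reverse = io.StringIO("@pair1/2\nTTGA\n+\n####\n")
combined = io.StringIO()
print(interleave_fastq(forward, reverse, combined), "pair(s) written")
print(combined.getvalue(), end="")
```

With real data the same function could be given file handles opened with `open(...)` instead of the `io.StringIO` objects. The point of the checks is that malformed or mismatched input stops with an error instead of silently producing a mangled file.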
