  • fastx_quality_stats error with paired end sequences

    Greetings, I have just recently received a HiSeq Illumina run (paired end, 72bp) of several genomes and metagenomes.

    I am currently trying to retrieve quality statistics for the demultiplexed samples after combining the two paired end .fastq files using shuffleSeqs.pl. When running fastx_quality_stats on the resulting combined file, I receive the following error:

    fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 5. Is this a valid FASTQ file?

    I went back and tried using fastx_quality_stats on both of the paired end samples independently, and it worked just fine.

    Just curious if anyone else has run into a similar problem when trying to combine paired end sequence data, and if they would be willing to offer advice or a solution. I am fairly certain the combination step is the part of the process that is introducing the problem.

    shuffleSeqs.pl was downloaded from the following website:


    Although I am fairly certain this is part of the Velvet package as well.

    Thanks,

    -Tony

  • #2
    Well, what was line 5 of the file? Perhaps you could show us the output of the command 'head -n 10 example.fastq' or similar? Use the [ code ] and [ /code ] tags to ensure the forum displays the output nicely (available as a button in the advanced view editor).


    • #3
      Below is your requested output (maubp):
      head -n 10 shuffled.fastq
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACACG
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GGTTTCGAAAAGAGGGGGGGGGGGGGAGAGGGGGGGAAACCGGTGGGGCCCCCCCCCAANAAAAAAAAAAAAAAAA
      +
      @@@DDDDDHFFDHGIID<C?<BHG<GGGDCBBHB;?FGHI9??BBFH@>GCF;A.-==@C@C36@;@D########
      +
      ############################################################################
      @HWI-ST700693:263:C0K6DACXX:3:1101:2339:2112
      CATGTAGTGAACCATATGCTCCAGTAATACCTTGAACAATGACTCCTTTATTTTCATAATCAGAATCCTCTGGTTT
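A quick way to locate the first malformed record is sketched below in Python, assuming the usual unwrapped layout of 4 lines per FASTQ record. `first_bad_record` is a hypothetical helper name for illustration and is not part of the FASTX-Toolkit. On a file shaped like the excerpt above it flags line 3, where the '+' separator is missing, whereas fastx_quality_stats complained about line 5.

```python
import io


def first_bad_record(handle):
    """Return (line_number, message) for the first malformed 4-line
    FASTQ record in handle, or None if every record looks well formed.

    Assumes unwrapped FASTQ: header, sequence, '+' separator, qualities.
    """
    line_no = 0  # number of lines consumed by records accepted so far
    while True:
        header = handle.readline()
        if not header:
            return None  # clean end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not header.startswith("@"):
            return (line_no + 1, "expected '@' header")
        if not plus.startswith("+"):
            return (line_no + 3, "expected '+' separator")
        if len(seq.rstrip("\n")) != len(qual.rstrip("\n")):
            return (line_no + 4, "sequence and quality lengths differ")
        line_no += 4


# Toy data with the same line pattern as the shuffled file shown above.
shuffled = io.StringIO("@r1\nACGT\n@r1\nTTTT\n+\nIIII\n+\n####\n")
print(first_bad_record(shuffled))
```

For a real file, the same function could be called on `open("shuffled.fastq")` instead of the in-memory example.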


      • #4
        That FASTQ file is certainly messed up. My guess is you used a FASTA interleaving script which assumed 2 lines per record... while FASTQ files usually have 4 lines per record.

        Which script exactly did you use? There is no shuffleSequences.pl script on Nick's blog post - it just mentions using Velvet’s bundled Perl script of that name.
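The guess about 2 lines versus 4 lines per record can be checked with a small Python sketch. `interleave_two_lines` is a hypothetical stand-in for a FASTA-style interleaver, not the script the poster ran, and the reads are toy data. Feeding it one FASTQ record from each file reproduces the line pattern seen in the shuffled output: two header/sequence pairs followed by two '+'/quality pairs.

```python
def interleave_two_lines(lines_a, lines_b):
    """Mimic a FASTA-style interleaver: copy 2 lines from A, then 2 from B."""
    out = []
    for i in range(0, len(lines_a), 2):
        out.extend(lines_a[i:i + 2])
        out.extend(lines_b[i:i + 2])
    return out


# One 4-line FASTQ record from each (toy) paired end file.
read1 = ["@r1", "ACGT", "+", "IIII"]
read2 = ["@r1", "TTTT", "+", "####"]

mangled = interleave_two_lines(read1, read2)
print("\n".join(mangled))
```

The header and sequence of the second read land where the '+' and quality lines of the first read should be, which is why a FASTQ parser rejects the file a few lines in.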


        • #5
          Thanks for the prompt reply! Below is the script I used:

          $ cat shuffleSequences.pl
          #!/usr/bin/perl

          $filenameA = $ARGV[0];
          $filenameB = $ARGV[1];
          $filenameOut = $ARGV[2];

          open $FILEA, "< $filenameA";
          open $FILEB, "< $filenameB";

          open $OUTFILE, "> $filenameOut";

          while(<$FILEA>) {
          print $OUTFILE $_;
          $_ = <$FILEA>;
          print $OUTFILE $_;

          $_ = <$FILEB>;
          print $OUTFILE $_;
          $_ = <$FILEB>;
          print $OUTFILE $_;
          }


          • #6
            Also, this script was not actually on Nick Loman's blog, although it was mentioned in the text. I copied it from the following website:


            • #7
              I really can't recommend running random Perl scripts found online like that - it doesn't even have a comment at the start telling you what it should be doing. However, from my limited Perl knowledge, I think it is doing a very simple interleaving process assuming 2 lines per record, which would be OK for short read FASTA files with no line wrapping, but it does absolutely no error checking - thus it mangled your data without warning.

              If you look at the actual Velvet repository, it has some more clearly labelled Perl scripts, with a version for FASTA and another for FASTQ:
              Short read de novo assembler using de Bruijn graphs, as published in: D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 1...


              (They still need a bit of documentation, and in my personal view, error handling)
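For comparison, a FASTQ-aware interleaver with basic error handling might look like the Python sketch below. It is not the Velvet script; the function names `read_fastq` and `interleave` are made up for illustration. It assumes unwrapped FASTQ with 4 lines per record and no trailing blank lines, and it does not check that the read names in the two files match.

```python
import io
from itertools import zip_longest


def read_fastq(handle, label):
    """Yield 4-line FASTQ records as tuples of stripped lines.

    Raises ValueError on a truncated record or a missing '@' / '+' marker.
    """
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of file
        if not record[3]:
            raise ValueError(f"{label}: truncated record at end of file")
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"{label}: record does not look like FASTQ")
        yield tuple(line.rstrip("\n") for line in record)


def interleave(handle_a, handle_b, out):
    """Write records alternately from A and B; return the number of pairs."""
    pairs = 0
    records = zip_longest(read_fastq(handle_a, "file A"),
                          read_fastq(handle_b, "file B"))
    for rec_a, rec_b in records:
        if rec_a is None or rec_b is None:
            raise ValueError("input files have different numbers of records")
        for rec in (rec_a, rec_b):
            out.write("\n".join(rec) + "\n")
        pairs += 1
    return pairs


# Toy usage with in-memory files; real use would pass open file handles.
reads_1 = io.StringIO("@r1\nACGT\n+\nIIII\n")
reads_2 = io.StringIO("@r1\nTTTT\n+\n####\n")
combined = io.StringIO()
print(interleave(reads_1, reads_2, combined))
```

Raising an error instead of writing whatever was read is the design choice that matters here: a mismatch between the two inputs stops the run rather than silently producing a mangled file.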
