  • Eland standalone from fastq

    Hello, I'm trying to reproduce some results and the published data is on the NCBI database in fastq format. I wish to reproduce alignment and peak calling using the same software as the original study, and therefore I want to run eland as a first step. I believe that I should be able to use the files as input after using the MAQ utility to convert solexa to sanger fastq formats.

    However, it's not working out. I've renamed the lines manually so they look like what is expected for the sample names (@machine:position#0/1), but somehow there's still an error.

    My file looks like this:

    @SOLEXAWS1_2062VAAXX:1:1:46:643#0/1
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    +
    yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

    .................................................................

    @SOLEXAWS1_2062VAAXX:1:300:139:944#0/1
    TTTCTCTTCTCACCGCTGTTTGTTTCCCCTTG
    +
    yyyyytyyyyyebtyyyyyyyyyyylpyybvy

    After launching the following command:

    perl /path/Solexa/GAPipeline-1.5.1/bin/ELAND_standalone.pl -if input_file.fastq -it fastq -eg mm9/myMouseGenome/

    I get the error:

    Argument "" isn't numeric in sprintf at /path/Solexa/GAPipeline-1.5.1/lib/perl/Gerald/Common.pm line 381, <GEN0> line 1.

    Can someone please point out to me a possible error in my fastq file or usage of eland?

    Thanks

  • #2
    Originally posted by ikrier
    Hello, I'm trying to reproduce some results and the published data is on the NCBI database in fastq format. I wish to reproduce alignment and peak calling using the same software as the original study, and therefore I want to run eland as a first step. I believe that I should be able to use the files as input after using the MAQ utility to convert solexa to sanger fastq formats.
    I thought the NCBI SRA provided FASTQ files already in the Sanger FASTQ format (not the Solexa/Illumina FASTQ variants).



    • #3
      Sorry, I meant the opposite. I used MAQ to change the sanger from NCBI to the solexa expected by eland.



      • #4
        Originally posted by ikrier
        Sorry, I meant the opposite. I used MAQ to change the sanger from NCBI to the solexa expected by eland.
        Ah - going from the NCBI provided Sanger style FASTQ to Solexa/Illumina style FASTQ makes much more sense. But I was under the impression MAQ didn't offer this direction of conversion...

        Also note if you are trying to use the Illumina GAPipeline-1.5.1, then you probably want the newer Illumina 1.3+ FASTQ style (using PHRED scores), not the older Solexa FASTQ style (using Solexa scores). This distinction is important if you have poor quality reads. See also:
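To illustrate why the distinction matters mainly for poor-quality reads, here is a minimal Python sketch of the two score definitions and the conversion between them. The function names are illustrative only and are not part of MAQ or the Illumina pipeline. It assumes the usual definitions: a PHRED score is -10*log10(p) and a Solexa score is -10*log10(p/(1-p)), where p is the estimated error probability.

```python
import math


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scaled quality score to a PHRED-scaled one."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Convert a PHRED-scaled quality score to a Solexa-scaled one.

    The conversion is undefined for PHRED 0 (error probability 1).
    """
    if q_phred <= 0:
        raise ValueError("PHRED score must be > 0 to convert to Solexa scale")
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


if __name__ == "__main__":
    # Compare the two scales across a range of Solexa scores.
    for q in (-5, 0, 10, 20, 40):
        print(q, round(solexa_to_phred(q), 2))
```

The two scales agree closely for high scores and drift apart for low ones, so treating Solexa scores as PHRED scores (or the reverse) mostly distorts the low-quality end.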



        • #5
          Originally posted by ikrier
          I get the error:

          Argument "" isn't numeric in sprintf at /path/Solexa/GAPipeline-1.5.1/lib/perl/Gerald/Common.pm line 381, <GEN0> line 1.

          Can someone please point out to me a possible error in my fastq file or usage of eland?
          I would guess that this is down to the formatting you have used in the FASTQ identifiers - if you understand Perl, then reading that Perl file may help.



          • #6
            Hi, thanks for the idea. I'm using Maq 0.7.1, which apparently needs a patch to convert the new FASTQ format to Sanger. But do you think it can also handle converting to the new Solexa format?



            • #7
              I tried to read the Perl file and modify my file accordingly. I thought the problem might be the identifier, because that's what line 381 is about. But even after I took the "AAXX" part out and replaced it with a number (apparently the program expects a run number there, after the underscore and before the colon, which I don't have), it still gave me the error. I really don't understand.



              • #8
                Originally posted by ikrier
                Hi, thanks for the idea. I'm using Maq 0.7.1, which apparently needs a patch to convert the new FASTQ format to Sanger. But do you think it can also handle converting to the new Solexa format?
                I see why I was confused: the main maq binary itself has just sol2sanger. However, the bundled Perl script fq_all2std.pl has sol2std (which does the same thing but, as I recall, not as quickly), and more recent versions also have the inverse function std2sol (which you used). Confusingly, the copy at this URL is currently out of date and lacks the std2sol function: http://maq.sourceforge.net/fq_all2std.pl

                Currently MAQ does not (to my knowledge) support the new Illumina 1.3+ FASTQ variant, without a patch like the one here:

                I think I saw a patch to fq_all2std.pl to update that too.
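For reference, converting a Sanger quality string to the Illumina 1.3+ encoding does not need any score arithmetic, since both use PHRED scores and differ only in the ASCII offset (33 versus 64). Below is a minimal Python sketch of that re-encoding; it is not the MAQ patch mentioned above, and the function name and the cap at Q62 (the highest score that still fits in printable ASCII at offset 64) are assumptions made for this illustration.

```python
SANGER_OFFSET = 33      # Sanger FASTQ: PHRED score + 33
ILLUMINA13_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED score + 64
MAX_ILLUMINA13_Q = 62   # chr(62 + 64) == '~', the last printable ASCII character


def sanger_to_illumina13(qual: str) -> str:
    """Re-encode a Sanger quality string as an Illumina 1.3+ quality string."""
    out = []
    for ch in qual:
        q = ord(ch) - SANGER_OFFSET
        if q < 0:
            raise ValueError(f"character {ch!r} is not valid Sanger quality")
        # Cap scores that cannot be represented at offset 64.
        q = min(q, MAX_ILLUMINA13_Q)
        out.append(chr(q + ILLUMINA13_OFFSET))
    return "".join(out)


if __name__ == "__main__":
    print(sanger_to_illumina13("!+5?I"))
```

Converting to the older Solexa encoding would additionally require mapping each PHRED score to a Solexa score before applying the offset.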

                However, this is not the root of your error message from Eland/Gerald.



                • #9
                  Hi ikrier,

                  Whilst you renamed the header for the sequence data line "@SOLEXAWS1_2062VAAXX:1:1:46:643#0/1", you didn't rename the header for the quality scores. This is what an input file for eland looks like: the sequence and quality headers match, apart from the leading @ or +:

                  @PSI179204:8:1:6:425#0/1
                  GGCCAGTATTCCTGGAGGATATAACACTGACATCAGCAGG
                  +PSI179204:8:1:6:425#0/1
                  aabbbbab`]aa]SLZ[Za_aba]Y[[]X^``WRW[^^Ua

                  I'm not convinced that your pipeline for converting the Sanger-formatted data back to Illumina format is accurate. Why not contact the authors, who might give you access to the actual data without having to format and reformat?

                  Elaine
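The suggestion above (repeating the read identifier on the "+" line) can be sketched in Python as follows. This is only an illustration: the function name is made up, it assumes strict four-line FASTQ records with no wrapped sequences, and the thread does not confirm that this change alone resolves the error.

```python
import io
from typing import Iterator, TextIO


def repeat_header_on_plus_line(handle: TextIO) -> Iterator[str]:
    """Yield FASTQ records whose '+' line repeats the '@' identifier."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        header = header.rstrip("\r\n")
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Copy the identifier (without the leading '@') onto the '+' line.
        yield f"{header}\n{seq}\n+{header[1:]}\n{qual}\n"


if __name__ == "__main__":
    example = io.StringIO(
        "@SOLEXAWS1_2062VAAXX:1:1:46:643#0/1\n"
        + "A" * 32 + "\n+\n" + "y" * 32 + "\n"
    )
    for record in repeat_header_on_plus_line(example):
        print(record, end="")
```

For a real file, the same generator could be fed an open file handle and its output written to a new FASTQ file, leaving the original untouched.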
