  • Staden package question: is srf2fastq returning an error, or is this normal?

    I'm attempting to convert an SRF file into FASTQ format using srf2fastq, which is a tool included in the Staden package.

    I'm executing the following command:

    srf2fastq -s BC3 -a -n -c in.srf

    This should split the resulting FASTQ into two files since it's paired read data.

    When the program finishes, I get the following message:

    'Block of unknown type '3'. Aborting'

    The FASTQ files are created, but I don't know if they are truncated because of this "error".

    Is the above error normal?

  • #2
    This isn't normal. srf2fastq is telling you that it found something unexpected in the SRF file, which means the file is either corrupt or has some sort of junk on the end.

    It is possible that srf2fastq managed to dump out all of the reads from the SRF file, but I wouldn't like to guarantee it in this case. Where did you get the SRF file from?



    • #3
      The SRF files are from the European Genome-phenome Archive. These datasets are not publicly available, however.

      For one of the SRF files that caused this error, I dumped it to a text file and compared the read ID at the end of that file with the read ID at the end of the corresponding FASTQ file. These read IDs matched, so the FASTQ file in *that* instance was being completely generated, despite the error.
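The end-of-file comparison described above can be scripted. Below is a minimal Python sketch that returns the name of the last record in a FASTQ file. It assumes an uncompressed FASTQ with the usual four lines per record (no wrapped sequence or quality lines). The function name and the file name in the usage line are hypothetical and not part of srf2fastq.

```python
def last_fastq_read_name(path):
    """Return the name of the last record in a four-line-per-record FASTQ file.

    The name is the header line minus the leading '@', cut at the first
    whitespace. Assumes no wrapped sequence/quality lines.
    """
    last_header = None
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header lines are lines 0, 4, 8, ...
                last_header = line.rstrip("\n")
    if last_header is None:
        return None  # empty file
    if not last_header.startswith("@"):
        raise ValueError("unexpected FASTQ header: %r" % last_header)
    fields = last_header[1:].split()
    return fields[0] if fields else ""


if __name__ == "__main__":
    # Hypothetical file name; use one of the FASTQ files srf2fastq produced.
    print(last_fastq_read_name("BC3_1.fastq"))
```

The returned name can then be compared by eye with the last read ID in the SRF text dump. Any mate suffix such as `/1` is left in place, so allow for it when comparing.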

      When I mapped the reads to the reference genome, the coverage was very low, so I'm not sure whether that is due to the dataset itself or because not all of the reads are being generated.



      • #4
        That's a pity; it makes diagnosing the problem much more difficult. Have you tried getting in touch with the EGA? They could try converting the file at their end to see if they get the same error.

        srf2fastq goes through the file sequentially, so if you have the last entry in the SRF file then you should have everything. You could also try using srf_list -l to see where the last entry is. For example:

        Code:
        srf_list -l 2010_1.srf | tail
        IL3_2010:1:100:1793:1774       6917868443 +  570 +  9625
        IL3_2010:1:100:1793:1939       6917869029 +  576 +  9625
        IL3_2010:1:100:1793:1988       6917869621 +  576 +  9625
        IL3_2010:1:100:1793:2011       6917870213 +  573 +  9625
        IL3_2010:1:100:1793:1851       6917870802 +  575 +  9625
        IL3_2010:1:100:1793:1104       6917871393 +  582 +  9625
        IL3_2010:1:100:1793:122        6917871991 +  577 +  9625
        IL3_2010:1:100:1793:1577       6917872583 +  578 +  9625
        IL3_2010:1:100:1793:1331       6917873177 +  578 +  9625
        IL3_2010:1:100:1793:1827       6917873771 +  580 +  9625
        The first number after the read name is the file position of that read, which for the last entry should be near the end of the file. In this case the file is 7040277611 bytes long; the small difference is due to the name index at the end of the file. If you get a number that is much lower than the file length, you may well be missing some data. You can also try counting the number of reads this way and checking whether it matches what you get in the FASTQ output.
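Both checks can be automated. The following is a minimal Python sketch, not part of the Staden/io_lib tools. It assumes the `srf_list -l` layout shown above, with the read name first and the file position second; the other columns are ignored. Lines that don't fit that layout, such as error messages, are skipped. The 0.95 threshold for "near the end of the file" is a hypothetical choice. For paired data split into two FASTQ files, it is also an assumption that each SRF entry corresponds to one record in each file.

```python
import os
import subprocess


def parse_srf_list(lines):
    """Yield (read_name, file_position) from `srf_list -l` output lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            pos = int(fields[1])
        except ValueError:
            continue  # e.g. an error message rather than a read entry
        yield fields[0], pos


def summarize_listing(lines, file_size, min_fraction=0.95):
    """Count entries and check whether the last known position is near EOF."""
    count = 0
    last_pos = -1
    for _, pos in parse_srf_list(lines):
        count += 1
        last_pos = max(last_pos, pos)
    return {
        "reads": count,
        "last_position": last_pos,
        # Hypothetical threshold: last entry within 5% of the file end.
        "looks_complete": last_pos >= min_fraction * file_size,
    }


def count_fastq_records(path):
    """Count records in an uncompressed four-line-per-record FASTQ file."""
    with open(path) as fh:
        return sum(1 for _ in fh) // 4


def check_srf(srf_path, min_fraction=0.95):
    """Stream `srf_list -l` output and summarize it against the file size."""
    with subprocess.Popen(["srf_list", "-l", srf_path],
                          stdout=subprocess.PIPE, text=True) as proc:
        return summarize_listing(proc.stdout, os.path.getsize(srf_path),
                                 min_fraction)
```

Usage might look like `summary = check_srf("2010_1.srf")`, followed by comparing `summary["reads"]` with `count_fastq_records(...)` on each FASTQ output file (file names hypothetical). The listing is streamed rather than captured, because a multi-gigabyte SRF file can produce a very long listing.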



        • #5
          Thank you, rmdavies. The srf_list program also returns an "Unknown block type" error, and for all reads after this error, the file position is listed as "-1".

          However, all of the reads seem to be written to the fastq file, so there is no truncation. Regarding SRF to FASTQ conversion, I will assume that the error can be ignored for now, but I will contact the data providers just to be sure.
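One way to confirm that the reads listed with position -1 really made it into the FASTQ output is to check each of those names against the FASTQ headers. The following is a minimal Python sketch under several assumptions:
- the -1 appears in the second column of `srf_list -l` output, where the file position normally is;
- the FASTQ is uncompressed with four lines per record;
- FASTQ read names match the SRF names apart from an optional `/1` or `/2` mate suffix.

All function names are hypothetical.

```python
def unlocated_reads(lines):
    """Names of `srf_list -l` entries whose file position is reported as -1."""
    names = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "-1":
            names.append(fields[0])
    return names


def fastq_read_names(path):
    """Set of read names from a four-line-per-record FASTQ file.

    Strips the leading '@', anything after whitespace, and a trailing
    '/1' or '/2' mate suffix (an assumption about the header format).
    """
    names = set()
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 != 0:
                continue
            fields = line[1:].split()
            if not fields:
                continue
            name = fields[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            names.add(name)
    return names


def missing_from_fastq(listing_lines, fastq_path):
    """Unlocated SRF entries that do not appear in the FASTQ output."""
    present = fastq_read_names(fastq_path)
    return [n for n in unlocated_reads(listing_lines) if n not in present]
```

An empty result means every entry after the unknown block was written out. Note that holding all FASTQ read names in a set needs memory roughly proportional to the number of reads.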
