  • Staden package question: is srf2fastq returning an error, or is this normal?

    I'm attempting to convert an SRF file into FASTQ format using srf2fastq, which is a tool included in the Staden package.

    I'm executing the following command:

    srf2fastq -s BC3 -a -n -c in.srf

    This should split the resulting FASTQ into two files since it's paired read data.

    When the program finishes, I get the following message:

    'Block of unknown type '3'. Aborting'

    The FASTQ files are created, but I don't know if they are truncated because of this "error".

    Is the above error normal?

  • #2
    This isn't normal. srf2fastq is telling you that it found something unexpected in the SRF file, because the file is either corrupt or has some sort of junk on the end.

    It is possible that srf2fastq managed to dump out all of the reads from the SRF file, but I wouldn't like to guarantee it in this case. Where did you get the SRF file from?

    • #3
      The SRF files are from the European Genome-phenome Archive. These datasets are not publicly available, however.

      For one of the SRF files that caused this error, I dumped it to a text file and compared the read ID at the end of that file with the read ID at the end of the corresponding FASTQ file. These read IDs matched, so the FASTQ file in *that* instance was being completely generated, despite the error.

      When I mapped the reads to the reference genome, the coverage was very low. So I'm not sure if it is because of the dataset or because all of the reads are not being generated.

      • #4
        Originally posted by mhayes
        The SRF files are from the European Genome-phenome Archive. These datasets are not publicly available, however.
        That's a pity; it makes diagnosing the problem much more difficult. Have you tried getting in touch with the EGA? They could try converting the file at their end to see if they get the same error.

        For one of the SRF files that caused this error, I dumped it to a text file and compared the read ID at the end of that file with the read ID at the end of the corresponding FASTQ file. These read IDs matched, so the FASTQ file in *that* instance was being completely generated, despite the error.

        When I mapped the reads to the reference genome, the coverage was very low. So I'm not sure if it is because of the dataset or because all of the reads are not being generated.
        srf2fastq goes through the file sequentially, so if you have the last entry in the SRF file then you should have everything. You could also try using srf_list -l to see where the last entry is. For example:

        Code:
        srf_list -l 2010_1.srf | tail
        IL3_2010:1:100:1793:1774       6917868443 +  570 +  9625
        IL3_2010:1:100:1793:1939       6917869029 +  576 +  9625
        IL3_2010:1:100:1793:1988       6917869621 +  576 +  9625
        IL3_2010:1:100:1793:2011       6917870213 +  573 +  9625
        IL3_2010:1:100:1793:1851       6917870802 +  575 +  9625
        IL3_2010:1:100:1793:1104       6917871393 +  582 +  9625
        IL3_2010:1:100:1793:122        6917871991 +  577 +  9625
        IL3_2010:1:100:1793:1577       6917872583 +  578 +  9625
        IL3_2010:1:100:1793:1331       6917873177 +  578 +  9625
        IL3_2010:1:100:1793:1827       6917873771 +  580 +  9625
        The first number is the file position of the read, which should be near the end of the file for the last one. In this case the file is 7040277611 bytes long. The small difference is due to the name index on the end of the file. If you get a number that is much lower than the file length then you may well be missing some data. You can also try counting the number of reads this way and see if it matches what you get in the fastq output.
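The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the first whitespace-separated field of each `srf_list -l` line as the read name and the second as the file position (as in the listing above), and the helper names `parse_listing` and `trailing_gap` are made up for this example. No cut-off for "much lower" is given in the thread, so the sketch only reports the fraction of the file that lies after the last read.

```python
def parse_listing(lines):
    """Yield (read_name, file_offset) pairs from `srf_list -l` style lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            yield fields[0], int(fields[1])
        except ValueError:
            # Not a listing line (for example an error message); skip it.
            continue


def trailing_gap(listing_lines, file_size):
    """Return (name, offset, fraction of the file after the last listed read)."""
    entries = list(parse_listing(listing_lines))
    if not entries:
        raise ValueError("no listing entries found")
    name, offset = entries[-1]
    return name, offset, (file_size - offset) / file_size


# The last three listing lines and the file length quoted in the post above.
sample = [
    "IL3_2010:1:100:1793:1577       6917872583 +  578 +  9625",
    "IL3_2010:1:100:1793:1331       6917873177 +  578 +  9625",
    "IL3_2010:1:100:1793:1827       6917873771 +  580 +  9625",
]
FILE_SIZE = 7040277611

name, offset, gap = trailing_gap(sample, FILE_SIZE)
print(f"last read {name} at byte {offset}; {gap:.2%} of the file follows it")
```

On a real file, the lines would come from the output of `srf_list -l` and the size from something like `os.path.getsize`. A gap of a few percent is consistent with the name index mentioned above, whereas a large gap would suggest the listing stopped early.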

        • #5
          Thank you, rmdavies. The srf_list program also returns an "Unknown block type" error, and for all reads after this error, the file position is listed as "-1".

          However, all of the reads seem to be written to the fastq file, so there is no truncation. Regarding SRF to FASTQ conversion, I will assume that the error can be ignored for now, but I will contact the data providers just to be sure.
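For anyone wanting to repeat this comparison, here is a minimal Python sketch. It rests on several assumptions: the listing lines keep the same column layout as in post #4 with "-1" in the position column, the FASTQ output uses the plain four-lines-per-read layout, and the sample data below is invented purely for illustration. The function names are hypothetical. How the two counts should relate for paired-end output split into two files depends on how the reads are stored in the SRF file, so treat a one-to-one match as something to verify rather than a given.

```python
def summarize_listing(lines):
    """Count listed reads and how many have an unknown (-1) file position."""
    total = unknown = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            offset = int(fields[1])
        except ValueError:
            continue  # skip error messages and other non-listing lines
        total += 1
        if offset == -1:
            unknown += 1
    return total, unknown


def count_fastq_records(lines):
    """Count reads, assuming exactly four non-empty lines per FASTQ record."""
    n = sum(1 for line in lines if line.strip())
    if n % 4:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n // 4


# Invented sample data: one normal entry, then entries reported with -1.
listing = [
    "read_a       1000 +  570 +  9625",
    "Block of unknown type '3'. Aborting",
    "read_b       -1 +  576 +  9625",
    "read_c       -1 +  576 +  9625",
]
fastq = "@read_a\nACGT\n+\nIIII\n@read_b\nACGT\n+\nIIII\n@read_c\nACGT\n+\nIIII\n"

total, unknown = summarize_listing(listing)
records = count_fastq_records(fastq.splitlines())
print(f"{total} reads listed ({unknown} with position -1), {records} FASTQ records")
```

If the number of FASTQ records matches the number of listed reads, including those listed with position -1, that supports the conclusion that nothing was truncated.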
