
  • Convert SRA to FASTQ with fastq-dump but problem of read length

    Hello,
    I have Illumina paired-end reads of length 76 bp.
    The problem is that when I use fastq-dump to get two files with the paired reads separated, it splits the reads into 101 bp and 51 bp rather than 76 + 76.
    I tried the options --split-files, --split-spot and --split-3, and always get the same result.
    I also tried different fastq-dump versions: 1, 2, 2.3.4 and 2.3.5.2.
    Do you have any idea how I can get the 76 + 76 split?
    Thanks !

  • #2
    It is possible that the dataset you are looking at has asymmetric reads. Have you looked at the record in SRA to see if that is the case?

    • #3
      I don't know if this is possible.
      When I convert the SRA file to FASTQ without any option, I get a FASTQ file with 152 bp reads.

      Here is the page where I downloaded the SRA file: http://www.ncbi.nlm.nih.gov/sra/?term=SRR1174239
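
A quick way to check what fastq-dump actually wrote, independent of what the SRA record says, is to tally the read lengths in the dumped FASTQ. A minimal Python sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines); `read_length_counts` is a made-up helper name, not part of the SRA Toolkit:

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Count sequence lengths in a FASTQ stream with 4 lines per record."""
    counts = Counter()
    for i, line in enumerate(handle):
        # Line index 1 of every 4-line record holds the sequence.
        if i % 4 == 1:
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


# Tiny in-memory example; for a real file use open("reads.fastq") instead.
example = io.StringIO("@r1\nACGTAC\n+\nIIIIII\n@r2\nACGT\n+\nIIII\n")
for length, n in sorted(read_length_counts(example).items()):
    print(length, n)
```

Running this on each of the split files and on the unsplit dump shows at a glance whether the lengths are 152, 101/51 or 76/76.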

      • #4
        Based on the record it looks like a standard 2 x 101 bp PE dataset.

        Update: The information in SRA appears to be incorrect, because the dataset dumps as 152 bp reads (which would be 2 x 76 bp).
        Last edited by GenoMax; 10-07-2014, 07:20 AM.
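
If the unsplit 152 bp dump really is two 76 bp mates laid end to end, which this thread does not settle, the mates could be cut apart by hand instead of relying on the split coordinates stored in the SRA record. A rough Python sketch under that assumption; `split_record`, the `cut=76` default and the `/1`, `/2` header suffixes are illustrative choices, not anything fastq-dump does:

```python
def split_record(header, seq, qual, cut=76):
    """Split one concatenated FASTQ record into two mates at position `cut`.

    Assumes mate 1 occupies the first `cut` bases and mate 2 the rest.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) <= cut:
        raise ValueError("read is not longer than the cut position")
    mate1 = (header + "/1", seq[:cut], qual[:cut])
    mate2 = (header + "/2", seq[cut:], qual[cut:])
    return mate1, mate2


# Synthetic 152 bp record: 76 A's followed by 76 C's.
m1, m2 = split_record("@read1", "A" * 76 + "C" * 76, "I" * 152)
print(len(m1[1]), len(m2[1]))
```

Whether the second mate also needs reverse-complementing depends on how the submitter's files were loaded, so check a few pairs against a reference before trusting the result.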

        • #5
          Yeah, I agree, but then why do I get 152 bp reads when using fastq-dump?!

          • #6
            This appears to be an asymmetric dataset (101 x 51) as originally suspected. See attached screencap.

            • #7
              OK, but when I run FastQC on the FASTQ file with the 101 bp reads, I get the attached graph, which to me is typical of reads that were split at the wrong length.

              • #8
                It does appear that something is fishy.

                You may want to email SRA support and ask them to look into this data set. You could also alert the submitter independently.

                • #9
                  OK, thanks a lot!

                  • #10
                    I ran into this issue with another data set. The problem was that SRA had misparsed the FASTQ files. A few emails between the help desk and the original depositor resulted in SRA reformatting the files.
