  • problem understanding NCBI SRA fastq files

    I downloaded some SRA files from NCBI's Sequence Read Archive and converted them to FASTQ format. The experiment is described as paired-end, so I expected to get two FASTQ files from each SRA file. Instead, I got only one FASTQ file from each. I then tried to work out which reads were read 1 and which were read 2, but I couldn't see anything in the file to indicate that. Here are some lines from one of the files:


    @SRR254172.11 ILLUMINA-20A1B2_0004_FC6282EAAXX:6:1:1921:953 length=160
    NACAAAGGTAATTGCAAGTCCCTTCGTGCCAAAACGTCCAGCCCTTCCAACCCTGTGCAAATAAGTATCAGCTGAGTCTGAATCTGCATTCATTCTGGAATGACTCAGGAAGAAAGGCTAACAAGATATAAGAACTTCAAGGAAGGCCACAAGAGAATTC
    +SRR254172.11 ILLUMINA-20A1B2_0004_FC6282EAAXX:6:1:1921:953 length=160
    #)0+)**2,,@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@:3:::@@@22:<<:8@@:@@@@@@@IIHIIIIIIII?HHIIFIGIIIIIIEGIGHIIIIFAIBDIHHGEHDBEFIIB<IIHHI3EFEDFC@HH@F@2;8<>@0??

    You get one line starting with @, then a line with the sequence, then a line essentially identical to the @ line except starting with + rather than @, and then a line with base quality scores.

    Does anyone understand this format and how I can get fastq files for both read1 and read2?

    Thank you.

    Eric
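
The four-line record layout described in the post above (an @ header, the sequence, a + line, and the quality string) can be read with a few lines of Python. This is only a minimal sketch, assuming each record occupies exactly four lines with no wrapped sequence lines; the names `FastqRecord` and `read_fastq` and the toy record are made up for illustration and are not part of any SRA tool.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str      # text after '@' on the header line
    sequence: str  # base calls
    quality: str   # one quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (assumes no line wrapping)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it, None)
        plus = next(it, None)
        quality = next(it, None)
        if sequence is None or plus is None or quality is None:
            raise ValueError(f"truncated record near {header!r}")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Toy record (hypothetical data, not taken from SRR254172).
demo = [
    "@toy.1 made-up-read length=8\n",
    "ACGTACGT\n",
    "+toy.1 made-up-read length=8\n",
    "IIIIHHHH\n",
]
records = list(read_fastq(demo))
```

On a real file the same function could be fed an open file handle, for example `read_fastq(open("SRR254172.fastq"))`, assuming that is the name of the converted file.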

  • #2
    The basics of FASTQ are described here http://nar.oxfordjournals.org/conten...r.gkp1137.full and http://en.wikipedia.org/wiki/FASTQ_format

    How did you do the conversion? I recall there are extra switches needed at the command line for paired end data...
    Last edited by maubp; 03-30-2012, 12:23 AM.


    • #3
      Originally posted by maubp:
      How did you do the conversion? I recall there are extra switches needed at the command line for paired end data...

      Hi Maubp,

      I bet that's where I'm going wrong. I ran the conversion two ways, and neither gave me the paired-end reads I wanted:

      fastq-dump *.sra

      fastq-dump.2 *.sra

      Eric
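
When fastq-dump is run without a split option, each spot is written as a single record with its reads joined end to end, which is consistent with the `length=160` in the example header. If only such a joined file is at hand, the mates could in principle be cut apart afterwards. The sketch below assumes both mates have the same length unless a read 1 length is given; that assumption is not stated anywhere in the thread, and `split_spot` is a hypothetical helper, not an SRA Toolkit function.

```python
from typing import Optional, Tuple

Read = Tuple[str, str]  # (sequence, quality)


def split_spot(sequence: str, quality: str,
               read1_len: Optional[int] = None) -> Tuple[Read, Read]:
    """Cut a joined spot into two mates.

    If read1_len is None, the spot is cut in the middle (ASSUMPTION:
    both mates are the same length).
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality differ in length")
    if read1_len is None:
        if len(sequence) % 2:
            raise ValueError("odd spot length; pass read1_len explicitly")
        read1_len = len(sequence) // 2
    if not 0 < read1_len < len(sequence):
        raise ValueError("read1_len must fall inside the spot")
    read1 = (sequence[:read1_len], quality[:read1_len])
    read2 = (sequence[read1_len:], quality[read1_len:])
    return read1, read2


# Toy spot (hypothetical data): two 4-base mates joined together.
mate1, mate2 = split_spot("AAAACCCC", "IIII####")
```

Letting the converter do the splitting is the safer route, since it knows the actual read lengths recorded for the run.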


      • #4
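        You need to use the --split-3 option, which writes the two mates of each pair to separate files (and any unpaired reads to a third file).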

        fastq-dump --split-3 *.sra
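
For a directory with many .sra files, the command above can be wrapped in a small script. This is a sketch under a few assumptions: `fastq-dump` is on the PATH, the files are named by accession (for example `SRR254172.sra`), and output goes to the current directory using the usual `_1.fastq` / `_2.fastq` naming. The function names are made up for illustration, and `dry_run=True` only builds the commands without running anything.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def split3_command(sra_file: str) -> List[str]:
    """Build the fastq-dump call from the thread for one .sra file."""
    return ["fastq-dump", "--split-3", str(sra_file)]


def expected_outputs(sra_file: str) -> Dict[str, str]:
    """File names --split-3 is expected to produce (ASSUMES accession-named input)."""
    accession = Path(sra_file).stem
    return {
        "read1": f"{accession}_1.fastq",
        "read2": f"{accession}_2.fastq",
        "unpaired": f"{accession}.fastq",  # only written if unpaired reads exist
    }


def convert_all(directory: str = ".", dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one command per .sra file in a directory."""
    commands = [split3_command(str(p)) for p in sorted(Path(directory).glob("*.sra"))]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands


example_command = split3_command("SRR254172.sra")
example_outputs = expected_outputs("SRR254172.sra")
```

Calling `convert_all(".", dry_run=False)` would actually run the conversions; checking afterwards that the `read1` and `read2` files hold the same number of records is a cheap sanity test.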


        • #5
          Originally posted by jrm5100:
          You need to use the --split-3 option.

          fastq-dump --split-3 *.sra

          Thanks so very much!!

          Eric
