  • Processing SOLiD data from SRA using Tophat

    Hello all,

    I'm attempting to run Tophat on SOLiD data from an SRA file and running into problems with the fastq file formatting.

    After running fastq-dump on the SRA file, I get the following format:

    @SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    T000002201013000130000000.01...20...2....2.....2...
    +SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    !+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!
    Executing Tophat like this:

    tophat -C -o output --bowtie1 ColorIndex SRR.fastq

    Results in the following error:

    Error running bowtie:
    Too few quality values for read: 2899T33
    are you sure this is a FASTQ-int file?
    I researched this error and found that I may need to use the --quals option and provide a separate quality file. So I split the fastq file into two separate files:

    @SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    T000002201013000130000000.01...20...2....2.....2...
    +SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    !+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!
    And ran:

    tophat -C --quals -o output --bowtie1 ColorIndex SRR.fastq SRR_qual.fastq

    That generates the following error:

    Error encountered parsing file SRR.fastq:
    Premature end of file (missing quality values for SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50)
    I can't find any information on how to properly format the base and quality files when they are separated so that Tophat can read them. Is this my problem? Or something else?

    <EDIT>

    I reformatted the two split files as FASTA:

    >SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    T000002201013000130000000.01...20...2....2.....2...
    >SRR1119927.1 solid309_20110721_FRAG_BC_yadegari_1_55_1170 length=50
    !+,0,,/'*&/)&&)2%&+2.0%37!7%!!!1%!!!%!!!!5!!!!!5!!!
    But now get the following error:

    Error running 'prep_reads'
    Error: beginning of quality values record not found! (!'/,<&.&&*'%1*%.2(%&20%'&!')!!!%&!!!1!!!!1!!!!!%!!!)
    Last edited by Helical; 06-19-2014, 06:43 AM.

  • #2
    TopHat is probably expecting the data to be in 2 files, .csfasta and .qual.

    I think there is a command, 'abi-dump', that you can use instead of fastq-dump and that will produce the file formats you need.

    • #3
      Did you use fastq-dump or abi-dump to generate your original files? If the SRA submission actually consisted of color-space reads, then you should use "abi-dump", NOT fastq-dump, from the SRA Toolkit. The abi-dump command will actually give you matched csfasta/csqual files.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.
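
If re-dumping with abi-dump is not an option, the FASTQ from the first post can in principle be split into the two-file layout by hand. The following is only a rough sketch under several assumptions, not a replacement for abi-dump: the function name `fastq_to_csfasta_qual` is hypothetical, qualities are assumed to be Phred+33 encoded, the quality file is assumed to need space-separated integers under a `>` header, and the first quality character is assumed to be a placeholder for the primer base (the example read has 51 quality characters for 50 colors). Missing color calls (`.`) get no special handling.

```python
def fastq_to_csfasta_qual(fastq_text, offset=33):
    """Split color-space FASTQ text into (csfasta_text, qual_text).

    Hypothetical helper: assumes 4-line records and Phred+33 qualities.
    """
    lines = [ln for ln in fastq_text.splitlines() if ln]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")

    csfasta, qual = [], []
    for i in range(0, len(lines), 4):
        header, read, plus, quals = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")

        # Decode ASCII quality characters into integer scores.
        scores = [ord(ch) - offset for ch in quals]

        # Assumption: when there is one quality per character of the read,
        # the first one belongs to the primer base (e.g. 'T') and is dropped.
        if len(scores) == len(read):
            scores = scores[1:]
        if len(scores) != len(read) - 1:
            raise ValueError(f"read/quality length mismatch for {header[1:]}")

        name = header[1:]
        csfasta.append(f">{name}\n{read}\n")
        qual.append(f">{name}\n{' '.join(map(str, scores))}\n")

    return "".join(csfasta), "".join(qual)


if __name__ == "__main__":
    # Tiny made-up record: primer base T followed by three colors.
    cs, qv = fastq_to_csfasta_qual("@r1 length=3\nT01.\n+r1 length=3\n!+,!\n")
    print(cs, qv, sep="")
```

The two returned strings would be written to separate files and passed to TopHat together with --quals, as in the second command of the first post. Whether TopHat accepts the result for a given dataset is untested here.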
