  • TopHat color space

    Hi All,

    I have trouble using tophat with files downloaded from SRA.

    My command is:

    ./tophat -C -p 5 -o ./825tophat ../bowtie-0.12.7/indexes/ath_gmc_colspace_110510 ~/SRR039825.fastq

    Error encountered parsing file /home/SRR039825.fastq:
    Length mismatch between sequence and quality strings for SRR039825.1 923_6_55 (36 vs 36).

    The sequence is here:
    @SRR039825.1 923_6_55
    T00310202021210203103230203233012210
    +
    !;>1<998495<<3$4.40%/87-101*&3,8#%'#

    I dug into the code and found that the problem is at lines 931-934 of tophat.py, where the sequence is required to be one character longer than the quality string. In my file both are 36 characters long, which is why the check fails.

    Why is this and how can I fix it?

    Thanks,

    Song Li

  • #2
    A little update on this issue:

    I wrote a script that trims off the first quality value of every read in my FASTQ file. With that change, TopHat runs smoothly through the whole analysis.

    I am still not sure that this is the correct way to solve the problem.
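
For anyone who wants to try the same workaround, the trimming step could be sketched roughly as below. This is not the script used in this thread. The function name `trim_leading_quality` is made up for illustration, and the sketch assumes a plain FASTQ file with exactly four lines per record. It only drops the first quality character when the sequence and quality strings have the same length, so records that already have one quality per color are left alone.

```python
import io


def trim_leading_quality(src, dst):
    """Copy FASTQ records from src to dst, dropping the first quality
    character of any record whose sequence and quality strings have the
    same length. Assumes four lines per record (hypothetical helper)."""
    trimmed = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of input
        seq = src.readline().rstrip("\n")
        plus = src.readline().rstrip("\n")
        qual = src.readline().rstrip("\n")
        if len(qual) == len(seq):
            # The leading primer base (e.g. 'T') has a quality value here;
            # remove it so the quality string is one character shorter.
            qual = qual[1:]
            trimmed += 1
        dst.write("\n".join([header.rstrip("\n"), seq, plus, qual]) + "\n")
    return trimmed


if __name__ == "__main__":
    # The record quoted in the first post, used as a small demonstration.
    record = (
        "@SRR039825.1 923_6_55\n"
        "T00310202021210203103230203233012210\n"
        "+\n"
        "!;>1<998495<<3$4.40%/87-101*&3,8#%'#\n"
    )
    out = io.StringIO()
    trim_leading_quality(io.StringIO(record), out)
    print(out.getvalue(), end="")
```

On a real file, the same function can be called with two handles from `open(...)`, one for reading and one for writing. Whether trimming is the right fix at all is still the open question here.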

    Thanks,


    Last edited by SongLi; 12-22-2010, 08:24 AM.


    • #3
      This is due to the format used by NCBI. NCBI converts the data from all the different platforms to one standard FASTQ format.
      TopHat uses bowtie for read mapping, and for color-space data it expects a csfasta file plus a qual file. The sequence in a csfasta file starts with an additional 'T' (the adapter/primer base) that has no matching value in the qual file, so TopHat expects the sequence to be one character longer than the quality string. Just tell bowtie that you are using FASTQ format rather than FASTA.
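
To see which of the two layouts a file uses, a quick check along the following lines may help. It only illustrates the length relationship described above and is not the code in tophat.py; the function name `quality_layout` and the returned labels are hypothetical.

```python
def quality_layout(seq, qual):
    """Describe how the quality string lines up with a color-space read.

    Hypothetical helper: it compares lengths only and does not validate
    the characters in either string.
    """
    if len(qual) == len(seq) - 1:
        # csfasta/qual style: no quality value for the leading base.
        return "one quality per color"
    if len(qual) == len(seq):
        # Layout seen in the SRA file from this thread: the leading base
        # also has a quality value.
        return "extra leading quality"
    return "unexpected lengths"


if __name__ == "__main__":
    seq = "T00310202021210203103230203233012210"
    qual = "!;>1<998495<<3$4.40%/87-101*&3,8#%'#"
    print(len(seq), len(qual), quality_layout(seq, qual))
```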


      • #4
        hi all,

        I am new to the NGS analysis field. I am working on RNA-Seq data and aim to identify all novel junctions and transcripts. I would appreciate it if anyone could help me with using TopHat and Cufflinks for this.

        Thanks in advance
