  • aromanowski
    Junior Member
    • May 2013
    • 2

    Error running Tophat2 + Bowtie 1 on old SOLiD4 RNAseq SE 50bp data

    Dear All,
    I have downloaded some data from SRA (SRP011410), which were sequenced on an ABI SOLiD4 (50bp RNAseq reads).

    I am trying to align the reads to the Ensembl TAIR10 reference genome with bowtie 1 and I get the following error:

    Code:
    [2018-02-13 18:59:36] Beginning TopHat run (v2.1.1)
    -----------------------------------------------
    [2018-02-13 18:59:36] Checking for Bowtie
    		  Bowtie version:	 1.2.2.0
    [2018-02-13 18:59:36] Checking for Bowtie index files (genome)..
    [2018-02-13 18:59:36] Checking for reference FASTA file
    [2018-02-13 18:59:36] Generating SAM header for genome-color
    [2018-02-13 18:59:37] Preparing reads
    	 left reads: min. length=50, max. length=50, 43369633 kept reads (234454 discarded)
    [2018-02-13 19:08:20] Mapping left_kept_reads to genome genome-color with Bowtie 
    	[FAILED]
    Error running bowtie:
    Reads file contained a pattern with more than 1024 quality values.
    Please truncate reads and quality values and and re-run Bowtie
    terminate called after throwing an instance of 'int'
    To get to that stage, I did the following:
    1) I converted the SRA data to .csfasta and .quals files using the abi-dump (v2.8.2) command from the SRA Toolkit.
    2) I used the Ensembl TAIR10 genome to build the colorspace indexes for bowtie 1, using the command:
    Code:
    bowtie-build -C genome.fa genome-color
    3) I ran the following code for tophat2:

    Code:
    tophat2 -p 8 -I 5000 --bowtie1 --color --quals --library-type=fr-secondstrand -o <output_dir> genome-color <csfasta file> <quals file>
    I appreciate any help that you can give me to solve this issue!!!

    Thank you very much!

    Best regards,
    Andres
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    I expect that you should be putting csfastq files into bowtie, rather than csfasta + qual.

    But you'd be a lot better off leaving SOLiD data alone. Replicate the experiment using full-length cDNA sequencing on a MinION; you'll get more data that is much easier to reliably match to isoforms, and probably faster and cheaper than it'll take to sort through the minefield of SOLiD bioinformatics.
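The csfasta + qual to csfastq conversion suggested above can be sketched in Python. This is a minimal sketch under stated assumptions, not a tested pipeline: the function names (parse_records, to_phred33, merge_to_csfastq) are hypothetical, both files are assumed to hold one line per read in the same order, qualities are written as Phred+33, and negative quality values (missing calls) are clamped to 0. Whether Bowtie wants the primer base kept, or a quality value for it, is not settled here, so check the Bowtie manual before relying on the output.

```python
def parse_records(lines):
    """Yield (name, body) pairs from csfasta or qual lines, skipping '#' comments."""
    name = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def to_phred33(values):
    """Encode integer quality values as Phred+33 characters (assumed encoding)."""
    # Clamp to 0..93 so every value maps to a printable ASCII character.
    return "".join(chr(min(max(int(v), 0), 93) + 33) for v in values)


def merge_to_csfastq(csfasta_lines, qual_lines):
    """Pair csfasta and qual records and yield four-line FASTQ records as strings."""
    pairs = zip(parse_records(csfasta_lines), parse_records(qual_lines))
    for (name, colors), (qual_name, quals) in pairs:
        if name != qual_name:
            raise ValueError(f"read name mismatch: {name!r} vs {qual_name!r}")
        yield f"@{name}\n{colors}\n+\n{to_phred33(quals.split())}\n"
```

Note that zip stops at the shorter input, so a truncated qual file would silently shorten the output; comparing record counts afterwards is a cheap safeguard.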


    • aromanowski
      Junior Member
      • May 2013
      • 2

      #3
      Originally posted by gringer:
      I expect that you should be putting csfastq files into bowtie, rather than csfasta + qual.
      Thank you for your advice! I tried using the fastq (generated with fastq-dump instead of abi-dump) and got the same error:

      Code:
      tophat2 -p 8 -I 5000 --bowtie1 --color --library-type=fr-secondstrand -o Col-0_WL_01_thout genome-color SRR444071.fastq 
      
      [2018-02-19 11:55:17] Beginning TopHat run (v2.1.1)
      -----------------------------------------------
      [2018-02-19 11:55:17] Checking for Bowtie
      		  Bowtie version:	 1.2.2.0
      [2018-02-19 11:55:17] Checking for Bowtie index files (genome)..
      [2018-02-19 11:55:17] Checking for reference FASTA file
      [2018-02-19 11:55:17] Generating SAM header for genome-color
      [2018-02-19 11:55:17] Preparing reads
      	 left reads: min. length=50, max. length=50, 40721267 kept reads (225627 discarded)
      [2018-02-19 12:01:53] Mapping left_kept_reads to genome genome-color with Bowtie 
      	[FAILED]
      Error running bowtie:
      Reads file contained a pattern with more than 1024 quality values.
      Please truncate reads and quality values and and re-run Bowtie
      terminate called after throwing an instance of 'int'
      Maybe I should quality filter these reads before starting the mapping? I could definitely try that...
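Before filtering, it may help to see whether any record in the input actually carries an unusual number of quality characters. A minimal Python sketch, assuming an unwrapped four-line FASTQ with no empty sequence or quality lines; summarize_fastq is a hypothetical helper, and 1024 is simply the limit quoted in the error message. The log shows Bowtie being run on TopHat's left_kept_reads rather than on the input file directly, so a clean result here would not rule out a problem introduced during read preparation.

```python
from collections import Counter

MAX_QUALS = 1024  # limit quoted in the Bowtie error message


def summarize_fastq(lines, max_quals=MAX_QUALS):
    """Tally (sequence length, quality length) pairs and list over-long records."""
    stripped = (line.rstrip("\r\n") for line in lines)
    non_empty = (line for line in stripped if line)
    length_pairs = Counter()
    too_long = []
    # Four references to one iterator make zip group the lines in fours.
    for header, seq, _sep, qual in zip(*[non_empty] * 4):
        length_pairs[(len(seq), len(qual))] += 1
        if len(qual) > max_quals:
            too_long.append(header)
    return length_pairs, too_long
```

Passing an open file handle for the FASTQ as lines is enough; more than one or two distinct length pairs in the tally would be the first thing to look at.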

      I also tried using the STAR aligner with this same .fastq file and got 0 mapped reads.

      Your suggestions sound more and more enticing every minute that goes by:
      Originally posted by gringer:
      But you'd be a lot better off leaving SOLiD data alone. Replicate the experiment using full-length cDNA sequencing on a MinION; you'll get more data that is much easier to reliably match to isoforms, and probably faster and cheaper than it'll take to sort through the minefield of SOLiD bioinformatics.
      I got a similar dataset (not exactly the same tissue, but the same genotypes and conditions) from another group that used Illumina, so I will try to map those reads with Tophat2 + Bowtie2.

      Anyway, it would have been nice to solve the SOLiD issue just for the fun of it. However, time is of the essence, and I had better get the data rather than keep chasing this (now personal) problem! =)

      Thank you,
      Andres


      • gringer
        David Eccles (gringer)
        • May 2011
        • 845

        #4
        Something else to try: get rid of reads that have '.' in their sequence.
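Dropping reads that contain '.' can be sketched in Python as follows. This is a minimal sketch, not the method from the linked discussion: the function names are hypothetical, and it assumes an unwrapped four-line FASTQ (as fastq-dump would typically write) with no empty sequence or quality lines.

```python
def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    stripped = (line.rstrip("\r\n") for line in lines)
    non_empty = (line for line in stripped if line)
    # Four references to one iterator make zip group the lines in fours;
    # a trailing partial record is silently dropped.
    for header, seq, sep, qual in zip(*[non_empty] * 4):
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, sep, qual


def drop_reads_with_dots(records):
    """Keep only records whose sequence contains no '.' (missing call)."""
    for record in records:
        if "." not in record[1]:
            yield record


def filter_fastq_file(in_path, out_path):
    """Write the dot-free reads from in_path to out_path; return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in drop_reads_with_dots(read_fastq(src)):
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept
```

For example, filter_fastq_file("SRR444071.fastq", "SRR444071.nodots.fastq") would write the filtered reads to a new file (the output name is made up) and return the number of reads kept, which can be compared against the counts TopHat reports.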
