
  • Error running Tophat2 + Bowtie 1 on old SOLiD4 RNAseq SE 50bp data

    Dear All,
    I have downloaded some data from SRA (SRP011410) that were sequenced on an ABI SOLiD4 (50bp RNAseq reads).

    I am trying to align the reads to the Ensembl TAIR10 reference genome with bowtie 1 and I get the following error:

    Code:
    [2018-02-13 18:59:36] Beginning TopHat run (v2.1.1)
    -----------------------------------------------
    [2018-02-13 18:59:36] Checking for Bowtie
    		  Bowtie version:	 1.2.2.0
    [2018-02-13 18:59:36] Checking for Bowtie index files (genome)..
    [2018-02-13 18:59:36] Checking for reference FASTA file
    [2018-02-13 18:59:36] Generating SAM header for genome-color
    [2018-02-13 18:59:37] Preparing reads
    	 left reads: min. length=50, max. length=50, 43369633 kept reads (234454 discarded)
    [2018-02-13 19:08:20] Mapping left_kept_reads to genome genome-color with Bowtie 
    	[FAILED]
    Error running bowtie:
    Reads file contained a pattern with more than 1024 quality values.
    Please truncate reads and quality values and and re-run Bowtie
    terminate called after throwing an instance of 'int'
    To get to that stage:
    1) I converted the SRA data to .csfasta and .quals files using the abi-dump (v2.8.2) command from the SRA Toolkit.
    2) I used the Ensembl TAIR10 genome to build the colorspace indexes for bowtie 1, using the command:
    Code:
    bowtie-build -C genome.fa
    3) I ran the following code for tophat2:

    Code:
    tophat2 -p 8 -I 5000 --bowtie1 --color --quals --library-type=fr-secondstrand -o <output_dir> genome-color <csfasta file> <quals file>
    I appreciate any help that you can give me to solve this issue!!!

    Thank you very much!

    Best regards,
    Andres

  • #2
    I expect that you should be putting csfastq files into bowtie, rather than csfasta + qual.

    But you'd be a lot better off leaving SOLiD data alone. Replicate the experiment using full-length cDNA sequencing on a MinION; you'll get more data that is much easier to reliably match to isoforms, and probably faster and cheaper than it'll take to sort through the minefield of SOLiD bioinformatics.
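    For anyone who wants to try the csfastq route from the existing abi-dump output, here is a minimal sketch of merging a csfasta file and its qual file into csfastq records. It is not a tested recipe for Bowtie or TopHat. The function name `csfasta_qual_to_csfastq` is hypothetical, and the sketch assumes Phred+33 encoding, that both files list reads in the same order, and that negative quality values (used for missing calls) can be clamped to 0. Whether the aligner expects a quality value for the primer base is not settled here.

```python
def csfasta_qual_to_csfastq(csfasta_lines, qual_lines):
    """Yield csfastq records (as strings) from paired csfasta and qual lines.

    Assumptions (not confirmed by the thread): Phred+33 output, same read
    order in both inputs, negative qualities clamped to 0.
    """
    def records(lines):
        # Yield (name, payload) pairs, skipping blank lines and '#' comments.
        name = None
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith(">"):
                name = line[1:]
            elif name is not None:
                yield name, line
                name = None

    for (name, seq), (qname, quals) in zip(records(csfasta_lines),
                                           records(qual_lines)):
        if name != qname:
            raise ValueError(f"read order differs: {name!r} vs {qname!r}")
        # Clamp to the printable Phred+33 range 0..93.
        scores = [min(max(int(q), 0), 93) for q in quals.split()]
        qual_str = "".join(chr(s + 33) for s in scores)
        yield f"@{name}\n{seq}\n+\n{qual_str}\n"
```

    The generator accepts any iterable of lines, so two open file handles can be passed directly and the records written out one at a time.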



    • #3
      Originally posted by gringer
      I expect that you should be putting csfastq files into bowtie, rather than csfasta + qual.
      Thank you for your advice! I tried using the fastq (generated with fastq-dump instead of abi-dump) and got the same error:

      Code:
      tophat2 -p 8 -I 5000 --bowtie1 --color --library-type=fr-secondstrand -o Col-0_WL_01_thout genome-color SRR444071.fastq 
      
      [2018-02-19 11:55:17] Beginning TopHat run (v2.1.1)
      -----------------------------------------------
      [2018-02-19 11:55:17] Checking for Bowtie
      		  Bowtie version:	 1.2.2.0
      [2018-02-19 11:55:17] Checking for Bowtie index files (genome)..
      [2018-02-19 11:55:17] Checking for reference FASTA file
      [2018-02-19 11:55:17] Generating SAM header for genome-color
      [2018-02-19 11:55:17] Preparing reads
      	 left reads: min. length=50, max. length=50, 40721267 kept reads (225627 discarded)
      [2018-02-19 12:01:53] Mapping left_kept_reads to genome genome-color with Bowtie 
      	[FAILED]
      Error running bowtie:
      Reads file contained a pattern with more than 1024 quality values.
      Please truncate reads and quality values and and re-run Bowtie
      terminate called after throwing an instance of 'int'
      Maybe I should quality filter these reads before starting the mapping? I could definitely try that...
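      Before filtering, it may be worth checking whether any record really carries an over-long or mismatched quality string, as the Bowtie message suggests. A minimal diagnostic sketch follows. The function name `tally_lengths` is hypothetical, and it assumes a plain four-line-per-record FASTQ file.

```python
from collections import Counter


def tally_lengths(fastq_lines):
    """Count (sequence length, quality length) pairs in a 4-line FASTQ.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    counts = Counter()
    it = iter(fastq_lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").rstrip("\n")
        counts[(len(seq), len(qual))] += 1
    return counts


# Example use (file name taken from the command above):
# with open("SRR444071.fastq") as handle:
#     for (seq_len, qual_len), n in sorted(tally_lengths(handle).items()):
#         print(seq_len, qual_len, n)
```

      If every record shows the same pair of lengths, the problem is more likely in how the file is being parsed than in a few malformed reads.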

      I also tried using the STAR aligner with this same .fastq file and got 0 mapped reads.

      Your suggestions sound more and more enticing every minute that goes by:
      Originally posted by gringer
      But you'd be a lot better off leaving SOLiD data alone. Replicate the experiment using full-length cDNA sequencing on a MinION; you'll get more data that is much easier to reliably match to isoforms, and probably faster and cheaper than it'll take to sort through the minefield of SOLiD bioinformatics.
      I got a similar dataset (not exactly the same tissue, but the same genotypes and conditions) from another group that used Illumina, so I will try to map those reads with Tophat2 + Bowtie2.

      Anyway, it would have been nice to solve the SOLiD issue just for the fun of it. However, time is of the essence and I had better get the data instead of pursuing this (now personal) problem! =)

      Thank you,
      Andres



      • #4
        Something else to try: get rid of reads that have '.' (a missing color call) in their sequence.
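        A minimal sketch of that filter, assuming a plain four-line-per-record FASTQ file such as the one produced by fastq-dump. The function name `drop_reads_with_missing_calls` and the output file name in the usage comment are hypothetical.

```python
def drop_reads_with_missing_calls(fastq_lines):
    """Yield the lines of every FASTQ record whose sequence contains no '.'.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    it = iter(fastq_lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it, "")
        plus = next(it, "")
        qual = next(it, "")
        if "." not in seq:
            yield from (header, seq, plus, qual)


# Example use (output file name is a placeholder):
# with open("SRR444071.fastq") as src, open("filtered.fastq", "w") as dst:
#     dst.writelines(drop_reads_with_missing_calls(src))
```

        The filter streams the file record by record, so it should cope with tens of millions of reads without loading them all into memory.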
