SEQanswers

Old 02-14-2018, 03:51 AM   #1
aromanowski
Junior Member
 
Location: Buenos Aires, Argentina

Join Date: May 2013
Posts: 2
Error running Tophat2 + Bowtie 1 on old SOLiD4 RNAseq SE 50bp data

Dear All,
I have downloaded some data from SRA (SRP011410), which were sequenced on an ABI SOLiD4 (50bp RNAseq reads).

I am trying to align the reads to the Ensembl TAIR10 reference genome with bowtie 1 and I get the following error:

Code:
[2018-02-13 18:59:36] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2018-02-13 18:59:36] Checking for Bowtie
		  Bowtie version:	 1.2.2.0
[2018-02-13 18:59:36] Checking for Bowtie index files (genome)..
[2018-02-13 18:59:36] Checking for reference FASTA file
[2018-02-13 18:59:36] Generating SAM header for genome-color
[2018-02-13 18:59:37] Preparing reads
	 left reads: min. length=50, max. length=50, 43369633 kept reads (234454 discarded)
[2018-02-13 19:08:20] Mapping left_kept_reads to genome genome-color with Bowtie 
	[FAILED]
Error running bowtie:
Reads file contained a pattern with more than 1024 quality values.
Please truncate reads and quality values and and re-run Bowtie
terminate called after throwing an instance of 'int'
In order to get to that stage, I:
1) I converted the SRA data to .csfasta and .quals files using the abi-dump (v2.8.2) command from the SRA Toolkit.
2) I used the Ensembl TAIR10 genome to build the colorspace indexes for bowtie 1, using the command:
Code:
bowtie-build -C genome.fa genome-color
3) I ran the following command for tophat2:

Code:
tophat2 -p 8 -I 5000 --bowtie1 --color --quals --library-type=fr-secondstrand -o <output_dir> genome-color <csfasta file> <quals file>
I appreciate any help that you can give me to solve this issue!!!

Thank you very much!

Best regards,
Andres
Old 02-18-2018, 02:33 PM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 815

I expect that you should be putting csfastq files into bowtie, rather than csfasta + qual.

But you'd be a lot better off leaving SOLiD data alone. Replicate the experiment using full-length cDNA sequencing on a MinION; you'll get more data that is much easier to reliably match to isoforms, and probably faster and cheaper than it'll take to sort through the minefield of SOLiD bioinformatics.
Old 02-19-2018, 09:04 AM   #3
aromanowski
Junior Member
 
Location: Buenos Aires, Argentina

Join Date: May 2013
Posts: 2

Quote:
Originally Posted by gringer View Post
I expect that you should be putting csfastq files into bowtie, rather than csfasta + qual.
Thank you for your advice! I tried using the fastq (generated with fastq-dump instead of abi-dump) and got the same error:

Code:
tophat2 -p 8 -I 5000 --bowtie1 --color --library-type=fr-secondstrand -o Col-0_WL_01_thout genome-color SRR444071.fastq 

[2018-02-19 11:55:17] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2018-02-19 11:55:17] Checking for Bowtie
		  Bowtie version:	 1.2.2.0
[2018-02-19 11:55:17] Checking for Bowtie index files (genome)..
[2018-02-19 11:55:17] Checking for reference FASTA file
[2018-02-19 11:55:17] Generating SAM header for genome-color
[2018-02-19 11:55:17] Preparing reads
	 left reads: min. length=50, max. length=50, 40721267 kept reads (225627 discarded)
[2018-02-19 12:01:53] Mapping left_kept_reads to genome genome-color with Bowtie 
	[FAILED]
Error running bowtie:
Reads file contained a pattern with more than 1024 quality values.
Please truncate reads and quality values and and re-run Bowtie
terminate called after throwing an instance of 'int'
Maybe I should quality filter these reads before starting the mapping? I could definitely try that...
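One cheap check before any filtering, given the wording of the error, is to see how long the quality lines in the FASTQ actually are. What follows is only a minimal sketch, assuming strictly four lines per record; the function name count_long_quals is made up for illustration, and the default limit of 1024 is simply the number quoted in the error message. Bowtie is run on TopHat's prepared reads (left_kept_reads) rather than on this file directly, so a clean result here does not rule out a problem further down the pipeline.

```shell
# Hypothetical helper: report the longest quality line in a FASTQ file and
# how many quality lines exceed a limit.
# Assumes exactly 4 lines per record (header, sequence, "+", qualities).
count_long_quals() {
    # $1 = FASTQ file
    # $2 = length limit (optional; defaults to 1024, the figure in the error)
    awk -v limit="${2:-1024}" '
        NR % 4 == 0 {                 # every 4th line is a quality line
            n = length($0)
            if (n > max) max = n      # track the longest quality line seen
            if (n > limit) over++     # count lines above the limit
        }
        END { printf "max_qual_len=%d over_limit=%d\n", max, over }
    ' "$1"
}

# Usage (file name taken from the command above):
#   count_long_quals SRR444071.fastq
```

If over_limit comes back as 0 for 50bp reads, the input file itself is probably not what is tripping the limit.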

I also tried using the STAR aligner with this same .fastq file and got 0 mapped reads.

Your suggestions sound more and more enticing every minute that goes by:
Quote:
Originally Posted by gringer View Post
But you'd be a lot better off leaving SOLiD data alone. Replicate the experiment using full-length cDNA sequencing on a MinION; you'll get more data that is much easier to reliably match to isoforms, and probably faster and cheaper than it'll take to sort through the minefield of SOLiD bioinformatics.
I got a similar dataset (not exactly the same tissue, but the same genotypes and conditions) from another group that used Illumina, so I will try to map those reads with Tophat2 + Bowtie2.

Anyway, it would have been nice to solve the SOLiD issue just for the fun of it. However, time is of the essence and I had better get the data instead of chasing this (now personal) problem! =)

Thank you,
Andres
Old 02-19-2018, 09:31 AM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 815

Something else to try: get rid of reads that have '.' in their sequence:

http://seqanswers.com/forums/showthread.php?t=6297
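For the '.' suggestion, a rough sketch of such a filter is below. It assumes a colorspace FASTQ with strictly four lines per record, and drop_dot_reads is a hypothetical name used only for illustration. Only the sequence line is tested, since '.' can also be a legitimate quality character.

```shell
# Hypothetical helper: drop every read whose sequence line contains ".".
# Assumes exactly 4 lines per record (header, sequence, "+", qualities).
drop_dot_reads() {
    # $1 = colorspace FASTQ file; filtered records go to stdout
    awk '
        NR % 4 == 1 { head = $0 }     # @read_name
        NR % 4 == 2 { seq  = $0 }     # primer base + colors, e.g. T0123...
        NR % 4 == 3 { plus = $0 }     # separator line
        NR % 4 == 0 {                 # quality line completes the record
            if (index(seq, ".") == 0) {
                print head; print seq; print plus; print $0
            }
        }
    ' "$1"
}

# Usage (output file name is only an example):
#   drop_dot_reads SRR444071.fastq > SRR444071.nodots.fastq
```

Comparing the record count before and after shows how many reads carried a missing color call.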