SEQanswers

Old 03-31-2012, 04:33 AM   #1
Annibal
Member
 
Location: Italy

Join Date: Mar 2012
Posts: 10
Tophat problem: failing reads alignment

Hi all,
I'm working with RNA-seq data for the first time and, as an exercise, I'm running TopHat on the "wgEncodeCshlLongRnaSeqK562CellTotalFastqRep1.fastq" reads from the ENCODE project.
I issued the command "tophat -p 24 -G genes.gtf -o K562_1 hg19 reads.fastq", where genes.gtf is the transcript annotation file (hg19) from Illumina, hg19 is the reference genome (Bowtie index) and reads.fastq is the single-end reads file mentioned above (the reads are 152 nt long).
After 15 hours the output directory contains only a very small .bam file (around 300 KB).
Looking at the log files I find:

::::::::::::::
bowtie.left_kept_reads.fixmap.log
::::::::::::::
# reads processed: 77046522
# reads with at least one reported alignment: 82 (0.00%)
# reads that failed to align: 77046440 (100.00%)
Reported 82 alignments to 1 output stream(s)

::::::::::::::
bowtie.left_kept_reads_seg1.fixmap.log
::::::::::::::
# reads processed: 77046440
# reads with at least one reported alignment: 17276886 (22.42%)
# reads that failed to align: 59283777 (76.95%)
# reads with alignments suppressed due to -m: 485777 (0.63%)
Reported 83222144 alignments to 1 output stream(s)
::::::::::::::
bowtie.left_kept_reads_seg2.fixmap.log
::::::::::::::
# reads processed: 77046440
# reads with at least one reported alignment: 22628110 (29.37%)
# reads that failed to align: 53761233 (69.78%)
# reads with alignments suppressed due to -m: 657097 (0.85%)
Reported 119440312 alignments to 1 output stream(s)
::::::::::::::
bowtie.left_kept_reads_seg3.fixmap.log
::::::::::::::
# reads processed: 77046440
# reads with at least one reported alignment: 18924892 (24.56%)
# reads that failed to align: 57402732 (74.50%)
# reads with alignments suppressed due to -m: 718816 (0.93%)
Reported 122913616 alignments to 1 output stream(s)
::::::::::::::
bowtie.left_kept_reads_seg4.fixmap.log
::::::::::::::
# reads processed: 77046440
# reads with at least one reported alignment: 12507630 (16.23%)
# reads that failed to align: 64207195 (83.34%)
# reads with alignments suppressed due to -m: 331615 (0.43%)
Reported 56082256 alignments to 1 output stream(s)
::::::::::::::
bowtie.left_kept_reads_seg5.fixmap.log
::::::::::::::
# reads processed: 77046440
# reads with at least one reported alignment: 19159264 (24.87%)
# reads that failed to align: 57337091 (74.42%)
# reads with alignments suppressed due to -m: 550085 (0.71%)
Reported 100864131 alignments to 1 output stream(s)
::::::::::::::
bowtie.left_kept_reads_seg6.fixmap.log
::::::::::::::
# reads processed: 77046440
# reads with at least one reported alignment: 7086758 (9.20%)
# reads that failed to align: 69837946 (90.64%)
# reads with alignments suppressed due to -m: 121736 (0.16%)
Reported 26553396 alignments to 1 output stream(s)
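
For reference, the counts in logs like these can be tabulated with a short script. This is only a minimal Python sketch, assuming the "# label: count" layout shown above; the function names are made up for illustration and the sample text is the seg1 log from this post.

```python
import re

# Matches lines such as "# reads processed: 77046440" or
# "# reads that failed to align: 59283777 (76.95%)".
LINE_RE = re.compile(r"^#\s*(?P<label>[^:]+):\s*(?P<count>\d+)")


def parse_bowtie_log(text):
    """Return a dict mapping each '# label: N' line to its integer count."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("label").strip()] = int(match.group("count"))
    return counts


def aligned_fraction(counts):
    """Fraction of processed reads with at least one reported alignment."""
    processed = counts["reads processed"]
    aligned = counts["reads with at least one reported alignment"]
    return aligned / processed if processed else 0.0


# The seg1 log quoted above, pasted in as a string.
seg1_log = """\
# reads processed: 77046440
# reads with at least one reported alignment: 17276886 (22.42%)
# reads that failed to align: 59283777 (76.95%)
# reads with alignments suppressed due to -m: 485777 (0.63%)
Reported 83222144 alignments to 1 output stream(s)
"""

seg1 = parse_bowtie_log(seg1_log)
print(f"seg1 aligned: {aligned_fraction(seg1):.2%}")
```

To use it on a real run, read each *.fixmap.log file from the TopHat output directory and pass its contents to parse_bowtie_log.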

Moreover, I find a lot (around 200) of "malformed closure" warnings in long_spanning_reads.log.

Thanks a lot for any suggestions or advice.
Old 04-03-2012, 04:05 AM   #2
swaraj
Member
 
Location: Naples, Italy

Join Date: Feb 2012
Posts: 50

I would suggest giving TopHat the path to the genome index in your run,

so your new command would be

"tophat -p 24 -G genes.gtf -o K562_1 /path/to/genome reads.fastq"

The genome name is the common prefix of the index files you generate from the genome FASTA file with bowtie-build:

"bowtie-build genome.fa genome"
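
One way to confirm that a prefix really points at a complete index is to look for the six files bowtie-build writes. This is a minimal Python sketch, assuming a Bowtie 1 index with the usual .ebwt file names; the function name and the temporary demo files are made up for illustration.

```python
import tempfile
from pathlib import Path

# File suffixes written by bowtie-build (Bowtie 1) for a given index prefix.
EBWT_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(prefix):
    """Return the expected index files that do not exist for this prefix."""
    return [
        str(prefix) + suffix
        for suffix in EBWT_SUFFIXES
        if not Path(str(prefix) + suffix).exists()
    ]


# Demo: create only the four forward index files in a scratch directory,
# then ask which files are still missing.
with tempfile.TemporaryDirectory() as tmp:
    demo_prefix = Path(tmp) / "genome"
    for suffix in EBWT_SUFFIXES[:4]:
        Path(str(demo_prefix) + suffix).touch()
    demo_missing = [Path(p).name for p in missing_index_files(demo_prefix)]

print(demo_missing)
```

An empty list for the real prefix (for example the hg19 prefix used in the first post) would mean all six files are present; it says nothing about whether they were built from the right FASTA.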
Old 04-03-2012, 04:50 AM   #3
Annibal
Member
 
Location: Italy

Join Date: Mar 2012
Posts: 10

I've included it.
As mentioned above, it is "hg19". These are the .ebwt index files, and Bowtie built them correctly from the hg19.fa reference file.

I've tried the same procedure with paired-end FASTQ reads (2x76) from a different RNA-seq dataset (wgEncodeCshlLongRnaSeqK562CellLongnonpolyaFastqRd1Rep1.fastq.gz and wgEncodeCshlLongRnaSeqK562CellLongnonpolyaFastqRd2Rep1.fastq.gz) and it worked.
Maybe the problem is the format of the 152 nt single-end reads?

Thanks
Old 04-06-2012, 06:35 AM   #4
Julien Roux
Member
 
Location: Chicago

Join Date: Dec 2011
Posts: 24

Quote:
Originally Posted by Annibal View Post
Maybe the problem is the format of the single end reads data of 152 nt?
This is probably part of the problem, since by default TopHat only allows 2 mismatches over the whole read. I had a similar problem when analyzing 107 bp reads. Switching to --bowtie-n mode might help, since mismatches are then counted only in the seed region (the first 28 bp). Still, I found no way to increase Bowtie's "-e" parameter from the TopHat command line, and I suspect its default might be too restrictive for long reads.
If you find a way to improve the alignment, please keep us informed!
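
To put rough numbers on this: a fixed budget of 2 mismatches is a much tighter error rate on a 152 nt read than on a 76 nt one. The Python sketch below only does that arithmetic. The segment count assumes TopHat's default --segment-length of 25, and the helper names are illustrative, not part of any tool.

```python
def mismatch_rate(max_mismatches, read_length):
    """Largest fraction of mismatching bases a read may have and still align."""
    return max_mismatches / read_length


def segment_count(read_length, segment_length=25):
    """Number of whole segments a read is cut into (25 is an assumed default)."""
    return read_length // segment_length


# Read lengths mentioned in this thread: 76 (paired-end), 107 and 152 nt.
for length in (76, 107, 152):
    print(
        f"{length} nt: up to {mismatch_rate(2, length):.2%} mismatches, "
        f"{segment_count(length)} segments"
    )
```

For 152 nt reads the segment count comes out at 6, which is consistent with the seg1 to seg6 logs in the first post.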
Old 04-09-2012, 04:58 AM   #5
anurag.gautam
Member
 
Location: India

Join Date: Oct 2010
Posts: 15

Can anybody help with this error?
I tried to map Illumina paired-end RNA-seq reads from rice to the reference genome.
I ran TopHat with the following command:

/opt/tophat-1.4.1.Linux_x86_64/tophat -p 4 -o output -G osa.gtf /home/anurag.gautam/03_Genomes/Oryza_sativa_Indica/bowtie/osa SRR037735_1.fastq SRR037735_2.fastq
[Mon Apr 9 18:11:39 2012] Beginning TopHat run (v1.4.1)
-----------------------------------------------
[Mon Apr 9 18:11:39 2012] Preparing output location output/
[Mon Apr 9 18:11:39 2012] Checking for Bowtie index files
[Mon Apr 9 18:11:39 2012] Checking for reference FASTA file
[Mon Apr 9 18:11:39 2012] Checking for Bowtie
Bowtie version: 0.12.7.0
[Mon Apr 9 18:11:39 2012] Checking for Samtools
Samtools Version: 0.1.16
[Mon Apr 9 18:11:39 2012] Generating SAM header for /home/anurag.gautam/03_Genomes/Oryza_sativa_Indica/bowtie/osa
format: fastq
quality scale: phred33 (default)
[Mon Apr 9 18:11:39 2012] Reading known junctions from GTF file
Warning: TopHat did not find any junctions in GTF file
[Mon Apr 9 18:11:39 2012] Preparing reads
left reads: min. length=75, count=9884891
right reads: min. length=75, count=9873028
[Mon Apr 9 18:18:36 2012] Creating transcriptome data files..
[FAILED]
Error: gtf_to_fasta returned an error.

Please help with this error.
Old 04-19-2012, 01:40 AM   #6
Annibal
Member
 
Location: Italy

Join Date: Mar 2012
Posts: 10

Quote:
Originally Posted by Julien Roux View Post
This is probably part of the problem, since by default TopHat only allows 2 mismatches over the whole read. I had a similar problem when analyzing 107 bp reads. Switching to --bowtie-n mode might help, since mismatches are then counted only in the seed region (the first 28 bp). Still, I found no way to increase Bowtie's "-e" parameter from the TopHat command line, and I suspect its default might be too restrictive for long reads.
If you find a way to improve the alignment, please keep us informed!
I don't know if it works, and I haven't reviewed the code, but I suppose you can trick the program by editing the tophat script, since it only calls the bowtie executable...
At line 717:
    if option == "--bowtie-n":
        self.bowtie_alignment_option = "-n"

Just replace the "-n" with "-e your_value -n"; when you run tophat with --bowtie-n it will then invoke bowtie with -e your_value -n.
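
One caveat, which is an assumption since the rest of the tophat script has not been checked here: if the option string ends up as a single element of an argument list handed to bowtie, "-e your_value -n" would arrive as one token rather than three. The Python sketch below illustrates the difference with shlex.split; the function name and the value 100 are hypothetical.

```python
import shlex


def alignment_tokens(option, mismatches):
    """Split an option string into argv tokens and append the mismatch count."""
    return shlex.split(option) + [str(mismatches)]


# Unmodified option: one flag followed by its value.
print(alignment_tokens("-n", 2))

# Edited option: "-e 100 -n" has to become three separate tokens before
# it can be passed to a program as part of an argument list.
print(alignment_tokens("-e 100 -n", 2))

# Without splitting, the whole string stays a single token.
unsplit = ["-e 100 -n", "2"]
print(unsplit)
```

If the script builds its command as a list, splitting the edited string this way (or adding "-e" and its value as separate list elements) would be the safer form of the same trick.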
Old 05-18-2012, 12:34 PM   #7
yingzhang
Junior Member
 
Location: Minneapolis

Join Date: Feb 2012
Posts: 9

I would first check whether the GTF file is in the right format. Then I would check whether enough memory has been allocated for TopHat. My job once got killed at exactly this step because it used much more memory than I had requested.
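
A rough format check can be scripted. The Python sketch below assumes a standard GTF layout (9 tab-separated fields, exon records carrying a transcript_id attribute) and counts exons per transcript, since a junction needs a transcript with at least two exons. The function names and the toy records are made up for illustration.

```python
import re
from collections import Counter

TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def check_gtf(lines):
    """Return (exons per transcript, numbers of lines without 9 tab fields)."""
    exons = Counter()
    malformed = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9:
            malformed.append(number)
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if fields[2] == "exon" and match:
            exons[match.group(1)] += 1
    return exons, malformed


def spliced_transcripts(exons):
    """Transcripts with at least two exons, i.e. at least one junction."""
    return sorted(name for name, count in exons.items() if count > 1)


# Toy records (hypothetical); the last one uses spaces instead of tabs.
toy_gtf = [
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\texon\t500\t600\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
    'chr1 toy exon 700 800 . + . gene_id "g3"; transcript_id "t3";',
]

exons, malformed = check_gtf(toy_gtf)
print(spliced_transcripts(exons), malformed)
```

On a real annotation, open the file and pass the handle to check_gtf. An empty list of spliced transcripts, or many malformed lines, would fit the "did not find any junctions" warning in the log. It is probably also worth comparing the sequence names in column 1 with those in the Bowtie index.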

Quote:
Originally Posted by anurag.gautam View Post
can anybody help with this error
I tried to map Illumina paired-end RNA seq reads of Rice to reference genome.
I ran the tophat with the following command:

/opt/tophat-1.4.1.Linux_x86_64/tophat -p 4 -o output -G osa.gtf /home/anurag.gautam/03_Genomes/Oryza_sativa_Indica/bowtie/osa SRR037735_1.fastq SRR037735_2.fastq
[Mon Apr 9 18:11:39 2012] Beginning TopHat run (v1.4.1)
-----------------------------------------------
[Mon Apr 9 18:11:39 2012] Preparing output location output/
[Mon Apr 9 18:11:39 2012] Checking for Bowtie index files
[Mon Apr 9 18:11:39 2012] Checking for reference FASTA file
[Mon Apr 9 18:11:39 2012] Checking for Bowtie
Bowtie version: 0.12.7.0
[Mon Apr 9 18:11:39 2012] Checking for Samtools
Samtools Version: 0.1.16
[Mon Apr 9 18:11:39 2012] Generating SAM header for /home/anurag.gautam/03_Genomes/Oryza_sativa_Indica/bowtie/osa
format: fastq
quality scale: phred33 (default)
[Mon Apr 9 18:11:39 2012] Reading known junctions from GTF file
Warning: TopHat did not find any junctions in GTF file
[Mon Apr 9 18:11:39 2012] Preparing reads
left reads: min. length=75, count=9884891
right reads: min. length=75, count=9873028
[Mon Apr 9 18:18:36 2012] Creating transcriptome data files..
[FAILED]
Error: gtf_to_fasta returned an error.

Please help with this error.
All times are GMT -8.

