**greggrant** · 10-13-2009, 06:47 AM

Originally posted by Cole Trapnell:

> Hmm - that's a new one. What version of OS X are you running this on?

Mac OS X 10.5.5

2.8 GHz Quad-Core Intel Xeon

**greggrant** · 10-13-2009, 06:48 AM

Originally posted by NJD:

> I was getting the same message with bowtie-0.11.2-bin-macos-10.5-x86_64.zip. Working from source and setting BITS=64 seems to be fine. Mac OS X 10.5.8.

Thanks for this. I hope not to have to work from source since I never have much luck doing that kind of thing.

**Ben Langmead** · 10-13-2009, 08:21 AM

Originally posted by greggrant:

> Thanks for this. I hope not to have to work from source since I never have much luck doing that kind of thing.

Hi guys,

Please try the 10.4 package instead and let me know if that gives the same error. If 10.4 works fine, then I think I know what's wrong.

Thanks for the reports,
Ben

**greggrant** · 10-13-2009, 08:34 AM

Originally posted by Ben Langmead:

> Hi guys,
>
> Please try the 10.4 package instead and let me know if that gives the same error. If 10.4 works fine, then I think I know what's wrong.
>
> Thanks for the reports,
> Ben

There is no 10.4; the options are 10.0 and 10.1, and then it jumps to 11.2 and 11.3.

**Ben Langmead** · 10-13-2009, 09:34 AM

Originally posted by greggrant:

> There is no 10.4; the options are 10.0 and 10.1, and then it jumps to 11.2 and 11.3.

Sorry, I was unclear. I meant the Mac OS X 10.4 binary package. There is one for Bowtie version 0.11.2.

Ben

**greggrant** · 10-13-2009, 09:42 AM

Originally posted by Ben Langmead:

> Hi guys,
>
> Please try the 10.4 package instead and let me know if that gives the same error. If 10.4 works fine, then I think I know what's wrong.
>
> Thanks for the reports,
> Ben

OK, I managed to compile 11.1 using BITS=64 as suggested; it compiled and ran without crashing. But the result is the same: "Warning: junction database is empty!"

Here is the tophat_out directory:

http://greg.grant.org/tophat_files2.tar.gz

This is really frustrating!

**Cole Trapnell** · 10-13-2009, 11:34 AM

I can't reproduce the issue you are seeing with just that tarball.

Can you post the small sample of reads you are using? The left_kept_reads.fq file contains reads that have different lengths etc. TopHat is really designed for Illumina reads - you can certainly use 454, but you'll need to trim them down to be all the same length.
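As a rough illustration of "trim them down to be all the same length", here is a minimal Python sketch. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and simply drops reads shorter than the target length; the function names are hypothetical helpers, not part of TopHat.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record as (header, sequence, quality).
Record = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from plain 4-line FASTQ (assumes no wrapped sequences)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def trim_to_length(records: Iterable[Record], length: int) -> Iterator[Record]:
    """Cut every read (and its qualities) to exactly `length` bases.

    Reads shorter than `length` are dropped, so every surviving read
    has the same length.
    """
    for header, seq, qual in records:
        if len(seq) >= length:
            yield header, seq[:length], qual[:length]


def write_fastq(records: Iterable[Record]) -> str:
    """Serialize records back to 4-line FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)
```

For example, `write_fastq(trim_to_length(read_fastq(open("reads.fq")), 50))` would produce uniform 50 bp reads from a hypothetical `reads.fq`; trimming discards the 3' ends of longer reads, which is the simplest option but loses sequence.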

**greggrant** · 10-13-2009, 12:26 PM

Originally posted by Cole Trapnell:

> I can't reproduce the issue you are seeing with just that tarball.
>
> Can you post the small sample of reads you are using? The left_kept_reads.fq file contains reads that have different lengths etc. TopHat is really designed for Illumina reads - you can certainly use 454, but you'll need to trim them down to be all the same length.

This file has the reads:

http://greg.grant.org/reads.gz

**greggrant** · 10-13-2009, 01:28 PM

Originally posted by Cole Trapnell:

> I can't reproduce the issue you are seeing with just that tarball.
>
> Can you post the small sample of reads you are using? The left_kept_reads.fq file contains reads that have different lengths etc. TopHat is really designed for Illumina reads - you can certainly use 454, but you'll need to trim them down to be all the same length.

Thanks for your help. I split my reads so they are all 50 bases long. I didn't lose any sequence, since I tiled the longer reads into 50-base pieces. This time it ran and did not report the warning. However, the junctions file is empty, and it can't be that there are no junctions. Here is a tarball of my input file and the tophat_out directory:

http://greg.grant.org/tophat_stuff.tar.gz

Does this look right?

Thanks again for your help!
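The post doesn't say how the last, partial piece of each read was handled when tiling into 50s. A minimal sketch of one way to tile without losing sequence and without producing any piece shorter than 50 bp (assumption: the final tile is anchored at the read's end and may overlap the previous tile; the `_tileN` naming is hypothetical):

```python
from typing import Iterator, Tuple


def tile_read(header: str, seq: str, qual: str,
              size: int = 50) -> Iterator[Tuple[str, str, str]]:
    """Split one read into tiles of exactly `size` bases.

    Tiles start every `size` bases. If the read length is not a multiple
    of `size`, one extra tile is anchored at the read's 3' end (overlapping
    the previous tile), so no sequence is lost and no tile is shorter than
    `size`. Reads shorter than `size` yield nothing.
    """
    n = len(seq)
    if n < size:
        return
    starts = list(range(0, n - size + 1, size))
    if starts[-1] + size < n:
        starts.append(n - size)  # final tile flush with the read end
    name = header.split()[0]  # drop any description after the read ID
    for i, start in enumerate(starts):
        yield f"{name}_tile{i}", seq[start:start + size], qual[start:start + size]
```

The overlap means a few bases are counted twice, but every output read has the same length, which keeps short leftover pieces out of the input.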

**Ben Langmead** · 10-13-2009, 01:58 PM

Originally posted by greggrant:

> OK, I managed to compile 11.1 using BITS=64 as suggested; it compiled and ran without crashing.

Hi guys,

For what it's worth, I believe I have now fixed the problem with the Bowtie macos-10.5 binary packages for version 0.11.3. I replaced the bad packages with good ones on SourceForge. Let me know if you run into more problems like that.

Apologies,
Ben

**greggrant** · 10-13-2009, 02:32 PM

Originally posted by Ben Langmead:

> Hi guys,
>
> For what it's worth, I believe I have now fixed the problem with the Bowtie macos-10.5 binary packages for version 0.11.3. I replaced the bad packages with good ones on SourceForge. Let me know if you run into more problems like that.
>
> Apologies,
> Ben

Thank you! Can you take a look at my file and see why it reports no junctions? There have to be some junctions in the transcripts. Thanks again for your help.

**Cole Trapnell** · 10-13-2009, 02:56 PM

I just ran the left_kept_reads.fq file from the above package: there are a handful of reads that are 49 bp, which explains your result. When the reads are shorter than 50 bp, TopHat uses a coverage-island based algorithm to find junctions. When they are 50 bp or longer, TopHat starts using a split-segment algorithm in addition to the coverage-based approach. For reads 75 bp or longer, TopHat disables the coverage-based algorithm (since it's slower and has a larger memory footprint) and uses only the split-segment algorithm. Because your file contains those 49 bp reads, TopHat falls back to the coverage-based algorithm alone, and since this input set is small, it has a hard time identifying possible splice junctions.
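The length thresholds described above can be modeled in a few lines, which also makes it easy to spot stray short reads before a run. This is a minimal sketch of the 2009-era behavior as described in this post, not TopHat's actual code; the assumption that the shortest read decides comes from the 49 bp reads being enough to trigger coverage-only search, and newer TopHat versions may behave differently.

```python
from typing import Iterable, List, Sequence, Set


def read_lengths_from_fastq(lines: Sequence[str]) -> List[int]:
    """Sequence lengths from plain 4-line FASTQ (line 2 of every record)."""
    return [len(seq.strip()) for seq in lines[1::4]]


def junction_search_modes(read_lengths: Iterable[int]) -> Set[str]:
    """Junction-search strategies implied by the thresholds in this thread.

    Assumption: the shortest read in the input decides the mode.
    """
    shortest = min(read_lengths)
    if shortest < 50:
        return {"coverage"}
    if shortest < 75:
        return {"coverage", "split-segment"}
    return {"split-segment"}
```

Checking `junction_search_modes(read_lengths_from_fastq(lines))` on the reads before running TopHat would flag a file where a few short reads silently disable the split-segment search.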

I was able to get junctions by running the pipeline with --segment-length 24, which forces TopHat to use both the coverage-based search and the split-segment search.

You may want to make 75bp reads out of your reads if possible, as that should dramatically improve your junction sensitivity. If you do that, you may also want to explicitly pass --coverage-search to TopHat, to further improve sensitivity. You should also consider passing a GFF file of mouse annotations to TopHat.

**greggrant** · 10-13-2009, 06:28 PM

Originally posted by Cole Trapnell:

> I just ran the left_kept_reads.fq file from the above package: there are a handful of reads that are 49 bp, which explains your result. When the reads are shorter than 50 bp, TopHat uses a coverage-island based algorithm to find junctions. When they are 50 bp or longer, TopHat starts using a split-segment algorithm in addition to the coverage-based approach. For reads 75 bp or longer, TopHat disables the coverage-based algorithm (since it's slower and has a larger memory footprint) and uses only the split-segment algorithm. Because your file contains those 49 bp reads, TopHat falls back to the coverage-based algorithm alone, and since this input set is small, it has a hard time identifying possible splice junctions.
>
> I was able to get junctions by running the pipeline with --segment-length 24, which forces TopHat to use both the coverage-based search and the split-segment search.
>
> You may want to make 75bp reads out of your reads if possible, as that should dramatically improve your junction sensitivity. If you do that, you may also want to explicitly pass --coverage-search to TopHat, to further improve sensitivity. You should also consider passing a GFF file of mouse annotations to TopHat.

That helps a lot. I got it running and reran it with a read length of 75 and --coverage-search. It returned 30 junctions, which seems low; I was expecting thousands. Is the power to find junctions usually that low with 100K 75 bp reads?

I tried to upload the BED and WIG files output by TopHat to the UCSC Genome Browser, and it rejected both with the following errors:

> Error File 'junctions.bed' - Unrecognized format line 2 of custom track: gi|94389945|ref|NT_039515.6|Mm11_39555_37 19897107 19897630 JUNC00000001 1 + 19897107 19897630 255,0,0 2 47,28 0,495 (note: chrom names are case sensitive)

> Error File 'coverage.wig' - Unrecognized format type=bedGraph line 2 of custom track

Am I missing something?

Thanks again for your help!

**Cole Trapnell** · 10-13-2009, 07:00 PM

The track errors are easy to resolve - UCSC expects user-supplied tracks to have chromosome names that it knows about. This typically means "chr1", "chrX", etc. One way to guarantee compatibility between TopHat, Cufflinks, and UCSC is to map reads against a Bowtie index built from UCSC chromosomes. You can always convert them after each run, but this is annoying, IMO.
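For the "convert them after each run" route, a minimal Python sketch that rewrites the chromosome column of BED or bedGraph lines. The name mapping is a hypothetical dictionary you would supply. It only works when each reference sequence is a whole chromosome under a different name; for contig-level sequences such as the NT_ accession in the error above, renaming alone is not enough, because coordinates would also have to be shifted onto the chromosome.

```python
from typing import Dict, Iterable, List


def rename_chroms(lines: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Rewrite the chromosome (first) column of BED/bedGraph lines.

    'track', 'browser', and '#' lines pass through unchanged. Data lines
    whose first column is not in `mapping` are dropped, since UCSC would
    reject them anyway.
    """
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            out.append(line)
            continue
        fields = line.split("\t")
        new_name = mapping.get(fields[0])
        if new_name is None:
            continue  # unknown sequence name: skip the record
        fields[0] = new_name
        out.append("\t".join(fields))
    return out
```

Dropping unmapped records silently is a design choice for a quick upload; for a real conversion you would probably want to report them instead.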

Also: I agree, the junction count is disturbingly low. In the data you sent me, approximately 60% of the reads were contiguously mappable by TopHat. I just went and ran a handful of the remaining unmappable ones through BLAT against mm9, and I didn't turn up any plausible spliced alignments, just a bunch of relatively low-identity hits to repeats, chrM, etc. Have you tried running the whole set through BLAT? It didn't seem like TopHat was missing junctions left and right, but then again, neither TopHat nor Bowtie is designed for 454 reads, so it's worth running an independent check.

I'm happy to look at this more with you, but we should probably take this offline at this point. Please email me directly if you want to continue looking at it together.

**greggrant** · 10-26-2009, 03:19 PM

Originally posted by Cole Trapnell:

> Also: I agree, the junction count is disturbingly low.

TopHat doesn't look like it will work for me, so I put together a plan to do it another way using Bowtie. Then I found out that on my Mac Pro I am getting an alignment speed of approximately 60 reads/hour. That's significantly less than the 25,000,000/hour that I expected. This run of one sequence against the m_musculus index that I downloaded from the Bowtie site takes about 45 seconds. Here's the command I used. Any idea why it would take so long? I rebooted to make sure there was nothing else taxing the resources.

```
bowtie -c /Applications/bowtie-0.10.0/indexes/m_musculus GAAAGTCATGCGTTTCAAGTTTGGCAAGGAATAGAAACAGACGGGCTTATGAAAATAAGGAAAACATCACCCCCAGGCG
```

That sequence should have no spaces; I don't know why the forum inserts a space in the middle of it.

Thanks in advance for any suggestions.
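The numbers in this post can be reconciled with a little arithmetic. At 45 seconds per single-read run, one read per invocation works out to roughly 80 reads per hour. A minimal sketch, assuming (the thread does not confirm this) that almost all of those 45 seconds is a fixed per-run cost such as loading the mouse index, and that per-read alignment runs at the 25,000,000 reads/hour quoted above:

```python
def reads_per_hour(n_reads: int, startup_s: float, per_read_s: float) -> float:
    """Effective throughput when every run pays a fixed startup cost once."""
    total_s = startup_s + n_reads * per_read_s
    return n_reads * 3600 / total_s


PER_READ_S = 3600 / 25_000_000  # per-read time implied by 25,000,000 reads/hour
STARTUP_S = 45.0                # assumption: the 45 s run is almost all startup

# One read per invocation: the fixed cost dominates (~80 reads/hour).
print(round(reads_per_hour(1, STARTUP_S, PER_READ_S)))
# A million reads in one invocation: the fixed cost is amortized (~19 million/hour).
print(round(reads_per_hour(1_000_000, STARTUP_S, PER_READ_S)))
```

Under that assumption, the practical fix would be to put many reads into one file and align them in a single run rather than passing one sequence at a time with -c.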
