Originally posted by NJD: I was getting the same message with bowtie-0.11.2-bin-macos-10.5-x86_64.zip. Working from source and setting BITS=64 seems to be fine. Mac OS X 10.5.8.
-
Originally posted by greggrant: Thanks for this. I hope not to have to work from source since I never have much luck doing that kind of thing.
Please try the 10.4 package instead and let me know if that gives the same error. If 10.4 works fine, then I think I know what's wrong.
Thanks for the reports,
Ben
-
Originally posted by Ben Langmead: Hi guys,
Please try the 10.4 package instead and let me know if that gives the same error. If 10.4 works fine, then I think I know what's wrong.
Thanks for the reports,
Ben
Here is the tophat_out directory:
This is really frustrating!
-
I can't reproduce the issue you are seeing with just that tarball.
Can you post the small sample of reads you are using? The left_kept_reads.fq file contains reads that have different lengths etc. TopHat is really designed for Illumina reads - you can certainly use 454, but you'll need to trim them down to be all the same length.
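As a rough illustration of the trimming step mentioned above, here is a minimal Python sketch that reports the read-length distribution of a FASTQ file and trims every read to one fixed length. The function names (`parse_fastq`, `length_histogram`, `trim_reads`) are made up for this example and are not part of TopHat or Bowtie; the sketch also assumes plain 4-line FASTQ records and simply drops reads shorter than the target length.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality); the '+' separator line is not kept.
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ input (assumption: no wrapped lines).

    A truncated final record is silently ignored.
    """
    it = (line.rstrip("\n") for line in lines)
    # zip() over the same iterator four times pulls four lines per record.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def length_histogram(records: Iterable[FastqRecord]) -> Counter:
    """Count how many reads there are of each length."""
    return Counter(len(seq) for _header, seq, _qual in records)


def trim_reads(records: Iterable[FastqRecord], length: int) -> Iterator[FastqRecord]:
    """Trim each read (and its qualities) to `length` bases from the 5' end.

    Reads shorter than `length` are dropped so that all output reads
    have exactly the same length.
    """
    for header, seq, qual in records:
        if len(seq) >= length:
            yield header, seq[:length], qual[:length]


def write_fastq(records: Iterable[FastqRecord]) -> Iterator[str]:
    """Turn records back into FASTQ text lines."""
    for header, seq, qual in records:
        yield f"{header}\n{seq}\n+\n{qual}\n"
```

To use it on a real file, open the FASTQ with `open(path)` and pass the file object to `parse_fastq`; checking `length_histogram` first shows whether trimming is needed at all.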
-
Originally posted by Cole Trapnell: I can't reproduce the issue you are seeing with just that tarball.
Can you post the small sample of reads you are using? The left_kept_reads.fq file contains reads that have different lengths etc. TopHat is really designed for Illumina reads - you can certainly use 454, but you'll need to trim them down to be all the same length.
Does this look right?
Thanks again for your help!
-
Originally posted by greggrant: OK I managed to compile 11.1 using BITS=64 as suggested, it compiled and ran without crashing.
For what it's worth, I believe that I have now fixed the problem with the Bowtie macos-10.5 binary packages from version 0.11.3. I replaced the bad packages with good ones up on sourceforge. Let me know if you have more problems like that.
Apologies,
Ben
-
I just ran the left_kept_reads.fq file from the above package - there are a handful of reads that are 49 bp, which explains your result. When the reads are less than 50bp long, TopHat uses a coverage-island based algorithm to find junctions. When they are longer than that, TopHat starts using a split-segment algorithm in addition to the coverage-based approach. For reads 75bp or longer, TopHat disables the coverage-based algorithm (since it's slower and has a larger memory footprint) and uses only the split-segment algorithm. Since your file has those 49bp reads, TopHat is reverting to the coverage-based algorithm alone, and since this input set is small, TopHat has a hard time identifying possible splice junctions.
I was able to get junctions by running the pipeline with --segment-length 24, to force TopHat to use both the coverage-based search and the split-segment search.
You may want to make 75bp reads out of your reads if possible, as that should dramatically improve your junction sensitivity. If you do that, you may also want to explicitly pass --coverage-search to TopHat, to further improve sensitivity. You should also consider passing a GFF file of mouse annotations to TopHat.
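The read-length thresholds described above can be restated as a small Python sketch. This is not TopHat's source code, only a paraphrase of the post. It assumes the decision is driven by the shortest read in the input (inferred from the remark about the 49bp reads) and that reads of exactly 50bp fall into the middle tier; the function name `junction_search_modes` is hypothetical. Overrides such as --segment-length and --coverage-search are not modeled.

```python
from typing import Iterable, Set


def junction_search_modes(read_lengths: Iterable[int]) -> Set[str]:
    """Return the junction-search strategies the post describes.

    Assumptions (not confirmed by the post): the shortest read decides,
    and the 50bp boundary is inclusive for the middle tier.
    """
    shortest = min(read_lengths)
    if shortest < 50:
        # Short reads: coverage-island search only.
        return {"coverage"}
    if shortest < 75:
        # Medium reads: both searches are used.
        return {"coverage", "split-segment"}
    # 75bp or longer: coverage search is disabled by default.
    return {"split-segment"}
```

Under these assumptions, a file of 75bp reads containing even one 49bp read would fall back to the coverage-based search alone, which matches the behaviour reported in this thread.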
-
Originally posted by Cole Trapnell: I just ran the left_kept_reads.fq file from the above package - there are a handful of reads that are 49 bp, which explains your result. When the reads are less than 50bp long, TopHat uses a coverage-island based algorithm to find junctions. When they are longer than that, TopHat starts using a split-segment algorithm in addition to the coverage-based approach. For reads 75bp or longer, TopHat disables the coverage-based algorithm (since it's slower and has a larger memory footprint) and uses only the split-segment algorithm. Since your file has those 49bp reads, TopHat is reverting to the coverage-based algorithm alone, and since this input set is small, TopHat has a hard time identifying possible splice junctions.
I was able to get junctions by running the pipeline with --segment-length 24, to force TopHat to use both the coverage-based search and the split-segment search.
You may want to make 75bp reads out of your reads if possible, as that should dramatically improve your junction sensitivity. If you do that, you may also want to explicitly pass --coverage-search to TopHat, to further improve sensitivity. You should also consider passing a GFF file of mouse annotations to TopHat.
I tried to upload the BED and WIG files output by TopHat to the genome browser, and it rejected both with the following errors:
> Error File 'junctions.bed' - Unrecognized format line 2 of custom track: gi|94389945|ref|NT_039515.6|Mm11_39555_37 19897107 19897630 JUNC00000001 1 + 19897107 19897630 255,0,0 2 47,28 0,495 (note: chrom names are case sensitive)
> Error File 'coverage.wig' - Unrecognized format type=bedGraph line 2 of custom track
Am I missing something?
Thanks again for your help!
-
The track errors are easy to resolve - UCSC expects user-supplied tracks to have chromosome names that it knows about. This typically means "chr1", "chrX", etc. One way to guarantee compatibility between TopHat, Cufflinks, and UCSC is to map reads against a Bowtie index built from UCSC chromosomes. You can always convert them after each run, but this is annoying, IMO.
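For the "convert them after each run" route, a minimal Python sketch of renaming the chromosome column of a BED file might look like the following. The function name `rename_chroms` and the mapping passed to it are hypothetical; you would have to supply the real mapping for your index. It also assumes each reference sequence is a whole chromosome with the same coordinates UCSC uses. If the index was built from contigs, renaming alone will not be enough, because the coordinates would need shifting as well.

```python
from typing import Dict, Iterable, Iterator


def rename_chroms(lines: Iterable[str], name_map: Dict[str, str]) -> Iterator[str]:
    """Rewrite column 1 of tab-separated BED lines using `name_map`.

    Track/browser header lines, comments, and blank lines pass through
    unchanged. An unmapped sequence name raises KeyError so that
    nothing is silently left with a name UCSC will reject.
    """
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = name_map[fields[0]]
        yield "\t".join(fields) + "\n"
```

A call would look like `rename_chroms(open("junctions.bed"), {"1": "chr1", "MT": "chrM"})`, where the dictionary is purely illustrative.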
Also: I agree, the junction count is disturbingly low. In the data you sent me, approximately 60% of the reads were contiguously mappable by TopHat. I just went and ran a handful of the remaining unmappable ones through BLAT against mm9, and I didn't turn up any plausible spliced alignments, just a bunch of relatively low-identity hits to repeats, chrM, etc. Have you tried running the whole set through BLAT? It didn't seem like TopHat was missing junctions left and right, but then again, neither TopHat nor Bowtie is designed for 454 reads, so it's worth running an independent check.
I'm happy to look at this more with you, but we should probably take this offline at this point. Please email me directly if you want to continue looking at it together.
-
Originally posted by Cole Trapnell: Also: I agree, the junction count is disturbingly low.
bowtie -c /Applications/bowtie-0.10.0/indexes/m_musculus GAAAGTCATGCGTTTCAAGTTTGGCAAGGAATAGAAACAGACGGGCTTATGAAAATAAGGAAAACATCACCCCCAGGCG
That sequence should have no spaces; I don't know why the forum inserts a space in the middle of it...
Thanks in advance for any suggestions.