SEQanswers

11-11-2009, 01:53 AM   #1
Ckornelius
Junior Member
 
Location: Berlin

Join Date: Nov 2009
Posts: 2
BOWTIE: index woes

Hi,

Thanks for this great program.
I would like to align sequences against the h_sapiens_37_asm index and get the chromosome numbers included in the output.

So far, when I align a sequence with:

bowtie -p 4 -t -a --best --strata -v 3 h_sapiens_asm -c AAAATATATTAAACGCAGCTAGAGAAGCTAGAGAGAAGGGGCAGG

I get:

0 + gi|89161218|ref|NC_000023.9|NC_000023 151228492 AAAATATATTAAACGCAGCTAGAGAAGCTAGAGAGAAGGGGCAGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0

So I get the strand orientation and the position of the read, but I can't find the chromosome number (X in this case). Is this because of the index I use? I face the same problem with the h_sapiens index.
What would I need to do to get chrX instead of gi|89161218|ref|NC_000023.9|NC_000023?

Do I need to construct a new index with bowtie-build, and if so, what options would I need to provide? I could not find any reference to chromosome numbers.
So far I could only think of running the bowtie command again with the --concise option and then converting the two output files, but I don't think that would be a smart option. Thanks.

Cheers,
Ck.
11-11-2009, 03:08 AM   #2
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

Bowtie simply reports the text on the FastA header line of the sequences it indexes (possibly with some limit as to how long that line can be).

If you want the bowtie output to contain the chromosome name then you need to ensure that the fasta files you index have the chromosome name on their header line - preferably at the start.

So instead of having a chromosome file which starts like:

>gi|89161218|ref|NC_000023.9|NC_000023 Homo sapiens chromosome X, reference assembly, complete sequence
CTAACC....


You'd change it to something like:

>ChrX
CTAACC.....

..and then recreate your index files.
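
The header change described above can be scripted rather than done by hand. The following is a minimal sketch, not part of bowtie: the function name rename_fasta_headers and the name_map dictionary are made up for illustration, and the mapping covers only the one record shown in this post (using chrX, as requested in the first post).

```python
def rename_fasta_headers(lines, name_map):
    """Yield FASTA lines, replacing header names found in name_map.

    name_map maps the first whitespace-delimited token of an original
    header (without the leading '>') to the new name. Headers that are
    not in the map, and all sequence lines, pass through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            old_name = fields[0] if fields else ""
            new_name = name_map.get(old_name)
            if new_name is not None:
                yield ">" + new_name + "\n"
                continue
        yield line


# Hypothetical mapping for the single record quoted in this post.
name_map = {"gi|89161218|ref|NC_000023.9|NC_000023": "chrX"}

example = [
    ">gi|89161218|ref|NC_000023.9|NC_000023 Homo sapiens chromosome X, "
    "reference assembly, complete sequence\n",
    "CTAACC\n",
]
renamed = list(rename_fasta_headers(example, name_map))
print("".join(renamed), end="")
```

For a real chromosome file you would pass an open file handle as lines and write the result to a new file, then run bowtie-build on the renamed files to recreate the index.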
11-11-2009, 01:32 PM   #3
mbjohnson
Member
 
Location: Boston, MA

Join Date: Apr 2009
Posts: 15

This has been covered in a few posts; the answer seems to be to build your own index from the UCSC chr fasta files (ftp://hgdownload.cse.ucsc.edu/golden...8/chromosomes/) instead of using the pre-built one.

I'm trying to do this now, but the bowtie-build process is taking forever, even on our university computing cluster. My first run is still going after a week, so I started it again with more resources (a quad-core node with, I think, 16 GB RAM), and that run is still going after ~50 hours. Is this normal?! I ran it with the flags -f -r --ntoa, and so far it has produced *.1.ebwt and *.2.ebwt (851 MB and 1.5 GB respectively), but the *.rev.1/2.ebwt files are still empty.

And does anyone who's generated this index want to share it? It seems like a lot of people want to use their results with UCSC genome browser and need this index to do so.

Thanks!
Matt
11-12-2009, 12:47 AM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

I suspect it's the options or the bowtie version you're using which are slowing you down. We've processed several eukaryotic genome assemblies with bowtie-build, and whilst they weren't quick to build they did complete within about 5 hours on a pretty standard quad core server (I'm not even sure it used more than one core).

We ran the build with essentially no options, just taking the defaults and it all worked pretty well. We were using the x86_64 bowtie-build v0.10.1.

The generated indices are too big for us to share, but it shouldn't be too arduous for people to create their own indices if needed.
11-12-2009, 03:44 AM   #5
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 401

I agree with Simon: in my experience bowtie-build with default parameters on a roughly 3Gb input fasta takes around 5-10 hours.

This is on fairly modest hardware: a 64-bit, 8-core server with 8 GB of RAM. Note that only one core gets used.

colin
11-12-2009, 08:05 AM   #6
mbjohnson
Member
 
Location: Boston, MA

Join Date: Apr 2009
Posts: 15

hmm...

bowtie-build version 0.11.3
64-bit
Compiler: gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)

My input sequence was a comma-separated list of all the files in the UCSC hg18 chromosomes directory, including the "random" and the "hap" files (altogether ~3.9Gb).

I guess I'll try again with no flags whatsoever...
11-12-2009, 08:18 AM   #7
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

Quote:
Originally Posted by mbjohnson
I guess I'll try again with no flags whatsoever...
Yes, good idea. If the reference contains very long stretches of Ns, --ntoa can slow things down significantly. The reason is that bowtie-build essentially does a suffix sort, and when a very large chunk of those suffixes are identical at the beginning (AAAAAAAAAAA....), it's more work to decide which suffix has priority. If you leave them as Ns, bowtie-build will clip the reference around the Ns and exclude them from the sort, which should be faster.
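
To make the difference concrete, here is a toy sketch of the two treatments of Ns. It is not bowtie-build's actual implementation; the helper names non_n_stretches and longest_run and the 20-base reference are invented for illustration. Clipping keeps only the N-free stretches, whereas converting Ns to As creates a long homopolymer run, so many suffixes begin with the same characters.

```python
import re


def non_n_stretches(sequence):
    """Return half-open (start, end) intervals of the sequence that contain no N."""
    return [(m.start(), m.end()) for m in re.finditer(r"[^Nn]+", sequence)]


def longest_run(sequence, base):
    """Return the length of the longest run of a single base in the sequence."""
    runs = re.findall(re.escape(base) + "+", sequence)
    return max((len(run) for run in runs), default=0)


# Toy reference with two gaps of Ns (8 and 3 bases long).
reference = "ACGT" + "N" * 8 + "GGCA" + "N" * 3 + "T"

# Treatment 1: clip around the Ns; only these stretches would be sorted.
stretches = non_n_stretches(reference)
print(stretches)

# Treatment 2: convert Ns to As; the gaps become runs of identical bases.
converted = reference.replace("N", "A")
print(longest_run(reference, "A"), longest_run(converted, "A"))
```

In a real genome the gaps are far longer than in this toy string, which is why the converted reference gives the suffix sort so many near-identical suffixes to compare.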

Ben
11-12-2009, 01:32 PM   #8
mbjohnson
Member
 
Location: Boston, MA

Join Date: Apr 2009
Posts: 15

Thanks for the explanation, Ben! Just leaving off the --ntoa option sped it right up.
11-13-2009, 01:28 PM   #9
Ckornelius
Junior Member
 
Location: Berlin

Join Date: Nov 2009
Posts: 2

Thanks for the help.
I started to build my own index with bowtie-build and it works now. I thought I could do it on my notebook, but as the run took 2 days, I think that 2 GB of RAM (32-bit openSUSE 11.1) is not quite enough.
01-19-2010, 09:04 AM   #10
fnovo
Junior Member
 
Location: spain

Join Date: Jan 2010
Posts: 1

Hi, you can see the chromosome number in the RefSeq IDs. In your example:

0 + gi|89161218|ref|NC_000023.9|NC_000023 151228492

"NC_000023" actually points you to chrX (you have NC_000001 for chr1, NC_000002 for chr2 and so on; they assigned NC_000023 to chrX).
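
If you would rather translate the names after alignment than rebuild the index, the numbering rule above can be applied to the reference column of the output. This is a minimal sketch; the function name refseq_to_chrom is invented, and the NC_000024 to chrY case is an addition that is not stated in this thread.

```python
import re


def refseq_to_chrom(reference_name):
    """Map a name containing a human NC_0000NN accession to a chr-style name.

    Returns None if no such accession is found or its number is outside 1-24.
    """
    # Two digits after 'NC_0000', not followed by another digit, so that
    # longer accession numbers are not matched by accident.
    match = re.search(r"NC_0000(\d{2})(?!\d)", reference_name)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 22:
        return "chr%d" % number
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"  # assumption beyond the thread, which only mentions chrX
    return None


name = "gi|89161218|ref|NC_000023.9|NC_000023"
print(refseq_to_chrom(name))
```

Names without a matching accession return None, so the caller can decide whether to keep the original name or report it.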

Hope that helps,
10-28-2010, 09:01 AM   #11
emucaki
Member
 
Location: .

Join Date: Apr 2009
Posts: 12

Bumping this thread with a question:

When using hg18/19, the .fa file headers look like this:

>chr1
CTAACC.....
>chr2
CTAACC.....

Is it possible to get bowtie/tophat to recognize a header that has positional information like so?

>chr1_1111111_1111333
CTAACC...
>chr1_2222222_2222333
CTAACC...

To save time I want to align to just a handful of genes (some on the same chromosome). If the headers are like this, would bowtie/tophat alignment take positional data into account when building the wig files? Or do I have to change my fasta file to include NNNNNNNNNNNN... for every position I don't want?
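
One possible workaround, independent of whether bowtie or TopHat can interpret such headers, is to shift the reported offsets back to genome coordinates yourself after alignment. The sketch below is only an illustration under stated assumptions: the function name to_genomic is invented, the start in the header is taken to be a 1-based position on the chromosome, and the reported offset is taken to be 0-based within the record.

```python
def to_genomic(reference_name, offset):
    """Convert an offset within a 'chrom_start_end' record to a genomic position.

    Assumes the header start is 1-based and the offset is 0-based, so the
    returned position is 1-based on the chromosome.
    """
    # Split from the right so chromosome names containing '_' stay intact.
    chrom, start, end = reference_name.rsplit("_", 2)
    start, end = int(start), int(end)
    position = start + offset
    if not start <= position <= end:
        raise ValueError("offset %d falls outside %s" % (offset, reference_name))
    return chrom, position


print(to_genomic("chr1_1111111_1111333", 10))
```

If the coordinates in your headers follow a different convention (for example 0-based starts), the arithmetic needs adjusting accordingly.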
11-08-2010, 05:07 AM   #12
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200

I assume this is a question more about TopHat than Bowtie, since Bowtie doesn't build WIGs. You're probably better off re-posting this question with "TopHat" in the subject, or making a feature request on the TopHat site.

Thanks,
Ben