SEQanswers

10-28-2013, 06:28 AM   #1
shawn.mek
Member
Location: Colorado
Join Date: Feb 2013
Posts: 12
Use Bowtie Index to get sequences using locations

We have the FASTA files (obviously) for the hg19 genome, and we used them to create a big Bowtie index.

I was hoping not to have to keep the FASTA file, and instead just look up sequences in the Bowtie index when I get chromosome locations.

I know when the alignment comes back it tells me where the alignment occurs and which FASTA record (header) it came from. So all the info is there, but I can't figure out how to pull out a sequence given a location.

Does anyone know if this is possible, or know much about the index format (perhaps I could write a little program to fish out a sequence)?


Thanks

10-28-2013, 06:42 AM   #2
winsettz
Member
Location: US
Join Date: Sep 2012
Posts: 91

You should be able to extract that information from the SAM output. I've not used bowtie2-inspect before, but it could be what you are looking for.

Code:
bowtie2-inspect
No index name given!
Bowtie 2 version 2.1.0 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-inspect [options]* <bt2_base>
  <bt2_base>         bt2 filename minus trailing .1.bt2/.2.bt2

  By default, prints FASTA records of the indexed nucleotide sequences to
  standard out.  With -n, just prints names.  With -s, just prints a summary of
  the index parameters and sequences.  With -e, preserves colors if applicable.

Options:
  -a/--across <int>  Number of characters across in FASTA output (default: 60)
  -n/--names         Print reference sequence names only
  -s/--summary       Print summary incl. ref names, lengths, index properties
  -e/--bt2-ref      Reconstruct reference from .bt2 (slow, preserves colors)
  -v/--verbose       Verbose output (for debugging)
  -h/--help          print detailed description of tool and its options
  --help             print this usage message

10-28-2013, 07:33 AM   #3
shawn.mek
Member
Location: Colorado
Join Date: Feb 2013
Posts: 12

Just to clarify, I mean using the index: giving it a chromosome name (FASTA header) and location numbers, and getting back a sequence.

I don't want to run an alignment, just pull out the sequence. So no SAM output.

For this I'm using bowtie, not bowtie2. But if bowtie2 can do this...

Thanks

10-30-2013, 10:20 PM   #4
shawn.mek
Member
Location: Colorado
Join Date: Feb 2013
Posts: 12

The bowtie-inspect tool does get all the info out, but that's 3 GB of output, since I can't select a location.
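If you stay with the index, the dump does not have to be stored anywhere: it can be streamed and everything outside the wanted region discarded. A minimal Python sketch, assuming `bowtie-inspect` is on the PATH and writes FASTA to standard output by default; `extract_region` and `region_from_bowtie_index` are hypothetical helper names, coordinates are 0-based half-open, and record names are matched on the first word of the header. Each call still reads the dump up to the end of the requested region, so this only suits a handful of lookups.

```python
import subprocess
from typing import Iterable


def extract_region(lines: Iterable[str], name: str, start: int, end: int) -> str:
    """Return bases [start, end) (0-based, half-open) of record `name`
    from an iterable of FASTA lines, reading only as far as needed."""
    in_record = False
    pos = 0  # 0-based offset of the first base on the current line
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if in_record:
                break  # reached the next record; nothing more to collect
            # Compare only the first word of the header line
            in_record = (line[1:].split() or [""])[0] == name
            continue
        if not in_record:
            continue
        line_end = pos + len(line)
        if line_end > start and pos < end:
            chunks.append(line[max(start - pos, 0):min(end, line_end) - pos])
        pos = line_end
        if pos >= end:
            break  # region complete; stop reading
    return "".join(chunks)


def region_from_bowtie_index(index_base: str, name: str, start: int, end: int) -> str:
    """Stream `bowtie-inspect <index_base>` and slice one region out of it."""
    # Assumption: bowtie-inspect is installed and prints FASTA to stdout.
    with subprocess.Popen(["bowtie-inspect", index_base],
                          stdout=subprocess.PIPE, text=True) as proc:
        try:
            return extract_region(proc.stdout, name, start, end)
        finally:
            proc.kill()  # stop the rest of the dump once we have our slice
```

For example, `region_from_bowtie_index("hg19", "chr1", 1000000, 1000050)`, where `hg19` stands for a hypothetical index basename.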

10-31-2013, 10:18 AM   #5
lh3
Senior Member
Location: Boston
Join Date: Feb 2008
Posts: 693

Although the Bowtie index essentially keeps the genome, I doubt it is optimized or designed for your purpose. Use faidx (samtools faidx) if you only want to retrieve a few regions.
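The reason faidx is quick for a few regions is its .fai index: per sequence it stores the name, length, byte offset of the first base, bases per line and bytes per line, so any position turns into a file offset with arithmetic and a single seek. A minimal Python sketch of that idea (not samtools' own code), assuming a well-formed FASTA in which every line of a record except the last has the same length; `FaiEntry`, `build_fai` and `fetch` are hypothetical names, and coordinates are 0-based half-open, whereas samtools faidx region strings such as `chr1:100-200` are 1-based and inclusive.

```python
from typing import BinaryIO, Dict, NamedTuple


class FaiEntry(NamedTuple):
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base
    line_bases: int  # bases per full line
    line_width: int  # bytes per full line, including the newline


def build_fai(fh: BinaryIO) -> Dict[str, FaiEntry]:
    """Scan a FASTA file once and record the same fields as a .fai index."""
    index: Dict[str, FaiEntry] = {}
    name = None
    length = offset = line_bases = line_width = 0
    fh.seek(0)
    while True:
        line = fh.readline()
        if not line or line.startswith(b">"):
            if name is not None:
                index[name] = FaiEntry(length, offset, line_bases, line_width)
            if not line:
                return index
            name = line[1:].split()[0].decode()
            length = line_bases = line_width = 0
            offset = fh.tell()  # first base starts right after the header
        else:
            bases = len(line.rstrip(b"\r\n"))
            if line_bases == 0:  # first sequence line sets the line geometry
                line_bases, line_width = bases, len(line)
            length += bases


def fetch(fh: BinaryIO, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases [start, end) using arithmetic plus a single seek."""
    start, end = max(start, 0), min(end, entry.length)
    if start >= end:
        return ""

    def byte_of(pos: int) -> int:
        return entry.offset + (pos // entry.line_bases) * entry.line_width + pos % entry.line_bases

    fh.seek(byte_of(start))
    raw = fh.read(byte_of(end) - byte_of(start))
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

Open the FASTA with `open(path, "rb")` so that `tell()` and `seek()` work in exact byte offsets.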

10-31-2013, 11:19 AM   #6
shawn.mek
Member
Location: Colorado
Join Date: Feb 2013
Posts: 12

I want to retrieve lots of regions efficiently, but thanks for pointing me to faidx; I'll see how it works.

10-31-2013, 12:23 PM   #7
dpryan
Devon Ryan
Location: Freiburg, Germany
Join Date: Jul 2011
Posts: 3,480

If you really have a LOT of positions, then it's best to read the genome into memory. samtools faidx is great for a smallish number of sites, but it grabs the sequence from disk, making it a bit slow for a large number of queries.
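A minimal Python sketch of the in-memory approach, assuming one pass over the FASTA is acceptable and there is RAM for the whole genome (roughly 3 GB of sequence for hg19, plus Python overhead); `load_fasta` and `fetch_many` are hypothetical names and coordinates are 0-based half-open. After the one-off load, each query is a string slice with no disk access.

```python
from typing import Dict, Iterable, List, TextIO, Tuple


def load_fasta(fh: TextIO) -> Dict[str, str]:
    """Read every record into memory as one string per sequence name."""
    genome: Dict[str, str] = {}
    name, parts = None, []
    for line in fh:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(parts)
            name, parts = line[1:].split()[0], []  # key on first header word
        else:
            parts.append(line)
    if name is not None:
        genome[name] = "".join(parts)
    return genome


def fetch_many(genome: Dict[str, str],
               regions: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Answer (name, start, end) queries; each lookup is just a slice."""
    return [genome[name][start:end] for name, start, end in regions]
```

The trade-off is the one described above: a slow load and a large memory footprint, in exchange for queries that cost only the length of the returned region.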

10-31-2013, 01:32 PM   #8
shawn.mek
Member
Location: Colorado
Join Date: Feb 2013
Posts: 12

Yeah, I'm torn on whether to hold it in memory or not. I'll toy with different workflows.

10-31-2013, 03:40 PM   #9
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 838

Quote:
Although bowtie index essentially keeps the genome, I doubt it is optimized or designed for your purpose.

The bowtie index is optimised for searching, but it's overkill (and inefficient) for getting subsequences. If you want compressed, indexed storage for DNA sequence retrieval alone, then the 2bit format is probably best:

http://genome.ucsc.edu/FAQ/FAQformat.html#format7

The code points to a way to retrieve ranges:

http://genome-source.cse.ucsc.edu/gi...oBit.h;hb=HEAD
Code:
/* Parse a .2bit file and sequence spec into an object.
 * The spec is a string in the form:
 *
 *    file/path/input.2bit[:seqSpec1][,seqSpec2,...]
 *
 * where seqSpec is either
 *     seqName
 *  or
 *     seqName:start-end
So there's probably a program somewhere for getting subsequences out of that file using seqName:start-end notation.
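If you want to accept the same notation in your own code, here is a minimal Python sketch of parsing it. `parse_2bit_spec` is a hypothetical name, the path is assumed to contain no `:`, a bare seqName comes back with `None` coordinates meaning "whole sequence", and the numbers are passed through unchanged because the excerpt does not say whether they are 0- or 1-based.

```python
import re
from typing import List, Optional, Tuple

RANGE = re.compile(r"^(\d+)-(\d+)$")  # the "start-end" part of a seqSpec


def parse_2bit_spec(spec: str) -> Tuple[str, List[Tuple[str, Optional[int], Optional[int]]]]:
    """Split 'path.2bit[:seqSpec1][,seqSpec2,...]' into the path and its seqSpecs."""
    path, _, rest = spec.partition(":")  # assumes the path itself has no ':'
    regions = []
    for seq_spec in filter(None, rest.split(",")):
        name, _, tail = seq_spec.rpartition(":")
        match = RANGE.match(tail)
        if name and match:  # seqName:start-end
            regions.append((name, int(match.group(1)), int(match.group(2))))
        else:  # bare seqName: the whole sequence
            regions.append((seq_spec, None, None))
    return path, regions
```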

edit: indeed, BLAT has such functions included. See here for a bit of discussion about 2bit retrieval using Perl:

http://www.perlmonks.org/?node_id=672251
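A minimal Python sketch of pulling a range straight out of a .2bit file, following the layout in the UCSC FAQ linked above: a 16-byte header (signature 0x1A412743, version, sequence count, reserved), an index of name/offset entries, then per-sequence records holding the DNA size, the N-block and mask-block lists, a reserved word and the packed bases, four per byte, with T=00, C=01, A=10, G=11 and the first base in the most significant bits. It is not UCSC's implementation: it assumes a version-0 file (32-bit offsets), reports N blocks as N, ignores the soft-mask blocks so everything comes back uppercase, and uses 0-based half-open coordinates; `read_2bit_region` is a hypothetical name. Only the bytes covering the requested range are read.

```python
import struct
from typing import BinaryIO

SIGNATURE = 0x1A412743
BASES = "TCAG"  # 2-bit codes 0..3


def read_2bit_region(fh: BinaryIO, name: str, start: int, end: int) -> str:
    """Return bases [start, end) of `name`. Sketch assumptions: version-0
    file, soft-mask blocks ignored, 0-based half-open coordinates."""
    fh.seek(0)
    magic = fh.read(4)
    if struct.unpack("<I", magic)[0] == SIGNATURE:
        order = "<"
    elif struct.unpack(">I", magic)[0] == SIGNATURE:
        order = ">"  # file was written on a machine with the other byte order
    else:
        raise ValueError("not a .2bit file")

    def u32(count: int = 1):
        return struct.unpack(f"{order}{count}I", fh.read(4 * count))

    version, seq_count, _reserved = u32(3)
    if version != 0:
        raise ValueError("only version-0 (32-bit offset) files handled here")

    # Walk the file index to find where this sequence's record starts.
    record_offset = None
    for _ in range(seq_count):
        name_len = fh.read(1)[0]
        seq_name = fh.read(name_len).decode()
        (offset,) = u32()
        if seq_name == name:
            record_offset = offset
            break
    if record_offset is None:
        raise KeyError(name)

    fh.seek(record_offset)
    (dna_size,) = u32()
    (n_count,) = u32()
    n_starts = u32(n_count)
    n_sizes = u32(n_count)
    (mask_count,) = u32()
    fh.seek(8 * mask_count + 4, 1)  # skip mask starts, mask sizes, reserved word
    packed_start = fh.tell()

    start, end = max(start, 0), min(end, dna_size)
    if start >= end:
        return ""
    # Read only the bytes covering [start, end); 4 bases per byte.
    fh.seek(packed_start + start // 4)
    packed = fh.read((end - 1) // 4 - start // 4 + 1)
    bases = []
    for pos in range(start, end):
        byte = packed[pos // 4 - start // 4]
        shift = 6 - 2 * (pos % 4)  # first base of a byte is in the top two bits
        bases.append(BASES[(byte >> shift) & 3])

    # Overwrite positions that fall inside N blocks.
    for block_start, block_size in zip(n_starts, n_sizes):
        for pos in range(max(block_start, start), min(block_start + block_size, end)):
            bases[pos - start] = "N"
    return "".join(bases)
```

Open the file with `open(path, "rb")`. The index is scanned linearly on every call; for many queries you would read it once into a dict of name to offset.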

Last edited by gringer; 10-31-2013 at 03:43 PM.

10-31-2013, 03:55 PM   #10
ctseto
Member
Location: SE MN
Join Date: Oct 2013
Posts: 44

http://seqanswers.com/forums/showthr...highlight=2bit

If you just need to retrieve known regions.

All times are GMT -8.