SEQanswers

Old 03-11-2009, 07:57 AM   #81
What_Da_Seq
Member
 
Location: RTP

Join Date: Jul 2008
Posts: 28
Default How does Bowtie handle ambiguous bases in the refgenome

Does anybody have experience preparing a Bowtie search index in which certain reference bases have been replaced with ambiguity codes such as "Y" (which stands for "C" or "T")? If so, will these positions be called matches or mismatches when the Solexa read being aligned has either a "C" or a "T" at that position?

Thanks
What_Da_Seq is offline   Reply With Quote
Old 03-11-2009, 08:10 AM   #82
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

The Bowtie indexing step elides stretches of ambiguous bases in the reference. As a result, alignments that overlap an ambiguous base in the reference are never considered "valid" by Bowtie and will not be reported.

This is explained in a couple of paragraphs in the manual that are new as of 0.9.9.1:

Quote:
A result of Bowtie's indexing strategy is that alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie, regardless of the alignment policy. This is true only for ambiguous characters in the reference; alignments involving ambiguous characters in the read are legal, subject to the alignment policy.

Also, alignments that "fall off" the reference sequence are not considered legal by Bowtie, though some such alignments will become legal once gapped alignment is implemented.
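
The rule quoted above can be illustrated with a small sketch. This is only a toy model of the stated behaviour, not Bowtie's actual implementation (Bowtie works on an index rather than scanning the reference string). The function name `is_candidate_valid`, the 0-based offset convention, and the example reference are assumptions made for illustration.

```python
# Toy model of the rule quoted above; NOT Bowtie's real code.
UNAMBIGUOUS = set("ACGT")


def is_candidate_valid(reference: str, start: int, read_length: int) -> bool:
    """Return True if a read of read_length placed at 0-based offset start
    lies entirely on the reference and touches only A/C/G/T reference bases."""
    end = start + read_length
    # Alignments that "fall off" either end of the reference are rejected.
    if start < 0 or end > len(reference):
        return False
    window = reference[start:end].upper()
    # Any ambiguous reference character (N, -, R, Y, ...) invalidates the hit.
    return all(base in UNAMBIGUOUS for base in window)


reference = "ACGTYACGTACGT"  # hypothetical reference with a Y at offset 4
print(is_candidate_valid(reference, 0, 4))   # window ACGT
print(is_candidate_valid(reference, 2, 4))   # window GTYA overlaps the Y
print(is_candidate_valid(reference, 10, 4))  # window would run past the end
```

Under this model, the "Y" in the original question can never be part of a reported alignment, whichever base the read carries at that position.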
Ben Langmead is offline   Reply With Quote
Old 03-11-2009, 12:39 PM   #83
What_Da_Seq
Member
 
Location: RTP

Join Date: Jul 2008
Posts: 28
Default

Thanks, Ben. I could not identify a "bowtie-build" option geared towards maximizing the number of aligned reads (rather than speed or memory efficiency) in the subsequent Bowtie alignment.
Your help is appreciated.

Thanks
What_Da_Seq is offline   Reply With Quote
Old 03-11-2009, 12:46 PM   #84
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Yes, all bowtie-build options are identical in terms of the index's ability to generate alignments (except those that have slight, non-specific effects, like --ntoa or --oldpmap).
Ben Langmead is offline   Reply With Quote
Old 03-11-2009, 10:48 PM   #85
tniranj1
Junior Member
 
Location: Baltimore

Join Date: Mar 2009
Posts: 4
Default Help with installation

I'm new to next-gen sequencing and have started playing around with different alignment tools for data that will soon be coming in to my lab. From what I've heard, Bowtie sounds perfect, and I appreciate the speedy feedback that's been made available to the community.
I do have a slight installation problem. I get the following error during "Make".

SeqAn-1.1/seqan/basic/basic_generated_forwards.h:507: error: parse error before numeric constant
SeqAn-1.1/seqan/basic/basic_generated_forwards.h:761: confused by earlier errors, bailing out
make: *** [bowtie-build] Error 1

I installed the platform-independent version on my Mac (OS 10.3.9... yes, it's old I know, we're upgrading soon). Appreciate any help with resolving this.

-TiN
tniranj1 is offline   Reply With Quote
Old 03-12-2009, 04:30 AM   #86
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

What version of g++ do you have (try 'g++ -v') and what version of Bowtie are you trying to compile? Is there another g++ version installed besides the default? I'm not familiar with 10.3, but you can try running g++3 and g++4 and see if either of those work.

Thanks,
Ben
Ben Langmead is offline   Reply With Quote
Old 03-12-2009, 09:07 AM   #87
tniranj1
Junior Member
 
Location: Baltimore

Join Date: Mar 2009
Posts: 4
Default

I'm using gcc version 3.3 with bowtie 0.9.9.1. Do I need version 4 or higher for g++ in order for installation of bowtie to work, or is 3 sufficient?
Thanks,
-TiN
tniranj1 is offline   Reply With Quote
Old 03-12-2009, 09:18 AM   #88
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Well, the oldest g++ I've used is 3.4.6, which works without warnings. I just tried 3.2.3 and got a bunch of warnings and errors; mostly in the SeqAn headers. So, yes, if you happen to have a newer g++ version somewhere on your machine then please try that. E.g., try typing g++ then hitting tab to see if there's something called g++4 or g++34 or similar. If there is something called g++34, for example, then make bowtie using 'make GCC_SUFFIX=34'. Let me know if that doesn't work; I can try to fix this in a future version of Bowtie.

Thanks,
Ben
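
For readers who would rather script the "type g++ then hit tab" check, here is a minimal sketch. The helper names `find_gxx_candidates` and `make_command` are hypothetical; the only thing taken from the post is that a compiler called `g++34` is selected with `make GCC_SUFFIX=34`. It assumes the suffix is simply whatever follows `g++` in the executable name.

```python
import os


def find_gxx_candidates(path_dirs=None):
    """List executables whose names start with 'g++' in the given directories
    (defaults to the directories on PATH)."""
    if path_dirs is None:
        path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    found = []
    for directory in path_dirs:
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue  # skip missing or unreadable PATH entries
        for name in names:
            full = os.path.join(directory, name)
            if (name.startswith("g++") and os.path.isfile(full)
                    and os.access(full, os.X_OK)):
                found.append(full)
    return found


def make_command(compiler_path):
    """Build the make invocation for a compiler named g++<suffix>
    (assumption: the suffix is everything after 'g++' in the file name)."""
    suffix = os.path.basename(compiler_path)[len("g++"):]
    return ["make", f"GCC_SUFFIX={suffix}"] if suffix else ["make"]


for compiler in find_gxx_candidates():
    print(compiler, "->", " ".join(make_command(compiler)))
```

The script only prints suggested commands; it does not run `make` itself.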
Ben Langmead is offline   Reply With Quote
Old 03-12-2009, 05:17 PM   #89
tniranj1
Junior Member
 
Location: Baltimore

Join Date: Mar 2009
Posts: 4
Default

I just installed gcc 3.4.6 and changed the etc/profile $PATH to reflect the update. When I ran make again, significantly more SeqAn-1.1 errors popped up (too much to post). There is no suffix to the new g++ file. Should I shoot for gcc4.x or would it be more appropriate to wait until our Leopard computer comes in... I would prefer to start testing with this computer now, though.
Really appreciate the help!
-TiN
tniranj1 is offline   Reply With Quote
Old 03-12-2009, 05:23 PM   #90
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Darn! Sorry to waste your time.

I can testify that the gcc4 and gcc346 versions on Leopard (from the developer's tools) work fine for me, as do the various gcc4 versions I've tried on Linux. I'm sorry that 3.4.6 doesn't seem to be working under 10.3. I will add it to my TODO list to address some of that problematic SeqAn code before the next release. In the meantime, since 3.4.6 didn't work, waiting for your Leopard computer is the option that has the least chance of wasting more of your time.

Ben
Ben Langmead is offline   Reply With Quote
Old 03-12-2009, 05:28 PM   #91
tniranj1
Junior Member
 
Location: Baltimore

Join Date: Mar 2009
Posts: 4
Default

You're probably right... I'm using far too old an operating system. Hopefully I will have results-oriented questions for you in the future, rather than technical installation stuff. ;-)
Thanks again,
-TiN
tniranj1 is offline   Reply With Quote
Old 03-13-2009, 07:45 AM   #92
SillyPoint
Member
 
Location: Frederick MD, USA

Join Date: May 2008
Posts: 39
Default

Ben, your answer to What Da Seq's question a couple of days ago about ambiguity characters was:

Quote:
A result of Bowtie's indexing strategy is that alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie, regardless of the alignment policy. This is true only for ambiguous characters in the reference; alignments involving ambiguous characters in the read are legal, subject to the alignment policy.
Do I interpret "legal, subject to the alignment policy" to mean they are accepted, but counted as mismatches subject to the -n limit?

Thanks,

--SP
SillyPoint is offline   Reply With Quote
Old 03-13-2009, 07:53 AM   #93
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Yes, that's correct. An ambiguous character in the read is "charged" as a mismatch, which can affect whether the alignment is legal according to the alignment policy.
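
A rough sketch of what "charged as a mismatch" means in practice follows. It is a simplified model, not Bowtie's code: `count_mismatches`, `is_legal`, and the `max_mismatches` parameter (standing in for the -n limit) are names invented for this example, and the real alignment policy has further details that are ignored here.

```python
def count_mismatches(read: str, ref_window: str) -> int:
    """Count positions charged as mismatches. A read character outside
    A/C/G/T (for example N) is always charged, whatever the reference has."""
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    charged = 0
    for read_base, ref_base in zip(read.upper(), ref_window.upper()):
        if read_base not in "ACGT" or read_base != ref_base:
            charged += 1
    return charged


def is_legal(read: str, ref_window: str, max_mismatches: int) -> bool:
    """Simplified policy check: total charged mismatches within the limit."""
    return count_mismatches(read, ref_window) <= max_mismatches


# Hypothetical read with one N aligned to an otherwise identical window.
print(count_mismatches("ACNT", "ACGT"))
print(is_legal("ACNT", "ACGT", max_mismatches=1))
print(is_legal("ACNT", "ACGT", max_mismatches=0))
```

So a read with several Ns can use up its whole mismatch budget before any genuine base difference is considered.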
Ben Langmead is offline   Reply With Quote
Old 04-06-2009, 10:42 AM   #94
ieuanclay
Member
 
Location: Basel, Switzerland

Join Date: Feb 2009
Posts: 27
Default

Hi Ben,

Thanks for your advice earlier, post-processing of results is now working well!

However, I have found that using --all isn't a problem until I specify --nostrata as well, which causes me to rapidly run out of memory (>32Gb) and get the std::bad_alloc error that I think a few people mentioned earlier. I had built indices with smaller -o, but even an index with a high offrate and small footprint has the same problem. Any suggestions (other than not using --nostrata...)?

Thanks again,

Ieuan
ieuanclay is offline   Reply With Quote
Old 04-06-2009, 11:59 AM   #95
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Hi Ieuan,

Can you tell me which Bowtie version/index/reads/arguments you're using? Also, could you give the same experiment a try with version 0.9.9.2 (just released)? I'll take a look.

Thanks,
Ben
Ben Langmead is offline   Reply With Quote
Old 04-07-2009, 01:48 AM   #96
ieuanclay
Member
 
Location: Basel, Switzerland

Join Date: Feb 2009
Posts: 27
Default

Hi Ben,

I have seen this in 0.9.9 and 0.9.9.1 (x64), m.musculus ncbi36 and 37 indices (offrates 2,3,4,5). Args were -q --solexa-quals -a --unfq ... -p 2/3/6 . Input was only ~1/2 Mb of fastq reads! Worrying because the real input will be >2Gb! In all cases the combination of -a and --nostrata seemed to be causing the problem, because with only -a the footprint was as expected.

I will try 0.9.9.2 today and get back to you - I checked yesterday and noticed you had improved the --best behaviour (thanks!) so I'll try it with that too.

Thanks again,

Ieuan

## update ##
0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!

Last edited by ieuanclay; 04-07-2009 at 06:32 AM. Reason: update
ieuanclay is offline   Reply With Quote
Old 04-08-2009, 09:30 AM   #97
thondeboer
Member
 
Location: Bay Area

Join Date: Jan 2009
Posts: 24
Default

Hi Ben,

Complete Genomics here....
Have you tried to use our gapped read structure yet with Bowtie? As you may know, we have quite an unusual read structure, so most mapping software is not able to use it effectively and we have built our own, but our customers would probably want to use other mapping software as well, if only to compare our mapping to theirs...

The data is available in the SRA under number SRA008092

ftp://ftp.ncbi.nlm.nih.gov/sra/Submi...008/SRA008092/

You can also get a sample data set which is part of the API we have released.

http://www.completegenomics.com/developer/default.aspx

We are considering changing to the SAM/BAM format as the export of our mapping data...Are you considering supporting SAM/BAM as an output format as well?

Thanks!

Thon
thondeboer is offline   Reply With Quote
Old 04-08-2009, 01:26 PM   #98
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Hey Thon,

We haven't tried implementing gapped alignment yet, though tools like BWA and SOAP2 show it's doable in this framework. Can you describe the "unusual read structure"?

Yes, we would certainly like to support SAM/BAM output eventually. It's on the TODO list!

Thanks,
Ben
Ben Langmead is offline   Reply With Quote
Old 04-08-2009, 02:30 PM   #99
thondeboer
Member
 
Location: Bay Area

Join Date: Jan 2009
Posts: 24
Default

Hi Ben,

You can read more on our read structure on our website and on this forum as well:

http://seqanswers.com/forums/showthread.php?t=1307

http://www.completegenomics.com/page...ologyPaper.pdf

But basically we have a gapped read structure of 5 + 10 + 10 + 10 (times two) bases.
The first gap is "negative" that is, has overlap between the 5 and 10 base reads.
The other gaps are positive, that is, gaps in the more classical sense.

You won't know the size of the negative gap (the overlap can vary from 1 to 3 bases) until you map the data onto the reference genome, unless there is only one way to overlap.

Good to hear you are in support of SAM/BAM. We are considering this as our export format as well...

Thon
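
To make the layout concrete, here is a small sketch of how the four sub-reads of one half-read could be placed for each possible overlap. It reads "1 to 3 overlaps" as 1 to 3 overlapping bases. The sizes of the two positive gaps are not given in the post, so `gap2` and `gap3` are plain parameters and the values used below are purely hypothetical; the function names are also invented for this example.

```python
from typing import Dict, List, Tuple

SUBREAD_LENGTHS = (5, 10, 10, 10)  # one half-read, as described in the post


def subread_spans(overlap: int, gap2: int, gap3: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) offsets of the four sub-reads, relative
    to the first base, for one choice of overlap and gap sizes."""
    if not 1 <= overlap <= 3:
        raise ValueError("this sketch expects an overlap of 1 to 3 bases")
    if gap2 < 0 or gap3 < 0:
        raise ValueError("this sketch expects the later gaps to be non-negative")
    spans = []
    position = 0
    # Separation before each sub-read: none, one negative gap, two positive gaps.
    for length, separation in zip(SUBREAD_LENGTHS, (0, -overlap, gap2, gap3)):
        start = position + separation
        spans.append((start, start + length))
        position = start + length
    return spans


def candidate_layouts(gap2: int, gap3: int) -> Dict[int, List[Tuple[int, int]]]:
    """Enumerate the layouts a mapper would have to try, one per overlap."""
    return {overlap: subread_spans(overlap, gap2, gap3) for overlap in (1, 2, 3)}


# gap2=1 and gap3=6 are made-up values, used only to show the output shape.
for overlap, spans in candidate_layouts(gap2=1, gap3=6).items():
    print(overlap, spans)
```

A mapper that does not know the overlap in advance would have to test each of these layouts against the reference, which is why the structure is awkward for aligners built around contiguous reads.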
Complete Genomics
thondeboer is offline   Reply With Quote
Old 04-09-2009, 02:16 PM   #100
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Hi Ieuan,

Quote:
Originally Posted by ieuanclay View Post
0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!
--best mode got an overhaul in 0.9.9.2 such that --best now conducts a best-first search, rather than a depth-first search with buffering and flushing of results, as before. My suspicion is that the old approach was, for some reads, buffering a huge number of results and exhausting memory. I'll take a harder look, though.

Thanks,
Ben
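
The difference between the two strategies can be shown with a toy example on plain strings. This is not Bowtie's algorithm (Bowtie searches a compressed index rather than scanning strings), and both function names are invented here. The sketch only illustrates why a best-first search can hand back the best hits without first buffering every result.

```python
import heapq
from itertools import islice


def best_first_alignments(read, reference, max_mismatches):
    """Yield (offset, mismatches) lazily, in order of increasing mismatches."""
    # Heap entries: (mismatches so far, reference offset, read bases compared).
    frontier = [(0, offset, 0)
                for offset in range(len(reference) - len(read) + 1)]
    heapq.heapify(frontier)
    while frontier:
        mismatches, offset, compared = heapq.heappop(frontier)
        if compared == len(read):
            # Every pending entry costs at least as much, so this hit is final.
            yield offset, mismatches
            continue
        if read[compared] != reference[offset + compared]:
            mismatches += 1
        if mismatches <= max_mismatches:
            heapq.heappush(frontier, (mismatches, offset, compared + 1))


def buffered_alignments(read, reference, max_mismatches):
    """Collect every valid hit first, then sort: all results sit in memory."""
    hits = []
    for offset in range(len(reference) - len(read) + 1):
        mismatches = sum(a != b for a, b in zip(read, reference[offset:]))
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


reference = "ACGTACGAACGT"  # made-up sequence for the demonstration
# Take only the single best hit; the generator stops working after that.
print(list(islice(best_first_alignments("ACGT", reference, 1), 1)))
print(buffered_alignments("ACGT", reference, 1))
```

With the generator, a caller that wants only the best hit stops after the first result, whereas the buffered version must hold the full hit list before it can report anything.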
Ben Langmead is offline   Reply With Quote
All times are GMT -8.