SEQanswers

Old 02-12-2009, 11:40 AM   #1
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Eland for different sizes

We are working with sequences of different lengths (32, 31, 30, etc.) in one run. Does anyone know how to run Eland so that it accepts different lengths, or do I have to run it once for each length?
Old 02-12-2009, 07:33 PM   #2
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124
Default

I think the basic requirement for Eland is that the query sequences all have the same length in the first place.
Old 02-13-2009, 06:34 AM   #3
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Default

Thanks, that's what I thought. I just wanted to know if someone had a better hack for handling different lengths than what I do now, which is running it multiple times.
Old 02-13-2009, 08:50 AM   #4
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

We used to do that: run it multiple times. Back when we ran Eland, it required a separately compiled version for each sequence length, so at that point there was no way to handle mixed lengths in a single run. That has probably changed in the meantime.
__________________
The more you know, the more you know you don't know. —Aristotle
Old 02-13-2009, 09:05 AM   #5
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Default

so from your answer, you no longer do that?

how do you deal with different lengths now?

currently, we do a combination of multiple Eland runs and Bowtie... but for some things, Eland output is more suited for our needs.
Old 02-13-2009, 09:14 AM   #6
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

You're right - we don't do that anymore. We used to do it because the tail end of eland reads was more error prone, so we'd sequentially strip each fragment down by 2 bases at a time, and then take the longest that would align. Now, we use aligners that take the quality into account, and can do more mismatches, so it's really unnecessary.
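
The trimming approach described above can be sketched roughly as follows. This is only an illustration in Python, not the code that was actually used: the function name, the `aligns` callback (standing in for a call to whatever aligner is in use) and the `min_len` cutoff are all assumptions made for the example.

```python
from typing import Callable, Optional


def longest_aligning_prefix(
    read: str,
    aligns: Callable[[str], bool],
    step: int = 2,
    min_len: int = 20,
) -> Optional[str]:
    """Trim `step` bases at a time off the 3' end and return the longest
    prefix of `read` for which `aligns` reports a hit, or None if no
    prefix of at least `min_len` bases aligns.

    `aligns` is a hypothetical stand-in for running a real aligner on
    one fragment; `min_len` is an arbitrary cutoff chosen for the sketch.
    """
    length = len(read)
    while length >= min_len:
        candidate = read[:length]
        if aligns(candidate):
            return candidate
        length -= step  # drop the error-prone tail and try again
    return None
```

With a toy "aligner" that only accepts exact substrings of a reference, a read whose last two bases are wrong comes back two bases shorter, and a read that never matches comes back as `None`.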

From my own experience with bowtie, I found it to be unreliable: it missed a lot of alignments. Eland just wasn't flexible enough, so we've mainly moved to maq (with a few other aligners filling particular roles that they do well). I understand that the author of maq is now moving to a new aligner (bwa), which uses the same Burrows-Wheeler algorithm as bowtie, but should be a better implementation. I haven't tried it yet, but you might want to check that out.

As for sequences of different lengths, I think maq should be able to handle those without any fancy tricks. However, if you want to stick with Eland, my best guess would be to write a parser that splits the reads (and the supporting files, if you use them) into separate files, one per read length, and then align each file individually. Personally, that sounds like a rather painful way to do it...
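
A minimal sketch of such a parser, assuming the reads are in plain four-line FASTQ (Eland may expect a different input format, and any supporting files would need the same treatment). The function name and the `reads_len<N>.fastq` naming scheme are made up for the example.

```python
from pathlib import Path
from typing import Dict, TextIO


def split_fastq_by_length(fastq_path: str, out_dir: str) -> Dict[int, Path]:
    """Write each read to a file that holds only reads of its length.

    Assumes strict four-line FASTQ records (header, sequence, '+', quality).
    Returns a mapping from read length to the file written for that length.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    handles: Dict[int, TextIO] = {}
    paths: Dict[int, Path] = {}
    try:
        with open(fastq_path) as fh:
            while True:
                header = fh.readline()
                if not header:          # end of file
                    break
                if not header.strip():  # tolerate stray blank lines
                    continue
                seq = fh.readline()
                plus = fh.readline()
                qual = fh.readline()
                n = len(seq.strip())
                if n not in handles:
                    paths[n] = out / f"reads_len{n}.fastq"
                    handles[n] = open(paths[n], "w")
                handles[n].write(header + seq + plus + qual)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

Each returned file can then be handed to a separate alignment run for its length, e.g. by looping over `split_fastq_by_length("run.fastq", "by_length").items()`.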

Good luck.
Old 02-13-2009, 09:17 AM   #7
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Default

thanks... that's a lot of good info.

about Eland with different lengths... yeah... the painful parser method is what we do now.

the new bwa sounds exciting...
in terms of bowtie missing A LOT of alignments... what do you mean by that?
Old 02-13-2009, 09:25 AM   #8
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Quote:
Originally Posted by doxologist
in terms of bowtie missing A LOT of alignments... what do you mean by that?
I ran the same input files over the same scaffolds with bowtie and with maq, and got MANY fewer hits with bowtie. It may have had to do with the organism only having a draft reference or something, or the sheer number of scaffolds it had, but whatever it was, the number of hits dropped dramatically.

Anecdotally, I've heard the same thing from other people who've used it as well. I don't want to suggest it's a bad application - it's a giant leap forward in many respects, but I suspect the implementation needs some fine tuning.
Old 02-13-2009, 09:33 AM   #9
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Default

hmm... interesting. I've asked the question at the Bowtie thread... we'll see if other people have similar experiences.

about Bowtie being a huge leap.. no doubt. if imitation is the best compliment... it's getting a large number of compliments... SOAP2, bwa, etc.
Old 02-13-2009, 09:42 AM   #10
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

I'll be very interested to hear what people have to say on the issue. I'm a big fan of bowtie, so if it turns out I was just doing something wrong, I'd love to know.

Anthony
Old 02-13-2009, 10:21 AM   #11
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Default

This is what Ben responded:
The circumstances under which Bowtie might miss alignments that are "valid" according to its alignment policy are outlined in the manual (see last paragraph of section "Maq-like Policy"). These misses only occur in -n 2 and -n 3 modes, and they can be avoided by increasing the --maxbts parameter (at the cost of some speed). Unless your read data is very low quality, the fraction of reads missed due to the backtracking limit in -n 2 mode is generally very small (<1%).

Note that when you run 'maq' with -n 2 option (the default), it will find some alignments that actually have 3 mismatches in the seed. Bowtie will *not* report alignments with 3 mismatches in the seed unless -n 3 is specified. It's likely that this is the source of the difference that the anecdotal reports are referring to.
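
To make the seed-mismatch point concrete, here is a simplified Python sketch of the check being described. It is not how either aligner is implemented: it ignores base qualities and backtracking limits entirely, and the function names and the default `seed_len` of 28 are assumptions for illustration only.

```python
def seed_mismatches(read: str, ref_window: str, seed_len: int) -> int:
    """Count mismatching bases within the first `seed_len` positions of a
    read laid against an equal-length window of the reference."""
    return sum(
        1
        for read_base, ref_base in zip(read[:seed_len], ref_window[:seed_len])
        if read_base != ref_base
    )


def reportable(read: str, ref_window: str, max_seed_mm: int, seed_len: int = 28) -> bool:
    """True if the alignment stays within the seed-mismatch limit.

    `max_seed_mm` plays the role of the -n value discussed above; the
    `seed_len` default is an assumed value, not taken from the thread.
    """
    return seed_mismatches(read, ref_window, seed_len) <= max_seed_mm
```

Under this simplified check, an alignment with three mismatches in the seed is rejected at a limit of 2 and accepted at a limit of 3, which is the kind of difference in hit counts the reply attributes to the two tools' policies.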
Old 02-13-2009, 12:33 PM   #12
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Thanks - that doesn't seem to describe what I saw, or what was reported to me, but, as I said, it's quite possible that I did something wrong when I used it.

Edit: I should also add that I have been doing a lot of Paired End Tag sequencing lately, which is not yet supported by Bowtie - so there were other reasons for staying with Maq. I don't want to make it sound like this was the only reason we didn't stick with Bowtie.

Last edited by apfejes; 02-13-2009 at 12:35 PM. Reason: For clarity.
Old 02-14-2009, 07:23 AM   #13
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96
Default

I agree... I think the added functionality and versatility are the main reasons that people are sticking with MAQ. I remember reading Heng's comment somewhere that he wanted to emphasize that. I'm excited about how the SAM format will hopefully make programs better able to talk to each other and more functional.
Old 02-14-2009, 07:59 AM   #14
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

I'm excited about the SAM format as well, although the prospect of implementing it in Java is quite terrifying, particularly given the explicit claim in the SAMTools manual that Java doesn't support multipart gzip files...

I'm sure it'll be a challenge!
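
For anyone unsure what a "multipart" gzip file is: it is several independently compressed gzip members written back to back in one file. The sketch below only illustrates that idea using Python's standard `gzip` module; it says nothing about the Java behaviour mentioned above, and the helper names are made up for the example.

```python
import gzip
from typing import Iterable


def concat_gzip_members(chunks: Iterable[bytes]) -> bytes:
    """Compress each chunk as its own gzip member and concatenate them,
    producing one multi-member gzip stream."""
    return b"".join(gzip.compress(chunk) for chunk in chunks)


def read_all_members(blob: bytes) -> bytes:
    """Decompress every member in turn and return the joined payload.

    Python's gzip.decompress reads through all concatenated members.
    """
    return gzip.decompress(blob)
```

A reader that stops after the first member would see only the first chunk, which is the kind of pitfall the manual's warning points at.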
Old 02-14-2009, 08:39 AM   #15
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Broad will upload SAMtools Java APIs in a few days.
Old 02-14-2009, 09:34 AM   #16
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236
Default

Fantastic... do you know if it will be GPL compatible?

All times are GMT -8.

