SEQanswers

Old 01-13-2009, 05:34 AM   #1
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58
Roche's gsMapper

Hello

Has anyone here ever changed the parameters used by gsMapper when mapping their read data to a reference genome? If so, can anyone elaborate on what "minimum overlap length" and "alignment identity score" mean? (The definition in the manual is far too brief.)

Cheers

Layla
Old 01-13-2009, 03:20 PM   #2
hlu
Member
 
Location: Branford, Connecticut

Join Date: Jan 2009
Posts: 32

I have not modified the default settings when running gsMapper.

The gsMapper algorithm is similar to other assembly software (e.g. phrap), using the same concept of "overlap" between reads to obtain contigs.

The difference is that 454 gsMapper works entirely in raw flow space. Therefore, I believe the scores and lengths are in flow space as well.

For example, the default value of minimum overlap length is 40 according to the manual. I believe 40 means 40 flows, not 40 bases. 40 flows is roughly 16 to 20 bp.

You can play with the value, but I doubt you will see any real difference in the results.
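
To make the flows-versus-bases arithmetic concrete, here is a minimal Python sketch that counts how many nucleotide flows it takes to read a given base sequence. It assumes a cyclic T, A, C, G flow order and that one flow incorporates a whole homopolymer run; the function name `flows_needed` is made up for illustration and has nothing to do with gsMapper's internals or with which unit its parameters actually use.

```python
FLOW_ORDER = "TACG"  # assumption: nucleotides are flowed cyclically in this order


def flows_needed(seq, flow_order=FLOW_ORDER):
    """Count the flows consumed to read seq from start to end."""
    seq = seq.upper()
    unknown = set(seq) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows = 0
    pos = 0
    while pos < len(seq):
        base = flow_order[flows % len(flow_order)]
        flows += 1
        # One flow incorporates the entire homopolymer run of its base.
        while pos < len(seq) and seq[pos] == base:
            pos += 1
    return flows


if __name__ == "__main__":
    for s in ("TACG", "TTTT", "GCAT"):
        print(s, len(s), "bases ->", flows_needed(s), "flows")
```

The ratio of bases to flows depends heavily on the sequence: a homopolymer run costs a single flow, while a sequence that runs "against" the flow order needs several flows per base.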
Old 09-14-2009, 12:40 AM   #3
dan
wiki wiki
 
Location: Cambridge, England

Join Date: Jul 2008
Posts: 266

I don't think this is true. I think it's 40 bases, not 40 flows. IIRC (not that it's in the manual), flowspace is only used in calling the consensus *after* mapping the reads (in sequence space).

I could be wrong. It's a shame it's not easy to find these things out.

Also, I think these settings should have a big effect on the result. 'Seed size' is a trade-off between sensitivity and running time. The bigger the seed size, the quicker the running time, but the more 'nearly perfect' hits you will miss. The lower the seed size, the higher the sensitivity, but at some point the specificity drops dramatically, so many false matches need to be inspected at later stages of the mapping.
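
The seed-size trade-off can be illustrated with a toy exact k-mer seeding sketch in Python. This is not how gsMapper is implemented (the thread itself notes the details are not documented); the sequences and the helper names `kmers` and `shared_seeds` are invented for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_seeds(read, reference, k):
    """Number of distinct exact k-mer seeds the read shares with the reference."""
    return len(kmers(read, k) & kmers(reference, k))


reference = "ACGTTGCAAGGCTTACCGATGCATGGAC"
# reference[4:20] with a single mismatch in the middle (T -> A)
near_perfect_read = "TGCAAGGCATACCGAT"
# a read that does not come from this reference at all
unrelated_read = "CCTTAAGG"

if __name__ == "__main__":
    for k in (3, 6, 12):
        print(
            "k =", k,
            "| near-perfect read seeds:", shared_seeds(near_perfect_read, reference, k),
            "| unrelated read seeds:", shared_seeds(unrelated_read, reference, k),
        )
```

With the largest seed, every k-mer of the near-perfect read spans the mismatch, so the read is never seeded. With the smallest seed, the unrelated read also picks up chance hits that a later alignment stage would have to reject.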
__________________
Homepage: Dan Bolser
MetaBase the database of biological databases.

Last edited by dan; 09-14-2009 at 12:44 AM. Reason: Responding to the second point too.
Old 09-15-2009, 12:32 AM   #4
Tuxido
Member
 
Location: Nijmegen, Netherlands

Join Date: Jun 2009
Posts: 22

We used some different values for "minimum length" and "minimum identity": -ml 90% -mi 96% to get more reliable variation detection in areas with lower coverage.
Old 09-15-2009, 12:05 PM   #5
AlexB
Member
 
Location: Netherlands

Join Date: Sep 2009
Posts: 18

Maybe silly, but I simply ran a BLAT analysis of the reads (which is really fast) against a reference genome, which allowed me to choose any cut-off I like (alignment length as well as % homology). But this probably also depends on the specific requirements.
My 2 cents.
Alex
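
A rough Python sketch of this kind of post-hoc filtering on BLAT's tab-separated PSL output is shown below. The column positions follow the standard PSL layout (matches, misMatches, repMatches, ..., qName, qSize, qStart, qEnd, tName, ...). The function names are hypothetical, the identity measure ignores gaps, and the default thresholds simply borrow the 96% / 90% values mentioned earlier in the thread for illustration; they are not equivalent to gsMapper's own settings.

```python
def parse_psl_line(line):
    """Pick the fields needed for filtering out of one tab-separated PSL row."""
    f = line.rstrip("\n").split("\t")
    return {
        "matches": int(f[0]),
        "mismatches": int(f[1]),
        "rep_matches": int(f[2]),
        "q_name": f[9],
        "q_size": int(f[10]),
        "q_start": int(f[11]),
        "q_end": int(f[12]),
        "t_name": f[13],
    }


def passes(hit, min_identity, min_coverage):
    """Apply an identity cut-off and a read-coverage (length) cut-off."""
    aligned = hit["matches"] + hit["mismatches"] + hit["rep_matches"]
    if aligned == 0 or hit["q_size"] == 0:
        return False
    # Identity over aligned bases only; gaps are not penalised here.
    identity = (hit["matches"] + hit["rep_matches"]) / aligned
    # Fraction of the read that is covered by the alignment.
    coverage = (hit["q_end"] - hit["q_start"]) / hit["q_size"]
    return identity >= min_identity and coverage >= min_coverage


def filter_psl(lines, min_identity=0.96, min_coverage=0.90):
    """Return the parsed hits that pass both cut-offs."""
    kept = []
    for line in lines:
        # Skip header and blank lines: PSL data rows start with an integer.
        if not line[:1].isdigit():
            continue
        hit = parse_psl_line(line)
        if passes(hit, min_identity, min_coverage):
            kept.append(hit)
    return kept
```

Usage would be along the lines of `with open("reads.psl") as fh: hits = filter_psl(fh)`, where `reads.psl` is a placeholder file name.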
Old 09-16-2009, 10:30 AM   #6
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Quote:
Originally Posted by AlexB View Post
Maybe silly, but I simply ran a BLAT analysis of the reads (which is really fast) against a reference genome, which allowed me to choose any cut-off I like (alignment length as well as % homology). But this probably also depends on the specific requirements.
My 2 cents.
Alex

Alex, given the homopolymer issue, do you have a standard way of handling all the small indels that BLAT might be returning? I believe gsMapper has some built-in filters to take care of some of those false positives.
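
One simple way to flag differences that could be homopolymer-length artefacts is to compare sequences after collapsing each homopolymer run to a single base. The Python sketch below is only an illustration of that idea with made-up function names; it is not gsMapper's built-in filter, whose details are not described in this thread.

```python
from itertools import groupby


def collapse_homopolymers(seq):
    """Collapse each run of identical bases to one base, e.g. AAACC -> AC."""
    return "".join(base for base, _ in groupby(seq.upper()))


def differs_only_in_homopolymer_length(read_segment, ref_segment):
    """True if the two segments differ, but only in the lengths of their runs."""
    a, b = read_segment.upper(), ref_segment.upper()
    return a != b and collapse_homopolymers(a) == collapse_homopolymers(b)


if __name__ == "__main__":
    # Read has one extra T in a T run: a likely homopolymer miscall.
    print(differs_only_in_homopolymer_length("ACGTTTA", "ACGTTA"))
    # Read has a real substitution (G -> C): not a homopolymer-length difference.
    print(differs_only_in_homopolymer_length("ACCTA", "ACGTA"))
```

An indel flagged this way is not necessarily false; it only marks the difference as one that deserves extra scrutiny, for instance by requiring more supporting reads.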
__________________
--
bioinfosm
Old 09-16-2009, 01:22 PM   #7
AlexB
Member
 
Location: Netherlands

Join Date: Sep 2009
Posts: 18

I have to admit we never looked in such detail, so I can't comment. Since we were relatively new to the technology at the time, we compared the results of gsMapper to those returned by BLAT, and using certain homology/length cutoffs we more or less reproduced the results. This was using a 2 Mb genome, though. Can you be more precise about what exactly you mean? I will keep an eye on it.