SEQanswers

Old 08-18-2010, 08:52 AM   #21
aleferna
Senior Member
 
Location: sweden

Join Date: Sep 2009
Posts: 121
Default

@Heng Li

Well, I don't care too much about SNPs; what I work with actually resembles ChIP-seq more. All I need is the position, not the alignment itself. I need to work with both 454 and HiSeq data and compare them, so I prefer BWA because it seems to be able to handle both. Does SSAHA2 handle high-throughput data?
Old 08-18-2010, 09:14 AM   #22
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Quote:
Originally Posted by aleferna View Post
@Heng Li

Well, I don't care too much about SNPs; what I work with actually resembles ChIP-seq more. All I need is the position, not the alignment itself. I need to work with both 454 and HiSeq data and compare them, so I prefer BWA because it seems to be able to handle both. Does SSAHA2 handle high-throughput data?
SSAHA2 is designed for high-throughput sequencing. As I said, it is usually faster than BLAT, although I would say it is less easy to use.
Old 08-19-2010, 04:59 AM   #23
SoftGenetics
Registered Vendor
 
Location: pa

Join Date: Apr 2009
Posts: 32
Default

Quote:
Originally Posted by query View Post
What is the best tool available to map 454 reads to a reference genome? What method does GS Reference Mapper (the analysis tool that comes with the 454) use, and does it do a decent job of mapping and identifying variants?
You may wish to try the mapper in NextGENe; it is especially robust for the detection of indels, using a three-step process. You can obtain a free, time-limited trial on the SoftGenetics web site.
Old 08-20-2010, 01:27 AM   #24
aleferna
Senior Member
 
Location: sweden

Join Date: Sep 2009
Posts: 121
Default

@Adamo

Here is the script that I've been using. DISCLAIMER: I made this for my own data and it has not been tested on regular sequence data, so please read the code and make sure you understand what it does before using it. It is tuned to join bwasw (-z 100) and aln (-n 4) SAM files.

Also, it's a Python script, but the system wouldn't upload it with the .py extension.
Attached Files
File Type: pl JoinBWA_ALN_BWASW.py.pl (1.6 KB, 28 views)
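
For anyone who cannot open the attachment, here is a minimal sketch of that kind of merge (a sketch only, not the attached script, which remains the authoritative version): keep the bwasw record for every read the long-read aligner mapped, and fall back to the aln record otherwise. The file names, the FLAG 0x4 "mapped" test and the name-keyed lookup are assumptions you may need to adapt.

Code:

#!/usr/bin/env python
"""Sketch: merge a bwasw SAM with an aln SAM, preferring bwasw hits and
falling back to aln for reads the long-read aligner did not place."""
import sys

def load_sam(path):
    """Return (header_lines, {read_name: first mapped alignment line})."""
    header, records = [], {}
    with open(path) as handle:
        for line in handle:
            if line.startswith('@'):
                header.append(line)
            else:
                fields = line.split('\t')
                name, flag = fields[0], int(fields[1])
                if not flag & 0x4:          # keep mapped records only
                    records.setdefault(name, line)
    return header, records

def main(bwasw_sam, aln_sam, out_sam):
    header, long_hits = load_sam(bwasw_sam)
    _, short_hits = load_sam(aln_sam)
    with open(out_sam, 'w') as out:
        out.writelines(header)              # reuse the bwasw header
        out.writelines(long_hits.values())  # bwasw wins when it mapped the read
        for name, line in short_hits.items():
            if name not in long_hits:       # otherwise take the aln placement
                out.write(line)

if __name__ == '__main__':
    main(*sys.argv[1:4])

Invocation would look something like: python join_sams.py bwasw.sam aln.sam merged.sam (hypothetical file names).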
Old 08-20-2010, 02:00 AM   #25
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28
Default

Quote:
Originally Posted by aleferna View Post
@Adamo

Here is the script that I've been using. DISCLAIMER: I made this for my own data and it has not been tested on regular sequence data, so please read the code and make sure you understand what it does before using it. It is tuned to join bwasw (-z 100) and aln (-n 4) SAM files.

Also, it's a Python script, but the system wouldn't upload it with the .py extension.
Ok, I didn't notice you'd posted here!
Thanks a lot, I'm gonna see what's in it now.
Old 08-21-2010, 03:35 PM   #26
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

Instead of using Z=100 on the whole data set, it might be better (meaning faster) to first align the data set with Z=1 (the default value) and then realign only the reads that do not satisfy your alignment criteria with a higher value of Z. This should speed up the process, assuming that a large fraction of the reads map to the reference.
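
In practice that two-pass idea could look something like the sketch below (an illustration only; the bwa invocations, file names and the MAPQ >= 1 cut-off are assumptions, not anything from the post):

Code:

#!/usr/bin/env python
"""Two-pass bwasw: a quick default pass, then a -z 100 rescue pass on the
reads that stayed unmapped or mapped poorly."""
import subprocess

REF, READS = 'ref.fa', 'reads.fastq'        # hypothetical input names

def leftovers(sam_path, fastq_out, min_mapq=1):
    """Write unmapped / low-MAPQ reads back to FASTQ; return how many."""
    seen = set()
    with open(sam_path) as sam, open(fastq_out, 'w') as out:
        for line in sam:
            if line.startswith('@'):
                continue
            f = line.rstrip('\n').split('\t')
            name, flag, mapq = f[0], int(f[1]), int(f[4])
            if (flag & 0x4 or mapq < min_mapq) and name not in seen:
                seen.add(name)
                # SEQ/QUAL are written as stored in the SAM record; reads on
                # the reverse strand therefore come out reverse-complemented,
                # which is good enough for a rescue realignment.
                out.write('@%s\n%s\n+\n%s\n' % (name, f[9], f[10]))
    return len(seen)

# pass 1: default Z-best (-z 1), fast, should place most of the reads
with open('pass1.sam', 'w') as sam:
    subprocess.check_call(['bwa', 'bwasw', REF, READS], stdout=sam)

# pass 2: realign only the leftovers with the expensive setting
if leftovers('pass1.sam', 'leftover.fastq'):
    with open('pass2.sam', 'w') as sam:
        subprocess.check_call(['bwa', 'bwasw', '-z', '100', REF, 'leftover.fastq'],
                              stdout=sam)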
Old 08-21-2010, 03:40 PM   #27
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

Quote:
Originally Posted by aleferna View Post
The first time I ran BWA with the long-read aligner I didn't realize that there was a short/long option, and since I have both in my library I was very disappointed with BWA. I started testing algorithm after algorithm and finally revisited BWA. This time I made a small script that just joins two SAM files, one from the short-read aligner and one from the long-read aligner. It chooses the alignment from the short-read aligner if it cannot find one in the long-read aligner; this was the winning combination.

I've mentioned this chart in another thread, but here you can see that BWA is the only one that can cover the full range of read sizes in 454 datasets (or in 100 bp Solexa data after you remove the paired-end adapters!).

http://www.nada.kth.se/~afer/benchmark.jpeg

Moreover, I know using Z=100 seems like overkill, but with 454 data and a decent computer BWA takes just a few minutes, and I did measure Z=1, 10, 25, 50, 100, 250 and even 500. Z=100 seems to be the peak; beyond this I cannot squeeze any more specificity out of the algorithm, but you do see a change from Z=10 to Z=100.
Looking at your chart, you actually get better sensitivity for longer reads with low error rates using the default settings instead of using Z=100. Any idea what causes a higher Z-best value to result in lower sensitivity?
Old 08-22-2010, 06:57 PM   #28
boyzoe
Junior Member
 
Location: Changsha, China

Join Date: Jul 2010
Posts: 4
Default

Quote:
Originally Posted by lh3 View Post
SSAHA2 is designed for high-throughput sequencing. As I said, it is usually faster than BLAT, although I would say it is less easy to use.
Actually, I couldn't get it to run on Ubuntu. After extraction I could see the files (readme, ssaha2, ssaha2Build, ssahaSNP). However, when I typed the command into the terminal, it told me the command could not be found. This has been bothering me for a week.

My RNA-seq data is from a species whose genome has not been sequenced, but the zebrafish genome may be suitable, because my samples are fishes that are close relatives of zebrafish. The goal is to analyze SNPs and recombination in hybrids and their parents. Does anyone have any ideas?

I really appreciate your help, guys!

Last edited by boyzoe; 08-23-2010 at 07:35 AM.
Old 08-23-2010, 04:26 AM   #29
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by boyzoe View Post
Actually, I couldn't get it to run on Ubuntu. After extraction I could see the files (readme, ssaha2, ssaha2Build, ssahaSNP). However, when I typed the command into the terminal, it told me the command could not be found. This has been bothering me for a week.
Try:

./ssaha2

(assuming the file is in the current directory, indicated by the dot in Unix). If you tried this:

ssaha2

it would look for an installed copy of ssaha2 on the system PATH, but it would not try the current directory. At least, that is how recent versions of Ubuntu are configured.
Old 08-23-2010, 07:28 AM   #30
aleferna
Senior Member
 
Location: sweden

Join Date: Sep 2009
Posts: 121
Default

Quote:
Originally Posted by robs View Post
Looking at your chart, you actually get better sensitivity for longer reads with low error rates using the default settings instead of using Z=100. Any idea what causes a higher Z-best value to result in lower sensitivity?
@rob

You mean like 200 bp at 0% error, where Z=100 gives 97.29% and the default gives 97.30%?
Old 08-23-2010, 05:26 PM   #31
boyzoe
Junior Member
 
Location: Changsha, China

Join Date: Jul 2010
Posts: 4
Thumbs up

Quote:
Originally Posted by maubp View Post
Try:

./ssaha2

(assuming the file is in the current directory, indicated by the dot in Unix). If you tried this:

ssaha2

it would look for an installed copy of ssaha2 on the system PATH, but it would not try the current directory. At least, that is how recent versions of Ubuntu are configured.
Thanks a lot, maubp. Problem solved!
Old 08-25-2010, 07:32 PM   #32
robs
Senior Member
 
Location: San Diego, CA

Join Date: May 2010
Posts: 116
Default

Quote:
Originally Posted by aleferna View Post
@rob

You mean like 200 bp at 0% error, where Z=100 gives 97.29% and the default gives 97.30%?
The same holds for 500 bp reads, where the default is better at 0-2% error rates.
Old 08-29-2010, 01:52 PM   #33
RNM
Junior Member
 
Location: Aus

Join Date: Apr 2009
Posts: 1
Default

I have a somewhat related question: how do I tweak gsMapper parameters so that the reads are mapped without introducing gaps? I am looking for SNPs and indels, and the gsMapper output (454AllDiffs.txt) shows gaps rather than mismatches. The mapping to the reference sequence looks fine, but when it comes to detecting variants the mapper is not doing what I expected. Is there some extra step that I am missing for SNP/indel detection? I have also tried AVA, but as my sequence is not an amplicon (far bigger than the standard definition of an amplicon in 454 terms), AVA isn't of much help either.
Old 02-09-2011, 06:36 AM   #34
Estefania
Junior Member
 
Location: Rosario, Argentina

Join Date: Feb 2011
Posts: 4
Default

Hello Everybody

I would like to know whether it is better to use one reference genome at a time or several at once with GS Reference Mapper for 454 reads.

Thank You
Tags
454, bwa-sw, ssaha2
