SEQanswers

Old 10-15-2008, 10:55 AM   #1
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 481
Default SNP calling on 454 data

Does anyone have ideas on how to make variant calls on 454 resequencing data?

Perhaps using the AllDiffs or HCDiffs files from the gsMapper software, or some other tools? I believe some downstream analysis is needed after the Marth lab's MOSAIK tool in order to get the variant positions and the percentage of calls for A, C, G and T at each one.
Old 10-15-2008, 02:51 PM   #2
cariaso
Member
 
Location: Wageningen, the Netherlands

Join Date: Jan 2008
Posts: 31
Default

I'd hoped we were working on similar things, but it seems not. Your problem seems to be more about recognizing novel SNPs, which is substantially different from my need to recognize named SNPs.

Specifically I need to turn the PGP10 exome fasta into a series of dbSNP rs#s and report observed genotypes. Results will be tab delimited and look something like
http://www.snpedia.com/files/prometh...-23andMe-1.csv
Since this is about recognizing named entities, I'd like to extend it to also recognize non-SNP features such as Huntington's, and possibly CNVs.

Sorry I can't be more helpful, but if anyone has code or advice on either topic I'm interested in both.
Old 10-16-2008, 08:36 AM   #3
Tom Bair
Member
 
Location: Iowa

Join Date: Oct 2008
Posts: 28
Default

We are working on this. We mostly use the HCDiffs file with a lot of post-processing. The key thing we look at is read depth: HCDiffs accepts a depth of 3 (2 one way, 1 the other), but I would say 5 is a better minimum, and 15 if you are looking for hets. We also filter for known SNPs using the dbSNP track from the UCSC database, and for whether the variant is in an exon (also from UCSC), since most people I am working with are looking at NimbleGen capture experiments that are primarily focused on exons. If you are looking outside exons, conservation score appears somewhat useful.

Don't know if that helps at all.
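The depth and annotation checks described in this post could be sketched roughly as follows. This is only an illustration, not the poster's actual pipeline: the `Candidate` record, the in-memory `known_snps` set and the `exons` dictionary are hypothetical stand-ins for data that would really come from the HCDiffs file and the UCSC dbSNP and gene tracks. It also assumes that "filtering" means flagging known SNPs and exonic positions rather than discarding them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate variant."""
    chrom: str
    pos: int    # 1-based position on the reference
    depth: int  # number of reads covering the position


def min_depth(expect_hets: bool) -> int:
    # Thresholds suggested in the post: 5 in general, 15 when looking for hets.
    return 15 if expect_hets else 5


def in_any_exon(chrom, pos, exons):
    # exons maps chromosome -> list of (start, end), 1-based inclusive (assumed layout).
    return any(start <= pos <= end for start, end in exons.get(chrom, []))


def annotate(candidates, known_snps, exons, expect_hets=False):
    """Drop low-depth candidates, then flag known SNPs and exonic positions."""
    threshold = min_depth(expect_hets)
    kept = []
    for c in candidates:
        if c.depth < threshold:
            continue  # too shallow to trust
        kept.append({
            "chrom": c.chrom,
            "pos": c.pos,
            "depth": c.depth,
            "known_snp": (c.chrom, c.pos) in known_snps,
            "in_exon": in_any_exon(c.chrom, c.pos, exons),
        })
    return kept
```

Calling `annotate(candidates, known_snps, exons, expect_hets=True)` applies the stricter depth of 15. For whole-genome data an interval tree or a sorted-interval search would scale better than the linear exon scan used here.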
Old 10-22-2008, 08:09 AM   #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 481
Default

Thanks Tom, that was helpful.

Is anyone else looking for SNPs in 454 data? I heard that a brute-force BLAST approach with no gaps also works! Lots of try-it-out-yourself.
Old 10-29-2008, 09:42 AM   #5
timread
Member
 
Location: Atlanta, Georgia

Join Date: Oct 2008
Posts: 14
Default

We are primarily looking for SNPs in bacterial genomes (i.e. no heterozygotes). For a first look we parse the HCDiffs file for differences with >85% agreement. We then proceed to validation. Most of the single-base insertions and deletions turn out to be false positives.
Old 10-31-2008, 10:05 AM   #6
Tom Bair
Member
 
Location: Iowa

Join Date: Oct 2008
Posts: 28
Default

timread,

Could you give some parameters on read depth for the false vs. true positives? Or do you find no correlation?

Thanks

Tom
Old 11-02-2008, 11:08 AM   #7
timread
Member
 
Location: Atlanta, Georgia

Join Date: Oct 2008
Posts: 14
Default

No correlation that I can see in the differences called by Newbler runmapper that we validated (which are generally high-quality calls). I don't think we have a large enough sample size, though. We have noted trends in the raw output from runmapper for calls that fall underneath our cutoff filter. For example, a large number of the 1 bp insertions and deletions have <25-fold read coverage and <50% concordance.

tim
Old 11-21-2008, 01:45 PM   #8
Josliu
Junior Member
 
Location: State College, PA

Join Date: Nov 2008
Posts: 4
Smile SNP calling for 454 data

You may use the NextGENe software to call SNPs from 454 data. The software links the calls to the dbSNP database if a GenBank-format reference is provided. SoftGenetics may provide a demo so you can try NextGENe on your own data.

josliu
Old 11-25-2008, 04:28 AM   #9
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58
Default Capture and beyond

This is quite a tricky process, especially without the support of bioinformaticians. The downstream analysis is much more complex than carrying out the capture array itself. The HCDiffs file does seem very promising for extracting useful information for SNPs.

Tim, could you please say what you mean when you say that you parse the HCDiffs file for "differences with >85% agreement", and also what kind of validation you do? I am also certain that a lot of our indels will be false positives.
(Thank you)
Has anybody attempted de novo contig assembly from their capture array data?

Layla
Old 11-25-2008, 10:04 AM   #10
timread
Member
 
Location: Atlanta, Georgia

Join Date: Oct 2008
Posts: 14
Default

Quote:
Originally Posted by Layla
Tim, could you please say what you mean when you say that you parse the HCDiffs file for "differences with >85% agreement", and also what kind of validation you do? I am also certain that a lot of our indels will be false positives.


Layla
Layla, by '85% agreement' I mean that 85% of the 454 reads agree with the variant call. This is the final column on the header line of the HCDiffs file. Verification is by Sanger sequencing.
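A minimal sketch of that kind of parse is given below. Apart from the agreement value being the final column of the header line, the layout is an assumption: the code supposes that header lines start with `>` and are tab-delimited, and that column-title lines share the `>` prefix but carry no number in the last field. The function names are hypothetical, so check them against a real HCDiffs file before relying on this.

```python
def parse_percent(field):
    """Return a value such as '87%' or '87.5' as a float, or None if not numeric."""
    try:
        return float(field.strip().rstrip("%"))
    except ValueError:
        return None


def high_agreement_headers(lines, min_agreement=85.0):
    """Yield (fields, agreement) for header lines whose final column exceeds the cutoff."""
    for line in lines:
        if not line.startswith(">"):
            continue  # read alignment rows and blank lines are skipped
        fields = line[1:].rstrip("\n").split("\t")
        agreement = parse_percent(fields[-1])
        if agreement is None:
            continue  # column-title lines have no numeric final field
        if agreement > min_agreement:
            yield fields, agreement
```

With a file on disk this could be used as `with open(path) as fh: hits = list(high_agreement_headers(fh))`, where `path` points at the HCDiffs output. The cutoff is strict, so a call at exactly 85% is not kept; pass `min_agreement=75.0` to loosen it.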
Old 11-27-2008, 02:12 AM   #11
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58
Smile

Thank you Tim, I realized what you meant 2 seconds after I had posted the question! Yes, I have been focusing on that file and using diffs > 75% agreement. Cheers, Layla
Old 12-14-2009, 01:32 PM   #12
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default denovo contig assembly from capture array

Quote:
Originally Posted by Layla
This is quite a tricky process, especially without the support of bioinformaticians. The downstream analysis is much more complex than carrying out the capture array itself. The HCDiffs file does seem very promising for extracting useful information for SNPs.

Tim, could you please say what you mean when you say that you parse the HCDiffs file for "differences with >85% agreement", and also what kind of validation you do? I am also certain that a lot of our indels will be false positives.
(Thank you)
Has anybody attempted de novo contig assembly from their capture array data?

Layla
Layla,

Have you found anyone that has done the contig assembly? I'm curious...
Old 12-20-2009, 04:30 AM   #13
Layla
Member
 
Location: London

Join Date: Sep 2008
Posts: 58
Default

Nope, sorry! I am no longer working on this project.
Old 12-23-2009, 03:35 AM   #14
Tuxido
Member
 
Location: Nijmegen, Netherlands

Join Date: Jun 2009
Posts: 22
Default

We also use the HCDiffs file in combination with our own downstream analysis, where we annotate the data with known SNPs and other useful info. It seems to work fine as long as you have sufficient coverage and there are not too many variants close to each other. With lower coverage you start getting more false positives, but you also start missing variants. We once compared the HCDiffs output from version 1.0 of the mapper software with a SNP array, and that didn't look very good, as we were missing quite a few variants.
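A comparison of this kind can be sketched as a simple set operation. This is a hedged illustration only: it assumes both the HCDiffs calls and the array's variant sites have already been reduced to `(chrom, pos)` tuples in the same coordinate system, and the function name is hypothetical.

```python
def compare_to_array(called, array_variants):
    """Compare called variant positions with variant sites reported by a SNP array."""
    called = set(called)
    array_variants = set(array_variants)
    shared = called & array_variants   # array variants that were also called
    missed = array_variants - called   # array variants absent from the calls
    # Fraction of array variants recovered; undefined when the array reports none.
    sensitivity = len(shared) / len(array_variants) if array_variants else None
    return {"shared": shared, "missed": missed, "sensitivity": sensitivity}
```

Calls at positions the array does not assay are deliberately not counted as false positives here, since the array says nothing about those sites.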