SEQanswers

Old 02-06-2012, 08:44 AM   #1
fcr
Member
 
Location: Seville

Join Date: Jan 2012
Posts: 19
Default Genotyping By Sequencing (GBS) and SNP calling

Dear all,

I am interested in using the GBS method and then performing SNP detection on Illumina reads.
However, I am not sure which software would be appropriate for this task with:

i) a de novo species
ii) a draft reference genome with multiple scaffolds


What about using VarScan for the first case and TASR for the second?
Does anyone have any experience with these? I also wonder whether STACKS will perform well on these data.

Thanks in advance,
Fernando
Old 02-06-2012, 09:31 AM   #2
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51
Default

Hola!

You could try this software for de novo variant calling with no reference at all - ideal for a de novo species
software:
http://cortexassembler.sourceforge.n...ortex_var.html
paper in Nature Genetics here:
http://dx.doi.org/10.1038/ng.1028

This works for your case 1. It also works for case 2, and can either use or ignore your draft, as you prefer. The paper shows how well the method works, and also shows how you can get better results with a de novo species if you have data from multiple samples, rather than just from one.

To be clear - I am biased, as I am an author :-)

Good luck!

Zam
Old 02-07-2012, 03:19 PM   #3
swangg
Junior Member
 
Location: USA

Join Date: Feb 2012
Posts: 1
Default

Hi Zam,

Cortex_var assembles the reads before calling variants, but reads generated by GBS are not supposed to overlap. Is it OK to use cortex_var in this case? If it still works fine, cortex_var would be a great tool for calling SNPs in GBS data.

Thanks,

swang
Old 02-08-2012, 02:04 AM   #4
fcr
Member
 
Location: Seville

Join Date: Jan 2012
Posts: 19
Default

Thanks for your response Zam,

I will consider your program and come back after reading more about it.

I guess one way to overcome swangg's concern would be to use paired-end reads to facilitate the assembly, wouldn't it?

Cheers,
fcr
Old 02-08-2012, 02:13 AM   #5
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51
Default

Hi there. Thanks for pointing this out, swangg. I must admit that, in my ignorance, I misunderstood and thought "genotyping by sequencing" was a generic term distinguishing genotyping from shotgun sequence data from genotyping with a chip/array. Anyway, the answer is that I don't know enough about how the reads from genotyping by sequencing are produced after the restriction cut. As swangg said, Cortex will only call a SNP (or other variant) if there are enough reads to cover both alleles (i.e. k-1 bases before and after the variant, on both alleles).
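To make the k-1 flank condition concrete, here is a minimal sketch. It assumes the simplest GBS-like case, where every read starts at the same cut site, so a variant only has full flanks if it falls in a limited window of the read. The function names and parameters are illustrative and are not part of Cortex.

```python
def snp_is_callable(read_length: int, snp_offset: int, k: int) -> bool:
    """Return True if a read leaves at least k-1 bases on both sides of the SNP.

    snp_offset is the 0-based position of the variant within the read.
    Assumption: all reads start at the same position (the cut site).
    """
    bases_before = snp_offset
    bases_after = read_length - snp_offset - 1
    return bases_before >= k - 1 and bases_after >= k - 1


def callable_window(read_length: int, k: int) -> range:
    """Offsets within the read at which a SNP has full k-1 flanks.

    The range is empty when the read is shorter than 2k-1 bases.
    """
    return range(k - 1, read_length - k + 1)


if __name__ == "__main__":
    window = callable_window(read_length=100, k=31)
    print(f"callable offsets: {window.start}..{window.stop - 1}")
```

With 100bp reads and k=31, only offsets 30 to 69 satisfy the condition, which is one way to see why stacked reads from a cut site could yield fewer calls than shotgun data.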

I know people are happily using Cortex with RAD-sequencing data, with results that look good - they have paired end reads where one end is at the tag and the other end is an insert away, and they use just the second read to look for variants. I also have seen Cortex used on restriction data where the number of SNP calls was lower than I expected.

fcr - I don't think pairing is the answer to swangg's question - it basically reduces to a question of how the reads are sampled. Do you get single-ended reads that are precisely adjacent to the restriction cut site? Or do you get paired reads, where one end is at the cut site, in which case you could find SNPs using the second read?

Hope that clarifies a bit.
Old 02-08-2012, 02:43 AM   #6
fcr
Member
 
Location: Seville

Join Date: Jan 2012
Posts: 19
Default

Hi Zam,

That's pretty interesting. But what is the reason for using the second read of the pair from a RAD fragment to call SNPs? My guess is that sequence quality is always better at the beginning of the read, and that the interesting thing is to look for variation in the genome rather than precisely after the restriction enzyme target.

Thanks a lot,
fcr
Old 02-08-2012, 02:48 AM   #7
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51
Default

Hi fcr - yes - the RAD target itself is not of interest - the idea is the first read gets the tag/restriction site, and hopefully is monomorphic. The second read is 200bp away (or whatever). Do this for a bunch of different samples, and at a fixed tag, each sample has a set of reads 200bp away from that site. If there are SNPs there, then you find them in those reads. It's just a way of looking for variants in a non-model genome where you don't have a reference, and want to ensure you get a bunch of reads from really different places.
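As a toy illustration of the idea (not of how Cortex works internally), the sketch below takes one second-read sequence per sample at a fixed tag and reports the positions where the samples disagree. It assumes the sequences are already the same length and aligned to each other, and it ignores sequencing error and heterozygosity; the function name and sample names are made up for the example.

```python
def variable_sites(reads_by_sample: dict[str, str]) -> list[int]:
    """Return 0-based positions where the samples' sequences differ.

    Assumption: every sequence has the same length and the same start,
    so position i in one sample corresponds to position i in the others.
    """
    sequences = list(reads_by_sample.values())
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one column of bases per position.
    return [
        position
        for position, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]


if __name__ == "__main__":
    second_reads = {
        "sample_1": "ACGTAC",
        "sample_2": "ACGTAC",
        "sample_3": "ACCTAC",
    }
    print(variable_sites(second_reads))
```

A real caller has to separate true variants from sequencing errors and repeats, which is where coverage-based modelling comes in.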
Old 03-06-2012, 04:34 AM   #8
rururara
Member
 
Location: montreal

Join Date: Jan 2011
Posts: 31
Default Genotyping By Sequencing (GBS) and SNP calling

The idea of using Cortex for GBS is quite interesting. I thought using Velvet/Oases to assemble the genome would be the best way. I should try Cortex tomorrow.

In my case, I work with a diploid plant and multiple samples. I also plan to do SNP detection with Sequenom after SNP discovery. But I see a problem in obtaining SNP positions from a de novo assembly. Can I simply take the SNP positions given by Cortex, or do I need a custom script to obtain them?
Old 03-06-2012, 04:43 AM   #9
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51
Default

Cortex will produce SNP+other variant calls for you, and will genotype your samples. It will also produce flanking sequence for your calls, which should help you set up your Sequenom. That will all happen automatically, without any outside information.

Cortex will also give you a position relative to whichever reference you specify (it produces a VCF file), but it won't build that reference for you. If you have no such reference, you can still get a VCF-like file, without meaningful chr/pos. i.e. you don't need chr/position in order to get your calls and design your primers.
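To show why flanking sequence is enough for primer design, here is a minimal sketch that joins a call's flanks and alleles into the bracket notation often used for SNP assay design (FLANK[A/G]FLANK). The function and its arguments are hypothetical; how you extract flanks and alleles from the Cortex output, and the exact input format your assay-design software expects, are assumptions to check against their documentation.

```python
def assay_design_string(
    five_prime_flank: str, alleles: list[str], three_prime_flank: str
) -> str:
    """Join flanks and alleles as FLANK[A/G]FLANK.

    No chromosome or position is needed: the flanks alone locate the site.
    """
    if len(alleles) < 2:
        raise ValueError("a variant needs at least two alleles")
    return f"{five_prime_flank}[{'/'.join(alleles)}]{three_prime_flank}"


if __name__ == "__main__":
    print(assay_design_string("ACGT", ["A", "G"], "TTGA"))
```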

If your samples are from a single population, then Cortex can also accurately classify calls as repeat, variant or error, by comparing models for how coverage would behave on the two alleles.
Old 03-06-2012, 11:17 PM   #10
rururara
Member
 
Location: montreal

Join Date: Jan 2011
Posts: 31
Default Genotyping By Sequencing (GBS) and SNP calling

Thanks Zam,

I also discussed this with other researchers. Some of them prefer to write custom scripts to call SNPs, but I'm not skilled enough to write complicated scripts, so I still have to rely on variant-calling tools.
I'm still wondering: is it reasonable to expect to find out which chromosome a SNP is on in the case of de novo assembly? Can I get that from Cortex?
Old 03-07-2012, 12:58 AM   #11
rururara
Member
 
Location: montreal

Join Date: Jan 2011
Posts: 31
Default

Dear all,

Sorry if I'm asking a silly question.
Can I clarify something about GBS with you? Is it meant to be done with extracted DNA? What about extracted RNA? From my reading, the GBS approach uses a restriction enzyme to reduce genome complexity. I'm confused now.

Does Cortex work with RNA-seq?

I hope someone can explain.
Thanks.
Old 03-07-2012, 01:05 AM   #12
Zam
Member
 
Location: Oxford

Join Date: Apr 2010
Posts: 51
Default

Hi there,
To answer your questions about Cortex
1. Cortex will only give you chr/pos coordinates if you give it a reference. It does not attempt to build a whole genome assembly.
2. Cortex will work with RNA-seq, but all of the modelling work is tailored for DNA sequencing. Feel free to use it with RNA-seq as an experiment/exploration; it does provide the useful ability to compare multiple samples, and I am using it on RNA-seq data myself. However, the error-cleaning methods and model are (currently) not well tailored for RNA-seq data, and right now you are probably better off with other tools.
Old 06-27-2012, 04:36 AM   #13
Geneus
Member
 
Location: New Jersey

Join Date: Dec 2010
Posts: 61
Default

Anyone ever use this pipeline for GBS?

http://www.maizegenetics.net/gbs-bioinformatics
Old 07-29-2012, 04:06 AM   #14
Harremsis
Junior Member
 
Location: Berlin, Germany

Join Date: Jul 2012
Posts: 1
Default

Hi Geneus,

I'm currently working with GBS data and that maizegenetics pipeline (TASSEL). It performed pretty well, in the end giving me SNPs for a set of individuals which had been sequenced paired-end on an Illumina HiSeq with multiplexing/pooling.

However, there are some issues with TASSEL that are suboptimal for my use case:
  1. Reads are being cropped to 64bp by the pipeline. I'd like to use more of my original 100bp reads.
  2. If you are mapping your reads against a reference (as I do) chromosome names in that reference have to be numeric. This seems like a somewhat random constraint but you need to account for it by renaming your chromosomes accordingly.
    Also note that TASSEL itself does not include a mapper. I used BWA to do the mapping against the reference genome. Once you get the SAM file out of that you can use TASSEL to go on (e.g. calling SNPs).
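One way to handle the numeric-name constraint is to renumber the reference before mapping and keep a lookup table to translate results back afterwards. The sketch below is a plain-Python illustration; it is not part of TASSEL or BWA, and the function name is made up. It assumes the FASTA is given as an iterable of lines.

```python
def renumber_fasta_headers(lines):
    """Replace each FASTA header with a running integer.

    Returns (new_lines, mapping), where mapping[number] is the original
    sequence name (the first word of the header line).
    """
    mapping = {}
    new_lines = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            original_name = words[0] if words else ""
            number = len(mapping) + 1
            mapping[number] = original_name
            new_lines.append(f">{number}")
        else:
            new_lines.append(line)
    return new_lines, mapping


if __name__ == "__main__":
    fasta = [">scaffold_12 len=4\n", "ACGT\n", ">chrA\n", "GG\n"]
    renamed, name_of = renumber_fasta_headers(fasta)
    print("\n".join(renamed))
    print(name_of)
```

Saving the mapping next to the renamed reference makes it possible to restore the original scaffold names in the final SNP table.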

The 64bp constraint in particular bothers me a little, which is why I would be very interested to know the answer to the original question in this thread:

Is STACKS safe to use for GBS data?
Old 07-29-2012, 05:06 AM   #15
Geneus
Member
 
Location: New Jersey

Join Date: Dec 2010
Posts: 61
Default

I am pretty sure that Dr. Buckler's lab uses Novoalign as an alignment tool...at least, so I am told. I know Dr. Buckler quite well and I am sure if you reached back to him he would be happy to engage in a detailed conversation on the TASSEL pipeline and listen to any suggestions...he is brilliant yet open minded...a very rare combination.

I cannot comment on STACKS...perhaps yet another question for Dr. Buckler?
Old 09-27-2012, 05:45 AM   #16
molecules
MU Informatics Research Core F
 
Location: Columbia, MO

Join Date: Oct 2010
Posts: 6
Default

STACKS should be fine.

At the GBS Workshop I attended at Cornell (with speakers including Dr. Buckler), we were told that any tool you can use to analyze RAD data, you can also use to analyze GBS data. So there should be no problem using STACKS for your GBS data.

Also, last I checked bwa and bowtie2 were the aligners mentioned in GBS training by the Buckler group.

disclaimer: I have no experience with STACKS. As part of the Maize Diversity Project, which Ed Buckler heads, I use the Buckler GBS pipeline to work with GBS/RAD maize data.
Old 09-27-2012, 06:45 AM   #17
Geneus
Member
 
Location: New Jersey

Join Date: Dec 2010
Posts: 61
Default

Yes...you are correct...Bowtie2 is what they are using in the GBS pipeline...I got confirmation of that. His lab does however use Novoalign for their WGS.
Old 04-20-2013, 02:37 PM   #18
jage.g
Junior Member
 
Location: Chile

Join Date: Apr 2013
Posts: 1
Default Gbs

Hi,
My name is Jorge and I am a student of Biotechnology Engineering in Chile. I am thinking of carrying out a genotyping survey of Nothofagus dombeyi, and I need to use GBS, but I'm not sure which protocol to use: one restriction enzyme or two? What are the advantages and disadvantages of each?