**Zam** · 02-06-2012, 10:31 AM

Hola!

You could try this software for de novo variant calling with no reference at all, which is ideal for a species that has no reference genome.
Software:

CORTEX website

http://cortexassembler.sourceforge.net/index_cortex_var.html

paper in Nature Genetics here:

http://dx.doi.org/10.1038/ng.1028

This works for your case 1. It also works for case 2, and can either use or ignore your draft, as you prefer. The paper shows how well the method works, and also shows how you can get better results with a de novo species if you have data from multiple samples, rather than just from one.

To be clear - I am biased, as I am an author :-)

Good luck!

Zam

**swangg** · 02-07-2012, 04:19 PM

Hi Zam,

Cortex_var assembles the reads before calling variants, but reads generated by GBS are not expected to overlap. Is it still OK to use cortex_var in this case? If it works, cortex_var would be a great tool for calling SNPs in GBS data.

Thanks,

swang

**fcr** · 02-08-2012, 03:04 AM

Thanks for your response Zam,

I will look into your program and come back after reading more about it.

I guess one way to address swangg's concern would be to use paired-end reads to facilitate the assembly. Would that work?

Cheers,
fcr

**Zam** · 02-08-2012, 03:13 AM

Hi there. Thanks for pointing this out, swangg. I must admit that, in my ignorance, I misunderstood and thought "genotyping by sequencing" was a generic term for genotyping from shotgun sequence data, as opposed to genotyping with a chip/array. Anyway, the answer is that I don't know enough about how the reads from genotyping by sequencing are produced after the restriction cut. As swangg points out, Cortex will only call a SNP (or other variant) if there are enough reads to cover both alleles (i.e. k-1 bases before and after the variant, on both alleles).
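To make the k-1 flanking requirement concrete, here is a minimal Python sketch. It is not Cortex code; the sequences, the value of k, and the helper names are invented for illustration. It counts the k-mers that distinguish the two alleles of a SNP, first when reads span k-1 bases on both sides, then when every read starts at the variant itself, as might happen right next to a cut site.

```python
def kmers(seq, k):
    """Return all k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def allele_specific_kmers(allele_a, allele_b, k):
    """Return the k-mers found on only one of the two allele sequences."""
    a = set(kmers(allele_a, k))
    b = set(kmers(allele_b, k))
    return a - b, b - a


k = 5  # toy value, far smaller than anything used on real data

# C/T SNP with k-1 = 4 flanking bases on each side (made-up sequence).
left, right = "ACGT", "GGAT"
full_a, full_b = allele_specific_kmers(left + "C" + right, left + "T" + right, k)
print(len(full_a), len(full_b))

# Same SNP, but the sequence starts at the variant, so there is no left flank.
short_a, short_b = allele_specific_kmers("C" + right, "T" + right, k)
print(len(short_a), len(short_b))
```

With full flanks, each allele contributes k distinguishing k-mers; without the left flank only one k-mer per allele remains, and nothing upstream is shared between the alleles to anchor them in the graph.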

I know people are happily using Cortex with RAD-sequencing data, with results that look good - they have paired end reads where one end is at the tag and the other end is an insert away, and they use just the second read to look for variants. I also have seen Cortex used on restriction data where the number of SNP calls was lower than I expected.

fcr - I don't think pairing is the answer to swangg's question - it basically reduces to a question of how the reads are sampled. Do you get single-end reads that sit immediately adjacent to the restriction cut site? Or do you get paired reads, where one end is at the cut site, in which case you could find SNPs using the second read?

Hope that clarifies a bit.

**fcr** · 02-08-2012, 03:43 AM

Hi Zam,

That's pretty interesting. But what is the reason for using the second read of a RAD pair to call SNPs? My guess is that sequence quality is always better at the beginning of the read... and the interesting thing is to look for variation in the genome rather than precisely after the restriction enzyme target.

Thanks a lot,
fcr

**Zam** · 02-08-2012, 03:48 AM

Hi fcr - yes - the RAD target itself is not of interest - the idea is the first read gets the tag/restriction site, and hopefully is monomorphic. The second read is 200bp away (or whatever). Do this for a bunch of different samples, and at a fixed tag, each sample has a set of reads 200bp away from that site. If there are SNPs there, then you find them in those reads. It's just a way of looking for variants in a non-model genome where you don't have a reference, and want to ensure you get a bunch of reads from really different places.
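As a rough illustration of that idea, the Python sketch below groups second reads by the tag at the start of the first read and reports columns where two alleles are each seen at least twice. It is only a toy: it assumes, unrealistically, that all second reads at a tag start at the same offset, whereas real second reads are staggered, which is why an assembler is used. The reads, sample names, tag length and threshold are all made up.

```python
from collections import Counter, defaultdict


def group_second_reads(pairs, tag_len):
    """Group (sample, read2) by the first tag_len bases of read 1."""
    groups = defaultdict(list)
    for sample, read1, read2 in pairs:
        groups[read1[:tag_len]].append((sample, read2))
    return groups


def candidate_snps(reads, min_count=2):
    """Return {position: {base: count}} for columns with at least two
    alleles that are each supported by min_count or more reads."""
    length = min(len(seq) for _, seq in reads)
    sites = {}
    for pos in range(length):
        counts = Counter(seq[pos] for _, seq in reads)
        supported = {base: n for base, n in counts.items() if n >= min_count}
        if len(supported) >= 2:
            sites[pos] = supported
    return sites


# (sample, read 1 carrying the tag, read 2 some insert away); all invented.
pairs = [
    ("s1", "TGCAGAAC", "GATTACAGG"),
    ("s1", "TGCAGAAC", "GATTACAGG"),
    ("s2", "TGCAGAAC", "GATTGCAGG"),
    ("s2", "TGCAGAAC", "GATTGCAGG"),
    ("s3", "TGCAGTTT", "CCCCCCCCC"),
]
groups = group_second_reads(pairs, tag_len=8)
for tag, reads in groups.items():
    print(tag, candidate_snps(reads))
```

Requiring each allele to be seen more than once is a crude stand-in for error handling; it is not how Cortex separates errors from variants.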

**rururara** · 03-06-2012, 05:34 AM

Genotyping By Sequencing (GBS) and SNP calling

The idea of using Cortex for GBS is quite interesting. I had thought that using Velvet/Oases to assemble the genome would be the best way. I will consider it and try Cortex tomorrow.

In my case, I work with a diploid plant and multiple samples. I also plan to genotype the SNPs with Sequenom after SNP discovery. But I see a problem in obtaining SNP positions from a de novo assembly. Can I simply take the SNP positions given by Cortex, or do I need to write a custom script to obtain them?

**Zam** · 03-06-2012, 05:43 AM

Cortex will produce SNP+other variant calls for you, and will genotype your samples. It will also produce flanking sequence for your calls, which should help you set up your Sequenom. That will all happen automatically, without any outside information.

Cortex will also give you a position relative to whichever reference you specify (it produces a VCF file), but it won't build that reference for you. If you have no such reference, you can still get a VCF-like file, without meaningful chr/pos. i.e. you don't need chr/position in order to get your calls and design your primers.

If your samples are from a single population, then Cortex can also accurately classify calls as repeat, variant or error, by comparing models for how coverage would behave on the two alleles.

**rururara** · 03-07-2012, 12:17 AM

Thanks Zam,

I also discussed this with other researchers. Some of them prefer to write custom scripts to call SNPs, but I am not skilled enough to write complicated scripts, so I still have to rely on variant calling tools.
I am still wondering whether it is reasonable to expect to find out which chromosome each SNP lies on in the case of a de novo assembly. Can I get that from Cortex?

**rururara** · 03-07-2012, 01:58 AM

Dear all,

Sorry if I'm asking a silly question.
Can I clarify something about GBS with you? Is it always done on extracted DNA? What about extracted RNA? From my reading, the GBS approach uses a restriction enzyme to reduce genome complexity. I'm confused now.

Does Cortex work with RNA-seq?

I hope someone can explain.
Thanks.

**Zam** · 03-07-2012, 02:05 AM

Hi there,
To answer your questions about Cortex:
1. Cortex will only give you chr/pos coordinates if you give it a reference. It does not attempt to build a whole genome assembly.
2. Cortex will work with RNA-seq, but all of the modelling work is tailored for DNA sequencing. Feel free to use it with RNA-seq as an experiment/exploration; it does provide the useful ability to compare multiple samples, and I am using it on RNA-seq data myself. However, the error-cleaning methods and model are (currently) not well tailored for RNA-seq data, and right now you are probably better off with other tools.

**Geneus** · 06-27-2012, 04:36 AM

Anyone ever use this pipeline for GBS?

http://www.maizegenetics.net/gbs-bioinformatics

**Harremsis** · 07-29-2012, 04:06 AM

Hi Geneus,

I'm currently working with GBS data and that maizegenetics pipeline (TASSEL). It performed pretty well, in the end giving me SNPs for a set of individuals which had been sequenced paired-end on an Illumina HiSeq with multiplexing/pooling.

However, some aspects of TASSEL are suboptimal for my use case:

1. Reads are cropped to 64bp by the pipeline. I'd like to use more of my original 100bp reads.
2. If you are mapping your reads against a reference (as I do), chromosome names in that reference have to be numeric. This seems like a somewhat arbitrary constraint, but you need to account for it by renaming your chromosomes accordingly.
3. Also note that TASSEL itself does not include a mapper. I used BWA to do the mapping against the reference genome. Once you get the SAM file out of that, you can use TASSEL to go on (e.g. calling SNPs).
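For the chromosome renaming, a minimal Python sketch along these lines may help. It replaces each FASTA header with a running integer and keeps a mapping so that SNP positions can be translated back afterwards. The function name and the example headers are made up, it assumes header names are unique, and whether plain integers are exactly what TASSEL expects should be checked against its documentation.

```python
def rename_chromosomes(fasta_lines):
    """Replace each FASTA header with a running integer.

    Returns the rewritten lines and a dict mapping each original
    sequence name (first word of the header) to its new numeric name.
    """
    mapping = {}
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            original = line[1:].split()[0]
            mapping[original] = str(len(mapping) + 1)
            out.append(">" + mapping[original])
        else:
            out.append(line)
    return out, mapping


# Tiny in-memory example; a real reference would be read from a file.
fasta = [">chr1 some description", "ACGT", ">scaffold_12", "GGCC"]
renamed, mapping = rename_chromosomes(fasta)
print(renamed)
print(mapping)
```

The renaming has to happen before building the mapper's index, so that the SAM file already carries the numeric names.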

The 64bp constraint in particular bothers me a little, which is why I would be very interested to know the answer to the original question in this thread:

Is STACKS safe to use for GBS data?

**Geneus** · 07-29-2012, 05:06 AM

I am pretty sure that Dr. Buckler's lab uses Novoalign as an alignment tool...at least, so I am told. I know Dr. Buckler quite well and I am sure if you reached back to him he would be happy to engage in a detailed conversation on the TASSEL pipeline and listen to any suggestions...he is brilliant yet open minded...a very rare combination.

I cannot comment on STACKS...perhaps yet another question for Dr. Buckler?
