**ketan_bnf** · 02-22-2011, 08:55 PM

Hi sulicon,

Are you mapping contigs to the reference seq using gsMapper?

If your contigs are longer than 2000 bp, gsMapper will not consider them for mapping to the reference sequence; I got that error when aligning contigs with gsMapper. So it is better to map the reads to the reference instead.

gsMapper also outputs 454HCDiffs.txt, which lists the high-confidence SNP sites and indels.
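For pulling the calls out of the high-confidence diff file mentioned above, here is a minimal Python sketch. It assumes (please check against the header of your own file) that each variant summary line starts with `>` and holds tab-separated fields in the order accession, start, end, reference bases, variant bases, total depth, variant frequency. The names `Diff`, `parse_hcdiffs` and `is_snp` are made up for this example and are not part of any Roche software.

```python
from typing import Iterable, Iterator, NamedTuple


class Diff(NamedTuple):
    accno: str
    start: int
    end: int
    ref: str
    var: str
    depth: int
    var_freq: float  # percent of reads carrying the variant


def parse_hcdiffs(lines: Iterable[str]) -> Iterator[Diff]:
    """Yield one Diff per variant summary line.

    ASSUMPTION: summary lines start with '>' and are tab-separated as
    accno, start, end, ref, var, depth, freq. Column-header lines also
    start with '>' but have no integers in the position columns, so
    they fail conversion and are skipped.
    """
    for line in lines:
        if not line.startswith(">"):
            continue  # alignment block or blank line
        fields = line[1:].rstrip("\n").split("\t")
        if len(fields) < 7:
            continue
        try:
            diff = Diff(fields[0], int(fields[1]), int(fields[2]),
                        fields[3], fields[4], int(fields[5]),
                        float(fields[6].rstrip("%")))
        except ValueError:
            continue  # column-header row
        yield diff


def is_snp(d: Diff) -> bool:
    """True for a single-base substitution (no gap characters)."""
    return len(d.ref) == 1 and len(d.var) == 1 and "-" not in (d.ref, d.var)
```

You could then filter with something like `[d for d in parse_hcdiffs(open(path)) if is_snp(d) and d.depth >= 10]`, where the depth cutoff is an arbitrary illustration.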

**sulicon** · 02-22-2011, 10:17 PM

Originally posted by ketan_bnf:

> Hi sulicon,
>
> Are you mapping contigs to the reference seq using gsMapper?
>
> If your contigs are longer than 2000 bp, gsMapper will not consider them for mapping to the reference sequence; I got that error when aligning contigs with gsMapper. So it is better to map the reads to the reference instead.
>
> gsMapper also outputs 454HCDiffs.txt, which lists the high-confidence SNP sites and indels.

No. I used gsAssembler for the de novo assembly, then BLATed the contigs against the reference genome to get the structure of the genes. I observed some SNPs/indels in the alignments, but I have no confidence in the results...

**ketan_bnf** · 02-23-2011, 12:30 AM

If you want to find SNPs, you should either map the reads to the reference sequence with gsMapper, or map the contigs to the reference with BWA (http://bio-bwa.sourceforge.net/), get the output in SAM format, and extract the SNPs using SAMtools or MagicViewer.
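The BWA-to-SAMtools route can be sketched roughly as below. This is only an outline under several assumptions: the commands are the present-day ones (`bwa mem` did not exist in 2011, when `bwa bwasw` was the long-sequence mode; the variant caller that used to ship with SAMtools now lives in bcftools), all three tools are installed, and the input is FASTA/FASTQ. The helper `build_commands` is hypothetical and only assembles the command strings; it does not run anything.

```python
import shlex
from typing import List


def build_commands(ref: str, reads: str, prefix: str) -> List[str]:
    """Return shell command strings for index -> map -> sort -> call.

    ASSUMPTION: bwa, samtools (>= 1.3) and bcftools are on PATH.
    File names derived from `prefix` are illustrative only.
    """
    q = shlex.quote  # protect paths containing spaces etc.
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        # build the BWA index for the reference
        f"bwa index {q(ref)}",
        # align, then coordinate-sort straight into a BAM file
        f"bwa mem {q(ref)} {q(reads)} | samtools sort -o {q(bam)} -",
        # index the sorted BAM for random access
        f"samtools index {q(bam)}",
        # pile up bases and keep variant sites only
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in build_commands("ref.fa", "reads.fastq", "sample"):
        print(cmd)
```

Printing the commands first and running them by hand makes it easy to check each intermediate file before moving on.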

You can then annotate those SNPs using the Variant Effect Predictor on Ensembl.

**sklages** · 02-23-2011, 04:08 AM

Originally posted by sulicon:

> No. I used gsAssembler for the de novo assembly, then BLATed the contigs against the reference genome to get the structure of the genes. I observed some SNPs/indels in the alignments, but I have no confidence in the results...

Why don't you just map your reads to your reference sequence with gsMapper? As you have already seen some SNPs from BLATing your contigs, you know where to look for them ...

Sven

**krobison** · 02-23-2011, 05:54 AM

If you can generate a SAM/BAM file from your alignment, then the various SNP callers which work on that format should allow you to estimate confidence in the calls.

However, they will be relying on the quality scores generated by the base caller. I've recently run into a situation on another platform (SOLiD) in another setting (RNA-Seq) in which systematic errors were reinforced, and so some of my very confident calls from the SNP caller were bogus. In the end, nothing beats verifying at least a sample of your variant calls experimentally -- which is how I discovered the trouble in my data.
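To make the dependence on quality scores concrete: a Phred score Q corresponds to an error probability of 10^(-Q/10), and callers typically combine the per-read probabilities as if errors were independent. The sketch below shows that arithmetic only; the function names are made up for illustration and this is not how any particular caller is implemented.

```python
import math
from typing import Iterable


def phred_to_error_prob(q: float) -> float:
    """Phred quality Q -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Error probability -> Phred quality."""
    return -10 * math.log10(p)


def prob_all_reads_wrong(quals: Iterable[float]) -> float:
    """Probability that every supporting base is an error,
    ASSUMING the errors are independent between reads."""
    prob = 1.0
    for q in quals:
        prob *= phred_to_error_prob(q)
    return prob
```

Three reads at Q20 give about one in a million under the independence assumption. A systematic error that hits every read at the same position breaks that assumption, which is how a very confident call can still be bogus.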

**sulicon** · 02-23-2011, 11:42 AM

Thanks Ketan and Sven. I have already assembled the contigs with Newbler and performed a lot of downstream analysis, so I would rather not assemble the reads again... Maybe I will have to map the reads to the reference sequence just for the purpose of SNP detection.

**sulicon** · 02-23-2011, 11:55 AM

@krobison

Thanks. Could I generate SAM/BAM files from the BLAT alignment between the contigs and the human genome?

I think some information would be lost if I worked on the contigs instead of the reads. However, compared with aligning raw reads, I guess the de novo assembler has already taken the alignment between reads into account, and I could provide a quality file for the contigs. But I don't know whether the SNP callers would recognize that the sequencing error rate rises in homopolymer regions. I will give it a try.

We will run some validation experiments if I find interesting candidates.
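On the homopolymer concern, one cheap check that does not depend on the caller is to flag candidate positions that sit inside a run of identical bases in the reference. A minimal sketch follows; the function names and the `min_run=4` threshold are arbitrary choices for illustration, not values taken from any tool.

```python
def homopolymer_run_length(seq: str, pos: int) -> int:
    """Length of the run of identical bases containing seq[pos] (0-based)."""
    base = seq[pos].upper()
    left = pos
    while left > 0 and seq[left - 1].upper() == base:
        left -= 1
    right = pos
    while right < len(seq) - 1 and seq[right + 1].upper() == base:
        right += 1
    return right - left + 1


def in_homopolymer(seq: str, pos: int, min_run: int = 4) -> bool:
    """True if seq[pos] lies in a run of at least min_run identical bases.

    min_run=4 is a hypothetical cutoff; tune it for your data.
    """
    return homopolymer_run_length(seq, pos) >= min_run
```

For indels it is probably worth checking the flanking positions as well, since an insertion or deletion is usually reported at the edge of the run rather than inside it.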

**colindaven** · 02-24-2011, 08:20 AM

I would definitely just use gsMapper to align to a reference genome. There are some nice output files with confidence, I think they are called HCDiff.txt or similar.
We have been doing this, and the validation, quite a lot of late, and the 454 data is very nice for SNP calling, even at low coverage, which really surprised us after our fun with Illumina-predicted SNPs at low coverage.

**Giulietta** · 02-25-2011, 05:27 AM

Just a word about the Ensembl Variant Effect Predictor: if you have chromosome and base-pair positions (on the reference assembly), you can enter any alleles found at those positions as your input. The output will list any dbSNP IDs that map to the same position, so you can see whether there is a known dbSNP ID for the allele/alternate nucleotide you have found. More is here:

Variation File Format

http://www.ensembl.org/info/website/upload/var.html

It also accepts VCF format:

http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40
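As a rough illustration of the two input styles, the sketch below formats a single-base substitution either as an Ensembl default-format line (chromosome, start, end, ref/alt alleles, strand, separated by whitespace) or as a bare-bones VCF data line. The function names are invented for this example, and only SNVs are handled; insertions and deletions use different coordinate conventions, so check the format page before extending it.

```python
def vep_default_line(chrom, pos: int, ref: str, alt: str, strand: str = "+") -> str:
    """SNV in Ensembl default format: chrom start end ref/alt strand.

    For a single-base substitution start and end are the same position.
    """
    return f"{chrom} {pos} {pos} {ref}/{alt} {strand}"


def vcf_line(chrom, pos: int, ref: str, alt: str) -> str:
    """Minimal VCF data line: CHROM POS ID REF ALT QUAL FILTER INFO,
    with '.' for every field that is not known."""
    return "\t".join([str(chrom), str(pos), ".", ref, alt, ".", ".", "."])


if __name__ == "__main__":
    print(vep_default_line("5", 140532, "T", "C"))
    print(vcf_line("5", 140532, "T", "C"))
```

A real VCF file also needs the `##fileformat` line and the `#CHROM ...` header row above the data lines.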

The tool itself, both a script and a web interface, is here:

http://www.ensembl.org/tools.html

Hope that helps.


how to validate SNPs and Indels after assembly?
