SEQanswers

Old 06-27-2016, 03:48 AM   #1
tristan dubos
Member
 
Location: France

Join Date: Dec 2015
Posts: 39
BRCA insertion left alignment: references (dbSNP)

Hi all,

I'm analyzing MiSeq data for BRCA and I have some questions about the INDEL results I obtained from the variant caller. In the figure you can see the position in IGV and the deletion, which corresponds to 32911870 del T.
My first question: if I am right, is this a left alignment of my deletion?

My second question concerns the reference for the mutation:
I then looked in dbSNP (v146) and found this reference: 32911873 del T (rs397507666). Does this mean the reference record places the deletion at the end of the homopolymer, i.e. it is aligned to the right?

Best regards

Tristan
Attached Images
File Type: png Capture du 2016-06-27 12:32:35.png (30.3 KB, 3 views)
Old 06-28-2016, 04:26 AM   #2
tristan dubos
Member
 
Location: France

Join Date: Dec 2015
Posts: 39

Finally, I looked in the ClinVar database and found the position that is correct if the normalization is left alignment. In the end I have the same rs reference in ClinVar and dbSNP, but with different genomic positions...
Do you know whether left normalization has been applied in dbSNP? (I cannot find this information.)
Old 06-28-2016, 05:15 AM   #3
Jessica_L
Senior Member
 
Location: Washington, D.C. metro area

Join Date: Feb 2010
Posts: 118

Hi Tristan,

If I understand your questions correctly, I'd say that yes, your data looks like a left alignment of the deletion (a very common thing for most, if not all, variant callers to do), while the official NCBI records are right aligned. Part of the reason for the latter is the set of rules that govern HGVS nomenclature, which prefer features to be aligned at the right-most possible position.

Based on this page it appears that the dbSNP entry is also right-aligned, as it should be. An entry for the left alignment probably does not exist, since it's really the same thing-- the T deletion, regardless of position in the codon, causes a frameshift and an apparent deletion of the T at c.3381. There's no way to determine if the deletion occurred at c.3379 instead, so it would never be reported that alternate way.
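To illustrate why the left-aligned and right-aligned records describe the same allele, here is a minimal Python sketch. The reference fragment is hypothetical (it is not the real BRCA2 sequence) and `delete_base` is just an illustrative helper; the point is only that removing any single T from a run of T's yields the same resulting sequence.

```python
def delete_base(seq: str, index: int) -> str:
    """Return seq with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


# Hypothetical reference fragment containing a run of four T's.
reference = "GCATTTTGAC"
run_start, run_end = 3, 6  # 0-based, inclusive indices of the T run

# Delete each T of the run in turn and collect the distinct outcomes.
outcomes = {delete_base(reference, i) for i in range(run_start, run_end + 1)}

print(outcomes)
print("distinct sequences:", len(outcomes))
```

Since every choice gives one and the same sequence, the position reported for the deletion is a convention rather than an observation.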

I hope that helps.
Old 06-28-2016, 06:22 AM   #4
tristan dubos
Member
 
Location: France

Join Date: Dec 2015
Posts: 39

Thank you for this help, I'm not crazy then.
The point is that I don't understand why they decided to use right-aligned nomenclature, because I don't know of any aligner or variant caller that does that... Does software exist that translates this kind of information? Or am I using the wrong software (bwa and bowtie)?
Old 06-28-2016, 08:43 AM   #5
Jessica_L
Senior Member
 
Location: Washington, D.C. metro area

Join Date: Feb 2010
Posts: 118

My understanding is that the HGVS guidelines were developed several years before NGS was-- I remember using them to annotate Sanger sequence reads.

Why HGVS marks features at their 3' end is probably more a function of how a change at the DNA/RNA level affects a translated protein: in a run of TTT(n) that could represent several phenylalanine codons, a deletion of one base, even at the 5' end, causes a frameshift that ultimately results in an alteration of the last phenylalanine at the protein level, regardless of where the actual deletion event occurs.
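The protein-level argument can be checked with a small Python sketch. The coding sequence below is made up for illustration (two phenylalanine codons after the start codon), and the helper names are assumptions; the codon table is the standard genetic code built from the usual TCAG ordering.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate complete codons from the start of cds; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def delete_base(seq: str, index: int) -> str:
    """Return seq with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


# Hypothetical coding sequence: ATG TTT TTT GCA GGA TAA
cds = "ATGTTTTTTGCAGGATAA"
t_run = range(3, 9)  # 0-based indices of the six T's

wild_type = translate(cds)
mutants = {translate(delete_base(cds, i)) for i in t_run}

print("wild type:", wild_type)
print("mutant proteins:", mutants)
```

Whichever T is removed, the translated product is the same, and in this toy sequence the first residue that differs from the wild type is the last phenylalanine of the run.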

I imagine the algorithms for things like aligning and variant calling work similarly to the one used in BLAST-- they all tend to left align, which I imagine is probably more efficient with respect to calculating edit distances between a sequence and its reference, or something to that effect. But that also has the side effect of making matching up with protein-focused HGVS nomenclature somewhat difficult.

I wouldn't say you're using the wrong software as any aligner/variant caller (that I'm aware of, anyway) will do the same thing. As to whether there's a piece of software to switch from left to right aligned features (and vice-versa), I'm not sure. I find that if I'm looking for the same indels across multiple samples or data sets, I just make a note of the left aligned position and include it in any .bed or .vcf that I use for intersecting/filtering.

Last edited by Jessica_L; 06-28-2016 at 08:44 AM. Reason: clarity
Old 06-28-2016, 11:55 PM   #6
tristan dubos
Member
 
Location: France

Join Date: Dec 2015
Posts: 39

Thank you for your answers. I understand your point of view: one option is to annotate each INDEL one by one and check whether it exists in dbSNP or another database. I am still surprised that nobody has already created software to translate this kind of information. Maybe I will try to do this if I get the time.

Again cheers for your explanations

Best regards

Tristan
Old 08-23-2016, 12:54 AM   #7
tristan dubos
Member
 
Location: France

Join Date: Dec 2015
Posts: 39

Hi,
to close this subject, in case somebody is interested, I found a simple solution. I developed a little pipeline to convert INDELs from the ClinVar and dbSNP databases so that I can detect them. I generated simulated reads containing each INDEL mutation one by one with bedtools, aligned them against the reference with the aligner I use (bwa), ran variant calling, and noted whether there was a difference between the result produced and the INDEL expected.
Easy way
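A rough Python sketch of the idea, for anyone who wants to try it: the actual pipeline is not shown in this thread, so the function names, the reference fragment, and the hard-coded "called" position are all assumptions, and the bwa alignment and variant calling steps are not reproduced here.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return ref with ref[start:start+length] removed (0-based start)."""
    return ref[:start] + ref[start + length:]


def simulate_read(ref: str, start: int, length: int,
                  read_start: int, read_len: int) -> str:
    """Build a read of read_len bases from the mutated reference."""
    return apply_deletion(ref, start, length)[read_start:read_start + read_len]


# Hypothetical reference fragment; the T run occupies indices 5..8.
reference = "AGGCATTTTGACCTA"

database_pos = 8  # right-aligned position, as a database record might give it
read = simulate_read(reference, database_pos, 1, read_start=0, read_len=12)

# In the real pipeline this value would come from aligning the read and
# calling variants; here it is hard-coded as a left-aligned call.
called_pos = 5

# The read is identical whichever position the deletion is written at.
same_read = read == simulate_read(reference, called_pos, 1, 0, 12)
offset = called_pos - database_pos

print("simulated read:", read)
print("same read from either position:", same_read)
print("offset between called and database position:", offset)
```

Storing the offset per database record then lets a called INDEL be matched back to its rs entry.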

Best regards

Tristan