**krobison** · 05-21-2010, 07:32 AM

New paper out (not yet in Medline! That's how new it is) that addresses this issue, but it does not appear to contain specific code:

Nucleic Acids Research, doi:10.1093/nar/gkq408

Novel multi-nucleotide polymorphisms in the human genome characterized by whole genome and exome sequencing

Jeffrey A. Rosenfeld, Anil K. Malhotra and Todd Lencz

Genomic sequence comparisons between individuals are usually restricted to the analysis of single nucleotide polymorphisms (SNPs). While the interrogation of SNPs is efficient, they are not the only form of divergence between genomes. In this report, we expand the scope of polymorphism detection by investigating the occurrence of double nucleotide polymorphisms (DNPs) and triple nucleotide polymorphisms (TNPs), in which two or three consecutive nucleotides are altered compared to the reference sequence. We have found such DNPs and TNPs throughout two complete genomes and eight exomes. Within exons, these novel polymorphisms are over-represented amongst protein-altering variants; nearly all DNPs and TNPs result in a change in amino acid sequence and, in some cases, two adjacent amino acids are changed. DNPs and TNPs represent a potentially important new source of genetic variation which may underlie human disease and they should be included in future medical genetics studies. As a confirmation of the damaging nature of xNPs, we have identified changes in the exome of a glioblastoma cell line that are important in glioblastoma pathogenesis. We have found a TNP causing a single amino acid change in LAMC2 and a TNP causing a truncation of HUWE1.

**swbarnes2** · 05-21-2010, 10:08 AM

I think there are two subquestions there: will an aligner align reads that are more disparate than a single base change, and what will a variant parser make of them?

DNPs will probably be fine, even TNPs if your aligner handles 3 mismatches in a read. And I don't see why a variant parser would have a hard time with that.

Your last example is the hard one, as most aligners just won't align reads with a 5-base discrepancy. What you'd see is a steep drop-off in coverage right over the change, possibly with the edges of the discrepancy called as SNPs, because some reads will land in exactly the right place to just overlap it and will align with only 1 or 2 discrepancies at the end of the read. In theory, if you fixed your reference genome to match at those two letters and then realigned, you'd get more reads aligning, and after enough iterations you might cover the whole region with reads. But aligning a second time probably isn't feasible for many genomes.
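To illustrate the coverage drop-off, here is a minimal toy model. It assumes error-free 10-bp reads placed at every start position on a haplotype carrying a 5-bp substitution, at their true positions, with the aligner reduced to a single rule: accept a read only if it has at most 3 mismatches (no gaps, no soft-clipping). The reference sequence, read length and cap are hypothetical numbers chosen for illustration, not the behaviour of any specific aligner.

```python
# Toy model (hypothetical numbers): error-free reads tiling a haplotype that
# carries a 5-bp substitution, "aligned" at their true positions by an
# ungapped aligner that rejects reads with more than MAX_MISMATCHES mismatches.
READ_LEN = 10
MAX_MISMATCHES = 3

reference = "ACGTTAGCATGCCGATAGGCTTACGATCGA"
SUB_START, SUB_LEN = 12, 5  # 0-based start and length of the substituted block

# Swap every base in the block for a different base so all 5 positions differ.
SHIFT = {"A": "C", "C": "G", "G": "T", "T": "A"}
block = "".join(SHIFT[b] for b in reference[SUB_START:SUB_START + SUB_LEN])
sample = reference[:SUB_START] + block + reference[SUB_START + SUB_LEN:]


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def coverage(ref: str, hap: str, read_len: int, max_mm: int) -> list:
    """Per-base depth on `ref` from every read start on `hap` that passes the cap."""
    depth = [0] * len(ref)
    for start in range(len(hap) - read_len + 1):
        read = hap[start:start + read_len]
        if mismatches(read, ref[start:start + read_len]) <= max_mm:
            for i in range(start, start + read_len):
                depth[i] += 1
    return depth


depth = coverage(reference, sample, READ_LEN, MAX_MISMATCHES)
print(depth)
```

In this toy run the depth across the middle of the block falls to 2, and the only reads that survive are those overlapping the block by 1 to 3 bases at one end, which is where the spurious edge SNP calls would come from.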

De novo assembly would catch those kinds of things, if your sample was mostly clonal or homozygous for that large change. Compare your de novo assembly to your reference, and you'd see the discrepancy fine.

**NGSfan** · 05-21-2010, 01:03 PM

krobison:

Wow! Thanks a lot for sharing this paper with me; this is definitely hot off the press and on topic!

swbarnes2:

> I think there are two subquestions there: will an aligner align reads that are more disparate than a single base change, and what will a variant parser make of them?

Yes, this is true: there are two parts to detection, the alignment and the variant parser. I would think that newer aligners such as BWA, BFAST and Novoalign can handle mismatches and indels longer than 3 bp. Bowtie maxes out at 3 mismatches.

> DNPs will probably be fine, even TNPs if your aligner handles 3 mismatches in a read. And I don't see why a variant parser would have a hard time with that.

The variant parser is really where my concern lies, because from the samtools pileup output it looks like neighboring SNVs are treated separately rather than together. The point is that if a short read captures two SNVs within its span, you can assign both mutations to the same allele. In heterozygous situations, treating them separately leaves open the possibility that one allele has mutation 1 and the other allele has mutation 2.
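One way to keep neighboring changes together at the read level is sketched below: walk an ungapped read against its reference window and merge each run of consecutive mismatches into a single multi-base call (a DNP or TNP in the terminology of the paper krobison posted). The function name `mnp_blocks` and its tuple output are hypothetical; the sketch ignores gaps, base qualities and sequencing errors, and is not how samtools pileup reports variants.

```python
from typing import List, Tuple


def mnp_blocks(ref: str, read: str, offset: int = 0) -> List[Tuple[int, str, str]]:
    """Group runs of consecutive mismatches into (pos, ref_allele, alt_allele).

    `ref` and `read` must be the same length (an ungapped alignment).
    Positions are reported 1-based, shifted by `offset`.
    """
    if len(ref) != len(read):
        raise ValueError("expects an ungapped alignment of equal length")
    blocks = []
    i = 0
    while i < len(ref):
        if ref[i] != read[i]:
            # Extend the block while the mismatches stay contiguous.
            j = i
            while j < len(ref) and ref[j] != read[j]:
                j += 1
            blocks.append((offset + i + 1, ref[i:j], read[i:j]))
            i = j
        else:
            i += 1
    return blocks


print(mnp_blocks("TGACTTTGCTGA", "TGACTCGGCTGA"))  # one 2-base block at position 6
```

Mismatches separated by even one matching base stay as separate blocks here; whether such "near" neighbors should also be grouped is a policy choice left open.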

Please correct me if my genetic vocabulary use is wrong.

Thanks for joining the discussion and sharing your input. It's great to bounce ideas off others and hear back.

**Michael.James.Clark** · 05-24-2010, 05:43 PM

Originally posted by NGSfan:

What if, for example, you have two neighboring SNV mutations detected inside your reads?

...

Or what about a deletion of sequence and the insertion of new sequence?

...

It seems to me that most of the tools out there can handle identifying the simple SNV/indel scenarios but do not take such cases into account. Does samtools pileup capture these kinds of mutations?

Perhaps my assumption is wrong and some of the available tools handle them?

Thanks for any input.

In pure sequence terms, I don't think there is a difference between two SNVs right next to each other and a complex indel where two neighboring bases are removed and replaced with two other bases. Those two events will look identical when two sequences are side by side.
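To make this concrete, here is a small sketch using a toy 12-bp sequence (the same one NGSfan uses later in the thread): two adjacent substitutions, and a 2-bp deletion followed by a 2-bp insertion at the same spot, produce exactly the same sequence. The helper functions are hypothetical and use 1-based positions.

```python
REFERENCE = "TGACTTTGCTGA"


def substitute(seq: str, pos: int, alt: str) -> str:
    """Replace the base at 1-based pos with alt."""
    return seq[:pos - 1] + alt + seq[pos:]


def delete(seq: str, pos: int, length: int) -> str:
    """Remove `length` bases starting at 1-based pos."""
    return seq[:pos - 1] + seq[pos - 1 + length:]


def insert(seq: str, pos: int, bases: str) -> str:
    """Insert `bases` immediately before 1-based pos."""
    return seq[:pos - 1] + bases + seq[pos - 1:]


two_snvs = substitute(substitute(REFERENCE, 6, "C"), 7, "G")
del_then_ins = insert(delete(REFERENCE, 6, 2), 6, "CG")
print(two_snvs, del_then_ins, two_snvs == del_then_ins)
# prints: TGACTCGGCTGA TGACTCGGCTGA True
```

Nothing in the resulting sequence records which description was the "real" event; choosing between them is a representation convention of the aligner or caller.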

I believe many aligners, and at least samtools for variant calling, are indeed robust to these types of events. They will usually mark them as indels, because they do not necessarily constrain indels to a particular size, whereas they likely do constrain SNVs to one base (since they are, after all, single nucleotide variants). I guess that if a variant caller sees a spot where two expected bases in a row are missing, it flags that spot as a deletion, and if it sees a spot where two unexpected bases are present, it flags that as an insertion.

Therefore, it seems reasonable that such an event will be flagged as a deletion and an insertion directly adjacent to each other. (In fact, in the back of my mind, I feel like I've seen that very type of thing before in our own whole-genome alignments... maybe just in aberrant reads, though.)

As for the indel example, as long as your aligner is robust against that (gapped aligners should be), that spot will similarly be flagged as both a deletion and an insertion adjacent to each other.

Also, for the case where there are repetitive elements that make the exact position of that sort of event ambiguous, I believe people generally either left-justify them or randomly position them.
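A minimal sketch of left-justifying an indel inside a repeat: keep shifting the event one base to the left for as long as the resulting sequence is unchanged. It uses 0-based coordinates on plain strings and illustrates the general idea, not the exact procedure of any particular aligner or caller.

```python
def left_align_deletion(ref: str, start: int, length: int) -> int:
    """Leftmost 0-based start for deleting `length` bases that gives the same result."""
    # Shifting left by one leaves the result unchanged exactly when the base
    # just before the deletion equals the last deleted base.
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def left_align_insertion(ref: str, pos: int, ins: str) -> tuple:
    """Leftmost (0-based pos, inserted bases) for inserting `ins` before `pos`."""
    # Rotate the inserted bases while the preceding reference base matches
    # the last inserted base; the resulting sequence does not change.
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]
        pos -= 1
    return pos, ins


ref = "GACACACAT"
print(left_align_deletion(ref, 4, 2))      # prints 1: 'CA' at 4..5 becomes 'AC' at 1..2
print(left_align_insertion(ref, 5, "AC"))  # prints (1, 'AC')
```

Randomly positioning the event instead would pick any start between the left-aligned and right-aligned extremes; left-justifying has the advantage that the same event always gets the same coordinates.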

**NGSfan** · 05-25-2010, 12:29 AM

Originally posted by Michael.James.Clark:

In pure sequence terms, I don't think there is a difference between two SNVs right next to each other and a complex indel where two neighboring bases are removed and replaced with two other bases. Those two events will look identical when two sequences are side by side.

I believe many aligners, and at least samtools for variant calling, are indeed robust to these types of events. They will usually mark them as indels, because they do not necessarily constrain indels to a particular size, whereas they likely do constrain SNVs to one base (since they are, after all, single nucleotide variants). I guess that if a variant caller sees a spot where two expected bases in a row are missing, it flags that spot as a deletion, and if it sees a spot where two unexpected bases are present, it flags that as an insertion.

You're right that in pure sequence terms it will not make a difference, since you are just recording changes. But it may make a difference when you want to distinguish alleles:

The following would get reported as g.6T>C and g.7T>G:

TGACTTTGCTGA Reference
TGACTCTGCTGA Read 1
TGACTCTGCTGA Read 2
TGACTCTGCTGA Read 3
TGACTTGGCTGA Read 4
TGACTTGGCTGA Read 5
TGACTTGGCTGA Read 6 etc.

And if my understanding of samtools pileup is correct, so would this case:

TGACTTTGCTGA Reference
TGACTCGGCTGA Read 1
TGACTCGGCTGA Read 2
TGACTCGGCTGA Read 3
TGACTTTGCTGA Read 4
TGACTTTGCTGA Read 5
TGACTTTGCTGA Read 6 etc.

So although both cases end up recorded as g.6T>C and g.7T>G, they are really different kinds of mutation. The second alignment tells you that one allele carries both changes, while the first tells you there are two alleles, each carrying a different change. I think it is important to distinguish these, no?
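A minimal sketch of how the reads themselves can separate these two cases (read-backed phasing): for every read spanning both positions, tally which pair of bases it carries at positions 6 and 7. It assumes error-free, ungapped reads that all span both positions and a diploid sample; the majority rule in the hypothetical `phase_call` is a simplification for illustration, not what samtools or any other caller actually does.

```python
from collections import Counter

REFERENCE = "TGACTTTGCTGA"
# The two toy cases from this post (positions 6 and 7 are 1-based).
CASE_1 = ["TGACTCTGCTGA"] * 3 + ["TGACTTGGCTGA"] * 3  # changes on different reads
CASE_2 = ["TGACTCGGCTGA"] * 3 + ["TGACTTTGCTGA"] * 3  # both changes on the same reads


def pair_counts(reads, pos1, pos2):
    """Count (base at pos1, base at pos2) pairs, one per read (1-based positions)."""
    return Counter((r[pos1 - 1], r[pos2 - 1]) for r in reads)


def phase_call(reads, ref, pos1, pos2):
    """'cis' if reads mostly carry both alt bases together, 'trans' if mostly one."""
    ref1, ref2 = ref[pos1 - 1], ref[pos2 - 1]
    both = one = 0
    for (b1, b2), n in pair_counts(reads, pos1, pos2).items():
        alt1, alt2 = b1 != ref1, b2 != ref2
        if alt1 and alt2:
            both += n
        elif alt1 or alt2:
            one += n
    if both > one:
        return "cis"
    if one > both:
        return "trans"
    return "unresolved"


for name, reads in (("case 1", CASE_1), ("case 2", CASE_2)):
    print(name, dict(pair_counts(reads, 6, 7)), phase_call(reads, REFERENCE, 6, 7))
```

Per-position counts are identical in both cases (3 alt reads at each position), which is why a caller that looks at one column at a time cannot tell them apart; only the per-read pairing differs.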

Originally posted by Michael.James.Clark:

Therefore, it seems reasonable that such an event will be flagged as a deletion and an insertion directly adjacent to each other. (In fact, in the back of my mind, I feel like I've seen that very type of thing before in our own whole-genome alignments... maybe just in aberrant reads, though.)

As for the indel example, as long as your aligner is robust against that (gapped aligners should be), that spot will similarly be flagged as both a deletion and an insertion adjacent to each other.

Also, for the case where there are repetitive elements that make the exact position of that sort of event ambiguous, I believe people generally either left-justify them or randomly position them.

These are definitely difficult alignment situations, because they involve two events: first a deletion, then an insertion. I am using BFAST, which for the most part handles indels pretty well, but I'm thinking of scenarios where the change is not just a deletion *or* an insertion, but where both happened.

**Michael.James.Clark** · 05-25-2010, 08:36 AM

Originally posted by NGSfan:

You're right that in pure sequence terms it will not make a difference, since you are just recording changes. But it may make a difference when you want to distinguish alleles:

The following would get reported as g.6T>C and g.7T>G:

TGACTTTGCTGA Reference
TGACTCTGCTGA Read 1
TGACTCTGCTGA Read 2
TGACTCTGCTGA Read 3
TGACTTGGCTGA Read 4
TGACTTGGCTGA Read 5
TGACTTGGCTGA Read 6 etc.

And if my understanding of samtools pileup is correct, so would this case:

TGACTTTGCTGA Reference
TGACTCGGCTGA Read 1
TGACTCGGCTGA Read 2
TGACTCGGCTGA Read 3
TGACTTTGCTGA Read 4
TGACTTTGCTGA Read 5
TGACTTTGCTGA Read 6 etc.

So although both cases end up recorded as g.6T>C and g.7T>G, they are really different kinds of mutation. The second alignment tells you that one allele carries both changes, while the first tells you there are two alleles, each carrying a different change. I think it is important to distinguish these, no?

But those aren't the same in sequence terms, because they don't occur on the same haplotype. I doubt they would be reported as the same type of event: the first case should be called as two adjacent SNVs, since the changes are on separate haplotypes, while the second should be called as a deletion adjacent to an insertion, since both changes are on the same haplotype.

These are definitely difficult alignment situations, because they involve two events: first a deletion, then an insertion. I am using BFAST, which for the most part handles indels pretty well, but I'm thinking of scenarios where the change is not just a deletion *or* an insertion, but where both happened.

Like I said, in the back of my mind I recall seeing reads like this without a problem, and that's using BFAST. I'm not actually sure how samtools would call a variant like this, because I don't recall seeing one (I think the closest I've seen is a deletion adjacent to an SNV). I encourage you to test it with a simulation if you're concerned about it, though.
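A minimal sketch of such a simulation, under loudly hypothetical choices: a random 200-bp reference, two adjacent SNVs at 0-based positions 100 and 101, and 500 error-free 36-bp single-end reads drawn uniformly from both haplotypes and both strands. It writes a reference FASTA plus one FASTQ where the SNVs sit on the same haplotype (cis) and one where they sit on different haplotypes (trans); the file names are made up, and aligning with BFAST and calling with samtools are left to whatever commands you normally use.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutate(seq: str, pos0: int, alt: str) -> str:
    """Replace the base at 0-based pos0 with alt."""
    return seq[:pos0] + alt + seq[pos0 + 1:]


def diploid(ref: str, pos0: int, alt1: str, alt2: str, cis: bool) -> list:
    """Two haplotypes with SNVs at pos0 and pos0+1, on one haplotype (cis) or split (trans)."""
    if cis:
        return [mutate(mutate(ref, pos0, alt1), pos0 + 1, alt2), ref]
    return [mutate(ref, pos0, alt1), mutate(ref, pos0 + 1, alt2)]


def simulate_fastq(haps: list, read_len: int, n_reads: int, rng: random.Random) -> str:
    """Error-free single-end reads from random haplotypes, positions and strands."""
    records = []
    for k in range(n_reads):
        hap = rng.choice(haps)
        start = rng.randrange(len(hap) - read_len + 1)
        seq = hap[start:start + read_len]
        if rng.random() < 0.5:
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        # Constant quality 'I' (Phred 40 in Sanger encoding) for every base.
        records.append(f"@sim{k}\n{seq}\n+\n{'I' * read_len}\n")
    return "".join(records)


def write_simulation(prefix: str = "sim", seed: int = 1) -> None:
    """Write <prefix>_ref.fa, <prefix>_cis.fq and <prefix>_trans.fq."""
    rng = random.Random(seed)
    ref = "".join(rng.choice(BASES) for _ in range(200))
    pos0 = 100
    alt1 = next(b for b in BASES if b != ref[pos0])
    alt2 = next(b for b in BASES if b != ref[pos0 + 1])
    with open(f"{prefix}_ref.fa", "w") as fh:
        fh.write(f">{prefix}_ref\n{ref}\n")
    for label, cis in (("cis", True), ("trans", False)):
        haps = diploid(ref, pos0, alt1, alt2, cis)
        with open(f"{prefix}_{label}.fq", "w") as fh:
            fh.write(simulate_fastq(haps, 36, 500, rng))


if __name__ == "__main__":
    write_simulation()
```

Because the truth is known for both files, any difference between the cis and trans calls (or the lack of one) comes from the aligner and caller rather than from the data.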

What about mutations in the "twilight zone"?