SEQanswers

Old 07-27-2012, 04:01 PM   #1
shyam_la
Member
 
Location: California

Join Date: Mar 2012
Posts: 97
Exome Analysis.. Annotation - unusual observation; need explanation..

I have a bioinformatics query on the exome project we are running. We are using a NimbleGenV2 exome capture kit for target capture.

It's an unusual sort of question that has been nagging me for more than a week now, and nobody has been able to provide a good answer yet:

Let's say I have processed raw reads from a tumor-normal paired exome experiment and made them fit for mutation calling. I have two BAM files (one each for tumor and normal) that I feed into a mutation caller and, since it's an exome experiment,

Case 1: I restrict the variant calls to the target regions only, by passing the .bed file from the NimbleGen website as an interval parameter.

Now, theoretically all the mutation calls made by the caller are exonic or splicing. I have 2100 SNVs.

I run these calls through annotation software and annotate them against a refgene set (Annovar, which uses the directly downloaded UCSC refgene set); more than 92% of the SNVs are annotated as "exonic" or "splicing", as expected..


Case 2: I restrict the variant calls to exons + 10 bases only, by generating a .bed file of refgene exons from the UCSC Table Browser and using it as the interval parameter.

Now, once again theoretically all the mutation calls made by the caller are exonic or splicing. I have 2700 SNVs.

But when I run these calls through the annotation software and annotate them against a refgene set (Annovar again), only approximately 65%-75% of the calls are exonic or splicing. The rest are annotated as intronic, upstream, downstream and a zillion other things..

(1) My understanding is that the 2100 vs 2700 difference is due to possible misalignment of a fraction of the reads into non-target regions, and hence the extra 600 SNVs are false positive mutation calls, for the most part (correct me if I am wrong).
(2) The 92% vs 65-75%, on the other hand, is quite inexplicable. In both cases the caller was asked to call variants in exonic regions only; in the former case these were the capture target regions, and in the latter case the refgene set of exons obtained from the Table Browser. I would have expected >90% exonic variants in Case 2 also..


Have you noticed this before? Is there an explanation as to why (2) is happening?
Old 07-29-2012, 09:02 PM   #2
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177
Default

Hi shyam_la,

1) Try to compare the two bed files (nimblegen and refgenes) to see how different they are.
2) Extending by 10bp does not seem like much, but a big chunk of human exons are <200bp, so the chance of picking up non-exonic/splicing variants is quite big.
3) If you are curious, try the nimblegen bed file extended by 10bp, and try the refgenes one without the 10bp extension. I am quite interested in what you get.
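
For anyone who wants to try suggestions 1) and 3) without extra tools, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names (`parse_bed`, `merge`, `shared_bp`, `compare`) are made up for this example, the toy coordinates are invented, and the real NimbleGen and refgene files would have to be read from disk. It assumes standard BED coordinates (0-based start, end exclusive).

```python
from collections import defaultdict


def parse_bed(lines):
    """Read BED lines into {chrom: [(start, end), ...]} (0-based, end exclusive)."""
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def merge(intervals, pad=0):
    """Pad every interval by `pad` bases on both sides, then merge overlaps."""
    merged = []
    for start, end in sorted((max(0, s - pad), e + pad) for s, e in intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def shared_bp(a, b):
    """Number of bases covered by both of two merged, sorted interval lists."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # advance whichever interval finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def compare(bed_a, bed_b, pad_a=0, pad_b=0):
    """Return (bp covered by A, bp covered by B, bp covered by both)."""
    bp_a = bp_b = both = 0
    for chrom in set(bed_a) | set(bed_b):
        a = merge(bed_a.get(chrom, []), pad_a)
        b = merge(bed_b.get(chrom, []), pad_b)
        bp_a += sum(e - s for s, e in a)
        bp_b += sum(e - s for s, e in b)
        both += shared_bp(a, b)
    return bp_a, bp_b, both


# Toy intervals standing in for the two files (invented, not real targets).
capture = parse_bed(["chr1\t100\t200", "chr1\t300\t400"])
refgene = parse_bed(["chr1\t150\t250", "chr1\t300\t350", "chr2\t0\t50"])
print(compare(capture, refgene))            # no padding
print(compare(capture, refgene, pad_b=10))  # refgene exons + 10 bp
```

The bases covered by only one of the two files are the place to look for the SNVs that show up in one run but not the other.
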

Best regards,
Douglas
www.contigexpress.com
Old 07-29-2012, 10:15 PM   #3
ersgupta
Member
 
Location: India

Join Date: Jun 2011
Posts: 26
Default

Yeah, the bed files can vary... which will ultimately affect the statistics. One more thing I want to ask: are the 2100 included in the 2700 you get in case 2?
Old 07-30-2012, 10:04 AM   #4
shyam_la
Member
 
Location: California

Join Date: Mar 2012
Posts: 97
Default

Yes, the 2100 are included in the 2700. Of course the bed files vary - but that is not an explanation for my observation..

Old 07-30-2012, 10:08 AM   #5
shyam_la
Member
 
Location: California

Join Date: Mar 2012
Posts: 97
Default

Hi,

1) On IGV, they are not very different at the genomic level.. If I zoom in to look at finer details, the NimbleGen one has a lot of exons missing that are present in the refseq one (which is expected)..
I will try out mutation calling without the +10 bp - though I doubt that's going to reduce the numbers very much..
Will update with results.

Old 07-30-2012, 02:45 PM   #6
shyam_la
Member
 
Location: California

Join Date: Mar 2012
Posts: 97
Default

Did it on 1 sample..
Got 2492 SNVs (exons only) vs 2768 (exons + 10 bp).
78% of those were annotated as exonic/splicing vs 70% (exon + 10bp)..

So, 8 percentage points of the gap are due to the extra 10bp that I had used. But 78% is still a low proportion.. Expected: at least 90%
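
To see how many calls sit in the added 10bp rather than in the exons themselves, each SNV can be classified against the unpadded exon list. The following is a rough sketch only: `locate` is a hypothetical helper written for this example, the exon coordinates are invented, and it assumes BED-style exons (0-based start, end exclusive) that are sorted and non-overlapping on a single chromosome, with 1-based variant positions as in a VCF.

```python
from bisect import bisect_right


def locate(pos, exons, pad=10):
    """Classify a 1-based position as "exon", "flank" or "outside".

    exons: sorted, non-overlapping (start, end) pairs on one chromosome,
    in BED coordinates (0-based start, end exclusive).
    pad:   number of flanking bases added on each side of every exon.
    """
    p = pos - 1  # convert the 1-based position to 0-based
    starts = [s for s, _ in exons]
    i = bisect_right(starts, p) - 1  # last exon starting at or before p
    if i >= 0 and exons[i][0] <= p < exons[i][1]:
        return "exon"
    # within `pad` bases after the end of the preceding exon
    if i >= 0 and p < exons[i][1] + pad:
        return "flank"
    # within `pad` bases before the start of the following exon
    if i + 1 < len(exons) and p >= exons[i + 1][0] - pad:
        return "flank"
    return "outside"


# Invented exons and positions, for illustration only.
exons = [(100, 200), (500, 600)]
for pos in (101, 205, 495, 300):
    print(pos, locate(pos, exons))
```

Tallying the "flank" calls for a sample gives a direct count of the SNVs contributed by the +10bp, which can then be set against the drop from 78% to 70%.
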

Old 07-30-2012, 06:37 PM   #7
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177
Default

2100 x 0.92 = 1932
2492 x 0.78 = 1944 (rounded)

So the absolute exonic/splicing numbers are quite close. Without carefully examining the differences between the two bed files and the actual SNVs unique to the refgene bed file, it is hard to explain why.
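
The same arithmetic, extended to the non-exonic remainder, can be sketched as below. This is only a back-of-the-envelope check: the totals and percentages are the rounded figures quoted in this thread (which may not all come from the same sample), and `exonic_counts` is a helper name made up for the example.

```python
def exonic_counts(total_snvs, exonic_fraction):
    """Split a call set into (exonic/splicing, everything else) counts."""
    exonic = round(total_snvs * exonic_fraction)
    return exonic, total_snvs - exonic


# (total SNVs, fraction annotated exonic/splicing) as reported in the thread
runs = {
    "nimblegen targets": (2100, 0.92),
    "refgene exons": (2492, 0.78),
    "refgene exons + 10bp": (2768, 0.70),
}

for name, (total, fraction) in runs.items():
    exonic, other = exonic_counts(total, fraction)
    print(f"{name}: {exonic} exonic/splicing, {other} other")
```

Read this way, the exonic/splicing count barely moves between runs, while the "other" count grows with the size of the call set, so the lower percentages come almost entirely from additional non-exonic calls.
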

Douglas
www.contigexpress.com
Old 07-31-2012, 11:07 AM   #8
shyam_la
Member
 
Location: California

Join Date: Mar 2012
Posts: 97
Default

Yeah, exactly my thoughts..
Thank you.