SEQanswers

05-05-2010, 06:24 PM   #1
Moggs
Junior Member
Location: Australia
Join Date: May 2010
Posts: 5

Single direction reads for variants

I'm running a targeted resequencing project and I'm observing some strange results for a few single-base variants. We capture using SureSelect, sequence single-end on a GAIIx, then align reads and call variants with MAQ. Of 118 tumor samples, 80 have a het call (T/G) at exactly the same base position, where the matching ref base (T) has an equal number of reads in both directions (forward/reverse) but the variant (G) has reads in only one direction (reverse). The reads mapping to the G allele have multiple start positions, so I'm ruling out PCR bias and contamination. The variant isn't in dbSNP and the sequence is unique (i.e. not repetitive and no apparent pseudogenes). Anyone have any ideas?
05-05-2010, 10:23 PM   #2
nilshomer
Nils Homer
Location: Boston, MA, USA
Join Date: Nov 2008
Posts: 1,285

What are the mapping qualities for the reads with the G mutation?
05-05-2010, 11:20 PM   #3
Moggs

The G quality scores are OK (Phred-like, typically 20-35), no different from the T scores. On further investigation I noticed that read direction is not always balanced for the T call and is often biased in favour of forward reads (average 5:1 across 80 samples). Perhaps there is something odd about this sequence that results in misincorporation of a C for an A (as it's a reverse read) only at this particular base position. Quite strange.
05-06-2010, 03:09 AM   #4
ECO
--Site Admin--
Location: SF Bay Area, CA, USA
Join Date: Oct 2007
Posts: 1,358

Which allele is in the baits?

What is the local sequence context of the T/G ?
05-06-2010, 04:04 PM   #5
Michael.James.Clark
Senior Member
Location: Palo Alto
Join Date: Apr 2009
Posts: 213

What is your coverage per sample?

It's possible you're just seeing a random event. 66/118 and 80/118 having specific allele reads on only one particular strand is unlikely (especially at high coverage), but certainly possible if you're only observing the position a couple times per sample.

Could also be due to biases in your hyb or subsequent PCR. Hard to say.

I think it's unlikely to be due to the sequencing incorporating the wrong base in a systematic way. It can and does incorporate the wrong base randomly of course, but for 80/118 to randomly be the same wrong base on the same strand by chance seems unlikely (especially given what you observed with the reference T allele displaying a similar bias).

I'd hypothesize the variant is real and verify by other means such as Sanger sequencing.
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
Projects: U87MG whole genome sequence [Website] [Paper]
05-06-2010, 04:45 PM   #6
Moggs

The bait should be the reference HG18 sequence, so T rather than G. The surrounding sequence context is as follows:

Forward GGAGGAAGCTG[G/T]ACCGTGCCAACGGCCA
Reverse TGGCCGTTGGCACGGT[A/C]CAGCTTCCTCC

Base position chr1:2056602 on HG18
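
A quick sanity check on the context above: if the variant position is taken to be the single base on either side of the slash, the reverse line should be the exact reverse complement of the forward line for both alleles. The sketch below is only an illustration; the variable names and the split into left and right flanks are assumptions, not something stated in the post.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Flanks read off the forward context; the variant base sits between them.
# (Assumption: the slash separates two alleles at one position.)
LEFT_FLANK = "GGAGGAAGCTG"
RIGHT_FLANK = "ACCGTGCCAACGGCCA"

forward_ref = LEFT_FLANK + "T" + RIGHT_FLANK  # reference allele T
forward_alt = LEFT_FLANK + "G" + RIGHT_FLANK  # called variant allele G

# On the reverse strand the T/G pair should appear as A/C.
print("ref:", reverse_complement(forward_ref))
print("alt:", reverse_complement(forward_alt))
```

A forward-strand T>G reads as A>C on the reverse strand, which is why a reverse-only artifact shows up as a C in place of an A in the raw reads.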
05-06-2010, 05:01 PM   #7
Moggs

We'll do the validation but my bet is that it is an artifact, although I don't have a reasonable explanation. Another group in our institute sees the same variant being called for a different bait library targeting the same gene and I was curious to know if anyone else sees the same thing or if there are any general rules about predicting artifacts from imbalanced read directions when a variant is being called with adequate seq depth. Contamination would seem likely in our case if all reads on the G had one start site (as we share facilities) but they don't. Depth is typically 50+ across 80 samples.
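
On the question of a general rule for imbalanced read directions: one common heuristic is to run Fisher's exact test on a 2x2 table of allele (ref/variant) by strand (forward/reverse) at the site. The sketch below uses only the Python standard library. The function name and the read counts are hypothetical, chosen to resemble the situation described in this thread (depth 50, T reads about 5:1 forward, G reads reverse only); they are not the real data.

```python
from math import comb


def strand_bias_p_value(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table
    [[ref_fwd, ref_rev], [alt_fwd, alt_rev]]."""
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    fwd_total = ref_fwd + alt_fwd
    denominator = comb(ref_total + alt_total, fwd_total)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x forward ref reads, margins fixed.
        return comb(ref_total, x) * comb(alt_total, fwd_total - x) / denominator

    observed = table_probability(ref_fwd)
    lowest = max(0, fwd_total - alt_total)
    highest = min(ref_total, fwd_total)
    # Sum every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts for one sample: T reads 25 fwd / 5 rev, G reads 0 fwd / 20 rev.
print(strand_bias_p_value(25, 5, 0, 20))
```

A very small p-value only says the variant reads are distributed across strands differently from the reference reads. It flags the call as suspicious; it does not by itself show the call is an artifact, so orthogonal validation is still needed.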
05-06-2010, 06:51 PM   #8
Michael.James.Clark

I asked some of the people in my lab who do SureSelect pulldowns on GAIIx and they said they do not see phenomena like this.
05-06-2010, 07:53 PM   #9
ECO

I would have a look at this thread regarding context-specific errors (specifically T->G changes near a GGnnG) in newer data, particularly the link in post 9.
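
To see where GGnnG occurs around the site, a simple scan of the sequence context from post #6 on both strands might look like the sketch below. How close the motif has to be to the miscalled base, and on which strand, is not specified in this thread, so the code only lists positions and draws no conclusion. The function names are hypothetical.

```python
import re

# Lookahead so that overlapping GGnnG occurrences are all reported.
GGNNG = re.compile(r"(?=(GG[ACGT]{2}G))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ggnng(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 5-mer) for every GGnnG in seq."""
    return [(m.start(), m.group(1)) for m in GGNNG.finditer(seq.upper())]


# Forward-strand context from post #6 with the reference T at the variant position.
forward_ref = "GGAGGAAGCTGTACCGTGCCAACGGCCA"

for strand, seq in (("forward", forward_ref), ("reverse", reverse_complement(forward_ref))):
    print(strand, find_ggnng(seq))
```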
05-06-2010, 09:21 PM   #10
Michael.James.Clark

Interesting link. Could be from the fragmentation protocol. In our lab we use a Covaris, I believe at 4 °C; I'm not sure of the exact settings. Moggs, what do you do for fragmentation?
06-22-2010, 03:34 AM   #11
Moggs

Just to report that we tried validating the variants and they were false. Thanks for the heads up on the GGnnG issue, ECO. Must be related.
06-22-2010, 06:55 AM   #12
ECO

Thanks for the final word on this story...interesting.
All times are GMT -8.