  • Single direction reads for variants

    I'm running a targeted resequencing project and I'm observing some strange results for a few single-base variants. We capture using SureSelect, sequence single-end on a GAIIx, and then align reads and call variants with MAQ. Of 118 tumor samples, 80 have a het call (T/G) at exactly the same base position, where the matching ref base (T) has an equal number of reads in both directions (forward/reverse) but the variant (G) has reads in only one direction (reverse). The reads mapping to the G allele have multiple start positions, so I'm ruling out PCR bias and contamination. The variant isn't in dbSNP and the sequence is unique (i.e. not repetitive and no apparent pseudogenes). Anyone have any ideas?
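One way to put a number on the imbalance described above is a Fisher's exact test on the allele-by-strand read counts for a single sample. The following is a minimal sketch, not something used in the thread: the function name and the read counts are hypothetical, and the test is implemented with the standard library only so that it runs without extra packages.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts for one sample: rows are alleles, columns are strands.
ref_fwd, ref_rev = 15, 15   # reference T reads, balanced between strands
alt_fwd, alt_rev = 0, 20    # variant G reads, reverse strand only

p_value = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
print(f"strand-bias p-value: {p_value:.2e}")
```

A small p-value only says that the variant reads are distributed across strands differently from the reference reads. It does not by itself say whether the call is an artifact.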

  • #2
    Originally posted by Moggs
    I'm running a targeted resequencing project and I'm observing some strange results for a few single-base variants. We capture using SureSelect, sequence single-end on a GAIIx, and then align reads and call variants with MAQ. Of 118 tumor samples, 80 have a het call (T/G) at exactly the same base position, where the matching ref base (T) has an equal number of reads in both directions (forward/reverse) but the variant (G) has reads in only one direction (reverse). The reads mapping to the G allele have multiple start positions, so I'm ruling out PCR bias and contamination. The variant isn't in dbSNP and the sequence is unique (i.e. not repetitive and no apparent pseudogenes). Anyone have any ideas?
    What are the mapping qualities for the reads with the G mutation?


    • #3
      The G quality scores are OK (Phred-like, typically 20-35), no different from the T scores. On further investigation I noticed that read direction is not always balanced for the T call and is often biased in favour of forward reads (average 5:1 across the 80 samples). Perhaps there is something odd about this sequence that results in misincorporation of a C for an A (as it's a reverse read) only at this particular base position. Quite strange.


      • #4
        Which allele is in the baits?

        What is the local sequence context of the T/G ?


        • #5
          What is your coverage per sample?

          It's possible you're just seeing a random event. 66/118 and 80/118 having specific allele reads on only one particular strand is unlikely (especially at high coverage), but certainly possible if you're only observing the position a couple times per sample.

          Could also be due to biases in your hyb or subsequent PCR. Hard to say.

          I think it's unlikely to be due to the sequencing incorporating the wrong base in a systematic way. It can and does incorporate the wrong base randomly of course, but for 80/118 to randomly be the same wrong base on the same strand by chance seems unlikely (especially given what you observed with the reference T allele displaying a similar bias).

          I'd hypothesize the variant is real and verify by other means such as Sanger sequencing.
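The coverage argument above can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch under stated assumptions: each variant read lands on a given strand independently with probability `p_strand`, samples are independent, and the read counts in the loop are hypothetical rather than taken from the thread.

```python
import math


def prob_all_one_strand(n_reads, p_strand=0.5):
    """Chance that all n_reads variant reads fall on one specified strand."""
    return p_strand ** n_reads


def log10_prob_across_samples(n_reads, n_samples, p_strand=0.5):
    """log10 of the chance that this happens independently in every sample."""
    return n_samples * n_reads * math.log10(p_strand)


# Hypothetical numbers of variant reads per sample.
for n_reads in (2, 5, 25):
    single = prob_all_one_strand(n_reads)
    cohort = log10_prob_across_samples(n_reads, n_samples=80)
    print(f"{n_reads:>2} variant reads: one sample {single:.3g}, "
          f"80 samples 10^{cohort:.0f}")
```

With only a couple of variant reads, a single sample shows one-strand support a quarter of the time, so one sample on its own is uninformative. The same pattern repeated across 80 independent samples is vanishingly unlikely under this null, which points to something systematic, whether a real variant or a strand-specific artifact.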
          Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
          Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
          Projects: U87MG whole genome sequence [Website] [Paper]


          • #6
            The bait should be the reference HG18 sequence, therefore T, not G. The surrounding sequence context is as follows:

            Forward GGAGGAAGCTG[G/T]ACCGTGCCAACGGCCA
            Reverse TGGCCGTTGGCACGGT[A/C]CAGCTTCCTCC

            Base position chr1:2056602 on HG18
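The two context strings above can be cross-checked by reverse-complementing one and comparing it with the other, which also confirms where the variable base sits in each orientation. A minimal sketch follows; the flanking sequences are copied from the post, while the variable and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Flanks from the post, split around the variable base.
fwd_left, fwd_right = "GGAGGAAGCTG", "ACCGTGCCAACGGCCA"
rev_left, rev_right = "TGGCCGTTGGCACGGT", "CAGCTTCCTCC"

forward_ref = fwd_left + "T" + fwd_right   # reference allele, forward strand
forward_alt = fwd_left + "G" + fwd_right   # variant allele, forward strand
reverse_ref = rev_left + "A" + rev_right   # reference allele, reverse strand
reverse_alt = rev_left + "C" + rev_right   # variant allele, reverse strand

print(reverse_complement(forward_ref) == reverse_ref)
print(reverse_complement(forward_alt) == reverse_alt)
```

Both comparisons hold, so a forward-strand T/G at this position corresponds to A/C when read on the reverse strand.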


            • #7
              We'll do the validation, but my bet is that it is an artifact, although I don't have a reasonable explanation. Another group in our institute sees the same variant called with a different bait library targeting the same gene. I was curious to know whether anyone else sees the same thing, or whether there are any general rules for predicting artifacts from imbalanced read directions when a variant is called with adequate sequencing depth. Contamination would seem likely in our case (we share facilities) if all reads carrying the G had one start site, but they don't. Depth is typically 50+ across the 80 samples.


              • #8
                I asked some of the people in my lab who do SureSelect pulldowns on GAIIx and they said they do not see phenomena like this.


                • #9
                  I would have a look at this thread regarding context-specific errors (specifically T->G changes near a GGnnG) in newer data, particularly the link in post 9.
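To see whether the sequence context given in post #6 contains a GGnnG motif, both strands can be scanned with a regular expression. This is only a sketch: the linked thread is not reproduced here, so the exact definition of the error context (which strand, how close to the affected base) is an assumption, and the code just reports where the motif occurs relative to the variant position.

```python
import re

# Lookahead so that overlapping GGnnG occurrences are all reported.
MOTIF = re.compile(r"(?=GG[ACGT]{2}G)")


def motif_starts(seq):
    """Return 0-based start positions of every GGnnG occurrence in seq."""
    return [m.start() for m in MOTIF.finditer(seq)]


# Reference-allele context from post #6, in both orientations.
forward = "GGAGGAAGCTG" + "T" + "ACCGTGCCAACGGCCA"
reverse = "TGGCCGTTGGCACGGT" + "A" + "CAGCTTCCTCC"
variant_index = {"forward": 11, "reverse": 16}  # 0-based offset of the T / A

for name, seq in (("forward", forward), ("reverse", reverse)):
    for start in motif_starts(seq):
        offset = variant_index[name] - start
        print(f"{name}: {seq[start:start + 5]} at {start}, "
              f"{offset} bases before the variant")
```

The motif is present upstream of the variable base on both strands, so the context is at least compatible with the kind of error described, although this scan alone cannot show that the motif causes the miscalls.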


                  • #10
                    Interesting link. It could come from the fragmentation protocol. In our lab we use a Covaris, I believe at 4 °C, though I'm not sure of the exact settings. Moggs, what do you use for fragmentation?


                    • #11
                      Just to report that we tried validating the variants and they were false. Thanks for the heads-up on the GGnnG issue, ECO. It must be related.


                      • #12
                        Thanks for the final word on this story...interesting.
