  • Sequencing failed only on one strand within a specific genomic region

    Hello everybody,

    I made a strange observation in the sequencing alignments of our cohort and was wondering whether you could help me. We sequenced several members of a family using Illumina whole exome sequencing, and I aligned the reads with bwa mem and novoalign (without trimming prior to alignment). Within one particular genomic region, which is protein-coding, exonic and unique (it is the only hit from BLAT against the human genome, and its mappability is 1.0), the base quality is really bad only on the reverse strand, not on the plus strand, and this happens in every sample we sequenced. Bases outside of this particular region are totally fine.

    Here is a screen shot of the alignment (viewed in UCSC Genome Browser):



    The figure shows a window of about 100 bp.

    Within this short genomic region, on the reverse strand, the base quality is consistently lower than 5 for ~95% of the reads, resulting in many sequencing errors (as shown in the figure). Only a small fraction (~5%) of the reads from the reverse strand are still high quality over the same string of bases (baseQ>30).
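To put numbers on the per-strand difference described above, here is a minimal sketch that averages base qualities by strand over a window, working directly on SAM text lines. The function name `mean_baseq_by_strand` and the toy reads are hypothetical, and the sketch assumes Phred+33 qualities and a standard 11-column SAM record; it is not the method used in the original analysis.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def mean_baseq_by_strand(sam_lines, chrom, start, end):
    """Mean Phred base quality per strand over chrom:start-end (1-based, inclusive)."""
    totals = defaultdict(lambda: [0, 0])  # strand -> [sum of qualities, number of bases]
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar, qual = int(f[1]), f[2], int(f[3]), f[5], f[10]
        if flag & 0x4 or rname != chrom or cigar == "*" or qual == "*":
            continue  # unmapped, other contig, or no alignment/quality information
        strand = "-" if flag & 0x10 else "+"
        ref_pos, read_idx = pos, 0
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":  # consumes both read and reference
                for k in range(length):
                    if start <= ref_pos + k <= end:
                        totals[strand][0] += ord(qual[read_idx + k]) - 33
                        totals[strand][1] += 1
                ref_pos += length
                read_idx += length
            elif op in "IS":  # consumes read only
                read_idx += length
            elif op in "DN":  # consumes reference only
                ref_pos += length
            # H and P consume neither
    return {s: q / n for s, (q, n) in totals.items() if n}

# Toy records (made up for illustration): one forward read, one reverse read.
toy = [
    "r1\t0\tchr5\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr5\t100\t60\t4M\t*\t0\t0\tACGT\t$$$$",
]
print(mean_baseq_by_strand(toy, "chr5", 100, 103))
```

On real data, the lines could come from a SAM text dump of the region of interest; comparing the two means inside and outside the affected window would show whether the drop is specific to the reverse strand.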

    I have been thinking of complex structural variants, lane bias, bad sample handling at the center, etc., but none of those could be the reason, because the same sequencing failure was observed across different samples, sequencing platforms (Illumina GAII and HiSeq2000), sequencing centers (we had samples sequenced at two centers), exon capture kits (some samples used NimbleGen and some Agilent), lanes, R1/R2 of the pairs, and aligners. Therefore, it is likely to be intrinsic to the samples themselves. But I couldn't come up with a good explanation. All samples are germline samples from patients who developed tumors.

    Any comments and suggestions will be extremely appreciated! Thanks =)

    Cheers,
    Sonia
    Last edited by sonia.bao; 10-16-2015, 12:20 PM.

  • #2
    Looks like some kind of misassembled collapsed repeat or hypervariable region. The mapping in the area is probably suspect and should be ignored for the purposes of calling variations with respect to that reference.


    • #3
      What's the region? It looks like a low-complexity repeat.


      • #4
        Originally posted by Brian Bushnell
        Looks like some kind of misassembled collapsed repeat or hypervariable region. The mapping in the area is probably suspect and should be ignored for the purposes of calling variations with respect to that reference.
        Thank you Brian. I was thinking of this too, but if that were the case, wouldn't it affect both the plus and minus strands? I am puzzled by the fact that only one strand is affected!


        • #5
          Originally posted by ECO
          What's the region? Looks like a low-complexity repeat.
          Thank you ECO. The region is chr5:31,526,200-31,526,300 on the hg19 assembly. It is a unique region with no repetitive elements.

          An update: I checked another cohort that we sequenced at a different center and on a different date. It is the same!! The minus strand is really bad just for this region. It seems to be a universal problem.


          • #6
            There are also certain motifs that interfere with the sequencing enzymes... or so I hear. That can cause sequencing to be unsuccessful in one direction.

            But, I think this is a misassembled repeat. The right side does not have totally random errors; rather, there are discrete positions where many reads agree on an alternate allele. Maybe there's a misassembly because it's hard to sequence with any technology due to a structural issue like a hairpin, or being slippery.


            • #7
              Have you thought about trying a re-aligner to see if it improves the alignment? ABRA is one example.


              • #8
                Originally posted by Brian Bushnell
                There are also certain motifs that interfere with the sequencing enzymes... or so I hear. That can cause sequencing to be unsuccessful in one direction.

                But, I think this is a misassembled repeat. The right side does not have totally random errors; rather, there are discrete positions where many reads agree on an alternate allele. Maybe there's a misassembly because it's hard to sequence with any technology due to a structural issue like a hairpin, or being slippery.
                Thanks Brian. I took a closer look at the samples and indeed those errors are not completely random. They always pop up at the same positions across multiple samples.

                As a next step, I took the minus strand sequence from the erroneous region and checked whether it may form some type of secondary structure:

                >chr5:31526227-31526292 strand=-
                CGGGAGCGAGGCCGCAGTCCCGACAGGAGAAGACAAGACAGCCGGTACAGATCTGATTATGACCGA

                I used this RNA/DNA structure prediction program (http://rna.urmc.rochester.edu/RNAstr...Web/index.html).

                The result suggests that strong secondary structure forms within this DNA sequence!! Almost all bases have probability >= 80% (chr5_DNA_secondaryStr.sequencingBad.minus.pdf, attached).

                I also took the plus strand sequence, and its predicted secondary structure is similar (chr5_DNA_secondaryStr.sequencingBad.plus.pdf).

                As a control, I took DNA sequence of similar length from a region where the sequencing was good:

                >chr5:31526292-31526358 strand=-
                TATGATGACCACAGGCACCGAGATCACAGTCATGGGCGAGGTGAGAGGCATCGGTCCCTGGATCGGC

                The prediction for this control suggests that some structure may form, but none of it is strong (chr5_DNA_secondaryStr.sequencingOK.pdf, also attached).

                So this could be the reason!
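As a rough cross-check of the secondary-structure idea, the sketch below looks for the longest perfect inverted repeat (a potential hairpin stem) in each of the two sequences quoted above. This is a much cruder technique than the thermodynamic prediction run with the web server: it only finds exact complementary arms separated by a loop, and the helper names `reverse_complement` and `longest_hairpin_stem` and the `min_loop` default are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def longest_hairpin_stem(seq, min_loop=3):
    """Return (stem_len, left_start, right_start) of the longest perfect
    inverted repeat whose arms are separated by at least min_loop bases.
    Positions are 0-based; (0, -1, -1) means no stem was found."""
    seq = seq.upper()
    n = len(seq)
    best = (0, -1, -1)
    for i in range(n):
        # Only try arms strictly longer than the best stem found so far.
        for k in range(best[0] + 1, (n - i - min_loop) // 2 + 1):
            arm = reverse_complement(seq[i:i + k])
            j = seq.find(arm, i + k + min_loop)
            if j == -1:
                break  # a longer arm starting at i cannot match either
            best = (k, i, j)
    return best

# Sequences copied from the post above (minus strand, hg19).
bad_region = "CGGGAGCGAGGCCGCAGTCCCGACAGGAGAAGACAAGACAGCCGGTACAGATCTGATTATGACCGA"
control = "TATGATGACCACAGGCACCGAGATCACAGTCATGGGCGAGGTGAGAGGCATCGGTCCCTGGATCGGC"

for name, s in (("bad region", bad_region), ("control", control)):
    print(name, len(s), longest_hairpin_stem(s))
```

A short perfect stem is only suggestive; real folding also involves mismatches, bulges and stacking energies, which is what a dedicated structure prediction tool accounts for.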
                Last edited by sonia.bao; 10-17-2015, 05:28 PM.


                • #9
                  Here are the predicted DNA secondary structure output files
                  Attached Files


                  • #10
                    Originally posted by GenoMax
                    Have you thought about trying a re-aligner to see if it improves the alignment? ABRA is one example.
                    Thanks GenoMax. I was using GATK for indel realignment. ABRA sounds like another good option! How does it compare to GATK?
