SEQanswers




10-16-2015, 01:12 PM   #1
sonia.bao
Member

Location: US

Join Date: May 2012
Posts: 12
Sequencing failed only on one strand within a specific genomic region

Hello everybody,

I made a strange observation in the sequencing alignments of our cohort and was wondering whether you could help me. We sequenced several members of a family using Illumina whole-exome sequencing, and I aligned the reads with bwa mem and novoalign (without trimming prior to alignment). Within one particular genomic region, which is protein-coding, exonic and unique (BLAT against the human genome returns only this hit, and mappability is 1.0), the base quality is very poor for the reverse strand only, not for the plus strand, and this happens in every sample we sequenced. Bases outside this particular region are totally fine.

Here is a screenshot of the alignment (viewed in the UCSC Genome Browser):

[Figure: alignment screenshot, a window of about 100 bp]

Within this short genomic region, on the reverse strand, the base quality is consistently lower than 5 for ~95% of the reads, resulting in many sequencing errors (as shown in the figure). Only a small fraction (~5%) of the reverse-strand reads remain high quality (baseQ > 30) over the same stretch of bases.
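To put per-strand numbers on an observation like this, one option is to restrict the reads to the region (for example with `samtools view in.bam chr5:31526200-31526300`) and tally quality by strand. The helper below is a hypothetical sketch, not part of any tool mentioned in this thread: it reads SAM text lines, takes the strand from FLAG bit 0x10, assumes Phred+33 qualities, and, as a simplification, classifies each read by its mean base quality rather than base by base.

```python
LOW_Q = 5  # the "lower than 5" threshold from the post


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a SAM QUAL string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def low_quality_fraction_by_strand(sam_lines, threshold: float = LOW_Q) -> dict:
    """Fraction of reads on each strand whose mean base quality is below threshold."""
    counts = {"+": [0, 0], "-": [0, 0]}  # strand -> [low-quality reads, total reads]
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, qual = int(fields[1]), fields[10]
        if flag & 0x4 or qual == "*":
            continue  # skip unmapped reads and reads without stored qualities
        strand = "-" if flag & 0x10 else "+"  # 0x10: SEQ is reverse-complemented
        counts[strand][1] += 1
        if mean_phred(qual) < threshold:
            counts[strand][0] += 1
    return {s: (low / total if total else 0.0) for s, (low, total) in counts.items()}


# Toy records (not real data): 'I' = Q40 and '#' = Q2 in Phred+33.
records = [
    "r1\t0\tchr5\t31526230\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr5\t31526230\t60\t4M\t*\t0\t0\tACGT\t####",
    "r3\t16\tchr5\t31526230\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(low_quality_fraction_by_strand(records))  # {'+': 0.0, '-': 0.5}
```

Working per read keeps the sketch short; a per-base version would count positions with quality below the threshold instead of averaging.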

I have considered complex structural variants, lane bias, poor sample handling at the center, etc., but none of these can be the reason, because the same sequencing failure was observed across different samples, sequencing platforms (Illumina GAII and HiSeq2000), sequencing centers (our samples were sequenced at two centers), exon capture kits (some samples used NimbleGen and some Agilent), lanes, R1/R2 of the pairs, and aligners. Therefore, it is likely intrinsic to the samples themselves, but I couldn't come up with a good explanation. All samples are germline samples from patients who developed tumors.

Any comments and suggestions will be extremely appreciated! Thanks =)

Cheers,
Sonia

Last edited by sonia.bao; 10-16-2015 at 01:20 PM.
10-16-2015, 02:26 PM   #2
Brian Bushnell
Super Moderator

Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Looks like some kind of misassembled collapsed repeat or hypervariable region. The mapping in the area is probably suspect and should be ignored for the purposes of calling variations with respect to that reference.
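If the region is treated as unreliable, calls inside it can simply be masked before downstream analysis. This is a minimal sketch assuming calls are (chrom, pos, ref, alt) tuples with 1-based positions; `SUSPECT_REGIONS`, `in_regions` and `filter_calls` are hypothetical names, and the interval is the hg19 region given later in the thread (chr5:31,526,200-31,526,300). A real pipeline would more likely keep such regions in a BED file (0-based, half-open) and apply them with an existing tool.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]     # (chrom, start, end), 1-based inclusive
Call = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), 1-based pos

# Hypothetical mask list; the interval is chr5:31,526,200-31,526,300 (hg19).
SUSPECT_REGIONS: List[Region] = [("chr5", 31526200, 31526300)]


def in_regions(chrom: str, pos: int, regions: Iterable[Region]) -> bool:
    """True if (chrom, pos) falls inside any region (inclusive bounds)."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def filter_calls(calls: Iterable[Call], regions: Iterable[Region] = SUSPECT_REGIONS) -> List[Call]:
    """Keep only calls that lie outside every masked region."""
    regions = list(regions)  # materialize so it can be scanned once per call
    return [call for call in calls if not in_regions(call[0], call[1], regions)]


calls = [("chr5", 31526250, "C", "T"), ("chr5", 31527000, "G", "A"), ("chr1", 31526250, "A", "G")]
print(filter_calls(calls))
# [('chr5', 31527000, 'G', 'A'), ('chr1', 31526250, 'A', 'G')]
```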
10-16-2015, 06:04 PM   #3
ECO
--Site Admin--

Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

What's the region? It looks like a low-complexity repeat.
10-16-2015, 06:06 PM   #4
sonia.bao
Member

Location: US

Join Date: May 2012
Posts: 12

Quote:
Originally Posted by Brian Bushnell
Looks like some kind of misassembled collapsed repeat or hypervariable region. The mapping in the area is probably suspect and should be ignored for the purposes of calling variations with respect to that reference.
Thank you, Brian. I had thought of this too, but if that were the case, wouldn't it affect both the plus and minus strands? I'm puzzled by the fact that only one strand is affected!
10-16-2015, 06:15 PM   #5
sonia.bao
Member

Location: US

Join Date: May 2012
Posts: 12

Quote:
Originally Posted by ECO
Whats the region? Looks like a low-complexity repeat.
Thank you, ECO. The region is chr5:31,526,200-31,526,300 on the hg19 assembly. It is a unique region with no repetitive elements.

More updates: I checked another cohort that we sequenced at a different center and on a different date. It is the same! The minus strand is very poor only in this region. It seems to be a universal problem...
10-16-2015, 09:19 PM   #6
Brian Bushnell
Super Moderator

Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

There are also certain motifs that interfere with the sequencing enzymes... or so I hear. That can cause sequencing to be unsuccessful in one direction.
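One reason a motif can hurt only one strand is that each read is sequenced 5'→3' along the read, so a reverse-strand read "sees" the reverse complement of the reference. A motif on one strand, such as GGC (often cited as preceding Illumina error bursts), appears as its reverse complement GCC on the other strand. The hypothetical helpers below count motif hits as each strand would be read; the default motif is an assumption to swap for whatever you want to test.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_positions(seq: str, motif: str) -> list:
    """0-based start positions of every (possibly overlapping) match of motif."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def motif_hits_by_strand(plus_seq: str, motif: str = "GGC") -> dict:
    """Count motif hits as read 5'->3' on the plus strand and on the minus strand."""
    return {
        "+": len(motif_positions(plus_seq, motif)),
        "-": len(motif_positions(revcomp(plus_seq), motif)),
    }


print(motif_hits_by_strand("AGGCA"))  # {'+': 1, '-': 0}: the minus strand reads TGCCT
```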

But, I think this is a misassembled repeat. The right side does not have totally random errors; rather, there are discrete positions where many reads agree on an alternate allele. Maybe there's a misassembly because it's hard to sequence with any technology due to a structural issue like a hairpin, or being slippery.
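The distinction between totally random errors and "discrete positions where many reads agree on an alternate allele" can be checked column by column. This hypothetical sketch assumes reads already laid out gaplessly against the reference window (same start, '.' where a read has no coverage), which in practice would come from a pileup of the BAM; `min_frac` and `min_depth` are arbitrary illustrative thresholds.

```python
from collections import Counter


def recurrent_mismatch_columns(ref, reads, min_frac=0.3, min_depth=5):
    """Return (column, ref_base, alt_base, alt_count, depth) for columns where a single
    non-reference base is carried by at least min_frac of the covering reads."""
    flagged = []
    for i, ref_base in enumerate(ref):
        # Bases covering this column; '.', gaps and N are ignored.
        bases = [r[i] for r in reads if i < len(r) and r[i] in "ACGT"]
        if len(bases) < min_depth:
            continue
        alt_counts = Counter(b for b in bases if b != ref_base)
        if not alt_counts:
            continue
        alt, n = alt_counts.most_common(1)[0]
        if n / len(bases) >= min_frac:
            flagged.append((i, ref_base, alt, n, len(bases)))
    return flagged


ref = "ACGTA"
systematic = ["ACTTA"] * 3 + ["ACGTA"] * 2  # three reads share G>T at column 2
scattered = ["TCGTA", "AGGTA", "ACCTA", "ACGAA", "ACGTC"]  # one error per column
print(recurrent_mismatch_columns(ref, systematic))  # [(2, 'G', 'T', 3, 5)]
print(recurrent_mismatch_columns(ref, scattered))   # []
```

Random errors spread thinly across columns and stay below the threshold, while a shared alternate base at the same position stands out.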
10-17-2015, 05:33 AM   #7
GenoMax
Senior Member

Location: East Coast USA

Join Date: Feb 2008
Posts: 7,080

Have you thought about trying a re-aligner to see if it improves the alignment? ABRA is one example.
10-17-2015, 06:18 PM   #8
sonia.bao
Member

Location: US

Join Date: May 2012
Posts: 12

Quote:
Originally Posted by Brian Bushnell
There are also certain motifs that interfere with the sequencing enzymes... or so I hear. That can cause sequencing to be unsuccessful in one direction.

But, I think this is a misassembled repeat. The right side does not have totally random errors; rather, there are discrete positions where many reads agree on an alternate allele. Maybe there's a misassembly because it's hard to sequence with any technology due to a structural issue like a hairpin, or being slippery.
Thanks, Brian. I took a closer look at the samples, and indeed those errors are not completely random: they always pop up at the same positions across multiple samples.

As a next step, I took the minus-strand sequence from the erroneous region and checked whether it could form some kind of secondary structure:

>chr5:31526227-31526292 strand=-
CGGGAGCGAGGCCGCAGTCCCGACAGGAGAAGACAAGACAGCCGGTACAGATCTGATTATGACCGA

I ran it through this RNA/DNA structure prediction program (http://rna.urmc.rochester.edu/RNAstr...Web/index.html).

The result suggested that strong secondary structure forms within this DNA sequence! Almost all bases have a predicted probability >= 80% (chr5_DNA_secondaryStr.sequencingBad.minus.pdf, attached).

I also took the plus-strand sequence, and the predicted secondary structure is similar (chr5_DNA_secondaryStr.sequencingBad.plus.pdf).

As a control, I took DNA sequence of similar length from a region where the sequencing was good:

>chr5:31526292-31526358 strand=-
TATGATGACCACAGGCACCGAGATCACAGTCATGGGCGAGGTGAGAGGCATCGGTCCCTGGATCGGC

Here the prediction suggested that some structure may form, but none of it is strong (chr5_DNA_secondaryStr.sequencingOK.pdf, also attached).

So this could be the reason!
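As a quick, tool-independent sanity check alongside the folding predictions, one can count inverted repeats: pairs of k-mers where one is the reverse complement of the other, separated by a short spacer, which are the raw material for hairpin stems. This is a deliberately crude proxy, not a folding model (it ignores mismatches, stacking energies and competing structures); `k=6` and `min_loop=3` are arbitrary assumptions, and the two sequences are the ones posted above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, k: int = 6, min_loop: int = 3) -> list:
    """(i, j) pairs where seq[j:j+k] is the reverse complement of seq[i:i+k]
    and at least min_loop bases separate the two arms: candidate hairpin stems."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        arm = revcomp(seq[i:i + k])
        for j in range(i + k + min_loop, len(seq) - k + 1):
            if seq[j:j + k] == arm:
                hits.append((i, j))
    return hits


bad = "CGGGAGCGAGGCCGCAGTCCCGACAGGAGAAGACAAGACAGCCGGTACAGATCTGATTATGACCGA"
ok = "TATGATGACCACAGGCACCGAGATCACAGTCATGGGCGAGGTGAGAGGCATCGGTCCCTGGATCGGC"
for name, s in (("sequencing bad", bad), ("sequencing OK", ok)):
    gc = (s.count("G") + s.count("C")) / len(s)
    print(f"{name}: {len(inverted_repeats(s))} candidate stems (k=6), GC = {gc:.2f}")
```

An inverted repeat on one strand is also an inverted repeat on the other, so this count is identical whichever strand is scanned, which fits the similar plus- and minus-strand predictions; by itself it cannot explain why only one strand fails.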

Last edited by sonia.bao; 10-17-2015 at 06:28 PM.
10-17-2015, 06:27 PM   #9
sonia.bao
Member

Location: US

Join Date: May 2012
Posts: 12

Here are the predicted DNA secondary structure output files.
10-17-2015, 06:51 PM   #10
sonia.bao
Member

Location: US

Join Date: May 2012
Posts: 12

Quote:
Originally Posted by GenoMax
Have you thought about trying a re-aligner to see if it improves the alignment? ABRA is one example.
Thanks, GenoMax. I was using GATK for indel realignment. ABRA sounds like another good option! How does it compare to GATK?
All times are GMT -8.