#1
Member
Location: US Join Date: May 2012
Posts: 12
Hello everybody,

I had a strange observation in the sequencing alignments of our cohort and was wondering whether you could help me. We sequenced several members of a family using Illumina whole-exome sequencing, and I aligned the reads with bwa mem and novoalign (without trimming prior to alignment). Within one particular genomic region, which is protein-coding, exonic and unique (the only hit from BLAT against the human genome, with a mappability of 1.0), the base quality is really bad only for reads on the reverse strand, not for reads on the plus strand, and this happens in every sample we sequenced. Bases outside this particular region are totally fine.

Here is a screenshot of the alignment (viewed in the UCSC Genome Browser):
[Screenshot: UCSC Genome Browser view of the alignment, ~100 bp window]

Within this short genomic region, the base quality on the reverse strand is consistently lower than 5 for ~95% of the reads, resulting in many sequencing errors (as shown in the figure). Only a small fraction (~5%) of the reverse-strand reads are still high quality (baseQ > 30) over the same string of bases.

I have been thinking of complex structural variants, lane bias, bad sample handling at the center, etc., but none of those can be the reason, because the same sequencing failure was observed across different samples, sequencing platforms (Illumina GAII and HiSeq2000), sequencing centers (we had samples sequenced at two centers), exon capture kits (some samples used NimbleGen and some Agilent), lanes, R1/R2 of the pairs, and aligners. Therefore, it is likely to be intrinsic to the samples themselves, but I couldn't come up with a good explanation. All samples are germline samples from patients who developed tumors.

Any comments and suggestions will be extremely appreciated! Thanks =)

Cheers,
Sonia

Last edited by sonia.bao; 10-16-2015 at 01:20 PM.
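One way to put a number on the strand asymmetry described above, rather than judging it from a browser screenshot, is to tally base qualities per strand inside the window. This is a minimal sketch, assuming plain SAM text lines as input (for example, the output of `samtools view` on an indexed BAM restricted to the region). The function name `strand_quality_summary` and the Q < 5 cutoff are illustrative choices, and the sketch does not filter duplicates, secondary, or supplementary alignments.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REVERSE_FLAG = 0x10  # SAM FLAG bit: read is aligned to the reverse strand


def strand_quality_summary(sam_lines, window_start, window_end, low_q=5):
    """Hypothetical helper: per strand, count aligned bases in a reference
    window and the fraction whose base quality is below low_q.

    window_start/window_end are 1-based inclusive, like the SAM POS column.
    Returns {strand: (bases_in_window, fraction_low_quality)}.
    """
    counts = defaultdict(lambda: [0, 0])  # strand -> [total, low]
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar, qual = int(fields[1]), int(fields[3]), fields[5], fields[10]
        if cigar == "*" or qual == "*":
            continue  # unmapped, or no base qualities stored
        strand = "-" if flag & REVERSE_FLAG else "+"
        ref_pos, query_pos = pos, 0
        for length_str, op in CIGAR_RE.findall(cigar):
            length = int(length_str)
            if op in "M=X":  # aligned bases consume reference and query
                for i in range(length):
                    if window_start <= ref_pos + i <= window_end:
                        q = ord(qual[query_pos + i]) - 33  # Phred+33 encoding
                        counts[strand][0] += 1
                        counts[strand][1] += int(q < low_q)
                ref_pos += length
                query_pos += length
            elif op in "IS":  # insertion / soft clip: query only
                query_pos += length
            elif op in "DN":  # deletion / skipped region: reference only
                ref_pos += length
            # H and P consume neither reference nor query
    return {s: (total, low / total if total else 0.0)
            for s, (total, low) in counts.items()}
```

Running it per sample on the reported window and comparing the `-` fraction with the `+` fraction gives one asymmetry figure per sample, which is easier to compare across centers, kits, and platforms than screenshots.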
#2
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Looks like some kind of misassembled collapsed repeat or hypervariable region. The mapping in the area is probably suspect and should be ignored for the purposes of calling variations with respect to that reference.
#3
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
What's the region? It looks like a low-complexity repeat.
#4
Member
Location: US Join Date: May 2012
Posts: 12
Quote:
#5
Member
Location: US Join Date: May 2012
Posts: 12
Thank you ECO. The region is chr5:31,526,200-31,526,300 on the hg19 assembly. It is a unique region with no repetitive elements.
More updates: I checked another cohort that we sequenced at a different center and on a different date. It is the same!! The minus strand is really bad just for this region. It seems to be a universal problem...
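The suggestion in post #2 (treat mapping in this area as suspect and ignore it for variant calling) could be applied as a simple interval mask. This is a minimal sketch, assuming the region is written in the UCSC browser's 1-based, inclusive display format and that variant calls are `(chrom, 1-based position)` pairs; `parse_ucsc_region` and `mask_variants` are hypothetical helpers. The parsed interval uses the 0-based, half-open convention of BED files, so the same interval could also be written to a BED file for use with existing tools.

```python
import re


def parse_ucsc_region(region):
    """Hypothetical helper: turn 'chr5:31,526,200-31,526,300' (1-based,
    inclusive, as displayed by the UCSC browser) into a 0-based, half-open
    (chrom, start, end) tuple as used in BED files."""
    m = re.fullmatch(r"([^:\s]+):([\d,]+)-([\d,]+)", region.strip())
    if m is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    return chrom, start - 1, end  # shift start only; end is unchanged


def mask_variants(variants, masked_regions):
    """Hypothetical helper: drop (chrom, pos) calls whose 1-based pos falls
    inside any 0-based, half-open (chrom, start, end) masked region."""
    kept = []
    for chrom, pos in variants:
        # 1-based pos lies in [start, end) (0-based) iff start < pos <= end
        if not any(c == chrom and s < pos <= e for c, s, e in masked_regions):
            kept.append((chrom, pos))
    return kept


if __name__ == "__main__":
    suspect = parse_ucsc_region("chr5:31,526,200-31,526,300")
    calls = [("chr5", 31526199), ("chr5", 31526250), ("chr5", 31526301)]
    print(suspect, mask_variants(calls, [suspect]))
```

Keeping the conversion in one place avoids the classic off-by-one between browser coordinates and BED intervals.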
#6
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
There are also certain motifs that interfere with the sequencing enzymes... or so I hear. That can cause sequencing to fail in one direction.
But I think this is a misassembled repeat. The right side does not have totally random errors; rather, there are discrete positions where many reads agree on an alternate allele. Maybe there's a misassembly because the region is hard to sequence with any technology, due to a structural issue such as a hairpin, or because the sequence is slippery.
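The distinction drawn above, between scattered random errors and discrete positions where many reads agree on the same alternate base, can be checked column by column. This is a sketch, assuming pileup columns are already available as `(reference base, observed bases)` per position, one base per covering read. The function name `recurrent_alt_positions` and both thresholds are hypothetical, illustrative values, not numbers from this thread.

```python
from collections import Counter


def recurrent_alt_positions(columns, min_depth=10, min_alt_fraction=0.3):
    """Hypothetical helper: flag pileup columns where many reads agree on
    the same non-reference base.

    columns maps position -> (ref_base, observed_bases), where
    observed_bases is a string with one base per read at that position.
    Returns {position: (alt_base, alt_fraction)} for flagged columns.
    """
    flagged = {}
    for pos, (ref_base, observed) in sorted(columns.items()):
        depth = len(observed)
        if depth < min_depth:
            continue  # too few reads to judge agreement
        ref = ref_base.upper()
        # count only A/C/G/T calls that differ from the reference (ignores N)
        alt_counts = Counter(b for b in observed.upper() if b in "ACGT" and b != ref)
        if not alt_counts:
            continue
        alt_base, alt_n = alt_counts.most_common(1)[0]
        if alt_n / depth >= min_alt_fraction:
            flagged[pos] = (alt_base, alt_n / depth)
    return flagged
```

Random sequencing errors tend to spread across different bases and positions at low fractions, so they rarely pass the threshold; a misassembled or collapsed repeat would instead show the same alternate base recurring at specific columns across samples.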
#8
Member
Location: US Join Date: May 2012
Posts: 12
Quote:
As a next step, I took the minus-strand sequence from the erroneous region and checked whether it might form some type of secondary structure:

>chr5:31526227-31526292 strand=-
CGGGAGCGAGGCCGCAGTCCCGACAGGAGAAGACAAGACAGCCGGTACAGATCTGATTATGACCGA

Using this RNA/DNA structure prediction program (http://rna.urmc.rochester.edu/RNAstr...Web/index.html), the result suggested that strong secondary structure forms within this DNA sequence!! Almost all bases have a probability >= 80% (chr5_DNA_secondaryStr.sequencingBad.minus.pdf, attached). I also took the plus-strand sequence, and the predicted secondary structure is similar (chr5_DNA_secondaryStr.sequencingBad.plus.pdf).

As a control, I took a DNA sequence of similar length from a region where the sequencing was good:

>chr5:31526292-31526358 strand=-
TATGATGACCACAGGCACCGAGATCACAGTCATGGGCGAGGTGAGAGGCATCGGTCCCTGGATCGGC

The prediction suggested that some structure may form, but nothing strong (chr5_DNA_secondaryStr.sequencingOK.pdf, also attached). So this could be the reason!

Last edited by sonia.bao; 10-17-2015 at 06:28 PM.
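As a rough, sequence-only cross-check on the folding predictions, one can look for perfect inverted repeats (potential hairpin stems) and compare GC content between the problem sequence and the control. This is a crude proxy under stated assumptions: exact Watson-Crick pairing only, with no mismatches, bulges, or thermodynamic scoring, so it is not a substitute for the structure prediction program used above. All function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_hairpin_stem(seq, min_loop=3):
    """Hypothetical helper: length of the longest perfect inverted repeat
    whose two arms are separated by at least min_loop bases."""
    seq = seq.upper()
    n = len(seq)
    best = 0
    for k in range(1, n // 2 + 1):
        found = False
        for i in range(n - k + 1):
            arm = reverse_complement(seq[i:i + k])
            # look for the pairing arm downstream, leaving room for the loop
            if seq.find(arm, i + k + min_loop) != -1:
                found = True
                break
        if not found:
            break  # a stem of length k+1 would contain one of length k
        best = k
    return best


# Sequences as posted above (minus strand, hg19)
PROBLEM = "CGGGAGCGAGGCCGCAGTCCCGACAGGAGAAGACAAGACAGCCGGTACAGATCTGATTATGACCGA"
CONTROL = "TATGATGACCACAGGCACCGAGATCACAGTCATGGGCGAGGTGAGAGGCATCGGTCCCTGGATCGGC"

if __name__ == "__main__":
    for name, s in (("problem", PROBLEM), ("control", CONTROL)):
        print(name, len(s), round(gc_fraction(s), 2), longest_hairpin_stem(s))
```

A longer perfect stem or a higher GC fraction in the problem sequence than in the control would be consistent with the folding result, but a short search like this cannot confirm or rule out structure on its own.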
#9
Member
Location: US Join Date: May 2012
Posts: 12
Here are the predicted DNA secondary structure output files.
#10
Member
Location: US Join Date: May 2012
Posts: 12
Quote: