SEQanswers

06-19-2019, 03:25 AM   #1
alextheinnes
Junior Member

Location: Scotland

Join Date: Nov 2017
Posts: 3
ddRAD Sequencing failure

Hi,

I haven't posted before but have used the forum a lot during the construction of my libraries.

I recently received a second library back from a sequencing facility, and after a brief look through it seems that the sequence data is almost all random rather than the targeted RAD markers. The FastQC reports suggest that the restriction cut sites are to blame: they show much lower quality scores than the rest of the read and in most cases do not match the expected cut sequence for MseI or PstI. If I run the data through the processing pipeline, the number of stacks assembled is drastically lower than last time, and the mean coverage has dropped from 30x to around 5x.

My best guess is that for some reason the adapters (with the restriction-site-specific overhang) have ligated to pretty much anything and everything in the digest reaction rather than targeting the restriction fragments, and as a result I have sequenced a much more diverse pool of fragments at a much lower coverage. Unfortunately that means most of it is useless.

A few other details. I have checked for adapter contamination in the reads and there is very little (I checked and double-checked this throughout the library prep too), so I don't think it's adapter dimerization. This is the second library using the same method, and the first one worked fine. To further confuse matters, the sequencing facility had to resequence the library as there were issues with overclustering. They had the same issues again the second time but reckon the data is fine to use.

It may be a case of degraded oligos used to make up the adapters (I used the same ones for both libraries), but if so, why is it just the cut site that is low quality (the rest of the adapter quality is high)? And I don't understand how the ligation and ligation QC during prep could have been so successful with degraded adapters or overhangs. Even if this was the case, I would have thought that most free ends in a pool of purified digested DNA would be RADtag ends anyway, so I would expect something in my sequence data?

Sorry for so many questions. My heart sank when I found this out, and I am still digging through the data for answers. Any help in figuring out what has gone wrong would be much appreciated.

Many thanks

Alex
Attached Images
File Type: png bp_qual.PNG (54.9 KB, 8 views)
File Type: png bp_content.PNG (82.3 KB, 11 views)
File Type: png tile_qual.PNG (22.7 KB, 4 views)
06-19-2019, 12:00 PM   #2
SNPsaurus
Registered Vendor

Location: Eugene, OR

Join Date: May 2013
Posts: 501

Did the facility add extra PhiX to help the basecalling in the low-complexity region of the cut site? What is the cut site?

It does seem like two issues. If you are getting lots of off-target sites then a common problem is your adapters having the overhang chewed back and blunt-ligating to random sheared ends. But then the restriction region should not be very low complexity and should have normal quality!

Are the off-target sites all at 1X read depth or is it a larger set of sites with most around 5X? If the latter, then is the size-selection different this time and you just ended up sequencing more ddRAD sites than hoped for? It doesn't sound like it if the fragments truly aren't near cut sites, but just checking.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
06-23-2019, 04:32 AM   #3
alextheinnes
Junior Member

Location: Scotland

Join Date: Nov 2017
Posts: 3

Hi SNPsaurus,

Thanks for the response and apologies for the delayed reply. I've been away from the computer.

Yes, they did add PhiX at 5% to compensate for the low-complexity region. The cut sites are for MseI and PstI. Following ligation to the adapters, the cut sites should read GGTAA for MseI and TGCAG for PstI. Both appear to have been affected equally by the quality drop in FastQC and the loss of specificity in adapter ligation. The size selection, if anything, was tighter this time round, so I would have expected better coverage. I have yet to try mapping the reads to see how far they sit from the expected cut sites.
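One quick way to quantify the "not matching the expected cut sequence" observation is to tally how many reads begin with each expected remnant. The following is only a minimal sketch: the function names are hypothetical, it assumes uncompressed 4-line FASTQ, and the `offset` parameter is an assumption standing in for any inline barcode that precedes the cut site (0 if barcodes have already been removed).

```python
from collections import Counter

# Expected read starts after adapter ligation, as quoted in the post above.
EXPECTED_STARTS = {"MseI": "GGTAA", "PstI": "TGCAG"}


def cut_site_summary(reads, offset=0):
    """Count reads whose 5 bases at `offset` match an expected remnant.

    `reads` is any iterable of sequence strings. `offset` is the assumed
    length of an inline barcode in front of the cut site.
    """
    counts = Counter()
    for seq in reads:
        window = seq[offset:offset + 5].upper()
        for enzyme, site in EXPECTED_STARTS.items():
            if window == site:
                counts[enzyme] += 1
                break
        else:
            # No enzyme matched: miscalled cut site or off-target fragment.
            counts["other"] += 1
    return counts


def read_fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


# Toy reads for illustration only (not real data).
demo = ["TGCAGATTC", "GGTAACCGT", "TGNAGATTC", "ACGTACGTA"]
print(cut_site_summary(demo))
```

On real data this would be called as `cut_site_summary(read_fastq_sequences("reads_R1.fastq"))`, where the file name is a placeholder. Comparing the "other" fraction between the first and second library would show how much the cut-site match rate has actually dropped.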

I've now talked further with the sequencing facility, and they have explained that the overclustering issue was caused by high levels of polyclonal or multi-occupancy wells. The automatic filters then remove those wells from further sequencing, resulting in a much lower final read count than expected. What they are not sure about is why there are such high levels of multi-occupancy with this library.

With regard to the low-quality cut site signal in the FastQC files, they suggest that this is usual for ddRAD and other RAD-based sequence libraries, where the low-complexity region results in lower call confidence values from the sequencer. What is strange is that the PhiX should have negated this issue, and that it was not observed at all in the first library.

The question now is whether the multi-occupancy levels, the low-quality cut site call confidence, and the poor adapter specificity are related in any way. My initial thought was that the low quality score and the adapter specificity must be, but on second thought perhaps not. If the overhangs had been blunted and the blunt ends were ligating to random sheared fragments, I'd have thought they should still have formed coherent fragments for sequencing? Could the base pairs around the blunt-end ligation region be degraded in some way that would affect the call confidence during sequencing?

Thanks again for the help and insight.

Alex
06-24-2019, 12:38 PM   #4
SNPsaurus
Registered Vendor

Location: Eugene, OR

Join Date: May 2013
Posts: 501

If they loaded PhiX at 5% but the library was mis-estimated and actually much higher, then the actual PhiX could be lower, leading to poor quality scores.
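To put rough numbers on that, here is a small sketch of the arithmetic. It assumes a deliberately simple proportional model in which cluster share tracks the molar amount loaded, which will not hold exactly on an overloaded flow cell; the function name and the example factors are hypothetical.

```python
def effective_phix_fraction(intended_fraction, underestimate_factor):
    """Estimate the real PhiX share when the library was under-quantified.

    intended_fraction: PhiX share the facility aimed for (e.g. 0.05).
    underestimate_factor: true library concentration divided by the
        estimated concentration (1.0 means the estimate was correct).
    """
    phix = intended_fraction
    library = (1.0 - intended_fraction) * underestimate_factor
    return phix / (phix + library)


# Illustrative factors only; none of these values come from the thread.
for factor in (1.0, 1.5, 2.0, 3.0):
    share = effective_phix_fraction(0.05, factor)
    print(f"library at {factor:.1f}x the estimate -> PhiX share {share:.1%}")
```

Under this model, a library that is really twice as concentrated as estimated turns a nominal 5% spike-in into roughly 2.6%.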

Are you sequencing just one of the cut sites in read 1?

It may be that you are still sequencing ddRAD fragments and not random genomic DNA, but the poor quality of the cut site (because of overloading) is changing the cut site sequence. If you take 30 bp from the middle of a "bad" read, can you find it in other reads, with some of those reads starting with good sequence and an intact cut site?
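The 30 bp test could be sketched roughly as below. This is an illustration under assumptions: the helper names are hypothetical, reads are held in memory as plain strings, only the forward orientation is searched, and the linear scan would be too slow for a full run without an index (or a tool such as grep on a subsample).

```python
# Expected read starts, as given earlier in the thread.
EXPECTED_STARTS = ("GGTAA", "TGCAG")


def starts_with_cut_site(seq, sites=EXPECTED_STARTS):
    """True if the read begins with one of the expected cut-site remnants."""
    return seq.upper().startswith(sites)


def middle_kmer(seq, k=30):
    """Return k bases centred on the middle of the read."""
    start = max(0, (len(seq) - k) // 2)
    return seq[start:start + k]


def find_good_partners(bad_read, all_reads, k=30):
    """Find reads sharing the middle k-mer of `bad_read` that also
    start with an expected cut site."""
    probe = middle_kmer(bad_read, k).upper()
    return [
        read for read in all_reads
        if probe in read.upper() and starts_with_cut_site(read)
    ]


# Toy example: the same locus seen once with a clean cut site and once
# with a miscalled one, plus an unrelated read.
body = "ACGTTGCATCGATCGGATCCTAGGCTAACGTTAGCCATGA"
good = "TGCAG" + body
bad = "TNNAN" + body
unrelated = "TGCAG" + "T" * len(body)
partners = find_good_partners(bad, [good, bad, unrelated])
print(len(partners), "partner read(s) with an intact cut site")
```

If most "bad" reads have partners that start with an intact cut site, that points to real ddRAD loci with miscalled first bases rather than random fragments.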

It can be tough to amplify a very tight size selection and that does give artifacts a chance to become significant. But I think you should characterize the "off site" reads and determine if they are random and scattered or real ddRAD loci that just have bad quality cut sites that have changed the cut site sequence.

Overall though the core problem is the overloading. You aren't going to get a good number of reads. But you do want to figure this out to see if you can just load at a better concentration and you'll be fine or if other issues are present.
06-24-2019, 11:35 PM   #5
alextheinnes
Junior Member

Location: Scotland

Join Date: Nov 2017
Posts: 3

Thanks SNPsaurus,

I hadn't considered the possibility that the overloading might have affected the rest of the library; I had only looked at it the other way round. I'll definitely take a look from that perspective.

I had intended to look for cut site 2 in the smaller fragments sequenced. The reads shouldn't contain the second cut site, as the library size targeted was between 400-600bp, the average size was 465bp, and I was sequencing at 150bp PE. But there will likely be some, and I will have a look through and see whether any show the second site but not the first. That might hint at some sort of overloading effect at the first site.
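A rough sketch of that scan follows. A bare 5 bp remnant will turn up by chance in many 150 bp reads, so this version requires the reverse-complemented remnant to be followed directly by the start of the adapter. The adapter prefix used here is purely an assumed placeholder and should be replaced with the real sequence from the library design; the function names are hypothetical too.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (unknown characters become N)."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def find_far_end(read, far_remnant, adapter_prefix):
    """Return the insert length if the read runs through the far cut site
    into the adapter, otherwise None.

    far_remnant: remnant of the enzyme at the opposite end of the fragment,
        written as it would appear at the start of a read from that end.
    adapter_prefix: first bases of the adapter as they would appear in this
        read after the far cut site (assumed value, see above).
    """
    signature = reverse_complement(far_remnant) + adapter_prefix.upper()
    position = read.upper().find(signature)
    if position == -1:
        return None
    # Insert spans from the read start to the end of the far remnant.
    return position + len(far_remnant)


# Toy read: PstI start, 10 bp insert body, MseI end, then adapter.
ASSUMED_ADAPTER_PREFIX = "AGATCGGAAG"  # placeholder, not taken from the thread
toy_read = "TGCAG" + "ACGTACGTAC" + "TTACC" + ASSUMED_ADAPTER_PREFIX + "TTT"
print(find_far_end(toy_read, "GGTAA", ASSUMED_ADAPTER_PREFIX))
```

Reads where `find_far_end` succeeds but the first five bases do not match the expected remnant would be the ones hinting at a problem specific to the start of the read.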

There is a chance it was mis-estimated slightly, as they saw a very slight increase in concentration when they re-quantified the library for the second attempt at sequencing. But altogether the library was quantified 3 times (qPCR and Bioanalyzer each time), once by myself and twice by the facility, and it was fairly consistent.

I really need to sit down and have a detailed look through the raw read data like you suggest. Unfortunately I'm away and won't be back in front of a workstation for a week, but I will update here once I get that chance.

Thanks again for the suggestions,

best,

A.

Last edited by alextheinnes; 06-24-2019 at 11:39 PM.
All times are GMT -8.