SEQanswers




09-14-2016, 10:15 AM #1
anjama
Member
 
Location: Florida

Join Date: Mar 2016
Posts: 15
Low read counts from old PacBio data

I've been helping other people process PacBio data using the SMRTanalysis software. For our purposes, I've just been using the reads of insert protocol. This has worked well for PacBio data from several species, but now I'm having trouble with one particular species, where the protocol produces an order of magnitude fewer reads than expected (about 300-500 versus the usual 5000+). Adjusting the quality and coverage parameters makes minimal difference. The data is originally from 2013, but data for another species sequenced at the same time appears to be fine.

Comparing the folders of the problematic species side by side with those of species where we had no issues, all the files appear to be present, and all of the raw data files are consistent in size. However, the generated CCS and subread FASTA/FASTQ files that we received with the raw data are all an order of magnitude smaller for the problematic species, which leads me to believe that the problem lies not with the analysis but with the original sequencing process.
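Since file sizes are only a proxy, one way to confirm the gap is to count reads directly in the delivered CCS and subread files. A minimal Python sketch, assuming these are ordinary FASTA/FASTQ files (optionally gzipped) and that FASTQ records are unwrapped 4-line records; the script and function names are hypothetical and not part of SMRTanalysis:

```python
import gzip
import sys
from pathlib import Path
from typing import TextIO


def open_text(path: Path) -> TextIO:
    """Open a plain or gzip-compressed text file for reading."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(path) -> int:
    """Count reads in a FASTA or FASTQ file (optionally .gz-compressed)."""
    path = Path(path)
    name = path.name.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    with open_text(path) as handle:
        if name.endswith((".fasta", ".fa")):
            # Each FASTA record starts with a '>' header; sequence lines may wrap.
            return sum(1 for line in handle if line.startswith(">"))
        if name.endswith((".fastq", ".fq")):
            # Assumes unwrapped 4-line FASTQ records. Counting '@' lines would be
            # unsafe because quality strings can also begin with '@'.
            n_lines = sum(1 for _ in handle)
            return n_lines // 4
    raise ValueError(f"unrecognised extension: {path.name}")


if __name__ == "__main__":
    # Print a tab-separated "file<TAB>read count" line for each file given.
    for arg in sys.argv[1:]:
        print(f"{arg}\t{count_records(arg)}")
```

Passing the problematic species' files and a normal species' files as arguments gives read counts that can be compared side by side, which is more direct than comparing byte sizes.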

So, the question: what could have gone wrong that the raw .h5 data files all appear to be a typical size (~1GB each), but analysis software is only detecting <10% of the reads expected? Coverage and quality seem fine for the reads that it does detect.

Thank you.

Last edited by anjama; 09-20-2016 at 09:34 AM.
09-15-2016, 07:34 AM #2
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 322

PacBio machines do not generate a fixed number of reads. The number of reads from the reads of insert protocol is highly dependent on loading, which can vary between libraries. Do you have the P0, P1 and P2 statistics? These are generated by the reads of insert protocol (loading report) when running on the command line or in the GUI.
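Why loading matters so much can be sketched with a simple model. Assuming polymerases land in ZMWs independently (idealised Poisson loading; real cells deviate from this), with a mean of λ polymerases per ZMW:

```latex
\begin{aligned}
P(k) &= \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots \\
P_0 &= e^{-\lambda}, \qquad P_1 = \lambda e^{-\lambda}, \qquad P_{\ge 2} = 1 - e^{-\lambda}(1 + \lambda) \\
\frac{dP_1}{d\lambda} &= e^{-\lambda}(1 - \lambda) = 0 \;\Rightarrow\; \lambda = 1, \qquad P_1^{\max} = e^{-1} \approx 0.368
\end{aligned}
```

Under this model at most about 37% of ZMWs carry exactly one polymerase, and the single-polymerase fraction falls off on both sides of λ = 1: an underloaded cell (λ much less than 1) is mostly empty ZMWs, and an overloaded cell (λ much greater than 1) is mostly multiply loaded ZMWs.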
09-16-2016, 07:21 AM #3
anjama
Member
 
Location: Florida

Join Date: Mar 2016
Posts: 15

I don't see a report called a loading report, or anything that gives P0, P1, P2 statistics. I've attached screenshots of the reports I do have. The left side is the problematic species. The right side is a similar species that was submitted at the same time and produced results similar to what I've seen from several other species sequenced recently.

For the problematic species, this was apparently their second attempt at sequencing it. The first time, it produced abnormally small output files (about 50% of the size of everything else we have typically gotten), so they did it again. Running the reads of insert protocol on the original bad run gives similar results to the supposedly good run.

I'm not really involved with any of the sample preparation or sequencing, so I only have a basic understanding of how PacBio works. I don't know how to tell whether this was an issue with the sample preparation itself, the sequencing machine, or the data output. What has me curious/confused is why the raw data files are so large, yet the analysis tools seem to see only a small amount of data, particularly because the data they do see has perfectly fine coverage and read quality. Mean length is a touch lower than I typically see, but not enough that I can confidently call it abnormal.
Attached Images
report01.png (146.3 KB)
report02.png (32.0 KB)
09-16-2016, 10:19 AM #4
gconcepcion
Member
 
Location: Menlo Park

Join Date: Dec 2010
Posts: 68

This is characteristic of a loading issue as rhall indicated.

Unfortunately, the ReadsOfInsert protocols don't give you the loading report information. However, you can lump all of the cells together in a job (or do them separately) and map them to a reference using the RS_Resequencing workflow. One of the reports generated is a loading efficiency report, which should give you an indication of how well the SMRTCell was loaded relative to the others.

You can use any reference if you don't care about the mapping and only want the loading report.

Attached is a sample loading report:
productivity 0 - means empty ZMWs
productivity 1 - ZMWs loaded with a single polymerase (what you want)
productivity 2 - ZMWs loaded with more than one polymerase (mostly unusable)
Attached Images
Screen Shot 2016-09-16 at 10.17.21 AM.png (109.3 KB)
09-16-2016, 01:24 PM #5
anjama
Member
 
Location: Florida

Join Date: Mar 2016
Posts: 15

Okay, I generated a loading report. The first row is the species from which we got typical results. The second row is the first attempt at the problematic species. The third row is the second attempt.

Judging by this thread:
http://seqanswers.com/forums/showthread.php?t=43256
None of these results look particularly good. Even the one that I thought was good appears like it might be underloaded. Granted, I don't know what typical values are, or how they might vary across techniques.

What should I be taking away from these numbers? Thanks
Attached Images
loading report.png (19.7 KB)

Last edited by anjama; 09-20-2016 at 09:35 AM.
09-17-2016, 11:01 AM #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It looks like the first two cells were underloaded and produced mostly nothing, while the third cell was massively overloaded, so most of its data was unusable. My understanding is that when you overload cells, the usable reads tend to come from short inserts, because of the faster diffusion speed of shorter molecules.
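This reading can be made a little more quantitative. A minimal Python sketch, assuming idealised Poisson loading and treating productivity 2 as multi-polymerase ZMWs as described earlier in the thread: it back-calculates the mean occupancy from the empty fraction (lambda = -ln P0) and compares it with the single-polymerase optimum lambda = 1. Real loading reports deviate from Poisson (not every singly loaded ZMW is productive), so treat the output as a rough guide only. The function names and the tolerance value are hypothetical, not PacBio thresholds, and P0 must be given as a fraction (e.g. 0.62), not a percentage.

```python
import math
from typing import Tuple


def estimate_occupancy(p0: float) -> float:
    """Mean polymerases per ZMW implied by the empty (P0) fraction under Poisson loading."""
    if not 0.0 < p0 <= 1.0:
        raise ValueError("p0 must be a fraction in (0, 1]")
    return -math.log(p0)


def poisson_fractions(lam: float) -> Tuple[float, float, float]:
    """Expected (P0, P1, P>=2) fractions for mean occupancy lam."""
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return p0, p1, 1.0 - p0 - p1


def loading_call(p0: float, tolerance: float = 0.25) -> str:
    """Label a cell relative to the Poisson optimum lam = 1.

    The tolerance is an arbitrary, hypothetical cut-off, not a PacBio threshold.
    """
    lam = estimate_occupancy(p0)
    if lam < 1.0 - tolerance:
        return "underloaded"
    if lam > 1.0 + tolerance:
        return "overloaded"
    return "near optimum"


if __name__ == "__main__":
    # Illustrative P0 fractions only; read the real values off the loading report.
    for p0 in (0.90, math.exp(-1.0), 0.05):
        lam = estimate_occupancy(p0)
        _, p1, p2 = poisson_fractions(lam)
        print(f"P0={p0:.2f} lambda={lam:.2f} expected P1={p1:.2f} "
              f"P2+={p2:.2f} -> {loading_call(p0)}")
```

A cell with a very high P0 maps to a small lambda (underloaded, mostly empty ZMWs), while a very low P0 maps to a large lambda (overloaded); comparing the observed P1 with the expected P1 also shows how far a given cell is from this idealised model.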

Tags
low reads, pacbio, smrtanalysis





All times are GMT -8.

