I've been helping other people process PacBio data using the SMRTanalysis software. For our purposes, I've just been using the reads of insert protocol. This has worked great for PacBio data we have from several species, but I'm now having trouble with the data for one particular species: the protocol produces an order of magnitude fewer reads than expected (about 300-500 versus 5000+). Adjusting the quality and coverage parameters makes minimal difference. The data are originally from 2013, but data for another species sequenced at the same time appear to be fine.
Comparing the folders of the problematic species side-by-side with those of species where we had no issues, all the files appear to be present, and the raw data files are consistent in size. However, the generated ccs and subread fasta/fastq files that we received with the raw data are all an order of magnitude smaller for the problematic species, which leads me to believe that the problem lies not with the analysis but with the original sequencing process.
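To put numbers on that comparison rather than judging by file size alone, something like the following could count the records in each fasta/fastq file under a folder. This is only a minimal sketch: the function names `count_records` and `summarize` are made up for illustration, and it assumes uncompressed files, standard 4-line FASTQ records, and the usual `.fasta`/`.fa`/`.fastq`/`.fq` extensions.

```python
from pathlib import Path

# Extensions treated as sequence files (an assumption; adjust as needed).
FASTQ_EXTS = {".fastq", ".fq"}
FASTA_EXTS = {".fasta", ".fa"}


def count_records(path):
    """Count sequence records in an uncompressed FASTA or FASTQ file."""
    path = Path(path)
    with path.open() as handle:
        if path.suffix.lower() in FASTQ_EXTS:
            # Assumes 4 lines per record (no line-wrapped sequences).
            return sum(1 for _ in handle) // 4
        # FASTA: each record has exactly one header line starting with ">".
        return sum(1 for line in handle if line.startswith(">"))


def summarize(folder):
    """Return {filename: record_count} for all FASTA/FASTQ files under folder."""
    wanted = FASTQ_EXTS | FASTA_EXTS
    return {
        p.name: count_records(p)
        for p in sorted(Path(folder).rglob("*"))
        if p.is_file() and p.suffix.lower() in wanted
    }
```

Running `summarize()` on the folder for the problematic species and on the folder for a good species would give per-file read counts that can be compared directly.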
So, the question: what could have gone wrong such that the raw .h5 data files all appear to be a typical size (~1GB each), but the analysis software is only detecting <10% of the expected reads? Coverage and quality seem fine for the reads that it does detect.
Thank you.