Seqanswers Leaderboard Ad

**krobison** · 06-21-2013, 05:04 AM

Base call accuracy in PacBio should be independent of read position; I believe this has been shown but don't know the reference.

What fold coverage of your sample do you have? Have your tried using HGAP instead of PacBioToCA? Have you tried Quiver?

**mchaisso** · 06-21-2013, 06:37 AM

Actually, you start to get closer to correct assemblies using greedy overlap (forming unitigs), so in a sense it is less algorithmically intense to assemble. Finding the overlaps is a bit more complicated than building a hash table (as in de Bruijn assembly), but it is just pairwise alignment and fast.

-mark

Originally posted by pag View Post

PB Disadvantages
Lower individual read accuracy - more read-to-read variability
More algorithmically intense to assemble

**pag** · 06-21-2013, 06:43 AM

Fold coverage from MIRA during a mapping assembly to a single chromosome (manually assembled from MIRA and newbler runs, along with some Sanger data):
Coverage assessment (calculated from contigs >= 5000):
=========================================================
Avg. total coverage: 61.41
Avg. coverage per sequencing technology
454: 29.99
PcBioHQ: 30.47

denovo assembly using MIRA or PacBioToCA appeared to "split" too much rather than "lump": basically generating multiple contigs that were covering the same region of the genome. MIRA in particular appeared to then erroneously join across long repeats where there was no evidence that two (non-repetitive) regions of the genome were actually adjacent to the same copy of the repeat.

I have not used HGAP, as I understand that would work with only the PacBio data and not the 454. Quiver I have not used yet as I do not have access to an installation of the SMRTPortal and "RS_Mapping_QVs." It is my understanding that Quiver operates on the data set of filtered reads from PacBio, not the PacBioToCA-corrected reads. Is this correct?

**dalexander** · 06-21-2013, 11:01 AM

Quiver author here. You are correct that Quiver works from the raw PacBio (sub)reads, not corrected reads, and you need to provide it with a cmp.h5 file with all QVs loaded for best results.

To get this cmp.h5 file without SMRTportal, you will have to (presently) use
- compareSequences [to create the cmp.h5 without QVs], and
- loadPulses [to import the QV tracks into the cmp.h5]

Both of these tools are in the SMRTanalysis package.

In the near future, we will be releasing a new tool called pbalign that makes this procedure easier, more automatic.

Feel free to message me and we can help walk you through the exact details of using compareSequences, loadPulses ... and then I can add these details to the docs on GitHub.

**dalexander** · 06-21-2013, 12:01 PM

---deleted---

**pag** · 06-21-2013, 06:44 PM

I believe I have Quiver installed now, but I get the following perplexing error
TypeError: run() takes exactly 1 argument (2 given)
File "smrtanalysis-2.0.1/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/compareSequences.py", line 895, in run
service.run( self.options.tmpDir )

adding
log.debug( self.options.tmpDir ) immediately above gives "/scratch"
specifying self.options.tmpDir[1] only sends the "s" character, but I get the same error with run. How is one character 2 arguments? Also, why are we telling a directory to run?

Edit: this was without options --algorithm=blasr --h5fn=aligned.cmp.h5 specified

**pag** · 06-21-2013, 07:27 PM

running compareSequences.py --algorithm=blasr --h5fn=aligned.cmp.h5 --debug --circularReference filtered_subreads.fasta reference_genome.fasta worked

Edit: running loadPulses on the reads pre-filter seemed to "work" but see next post.

but then loadPulses blah_s1_p0.bas.h5 aligned.cmp.h5
WARNING: There is insufficient data to compute metric: ClassifierQV in the file blah_s1_p0.bas.h5 It will be ignored.
WARNING: There is insufficient data to compute metric: pkmid in the file blah_s1_p0.bas.h5 It will be ignored.
loading 33396 alignments for movie 1
ERROR, the query sequence does not match the aligned query sequence.
HoleNumber: 46311, MovieName: blah_s1_p0, ReadIndex: 46311, qStart: 17, qEnd: 2852

**pag** · 06-22-2013, 02:50 AM

after reconfiguring SWIG and several other items, I get
"Input CmpH5 file must be sorted." when running Quiver

I cannot seem to locate a sort script. Also, is it supposed to sort by quality, start position on the map, or by what criteria?

**GenoMax** · 06-22-2013, 03:51 AM

Originally posted by dalexander View Post

I guess there isn't messaging functionality in this forum

Look at the top right where you see your name "dalexander". "private message" link is right below it.

People can use your forum name to email you privately.

If you want to take out your email address feel free.

**dalexander** · 06-23-2013, 08:45 AM

Hi Matt, so it looks like you've got quiver and SMRTanalysis installed andyour current issue is sorting the cmp.h5.

In your SMRTanalysis install, there is a program called cmph5tools.py. If you do

$ cmph5tools.py sort my.cmp.h5

it should produce a file out.cmp.h5 that is sorted.

Dave

**pag** · 06-23-2013, 12:25 PM

That didn't appear to generate a new cmp.h5, but rather altered the existing one (aligned5; at least I think so based on the timestamp)

I then tried
loadPulses blah.bas.h5 aligned5.cmp.h5
WARNING: There is insufficient data to compute metric: ClassifierQV in the file blah.bas.h5 It will be ignored.
WARNING: There is insufficient data to compute metric: pkmid in the file blah.bas.h5 It will be ignored.

I also tried with switches -metrics QualityValue,ClassifierQV,InsertionQV,MergeQV,DeletionQV,SubstitutionQV,ClassifierQV

When trying to run Quiver, I get:
2013-06-23 13:20:26,158 [WARNING] This .cmp.h5 file lacks some of the QV data tracks that are required for optimal performance of the Quiver algorithm. For optimal results use the ResequencingQVs workflow in SMRTPortal with bas.h5 files from an instrument using software version 1.3.1 or later.

**pag** · 06-23-2013, 02:41 PM

looks like I can get quiver to run if I specify every single track...I'm not sure which are the important ones other than the QV ones.

however, it runs in about 2 seconds flat, which seems like it probably didn't do anything other than generate the fasta file. I had been specifying my consensus in the parameters (-r), not my entire assembly. As my assembly was performed in MIRA and consed on PacBioToCA error-corrected filtered reads (and renamed), what file should be used?

Or did one of the other programs already perform the polishing and quiver just put it in fasta format?

**dalexander** · 06-23-2013, 02:55 PM

Try running quiver with -v to see what it is doing.

The QVs you need for quiver are listed here:

Build software better, together

https://github.com/PacificBiosciences/GenomicConsensus/blob/master/doc/QuiverFAQ.rst#what-are-these-qvs-that-quiver-uses

GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.

In particular, it looks like you are missing the SubstitutionTag.

**pag** · 06-23-2013, 04:12 PM

quiver -v -j4 cp6.cmp.h5 -r consensus.fasta -o quiver-assembly.fasta
2013-06-23 17:03:40,711 [INFO] h5py version: 2.0.1
2013-06-23 17:03:40,711 [INFO] hdf5 version: 1.8.4
2013-06-23 17:03:40,714 [INFO] ConsensusCore version: 0.6.1
2013-06-23 17:03:40,714 [INFO] Starting.
2013-06-23 17:03:41,224 [INFO] Peeking at CmpH5 file cp6.cmp.h5
2013-06-23 17:03:41,225 [INFO] Input CmpH5 data: numAlnHits=28557
2013-06-23 17:03:41,225 [INFO] Loading reference
2013-06-23 17:03:41,437 [INFO] Loaded 1 of 1 reference groups from consensus.fasta
2013-06-23 17:03:41,491 [INFO] Using Quiver parameter set unknown.AllQVsModel
2013-06-23 17:03:41,506 [INFO] Available CPUs: 4
2013-06-23 17:03:41,506 [INFO] Requested workers: 4
2013-06-23 17:03:41,506 [INFO] Parallel Mode: Process
2013-06-23 17:03:41,560 [INFO] Launched compute slaves.
2013-06-23 17:03:41,563 [INFO] Launched collector slave.
2013-06-23 17:03:41,648 [INFO] Quiver operating on (consensus fasta):14-112
2013-06-23 17:03:41,649 [INFO] Quiver operating on (consensus fasta):928102-928132
2013-06-23 17:03:41,651 [INFO] Usable coverage in (consensus fasta):14-112: [(14, 107)]
2013-06-23 17:03:41,651 [INFO] Quiver operating on (consensus fasta):1121891-1121935
2013-06-23 17:03:41,652 [INFO] Usable coverage in (consensus fasta):928102-928132: [(928107, 928127)]
2013-06-23 17:03:41,664 [INFO] Usable coverage in (consensus fasta):1121891-1121935: [(1121896, 1121919), (1121919, 1121935)]
2013-06-23 17:03:41,664 [INFO] Quiver operating on (consensus fasta):1121949-1121964
2013-06-23 17:03:41,668 [INFO] Usable coverage in (consensus fasta):1121949-1121964: [(1121949, 1121962)]
2013-06-23 17:03:44,407 [INFO] Quiver operating on (consensus fasta):1198804-1198837
2013-06-23 17:03:44,415 [INFO] Usable coverage in (consensus fasta):1198804-1198837: [(1198809, 1198829)]
2013-06-23 17:03:47,709 [INFO] Analysis completed.
2013-06-23 17:03:47,709 [INFO] Output files completed.
2013-06-23 17:03:48,625 [INFO] Finished.

what does this mean? only very restricted regions were usably covered? Or these were the only regions it saw fit to correct? Edit: setting -X 5 or --minCoverage 5 doesn't seem to make a difference.

Topics	Statistics	Last Post
A Closer Look at the Enigmatic Genomes of Oikopleura dioica by seqadmin Started by seqadmin, Yesterday, 06:35 AM	0 responses 14 views 0 likes	Last Post by seqadmin Yesterday, 06:35 AM
Advanced Epigenome Editing Platform Explores Gene Regulation Mechanisms by seqadmin Started by seqadmin, 05-09-2024, 02:46 PM	0 responses 19 views 0 likes	Last Post by seqadmin 05-09-2024, 02:46 PM
Telomere Maintenance by PARP1: A New Perspective in Cancer Research by seqadmin Started by seqadmin, 05-07-2024, 06:57 AM	0 responses 18 views 0 likes	Last Post by seqadmin 05-07-2024, 06:57 AM
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, 05-06-2024, 07:17 AM	0 responses 19 views 0 likes	Last Post by seqadmin 05-06-2024, 07:17 AM

Seqanswers Leaderboard Ad

Announcement

Systematic errors in PacBio?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News