Seqanswers Leaderboard Ad

**rhall** · 08-05-2014, 10:12 AM

Homopolymers errors such as this should not effect mapping, they are present because quiver has not had a chance to correct them, they are the most common error in the consensus called during assembly.

1) what causes quiver to produce a consensus with areas of lowercase bases, even if subreads unambiguously map to these regions without dips or spikes in coverage? Did these areas not get polished at all--and if so, how can I determine consensus accuracy for the genome if indels might still be present in lowercase regions?

By default a lowercase base indicates no consensus call has been made and the base is just the reference sequence, this occurs when a 500bp region does not have any unambiguously mapped spanning reads. I'm don't know how this would happen if there is coverage and high mapqv. One possibility is that the mapping all terminate at a single point, i.e. the reads have no evidence for the reference to be joined at that position, the coverage and mapQV could look good, but the alignment should show the blunt join. Is the lowercase region in the middle of the sequence? before circularization the end of the sequence will be lowercase due to the overlap and ambiguous mapping.
If the lowercase is due to ambiguous mapping, most likely, then the option to ignore ambiguous mapped reads can be turned off in the protocol settings.

2) do lowercase bases in the consensus.fasta output factor into the calculation of consensus accuracy?

The consensus (concordance) accuracy is calculated between the reference and the consensus, but I'm not sure if lowercase bases in the consensus sequence are ignored, I'll have to check.

3) does quiver ignore lowercase bases in a reference that is uploaded for the RS_Resequencing pipeline--meaning should I make sure all bases are uppercase before uploading a reference to SMRTportal?

No lowercase bases are not treated any differently when uploaded as a reference.

**jbadalam** · 08-05-2014, 12:37 PM

The lowercase regions all occur within the sequence. One HGAP run produced a single contig with self-overlapping ends which I removed before uploading to SMRT Portal as a reference for running subsequent passes through quiver. I iterated a few times to get the consensus QV >50 and after each pass the same regions came back lowercase in the consensus, even if I changed to uppercase before uploading the reference.

The lowercase regions occur throughout the assembly, probably 10-15 or so per megabase. All of them are spanned by unambiguously mapped reads, most of which are significantly longer than the lowercase regions themselves, so I highly doubt these are misassemblies, especially since BLASTing them with flanking uppercase sequence hits the published reference genome across the entire query.

Is it possible that the chemistry information provided in the raw data is different from what was actually used during the sequencing run? I know that quiver behaves differently with different chemistries.

**rhall** · 08-05-2014, 12:48 PM

The lowercase is independent of chemistry, and will only occur if the region does not have highQV spanning reads. Try running with allow ambiguous mapped reads to see if the lowercase bases disappear. Are the lowercase regions always 500bp?

**jbadalam** · 08-05-2014, 01:29 PM

Thanks, I'll try allowing ambiguously mapped reads and post back.

I didn't count the lengths of every occurrence, only the first 10 or so, and yes, they are all exactly 500 bp.

**jbadalam** · 08-12-2014, 01:51 PM

I've looked into this further, and what appears to be happening is that quiver is calling variants (quite a few of them on the first pass, as expected), but not capturing all of these variants in the output consensus.fasta file. For example, if quiver calls 297 variants (284 insertions, 13 deletions), I would assume the total length of the consensus sequence should increase by 271 bp relative to the reference - is this correct? On another pass I actually noticed the consensus get smaller despite ~50 net insertions being called. This has happened for multiple genomes and so I don't think that one particular dataset is causing this odd behavior.

Subsequent quiver passes (taking the consensus from one as the input to the next) show a similar trend in that the consensus sequence length does not change as predicted by the variants called. And while the 500-bp lowercase regions keep showing up, their position and number appear to change randomly. Some are 1000 bp and in total they comprise about 1% of the genome.

I'm wondering what might be causing quiver to defer to the reference for consensus calling at these regions given that their position along the genome seems to be arbitrary with each pass. I haven't checked each lowercase region but for the ones I have I see plenty of coverage in spanning reads and no breakpoints at the lowercase junction.

For one genome, essentially all of the variants that quiver called appear to be correct (backed up by short reads), but in another, it seems to be missing 2/3 variants (if the short read data is to be trusted)--and only some, but not all, of the variants that do get called are showing up in the consensus.

**rhall** · 08-12-2014, 03:46 PM

Variants can be multiple bases.
The 500 multiple lowercase regions are due to low mapQV reads, probably due to repetitive elements. These change in multiple rounds because you are not converging on the correct consensus.

**jbadalam** · 08-13-2014, 05:59 PM

Unfortunately that doesn't really explain everything I'm seeing. On the first quiver pass, there were 365 total variants, of which 17 were single base deletions, and 346 were single base insertions. The other two variants were multiple base insertions totaling 5 bp. The reference length was 3819884 bp, and the quiver consensus was 3820109 bp. So while the consensus did expand by 225 bp, shouldn't it have been 334 bp longer? Maybe I'm missing something?

The 500-bp lowercase regions are showing up where I have >130x spanning coverage of high map QV reads. Some might have 1 or 2 low map QV reads but this is also true for other parts of the genome that aren't lowercase.

The positions of repeat sequences (>500 bp, >85% identity, determined with nucmer) have no correlation with the location of lowercase regions. The repeats are also spanned by high map QV reads.

If I re-run quiver (taking the consensus as the new reference), the 500-bp lowercase regions move to other non-repetitive regions of the genome that have similar coverage of spanning high map QV reads. At this point I'm lost as to whether I might benefit from having more data for polishing or if this is a bug.

**rhall** · 08-13-2014, 08:07 PM

To double check, this is within SMRT Portal? What version are you using? If you don't mind sharing your data, I'd gladly take a look to see if we can get to the bottom of this strange behavior. I'll send you a message with my email.

**jbadalam** · 08-14-2014, 06:13 AM

Thanks! Will send my data over to you.

Yes, this is within SMRT Portal. Since I work only with microbial genomes we are running it locally on one standalone machine (8 cores, 32 Gb RAM). I am using:
SMRTAnalysis version v2.2.0.p3 build 137015
Daemon version v2.2.0 build 132105
SMRTpipe version v2.2.0 build 132739
SMRT Portal version v2.2.0 build 133335
SMRT View version v2.2.0 build 132578

**rhall** · 08-25-2014, 04:41 PM

Keeping the thread up to date.
The problems with the consensus resulted from quiver not converging due to low quality data, likely resulting from the overloading of sample on SMRT Cells, http://www.pacb.com/samplenet/SMRTCe...mendations.pdf. Updates have been made to the quiver code that result in better performance on data such as this, https://github.com/PacificBioscience...ster/CHANGELOG.

**Marcela Uliano** · 03-21-2016, 06:02 AM

Hey guys,

I have a related question, although it's different case, maybe you will be able to give me some inputs.
So, I have a hybrid assembly (illumina + pacbio subreads - DBG2OCL) for an eukaryotic genome, and now I'm in doubt if I should quiver this assembly or not.
Once, as I read, quiver calls consensus based on the pacbio quality of the aligned PacBio, what it would do to the regions that are covered only by my Illumina contigs?
Does it make sense to run quiver in a hybrid assembly at all?

Thanks a lot for the help! =)

**GenoMax** · 03-21-2016, 08:40 AM

Originally posted by Marcela Uliano View Post

Hey guys,

I have a related question, although it's different case, maybe you will be able to give me some inputs.
So, I have a hybrid assembly (illumina + pacbio subreads - DBG2OCL) for an eukaryotic genome, and now I'm in doubt if I should quiver this assembly or not.
Once, as I read, quiver calls consensus based on the pacbio quality of the aligned PacBio, what it would do to the regions that are covered only by my Illumina contigs?
Does it make sense to run quiver in a hybrid assembly at all?

Thanks a lot for the help! =)

For reference cross-posted: https://www.biostars.org/p/182647/

**feixue1039** · 04-16-2017, 05:01 AM

I find that after Quiver, sequence length of each contig increases ~0.12%. If you are looking at the the mapping result with IGV and load the reference sequence (before polish analysis) and pbalign output bam file as input, you may need to take the coordinate shift into account.

Topics	Statistics	Last Post
The Role of Spliceosomes in RNA Splicing and Genome Evolution by seqadmin Started by seqadmin, Today, 07:03 AM	0 responses 10 views 0 likes	Last Post by seqadmin Today, 07:03 AM
A Closer Look at the Enigmatic Genomes of Oikopleura dioica by seqadmin Started by seqadmin, 05-10-2024, 06:35 AM	0 responses 31 views 0 likes	Last Post by seqadmin 05-10-2024, 06:35 AM
Advanced Epigenome Editing Platform Explores Gene Regulation Mechanisms by seqadmin Started by seqadmin, 05-09-2024, 02:46 PM	0 responses 41 views 0 likes	Last Post by seqadmin 05-09-2024, 02:46 PM
Telomere Maintenance by PARP1: A New Perspective in Cancer Research by seqadmin Started by seqadmin, 05-07-2024, 06:57 AM	0 responses 33 views 0 likes	Last Post by seqadmin 05-07-2024, 06:57 AM

Seqanswers Leaderboard Ad

Announcement

lowercase output from quiver

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News