
**GenoMax** · 05-07-2015, 06:22 AM

By your own admission the Illumina data is not that good, but what happens when you try to align the Illumina reads to the original PacBio assembly? Can you provide some stats?

**DrSpace** · 05-07-2015, 06:45 AM

We did aggressively quality-trim the reads, so I was hoping they wouldn't bring down the quality of the PacBio-only assembly this much. PBcR only uses the Illumina data for PacBio error correction, correct? Would this indicate that the original PacBio-only assembly was not completely accurate? Or should the Illumina data, even aggressively quality-trimmed, just be ignored?

Here are alignment results against the PacBio assembly:

Code:

2056898 reads; of these:
  2056898 (100.00%) were paired; of these:
    155844 (7.58%) aligned concordantly 0 times
    1826141 (88.78%) aligned concordantly exactly 1 time
    74913 (3.64%) aligned concordantly >1 times
    ----
    155844 pairs aligned concordantly 0 times; of these:
      32818 (21.06%) aligned discordantly 1 time
    ----
    123026 pairs aligned 0 times concordantly or discordantly; of these:
      246052 mates make up the pairs; of these:
        137416 (55.85%) aligned 0 times
        91214 (37.07%) aligned exactly 1 time
        17422 (7.08%) aligned >1 times
96.66% overall alignment rate
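The summary above looks like bowtie2's paired-end report. As a sanity check, here is a minimal sketch that recomputes the "overall alignment rate" from the posted counts, assuming (as the breakdown implies) that every read pair contributes two mates. The function and argument names are illustrative, not part of any tool.

```python
# Minimal sketch: recompute the overall alignment rate from the paired-end
# summary counts posted above. Each read pair contributes two mates.

def overall_alignment_rate(pairs, conc_once, conc_multi,
                           disc_once, mates_once, mates_multi):
    """Fraction of mates aligned, following the summary's breakdown."""
    total_mates = 2 * pairs
    aligned_mates = (
        2 * (conc_once + conc_multi)   # both mates of concordant pairs
        + 2 * disc_once                # both mates of discordant pairs
        + mates_once + mates_multi     # single mates from otherwise unaligned pairs
    )
    return aligned_mates / total_mates


stats = dict(pairs=2056898, conc_once=1826141, conc_multi=74913,
             disc_once=32818, mates_once=91214, mates_multi=17422)
rate = overall_alignment_rate(**stats)
concordant = (stats["conc_once"] + stats["conc_multi"]) / stats["pairs"]
print(f"overall: {rate:.2%}, concordant pairs: {concordant:.2%}")
```

This prints `overall: 96.66%, concordant pairs: 92.42%`, matching the reported figure. So roughly 7.6% of pairs do not align concordantly to the PacBio assembly, which is where the two data sets disagree.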

**GenoMax** · 05-07-2015, 07:17 AM

Originally posted by DrSpace

We did aggressively quality-trim the reads, so I was hoping they wouldn't bring down the quality of the PacBio-only assembly this much. PBcR only uses the Illumina data for PacBio error correction, correct? Would this indicate that the original PacBio-only assembly was not completely accurate? Or should the Illumina data, even aggressively quality-trimmed, just be ignored?

I was thinking that your PacBio assembly is of reasonably good quality. I wonder if the Illumina data is actually causing a problem.

**DrSpace** · 05-07-2015, 07:31 AM

I thought that, in general, one should not use an assembly built only from PacBio data because of its higher error rate. Maybe the self-correction steps used in most PacBio assemblies make up for this more than I imagined. Forgive my ignorance on the topic; I primarily work with Illumina data. I suppose I should also try running PBcR on the PacBio data by itself and see what happens.

**GenoMax** · 05-07-2015, 09:36 AM

HGAP handles all of that internally: https://github.com/PacificBioscience...-SMRT-Analysis

I am not sure whether PBcR would improve things, but if you have the time you could try it.

**flxlex** · 05-07-2015, 10:45 PM

Originally posted by DrSpace

Before obtaining the Illumina data, our collaborators assembled the PacBio data by itself with HGAP3 into one contig at ~6.7MB.

(This answers some of the previous posts.)

Originally posted by DrSpace

But with PacBio's high error rate, we wanted to correct with our Illumina data.

You already seem to have a very good assembly. I recommend trying Pilon http://www.broadinstitute.org/software/pilon/ to polish your assembly and remove any leftover errors.
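A typical Pilon round aligns the Illumina pairs to the draft, sorts and indexes the BAM, then runs Pilon with the BAM as `--frags`. Below is a minimal sketch that only builds the command lines. It assumes bowtie2 (the aligner the stats above appear to come from) and samtools, and all file names, the 16G heap, and the thread count are hypothetical placeholders.

```python
import shlex

def pilon_polish_commands(assembly, reads_1, reads_2, prefix="pilon",
                          threads=8, java_mem="16G", pilon_jar="pilon.jar"):
    """Build (not run) the commands for one Pilon polishing round.

    All paths and resource settings are hypothetical placeholders.
    """
    index = f"{prefix}_idx"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Index the PacBio draft for bowtie2.
        ["bowtie2-build", assembly, index],
        # Align the paired Illumina reads to the draft.
        ["bowtie2", "-x", index, "-1", reads_1, "-2", reads_2,
         "-p", str(threads), "-S", sam],
        # Coordinate-sort and index the alignments for Pilon.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # Polish; --changes writes a <prefix>.changes file listing every edit.
        ["java", f"-Xmx{java_mem}", "-jar", pilon_jar,
         "--genome", assembly, "--frags", bam,
         "--output", prefix, "--changes"],
    ]

if __name__ == "__main__":
    for cmd in pilon_polish_commands("pacbio_assembly.fasta",
                                     "miseq_R1.fastq", "miseq_R2.fastq"):
        print(shlex.join(cmd))
```

To actually run the steps, pass each list to `subprocess.run(cmd, check=True)`. Keeping them as argument lists avoids shell-quoting problems with odd file names.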

**DrSpace** · 05-11-2015, 05:45 AM

Originally posted by flxlex

(this answers some of the previous posts).

You already seem to have a very good assembly. I recommend trying Pilon http://www.broadinstitute.org/software/pilon/ to polish your assembly and remove any leftover errors.

Thanks for the tip, I'll give that a try.

**fanli** · 06-11-2015, 12:17 PM

We recently came across this same issue. The "PacBio only" assembly turned out to be far superior to any hybrid scheme (using either SPAdes or PBcR).

We tried using Pilon to polish the PacBio assemblies and got some interesting results. In the Pilon *.changes files there are lots of single-base G/C insertions (MiSeq relative to PacBio). Is this known behavior? Forgive me, this is my first experience with PacBio data.
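To quantify a pattern like this, you can tally the single-base insertions in a .changes file by inserted base. A minimal sketch, assuming each line has four whitespace-separated fields (original coordinates, new coordinates, original sequence, new sequence) and that Pilon writes `.` for an empty sequence, so an insertion has `.` as its original sequence. Check this format assumption against your own file.

```python
from collections import Counter

def single_base_insertions(lines):
    """Count 1-bp insertions by inserted base in Pilon .changes lines.

    Assumed line format: <orig coords> <new coords> <orig seq> <new seq>,
    with '.' standing for an empty sequence.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        orig, new = fields[2], fields[3]
        if orig == "." and len(new) == 1:
            counts[new.upper()] += 1
    return counts

def gc_share(counts):
    """Fraction of single-base insertions that are G or C."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0
```

Usage on the attached file: `with gzip.open("Pilon.21699.changes.gz", "rt") as fh: print(single_base_insertions(fh))` (after `import gzip`). A G/C share well above the genome's GC content would suggest the insertions are systematic rather than random.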

Thanks,
Fan

Attached Files

Pilon.21699.changes.gz (626 Bytes)

**colindaven** · 06-11-2015, 12:40 PM

PacBio gives some excellent assemblies, as you have all seen. I actually just use PacBio data alone in PBcR, and it gives excellent results, better (in terms of fewer contigs) than HGAP3 in many cases.
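Comparing assemblies "in terms of fewer contigs" is usually done with contig count, total length, and N50. A minimal sketch of those statistics from FASTA text follows. It uses a plain-Python parser (no Biopython assumed) and the standard N50 definition: the length of the contig at which the cumulative length, sorted longest first, reaches half the assembly size.

```python
def read_fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def n50(lengths):
    """Length L such that contigs >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

def summarize(lengths):
    """(contig count, total bases, N50) for quick side-by-side comparison."""
    return len(lengths), sum(lengths), n50(lengths)
```

Run `summarize(read_fasta_lengths(open(path)))` on the PBcR and HGAP3 outputs and compare the tuples. Fewer contigs at a similar total length is the improvement described above.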

You could try rerunning Quiver on the assembly to correct any further errors; there are good docs on this on the PacBio site.

If you insist on using the Illumina data (for correction, not for assembly), why not just align classically and call SNP/indel differences? Then eyeball the differences.
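Before eyeballing, it helps to know how the called differences split between substitutions and indels. A minimal sketch that tallies the data lines of a VCF from any variant caller, using only the standard REF (column 4) and ALT (column 5) fields. Multi-allelic ALTs are split on commas, and symbolic or missing alleles are lumped into "other".

```python
def classify_vcf(lines):
    """Count SNPs, insertions, deletions and other variants in VCF lines."""
    counts = {"snp": 0, "insertion": 0, "deletion": 0, "other": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            if alt in (".", "*") or alt.startswith("<"):
                counts["other"] += 1      # missing, spanning, or symbolic allele
            elif len(ref) == 1 and len(alt) == 1:
                counts["snp"] += 1
            elif len(alt) > len(ref):
                counts["insertion"] += 1
            elif len(alt) < len(ref):
                counts["deletion"] += 1
            else:
                counts["other"] += 1      # e.g. multi-base substitution
    return counts
```

If insertions dominate and most are single G/C bases, that points back at the same pattern seen in the Pilon .changes files above.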

Pilon is also a good choice.

If this is a bacterium, you can also do a draft annotation and check for frequent frameshifts caused by indels.
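The frameshift check boils down to one rule: an indel inside a coding sequence whose length change is not a multiple of 3 shifts the reading frame. A minimal sketch, assuming you already have indels as (contig, 1-based position, REF, ALT) and CDS intervals as (contig, start, end) from your annotation. The `Indel` type and function names are hypothetical, and the position test is approximate because VCF anchors indels on the preceding base.

```python
from typing import NamedTuple

class Indel(NamedTuple):
    contig: str
    pos: int   # 1-based position of the REF allele (VCF convention)
    ref: str
    alt: str

def frameshift_indels(indels, cds):
    """Return indels inside a CDS whose length change is not a multiple of 3.

    cds: iterable of (contig, start, end), 1-based inclusive intervals.
    """
    cds = list(cds)
    hits = []
    for v in indels:
        delta = len(v.alt) - len(v.ref)
        if delta % 3 == 0:
            continue  # in-frame (or not an indel at all)
        for contig, start, end in cds:
            if v.contig == contig and start <= v.pos <= end:
                hits.append(v)
                break
    return hits
```

Many genes flagged this way in a draft annotation would suggest residual indel errors in the assembly rather than real biology.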


PBcR hybrid assembly of 6.5MB genome with PacBio and MiSeq
