SEQanswers

11-14-2012, 02:09 PM   #1
zszong@hotmail.com
Member
 
Location: Burnaby

Join Date: Sep 2012
Posts: 17
PacBio sequence error correction

Hi all,

I have some PacBio long-read data, about 10x coverage of a 120 Mb genome. I already have a reference genome, but it is incomplete and contains many gaps. I am trying to error-correct my PacBio reads and assemble the genome. Later on I will add more Illumina data to try to close the gaps.

My question about the error correction is: can I use the incomplete reference genome to error-correct my PacBio data? My plan is to convert the genome FASTA into the frg format that pacBioToCA requires, and then feed my PacBio data and the genome frg data to the correction pipeline to produce error-corrected reads. My concern is: will pacBioToCA accept relatively long genome scaffold sequences as the high-identity sequence for correcting my PacBio data?

Suggestions and help are greatly appreciated.

Stuart
11-14-2012, 06:52 PM   #2
zszong@hotmail.com
Member
 
Location: Burnaby

Join Date: Sep 2012
Posts: 17

I have not been able to figure out how to use the incomplete reference genome for error correction. It looks like fastqToCA converts a FASTQ file to an frg file so that it can be used as the high-identity sequence for error correction. However, the incomplete genome assembly is a FASTA file, and it has no quality score files. How can I get around this?
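
One possible workaround for the missing quality scores, sketched here under stated assumptions, is to write the FASTA records out as FASTQ with a constant placeholder quality. The function name `fasta_to_fastq` and the choice of `I` (Q40 in Phred+33 encoding) are illustrative assumptions, not something confirmed in this thread, and the sketch does not establish whether pacBioToCA will accept scaffold-length sequences as input.

```python
import io

# Assumption: "I" is Phred+33 for Q40. The value is arbitrary because a
# FASTA assembly carries no real per-base quality information.
PLACEHOLDER_QUAL = "I"


def fasta_to_fastq(fasta_handle, fastq_handle, qual_char=PLACEHOLDER_QUAL):
    """Write each FASTA record as a FASTQ record with constant quality."""
    name, chunks = None, []

    def flush():
        # Emit the record collected so far, if any.
        if name is not None:
            seq = "".join(chunks)
            fastq_handle.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")

    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            header = line[1:].split()
            # Keep only the first word of the header as the read name.
            name, chunks = (header[0] if header else "unnamed"), []
        else:
            chunks.append(line)
    flush()
```

With real files this would be called as `fasta_to_fastq(open("draft.fasta"), open("draft.fastq", "w"))`, where both file names are hypothetical. The placeholder qualities only satisfy the format; they say nothing about how reliable the draft assembly actually is.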

Many thanks!

Stuart
11-15-2012, 08:06 AM   #3
mchaisso
Member
 
Location: Seattle, WA

Join Date: Apr 2008
Posts: 84

Perhaps use the pbjelly pipeline to fill gaps? Also, with an appropriate pipeline (quiver: https://github.com/PacificBiosciences/GenomicConsensus) you may not need error correction to call accurate consensus.

cheers,
-mark
11-20-2012, 07:24 PM   #4
zszong@hotmail.com
Member
 
Location: Burnaby

Join Date: Sep 2012
Posts: 17

Thanks for the tips, Mark! It looks like it will take me a while to figure this out. However, it sounds interesting that I might not need to do error correction for PacBio data, since it has a 15% error rate.

Stuart
11-22-2012, 06:56 AM   #5
jbingham
Member
 
Location: Silicon Valley

Join Date: Jul 2011
Posts: 24

Some more tips: if you want to use pacBioToCA, the approach would be to use the raw Illumina data as input to the correction step, not the draft assembly. The advantage of going back to the raw data is you may be able to correct assembly errors. The disadvantage is it takes longer to run.

If you want to keep the assembly as is, you can install SMRT Analysis and use AHA (a hybrid assembler) to scaffold it, provided your genome is less than about 200 Mb. For larger genomes, or to really focus on the gap-filling, you can use pbjelly.
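
To see which side of that size threshold a draft assembly falls on, and how much gap-filling there is to do, a quick tally of total bases and gaps can help. The following is a minimal sketch that assumes gaps are represented as runs of `N` in the scaffold FASTA, which is a common convention but not something stated in this thread; the function names are hypothetical.

```python
import re

# Assumption: a gap is any run of one or more N/n characters in a scaffold.
GAP_RUN = re.compile(r"[Nn]+")


def scaffold_sequences(fasta_lines):
    """Yield one concatenated sequence string per FASTA record."""
    chunks = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks is not None:
                yield "".join(chunks)
            chunks = []
        elif line and chunks is not None:
            chunks.append(line)
    if chunks is not None:
        yield "".join(chunks)


def assembly_stats(fasta_lines):
    """Return total bases, number of gaps, and total gap bases."""
    total_bases = gap_count = gap_bases = 0
    for seq in scaffold_sequences(fasta_lines):
        total_bases += len(seq)
        for run in GAP_RUN.finditer(seq):
            gap_count += 1
            gap_bases += len(run.group())
    return {
        "total_bases": total_bases,
        "gap_count": gap_count,
        "gap_bases": gap_bases,
    }
```

Passing an open file handle works as well as a list of lines, and `total_bases` can then be compared against the roughly 200 Mb figure mentioned above.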

Finally, the "no error correction" suggestion refers to the new algorithm HGAP: http://www.pacbiodevnet.com/hgap. You'll need more PacBio coverage to go that route. The benefit is you may be able to close more gaps and get a final result that's potentially as accurate as Sanger finishing.
11-22-2012, 08:17 AM   #6
zszong@hotmail.com
Member
 
Location: Burnaby

Join Date: Sep 2012
Posts: 17

Thanks for your tips, jbingham! I am in the process of generating short-read Illumina data for the error correction. I don't think I have enough coverage to try the new algorithm, since my PacBio data gives only 3-4x coverage when I look at it more carefully. The vast majority of the reads are shorter than 500-1000 bp, and the longest read is 13 kb. I will post my progress later.
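
A coverage and read-length check of this kind can be sketched as follows. This is a minimal illustration, not the method actually used in this thread: it assumes the reads are in FASTA format, the function names are hypothetical, and coverage is estimated simply as total read bases divided by genome size.

```python
def read_lengths(fasta_lines):
    """Return the length of every read in an iterable of FASTA lines."""
    lengths, current = [], None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths, genome_size, cutoffs=(500, 1000)):
    """Estimate coverage and the fraction of reads below each length cutoff."""
    total = sum(lengths)
    summary = {
        "reads": len(lengths),
        "total_bases": total,
        # Coverage estimate: total read bases divided by genome size.
        "coverage": total / genome_size,
        "longest": max(lengths, default=0),
    }
    for cutoff in cutoffs:
        below = sum(1 for n in lengths if n < cutoff)
        summary[f"fraction_below_{cutoff}"] = below / len(lengths) if lengths else 0.0
    return summary
```

For the genome discussed here, the call would look like `summarize(read_lengths(open("reads.fasta")), genome_size=120_000_000)`, where `reads.fasta` is a hypothetical file name and 120 Mb is the genome size given in the first post.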

Thanks again to Winsettz and jbingham for helping out here!

Stuart