SEQanswers

Old 07-29-2013, 12:55 PM   #1
horvathdp
Member
 
Location: Fargo

Join Date: Dec 2011
Posts: 66
Default Novice looking to use PacBio data

Hey all,

I am a complete novice with only the barest understanding of working with command-line interfaces (currently using resources at iPlant). I have a bunch of PacBio sequences and about 30X coverage with Illumina (100-base paired-end reads). I'd like to be able to correct the PacBio sequences with my Illumina reads. Anyone care to give me a step-by-step? Alternatively, I'd happily give authorship rights to anyone who wants to help me with my correction when I publish this work. I am dealing with a non-model invasive weed species (leafy spurge, Euphorbia esula), mostly trying to assemble gene space with promoters to help leverage a bunch of transcriptomics (microarray) data we have generated over the last 5 years.
Old 07-29-2013, 11:32 PM   #2
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Hello,

Seems like an interesting problem. Here is what you need to do.

(i) Please plot the k-mer distribution of the Illumina reads. I think your Illumina coverage (30X) is slightly on the low side, but we will not know until we see the chart. You can generate the k-mer distribution with SOAPdenovo, DSK (http://minia.genouest.org/dsk/), or many other k-mer counting packages.

http://www.homolog.us/blogs/blog/201...unting-k-mers/
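
For anyone wondering what a k-mer distribution actually contains, here is a minimal shell sketch. It is not how DSK or SOAPdenovo work internally. It assumes FASTA input with each sequence on a single line, and the function name kmer_histogram is made up for illustration. It keeps every k-mer in memory and ignores reverse complements, so it is only suitable for tiny test files.

```shell
#!/usr/bin/env bash
# Toy k-mer abundance histogram (illustrative only, not a real k-mer counter).
# Reads FASTA from stdin; assumes one sequence line per record.
# Output: "abundance<TAB>number of distinct k-mers seen that many times".
kmer_histogram() {
    local k="$1"
    awk -v k="$k" '
        /^>/ { next }                                  # skip FASTA header lines
        {
            seq = toupper($0)
            for (i = 1; i + k - 1 <= length(seq); i++)
                count[substr(seq, i, k)]++             # tally each k-mer
        }
        END {
            for (kmer in count) histo[count[kmer]]++   # abundance to distinct k-mer count
            for (a in histo) print a "\t" histo[a]
        }
    ' | sort -n
}
```

Calling it as `kmer_histogram 21 < small_test.fasta` (a hypothetical file name) prints the two columns you would plot, abundance on the x-axis and number of distinct k-mers on the y-axis. For real data, use a dedicated counter such as the ones mentioned above.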


(ii) Use any de Bruijn graph-based assembler to assemble the Illumina reads first, up to contig level. My favorites are SOAPdenovo (because it can handle paired-end reads) and Minia (http://minia.genouest.org/) for being lightweight. Ideally you should run the assembly at multiple k-mer values.


(iii) Once you have the Illumina reads assembled, use BLASR (a tool distributed by PacBio) to map the Illumina contigs onto the long PacBio reads.

Only once we have the results of this step can we talk about error correction of the PacBio reads.

Also check the following commentary and discussions in the comment sections.

http://www.homolog.us/blogs/blog/201...ly-thereafter/


If all those are too complicated, please email me at samanta at homolog.us, and we can discuss further.
__________________
http://homolog.us

Last edited by samanta; 07-30-2013 at 09:40 AM. Reason: error in text
Old 07-30-2013, 09:35 AM   #3
horvathdp
Member
 
Location: Fargo

Join Date: Dec 2011
Posts: 66
Default

Thanks for the reply! So, the readme file is really sparse, and I could not find a link to a manual (even in the associated paper published in BMC). Any chance you have a link to the manual? Also, let me run my process by you just to see if I am on the right track:

Open an instance in iPlant Atmosphere (Ubuntu or linuxbiocloud-32bit?).
Run the script:

wget -L "http://minia.genouest.org/dsk/dsk-1.5280.tar.gz"
tar -xzf dsk-1.5280.tar.gz
cd ./dsk
make

From here I am pretty lost. Without a manual, I am not even sure what format my input files need to be in, nor do I have a list of the arguments or the order in which they are supposed to be given. If you had a model script, I'd be most appreciative. From the paper, it seems clear that I need to convert my FASTQ files into FASTA, no problem there. However, it is unclear if the paired-end reads should/could be interlaced, or if I should/could combine my four libraries (two have inserts of about 270 bases and two have inserts of about 390 bases). Any thoughts or suggestions?
Old 07-30-2013, 08:26 PM   #4
samanta
Senior Member
 
Location: Seattle

Join Date: Feb 2010
Posts: 109
Default

Quote:
Originally Posted by horvathdp View Post
From here I am pretty lost. Without a manual, I am not even sure what format my input files need to be in, nor do I have a list of the arguments or the order in which they are supposed to be given. If you had a model script, I'd be most appreciative. From the paper, it seems clear that I need to convert my FASTQ files into FASTA, no problem there. However, it is unclear if the paired-end reads should/could be interlaced, or if I should/could combine my four libraries (two have inserts of about 270 bases and two have inserts of about 390 bases). Any thoughts or suggestions?
Oh well.

Please send me an email and I will try to walk you through the steps. Maybe we need to start in a different way.
__________________
http://homolog.us
Old 08-25-2013, 05:13 PM   #5
rchikhi
Member
 
Location: France

Join Date: Jan 2013
Posts: 13
Default

Quote:
Originally Posted by horvathdp View Post
wget -L "http://minia.genouest.org/dsk/dsk-1.5280.tar.gz"
tar -xzf dsk-1.5280.tar.gz
cd ./dsk
make

From here I am pretty lost. Without a manual, I am not even sure what format my input files need to be in, nor do I have a list of the arguments or the order in which they are supposed to be given. If you had a model script, I'd be most appreciative. From the paper, it seems clear that I need to convert my FASTQ files into FASTA, no problem there. However, it is unclear if the paired-end reads should/could be interlaced, or if I should/could combine my four libraries (two have inserts of about 270 bases and two have inserts of about 390 bases). Any thoughts or suggestions?
Hello,

I regret that you had issues running DSK. You were on the right track though.
If anyone reads this and wonders what the answers to his questions are:
  • The input data need not be FASTA. The README file provides some guidance:
    Quote:
    * File input can be fasta, fastq, gzipped or not.
    * To pass several files as input : create a file with the list of file names (one per line), and pass this file to dsk
  • Format of paired-end reads (interlaced or not), and whether to combine libraries with different insert sizes: how the reads are paired does not matter, because DSK sees the reads as a multiset of k-mers.
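
To make the "list of file names" point concrete, here is a small shell sketch that writes such a list, one file name per line. The helper name make_file_list and all file names are hypothetical; the exact dsk command line to run afterwards is not shown here and should be taken from the README.

```shell
#!/usr/bin/env bash
# Write the given input file names to a list file, one per line,
# as the README excerpt above describes for passing several files.
# Usage: make_file_list LIST_FILE READS_FILE [READS_FILE ...]
make_file_list() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_file_list LIST_FILE READS_FILE [READS_FILE ...]" >&2
        return 1
    fi
    local list="$1"
    shift
    printf '%s\n' "$@" > "$list"   # one file name per line
}
```

For four paired-end libraries you might run something like `make_file_list reads_list.txt lib270a_1.fastq lib270a_2.fastq lib390a_1.fastq lib390a_2.fastq` (placeholder names) and then pass reads_list.txt to dsk in place of a single input file. Since pairing does not matter for k-mer counting, there is no need to interlace the files or keep the libraries separate at this step.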

However, there are easier ways to correct PacBio reads using Illumina than re-inventing the wheel. There are at least two existing tools, PacBioToCA and LSC: