SEQanswers

Old 08-29-2012, 09:29 AM   #1
vincebaby6
Junior Member
 
Location: Sherbrooke

Join Date: May 2011
Posts: 8
Is it possible to convert FASTQ/FASTA files to HDF5 format?

Hi everyone!

I'm working on a method to create multiplexed PacBio sequencing libraries. I was able to demultiplex my PacBio reads, but I'm currently stuck at the error correction step. I tried pacBioToCA, but I had a lot of trouble running it. So I decided to try the P_ErrorCorrection module from the SMRT Analysis suite, but it only accepts HDF5 files. My problem is that I don't have those HDF5 files, because I used FASTQ files for my demultiplexing. So I was wondering if there is a way to create an HDF5 file from a FASTA or FASTQ file so that I can use it with the P_ErrorCorrection module.

Vince
Old 08-29-2012, 10:46 AM   #2
jbingham
Member
 
Location: Silicon Valley

Join Date: Jul 2011
Posts: 24

There is currently no easy way to generate a PacBio HDF5 file from a FASTQ or FASTA file. pacBioToCA is going to be your best bet. What specific problems have you had with it?
Old 08-29-2012, 12:16 PM   #4
vincebaby6
Junior Member
 
Location: Sherbrooke

Join Date: May 2011
Posts: 8

I'm actually not sure what the problem is. The program seems to start properly but then suddenly stops, and I get this error message in the log file:

Quote:
runCA failed.

----------------------------------------
Stack trace:

at runCA line 1238
main::caFailure('failed to build the obt store', '/home/...') called at runCA line 3667
main::overlapTrim() called at runCA line 5877

----------------------------------------
Last few lines of the relevant log file (/home/temptestL1/0-overlaptrim/asm.obtStore.err):

Scanning overlap files to count the number of overlaps.
WARNING: No overlaps found (or file not found) in '/home/temptestL1/0-overlaptrim-overlap/001/000001.ovb.gz'.
Found 0.000 million overlaps.
overlapStore: overlapStore_build.C:84: uint64 computeIIDperBucket(uint32, uint64, uint32, uint32, char**): Assertion `numOverlaps > 0' failed.

----------------------------------------
Failure message:

failed to build the obt store

----------------------------------------END Wed Aug 29 12:50:48 2012 (95 seconds)
Failed to execute runCA -s pacbio.spec -p asm -d temptestL1 ovlHashLibrary=2 ovlRefLibrary=1-1 obtHashLibrary=1-1 obtRefLibrary=1-1 sge=" -sync y" sgePropagateHold=corAsm stopAfter=overlapper
Old 08-29-2012, 05:20 PM   #5
jbingham
Member
 
Location: Silicon Valley

Join Date: Jul 2011
Posts: 24

Hmm. First thought is to check whether /home/temptestL1 exists and is writeable. Normally a tmp dir wouldn't go in /home.
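
One quick way to rule out the directory issue is to test the path before launching the run. This is only a sketch using the Python standard library; the function name check_work_dir is made up for illustration, and the path to test is whatever working directory you hand to runCA.

```python
import os

def check_work_dir(path):
    """Return a list of problems found with a working directory (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        # Covers both "does not exist" and "exists but is a regular file".
        problems.append("not a directory or does not exist: " + path)
    elif not os.access(path, os.W_OK | os.X_OK):
        # Write + execute permission is needed to create files inside a directory.
        problems.append("not writable by the current user: " + path)
    return problems
```

An empty list means the directory looks usable; anything else is worth fixing before rerunning.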

Also, if your PacBio data happens to be from v1.3.0, the fastq files need to be processed to remove the extra line breaks. See here:

http://sourceforge.net/apps/mediawik...Overlaps_Found
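
If the extra line breaks turn out to be sequence and quality strings wrapped over several lines (that is an assumption on my part, the wiki page is the authority), something along these lines could flatten each record back to four lines. The function name unwrap_fastq is hypothetical and this is not the script from the wiki.

```python
def unwrap_fastq(lines):
    """Yield (header, sequence, quality) from possibly line-wrapped FASTQ text."""
    it = (ln.rstrip("\r\n") for ln in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: " + header)
        seq_parts = []
        for ln in it:
            if ln.startswith("+"):
                break  # separator line ends the sequence block
            seq_parts.append(ln)
        seq = "".join(seq_parts)
        qual = ""
        # Quality lines may start with '@' or '+', so read by length, not by marker.
        while len(qual) < len(seq):
            try:
                qual += next(it)
            except StopIteration:
                raise ValueError("truncated quality for " + header)
        if len(qual) != len(seq):
            raise ValueError("quality/sequence length mismatch for " + header)
        yield header, seq, qual
```

Each record can then be written back out as header, sequence, a lone "+", and quality, one per line.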

Another key point about pacBioToCA is that the short reads can't be too short. The minimum recommended length is usually 100 bp, and the longer the better. That's why 454 or PacBio CCS reads do the best job. If you're using really short Illumina reads, you may not get any overlaps either.
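
To see how many of the short reads fall under that 100 bp guideline, a rough count like the one below may help. It assumes a plain four-lines-per-record FASTQ (no wrapped lines), and count_short_reads is just an illustrative name.

```python
def count_short_reads(lines, min_len=100):
    """Return (n_reads, n_short) for 4-line-per-record FASTQ text."""
    n_reads = 0
    n_short = 0
    for i, ln in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            n_reads += 1
            if len(ln.strip()) < min_len:
                n_short += 1
    return n_reads, n_short
```

Passing an open file handle works, since the function only iterates over lines. If most reads come back as short, that alone could explain a run that finds zero overlaps.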

Those are my 3 ideas. Hope you're able to hunt down the problem.
Old 08-30-2012, 06:30 AM   #6
vincebaby6
Junior Member
 
Location: Sherbrooke

Join Date: May 2011
Posts: 8

Thanks a lot! You found my problem!

The length of the reads was the problem. I ran pacBioToCA with another set of longer reads and it worked!

Thanks again!

All times are GMT -8.