  • Is it possible to convert FASTQ/FASTA files to HDF5 format?

    Hi everyone!

    I'm working on a method to create multiplexed PacBio sequencing libraries. I was able to demultiplex my PacBio reads, but I'm currently stuck at the error correction step. I tried pacBioToCA, but I had a lot of problems running it. So I decided to try the P_ErrorCorrection module from the SMRT Analysis suite, but it only accepts HDF5 files. My problem is that I don't have those HDF5 files, because I used FASTQ files for my demultiplexing. So I was wondering if there is a way to create an HDF5 file from a FASTA or FASTQ file so that I can use it with the P_ErrorCorrection module.

    Vince

  • #2
    There is currently no easy way to generate a PacBio HDF5 file from a FASTQ or FASTA file. pacBioToCA is going to be your best bet. What specific problems have you had with it?


      • #4
        I'm not actually sure what the problem is. The program seems to start properly, but then it suddenly stops and I get this error message in the log file:

        runCA failed.

        ----------------------------------------
        Stack trace:

        at runCA line 1238
        main::caFailure('failed to build the obt store', '/home/...') called at runCA line 3667
        main::overlapTrim() called at runCA line 5877

        ----------------------------------------
        Last few lines of the relevant log file (/home/temptestL1/0-overlaptrim/asm.obtStore.err):

        Scanning overlap files to count the number of overlaps.
        WARNING: No overlaps found (or file not found) in '/home/temptestL1/0-overlaptrim-overlap/001/000001.ovb.gz'.
        Found 0.000 million overlaps.
        overlapStore: overlapStore_build.C:84: uint64 computeIIDperBucket(uint32, uint64, uint32, uint32, char**): Assertion `numOverlaps > 0' failed.

        ----------------------------------------
        Failure message:

        failed to build the obt store

        ----------------------------------------END Wed Aug 29 12:50:48 2012 (95 seconds)
        Failed to execute runCA -s pacbio.spec -p asm -d temptestL1 ovlHashLibrary=2 ovlRefLibrary=1-1 obtHashLibrary=1-1 obtRefLibrary=1-1 sge=" -sync y" sgePropagateHold=corAsm stopAfter=overlapper


        • #5
          Hmm. First thought is to check whether /home/temptestL1 exists and is writeable. Normally a tmp dir wouldn't go in /home.
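
          For the first idea, a quick sanity check could look like the sketch below. It is only an illustration: `check_work_dir` is a made-up helper name, not part of runCA or pacBioToCA, and it only tests what the current user can do with the directory.

```python
import os


def check_work_dir(path):
    """Return a list of problems found with a working directory.

    An empty list means the directory exists and the current user
    can create files in it.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # Creating files inside a directory needs write + execute permission.
        problems.append(f"{path} is not writable by the current user")
    return problems
```

          Calling `check_work_dir("/home/temptestL1")` on the machine that runs the job would report whether the directory from the log is usable. On a cluster, the compute nodes may see a different filesystem than the login node, so the check is most useful when run where the job actually executes.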

          Also, if your PacBio data happens to be from v1.3.0, the fastq files need to be processed to remove the extra line breaks. See here:

          Whole-Genome Shotgun Assembler: Celera Assembler (CA) is a whole-genome shotgun (WGS) assembler for the reconstruction of genomic DNA sequence from WGS sequencing data.
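
          If the extra line breaks turn out to be the issue, one way to remove them is sketched below. This is not the script from the linked page; it assumes the sequence and quality strings are simply wrapped over several lines, and the function names (`unwrap_fastq`, `to_four_line`) are made up for this example. Quality lines can legitimately start with `@` or `+`, so the parser counts quality characters against the sequence length instead of looking for the next header.

```python
def unwrap_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ lines whose
    sequence and quality strings may be wrapped over several lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        # Collect sequence lines until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # Collect quality lines until we have one character per base.
        qual = ""
        while len(qual) < len(seq):
            try:
                qual += next(it)
            except StopIteration:
                raise ValueError(f"truncated record: {header!r}") from None
        if len(qual) != len(seq):
            raise ValueError(f"quality/sequence length mismatch: {header!r}")
        yield header, seq, qual


def to_four_line(records):
    """Format records as standard four-line FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)
```

          Passing an open file handle to `unwrap_fastq` and writing the result of `to_four_line` to a new file gives one record per four lines, which is the layout most tools expect.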


          Another key point about pacBioToCA is that the short reads can't be too short. The minimum recommended length is usually 100 bp, and the longer the better. That's why 454 or PacBio CCS reads do the best job. If you're using really short Illumina reads, you may not get any overlaps either.
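
          To see where a short-read set stands against that 100 bp guideline, a rough length summary can be computed as in the sketch below. It assumes a standard four-line-per-record FASTQ; `read_lengths` and `summarize` are illustrative names, and the 100 bp default is just the rule of thumb mentioned above, not a hard limit enforced by the tool.

```python
def read_lengths(lines):
    """Yield the sequence length of each record in a four-line FASTQ."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def summarize(lengths, min_len=100):
    """Summarize read lengths and count reads of at least min_len bases."""
    lengths = list(lengths)
    n = len(lengths)
    long_enough = sum(1 for x in lengths if x >= min_len)
    return {
        "reads": n,
        "min": min(lengths) if n else 0,
        "max": max(lengths) if n else 0,
        "mean": sum(lengths) / n if n else 0.0,
        "at_least_min_len": long_enough,
        "fraction_at_least_min_len": long_enough / n if n else 0.0,
    }
```

          Running `summarize(read_lengths(handle))` on an open FASTQ file gives a quick picture: if most reads fall well under 100 bp, an empty overlap store like the one in the log above would not be surprising.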

          Those are my 3 ideas. Hope you're able to hunt down the problem.



          • #6
            Thanks a lot! You found my problem!

            The length of the reads was the problem. I tried running pacBioToCA with another set of longer reads and it worked!

            Thanks again!
