#1 |
Member
Location: Edinburgh Join Date: Jan 2013
Posts: 18
Dear all,
I've been sent some raw PacBio data from a collaborator, which I'd like to import into SMRT Portal to generate filtered subreads and CCS reads for scaffolding. However, I'm struggling to understand the file structure of these data in terms of what SMRT Portal needs for import (and quite possibly PacBio data in general).

A description of the raw data: for each cell, I have three *.bax.h5 files, three *.subreads.fastq files and one *.metadata.xml file. There is no *.bas.h5 file, and I'm not sure why. There is also a directory "error corrected" containing a corrected.fastq and a filtered_subreads.fastq.

I'm aware that SMRT Portal requires a certain file structure (e.g., described in this link), and that the bas.h5 file is a pointer to the bax.h5 files. Is it possible to import these data into the SMRT software without the bas.h5, or to generate a bas.h5 file from the files I do have?

I'm also a little unsure how the sequence files corrected.fastq and filtered_subreads.fastq were generated. I am guessing that filtered_subreads.fastq contains the adapter-trimmed subreads (duh!), and that corrected.fastq contains the CCS sequences, but is there a way to know for certain? Any insights will be greatly appreciated!
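One way to get a first idea of what a FASTQ file such as corrected.fastq or filtered_subreads.fastq contains is to look at its read names. The Python sketch below tallies read names by pattern, assuming the usual PacBio naming convention of `<movie>/<zmw>/<start>_<end>` for subreads and `<movie>/<zmw>/ccs` for CCS reads; anything renamed by a downstream tool (for example an error-correction step) lands in "other". The function names are illustrative, and naming alone may not settle the question, so treat the tally as a hint rather than proof.

```python
from collections import Counter
from typing import Iterator, TextIO


def fastq_read_names(handle: TextIO) -> Iterator[str]:
    """Yield the read name of each record in a 4-lines-per-record FASTQ file."""
    for i, line in enumerate(handle):
        # Headers sit at line positions 0, 4, 8, ...; testing the position instead of
        # startswith("@") avoids mistaking quality lines that begin with "@" for headers.
        if i % 4 == 0:
            fields = line[1:].split()
            if fields:
                yield fields[0]


def classify_read_name(name: str) -> str:
    """Guess the read type from an ASSUMED PacBio naming convention.

    <movie>/<zmw>/ccs          -> "ccs"
    <movie>/<zmw>/<start>_<end> -> "subread"
    anything else              -> "other"
    """
    parts = name.split("/")
    if len(parts) == 3 and parts[1].isdigit():
        if parts[2] == "ccs":
            return "ccs"
        start, sep, end = parts[2].partition("_")
        if sep and start.isdigit() and end.isdigit():
            return "subread"
    return "other"


def summarize(handle: TextIO) -> Counter:
    """Count how many reads in the file match each naming pattern."""
    return Counter(classify_read_name(n) for n in fastq_read_names(handle))
```

Usage: `with open("corrected.fastq") as fh: print(summarize(fh))`. A file made up entirely of "ccs" names would support the CCS guess; a file of "other" names suggests the reads were renamed by whatever produced them.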
#2 |
Senior Member
Location: Hong Kong Join Date: Mar 2010
Posts: 498
I think there should be at least one bas.h5 generated by the machine. But this file is probably not necessary at all to run any kind of analysis with the current generation of chemistry. I have only needed the three bax.h5 files for the analyses I have run so far.
Hope this helps.
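If the three bax.h5 files are all you have, command-line SMRT Analysis steps can, as far as I know, take their input as a file of file names (FOFN) listing the bax.h5 paths one per line. A minimal Python sketch for writing one, assuming that convention and the `<movie>.1/2/3.bax.h5` naming seen in this thread; `write_fofn` and `input.fofn` are hypothetical names, so check your SMRT Analysis version's documentation for what it actually accepts:

```python
import warnings
from pathlib import Path
from typing import List


def write_fofn(cell_dir: Path, fofn_path: Path) -> List[Path]:
    """Write absolute paths of one cell's *.bax.h5 parts to fofn_path, one per line."""
    parts = sorted(Path(cell_dir).rglob("*.bax.h5"))
    if not parts:
        raise FileNotFoundError(f"no *.bax.h5 files under {cell_dir}")
    if len(parts) != 3:
        # Each cell in this thread has three parts; a different count may mean a missing file.
        warnings.warn(f"expected 3 bax.h5 parts, found {len(parts)}")
    Path(fofn_path).write_text("".join(f"{p.resolve()}\n" for p in parts))
    return parts
```

Sorting keeps the parts in `.1`, `.2`, `.3` order, which makes the FOFN easy to eyeball against the directory listing.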
#3 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,080
@reubennowell: Have you tried to import the SMRT cell(s) into the Portal? It appears that the data has already been partly analyzed.
Link to the wiki for SMRT Portal: http://files.pacb.com/software/smrta...MRT_Portal.htm You can find information about the various protocols in the navigation pane on the left.
#4 |
Member
Location: Edinburgh Join Date: Jan 2013
Posts: 18
Thanks for the replies, both.
@GenoMax, yes, I tried to import and received a "no SMRT cell data here" message from the Import SMRT Cells page. I agree this data appears to have been partly analysed; rather than trying to forensically determine what has been done, I'd rather just reimport and reanalyse it myself. But I did get it imported in the end, using this layout:

```
data
|---Analysis_Results
|   |---*1.bax.h5
|   |---*2.bax.h5
|   |---*3.bax.h5
|   |---*1.bax.h5
|   |---*2.bax.h5
|   |---*3.bax.h5
|   etc.
|--- *.metadata.xml
|--- *.metadata.xml
etc.
```

As a side question, are there alternative programs apart from the SMRT software that allow you to go from a bax.h5 to filtered, trimmed and condensed (i.e. CCS) fastq?
Many thanks again.
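The rearrangement described in this post (bax.h5 files under Analysis_Results, metadata.xml files one level up) can be scripted. The sketch below is a hypothetical helper, not part of the SMRT software: it copies files from a flat folder into that layout and reports how many bax.h5 parts it found per movie, assuming movie names contain no dots and that a complete cell has three parts.

```python
import shutil
from pathlib import Path
from typing import Dict


def arrange_for_import(src: Path, dest: Path) -> Dict[str, int]:
    """Copy a flat folder of PacBio files into the layout used in this thread.

    *.bax.h5 files go to dest/Analysis_Results/, *.metadata.xml files to dest/.
    Returns the number of bax.h5 parts per movie so incomplete cells stand out.
    """
    src, dest = Path(src), Path(dest)
    results = dest / "Analysis_Results"
    results.mkdir(parents=True, exist_ok=True)  # also creates dest itself
    parts_per_movie: Dict[str, int] = {}
    for xml in sorted(src.glob("*.metadata.xml")):
        shutil.copy2(xml, dest / xml.name)
        # Register every movie that has metadata, even if it has no bax.h5 parts.
        parts_per_movie.setdefault(xml.name[: -len(".metadata.xml")], 0)
    for bax in sorted(src.glob("*.bax.h5")):
        shutil.copy2(bax, results / bax.name)
        # ASSUMPTION: movie names contain no dots, e.g. <movie>.1.bax.h5.
        movie = bax.name.split(".")[0]
        parts_per_movie[movie] = parts_per_movie.get(movie, 0) + 1
    return parts_per_movie
```

A count other than 3 for any movie is worth checking before trying the import again.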
#5 |
Senior Member
Location: San Francisco Join Date: Aug 2012
Posts: 322
The SMRT software is by far the easiest way to generate filtered data, and the only way to generate CCS reads, but many of its components are developed individually and are available as development releases on GitHub, e.g. https://github.com/PacificBiosciences/pbcore
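For a rough post-filter on FASTQ output outside SMRT Portal, a length and mean-quality cut-off takes only a few lines of Python. This is a swapped-in and much cruder technique than the SMRT Portal filter, which as far as I know also relies on per-read quality information stored in the bax.h5 files that FASTQ does not carry, and it does not produce CCS reads. The thresholds are hypothetical placeholders and Phred+33 quality encoding is assumed.

```python
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header line, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a 4-lines-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (ASSUMES Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_fastq(handle: TextIO, out: TextIO,
                 min_len: int = 500, min_mean_q: float = 8.0) -> int:
    """Write reads passing HYPOTHETICAL length/quality cut-offs; return how many were kept."""
    kept = 0
    for header, seq, qual in read_fastq(handle):
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            out.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept
```

Reading record by record keeps memory use flat, which matters for PacBio runs where individual reads can be many kilobases long.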
Tags: ccs, fastq, pacbio, smrt analysis, smrt portal