**GenoMax** · 03-09-2016, 10:03 AM

You should submit the metadata.xml file because, as I remember, it is difficult (or impossible) to recreate, and that file is needed to import/analyze data in SMRTportal.

The *.h5 files you submit become available as is under the "Download" tab so people can get at the raw data. At least that is how things work in SRA.

**maubp** · 03-09-2016, 10:32 AM

When you say *.xml do you mean all of them (at both levels of the directory hierarchy)?

**GenoMax** · 03-09-2016, 10:39 AM

I meant to specifically say metadata.xml (details of the files are described here: https://github.com/PacificBioscience...rvice-provider)

**maubp** · 03-09-2016, 12:47 PM

Thanks - ENA are not clear, but I suspect you're right and they want the *.metadata.xml files - and perhaps the *.sts.xml files too (summary statistics).

**maubp** · 03-10-2016, 01:20 AM

I've emailed the EBI DataSubs team, and will post back once I know the answer.

**maubp** · 03-10-2016, 09:06 AM

The DataSubs team replied: for each PacBio SMRT cell run, they want three *.bax.h5 files, one *.bas.h5 file, and one *.metadata.xml file.

That is, something like this for the PacBio example above (using made-up checksum values):

Code:

$ cat run_1_manifest.all
7b382592c46607ec0348bf969ed8b01f m140415_143853_42175_c100635972550000001823121909121417_s1_p0.1.bax.h5
2b912a574ad5e264f781ca495b0b5908 m140415_143853_42175_c100635972550000001823121909121417_s1_p0.2.bax.h5
6c7c66e4e2aa1e5516f7d7c16b0ef8b2 m140415_143853_42175_c100635972550000001823121909121417_s1_p0.3.bax.h5
3f6067c02aa643eb5d609197defc3baa m140415_143853_42175_c100635972550000001823121909121417_s1_p0.bas.h5
c12eafa8bf1cc3c1548c1625d9edad7c m140415_143853_42175_c100635972550000001823121909121417_s1_p0.metadata.xml
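For anyone wanting to generate such a manifest rather than write it by hand, here is a minimal Python sketch. It assumes the layout shown above (one `checksum filename` pair per line, separated by a single space); the function names `md5_of_file`, `manifest_lines` and `write_manifest` are my own and not part of any ENA or PacBio tool, so check the result against what ENA currently asks for.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large *.bax.h5 files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(paths):
    """Build one 'checksum filename' line per file (assumed layout)."""
    return [f"{md5_of_file(p)} {Path(p).name}" for p in paths]


def write_manifest(paths, manifest_path):
    """Write the manifest lines to manifest_path and return that path."""
    manifest_path = Path(manifest_path)
    manifest_path.write_text("\n".join(manifest_lines(paths)) + "\n")
    return manifest_path
```

You would pass in the five files for one SMRT cell (three *.bax.h5, one *.bas.h5, one *.metadata.xml) and a manifest name such as `run_1_manifest.all`. The paths are given explicitly because the XML and h5 files may sit at different levels of the run directory.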

I've asked if I can share the full email here.

Update:

Jeena at the EBI Data Submissions team kindly allowed me to post her advice - note the screenshot shows the expected MD5-based manifest file, on which I based the example above:

Dear Peter,

A Pac Bio run normally consists of 5 files. They are 3 bax.h5, 1 bas.h5, and the equally important metadata.xml file. If you use Webin you must create a manifest file as explained here:

http://www.ebi.ac.uk/~mrosello/FAQs/graphics/pac_bio_run.png

If you want to reference each file separately per run then please use the REST submission service:

http://www.ebi.ac.uk/ena/submit/programmatic-submission

Here is a template for a pac bio run.

http://www.ebi.ac.uk/~mrosello/xml_templates/pac_bio/run.xml

Please let us know if you require more help. My colleague Marc is currently away but will be back in the office tomorrow and will be able to provide further help if needed.

Kind regards,
Jeena
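
As a quick sanity check before uploading, the five-file rule in the email can be sketched in Python. This is only an illustration: the `EXPECTED` table encodes the counts stated above (three bax.h5, one bas.h5, one metadata.xml), and the helper name `unexpected_counts` is made up for this example.

```python
from collections import Counter

# Expected number of files per SMRT cell run, as described in the email.
EXPECTED = {".bax.h5": 3, ".bas.h5": 1, ".metadata.xml": 1}


def unexpected_counts(filenames):
    """Return {suffix: (found, expected)} for every suffix whose
    count differs from EXPECTED; an empty dict means the set looks complete."""
    counts = Counter()
    for name in filenames:
        for suffix in EXPECTED:
            if name.endswith(suffix):
                counts[suffix] += 1
    return {
        suffix: (counts[suffix], expected)
        for suffix, expected in EXPECTED.items()
        if counts[suffix] != expected
    }
```

It only looks at file names, so it will not notice a truncated or corrupt file; the MD5 checksums in the manifest are what cover that.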

**GenoMax** · 03-10-2016, 09:11 AM

That makes sense.

Are you also submitting fastq/fasta files that went into your analysis (since they would be generated after some filtering etc using SMRTportal or command line tools)?

Do you know how ENA makes original files available for PacBio? On the page where they have fastq files?

**maubp** · 03-10-2016, 09:15 AM

I'm quite willing to, but unsure how they'd want that - I could upload the processed FASTQ as another run?

**GenoMax** · 03-10-2016, 09:18 AM

Originally posted by maubp:

I'm quite willing to, but unsure how they'd want that - I could upload the processed FASTQ as another run?

I wonder if NCBI SRA handles things the other way around: submit the "fastq" as the main record and attach the original *.h5 files (which become available via the "Download" tab). As is, the *.h5 files are not immediately useful unless they are going to be re-processed by the person downloading them (not everyone would want to or have the means to do that).

**maubp** · 03-10-2016, 09:24 AM

In this case from two SMRT cells I have one FASTQ file of filtered subreads used in the analysis, but I can easily split it up into one FASTQ file per run based on the read names.
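
A rough Python sketch of that split, for reference. It assumes strict four-line FASTQ records with no blank lines, and that each read name starts with the movie name followed by a slash (e.g. `movie/hole/start_end`); the function name `split_fastq_by_movie` and the output naming are hypothetical.

```python
from pathlib import Path


def split_fastq_by_movie(fastq_path, out_dir):
    """Write one FASTQ file per movie name found in the read names.

    Returns a dict mapping movie name to the number of reads written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}
    counts = {}
    try:
        with open(fastq_path) as source:
            while True:
                # Read one record: header, sequence, '+' line, qualities.
                record = [source.readline() for _ in range(4)]
                if not record[0]:
                    break  # end of file
                if not record[0].startswith("@") or not record[2].startswith("+"):
                    raise ValueError(f"Unexpected FASTQ record: {record[0]!r}")
                # Movie name = read identifier up to the first slash.
                read_id = record[0][1:].split()[0]
                movie = read_id.split("/", 1)[0]
                if movie not in handles:
                    handles[movie] = open(out_dir / f"{movie}.fastq", "w")
                    counts[movie] = 0
                handles[movie].writelines(record)
                counts[movie] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

With two SMRT cells this should give two output files, and the returned counts should add up to the number of reads in the combined file.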

**GenoMax** · 03-10-2016, 09:29 AM

Perhaps this is another question for ENA datasub team.

Having two separate records (one for the fastq and another for the *.h5 files) may be confusing. Having both in one record makes more sense, but it sounds like there is no direct way of doing that?

Edit: Unless ENA SRA is going to convert the *.h5 files and make fastq's from them. Again they would have to confirm that.

**maubp** · 03-11-2016, 06:34 AM

Reply from the ENA DataSubs team: Please submit the fastq or the native package but not both.

It looks like our first SMRT cell raw data has uploaded OK.

**GenoMax** · 03-11-2016, 07:07 AM

You can make the SMRTportal (or command line) settings used to generate the filtered fastq files available in the methods/supplemental materials.

Uploading PacBio raw data to ENA SRA
