submitting data to SRA - SEQanswers

newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#1

submitting data to SRA

02-11-2013, 12:51 PM

Hi,
I am trying to submit 16S rRNA reads from Illumina to SRA. I have reached the step where it asks for the following:
Flowcell, Lane, Filename, md5checksum.

I have this information, but the same lane also contains other samples that do not belong to me. How should I submit a file that contains other people's data in addition to mine?
The demultiplexed file I have is in fasta format, so I don't know how to deal with this.
Please help!!!
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7142
#2

02-11-2013, 01:03 PM

I am not sure why your de-multiplexed files are in fasta format (did you never get fastq format files?). Did these samples have "in-line" barcodes (i.e. custom/home-brew multiplexing), and were they de-multiplexed outside of the Illumina CASAVA pipeline?

There is no point in submitting data that does not belong to your study. It looks like you are going to have to go back and parse/re-create the sample file(s) that you need for submission.

Last edited by GenoMax; 02-11-2013, 01:10 PM.
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#3

02-11-2013, 01:13 PM

Thanks GenoMax for looking into my problem.
I got a demultiplexed file which looks like this:
>R.1_00001
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT
>R.2_00001
TACGGAGGGTGCAAGCGTTATCCGGATTTACTGGGTTTAAAGGGTGCGTAGGTGGGCGGATAAGTCAGTGGTGAAATCTTCAAGCTTAACTTGGAAACTGCCATTGATACTATTCGTCTTGAATATCCCGGAGGTAAGCGGAATATGTCAT
>R.1_00002
TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTGTGTAAGTTGGATGTGAAATCCCCGGGCTTAACCTGGGAATGGCATTCAAAACTGCACGGCTAGAGTATGGGAGAGGAAGGTAGAATTCCAGGT

I got this file from the original file that was uploaded to the sequencing facility website, which also contained other samples.

I got two files from there:
one containing the reads and the other the barcodes; both files were in fastq format.

Now the question is how to get my sequences in fastq format, since after demultiplexing I get fasta format, and uploading to SRA requires fastq.
Kennels

Senior Member

Join Date: Feb 2011

Posts: 149
#4

02-11-2013, 05:50 PM

If the original files you downloaded were in fastq format, then you need to use a script that demultiplexes and also outputs fastq. What script/program did you use to demultiplex? The sequencing facility should have done this for you with the Illumina pipeline, as mentioned above.
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#5

02-11-2013, 06:27 PM

Thanks Kennels,
The sequencing facility did it for me, and I got the demultiplexed file, which is in fasta format. Also, if I do it myself, which program should I use?

Thanks!!!
Kennels

Senior Member

Join Date: Feb 2011

Posts: 149
#6

02-11-2013, 06:53 PM

If your sequencing facility was able to demultiplex it, then they should also be able to produce fastq output for you. Can't you ask them to do it again?

You could try the FASTX-Toolkit (barcode splitter) or Reaper, but a general search on this forum or Google should give you more choices.
If you are not very familiar with the command line, you could try Galaxy: https://main.g2.bx.psu.edu/ , and use the barcode splitter tool under NGS manipulation in the left panel.

Good luck.

Last edited by Kennels; 02-11-2013, 06:56 PM.
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7142
#7

02-12-2013, 04:32 AM

Originally posted by newBioinfo View Post

Thanks GenoMax for looking into my problem.
I got demultiplexed file which looks like this:
>R.1_00001
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT

This file I got from the original file which was uploaded to the sequencing facility website, in which there were other samples also.

This is slightly confusing. It sounds like you are saying that you did receive a "fastq" format file that had someone else's data (along with yours), and that you then de-multiplexed the data from this original fastq file.

Originally posted by newBioinfo View Post

I got two files from here:
one containing the reads and other the barcode, the files were in fastq format.

What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?

Originally posted by newBioinfo View Post

Now the point is how to get my sequences in fastq format, as after demultiplexing I am getting in fasta format and to upload on SRA we need fastq.

It may become clear once you answer the above two questions, but in any case you are going to have to go back to the original fastq file that you received from your sequencing facility to create the files you need to submit to SRA.
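
To illustrate why the fasta file cannot simply be turned back into fastq, here is a small Python sketch of what a fastq-to-fasta conversion does. It assumes the common four-lines-per-record fastq layout; the helper names `read_fastq` and `to_fasta` and the example record are made up for illustration and are not the facility's actual program.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a fastq stream.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data here
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def to_fasta(records):
    """Format records as fasta; the quality string is simply discarded."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _quality in records)


# A made-up single-record fastq file held in memory.
example = io.StringIO("@read1\nTACGTAGG\n+\nIIIIHHHG\n")
print(to_fasta(read_fastq(example)), end="")
```

Once the quality line has been dropped there is nothing in the fasta file to reconstruct it from, which is why the original fastq file is needed.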
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#8

02-12-2013, 08:01 AM

Thanks GenoMax,
I did get the original file from the facility, but as I was new to the field I asked them to demultiplex it for me and got the file I showed above. So now I have both files, but for the SRA submission I need a fastq file.
I think they used their own program to demultiplex it.

I didn't understand what you meant by this; can you please explain it to me?
"""What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

Also, do I need to write code to do this, given that I have the original fastq file, the barcode file and the mapping file?

Thanks for help!!!
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7142
#9

02-12-2013, 09:57 AM

I was asking what software was used for the de-multiplexing. But it sounds like this was done for you by the sequencing facility, which resulted in the plain fasta file you have.

Did you use the standard Illumina tag protocol (where the tags are not part of the actual sequence but are read as a separate index read), or were the "tags" incorporated within the actual sequence? If you had used the Illumina protocol, you would not have a separate barcode file (since you do, I am not sure what exactly you did for multiplexing).

Either you (or someone who knows how) may indeed have to write some code to parse out the data for your sample(s) from the original fastq file if you did not use the standard Illumina multiplex protocol. Perhaps you can ask the facility to split the fastq file and give you your part of the data.
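
As a rough idea of what such parsing code could look like, here is a minimal Python sketch that keeps the quality values. It rests on several assumptions that may not hold for this data set: the reads file and the barcode file list records in the same order, each barcode record's sequence is the barcode itself, and barcodes must match the mapping exactly (no mismatches, no reverse complementing). The names `demultiplex` and `barcode_to_sample` and the example data are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-lines-per-record fastq stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def demultiplex(reads_handle, barcodes_handle, barcode_to_sample):
    """Return {sample: [fastq record strings]} for barcodes found in the mapping.

    Reads whose barcode is not in the mapping (e.g. other people's samples)
    are skipped, so they never end up in the files prepared for submission.
    """
    per_sample = {sample: [] for sample in barcode_to_sample.values()}
    paired = zip(read_fastq(reads_handle), read_fastq(barcodes_handle))
    for (name, seq, qual), (_bc_name, barcode, _bc_qual) in paired:
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            continue  # not one of our samples
        per_sample[sample].append(f"@{name}\n{seq}\n+\n{qual}\n")
    return per_sample


# Made-up in-memory example: three reads, two of which carry our barcode.
reads = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nHHHH\n@r3\nTTAA\n+\nGGGG\n")
barcodes = io.StringIO("@r1\nAAAA\n+\nIIII\n@r2\nCCCC\n+\nIIII\n@r3\nAAAA\n+\nIIII\n")
result = demultiplex(reads, barcodes, {"AAAA": "R.1"})
print("".join(result["R.1"]), end="")
```

Each list in the result can then be written to its own per-sample fastq file. For real data an established tool is likely a safer choice, since it will handle details such as barcode mismatches and mate pairing.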
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#10

02-12-2013, 02:09 PM

Thanks GenoMax,
I contacted the facility and they have provided me with the data as fastq files.
Thanks for all the help.