submitting data to SRA - SEQanswers



newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#1

submitting data to SRA

02-11-2013, 12:51 PM

Hi,
I am trying to submit 16S rRNA reads from Illumina to SRA. I have reached the step where it asks me for the following:
Flowcell, Lane, Filename, md5checksum.
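The md5checksum field is the MD5 digest of the file you upload. A minimal Python sketch for computing it is below; the function name `md5sum` is only illustrative, and on most Linux systems the `md5sum` command-line tool reports the same hex string.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    large (possibly gzipped) fastq files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        # Same "checksum  filename" layout that the md5sum utility prints.
        print(f"{md5sum(name)}  {Path(name).name}")
```

The checksum must be taken on exactly the file that gets uploaded, so if you gzip the fastq before submission, compute it on the .gz file.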

I have this information, but there are other samples in the same lane that do not belong to me. How should I submit the file when it also contains other people's data in addition to mine?
The demultiplexed file I have is in fasta format, so I don't know how to deal with this.
Please help!!!
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7140
#2

02-11-2013, 01:03 PM

I am not sure why your de-multiplexed files are in fasta format (did you never get fastq files?). Did these samples have "in-line" barcodes (i.e. a custom/home-brew multiplex), and were they de-multiplexed outside of the Illumina CASAVA pipeline?
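For reference, a fastq record has four lines (header, bases, a `+` separator line, and one quality character per base), while fasta keeps only the header and the bases. Converting fastq to fasta therefore discards the qualities, and a fasta file cannot be turned back into fastq. A minimal Python sketch, assuming unwrapped four-line fastq records; `read_fastq` and `to_fasta` are illustrative names, not part of any tool mentioned here.

```python
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # header without the leading "@"
    seq: str   # bases
    qual: str  # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a 4-line-per-record fastq stream (no line wrapping)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed fastq record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def to_fasta(record: FastqRecord) -> str:
    """Fasta keeps only the name and the sequence; the quality string is
    dropped here, which is the information SRA needs back."""
    return f">{record.name}\n{record.seq}\n"
```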

There is no point in submitting data that does not belong to your study. It looks like you will have to go back and parse out/re-create the sample file(s) you need for the submission.

Last edited by GenoMax; 02-11-2013, 01:10 PM.
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#3

02-11-2013, 01:13 PM

Thanks, GenoMax, for looking into my problem.
I got a demultiplexed file that looks like this:
>R.1_00001
TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT
>R.2_00001
TACGGAGGGTGCAAGCGTTATCCGGATTTACTGGGTTTAAAGGGTGCGTAGGTGGGCGGATAAGTCAGTGGTGAAATCTTCAAGCTTAACTTGGAAACTGCCATTGATACTATTCGTCTTGAATATCCCGGAGGTAAGCGGAATATGTCAT
>R.1_00002
TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTGTGTAAGTTGGATGTGAAATCCCCGGGCTTAACCTGGGAATGGCATTCAAAACTGCACGGCTAGAGTATGGGAGAGGAAGGTAGAATTCCAGGT

I got this file from the original file that was uploaded to the sequencing facility's website, which also contained other samples.

I got two files from there:
one containing the reads and the other the barcodes; both were in fastq format.

Now the question is how to get my sequences in fastq format: after demultiplexing I get fasta, but SRA requires fastq.
Kennels

Senior Member

Join Date: Feb 2011

Posts: 149
#4

02-11-2013, 05:50 PM

Originally posted by newBioinfo

Now the question is how to get my sequences in fastq format: after demultiplexing I get fasta, but SRA requires fastq.

If the original files you downloaded were in fastq format, then you need a script that does the demultiplexing and also writes fastq output. What script/program did you use to demultiplex? The sequencing facility should have done this for you with the Illumina pipeline, as mentioned above.
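As a rough illustration of what such a script does when the barcodes come in a separate fastq file, here is a minimal Python sketch. It assumes the reads file and the barcode file list the same clusters in the same order, that the barcode sits at the start of each barcode read, and that you already have a `{barcode: sample}` dictionary. All function names and the one-mismatch default are hypothetical choices, not the behaviour of any particular tool. Because it copies whole fastq records, the quality values are kept.

```python
import os
from itertools import islice, zip_longest
from typing import Dict, Iterable, Iterator, Optional, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, qualities


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line fastq records (assumes no line wrapping)."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError(f"truncated or malformed fastq record: {lines!r}")
        yield lines[0], lines[1], lines[2], lines[3]


def cluster_id(header: str) -> str:
    """First header token without '@' and without an old-style /1, /2 or /3
    suffix (assumption: a read and its barcode read share this token)."""
    token = header[1:].split()[0]
    return token[:-2] if token[-2:] in ("/1", "/2", "/3") else token


def mismatches(expected: str, observed: str) -> int:
    """Hamming distance between the expected barcode and the start of the
    observed barcode read; missing bases in a short read count as mismatches."""
    head = observed[: len(expected)]
    return sum(a != b for a, b in zip(expected, head)) + len(expected) - len(head)


def assign_sample(observed: str, samples: Dict[str, str], max_mismatches: int) -> Optional[str]:
    """Return the sample whose barcode matches uniquely, else None."""
    hits = [name for bc, name in samples.items() if mismatches(bc, observed) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: TextIO, barcodes: TextIO, samples: Dict[str, str],
                max_mismatches: int = 1) -> Iterator[Tuple[str, str]]:
    """Yield (sample, full fastq record text) for every read, qualities kept."""
    for read, bc in zip_longest(fastq_records(reads), fastq_records(barcodes)):
        if read is None or bc is None:
            raise ValueError("reads and barcode files have different record counts")
        if cluster_id(read[0]) != cluster_id(bc[0]):
            raise ValueError(f"files out of sync: {read[0]} vs {bc[0]}")
        sample = assign_sample(bc[1], samples, max_mismatches) or "unassigned"
        yield sample, "\n".join(read) + "\n"


def write_per_sample(pairs: Iterable[Tuple[str, str]], out_dir: str = ".") -> None:
    """Write each record to <out_dir>/<sample>.fastq, one file per sample."""
    handles: Dict[str, TextIO] = {}
    try:
        for sample, text in pairs:
            if sample not in handles:
                handles[sample] = open(os.path.join(out_dir, f"{sample}.fastq"), "w")
            handles[sample].write(text)
    finally:
        for fh in handles.values():
            fh.close()
```

Usage would look like `write_per_sample(demultiplex(open("reads.fastq"), open("barcodes.fastq"), samples), "demux")`, with hypothetical file names; gzipped input can be opened with `gzip.open(path, "rt")` instead. If almost everything lands in `unassigned.fastq`, the barcode reads may be the reverse complement of your barcode list, which depends on the protocol. Only the files for your own samples would then go to SRA.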
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#5

02-11-2013, 06:27 PM

Thanks Kennels,
The sequencing facility did it for me, and the demultiplexed file I got is in fasta format. Also, if I do it myself, which program should I use?

Thanks!!!
Kennels

Senior Member

Join Date: Feb 2011

Posts: 149
#6

02-11-2013, 06:53 PM

Originally posted by newBioinfo

Also, if I do it myself, which program should I use?

If your sequencing facility was able to demultiplex it, then they should also be able to produce the fastq files for you. Can't you ask them to do it again?

You could try the FASTX-Toolkit (barcode splitter) or Reaper, but a general search on this forum or Google should give you more choices.
If you are not very familiar with the command line, you could try Galaxy: https://main.g2.bx.psu.edu/ and use the barcode splitter tool under NGS manipulation in the left panel.

Good luck.

Last edited by Kennels; 02-11-2013, 06:56 PM.
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7140
#7

02-12-2013, 04:32 AM

Originally posted by newBioinfo

I got this file from the original file that was uploaded to the sequencing facility's website, which also contained other samples.

This is slightly confusing. It sounds like you are saying that you did receive a fastq file that had someone else's data (along with yours), and that you then de-multiplexed the data from this original fastq file.

Originally posted by newBioinfo

I got two files from there:
one containing the reads and the other the barcodes; both were in fastq format.

What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?

Originally posted by newBioinfo

Now the question is how to get my sequences in fastq format: after demultiplexing I get fasta, but SRA requires fastq.

It may become clear once you answer the two questions above, but in any case you will have to go back to the original fastq file you received from your sequencing facility to create the files you need to submit to SRA.
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#8

02-12-2013, 08:01 AM

Thanks GenoMax,
I did get the original file from the facility, but as I was new to the field I asked them to demultiplex it for me, and that gave me the file I showed above. So now I have both files, but to submit to SRA I need a fastq file.
I think they used their own program to demultiplex it.

I didn't understand what you meant by this; can you please explain it to me?
"""What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

Also, do I need to write code to do this? I have the original fastq file, the barcode file and the mapping file.

Thanks for help!!!
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7140
#9

02-12-2013, 09:57 AM

Originally posted by newBioinfo

I didn't understand what you meant by this; can you please explain it to me?
"""What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

Also, do I need to write code to do this? I have the original fastq file, the barcode file and the mapping file.

I was asking what software was used for doing the de-multiplexing. But it sounds like this was done by the sequencing facility for you which resulted in the plain fasta file you have.

Did you use the standard Illumina tag protocol (where the tag is not part of the actual sequence read but is read separately), or were the "tags" incorporated within the actual sequence? If you had used the Illumina protocol, you would not have a separate barcode file (since you do, I am not sure exactly what you did for multiplexing).
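If the tags were instead incorporated in the read itself (the custom, inline case), the barcode has to be matched at the start of each read and trimmed from both the sequence and the quality string, so that the output is still valid fastq. A minimal Python sketch, assuming exact barcode matches at the 5' end of the read; the function names are hypothetical, and real protocols may also place a linker/primer after the barcode that you may want to trim as well.

```python
from itertools import islice
from typing import Dict, Iterator, Optional, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, seq, plus, qual) from unwrapped 4-line fastq records."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError(f"truncated or malformed fastq record: {lines!r}")
        yield lines[0], lines[1], lines[2], lines[3]


def split_inline(seq: str, qual: str, samples: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """If seq starts with a known barcode, return (sample, seq, qual) with the
    barcode removed from both; longer barcodes are tried first."""
    for barcode in sorted(samples, key=len, reverse=True):
        if seq.startswith(barcode):
            n = len(barcode)
            return samples[barcode], seq[n:], qual[n:]
    return None


def demultiplex_inline(handle: TextIO, samples: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (sample, fastq record text); unmatched reads are left untrimmed."""
    for header, seq, plus, qual in fastq_records(handle):
        hit = split_inline(seq, qual, samples)
        if hit is None:
            yield "unassigned", f"{header}\n{seq}\n{plus}\n{qual}\n"
        else:
            sample, seq, qual = hit
            yield sample, f"{header}\n{seq}\n{plus}\n{qual}\n"
```

Trimming the quality string by the same number of characters as the sequence is what keeps each record's sequence and quality lengths equal.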

Either you (or someone who knows how) may indeed have to write some code to parse out the data for your sample(s) from the original fastq file if you did not use the standard Illumina multiplex protocol. Perhaps you can ask the facility to split the fastq file and give you your part of the data.
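Since a mapping file is available, the first step of such code is turning it into a barcode-to-sample lookup. The sketch below assumes a tab-separated file whose header line names the columns `#SampleID` and `BarcodeSequence` (the QIIME mapping-file convention); if the facility's file uses other column names, pass them in. The function name `read_mapping` is hypothetical, and the resulting dictionary can then be used by whichever demultiplexing tool or script you choose.

```python
import csv
from typing import Dict, TextIO


def read_mapping(handle: TextIO, id_col: str = "#SampleID",
                 barcode_col: str = "BarcodeSequence") -> Dict[str, str]:
    """Build {barcode: sample_id} from a tab-separated mapping file.
    Comment lines starting with '#' after the header row are skipped."""
    rows = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(rows)
    i_id, i_bc = header.index(id_col), header.index(barcode_col)
    mapping: Dict[str, str] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue
        barcode = row[i_bc].strip().upper()
        if barcode in mapping:
            raise ValueError(f"barcode {barcode} used by both {mapping[barcode]} and {row[i_id]}")
        mapping[barcode] = row[i_id].strip()
    return mapping
```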
newBioinfo

Member

Join Date: Mar 2012

Posts: 36
#10

02-12-2013, 02:09 PM

Thanks GenoMax,
I contacted the facility and they have provided me with the data in fastq files.
Thanks for all the help.