  • submitting data to SRA

    Hi,
    I am trying to submit 16S rRNA reads from Illumina to SRA. I have reached the step where it asks me for the following:
    Flowcell, Lane, Filename, md5checksum.

    I have this information, but there are other samples in the same lane that do not belong to me. How should I submit a file that contains other people's data in addition to mine?
    The demultiplexed file I have is in fasta format, so I don't know how to deal with this.
    Please help!!!
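The md5checksum field asks for the MD5 digest of each file exactly as it is uploaded (so of the compressed file, if you upload a .gz). On Linux the `md5sum` command prints this value. The following is a minimal Python sketch that computes the same digest with the standard-library `hashlib` module, reading the file in chunks so that large fastq files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte
    fastq files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        # Same "digest  filename" layout that md5sum prints
        print(f"{md5_of_file(name)}  {name}")
```

Run it as a script with one or more file names as arguments and copy each digest into the md5checksum field for the matching file.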

  • #2
    I am not sure why your demultiplexed files are in fasta format (did you never get fastq format files?). Did these samples have "in-line" barcodes (i.e. a custom/home-brew multiplex), and were they demultiplexed outside of the Illumina CASAVA pipeline?

    There is no point in submitting data that does not belong to your study. It looks like you will have to go back and do some parsing/re-creation of the sample file(s) you need for submission.
    Last edited by GenoMax; 02-11-2013, 01:10 PM.



    • #3
      Thanks GenoMax for looking into my problem.
      I got a demultiplexed file that looks like this:
      >R.1_00001
      TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT
      >R.2_00001
      TACGGAGGGTGCAAGCGTTATCCGGATTTACTGGGTTTAAAGGGTGCGTAGGTGGGCGGATAAGTCAGTGGTGAAATCTTCAAGCTTAACTTGGAAACTGCCATTGATACTATTCGTCTTGAATATCCCGGAGGTAAGCGGAATATGTCAT
      >R.1_00002
      TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTGTGTAAGTTGGATGTGAAATCCCCGGGCTTAACCTGGGAATGGCATTCAAAACTGCACGGCTAGAGTATGGGAGAGGAAGGTAGAATTCCAGGT

      I got this file from the original file, which was uploaded to the sequencing facility website and also contained other samples.

      I got two files from there:
      one containing the reads and the other the barcodes; both files were in fastq format.

      The question now is how to get my sequences in fastq format: after demultiplexing I get fasta format, but SRA requires fastq.
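For background on why the format matters here: a FASTQ record has four lines (a header starting with `@`, the sequence, a `+` separator line, and a quality string with one character per base), while FASTA keeps only a `>` header and the sequence. The sketch below assumes uncompressed FASTQ with one sequence line per record. It parses FASTQ and converts a record to FASTA; the conversion simply discards the quality string, so a FASTA file cannot be turned back into FASTQ, and the reads have to come from the original fastq file.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header line without the leading "@"
    sequence: str  # bases
    quality: str   # one Phred-encoded character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream laid out as 4 lines per read."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch at {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def to_fasta(record: FastqRecord) -> str:
    """Render one record as FASTA. The quality string is simply dropped."""
    return f">{record.name}\n{record.sequence}\n"


if __name__ == "__main__":
    fastq = io.StringIO("@read1\nTACGTAGG\n+\nIIIIHHGG\n")
    for rec in read_fastq(fastq):
        print(to_fasta(rec), end="")
```

Run as a script, the example prints `>read1` followed by `TACGTAGG`; the quality line `IIIIHHGG` is gone and nothing in the output could recover it.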



      • #4
        If the original files you downloaded were in fastq format, then you need a script that demultiplexes and also writes its output in fastq format. What script/program did you use to demultiplex? The sequencing facility should have done this for you with the Illumina pipeline, as mentioned above.
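A demultiplexer that keeps fastq output only has to copy whole records instead of converting them. Below is a minimal Python sketch for the case described in this thread, where the barcodes come as a separate fastq file alongside the reads. It assumes both files list the same clusters in the same order and that a barcode read must exactly equal a known barcode. The `barcode_to_sample` dictionary (barcode to sample name) is hypothetical and would be filled in from your own records; reads carrying any other barcode, such as other users' samples in the lane, are skipped.

```python
from typing import Dict, Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield each FASTQ record as its 4 raw lines (header, seq, '+', qual)."""
    while True:
        lines = [handle.readline().rstrip("\r\n") for _ in range(4)]
        if not lines[0]:
            return  # end of file
        yield lines


def cluster_id(header: str) -> str:
    """Header up to the first space, minus a trailing /1, /2 or /3 read tag."""
    core = header.split()[0]
    if len(core) > 2 and core[-2] == "/" and core[-1] in "123":
        core = core[:-2]
    return core


def demultiplex(reads: TextIO, barcode_reads: TextIO,
                barcode_to_sample: Dict[str, str],
                outputs: Dict[str, TextIO]) -> Dict[str, int]:
    """Copy each read, qualities intact, into the fastq of its sample.

    outputs must hold one writable handle per sample name. Reads whose
    barcode is not in barcode_to_sample are skipped.
    """
    counts = {sample: 0 for sample in barcode_to_sample.values()}
    # zip() stops at the end of the shorter file
    for read, bc in zip(fastq_records(reads), fastq_records(barcode_reads)):
        if cluster_id(read[0]) != cluster_id(bc[0]):
            raise ValueError(f"files out of sync at {read[0]!r}")
        sample = barcode_to_sample.get(bc[1])
        if sample is None:
            continue  # unknown barcode, e.g. someone else's sample
        outputs[sample].write("\n".join(read) + "\n")
        counts[sample] += 1
    return counts
```

Open the input files as text (for gzipped files, `gzip.open(path, "rt")` works) and pass one writable file per sample in `outputs`. Depending on the library prep, the barcodes you were given may be the reverse complement of what appears in the barcode read, so it is worth checking that the returned counts are not all zero.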



        • #5
          Thanks Kennels,
          The sequencing facility did it for me, and I got the demultiplexed file in fasta format. Also, if I do it myself, which program should I use?


          Thanks!!!



          • #6
            If your sequencing facility was able to demultiplex it, then they should also be able to produce the fastq format for you. Can't you ask them to do it again?

            You could try the FASTX-Toolkit (barcode splitter) or Reaper, but a general search on this forum or Google should turn up more choices.
            If you are not very familiar with the command line, you could try Galaxy: https://main.g2.bx.psu.edu/ and use the barcode splitter tool under NGS manipulation in the left panel.

            Good luck.
            Last edited by Kennels; 02-11-2013, 06:56 PM.



            • #7
              Originally posted by newBioinfo
              Thanks GenoMax for looking into my problem.
              I got a demultiplexed file that looks like this:
              >R.1_00001
              TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT

              I got this file from the original file, which was uploaded to the sequencing facility website and also contained other samples.

              This is slightly confusing. It sounds like you are saying that you did receive a "fastq" format file that had someone else's data (along with yours), and that you then demultiplexed the data from this original fastq file.

              Originally posted by newBioinfo
              I got two files from there:
              one containing the reads and the other the barcodes; both files were in fastq format.

              What tool/software did you use for the demultiplexing, and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?


              Originally posted by newBioinfo
              The question now is how to get my sequences in fastq format: after demultiplexing I get fasta format, but SRA requires fastq.

              It may become clear once you answer the two questions above, but in any case you will have to go back to the original fastq file you received from your sequencing facility to create the files you need to submit to SRA.



              • #8
                Thanks GenoMax,
                I did get the original file from the facility, but as I was new to the field I asked them to demultiplex it for me, and that gave me the file I showed above. So now I have both files, but to submit to SRA I need a fastq file.
                I think they used their own program to demultiplex it.

                I didn't understand what you meant by this; can you please explain it to me?
                """What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

                Also, do I need to write code for this, given that I have the original fastq file, the barcode file and the mapping file?
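If the mapping file is a QIIME-style tab-separated file (an assumption: a header row beginning `#SampleID` that includes a `BarcodeSequence` column, a layout common in 16S projects), turning it into a barcode-to-sample lookup for demultiplexing takes only a few lines of Python. This is a sketch; adjust the column name if your file differs.

```python
import csv
from typing import Dict, Optional, TextIO


def load_barcode_map(handle: TextIO) -> Dict[str, str]:
    """Return {barcode: sample_id} from a tab-separated mapping file.

    Assumes a QIIME-style layout: a header row starting with "#SampleID"
    that contains a "BarcodeSequence" column; any other line starting
    with "#" is treated as a comment.
    """
    barcode_col: Optional[int] = None
    barcode_map: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # blank line
        if row[0].startswith("#SampleID"):
            barcode_col = row.index("BarcodeSequence")  # ValueError if absent
            continue
        if row[0].startswith("#"):
            continue  # comment line
        if barcode_col is None:
            raise ValueError("sample row found before the #SampleID header")
        barcode = row[barcode_col].strip().upper()
        if barcode in barcode_map:
            raise ValueError(f"barcode {barcode} is listed twice")
        barcode_map[barcode] = row[0].strip()  # sample ID is the first column
    return barcode_map
```

Barcodes are upper-cased so that a lowercase entry in the mapping file still matches reads, which are normally written in uppercase.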


                Thanks for the help!!!



                • #9
                  I was asking what software was used for the demultiplexing. But it sounds like this was done for you by the sequencing facility, which resulted in the plain fasta file you have.

                  Did you use the standard Illumina tag protocol (where the tag reads are not part of the actual sequence but are instead done as a separate read), or were the "tags" incorporated within the actual sequence? If you had used the Illumina protocol, you would not have a separate barcode file (since you do, I am not sure exactly what you did for multiplexing).

                  Either you (or someone who knows how) may indeed have to write some code to parse out the data for your sample(s) from the original fastq file if you did not use the standard Illumina multiplex protocol. Perhaps you can ask the facility to split the fastq file and give you your part of the data.
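For the other case described here, where the "tags" are incorporated within the actual sequence, the barcode has to be found in, and removed from, the read itself. A minimal sketch, assuming each barcode sits at the very start of the read, must match exactly, and that no barcode is a prefix of another; the `barcodes` dictionary (barcode to sample name) is hypothetical. The same number of characters is trimmed from the quality string so that each output record stays valid fastq.

```python
from typing import Dict, Optional, TextIO, Tuple


def split_inline(sequence: str, quality: str,
                 barcodes: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Return (sample, trimmed_sequence, trimmed_quality) if the read starts
    with a known barcode, else None. Assumes no barcode is a prefix of another."""
    for barcode, sample in barcodes.items():
        if sequence.startswith(barcode):
            n = len(barcode)
            # Trim the quality string by the same amount so lengths still match
            return sample, sequence[n:], quality[n:]
    return None


def demultiplex_inline(handle: TextIO, barcodes: Dict[str, str],
                       outputs: Dict[str, TextIO]) -> int:
    """Write barcode-trimmed reads to outputs[sample]; return the unassigned count."""
    unassigned = 0
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return unassigned  # end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        hit = split_inline(sequence, quality, barcodes)
        if hit is None:
            unassigned += 1  # unknown barcode: another sample or a sequencing error
            continue
        sample, sequence, quality = hit
        outputs[sample].write(f"{header}\n{sequence}\n{plus}\n{quality}\n")
```

Exact matching is the simplest choice; real tools often tolerate one or two mismatches in the barcode, which this sketch does not attempt.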



                  • #10
                    Thanks GenoMax,
                    I contacted the facility and they have provided me with the data in fastq files.
                    Thanks for all the help.
