  • submitting data to SRA

    Hi,
    I am trying to submit 16S rRNA reads from Illumina to SRA. I have reached the step where it asks me for the following:
    Flowcell, Lane, Filename, md5checksum.

    I have the information, but there are other samples in the same lane that do not belong to me. How should I submit a file that contains other data in addition to mine?
    The demultiplexed file I have is in fasta format, so I don't know how to deal with this.
    Please help!!!
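For the md5checksum field, a minimal sketch of computing the checksum with Python's standard `hashlib` is shown here. The function name `md5_of_file` and the chunk size are illustrative choices, not anything SRA prescribes. The checksum must be computed on the exact file that gets uploaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in chunks so that large FASTQ files do not
    have to fit in memory. `chunk_size` (1 MiB) is an arbitrary choice.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("my_sample.fastq")` (a hypothetical file name) returns the 32-character hex string to paste into the submission form.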

  • #2
    I am not sure why your de-multiplexed files are in fasta format (did you never get fastq files?). Did these samples have "in-line" barcodes (i.e. custom/home-brew multiplexing), and were they de-multiplexed outside of the Illumina CASAVA pipeline?

    There is no point in submitting data that does not belong to your study. It looks like you are going to have to go back and parse the original data to re-create the sample file(s) that you need for submission.
    Last edited by GenoMax; 02-11-2013, 01:10 PM.



    • #3
      Thanks GenoMax for looking into my problem.
      I got demultiplexed file which looks like this:
      >R.1_00001
      TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT
      >R.2_00001
      TACGGAGGGTGCAAGCGTTATCCGGATTTACTGGGTTTAAAGGGTGCGTAGGTGGGCGGATAAGTCAGTGGTGAAATCTTCAAGCTTAACTTGGAAACTGCCATTGATACTATTCGTCTTGAATATCCCGGAGGTAAGCGGAATATGTCAT
      >R.1_00002
      TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTGTGTAAGTTGGATGTGAAATCCCCGGGCTTAACCTGGGAATGGCATTCAAAACTGCACGGCTAGAGTATGGGAGAGGAAGGTAGAATTCCAGGT

      This file I got from the original file which was uploaded to the sequencing facility website, in which there were other samples also.

      I got two files from here:
      one containing the reads and other the barcode, the files were in fastq format.

      Now the point is how to get my sequences in fastq format, as after demultiplexing I am getting in fasta format and to upload on SRA we need fastq.



      • #4
        Originally posted by newBioinfo
        Thanks GenoMax for looking into my problem.
        I got demultiplexed file which looks like this:
        >R.1_00001
        TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT
        >R.2_00001
        TACGGAGGGTGCAAGCGTTATCCGGATTTACTGGGTTTAAAGGGTGCGTAGGTGGGCGGATAAGTCAGTGGTGAAATCTTCAAGCTTAACTTGGAAACTGCCATTGATACTATTCGTCTTGAATATCCCGGAGGTAAGCGGAATATGTCAT
        >R.1_00002
        TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTGTGTAAGTTGGATGTGAAATCCCCGGGCTTAACCTGGGAATGGCATTCAAAACTGCACGGCTAGAGTATGGGAGAGGAAGGTAGAATTCCAGGT

        This file I got from the original file which was uploaded to the sequencing facility website, in which there were other samples also.

        I got two files from here:
        one containing the reads and other the barcode, the files were in fastq format.

        Now the point is how to get my sequences in fastq format, as after demultiplexing I am getting in fasta format and to upload on SRA we need fastq.

        If the original files you downloaded were in fastq format, then you need to use a script that demultiplexes and also writes its output in fastq format. What script/program did you use to demultiplex? The sequencing facility should have done this for you with the Illumina pipeline, as mentioned above.



        • #5
          Thanks Kennels,
          The sequencing facility did it for me, and I got the demultiplexed file which is in fasta format. Also, if I do it myself which program should I use?


          Thanks!!!



          • #6
            Originally posted by newBioinfo
            Thanks Kennels,
            The sequencing facility did it for me, and I got the demultiplexed file which is in fasta format. Also, if I do it myself which program should I use?


            Thanks!!!

            If your sequencing facility was able to demultiplex it, then they should also be able to produce the fastq format for you. Can't you ask them to do it again?

            You could try the FASTX-Toolkit (barcode splitter) or Reaper, but a general search on this forum or Google should give you more choices.
            If you are not very familiar with command line, you could try Galaxy: https://main.g2.bx.psu.edu/ , use the barcode splitter tool under NGS manipulation on the left panel.
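To illustrate what a barcode splitter does for in-line barcodes while keeping quality values, here is a minimal Python sketch. It is not the FASTX-Toolkit or Reaper implementation. It assumes 4-line FASTQ records, barcodes of equal length at the start of each read, and exact matching only. The names `split_inline_barcodes`, `barcode_to_sample` and `out_prefix` are hypothetical.

```python
def read_fastq(handle):
    """Yield (header, sequence, quality) from an open FASTQ file.

    Assumes the common layout of exactly four lines per record.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, not needed
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def split_inline_barcodes(fastq_path, barcode_to_sample, out_prefix):
    """Write one FASTQ file per sample and return the read count per sample.

    The barcode is trimmed from both the sequence and the quality string.
    Reads whose prefix matches no listed barcode are skipped.
    """
    barcode_len = len(next(iter(barcode_to_sample)))  # assumes equal lengths
    outputs = {}
    counts = {}
    try:
        with open(fastq_path) as handle:
            for header, seq, qual in read_fastq(handle):
                sample = barcode_to_sample.get(seq[:barcode_len])
                if sample is None:
                    continue  # belongs to another sample or project
                if sample not in outputs:
                    outputs[sample] = open(f"{out_prefix}{sample}.fastq", "w")
                    counts[sample] = 0
                outputs[sample].write(
                    f"{header}\n{seq[barcode_len:]}\n+\n{qual[barcode_len:]}\n"
                )
                counts[sample] += 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

Passing only your own barcodes in `barcode_to_sample` leaves everyone else's reads out of the output files.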

            Good luck.
            Last edited by Kennels; 02-11-2013, 06:56 PM.



            • #7
              Originally posted by newBioinfo
              Thanks GenoMax for looking into my problem.
              I got demultiplexed file which looks like this:
              >R.1_00001
              TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATATAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAATTGCATCTGTGACTGTATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGT

              This file I got from the original file which was uploaded to the sequencing facility website, in which there were other samples also.
              This is slightly confusing. It sounds like you are saying that you did receive a "fastq" format file that had someone else's data (along with yours). You then de-multiplexed the data from this original fastq file.

              Originally posted by newBioinfo
              I got two files from here:
              one containing the reads and other the barcode, the files were in fastq format.
              What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?


              Originally posted by newBioinfo
              Now the point is how to get my sequences in fastq format, as after demultiplexing I am getting in fasta format and to upload on SRA we need fastq.
              It may become clear once you answer the above two questions, but in any case you are going to have to go back to the original fastq file that you received from your sequencing facility to create the files you need to submit to SRA.



              • #8
                Thanks GenoMax,
                I did get the original file from the facility, but as I was new to the field I asked them to demultiplex it for me and got the file I showed above. So now I have both files, but to submit to SRA I need a fastq file.
                I think they used their own program to demultiplex it.

                I didn't understand what you mean by this, can you please explain it to me
                """What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

                Also, do I need to write code to do this, given that I have the original fastq file, barcode file and mapping file?


                Thanks for help!!!



                • #9
                  Originally posted by newBioinfo

                  I didn't understand what you mean by this, can you please explain it to me
                  """What tool/software did you use to do the demultiplexing and why did it eliminate the quality values? Were these barcodes "inline" with the actual sequence read (custom)?"""

                  Also, do I need to write code to do this, given that I have the original fastq file, barcode file and mapping file?


                  Thanks for help!!!
                  I was asking what software was used for doing the de-multiplexing. But it sounds like this was done by the sequencing facility for you which resulted in the plain fasta file you have.

                  Did you use the standard Illumina tag protocol (where the tags are not part of the actual sequence but are read separately), or were the "tags" incorporated within the actual sequence? If you had used the Illumina protocol, you would not have a separate barcode file (since you do, I am not sure exactly how the multiplexing was done).

                  Either you (or someone who would know how) may indeed have to write some code to parse out data for your sample(s) from the original fastq file if you did not use standard illumina multiplex protocol. Perhaps you can ask the facility to split the fastq file and give you your part of the data.
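As a rough idea of what such parsing code could look like when the barcodes are in a separate fastq file, here is a minimal Python sketch. It assumes that the read file and the barcode file list records in the same order, that every record has exactly four lines, and that barcodes must match exactly. The function and parameter names are hypothetical, and the set of barcodes would come from your own mapping file.

```python
from itertools import zip_longest


def fastq_records(handle):
    """Yield FASTQ records as lists of four lines (newlines stripped)."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return  # end of file
        yield lines


def extract_my_reads(reads_path, barcodes_path, my_barcodes, out_path):
    """Copy reads whose paired barcode is in `my_barcodes` to `out_path`.

    Records are paired purely by position, so both input files must be
    in the same order. Quality lines are written unchanged.
    Returns the number of reads kept.
    """
    kept = 0
    with open(reads_path) as reads, open(barcodes_path) as barcodes, \
            open(out_path, "w") as out:
        pairs = zip_longest(fastq_records(reads), fastq_records(barcodes))
        for read, barcode in pairs:
            if read is None or barcode is None:
                raise ValueError("read and barcode files differ in length")
            if barcode[1] in my_barcodes:  # barcode[1] is the sequence line
                out.write("\n".join(read) + "\n")
                kept += 1
    return kept
```

The output stays in fastq format with the original quality values, which is what the SRA submission needs.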



                  • #10
                    Thanks GenoMax,
                    I contacted the facility and they have provided me the data in fastq files.
                    Thanks for all the help.
