  • 15bp mystery sequence at the beginning of 454 data

    Hello-

    I am working on replicating an analysis from a published study that used 454 sequencing with Titanium chemistry, and I have downloaded all of the sequences from the SRA database. Here is an example sequence:

    Code:
    GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTATCCCTACCTGATCCGAGGTCAAAGTAAAAGGCTTGTGGAATGGAGTCTGGCTGAAATCGCTTGCAATGTGCTGCGCACGAAGCCAACATACCGGCTGCCAATGAATTTGAGGCGAGTCCACGCGCTGAGGCGGAACAAACACCCAACACCAAGCATAGCTTGAAGGTTTAAATGACGCTCGAACAGGCATGCCCAACGGAATACCGAAGGGCAATACTAGGTGTGGTCGGCGTCTCTCAAGGCACACAGGGGATAGGNNN
    And here is the corresponding barcode/primer sequence sent to me by the author. The barcode is the first 10 bases (ACGAGTGCGT) and the primer is the remaining 20 (TCCTSCGCTTATTGATATGC):

    ACGAGTGCGTTCCTSCGCTTATTGATATGC

    Both of these are present in the example read from the sample, marked here with [B] for the barcode and [U] for the primer:

    Code:
    GACTACACGTAGTAT[B]ACGAGTGCGT[/B][U]TCCTGCGCTTATTGATATGC[/U]TTAAGTTCAGCGGGTATCCCTACCTGATCCGAGGTCAAAGTAAAAGGCTTGTGGAATGGAGTCTGGCTGAAATCGCTTGCAATGTGCTGCGCACGAAGCCAACATACCGGCTGCCAATGAATTTGAGGCGAGTCCACGCGCTGAGGCGGAACAAACACCCAACACCAAGCATAGCTTGAAGGTTTAAATGACGCTCGAACAGGCATGCCCAACGGAATACCGAAGGGCAATACTAGGTGTGGTCGGCGTCTCTCAAGGCACACAGGGGATAGGNNN

    However, there is a 15bp sequence preceding the barcode/primer: GACTACACGTAGTAT

    I don't know what this is, but it causes my split libraries commands to fail in QIIME. Some samples in this data set have this identical sequence preceding the barcode, while others have a slightly different sequence, but it's still 15 bp. Any idea what this might be? Is it an artifact of 454 Titanium?
    Last edited by Brian Bushnell; 05-13-2015, 09:54 AM.
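
    One way to check how much sequence sits in front of the barcode is to search each read for the barcode+primer and report whatever precedes the match. The sketch below is only an illustration, not part of QIIME: the function names are made up, the primer's S is treated as the IUPAC code for C or G, and only the first bases of the example read are used.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(seq):
    """Translate a (possibly degenerate) DNA string into a regex."""
    return "".join(IUPAC[base] for base in seq.upper())


def split_read(read, barcode, primer):
    """Return (prefix, barcode, primer_match, rest), or None if not found.

    Hypothetical helper: finds the first barcode+primer occurrence and
    returns the bases before it as the unexplained prefix.
    """
    pattern = re.compile(
        "(" + iupac_to_regex(barcode) + ")(" + iupac_to_regex(primer) + ")"
    )
    match = pattern.search(read)
    if match is None:
        return None
    return read[:match.start()], match.group(1), match.group(2), read[match.end():]


# First bases of the example read from the post.
read = "GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGG"
parts = split_read(read, "ACGAGTGCGT", "TCCTSCGCTTATTGATATGC")
print(parts[0], len(parts[0]))
```

    Running this over every read in a sample would show whether the prefix is always 15 bp or varies in length.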

  • #2
    So, this is likely a 454 adaptor sequence that was left in. It's surprisingly difficult to find information on the adaptors online. Can I safely assume the adaptor will be 15 bp across the study and go ahead and trim off the first 15 bp of each sequence and qual line? Is there a way to tell if this is the A or B adaptor for Titanium chemistry?
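
    If trimming turns out to be the way to go, a slightly safer alternative to cutting a fixed 15 bp is to cut everything before the known barcode and apply the same cut to the quality scores so the two stay in sync. This is a rough sketch under assumptions: trim_to_barcode and max_offset are invented names, the quality scores are placeholders rather than real data, and reads without the barcode near the start are returned as None for manual inspection.

```python
def trim_to_barcode(seq, quals, barcode, max_offset=20):
    """Drop the bases (and matching quality scores) before the barcode.

    The barcode must start within the first max_offset bases; otherwise
    None is returned so the read can be inspected rather than mis-trimmed.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    pos = seq.find(barcode, 0, max_offset + len(barcode))
    if pos == -1:
        return None
    return seq[pos:], quals[pos:]


# Start of the example read; quality values are placeholders (0, 1, 2, ...)
# chosen only so the offset of the trimmed scores is easy to see.
seq = "GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGC"
quals = list(range(len(seq)))

trimmed = trim_to_barcode(seq, quals, "ACGAGTGCGT")
if trimmed is not None:
    trimmed_seq, trimmed_quals = trimmed
    print(trimmed_seq)
    print(trimmed_quals[:5])
```

    Because the barcode differs per sample, the same function would need to be called with each sample's own barcode.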

    • #3
      Is there a database of adapters anywhere that can be searched?



      • #4
        This wiki appears to have all 454 sequences: https://wikis.utexas.edu/display/GSAF/454+-+all+flavors

        GACT appears to be a "key" sequence, but not the one for Titanium.



        • #5
          It would be surprising, as that would mean the sequences are double barcoded (they used barcoded primers during PCR, before adapter ligation). Also, I see the GACT motif is associated with RAPID chemistry, but the authors report in their manuscript that they used Titanium chemistry. They may have misreported this detail.



          • #6
            Is that GACT+sequence matching RAPID barcodes in other samples?
            Last edited by GenoMax; 05-13-2015, 11:56 AM.



            • #7
              Yes, the 15 bp sequence seems to be present in every read, although sometimes it is slightly different from the one listed above. Then there is a 10-12 bp barcode, and then there is the primer.



              • #8
                But is the "different" part matching other RAPID barcodes (with GACT always at the beginning)?



                • #9
                  Yes- the ones that are different still start with 'GACT' as the first four bases of the read, just as the one above.
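
                  To see how many distinct leading sequences there are, and to confirm that every read really starts with GACT, the first 15 bases can simply be tallied. A minimal sketch, with assumptions flagged: summarize_prefixes is a made-up name, and the third read below carries an invented prefix purely to illustrate a "slightly different" variant. It is not taken from the data set.

```python
from collections import Counter


def summarize_prefixes(reads, length=15, key="GACT"):
    """Count the leading `length` bases of each read and list reads lacking `key`."""
    counts = Counter(read[:length] for read in reads)
    no_key = [read for read in reads if not read.startswith(key)]
    return counts, no_key


reads = [
    # Start of the example read from the first post, listed twice.
    "GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGC",
    "GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGC",
    # HYPOTHETICAL read: the first 15 bases are invented for illustration.
    "GACTAGTCGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGC",
]

counts, no_key = summarize_prefixes(reads)
for prefix, n in counts.most_common():
    print(prefix, n)
print("reads without the key:", len(no_key))
```

                  The distinct prefixes that come out of a tally like this could then be compared against a published list of RAPID barcodes.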
