  • 16S gene genomic coordinates?

    Hello,
    I have a couple of questions.

    Does anyone know where I can get the true genomic coordinates of the 9 different variable regions in the 16S gene?
    I have these based on my own inference from some published figures of the gene but would like to know if they are correct:
    V1 ~ 80-120
    V2 ~ 170-200
    V3 ~ 420-500
    V4 ~ 610-700
    V5 ~ 820-950
    V6 ~ 960-1100
    V7 ~ 1150-1200
    V8 ~ 1220-1300
    V9 ~ 1450-1500

    Also, does anyone know if there is a tool that I could use where I could blast roughly 100K reads against a database of bacteria and get back the region where my sequences aligned?

    thank you,
    Jen

  • #2
    This is slightly tangential to your question, but I have a neat tool that will locate a region in a 16S if you have the full-length 16S and primer sequences for the region. It's actually designed for cutting out the sub-regions, but you can just look at the sam file to get the coordinates.

    msa.sh in=16S.fasta query=ACTGACTG out=1.sam
    msa.sh in=16S.fasta query=ACTGACTG out=2.sam

    Those SAM files will indicate, for each 16S sequence in the input file, the best alignment of the query sequence (which should be your left or right primer sequence for that subregion). Then you can cut out the regions like this:

    cutprimers.sh in=16S.fasta out=V4.fasta sam1=1.sam sam2=2.sam

    These are in BBTools.
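
For reading the coordinates out of the SAM files, a minimal Python sketch follows. It assumes the standard SAM column layout (FLAG in column 2, reference name in column 3, 1-based POS in column 4, CIGAR in column 6); it has not been checked against actual msa.sh output, so confirm which column holds the 16S sequence name in your files. The function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def reference_span(sam_line):
    """Return (reference_name, start, end), 1-based inclusive, or None if unmapped."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    if flag & 4 or cigar == "*":  # 0x4 marks an unmapped record
        return None
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return rname, pos, pos + ref_len - 1


def spans_from_sam(lines):
    """Yield the reference span of every mapped alignment, skipping '@' header lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        span = reference_span(line)
        if span is not None:
            yield span


# Made-up example record: an 8 bp query aligned at position 100 of "seq1".
example = ["@SQ\tSN:seq1\tLN:1500",
           "query\t0\tseq1\t100\t255\t8M\t*\t0\t0\tACTGACTG\t*"]
for name, start, end in spans_from_sam(example):
    print(name, start, end)
```

On a real file you would pass an open file handle, e.g. `spans_from_sam(open("1.sam"))`.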

    If you want to align sequences to bacteria, I suggest RefSeq Bacteria (just download and concatenate all of the *.fna.gz files). As far as tools go for BLASTing, you can use BLAST, of course. But I'm not entirely sure what you want. Can you clarify the question?


    • #3
      Hi Brian,
      Yes, basically. I am involved with a metagenomics study where I have 200-250 bp reads from next-generation sequencing, derived from different regions of the 16S gene using 6 different primers (the primer sequences are unknown: they come from a commercial kit and the company informed us that they are proprietary). For example, one FASTQ file that I have contains ~170K reads, all from different regions of the 16S gene. I would like to BLAST my reads against a database of bacteria so that each read is aligned somewhere on the 16S gene and I get back its coordinates; that way I will know which part of the gene each read in my file is derived from.

      Does this make sense? The HOMD database (www.homd.org) allows one to blast a total of only 3000 reads. I am looking for a different tool that will allow me to blast all of my reads and will return the alignments, and percent identity along with where the read aligned along the gene.

      Jen
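
To illustrate the goal described above, here is a minimal Python sketch that takes the start and end of one alignment on the 16S gene and reports which variable regions it overlaps. The region table is copied from the approximate coordinates in the first post; those are the poster's own unverified estimates, and the sketch assumes the alignment coordinates use the same numbering as that table. The names `APPROX_REGIONS` and `regions_hit` are hypothetical.

```python
# Approximate variable-region coordinates from the first post in this thread.
# These are unverified estimates, used here only for illustration.
APPROX_REGIONS = {
    "V1": (80, 120),
    "V2": (170, 200),
    "V3": (420, 500),
    "V4": (610, 700),
    "V5": (820, 950),
    "V6": (960, 1100),
    "V7": (1150, 1200),
    "V8": (1220, 1300),
    "V9": (1450, 1500),
}


def overlap(a_start, a_end, b_start, b_end):
    """Number of positions shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def regions_hit(start, end, regions=APPROX_REGIONS):
    """Return [(region, overlapping_bases)] for an alignment, largest overlap first."""
    hits = [(name, overlap(start, end, s, e)) for name, (s, e) in regions.items()]
    hits = [(name, n) for name, n in hits if n > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# A read aligned to positions 400-650 touches two of the listed regions.
print(regions_hit(400, 650))
```

If the reads are aligned to many different 16S references, the coordinates will shift between references, so a table like this only makes sense against a single agreed numbering.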


      • #4
        Also, do you know if there is a publication or information somewhere that gives the rough coordinates of the variable regions within the gene?


        • #5
          If you are looking for a web tool, I can't really offer any suggestions (hopefully someone else can). You can run BBMap locally, though, which will return alignments along with their percent identity. And rather than aligning to bacterial genomes, you can just align to 16S using one of the datasets mentioned here. However, sometimes 16S in public databases are not full-length, or are too long, so the coordinates will be misleading. You may wish to first filter out the ones that seem anomalous, for example, like this:

          reformat.sh in=16S.fasta out=filtered.fasta minlen=1440 maxlen=1640

          ...which is what I did previously when trying to get rid of bad sequences. The exact length limits I derived empirically from looking at length distributions (using readlength.sh); possibly a tighter band would be better since you are interested in finding specific coordinates.
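
For anyone who wants to see what the length filter amounts to, here is a rough pure-Python sketch of the same idea: read a FASTA file, bin the sequence lengths to choose limits, and keep only records inside the band. It is not how reformat.sh or readlength.sh are implemented; the function names are hypothetical, and the limits are assumed to be inclusive.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_histogram(records, bin_size=50):
    """Count sequences per length bin, keyed by the lower edge of each bin."""
    return Counter((len(seq) // bin_size) * bin_size for _, seq in records)


def filter_by_length(records, minlen=1440, maxlen=1640):
    """Keep records whose length lies within [minlen, maxlen]."""
    for name, seq in records:
        if minlen <= len(seq) <= maxlen:
            yield name, seq


# Tiny in-memory example: one truncated sequence and one plausible full-length one.
demo = io.StringIO(">short\nACGT\n>ok\n" + "A" * 1500 + "\n")
kept = list(filter_by_length(read_fasta(demo)))
print([name for name, _ in kept])
```

Looking at the histogram first makes it easier to pick a tighter band than the default limits shown here.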


          • #6
            Originally posted by JenBarb:
            I am involved with a metagenomics study where I have 200-250bp sequence reads from Next Gen Sequencing derived from different regions of the 16s gene, i.e. 6 different primers (primer sequences are unknown as they are from a commercial kit and the company informed us that they are proprietary).

            If you are trying to find out the primer sequences used to amplify the 16S regions in a particular kit, the easiest way would be to prepare NGS libraries from the amplicons using a different NGS platform. For instance, if the kit is designed for platform ITxx, you can amplify the regions and then, after cleanup, use the amplicons as input for ILxx library prep. The resulting reads would be expected to start at the primers.
