SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Metagenomics (http://seqanswers.com/forums/forumdisplay.php?f=29)
-   -   16S gene genomic coordinates? (http://seqanswers.com/forums/showthread.php?t=49066)

JenBarb 12-18-2014 11:31 AM

16S gene genomic coordinates?
 
Hello,
I have a couple of questions.

Does anyone know where I can get the true genomic coordinates of the 9 different variable regions in the 16S gene?
I have these based on my own inference from some published figures of the gene but would like to know if they are correct:
V1 ~ 80-120
V2 ~ 170-200
V3 ~ 420-500
V4 ~ 610-700
V5 ~ 820-950
V6 ~ 960-1100
V7 ~ 1150-1200
V8 ~ 1220-1300
V9 ~ 1450-1500
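For illustration only, here is a minimal Python sketch of how ranges like these could be used once an alignment position is known. The numbers are copied from the list above, so they are rough estimates read off published figures and not an authoritative annotation. The names `APPROX_REGIONS`, `region_at` and `regions_overlapping` are made up for this example.

```python
# Approximate 16S variable-region ranges copied from the list above.
# These are estimates inferred from published figures, not a reference standard.
APPROX_REGIONS = {
    "V1": (80, 120),
    "V2": (170, 200),
    "V3": (420, 500),
    "V4": (610, 700),
    "V5": (820, 950),
    "V6": (960, 1100),
    "V7": (1150, 1200),
    "V8": (1220, 1300),
    "V9": (1450, 1500),
}


def region_at(position):
    """Return the region containing a 1-based position, or None if it lies between regions."""
    for name, (start, end) in APPROX_REGIONS.items():
        if start <= position <= end:
            return name
    return None


def regions_overlapping(aln_start, aln_end):
    """List the regions touched by an alignment spanning [aln_start, aln_end] (1-based, inclusive)."""
    return [
        name
        for name, (start, end) in APPROX_REGIONS.items()
        if aln_start <= end and aln_end >= start
    ]
```

A 250 bp read will often span more than one of these ranges, which is why the second helper returns a list instead of a single name.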

Also, does anyone know of a tool that would let me BLAST roughly 100K reads against a database of bacteria and get back the region where each read aligned?

thank you,
Jen

Brian Bushnell 12-18-2014 12:09 PM

This is slightly tangential to your question, but I have a neat tool that will locate a region in a 16S if you have the full-length 16S and primer sequences for the region. It's actually designed for cutting out the sub-regions, but you can just look at the sam file to get the coordinates.

msa.sh in=16S.fasta query=ACTGACTG out=1.sam
msa.sh in=16S.fasta query=ACTGACTG out=2.sam

Those sam files will indicate, for each 16S sequence in the input file, the best alignment of the query sequence. ACTGACTG is a placeholder: the query should be your left primer sequence for that subregion in one run and your right primer sequence in the other. Then you can cut out the regions like this:

cutprimers.sh in=16S.fasta out=V4.fasta sam1=1.sam sam2=2.sam

These are in BBTools.
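To make "look at the sam file to get the coordinates" concrete, here is a hedged Python sketch that reads one SAM record and returns the 1-based start and end of the alignment on the reference. It relies only on the standard SAM columns (flag, RNAME, POS, CIGAR). How msa.sh names its records is not stated in this thread, so the record in the usage line is invented, and `alignment_span` is a hypothetical helper name.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_span(sam_line):
    """Return (query_name, ref_name, start, end) with 1-based inclusive coordinates.

    Returns None for header lines and for unmapped records.
    """
    if sam_line.startswith("@"):
        return None
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, cigar = int(fields[3]), fields[5]
    if flag & 4 or cigar == "*":
        return None
    # The end coordinate is the start plus the number of reference bases covered.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return qname, rname, pos, pos + ref_len - 1


# Invented example record: a 19-base query aligned at position 515 of "seqA".
example = "primer\t0\tseqA\t515\t255\t19M\t*\t0\t0\tGTGCCAGCMGCCGCGGTAA\t*"
print(alignment_span(example))
```

Insertions and soft clips do not advance the reference coordinate, which is why only the M, D, N, = and X operations are summed.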

If you want to align sequences to bacteria, I suggest RefSeq Bacteria (just download and concatenate all of the *.fna.gz files). As far as tools go for BLASTing, you can use BLAST, of course. But I'm not entirely sure what you want. Can you clarify the question?

JenBarb 12-19-2014 04:12 AM

Hi Brian,
Yes, basically I am involved with a metagenomics study where I have 200-250 bp sequence reads from next-generation sequencing, derived from different regions of the 16S gene, i.e. 6 different primers (the primer sequences are unknown, as they are from a commercial kit and the company informed us that they are proprietary). For example, one fastq file that I have contains ~170K reads, all from different regions of the 16S gene. I would like to be able to BLAST my reads against a database of bacteria so that each read is aligned somewhere on the 16S gene and I get its genomic coordinates; that way I will know where in the gene each read in my file is derived from.

Does this make sense? The HOMD database (www.homd.org) allows one to BLAST only 3000 reads in total. I am looking for a different tool that will let me BLAST all of my reads and return the alignments and percent identity, along with where each read aligned along the gene.
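As a rough sketch of what could be done with per-read coordinates once they exist: if each read's alignment start on a 16S reference is known (from whichever aligner is used), binning the start positions should show a handful of pile-ups, one per amplicon. This assumes forward-strand starts on a full-length reference; reverse-strand reads would need their end coordinate instead. The function name and the 50-base bin size are arbitrary choices for illustration.

```python
from collections import Counter


def start_position_histogram(starts, bin_size=50):
    """Count alignment starts per bin; each key is the 1-based first position of its bin."""
    counts = Counter(((s - 1) // bin_size) * bin_size + 1 for s in starts)
    return dict(sorted(counts.items()))


# Invented start positions, purely to show the shape of the result.
print(start_position_histogram([515, 520, 530, 27, 30, 1100]))
```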

Jen

JenBarb 12-19-2014 04:13 AM

Also, do you know if there is a publication or information somewhere that gives the rough coordinates of the variable regions within the gene?

Brian Bushnell 12-19-2014 09:24 AM

If you are looking for a web tool, I can't really offer any suggestions (hopefully someone else can). You can run BBMap locally, though, which will return alignments along with their percent identity. And rather than aligning to bacterial genomes, you can just align to 16S using one of the datasets mentioned here. However, 16S sequences in public databases are sometimes not full-length, or are too long, so the coordinates will be misleading. You may wish to first filter out the ones that seem anomalous, for example like this:

reformat.sh in=16S.fasta out=filtered.fasta minlen=1440 maxlen=1640

...which is what I did previously when trying to get rid of bad sequences. The exact length limits I derived empirically from looking at length distributions (using readlength.sh); possibly a tighter band would be better since you are interested in finding specific coordinates.
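For readers who want to see the logic of that length filter spelled out, here is a minimal Python sketch. It mirrors the intent of the command above (keep sequences between 1440 and 1640 bases) but is not how reformat.sh is implemented, and treating both bounds as inclusive is an assumption. `read_fasta` and `filter_by_length` are hypothetical helper names.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, minlen=1440, maxlen=1640):
    """Keep records whose sequence length lies within [minlen, maxlen] (inclusive, by assumption)."""
    return [(h, s) for h, s in records if minlen <= len(s) <= maxlen]


# Usage sketch: "16S.fasta" is the same placeholder file name used in the commands above.
# with open("16S.fasta") as handle:
#     kept = filter_by_length(read_fasta(handle))
```

Tightening `minlen` and `maxlen` narrows the band, at the cost of discarding more reference sequences.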

nucacidhunter 12-19-2014 07:05 PM

Quote:

Originally Posted by JenBarb (Post 156881)
I am involved with a metagenomics study where I have 200-250bp sequence reads from Next Gen Sequencing derived from different regions of the 16s gene, i.e. 6 different primers (primer sequences are unknown as they are from a commercial kit and the company informed us that they are proprietary). Jen

If you are trying to find out the primer sequences used for amplifying the 16S regions in a particular kit, the easiest way would be to prepare NGS libraries from the amplicons using a different NGS platform. For instance, if the kit is designed for platform ITxx, you can amplify the regions and then, after clean-up, use the amplicons as input into an ILxx library prep. The sequencing reads would then be expected to start from the primers.

