SEQanswers

12-18-2014, 11:31 AM   #1
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47
16S gene genomic coordinates?

Hello,
I have a couple of questions.

Does anyone know where I can get the true genomic coordinates of the 9 different variable regions in the 16S gene?
I have these based on my own inference from some published figures of the gene but would like to know if they are correct:
V1 ~ 80-120
V2 ~ 170-200
V3 ~ 420-500
V4 ~ 610-700
V5 ~ 820-950
V6 ~ 960-1100
V7 ~ 1150-1200
V8 ~ 1220-1300
V9 ~ 1450-1500
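
The list above can be kept as a small lookup table so that any aligned interval can be labelled with the variable regions it touches. This is only a sketch: the coordinates are copied from the estimates in this post and are not verified reference values, and the names `V_REGIONS` and `regions_overlapping` are made up for illustration.

```python
# Approximate variable-region coordinates exactly as listed in the post.
# These are the poster's own estimates, not verified reference values.
V_REGIONS = {
    "V1": (80, 120),
    "V2": (170, 200),
    "V3": (420, 500),
    "V4": (610, 700),
    "V5": (820, 950),
    "V6": (960, 1100),
    "V7": (1150, 1200),
    "V8": (1220, 1300),
    "V9": (1450, 1500),
}


def regions_overlapping(start, end, regions=V_REGIONS):
    """Return names of regions overlapping the 1-based inclusive interval [start, end]."""
    if start > end:
        # Accept intervals given in reverse order (e.g. minus-strand hits).
        start, end = end, start
    return [name for name, (lo, hi) in regions.items()
            if start <= hi and end >= lo]
```

For example, a read aligned to positions 400-650 would be reported as touching V3 and V4 under these estimates.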

Also, does anyone know if there is a tool that I could use where I could blast roughly 100K reads against a database of bacteria and get back the region where my sequences aligned?

Thank you,
Jen
12-18-2014, 12:09 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

This is slightly tangential to your question, but I have a neat tool that will locate a region in a 16S if you have the full-length 16S and primer sequences for the region. It's actually designed for cutting out the sub-regions, but you can just look at the sam file to get the coordinates.

msa.sh in=16S.fasta query=ACTGACTG out=1.sam
msa.sh in=16S.fasta query=ACTGACTG out=2.sam

Those SAM files will indicate, for each 16S sequence in the input file, the best alignment of the query sequence (which should be your left primer for the first file and your right primer for the second). Then you can cut out the regions like this:

cutprimers.sh in=16S.fasta out=V4.fasta sam1=1.sam sam2=2.sam

These are in BBTools.
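
To read the coordinates out of those SAM files, something like the following could work. It is a minimal sketch that assumes the files follow the standard tab-separated SAM layout (FLAG in column 2, reference name in column 3, 1-based POS in column 4, CIGAR in column 6); the function names are made up for illustration and are not part of BBTools.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) covered on the reference."""
    # Only M, D, N, = and X consume reference bases.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos, pos + ref_len - 1


def primer_hits(sam_lines):
    """Yield (reference_name, start, end, strand) for each mapped SAM record."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped record
        start, end = reference_span(pos, cigar)
        yield rname, start, end, "-" if flag & 16 else "+"
```

Calling `primer_hits(open("1.sam"))` and `primer_hits(open("2.sam"))` would then give, per 16S sequence, where each primer landed; the sub-region lies between the two hits.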

If you want to align sequences to bacteria, I suggest RefSeq Bacteria (just download and concatenate all of the *.fna.gz files). As far as tools go for BLASTing, you can use BLAST, of course. But I'm not entirely sure what you want. Can you clarify the question?
12-19-2014, 04:12 AM   #3
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47

Hi Brian,
Yes, basically. I am involved with a metagenomics study where I have 200-250 bp sequence reads from next-generation sequencing, derived from different regions of the 16S gene using 6 different primers (the primer sequences are unknown, as they are from a commercial kit and the company informed us that they are proprietary). For example, one fastq file that I have contains ~170K reads, all from different regions of the 16S gene. I would like to be able to blast my reads against a database of bacteria so that each read comes back aligned somewhere on the 16S gene together with its coordinates; that way I will know where within the gene each read in my file is derived from.

Does this make sense? The HOMD database (www.homd.org) allows one to blast a total of only 3000 reads. I am looking for a different tool that will allow me to blast all of my reads and will return the alignments, and percent identity along with where the read aligned along the gene.
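
The kind of output described here (percent identity plus where each read aligned) is available from a local BLAST+ run with tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below assumes such a file already exists and keeps the highest-scoring hit per read; the function name is made up for illustration.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines):
    """Return {read_id: hit_dict}, keeping the highest-bitscore hit per read."""
    best = {}
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(row) < len(BLAST6_COLUMNS):
            continue  # skip comments and malformed lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        s, e = int(hit["sstart"]), int(hit["send"])
        # For minus-strand nucleotide hits BLAST reports sstart > send.
        hit["strand"] = "+" if s <= e else "-"
        hit["sstart"], hit["send"] = min(s, e), max(s, e)
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Each value then carries the subject sequence, `pident`, and the `sstart`/`send` interval on the 16S reference for that read.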

Jen
12-19-2014, 04:13 AM   #4
JenBarb
Member
 
Location: Bethesda, MD

Join Date: Oct 2010
Posts: 47

Also, do you know if there is a publication or information somewhere that gives the rough coordinates of the variable regions within the gene?
12-19-2014, 09:24 AM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

If you are looking for a web tool, I can't really offer any suggestions (hopefully someone else can). You can run BBMap locally, though, which will return alignments along with their percent identity. And rather than aligning to bacterial genomes, you can just align to 16S using one of the datasets mentioned here. However, 16S sequences in public databases are sometimes not full-length, or are too long, so the coordinates will be misleading. You may wish to first filter out the ones that seem anomalous, for example like this:

reformat.sh in=16S.fasta out=filtered.fasta minlen=1440 maxlen=1640

...which is what I did previously when trying to get rid of bad sequences. The exact length limits I derived empirically from looking at length distributions (using readlength.sh); possibly a tighter band would be better since you are interested in finding specific coordinates.
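
For anyone who wants to see what that filtering step amounts to, here is a plain-Python sketch of the same idea: bin the sequence lengths to look at the distribution, then keep only records inside a length window. It is not how reformat.sh or readlength.sh are implemented, and treating both bounds as inclusive is an assumption; the function names and the bin size are made up for illustration.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_histogram(records, bin_size=50):
    """Count sequences per length bin, keyed by the lower edge of each bin."""
    return Counter((len(seq) // bin_size) * bin_size for _, seq in records)


def filter_by_length(records, minlen=1440, maxlen=1640):
    """Keep records whose length lies within [minlen, maxlen] (inclusive here)."""
    for name, seq in records:
        if minlen <= len(seq) <= maxlen:
            yield name, seq
```

Looking at the histogram first makes it easier to judge whether a tighter band than 1440-1640 is appropriate for a given database.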
12-19-2014, 07:05 PM   #6
nucacidhunter
Jafar Jabbari
 
Location: Melbourne

Join Date: Jan 2013
Posts: 1,226

Quote:
Originally Posted by JenBarb View Post
I am involved with a metagenomics study where I have 200-250bp sequence reads from Next Gen Sequencing derived from different regions of the 16S gene, i.e. 6 different primers (primer sequences are unknown as they are from a commercial kit and the company informed us that they are proprietary).
If you are trying to find out the primer sequences used for amplifying the 16S regions in a particular kit, the easiest way would be to prepare NGS libraries from the amplicons using a different NGS platform. For instance, if the kit is designed for platform ITxx, you can amplify the regions and then, after clean-up, use the amplicons as input into an ILxx library prep. Sequencing would then be expected to start from the primers.
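
If the reads do start at the primers, tallying the most frequent read prefixes is one rough way to see them. The sketch below only illustrates that idea: the prefix length `k` and the `top` count are arbitrary choices (6 matches the number of primers mentioned in the quote), and degenerate primer positions would show up as several near-identical prefixes.

```python
from collections import Counter


def common_prefixes(reads, k=20, top=6):
    """Return the `top` most frequent length-k read prefixes with their counts."""
    counts = Counter(read[:k].upper() for read in reads if len(read) >= k)
    return counts.most_common(top)
```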
All times are GMT -8.