SEQanswers

Question of retrieving nucleotides from a list of genomic coordinates.. (http://seqanswers.com/forums/showthread.php?t=23297)

shyam_la 09-12-2012 09:21 AM

Question of retrieving nucleotides from a list of genomic coordinates..
 
Let's say I have an Excel file with one column of chromosome numbers and a second column of genomic coordinates, running to several thousand rows. Is there an online or offline tool that takes this as input and outputs the nucleotide at each of these loci in hg19?

For example:
Input
1 23354
2 345344
3 43543553

Output
1 23354 T
2 345344 C
3 43543553 A

dariober 09-12-2012 10:01 AM

If you can reformat your Excel file to BED format (even within Excel) and save it as plain text (say, as mypositions.bed), you can then use bedtools with something like the following (assuming you already have the FASTA file for hg19):

Code:

bedtools getfasta -fi hg19.fa -bed mypositions.bed -tab
Dario
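The reformatting step can also be done outside Excel. Below is a minimal Python sketch, not part of the original reply, that turns chromosome/position pairs into BED lines. It assumes the input positions are 1-based, so position p becomes the 0-based, half-open BED interval [p-1, p). It also assumes the hg19 FASTA names its sequences with a "chr" prefix (as the UCSC download does); the function name `positions_to_bed` and the `chrom_prefix` parameter are hypothetical, so pass `chrom_prefix=""` if your FASTA uses bare numbers.

```python
import csv
import io


def positions_to_bed(rows, chrom_prefix="chr"):
    """Convert (chromosome, 1-based position) pairs to single-base BED lines."""
    bed_lines = []
    for chrom, pos in rows:
        pos = int(pos)
        # BED is 0-based and half-open: 1-based position p maps to [p - 1, p).
        bed_lines.append(f"{chrom_prefix}{chrom}\t{pos - 1}\t{pos}")
    return bed_lines


# Example input taken from the question, as tab-separated text.
example = "1\t23354\n2\t345344\n3\t43543553\n"
rows = csv.reader(io.StringIO(example), delimiter="\t")
for line in positions_to_bed(rows):
    print(line)
```

Writing the printed lines to mypositions.bed gives a file that can be passed to the bedtools command above. If the coordinates in the spreadsheet are already 0-based, the subtraction should be dropped and the end set to pos + 1 instead.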

shyam_la 09-12-2012 12:53 PM

Thank you, I will try that out. Is it possible to do a similar thing with an aligned, sorted BAM file?

jparsons 09-12-2012 01:49 PM

It's trivial to convert a sorted BAM file into a BED file.
See the bedtools documentation (bamtobed, in particular).

shyam_la 09-12-2012 02:21 PM

Quote:

Originally Posted by jparsons (Post 83873)
It's trivial to convert a sorted BAM file into a BED file.
See the bedtools documentation (bamtobed, in particular).

No, I meant using a BAM file in place of the FASTA file.

westerman 09-13-2012 07:12 AM

A BAM file contains overlapping reads that may or may not agree at any particular base, rather than a single sequence, so the answer to your question is not straightforward. First you'll need to generate a consensus sequence via 'samtools', 'bcftools' and 'vcfutils' ... see: http://samtools.sourceforge.net/mpileup.shtml. Once you have that, you can pull out the bases.

There may be easier ways but that is how I would do it.
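To illustrate why a consensus step is needed, here is a minimal Python sketch that picks one base from the reads covering a single position by simple majority vote. This is a deliberately simplified stand-in, not the model that samtools/bcftools apply (those also weigh base and mapping qualities). The function name `consensus_base` and the `min_depth` parameter are hypothetical, and the input is assumed to be a plain list of the bases observed at one locus.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def consensus_base(observed_bases, min_depth=1):
    """Return the majority base at one position, or 'N' if it is ambiguous."""
    # Normalise case and ignore anything that is not A, C, G or T.
    bases = [b.upper() for b in observed_bases if b.upper() in VALID_BASES]
    if len(bases) < min_depth:
        return "N"  # not enough coverage to make a call
    ranked = Counter(bases).most_common(2)
    # A tie between the two most frequent bases is reported as ambiguous.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "N"
    return ranked[0][0]


# Four reads cover this position; three of them agree on T.
print(consensus_base(["T", "T", "C", "t"]))
```

With real data the per-position bases would come from a pileup of the BAM file, and a quality-aware caller is the safer choice for anything beyond a quick check.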


All times are GMT -8.