SEQanswers

annovar question

#1 kenietz (Member, Singapore), 02-02-2012, 10:06 PM

Hi guys,
I have a question regarding ANNOVAR.
I have Torrent data which I have to map to only 3 human genes. I did that with BWA. Afterwards I had to annotate the SNPs, so I turned to ANNOVAR. But the results I get, if I get any at all, look weird.

Here is what I did:
1. Got the FASTA sequences for the 3 genes and put them together in one file (3genes.fasta).
2. bwa index -a is 3genes.fasta
3. bwa aln -l 31 -k 2 -n 10 -t 4 3genes.fasta FILE.fastq > aln_sa.sai
4. bwa samse 3genes.fasta aln_sa.sai FILE.fastq > aln.sam
5. samtools faidx 3genes.fasta
6. samtools view -bt 3genes.fasta.fai -o aln.bam aln.sam
7. samtools sort aln.bam aln.bam.sorted
8. samtools mpileup -ugf 3genes.fasta aln.bam.sorted.bam | bcftools view -bvcg - > var.raw.bcf
9. bcftools view var.raw.bcf | vcfutils.pl varFilter -D 100 > var.flt.vcf
10. convert2annovar.pl var.flt.vcf -format vcf4 > var.flt.vcf.avinput
11. annotate_variation.pl -buildver hg19 var.flt.vcf.avinput /annovar/humandb/
12. annotate_variation.pl -buildver hg19 -filter -dbtype snp132 var.flt.vcf.avinput /annovar/humandb/
13. annotate_variation.pl -buildver hg19 var.flt.vcf.avinput.hg19_snp132_filtered /annovar/humandb/

Is it possible to use ANNOVAR this way at all?
I am sorry if this seems a bit strange or hard to follow; it is difficult for me to explain. If you have any questions, please ask.

Thank you

#2 Bukowski (Senior Member, UK), 02-03-2012, 12:01 AM

I don't use annovar at all, so I may be well off, but my annotation pipeline requires chromosomal coordinates in order to work out where a variation is and what effect it might have.

How is this information introduced in your workflow?

As soon as you create your reference of 3 genes, this is lost, and any variation coordinates will be relative to your reference sequence, not that of the human genome.
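
One quick way to spot this situation is to look at the sequence names in the reference FASTA. Below is a minimal sketch in Python, not part of any tool mentioned in this thread. The function name `non_genomic_contigs` and the rule for what counts as a genome-style name (chr1 to chr22, chrX, chrY, chrM, with or without the "chr" prefix) are assumptions made for illustration. Any name it reports means the variant positions called against that sequence are local to it, not hg19 coordinates.

```python
import re

# Assumption: whole-chromosome names look like chr1..chr22, chrX, chrY, chrM
# (with or without the "chr" prefix), as in hg19.
GENOMIC_NAME = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def non_genomic_contigs(fasta_lines):
    """Return FASTA sequence names that are not whole-chromosome names."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # The sequence name is the first word after ">".
            name = fields[0] if fields else ""
            if not GENOMIC_NAME.match(name):
                names.append(name)
    return names


# GENE_A is a hypothetical header, used only for this example.
example = [">GENE_A hypothetical gene record", "ACGT", ">chr8", "ACGT"]
print(non_genomic_contigs(example))  # ['GENE_A']
```

A header such as chr8:22019184-22021992 would also be reported, which is intended: positions called against it still count from the start of that region.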

#3 kenietz (Member, Singapore), 02-03-2012, 12:28 AM

Hi,
thanks for the info. What software do you use, then?

I had a similar thought to what you explained about my problem but could not put it into words.

I suppose I could use the GenBank files (GFF format) for these genes and then tell ANNOVAR to use the information inside, since it provides coordinates.

#4 Bukowski (Senior Member, UK), 02-03-2012, 12:34 AM

I feed my VCFs into VEP: http://www.ensembl.org/info/docs/var...vep/index.html. All my data is exome and consequently mapped to the entire genome, so I don't have to worry about these kinds of issues. I've done amplicon analysis before, though, which is why it crossed my mind.

#5 kenietz (Member, Singapore), 02-03-2012, 12:43 AM

Yeah, exactly, my data is from amplicons as well.
When I used the whole genome everything was fine, but why do the extra work when people are interested in only 3 genes?
I will check out VEP though. Hopefully it does what I want.
Thank you again.

#6 kenietz (Member, Singapore), 02-06-2012, 12:20 AM

Hi,
I think I found a workaround. I have my 3 genes, and I did the following:
1. Got their start and end coordinates from NCBI.
2. Put them in a file that ANNOVAR can read, then extracted the sequences from the appropriate chromosomes with a script that ships with ANNOVAR. This created a FASTA file with my sequences, including information about their coordinates.
3. Ran BWA as usual: indexed the FASTA file created above and aligned against it.
4. Created the VCF file and converted it to an ANNOVAR input file. The result contains lines like this one, which ANNOVAR cannot use directly:
chr8:22019184-22021992 2806 2806 C - het 5.79 61
5. Wrote a Perl script that transforms that line into the following form, which ANNOVAR can use:
chr8 22021990 22021990 C - het 5.79 61
Here 22021990 = 22019184 + 2806.
6. Used the converted file and proceeded as usual.

I don't know if it's correct to do it this way, but it seems to be working.
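
The Perl script itself is not shown in the thread. The conversion in step 5 could be sketched roughly as follows in Python; the function name `to_genomic` and the `start_is_one_based` switch are assumptions made for illustration. With the switch off, the region start and the local position are simply added, as in the post. With it on, 1 is subtracted, which is the usual conversion when the region start is a 1-based inclusive coordinate and the local position also counts from 1. Which convention applies depends on how the extraction script labels its regions, so it is worth checking the result against a variant whose genomic position is already known.

```python
import re

# Matches sequence names of the form chrom:start-end, as in the example line.
REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def to_genomic(line, start_is_one_based=False):
    """Rewrite one region-relative ANNOVAR input line in genomic coordinates."""
    fields = line.split()
    match = REGION.match(fields[0])
    if match is None:
        raise ValueError("sequence name is not chrom:start-end: " + fields[0])
    region_start = int(match.group("start"))
    # Plain addition reproduces the arithmetic in the post; subtracting 1
    # applies when both the region start and the local position are 1-based.
    shift = region_start - 1 if start_is_one_based else region_start
    fields[0] = match.group("chrom")
    fields[1] = str(int(fields[1]) + shift)  # start column
    fields[2] = str(int(fields[2]) + shift)  # end column
    return "\t".join(fields)


line = "chr8:22019184-22021992 2806 2806 C - het 5.79 61"
print(to_genomic(line))                           # start/end become 22021990
print(to_genomic(line, start_is_one_based=True))  # start/end become 22021989
```

The two calls differ by exactly one base, so picking the wrong convention would shift every variant by one position and could change both the dbSNP matches and the predicted effect.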

All times are GMT -8.