  • annovar question

    Hi guys,
    I have a question regarding annovar.
    I have Torrent data that I need to map to only 3 human genes. I did the mapping with BWA, and afterwards I had to annotate the SNPs, so I turned to annovar. But I get weird results, if I get any at all.

    Here is what I did:
    1. got the fasta seqs for the 3 genes and put them together in one file (3genes.fasta).
    2. bwa index -a is 3genes.fasta
    3. bwa aln -l 31 -k 2 -n 10 -t 4 3genes.fasta FILE.fastq > aln_sa.sai
    4. bwa samse 3genes.fasta aln_sa.sai FILE.fastq > aln.sam
    5. samtools faidx 3genes.fasta
    6. samtools view -bt 3genes.fasta.fai -o aln.bam aln.sam
    7. samtools sort aln.bam aln.bam.sorted
    8. samtools mpileup -ugf 3genes.fasta aln.bam.sorted.bam | bcftools view -bvcg - > var.raw.bcf
    9. bcftools view var.raw.bcf | vcfutils.pl varFilter -D 100 > var.flt.vcf
    10. convert2annovar.pl var.flt.vcf -format vcf4 > var.flt.vcf.avinput
    11. annotate_variation.pl -buildver hg19 var.flt.vcf.avinput /annovar/humandb/
    12. annotate_variation.pl -buildver hg19 -filter -dbtype snp132 var.flt.vcf.avinput /annovar/humandb/
    13. annotate_variation.pl -buildver hg19 var.flt.vcf.avinput.hg19_snp132_filtered /annovar/humandb/

    Is it possible to use annovar this way at all?
    I am sorry if this seems a bit strange and hard to follow, but it is difficult for me to explain. If you have any questions, please ask.

    Thank you

  • #2
    I don't use annovar at all, so I may be well off, but my annotation pipeline requires chromosomal coordinates in order to work out where a variation is and what effect it might have.

    How is this information introduced in your workflow?

    As soon as you create your reference of 3 genes, this is lost, and any variation coordinates will be relative to your reference sequence, not that of the human genome.

    • #3
      Hi,
      thanks for the info. What software do you use, then?

      I had a similar thought to what you explained about my problem but could not put it into words.

      I suppose I could use the GenBank files (GFF format) for these genes and tell annovar to use the information in them, since they provide coordinates.

      • #4
        Originally posted by kenietz:
        Hi,
        thanks for the info. What software do you use, then?

        I had a similar thought to what you explained about my problem but could not put it into words.

        I suppose I could use the GenBank files (GFF format) for these genes and tell annovar to use the information in them, since they provide coordinates.

        I feed my VCFs into VEP (http://www.ensembl.org/info/docs/var...vep/index.html). All my data is exome and is therefore mapped to the entire genome, so I don't have to worry about these kinds of issues. I've done amplicon analysis before, though, which is why it crossed my mind.

        • #5
          Yeah, exactly, my data is from amplicons as well.
          When I used the whole genome everything was fine, but why do the extra work when people are interested in only 3 genes?
          I will check out VEP, though. Hopefully it does what I want.
          Thank you again.

          • #6
            Hi,
            I think I found a workaround. For my 3 genes, I did the following:
            1. Got their start and end positions from NCBI.
            2. Put them in a file that annovar can read, then extracted the sequences from the appropriate chromosomes with a script from annovar. This created a fasta file with my sequences, including their coordinates.
            3. Used bwa as usual: I indexed the fasta file created above and aligned against it.
            4. Created the vcf file and converted it to an annovar input file. This gave lines like the following, which annovar cannot use:
            chr8:22019184-22021992 2806 2806 C - het 5.79 61
            5. Wrote a perl script that transforms each such line into a form annovar can use:
            chr8 22021990 22021990 C - het 5.79 61
            Here 22021990 = 22019184 + 2806.
            6. Used the converted file and proceeded as usual.

            I don't know if this is correct, but it seems to be working.
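
The coordinate shift described in step 5 could be sketched in Python roughly as follows. This is not the perl script from the post, which is not shown in the thread. The function name `to_genomic` and the `offset` parameter are hypothetical. With `offset=0` the code reproduces the arithmetic given above (region start + position). Whether an extra -1 is needed depends on whether the region start in the sequence name is a 1-based inclusive coordinate, which the thread does not establish, so that choice is left to the caller. Fields are re-joined with single spaces, as in the example lines above.

```python
import re

# Matches region-style sequence names such as "chr8:22019184-22021992".
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def to_genomic(line, offset=0):
    """Rewrite one region-relative annovar input line in genomic coordinates.

    offset=0 reproduces the arithmetic from the post (region start + position).
    Pass offset=-1 if the region start is a 1-based inclusive coordinate
    (assumption: the thread does not say which convention applies).
    """
    fields = line.split()
    if not fields:
        return line  # blank line; nothing to convert
    match = REGION_RE.match(fields[0])
    if match is None:
        return line  # first field is not a region name; leave untouched
    region_start = int(match.group("start"))
    start = region_start + int(fields[1]) + offset
    end = region_start + int(fields[2]) + offset
    return " ".join([match.group("chrom"), str(start), str(end)] + fields[3:])


if __name__ == "__main__":
    example = "chr8:22019184-22021992 2806 2806 C - het 5.79 61"
    print(to_genomic(example))
```

Comparing a few converted positions against the same sample mapped to the whole genome would show which offset is right for a given extraction script.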
