
  • #16
    Well, I'm fairly sure the problem came from the strand. I reran ANNOVAR without the reverse-strand information and this warning no longer appears:

    perl ./annotate_variation.pl -geneanno $dir/$file humandb -build hg19
    NOTICE: Reading gene annotation from humandb/hg19_refGene.txt ... Done with 38781 transcripts (including 6123 without coding sequence annotation) for 23261 unique genes
    NOTICE: Reading FASTA sequences from humandb/hg19_refGeneMrna.fa ... Done with 160 sequences
    WARNING: A total of 319 sequences will be ignored due to lack of correct ORF annotation
    NOTICE: Finished gene-based annotation on 94 genetic variants in VICJE_missense211011/VICJE_strand_file
    NOTICE: Output files were written to VICJE_missense211011/VICJE_strand_file.variant_function, VICJE_missense211011/VICJE_strand_file.exonic_variant_function
    The point is that I don't want to lose half of the information...
    Could you suggest something?



    • #17
      Can you put your variants in Excel and just change the strand yourself? Or write a script to do it? In Excel you can sort everything by strand so that the "-" strand variants are together, then sort those by reference base so that all the "- A" rows are at the top, then the "- C" rows, and so on, and change the reference bases to their complements (A to T, C to G, G to C, T to A). Then sort again by the variant bases and change those to their complements too.
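The script route suggested above could look roughly like the sketch below. It is a minimal illustration, not an ANNOVAR feature: it assumes a tab-delimited file with chr, start, end, ref, alt in the first five columns and a strand column ("+" or "-") in the sixth, which is a guess at the layout since the thread does not show the input file. The function names are made up for this example. For single bases the complement is enough; for multi-base alleles the sketch also reverses the sequence, because a minus-strand allele read on the plus strand is its reverse complement.

```python
import csv
import io

# Complement table for DNA bases; any other character (e.g. "-" or "0") is left unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_plus_strand(allele):
    """Return the reverse complement of an allele string."""
    return allele.translate(COMPLEMENT)[::-1]


def flip_minus_strand(rows):
    """Yield rows with ref/alt rewritten on the plus strand where strand == '-'.

    Assumed column order (hypothetical): chr, start, end, ref, alt, strand, ...
    """
    for chrom, start, end, ref, alt, strand, *rest in rows:
        if strand == "-":
            ref, alt = to_plus_strand(ref), to_plus_strand(alt)
            strand = "+"
        yield [chrom, start, end, ref, alt, strand, *rest]


# Toy input, invented for illustration only.
example = (
    "1\t100\t100\tA\tG\t-\n"
    "2\t200\t200\tC\tT\t+\n"
    "3\t300\t301\tAC\tGT\t-\n"
)
flipped = list(flip_minus_strand(csv.reader(io.StringIO(example), delimiter="\t")))
for row in flipped:
    print("\t".join(row))
```

For a real file, the `io.StringIO(example)` object would be replaced by an open file handle, and the first five columns of each output row could then be passed to ANNOVAR.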



      • #18
        Well, I will write a script to do that. But it's odd that it is not possible to input "reverse sense" data; isn't that very common?
        Does anyone know of a tool that handles both kinds of data?



        • #19
          Sorry for reopening this thread... but I'm struggling with the warning about the lack of correct ORF annotation...

          I can't tell whether this FAQ entry on the ANNOVAR website is related to my problem... and the warning also appears in the examples on the website...

          Why ANNOVAR reports "unknown" in exonic_variant_function?

          "unknown" means that the gene structure is not correctly annotated (complete ORF information is not available). Previous versions of ANNOVAR will always give an answer such as non-synonymous SNVs, etc, but I got too many user emails complaining about "bugs" (even though ANNOVAR is innocent in this case). So after December 2011, if errors exist in gene structure annotation (RefSeq, Ensembl, UCSC, etc), ANNOVAR will just report unknown for exonic_variant_function; in other word, although the variant is clearly within an exon, we cannot say for sure how it affects protein sequence as the ORF annotation is not correct.

          Any insight, please? Thank you and sorry, again.



          • #20
            Hi,

            when running annovar, I get the same warning about the lack of correct ORF annotation:

            perl annotate_variation.pl -geneanno -buildver hg19 -dbtype ensgene -outfile Genes.txt SNVs.txt humandb/


            NOTICE: Reading gene annotation from .../humandb/hg19_ensGene.txt ... Done with 195565 transcripts (including 100437 without coding sequence annotation) for 57528 unique genes
            NOTICE: Reading FASTA sequences from .../annovar/humandb/hg19_ensGeneMrna.fa ... Done with 230 sequences
            WARNING: A total of 6746 sequences will be ignored due to lack of correct ORF annotation
            NOTICE: Finished gene-based annotation on 239 genetic variants in SNVs.txt
            NOTICE: Output files were written to Genes.txt.variant_function, Genes.txt.exonic_variant_function

            I would think that the entry in the FAQ "Why ANNOVAR reports "unknown" in exonic_variant_function?" is related to that. My guess is that the warning indicates that for some SNVs it might not be possible to tell the consequence (e.g. nonsynonymous/synonymous), because the ORF annotation is not available.

            Is this a severe issue or can we just ignore this warning as long as we are only interested in knowing which gene the SNV falls into?
            Since it also appears in the warnings of the website I would guess that we can ignore it. However, I'm concerned because in my case over 6000 sequences will be ignored!
            Does anyone know how to deal with this?
            Thank you.
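One way to gauge how much the warning matters for a particular run is to count how many variants actually end up as "unknown" in the exonic_variant_function output. The sketch below assumes the usual layout of that file (tab-separated, with the consequence such as "nonsynonymous SNV" or "unknown" in the second field); the function name and the toy rows are invented for illustration and are not ANNOVAR output.

```python
from collections import Counter
import io


def count_consequences(lines):
    """Tally the second tab-separated field (assumed to be the consequence)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts


# Toy rows, made up for this example (gene and transcript names are placeholders).
example = io.StringIO(
    "line1\tnonsynonymous SNV\tGENE1:TX1:exon2:c.A10G:p.T4A,\t1\t100\t100\tA\tG\n"
    "line2\tsynonymous SNV\tGENE2:TX2:exon1:c.C30T:p.S10S,\t2\t200\t200\tC\tT\n"
    "line3\tunknown\tUNKNOWN\t3\t300\t300\tG\tA\n"
    "line4\tnonsynonymous SNV\tGENE1:TX1:exon3:c.G55A:p.V19I,\t1\t150\t150\tG\tA\n"
)
counts = count_consequences(example)
for consequence, n in counts.most_common():
    print(f"{n}\t{consequence}")
```

With a real run, the same function could be called on `open("Genes.txt.exonic_variant_function")`. If the "unknown" count is zero or small for the variants of interest, that would suggest the ignored transcripts had little effect on this particular set of calls, though it does not by itself prove the warning is harmless.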



            • #21
              annovar interpretation

              I've read the information on the ANNOVAR website; pardon me if I've missed something.

              I'm having difficulty interpreting the results and I hope someone can help me here:


              phastConsElements46way Score=703;Name=lod=961
              I know the score ranges from 1-1000. How high should the score or LOD be for a region to be considered conserved? Is there a standard cutoff for categorising a variant as falling in a less conserved or a more conserved site?

              tfbsConsSites Score=775;Name=V$MEF2_04
              Same question here?

              evofold Score=151;Name=660514.0_0_-_151
              Same question here?

              wgEncodeRegDnaseClustered Name=4
              Can I simply interpret this result as: the variant falls in a DNase I hypersensitivity site?

              wgEncodeRegTfbsClustered Name=Pol2-4H8,CTCF,EBF,BATF,MEF2A,IRF4_(M-17),ZEB1_(SC-25388),YY1_(C-20),BCL3,EBF1_(C-8),SP1,PAX5-N19,BCL11A,Pol2,NFKB,PAX5-C20


              Is there a threshold for the following:
              LJB2_GERP++ - higher scores are more deleterious
              LJB2_PhyloP - higher scores are more deleterious (I find negative values as well)
              LJB2_SiPhy - higher scores are more deleterious


              What does it mean when you find a '.' in LJB2_SIFT, LJB2_PolyPhen2_HDIV, LJB2_PP2_HDIV_Pred, LJB2_PolyPhen2_HVAR, LJB2_PolyPhen2_HVAR_Pred, LJB2_LRT, LJB2_LRT_Pred, LJB2_MutationTaster, LJB2_MutationTaster_Pred, LJB_MutationAssessor, LJB_MutationAssessor_Pred, LJB2_FATHMM, LJB2_GERP++, LJB2_PhyloP, LJB2_SiPhy?
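For anyone who wants to filter on these values, the annotation strings quoted above (for example "Score=703;Name=lod=961") can be split into their parts with a few lines of Python. This is only a sketch based on the strings shown in this post; the function names are hypothetical, and it assumes fields are separated by ";" with the first "=" in each field separating key from value.

```python
def parse_annotation(value):
    """Split a string like 'Score=703;Name=lod=961' into a dict of key -> value.

    Only the first '=' in each field is treated as the separator, so a
    value such as 'lod=961' is kept intact.
    """
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.partition("=")
        if sep:
            fields[key] = val
    return fields


def lod_from_name(name):
    """Return the integer after 'lod=' in a Name value, or None if absent."""
    prefix = "lod="
    return int(name[len(prefix):]) if name.startswith(prefix) else None


# Strings copied from the post above.
phastcons = parse_annotation("Score=703;Name=lod=961")
tfbs = parse_annotation("Score=775;Name=V$MEF2_04")
dnase = parse_annotation("Name=4")

print(int(phastcons["Score"]), lod_from_name(phastcons["Name"]))
print(tfbs["Name"], dnase["Name"])
```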

              Originally posted by Jane M
              Thanks for the links!
              I am focusing on point 2, the well-preserved regions.
              The description of the output file is


              In a way, the score represents the "probability" for a region to be really a preserved region... The range in my data is from 200 to 700. Is there a "common threshold" above which we can say that a region is a preserved region?

              Then, the lod score is another way to measure the "probability" that a region really is a preserved region... Is it a more precise measure than the simple score?
              I read that a "relation" is significant if the lod score is >=3. In my case, I have 115 mutations present in the output file.
              Can I conclude that these 115 mutations occur, in a significant way, in well-preserved regions?

              I am trying to work out the statistical conclusion, because there is probably a statistical test behind this that tests each mutation. The null hypothesis could be "the mutation x is occurring in a conserved region" and the alternative hypothesis "the mutation x is not occurring in a conserved region"...



              • #22
                Also waiting for the answer.



                • #23
                  Posting a reply I got from the author:

                  "1. It is conserved if there is a score. About 5% of the region in genome has a score.
                  it is a TFBS if there is a score
                  same as above for evofold

                  2. Typically the authors of GERP++ recommend >2 to declare conserved.
                  for phylop and siphy, I do not know what is the recommendation. But there is not really a threshold. If you believe again that 50% of exome is conserved, then you take the top 50% score as the cutoff

                  3. it means the value is not available."

                  Hope this helps
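The reply above can be turned into a small filter. The sketch below is one reading of it, not the author's code: '.' is treated as "value not available", the GERP++ cutoff of 2 is the one quoted in the reply, and the percentile-style cutoff is an interpretation of the "take the top 50% score" suggestion for scores without a recommended threshold. All function names are hypothetical.

```python
def parse_score(raw):
    """Convert a score column to float; '.' or an empty string means not available."""
    raw = raw.strip()
    return None if raw in {".", ""} else float(raw)


def is_conserved_gerp(raw, cutoff=2.0):
    """True/False by the quoted GERP++ > 2 rule, or None when the value is missing."""
    score = parse_score(raw)
    return None if score is None else score > cutoff


def top_fraction_cutoff(raw_scores, fraction):
    """Return the lowest score that still lies within the top `fraction` of available scores."""
    scores = sorted(
        (s for s in map(parse_score, raw_scores) if s is not None), reverse=True
    )
    if not scores:
        return None
    keep = max(1, round(len(scores) * fraction))
    return scores[keep - 1]


# Toy values, invented for illustration.
gerp_column = ["5.1", ".", "-1.2", "3.0", "0.4"]
flags = [is_conserved_gerp(v) for v in gerp_column]
cutoff = top_fraction_cutoff(gerp_column, 0.5)
print(flags)
print(cutoff)
```

Returning None for missing values, rather than False, keeps "not conserved" separate from "no score available", which matters when many rows contain '.'.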



                  • #24
                    Thank you very much!
