  • #16
    Originally posted by ManoloS7
    For example: On the one hand, I have a read that maps to chr9 in hg19 but, on the other hand, this same read maps to chr19_GL949750v2_alt in hg38.
    Oh. That sequence is an alternate version of a highly polymorphic site, the LCR/KIR complex. You're right: aligning to a genome build that includes alternate sequences might affect the variant calls. If I understood correctly, what you want to use is the GRCh38 "analysis set" without alternate sequences, as opposed to the "full analysis set", which contains the alternate loci.
    This is a good start: http://genomespot.blogspot.com.br/20...e-version.html
    And this is a nice read too: http://gatkforums.broadinstitute.org...ome-components
    An interesting presentation: https://wiki.dtls.nl/images/8/8c/Zuo..._20150331.pptx

    I really have to thank you, this thread is extremely interesting to follow.
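To make "the analysis set without alternate sequences" concrete, here is a minimal Python sketch of what omitting those contigs means, assuming UCSC-style contig names in which alternate loci end in `_alt` (as in chr19_GL949750v2_alt above). `drop_alt_contigs` is a hypothetical helper, not part of any tool; in practice, downloading the ready-made analysis set is simpler, and any aligner index would have to be rebuilt from a filtered FASTA.

```python
from typing import Iterable, Iterator


def drop_alt_contigs(fasta_lines: Iterable[str], suffix: str = "_alt") -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose name ends with `suffix`.

    Assumption: UCSC-style names such as 'chr19_GL949750v2_alt'. The record
    name is the first whitespace-delimited token after '>'.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            # Decide once per header whether the following sequence lines are kept.
            keep = not name.endswith(suffix)
        if keep:
            yield line


if __name__ == "__main__":
    demo = [">chr19\n", "ACGT\n", ">chr19_GL949750v2_alt\n", "TTTT\n", ">chr9\n", "GGCC\n"]
    # Only the chr19 and chr9 records are printed; the _alt record is dropped.
    print("".join(drop_alt_contigs(demo)), end="")
```

The filter streams line by line, so it never needs to hold a whole chromosome in memory.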

      • #18
        That is not a random chromosome, but an alternative version of part of a chromosome that has lots of variability in the human population. It means that the person you sequenced now has a reference sequence which represents their genome better than hg19 was able to, which only provided one sequence per chromosome without any alternatives.

        • #19
          Originally posted by Dario1984
          That is not a random chromosome, but an alternative version of part of a chromosome that has lots of variability in the human population. It means that the person you sequenced now has a reference sequence which represents their genome better than hg19 was able to, which only provided one sequence per chromosome without any alternatives.
          I see, thanks.
          The problem is that, since not all of those reads map to the same loci in both builds, some variants that are present with hg19 are absent with hg38. And that is a problem for my analysis.

          • #20
            A post of mine from two days ago got stuck in moderation limbo; it's a pity because it had some good links.
            I was commenting that this thread is pretty interesting.
            The sequence you mention is an alternate sequence for the highly polymorphic human KIR locus. As I understand it, most alignment algorithms don't handle polymorphic regions (alternate scaffolds) very well. So if you align against the GRCh38 build plus its alternate contigs, you will see problems like the one you're describing. If I understood correctly, the best option is indeed to use GRCh38, but omitting these contigs. That is, you might want to use the "analysis set", but not the "full" analysis set (which contains the alternate scaffolds).
            Last edited by r.rosati; 05-03-2017, 11:38 AM.

            • #21
              Originally posted by r.rosati
              A post of mine from two days ago got stuck in moderation limbo; it's a pity because it had some good links.
              I was commenting that this thread is pretty interesting.
              The sequence you mention is an alternate sequence for the highly polymorphic human KIR locus. As I understand it, most alignment algorithms don't handle polymorphic regions (alternate scaffolds) very well. So if you align against the GRCh38 build plus its alternate contigs, you will see problems like the one you're describing. If I understood correctly, the best option is indeed to use GRCh38, but omitting these contigs. That is, you might want to use the "analysis set", but not the "full" analysis set (which contains the alternate scaffolds).
              Oh, that is very interesting and useful information. I will probably redo my analysis to see whether those reads now map to the "original" chromosomes.
              Thank you so much.
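One way to check whether those reads move back to the primary chromosomes is to compare where each read's primary alignment lands in the two runs. Below is a minimal Python sketch, assuming both runs are available as SAM text (for example, the output of `samtools view -h`), that alternate contigs end in `_alt`, and that the same read names appear in both runs. `primary_contigs` and `moved_off_alt` are hypothetical helpers written for illustration.

```python
from typing import Dict, Iterable, Tuple

# SAM FLAG bits (from the SAM specification).
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800
MATE_BITS = 0x40 | 0x80  # first/second-in-pair bits keep paired mates distinct

ReadKey = Tuple[str, int]


def primary_contigs(sam_lines: Iterable[str]) -> Dict[ReadKey, str]:
    """Map each read (QNAME plus mate bits) to the RNAME of its primary alignment."""
    placement: Dict[ReadKey, str] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only mapped, primary records
        placement[(qname, flag & MATE_BITS)] = rname
    return placement


def moved_off_alt(
    before: Dict[ReadKey, str], after: Dict[ReadKey, str], suffix: str = "_alt"
) -> Dict[ReadKey, Tuple[str, str]]:
    """Reads whose primary contig was an alt contig before but is not one after."""
    return {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key].endswith(suffix) and not after[key].endswith(suffix)
    }
```

Keying on the mate bits as well as the read name matters for paired-end data, because both mates share the same QNAME.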

              • #22
                Originally posted by ManoloS7
                The thing is that I'm getting more variants with hg38 than with hg19 after applying my filtering, but many of them arise because, for instance, what was A → G in hg19 is now G → A in hg38. As a result, many positions that were 0/0 are now 1/1. In other words, what was nothing back then is now a mutation. What should I trust?
                I think that the question you are asking is not really meaningful. When it comes to the reference at a highly variable site, there is no real right or wrong. Whether the reference is A or G does not matter from a biological perspective. An individual might carry an A or a G, which is either the reference or the alternative allele with respect to hg19 or hg38; that just means it is a variable (i.e. variant) site. It's just a question of naming.
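The "question of naming" point can be shown by translating a genotype from one build's REF/ALT pair to the other's. This is a minimal Python sketch, assuming VCF-style GT strings with no missing alleles ('.') and that the individual's bases appear among the alleles of both builds; `gt_to_bases` and `bases_to_gt` are hypothetical helpers. An A/A individual is 0/0 against an hg19 REF of A, but 1/1 against an hg38 REF of G, with the same bases in both cases.

```python
from typing import List


def gt_to_bases(gt: str, ref: str, alts: List[str]) -> List[str]:
    """Translate a VCF-style GT such as '0/1' or '1|0' into the actual bases.

    Assumption: no missing alleles ('.'); phase ('|') is ignored.
    """
    alleles = [ref] + alts
    return [alleles[int(i)] for i in gt.replace("|", "/").split("/")]


def bases_to_gt(bases: List[str], ref: str, alts: List[str]) -> str:
    """Re-express bases as allele indices against another REF/ALT pair.

    Raises ValueError if a base is not among ref/alts (assumed not to happen).
    """
    alleles = [ref] + alts
    # Sort by allele index so heterozygotes come out as '0/1', not '1/0'.
    return "/".join(str(alleles.index(b)) for b in sorted(bases, key=alleles.index))


if __name__ == "__main__":
    hg19_bases = gt_to_bases("0/0", "A", ["G"])  # A/A, homozygous reference in hg19
    print(bases_to_gt(hg19_bases, "G", ["A"]))  # same A/A individual, REF=G in hg38
```

The bases never change; only the index assigned to them does, which is why a 0/0 call can turn into 1/1.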

                Originally posted by ManoloS7
                I want to say that I also find, generally, more variants when using hg38 than with hg19. Besides, even though with hg19 my lists are smaller, there are some variants that I don’t see with hg38. They are just gone.
                Did you check how different hg19 and hg38 are at the sites where you observe these differences? If the references are highly dissimilar there, that would explain your vanishing variants. I think that, based on the 1000 Genomes Project, many sites in hg38 were modified to correct previous errors.
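Checking the sites where the calls differ could look roughly like the Python sketch below. It assumes each discordant site has already been matched between builds (for example, by liftover, which is not done here) and that each call is given as REF, ALT list and a VCF-style GT with no missing alleles. `Call` and `classify` are hypothetical names. The classification separates pure renaming (same bases, swapped reference) from real differences, and flags whether the reference base itself changed.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    """One variant call at a site already matched between builds (assumption)."""
    ref: str
    alts: List[str]
    gt: str  # VCF-style genotype, e.g. "0/1"; missing alleles ('.') not handled


def bases(call: Call) -> List[str]:
    """The individual's bases at the site, sorted so the comparison ignores order."""
    alleles = [call.ref] + call.alts
    return sorted(alleles[int(i)] for i in call.gt.replace("|", "/").split("/"))


def classify(hg19: Call, hg38: Call) -> str:
    """Label why a site's calls agree or differ between the two builds."""
    same_ref = hg19.ref == hg38.ref
    if bases(hg19) == bases(hg38):
        return "same alleles" if same_ref else "naming only (reference allele swapped)"
    if same_ref:
        return "different call, same reference"
    return "different call, reference also differs"


if __name__ == "__main__":
    print(classify(Call("A", ["G"], "0/0"), Call("G", ["A"], "1/1")))
```

Sites labelled "reference also differs" are the ones where the dissimilarity between the builds could explain a vanishing variant.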

                • #23
                  Originally posted by evakoe
                  I think that the question you are asking is not really meaningful. When it comes to the reference at a highly variable site, there is no real right or wrong. Whether the reference is A or G does not matter from a biological perspective. An individual might carry an A or a G, which is either the reference or the alternative allele with respect to hg19 or hg38; that just means it is a variable (i.e. variant) site. It's just a question of naming.
                  I mostly, but not entirely, agree with that. Of course it does not matter from a biological point of view, and of course it is a question of naming. But for my analysis it is very important whether I see a variant (0/1 or 1/1) or not (0/0). I know that most changes are at positions where both alleles are very frequent in the population, but is that true in all cases?

                  Originally posted by evakoe
                  Did you check how different hg19 and hg38 are at the sites where you observe these differences? If the references are highly dissimilar there, that would explain your vanishing variants. I think that, based on the 1000 Genomes Project, many sites in hg38 were modified to correct previous errors.
                  I did not check that directly, but it seems obvious to me that there should be notable differences.

                  • #24
                    Originally posted by ManoloS7
                    I did not check that directly, but it seems obvious to me that there should be notable differences.
                    I didn't mean in general, but specifically at the sites where you see the differences in the variant calls.