
  • variant calling in plant

    I am trying to find variants of all types in a sequence I put together through an assembly step followed by consensus building against a reference. I used the samtools/bcftools/vcfutils steps. The first time, I accidentally used contigs as input, which gave me a list of only indels, and I can verify these visually in IGV. Then I tried to correct this and used the actual reads, again with samtools. This time I got a list of SNPs only, which again I can locate in IGV.
    So now I am wondering what is going on. I was under the impression that samtools would find both indels and SNPs using reads. Would it be legitimate to use indels found from contigs in a write-up?
    The subject is a plant chloroplast sequence, and in the end I will need some locations I can use in the lab to find differences between two related species. I am learning this as I go, so any information, even links to further reading, would be most appreciated.

  • #2
    Whether or not indels are found depends on the aligner (and, perhaps, ploidy). How did you align the reads, and how did you align the contigs?


    • #3
      All my alignments are done using BWA, and the same file of reads was used for both GATK and samtools.
      I used the samtools mpileup/bcftools/vcfutils steps both times:
      samtools + reads = only SNPs
      samtools + contigs = only indels
      GATK tool: UnifiedGenotyper + reads = only SNPs
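
A quick way to check what each call set actually contains is to count its records by type. The following is a minimal Python sketch, not part of samtools or GATK. The function names and the example records are hypothetical, and it assumes a plain tab-delimited VCF with REF in column 4 and ALT in column 5.

```python
def classify_variant(ref, alt):
    """Return 'SNP', 'INDEL', or 'OTHER' for one REF/ALT allele pair."""
    # Symbolic or missing ALT alleles (".", "*", "<DEL>") are not classified here.
    if alt in (".", "*") or alt.startswith("<"):
        return "OTHER"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions of equal length


def count_variant_types(vcf_lines):
    """Count variant types over an iterable of VCF text lines."""
    counts = {"SNP": 0, "INDEL": 0, "OTHER": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        # A record may list several ALT alleles separated by commas.
        for alt in alts.split(","):
            counts[classify_variant(ref, alt)] += 1
    return counts


# Hypothetical records, made up for illustration only.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chloroplast\t120\t.\tA\tG\t50\tPASS\t.",
    "chloroplast\t340\t.\tT\tTACG\t60\tPASS\t.",
    "chloroplast\t900\t.\tGATC\tG\t45\tPASS\t.",
]
print(count_variant_types(example_vcf))
```

Running the same count on the read-based and contig-based VCF files would confirm whether each one really holds only a single variant type. To use it on a real file, pass an open file handle in place of the example list.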


      • #4
        Do you see indels in the reads when you look at the mapped bam in IGV? And how long are these indels?


        • #5
          Yes, I can see both indels and SNPs in IGV. Most of the indels are 3-7 bp long, and the indels found are at different locations than the SNPs found.
          Is it legitimate to use contigs in calling variants?


          • #6
            I looked at the contigs in IGV and see both SNPs and indels. I have not looked at the reads.


            • #7
              OK, I just went and looked at the reads and see the same insert that I identified in the contigs. This is important for the project, because this insert is present in one possible parent but not the other.


              • #8
                Calling indels from the contigs is probably a valid approach as long as these are homozygous events (I'm not really sure how chloroplast genomes work) and the assembly is good. Possibly someone who knows more about GATK or mpileup can comment on why they seem to be missing the indels in the reads.


                • #9
                  I did get the GATK answer and now have a file with both variant types together. When working with plants all bets are off, so the samtools output might remain a mystery.
                  Probably a new question: does anyone know how to build a consensus from an alignment that DOES include indels?


                  • #10
                    Did you try the FastaAlternateReferenceMaker in GATK? You need called variants in a VCF file, however, not just an alignment. Read the documentation carefully; there are some limitations.
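
To illustrate the general idea of building a consensus from called variants, including indels, here is a minimal Python sketch. It is not GATK's implementation and does not reproduce FastaAlternateReferenceMaker's behavior or limitations. The function name, the toy reference, and the variants are hypothetical, and it assumes 1-based VCF-style positions, non-overlapping variants, and one ALT allele per site.

```python
def apply_variants(reference, variants):
    """Return the reference sequence with each (pos, ref, alt) variant applied.

    pos is 1-based, as in a VCF. Variants must not overlap.
    """
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged stretch before the variant
        pieces.append(alt)                      # substitute the ALT allele
        cursor = start + len(ref)               # skip the replaced reference bases
    pieces.append(reference[cursor:])           # remainder after the last variant
    return "".join(pieces)


# Toy data, made up for illustration only.
reference = "ACGTACGTAC"
variants = [
    (2, "C", "T"),    # SNP
    (4, "T", "TGG"),  # insertion of GG after position 4
    (7, "GTA", "G"),  # deletion of TA
]
print(apply_variants(reference, variants))
```

Checking the REF allele against the reference before each substitution catches coordinate mix-ups, for example a VCF called against a different reference than the one being edited.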


                    • #11
                      Thanks, I will try that.
