  • sfh838t
    Member
    • Apr 2014
    • 29

    variant calling in plant

    I am trying to find variants of all types in a sequence I put together through an assembly step followed by consensus building against a reference. I used the samtools/bcftools/vcfutils steps, and the first time I accidentally ran them on contigs, which gave me a list of only indels; I can verify these visually in IGV. I then corrected this and used the actual reads, again with samtools. This time I got a list of SNPs only, which again I can locate in IGV.
    So now I am wondering what is going on. I was under the impression that samtools would find both indels and SNPs from reads. Would it be legitimate to use indels found from contigs in a write-up?
    The subject is a plant chloroplast sequence, and in the end I will need some locations I can use in the lab to find differences between two related species. I am learning this as I go, so any information, even links to further reading, would be most appreciated.
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    Whether or not indels are found depends on the aligner (and, perhaps, ploidy). How did you align the reads, and how did you align the contigs?


    • sfh838t
      Member
      • Apr 2014
      • 29

      #3
      All my alignments are done using BWA, and the same file of reads was used for both GATK and samtools.
      I used the samtools mpileup/bcftools/vcfutils steps both times:
      samtools + reads = only SNPs
      samtools + contigs = only indels
      GATK tool: UnifiedGenotyper + reads = only SNPs
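
One quick way to confirm a tally like the one above is to count record types directly in each VCF. The following is only a minimal sketch: the function names and the example records (including the contig name `chlor`) are made up for illustration, and the SNP/indel rule is a simple REF/ALT length comparison, not how samtools or GATK classify variants internally.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair by allele length (simplified rule)."""
    if alt.startswith("<") or alt in (".", "*"):
        return "OTHER"  # symbolic, missing, or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions


def count_variant_types(vcf_lines):
    """Count SNP / INDEL / OTHER alleles in an iterable of VCF text lines."""
    counts = {"SNP": 0, "INDEL": 0, "OTHER": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns
        for alt in alts.split(","):  # ALT may list several alleles
            counts[classify_variant(ref, alt)] += 1
    return counts


# Hypothetical records, not taken from the thread's data.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chlor\t120\t.\tA\tG\t50\tPASS\t.",
    "chlor\t340\t.\tT\tTACGT\t60\tPASS\t.",
    "chlor\t900\t.\tGCAT\tG\t45\tPASS\t.",
]
print(count_variant_types(example_vcf))
```

For a real file, passing an open file handle (for example `count_variant_types(open("calls.vcf"))`, where the file name is a placeholder) should work the same way for an uncompressed VCF.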


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Do you see indels in the reads when you look at the mapped bam in IGV? And how long are these indels?


        • sfh838t
          Member
          • Apr 2014
          • 29

          #5
          Yes, I can see both indels and SNPs in IGV. Most of the indels are 3-7 bp long, and they are at different locations than the SNPs.
          Is it legitimate to use contigs in calling variants?


          • sfh838t
            Member
            • Apr 2014
            • 29

            #6
            I looked at the contigs in IGV and see both SNPs and indels. I have not looked at the reads.


            • sfh838t
              Member
              • Apr 2014
              • 29

              #7
              OK, I just went and looked at the reads and see the same insertion that I identified in the contigs. This is important for the project, because this insertion is present in one possible parent but not the other.


              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #8
                Calling indels from the contigs is probably a valid approach as long as these are homozygous events (I'm not really sure how chloroplast genomes work) and as long as the assembly is good. Possibly someone who knows more about GATK or mpileup can comment on why they seem to be missing the indels in the reads.


                • sfh838t
                  Member
                  • Apr 2014
                  • 29

                  #9
                  I did get the GATK answer and now have a file with both SNPs and indels together. When working with plants all bets are off, so the samtools output might remain a mystery.
                  Probably a new question: does anyone know how to build a consensus from an alignment that DOES include indels?


                  • sarvidsson
                    Senior Member
                    • Jan 2015
                    • 137

                    #10
                    Did you try FastaAlternateReferenceMaker in GATK? You need called variants in a VCF file, however, not just an alignment. Read the documentation carefully, as there are some limitations.
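
To illustrate the general idea of building a consensus from a reference plus called variants, including indels, here is a rough sketch. It is not GATK's implementation and does not reproduce its limitations; `apply_variants` and the toy sequence are hypothetical, and it assumes one alternate allele per site, VCF-style 1-based positions, and no overlapping variants.

```python
def apply_variants(reference, variants):
    """Return the reference with each (pos, ref, alt) variant applied.

    pos is 1-based and ref/alt follow the VCF convention, so an
    insertion looks like ("T", "TGGG") and a deletion like ("GTA", "G").
    """
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"variant at {pos} overlaps the previous one")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(reference[cursor:start])  # unchanged stretch
        pieces.append(alt)                      # substituted allele
        cursor = start + len(ref)               # skip the replaced bases
    pieces.append(reference[cursor:])           # tail after the last variant
    return "".join(pieces)


# Toy data for illustration only.
reference = "ACGTACGTAC"
variants = [
    (2, "C", "T"),      # SNP
    (4, "T", "TGGG"),   # 3 bp insertion after position 4
    (7, "GTA", "G"),    # 2 bp deletion after position 7
]
consensus = apply_variants(reference, variants)
print(consensus)  # ATGTGGGACGC
```

Working left to right with a cursor on the original reference means indels never shift the coordinates of later variants, which is the usual pitfall when editing the sequence in place.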


                    • sfh838t
                      Member
                      • Apr 2014
                      • 29

                      #11
                      Thanks, I will try that.
