  • Vcf genotypes for RILs

    I was wondering whether there is any way to change the assumptions used for genotype calling in .vcf files produced by samtools mpileup. I am working with a diploid organism, but the individuals are recombinant inbred lines (or highly inbred lines) that are mostly homozygous, with only about 1% residual heterozygosity. When a SNP is called at low coverage (1-3 reads) and only one allele is observed in a sample, the caller often assumes the individual is heterozygous if the observed allele is the less common one.

    The caller assumes that all the loci in my individuals are in Hardy-Weinberg (H-W) equilibrium, when in fact, due to the experimental design, they are nowhere close to H-W equilibrium and most loci will be homozygous. Filtering on genotype-call quality reduces the problem but also discards much of the data.

    Of course, sequencing to a high depth would solve this problem with the existing tools, but when I expect individuals to be >99% homozygous at each locus that should not be necessary: one or two "A" reads should be enough to predict an AA genotype.
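
To make the reasoning above concrete, here is a minimal sketch of how the genotype prior changes the call for a single read. It is not the model samtools/bcftools actually uses; it is a simple Bayesian toy with a binomial read likelihood. The function names, the alt allele frequency `q = 0.1`, the per-base error rate `err = 0.01`, and the inbreeding coefficient `F = 0.98` are all assumptions chosen for illustration.

```python
from math import comb


def genotype_priors(q, F=0.0):
    """Prior genotype probabilities for a biallelic site.

    q is the alt (A) allele frequency and F is the inbreeding coefficient.
    F = 0 gives Hardy-Weinberg proportions; F close to 1 moves almost all
    heterozygote mass to the two homozygotes.
    """
    p = 1.0 - q
    return {
        "RR": p * p + F * p * q,
        "RA": 2.0 * p * q * (1.0 - F),
        "AA": q * q + F * p * q,
    }


def genotype_posteriors(n_alt, n_reads, q, F=0.0, err=0.01):
    """Posterior over genotypes after seeing n_alt alt reads out of n_reads.

    Assumes independent reads and a symmetric per-base error rate `err`
    (both simplifications made for this sketch).
    """
    # Probability that a single read shows the alt allele, per genotype.
    alt_read_prob = {"RR": err, "RA": 0.5, "AA": 1.0 - err}
    priors = genotype_priors(q, F)
    unnormalised = {}
    for genotype, pa in alt_read_prob.items():
        likelihood = comb(n_reads, n_alt) * pa**n_alt * (1.0 - pa) ** (n_reads - n_alt)
        unnormalised[genotype] = priors[genotype] * likelihood
    total = sum(unnormalised.values())
    return {g: v / total for g, v in unnormalised.items()}


# One read, and it carries the less common allele.
hwe = genotype_posteriors(n_alt=1, n_reads=1, q=0.1, F=0.0)
inbred = genotype_posteriors(n_alt=1, n_reads=1, q=0.1, F=0.98)

print("H-W prior:   ", max(hwe, key=hwe.get), hwe)
print("Inbred prior:", max(inbred, key=inbred.get), inbred)
```

Under these assumed numbers the Hardy-Weinberg prior favours the heterozygote for a single minor-allele read, while the inbred prior favours the AA homozygote, which is the behaviour described above.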

  • #2
    Could you post what arguments you are using? This is a question I am very interested in knowing the answer to.



    • #3
      I used bowtie2 for the alignments, followed by samtools to create sorted BAM files.

      samtools mpileup -BuDf Refseq.fa differentsorted.bam(100 separate files) | bcftools view -bvcg - > out.bcf

      bcftools view -N out.bcf > output.vcf

      I have also used the vcftools option --geno-depth on the .vcf file, but the results are all -1 (missing data).

      I have tried various other permutations of these options, with similar results.



      • #4
        That's what I figured... I wonder what would happen if you didn't use the -c argument when you run bcftools. It invokes the -e argument, which performs the test for Hardy-Weinberg equilibrium:

        Consensus/Variant Calling Options:
        -c Call variants using Bayesian inference. This option automatically invokes option -e.

        -d FLOAT When -v is in use, skip loci where the fraction of samples covered by reads is below FLOAT. [0]

        -e Perform max-likelihood inference only, including estimating the site allele frequency, testing Hardy-Weinberg equilibrium and testing associations with LRT.




        Maybe try this instead:
        Code:
        samtools mpileup -BuDf Refseq.fa differentsorted.bam(100 separate files) | bcftools view -bvg - > out.bcf
        I'd be interested to know how this affects the results. I have never run bcftools without the -c argument.

        P.S. I see you're in Athens, GA. If you wouldn't mind, I'd like to ask you a few questions; I am starting a post-doc at UGA in August.
        Last edited by chadn737; 04-22-2013, 02:38 PM.



        • #5
          Thanks so much. I guess I have to re-run that two-week mpileup.



          • #6
            Could you run it on one or two files instead and test it? Two weeks is a long time to spend trying something new if you don't know what the result will be.



            • #7
              Yes, I am planning on doing so. Right now our computer cluster is having disk issues, though, so I don't expect quick results.

              I think I will have to use only the -b option (not -bvg) with bcftools view, as -v and -g both invoke -c.
              Last edited by jebowers; 04-22-2013, 03:22 PM. Reason: x



              • #8
                You're right, my bad, I should have read that a bit more closely.



                • #9
                  Hi,
                  I got around a similar problem (I'm working with the yeast equivalent of RI lines) by using Freebayes, which has an option for ploidy. This allows you to genotype your RI samples as if they were haploid.
                  However, in my experience, low coverage will result in poor genotype calls.
                  Cheers,

                  Miguel
