  • Empty VCF file with bcftools call

    Hello,

    I am trying to produce a vcf file using bcftools call but it produces an empty vcf file containing only the header. In short, here is what I do:

    1. Alignment with BWA
    2. With samtools, make sorted.bam files
    3. With samtools, index the sorted.bam files
    4. Run samtools mpileup in the following way:
    samtools mpileup -C 50 -E -t SP -t DP -u -I -f /genome/refgenome.fa -b bam_list.txt > output.bcf
    5. Run bcftools call:
    bcftools call -v -c output.bcf > output.vcf

    I am using version 1.3.1 of samtools, bcftools and htslib. Reinstalling these programs did not help, and I see the same problem with versions 1.2. As far as I can tell, the bcf file is fine: it contains lots of data and is 20GB.

    I tried producing a basic vcf file using bcftools view: bcftools view output.bcf > output.vcf and it works. The vcf file seems completely normal.

    Could anyone help me with this? Why would bcftools call produce an empty output?

    Thanks

  • #2
    For reference cross-posted: https://www.biostars.org/p/189996


    • #3
      Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers.


      • #4
        Originally posted by AP38
        Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers.
        It is ok to cross-post on SeqAnswers. I included a link to your post on Biostars for reference.

        If you get an answer over at Biostars then please come back and indicate that here.


        • #5
          Shouldn't you have the -g option with mpileup to compute genotype likelihoods?
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


          • #6
            Yes indeed, if one wants to compute genotype likelihoods, the -g option is required. However, it should not make a difference to this problem.


            • #7
              Sorry... don't know what I was thinking!

              I tried your commands on a recent project and they also gave a header-only vcf file.

              This minimal pipeline worked fine and produced a normal vcf from the same data:
              samtools mpileup -gu -Q 10 -t DP,DPR -f ref.fasta -b samples.txt | bcftools call -cv - > test.vcf
              (it also worked without the -g!)

              So from that I would conclude it is not a problem with your versions, file list or reference.

              When I generate a bcf file using the minimal pipeline, it reports the reference allele. Your mpileup does not for some reason.
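
A header-only VCF like the one described in this thread can be spotted by counting the lines that do not start with `#`. The following is a minimal sketch, assuming an uncompressed VCF; the function name `count_vcf_records` is made up for illustration and is not part of samtools or bcftools.

```shell
#!/bin/sh
# count_vcf_records (hypothetical helper, not a bcftools command):
# print the number of variant records in an uncompressed VCF,
# i.e. non-empty lines that do not start with '#'.
count_vcf_records() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}
```

Calling `count_vcf_records test.vcf` prints 0 when the file holds only header lines.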
              Last edited by SNPsaurus; 05-05-2016, 10:35 AM.


              • #8
                Thanks! This is very good to know. It is possible that the -C 50 option is causing the issue because it downgrades mapping quality for excessive mismatches. I am working here with libraries of fairly small coverage so I might want to remove that option. I'll try that and stay in touch about the results.


                • #9
                  For some reason, an empty line had been added at the bottom of the genome index file. Removing it solved the problem…
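
The fix described above can be sketched as a quick check for blank lines. This assumes the index in question is the `.fai` file that sits next to the reference FASTA, which the post does not state explicitly; both helper names are hypothetical.

```shell
#!/bin/sh
# count_blank_lines (hypothetical helper): print how many lines in a file
# are empty or contain only whitespace.
count_blank_lines() {
    awk 'NF == 0 { n++ } END { print n + 0 }' "$1"
}

# strip_blank_lines (hypothetical helper): copy a file to a new path,
# dropping empty and whitespace-only lines. The input is left untouched.
strip_blank_lines() {
    awk 'NF > 0' "$1" > "$2"
}
```

For example, `count_blank_lines refgenome.fa.fai` should print 0 for a clean index (the file name here is an assumption based on the reference path in the first post). Regenerating the index with `samtools faidx` on the reference would be an alternative to editing it by hand.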
