  • Empty VCF file with bcftools call

    Hello,

    I am trying to produce a vcf file using bcftools call but it produces an empty vcf file containing only the header. In short, here is what I do:

    1. Alignment with BWA
    2. With samtools, make sorted.bam files
    3. With samtools, index the sorted.bam files
    4. Run samtools mpileup in the following way:
    samtools mpileup -C 50 -E -t SP -t DP -u -I -f /genome/refgenome.fa -b bam_list.txt > output.bcf
    5. Run bcftools call:
    bcftools call -v -c output.bcf > output.vcf

    I am using version 1.3.1 of samtools, bcftools and htslib. Reinstalling these programs did not fix the issue, and version 1.2 gives the same problem. As far as I can tell, the bcf file is fine: it contains lots of data and is 20GB.

    I tried producing a basic vcf file using bcftools view (bcftools view output.bcf > output.vcf) and it works. The vcf file seems completely normal.
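
A quick way to confirm that a VCF really is header-only is to count its non-header lines. The sketch below assumes an uncompressed, plain-text VCF; the function name `count_vcf_records` is hypothetical and is not part of bcftools.

```shell
# Hypothetical helper (not a bcftools command): count data records in an
# uncompressed VCF, i.e. non-empty lines that do not start with '#'.
count_vcf_records() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Example call (file name taken from the commands above):
#   count_vcf_records output.vcf
```

A result of 0 means the file holds only header lines, which matches the symptom described here.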

    Could anyone help me with this? Why would bcftools call produce an empty output?

    Thanks

  • #2
    For reference, this is cross-posted at https://www.biostars.org/p/189996


    • #3
      Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers.


      • #4
        Originally posted by AP38:
        Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers.
        It is ok to cross-post on SeqAnswers. I included a link to your post on Biostars for reference.

        If you get an answer over at Biostars, please come back and indicate that here.


        • #5
          Shouldn't you have the -g option with mpileup to compute genotype likelihoods?
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


          • #6
            Yes indeed, if one wants to compute genotype likelihoods, the -g option is required. However, adding it should not make a difference to this problem.


            • #7
              Sorry... don't know what I was thinking!

              I tried your commands on a recent project and they also gave a header-only vcf file.

              This minimal pipeline worked fine and produced a normal vcf from the same data:
              samtools mpileup -gu -Q 10 -t DP,DPR -f ref.fasta -b samples.txt | bcftools call -cv - > test.vcf
              (it also worked without the -g!)

              So from that I would conclude it is not a problem with your versions, file list or reference.

              When I generate a bcf file using the minimal pipeline, it reports the reference allele. Your mpileup does not for some reason.
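
One way to compare what the two mpileup runs report as the reference allele is to tally the REF column of each file after converting it to text (for example with bcftools view, as in the first post). This is a minimal sketch that assumes an uncompressed VCF, where REF is the fourth tab-separated column; `tally_ref_alleles` is a hypothetical name.

```shell
# Hypothetical helper: tally the REF column (4th tab-separated field) of an
# uncompressed VCF. Prints "<count> <REF>" lines, sorted by REF value.
tally_ref_alleles() {
    awk -F'\t' '!/^#/ && NF >= 4 { c[$4]++ } END { for (k in c) print c[k], k }' "$1" | sort -k2
}

# Example call (file name taken from the first post):
#   tally_ref_alleles output.vcf
```

If nearly every record shows the same placeholder value instead of A, C, G or T, the reference was probably not read correctly during mpileup.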
              Last edited by SNPsaurus; 05-05-2016, 10:35 AM.


              • #8
                Thanks! This is very good to know. It is possible that the -C 50 option is causing the issue because it downgrades mapping quality for reads with excessive mismatches. I am working here with libraries of fairly low coverage, so I might want to remove that option. I'll try that and report back with the results.


                • #9
                  For some reason, an empty line had been added at the bottom of the genome index file. Removing it solved the problem…
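
A check like the one below would catch this kind of stray blank line. It is only a sketch: it assumes the index is a plain-text file such as the .fai that sits next to the reference FASTA, and the function names are hypothetical.

```shell
# Hypothetical helpers for inspecting a plain-text genome index file.

# Print the line number of every blank or whitespace-only line.
find_blank_lines() {
    awk 'NF == 0 { print NR }' "$1"
}

# Write a copy of the index with blank lines removed; the input is left untouched.
strip_blank_lines() {
    awk 'NF > 0' "$1" > "$2"
}

# Example calls (the .fai path is an assumption based on the first post):
#   find_blank_lines /genome/refgenome.fa.fai
#   strip_blank_lines /genome/refgenome.fa.fai refgenome.fa.fai.clean
```

Deleting the index and rebuilding it with samtools faidx should be another way to get a clean file.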
