Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Empty VCF file with bcftools call

    Hello,

    I am trying to produce a vcf file using bcftools call but it produces an empty vcf file containing only the header. In short, here is what I do:

    1. Alignment with BWA
    2. With samtools, make sorted.bam files
    3. With samtools, index the sorted.bam files
    4. Run samtools mpileup in the following way:
    samtools mpileup -C 50 -E -t SP -t DP -u -I –f /genome/refgenome.fa -b bam_list.txt > output.bcf
    5. Run bcftools call:
    bcftools call -v -c output.bcf > output.vcf

    I am using versions 1.3.1 of samtools, bcftools and htslib. I tried reinstalling these programs but it did not change the issue. I also tried with versions 1.2. Same problem. As far as I know, the bcf file seems fine, it contains lots of data and is 20GB.

    I tried producing a basic vcf file using bcftools view: bcftools view output.cf > output.vcf and it works. The vcf file seems completely normal.

    Could anyone help me with this? Why would bcftools call produce an empty output?

    Thanks

  • #2
    For reference cross-posted: https://www.biostars.org/p/189996

    Comment


    • #3
      Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers.

      Comment


      • #4
        Originally posted by AP38 View Post
        Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers.
        It is ok to cross-post on SeqAnswers. I included a link to your post on Biostars for reference.

        If you get an answer over @Biostars then please come back and indicate that here.

        Comment


        • #5
          Shouldn't you have the -g option with mpileup to compute genotype likelihoods?
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

          Comment


          • #6
            Yes indeed, if one wants to compute genotype likelihoods, the -g option is required. However, this should not help solve the problem and make a difference.

            Comment


            • #7
              Sorry... don't know what I was thinking!

              I tried your commands one a recent project and it also gave a header-only vcf file.

              This minimal pipeline worked fine and produced a normal vcf from the same data:
              mpileup -gu -Q 10 -t DP,DPR -f ref.fasta -b samples.txt | bcftools call -cv - > test.vcf
              (it also worked without the -g!)

              So from that I would conclude it is not a problem with your versions, file list or reference.

              When I generate a bcf file using the minimal pipeline, it reports the reference allele. Your mpileup does not for some reason.
              Last edited by SNPsaurus; 05-05-2016, 10:35 AM.
              Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

              Comment


              • #8
                Thanks! This is very good to know. It is possible that the -C 50 option is causing the issue because it downgrades mapping quality for excessive mismatches. I am working here with libraries of fairly small coverage so I might want to remove that option. I'll try that and stay in touch about the results.

                Comment


                • #9
                  For some reasons, an empty line was added at the bottom of the index genome file. Removing it solved the problem…

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Advancing Precision Medicine for Rare Diseases in Children
                    by seqadmin




                    Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                    12-16-2024, 07:57 AM
                  • seqadmin
                    Recent Advances in Sequencing Technologies
                    by seqadmin



                    Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                    Long-Read Sequencing
                    Long-read sequencing has seen remarkable advancements,...
                    12-02-2024, 01:49 PM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 12-17-2024, 10:28 AM
                  0 responses
                  33 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 12-13-2024, 08:24 AM
                  0 responses
                  48 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 12-12-2024, 07:41 AM
                  0 responses
                  34 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 12-11-2024, 07:45 AM
                  0 responses
                  46 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X