Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bcftools view generates an empty output

    Hello,


    file.sam
    #I convert SAM to BAM
    Code:
    samtools view -bS file -o file.bam
    #I sort BAM file
    Code:
    samtools sort file.bam file.srt
    I remove duplicat, then I detect variants with samtools mpileup

    Code:
    samtools mpileup -uf genome.fasta srt_file.bam | bcftools view -bvcg - > file_raw.bcf
    I obtain no errors, bcf file contain data. (file is about gigabase). => samtools mpileup generates an empty output.

    When I convert bcf into vcf, I obtain
    Code:
    [mpileup] 1 samples in 1 input files
    <mpileup> Set max per-sample depth to 8000
    and different lines with
    Code:
    [bcfview] 434500000 sites processed.
    [afs] 0:54828.077 1:20102.090 2:25069.833
    [bcfview] 434600000 sites processed.
    [afs] 0:44750.569 1:24548.359 2:30701.072
    but the vcf file only contains the header and this line :
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT srt_file.bam

    #convert bcf into vcf
    Code:
    bcftools view file_raw.bcf | vcfutils.pl varFilter -D100 > variant.flt.vcf
    Someone can help me?
    Thanks in advance

  • #2
    Sounds like it called no SNPs. Maybe you should generate the pileup, and eyeball some of it it to see if everything looks alright.

    Comment


    • #3
      I decide to proceed step by step,

      Code:
      samtools mpileup -uf genome.fasta srt_file.bam > output_mpileup
      [mpileup] 1 samples in 1 input files
      <mpileup> Set max per-file depth to 8000

      ls -lah sortie_mpileup
      xxx 35G Feb x xxx output_mpileup

      Code:
      bcftools view -bvcg output_mpileup > output_bcf
      ls -lah output_bcf
      xxx 1.3G Feb x xxx output_bcf
      bcftools view output_bcf | more

      output_bcf file is not empty!

      Code:
      bcftools view output_bcf | vcfutils.pl varFilter -D100 > variant.flt.vcf
      variant.flt.vcf file only contains the header and this line :
      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT srt_file.bam


      I don't know where is the problem!

      Comment


      • #4
        Again, why don't you manaully inspect the pileup? Is filtering everything with a coverage higher than 100 really appropriate for your dataset?

        Comment


        • #5
          Thanks,

          The problem is that vcf output indiscates only N as an SNP!

          Code:
          less xxx.vcf
          #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT C01HGABXX_002_s_7_brut_rm.bam
          chromosome:xxx 400 . N A 3.02 . DP=1;AF1=1;CI95=0.5,1;DP4=0,0,1,0;MQ=35;FQ=-30 PL 30,3,0

          Comment


          • #6
            My problem is that the pileup file contains all "N's" in the reference sequence field (column 3).
            I am mapping reads to a genome reference using stampy. I have converted the bwa sam alignment file to a sorted bam file using samtools. When I try running pileup to identify variants, I get only N's in the reference base position

            *I have checked the reference fasta file and don't find anything wrong.
            * the headers are consistent between SAM and the original fasta.

            see http://seqanswers.com/forums/showthread.php?t=5506

            Someone can hepl me?

            Comment


            • #7
              Double check the reference fasta index. You can make it with samtools faidx.
              That index has to be made correctly for mpileup to work right. mpileup will run that command itself if it doesn't see the fasta index, and it will say something if that step fails, but it will keep going on anyway without it, and it will output all N's like you see.

              Comment


              • #8
                I make a index reference sequence
                Code:
                samtools faidx genome.fasta
                It creates genome.fai file. Hovewer, I obtain the same results!

                Comment


                • #9
                  Are 100% sure that the reference you indexed is the same as the file you aligned to? Spot check the names in the .sam, make sure that they match perfectly with the names in the fasta and the fasta.fai.

                  You definately can't call SNPs until you get mpileup to reconize the association bewteen the reference you aligned to, and the reference fasta you are pointing it to, and having N's in the reference allele indicates that it's not making that association yet. Are there any odd non-space chracters in your chromsome name that maybe your aligner is truncating when making the .sam?

                  Comment


                  • #10
                    I used v0.1.16 (r973) version of samtools; and now with v0.1.17 version, the pileup file doesn't contain "N" in the reference sequence field.
                    Older versions of SAMtools support the pileup command. As of SAMtools v0.1.17 (r973) and later, the pileup command is deprecated and has been replaced with mpileup to accommodate multi-sample calling.

                    Thanks for your help swbarnes2.

                    Comment


                    • #11
                      how about the problem now?solved?

                      Comment


                      • #12
                        Hi,
                        I am also facing the similar problem.
                        I use samtools 1.1.18 and with mpileup command I get a bcf file but the vcf file is only with the header
                        when I try changing the varFilter read depth, the vcf file is of 0 size

                        Can any one of you please help me

                        Comment


                        • #13
                          have this problem solved?

                          I also got empty vcf file

                          Comment


                          • #14
                            htslib matters

                            I got into the same problem.
                            Code:
                            samtools mpileup -ugf <ref.fa> -Q 0 <bamfile.bam> | bcftools call -vc | less
                            I can see the called variants.
                            Code:
                            samtools mpileup -ugf <ref.fa> -Q 0 <bamfile.bam> | bcftools call -vc > <mutation.vcf>
                            And I get an empty vcf file. As have been pointed out by others, the problem lies in exporting the vcf file. This problem almost drove me crazy. I reinstalled the samtools and bcftools. Still not working. Finally, I installed htslib. It worked! It turned out that htslib is not optional! So you may need to make sure htslib is installed in your machine. Good luck.

                            Comment


                            • #15
                              bcftools view -b

                              Dear all, what is wrong in the bcftools view -bvcg ? (why -b option can not be found?). thanks

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              24 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              25 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              21 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              52 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X