Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools don't read the whole data

    Hello,

    I'm having troubles running samtools. When launching the following command:
    Code:
    samtools mpileup -uf ref.fa aln1.bam | bcftools view -bvcg - > var.raw.bcf
    It ends without error but it only considered the first 3Mbp of the first human chromosome… The results seem to be consistent but very partial (I have reads for all the chromosomes)

    If I launch that command
    Code:
    samtools mpileup -f ref.fa aln1.bam
    I got the output for all the positions in pileup format, as expected. That is not exactly what I want since there is no SNP calling but that shows mpileup process the whole data.

    So, I assume that samtools mpileup is working properly but then why does bcftools stop after a few Mbp? Any idea would be welcome
    Thanks!

  • #2
    Did this get resolved? I am having the same issue.
    Jesse

    Comment


    • #3
      What happens when you exclude the "-v" option:
      Code:
      samtools mpileup -uf ref.fa aln1.bam | bcftools view -bcg - > var.raw.bcf
      This should call all positions. Perhaps there are no variants called at later positions...

      Comment


      • #4
        samtools doesn't read whole data

        I updated to samtools 0.1.17 from 0.1.12a. Now samtools doesn't seem to begin processing the data:

        ./samtools-0.1.17/samtools mpileup -I -C 0 -Q 40 -u -f ./hg19.fasta ./7741X2_Converted_reheaded.bam | ./samtools-0.1.17/bcftools/bcftools view -vcg - > ./7741X2.var.raw4

        [mpileup] 1 samples in 1 input files
        <mpileup> Set max per-file depth to 8000
        and stops here

        when I do -bcg instead it looks like reads begin processing:


        [mpileup] 1 samples in 1 input files
        <mpileup> Set max per-file depth to 8000
        [bcfview] 100000 sites processed.
        [afs] 0:99850.000 1:100.000 2:50.000
        [bcfview] 200000 sites processed.
        etc.

        before with 0.1.12a, both the -bcg and -vcg options began processing, only -vcg stopped prematurely.

        Is there a way to know if samtools gets all the way through the data?
        this is my header

        @HD VN:1.0 SO:coordinate
        @SQ SN:chr10 LN:135534747 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr11 LN:135006516 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr12 LN:133851895 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr13 LN:115169878 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr14 LN:107349540 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr15 LN:102531392 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr16 LN:90354753 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr17 LN:81195210 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr18 LN:78077248 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr19 LN:59128983 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr1 LN:249250621 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr20 LN:63025520 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr21 LN:48129895 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr22 LN:51304566 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr2 LN:243199373 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr3 LN:198022430 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr4 LN:191154276 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr5 LN:180915260 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr6 LN:171115067 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr7 LN:159138663 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr8 LN:146364022 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr9 LN:141213431 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrAdapter LN:9260 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrM LN:16571 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrPhiX_Illumina LN:5386 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrX LN:155270560 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrY LN:59373566 AS:hg19EnsKnGn46bpAdaptPhi.novoindex

        thanks,

        Comment


        • #5
          The mpileup step is very very long, much longer than the bcftools steps. Piping it doesn't really save much time, because the bcftools steps are a lot faster. And I guess it makes sense that if you ask it to make a all-points bcf, it'll show more progress. But is that really what you want?

          Comment


          • #6
            Thanks for your input. Samtools pipes output line by line- as they are processed- into BCF-tools, thus the benefit of piping, right? I tried the -b option as troubleshooting to figure out why -v option is not processing the data. Any other suggestions?
            Last edited by 12jrowley2; 08-09-2011, 11:11 AM.

            Comment


            • #7
              Originally posted by 12jrowley2 View Post
              Thanks for your input. Samtools pipes output line by line- as they are processed- into BCF-tools, thus the benefit of piping, right? I tried the -b option as troubleshooting to figure out why -v option is not processing the data. Any other suggestions?
              If it takes mpileup 2 hours to process all your reads, and then bcftools can process that output in 2 minutes, then piping is saving you all of 2 minutes.

              When I run mpileup solo, it sit on the "[mpileup] 1 samples in 1 input files
              <mpileup> Set max per-file depth to 8000" step for a long time. But the output file grows all the time. Then I run bcftools on that when it's done. (Or while it's running, just to get an idea of how far along it is, and about how many SNPs its finding) I can use the mpileup output to make a vcf for variant calling, and an all-points vcf for use in the vcf2fq program.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM
              • seqadmin
                Techniques and Challenges in Conservation Genomics
                by seqadmin



                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                Avian Conservation
                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                03-08-2024, 10:41 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Yesterday, 06:37 PM
              0 responses
              10 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Yesterday, 06:07 PM
              0 responses
              9 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-22-2024, 10:03 AM
              0 responses
              50 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-21-2024, 07:32 AM
              0 responses
              67 views
              0 likes
              Last Post seqadmin  
              Working...
              X