Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • clarissaboschi
    Member
    • Apr 2010
    • 63

    Indel analysis (SAMtools:pileup or mpileup)

    Hi! I am doing indel analysis, using SAMtools/bcftools. First I tried to use mpileup with the command line:
    samtools mpileup file.fa file.bam | bcftools view -bvcg - > var.raw.bcf
    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf


    I tried also the options at different time: –C50, -D8000, –d8000, -A, –BQ0, –r Chr and -r Chr:1-100 (interval in a chromosome).

    I had results for small parts of a chromosome. I had for a entire chromosome only when it is very small chromosome. Each time, the analysis stopped in a different region of the chromosome. So, for example for a big chromosome, I ran the analysis more than 10 times and I was not able to get for all chromosome (changing the regions using -r Chr:1-100; -r Chr:100-200, etc).

    After we tried the analysis with pileup (other version of SAMtools), and the analysis was ok, for the same files.

    Please, what do you think could be the problem? Should we use the mpileup instead of pileup? Why? I don’t want to do for each chromosome separately; I only tried this option to understand the problem.

    Thanks
    Clarissa
  • Heisman
    Senior Member
    • Dec 2010
    • 534

    #2
    I would try specifying a bed interval file as that is what works well for me.

    Comment

    • clarissaboschi
      Member
      • Apr 2010
      • 63

      #3
      Hi Heisman, could you please give more details about bed interval file?
      Thanks!!!

      Comment

      • Heisman
        Senior Member
        • Dec 2010
        • 534

        #4
        You can specify intervals by inputting a bed file with the "-l" or "-L" option (it's one of those (lower or uppercase L; I can't remember which off the top of my head).

        Bed files have the format:

        chr1 3456 80899

        Where the chromosome, base 1, and base 2 are tab delimited (they are actually space based coordinates, so to specify only base 4, for example, you would type chr1 3 4)

        From your command you were doing Chr:1-100, which doesn't specify the actual chromosome. It would have to be Chr2:1-100, for example.

        Comment

        • clarissaboschi
          Member
          • Apr 2010
          • 63

          #5
          Thanks for the explanation!!

          Maybe I did not explain well. I used the command to specify a region like an example: Chr2:1-2000 (I use the total length of the chromosome).

          But the problem is, for one chromosome I need to run so many times! Because the run stops at (for example) 5, 500 or 5000 bases, does not stop in the end of the chromosome!

          When I have a big chromosome this is very difficult (unless if I will use some perl scripts maybe..)! I need to run different bam files, so why does the run stop in the middle of the chromosome?

          I am thinking if this is normal, or maybe a memory problem. I don’t know....I have many files to run. It is a problem to run for each chromosome separately (imagine for many regions in each chromosome). When I tried to run for all chromosomes together, did not work as well, I had result only for a small part of one chromosome...

          Comment

          • Heisman
            Senior Member
            • Dec 2010
            • 534

            #6
            I don't know why it isn't working with you. I have no problem using hg19 as a reference and specifying all of the intervals of the exome fore whole exome data. Here are my commands:

            Code:
            samtools mpileup -q 5 -Q 15 -l [intervals_to_specify] -AB -ugf [reference.fa] [aligned.bam] | BCFtools view -bvcg -> [intermediate_file]
            
            BCFtools view -e [intermediate_file] | vcfutils varFilter -D 99999 > [variants.vcf]
            If that does not work (specify intervals in the bed file), write back.

            Comment

            • clarissaboschi
              Member
              • Apr 2010
              • 63

              #7
              Hi!! I tried your command line and I had results for a entire big chromosome (I tried only for the biggest chromosome!).

              I have some doubts about the -B option, so I ran the same files/command line but without the –B option.

              My results
              -With –B option
              Results for 100% of the chromosome
              Number of indels: 97,603
              -Without –B option
              Only results for 7% of the chromosome!
              Number of indels: 7,744

              -B option is the same as -BQ? Because in Samtools page there is a explanation:
              "Use `-BQ0 -d10000000 -f ref.fa' if the purpose is to get the precise depth of coverage rather than call SNPs. Under this setting, mpileup will count low-quality bases, process all reads (by default the depth is capped at 8000), and skip the time-demanding BAQ calculation. "
              or
              -B: disable BAQ computation

              Do you think my indel analysis will be fine if I am using the option –B?
              Why did you use this option?

              Thanks a lot!!!!

              Comment

              • Heisman
                Senior Member
                • Dec 2010
                • 534

                #8
                Ah, interesting; I had never considered the impact of "-B" on calling indels. I think this kind of proves you have to do it with the "-B". I have compared indel calls with samtools mpileup (using "-B") and Dindel and there is a large amount of overlap, so I think it's fine.

                Comment

                • clarissaboschi
                  Member
                  • Apr 2010
                  • 63

                  #9
                  Thanks a lot for your help!!!
                  Seems that the results are very similar with or without the option -B. I am checking the same region for the 2 results and I had the same indels, very similar number.

                  I will try to run for all chromosomes together.

                  Comment

                  Latest Articles

                  Collapse

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by SEQadmin2, 06-09-2026, 11:58 AM
                  0 responses
                  24 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-05-2026, 10:09 AM
                  0 responses
                  29 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-04-2026, 08:59 AM
                  0 responses
                  39 views
                  0 reactions
                  Last Post SEQadmin2  
                  Started by SEQadmin2, 06-02-2026, 12:03 PM
                  0 responses
                  61 views
                  0 reactions
                  Last Post SEQadmin2  
                  Working...