Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Indel analysis (SAMtools:pileup or mpileup)

    Hi! I am doing indel analysis, using SAMtools/bcftools. First I tried to use mpileup with the command line:
    samtools mpileup file.fa file.bam | bcftools view -bvcg - > var.raw.bcf
    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf


    I tried also the options at different time: –C50, -D8000, –d8000, -A, –BQ0, –r Chr and -r Chr:1-100 (interval in a chromosome).

    I had results for small parts of a chromosome. I had for a entire chromosome only when it is very small chromosome. Each time, the analysis stopped in a different region of the chromosome. So, for example for a big chromosome, I ran the analysis more than 10 times and I was not able to get for all chromosome (changing the regions using -r Chr:1-100; -r Chr:100-200, etc).

    After we tried the analysis with pileup (other version of SAMtools), and the analysis was ok, for the same files.

    Please, what do you think could be the problem? Should we use the mpileup instead of pileup? Why? I don’t want to do for each chromosome separately; I only tried this option to understand the problem.

    Thanks
    Clarissa

  • #2
    I would try specifying a bed interval file as that is what works well for me.

    Comment


    • #3
      Hi Heisman, could you please give more details about bed interval file?
      Thanks!!!

      Comment


      • #4
        You can specify intervals by inputting a bed file with the "-l" or "-L" option (it's one of those (lower or uppercase L; I can't remember which off the top of my head).

        Bed files have the format:

        chr1 3456 80899

        Where the chromosome, base 1, and base 2 are tab delimited (they are actually space based coordinates, so to specify only base 4, for example, you would type chr1 3 4)

        From your command you were doing Chr:1-100, which doesn't specify the actual chromosome. It would have to be Chr2:1-100, for example.

        Comment


        • #5
          Thanks for the explanation!!

          Maybe I did not explain well. I used the command to specify a region like an example: Chr2:1-2000 (I use the total length of the chromosome).

          But the problem is, for one chromosome I need to run so many times! Because the run stops at (for example) 5, 500 or 5000 bases, does not stop in the end of the chromosome!

          When I have a big chromosome this is very difficult (unless if I will use some perl scripts maybe..)! I need to run different bam files, so why does the run stop in the middle of the chromosome?

          I am thinking if this is normal, or maybe a memory problem. I don’t know....I have many files to run. It is a problem to run for each chromosome separately (imagine for many regions in each chromosome). When I tried to run for all chromosomes together, did not work as well, I had result only for a small part of one chromosome...

          Comment


          • #6
            I don't know why it isn't working with you. I have no problem using hg19 as a reference and specifying all of the intervals of the exome fore whole exome data. Here are my commands:

            Code:
            samtools mpileup -q 5 -Q 15 -l [intervals_to_specify] -AB -ugf [reference.fa] [aligned.bam] | BCFtools view -bvcg -> [intermediate_file]
            
            BCFtools view -e [intermediate_file] | vcfutils varFilter -D 99999 > [variants.vcf]
            If that does not work (specify intervals in the bed file), write back.

            Comment


            • #7
              Hi!! I tried your command line and I had results for a entire big chromosome (I tried only for the biggest chromosome!).

              I have some doubts about the -B option, so I ran the same files/command line but without the –B option.

              My results
              -With –B option
              Results for 100% of the chromosome
              Number of indels: 97,603
              -Without –B option
              Only results for 7% of the chromosome!
              Number of indels: 7,744

              -B option is the same as -BQ? Because in Samtools page there is a explanation:
              "Use `-BQ0 -d10000000 -f ref.fa' if the purpose is to get the precise depth of coverage rather than call SNPs. Under this setting, mpileup will count low-quality bases, process all reads (by default the depth is capped at 8000), and skip the time-demanding BAQ calculation. "
              or
              -B: disable BAQ computation

              Do you think my indel analysis will be fine if I am using the option –B?
              Why did you use this option?

              Thanks a lot!!!!

              Comment


              • #8
                Ah, interesting; I had never considered the impact of "-B" on calling indels. I think this kind of proves you have to do it with the "-B". I have compared indel calls with samtools mpileup (using "-B") and Dindel and there is a large amount of overlap, so I think it's fine.

                Comment


                • #9
                  Thanks a lot for your help!!!
                  Seems that the results are very similar with or without the option -B. I am checking the same region for the 2 results and I had the same indels, very similar number.

                  I will try to run for all chromosomes together.

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Current Approaches to Protein Sequencing
                    by seqadmin


                    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                    04-04-2024, 04:25 PM
                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 04-11-2024, 12:08 PM
                  0 responses
                  25 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 10:19 PM
                  0 responses
                  28 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 09:21 AM
                  0 responses
                  24 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-04-2024, 09:00 AM
                  0 responses
                  52 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X