Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools mpileup on multiple samples

    Hi,
    I have multiple samples from which I would like to call SNP. I've been trying either to call them separately or all at once

    samtools mpileup -uf hg19.fa sample1.bam | bcftools view -bvcg - > out.bcf
    or
    samtools mpileup -uf hg19.fa sample1.bam sample2.bam | bcftools view -bvcg - > out.bcf

    In the first case, I can get coverage (DP) at each position where a SNP is encountered, but only for the sample that has the SNP. That is, I would like to get the coverage at that position in all samples, but I could not.

    In the second case, for SNP I know are present in both samples, I found only 1 value for DP, which I assume is the combined coverage of both samples (see below).
    Is there any way I can get DP and DP4 for all samples even if the SNP is only in one?

    chr1 15211 . T G 51 . DP=16;AF1=1;CI95=0.8333,1;DP4=3,0,10,3;MQ=30;FQ=-37.5;PV4=1,1.3e-05,1,1 GT:PL:GQ 1/1:46,9,0:21 1/1:23,9,0:21

    Second question, I've been having a hard time trying to annotate the SNP calls in the vcf file (produced by bcftools view) with its rsID. I tried vcftools

    >bgzip UCSC_SNP_v132
    >zcat input.vcf.gz | vcftools_0.1.5/bin/fill-rsIDs -r UCSC_SNP_v132.gz | tabix-0.2.5/bgzip -c > out


    but kept getting error below which suggests to me that I'm not using the right format for the input dbSNP. I used the BED format downloaded from UCSC. What am I doing wrong here?

    tabix ./UCSC_SNP_v132.gz chr1 2>&1 |: No such file or directory at vcftools_0.1.5/bin/fill-rsIDs line 20
    main::error('tabix ./UCSC_SNP_v132.gz chr1 2>&1 |: No such file or directory') called at vcftools_0.1.5/bin/fill-rsIDs line 93
    main::fill_rsids('HASH(0x169602a0)', './UCSC_SNP_v132.gz') called at vcftools_0.1.5/bin/fill-rsIDs line 11

    Thanks!!

  • #2
    Regarding your first question: The -D option keeps per sample read depth with mpileup.

    Comment


    • #3
      Thanks, DerSeb!

      Regarding my 2nd question, I realized that error was because tabix (called inside fill-rsIDs) was not on my path.

      Comment


      • #4
        how to annotate VCF files created by bcftools view?

        I have 1 big vcf files which compares 12 samples by using samtools mpileup. And I would like annotate them? my reference genome is Hg19. How can annotate VCf files ?

        Could you write commands and set up and where to download vcftools?

        Thanks in advance.

        Are you using only vcftools? What about ANNOVAR?

        Comment


        • #5
          No, here's no way to run multiple sample .bams together and get the DP4 of each sample. I Which is a pity, since that would be useful. What you get instead are other stats, which can give you an idea of how good the SNP is in each sample.

          Comment


          • #6
            For annotating the VCF file you can use the Ensembl Variant Effect Predictor. It is very helpful and can also produce SIFT, PolyPhen and Condel consensus.


            I would recommend using the Perl script with a pre-built cache. http://ensembl.org/info/docs/variati...ep_script.html

            Here is the command that I use to annotate on a Mac:
            perl variant_effect_predictor.pl --species homo_sapiens -i SNP.vcf --format vcf -o SNP_effects.txt --sift b --polyphen b --condel b --gene --hgnc --canonical --regulatory --coding_only --no_intergenic --cache --compress gzcat --check_existing
            Variant Effect Predictor can also be used to check the SNPs against the 1000 Genomes and HapMap projects but I was not able to get it running. It was always throwing an error.

            --Dinesh
            Last edited by DineshCyanam; 12-28-2011, 01:26 PM.

            Comment


            • #7
              Thank you. Sorry for my late. I will try them.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM
              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              18 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              22 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 09:21 AM
              0 responses
              17 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-04-2024, 09:00 AM
              0 responses
              48 views
              0 likes
              Last Post seqadmin  
              Working...
              X