
  • Multi sample SNP calling

    Hi all,

    I have two bam files, bam1 and bam2, with two different read groups. I would like GATK to treat them as a single sample and do the SNP calling. This can be done by giving both bam files the same read group name, pooling them into a single bam file, and calling SNPs on it.

    But here is what I am interested in: say a SNP is covered by 30 reads. I want to find out how many of those reads came from bam1 and how many came from bam2.

    How can we distinguish between the reads from the two bam files after merging them into a single bam file with the same read group name?

    Is there a way, or any tool, to achieve this? Any suggestions?
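Per-file counts can only be recovered after merging if the two files keep distinct read group IDs (for example, different @RG ID values that may still share one SM sample name); once both files carry the same RG ID, nothing in the merged records tells them apart. As a minimal sketch, assuming the merged alignments are available as SAM text lines (e.g. the output of `samtools view`) and that the read group IDs are the hypothetical values `bam1` and `bam2`, the functions below count, per RG tag, the reads that have an aligned base at a 1-based SNP position. The names `covers_position` and `reads_per_read_group` are illustrative and not part of any tool.

```python
import re
from collections import Counter

# CIGAR as (length, op) pairs, e.g. "5M2D10M" -> [("5", "M"), ("2", "D"), ("10", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Skip unmapped (0x4), secondary (0x100), duplicate (0x400) and supplementary (0x800) records.
# This filter set is an assumption; GATK applies its own read filters.
SKIP_FLAGS = 0x4 | 0x100 | 0x400 | 0x800


def covers_position(pos, cigar, target):
    """True if the read has an aligned base (M, = or X) at 1-based reference position `target`."""
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            if ref <= target < ref + n:
                return True
            ref += n
        elif op in "DN":
            ref += n  # deletions and skipped regions consume reference but hold no read base
        # I, S, H and P do not consume reference positions
    return False


def reads_per_read_group(sam_lines, chrom, target):
    """Count reads covering chrom:target, grouped by their RG:Z tag (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & SKIP_FLAGS or rname != chrom or cigar == "*":
            continue
        if covers_position(pos, cigar, target):
            # Optional tags start at column 12 (index 11)
            rg = next((t[5:] for t in fields[11:] if t.startswith("RG:Z:")), "no_RG")
            counts[rg] += 1
    return counts
```

Feeding it the lines from something like `samtools view merged.bam chr1:120-120` returns a `Counter` keyed by read group ID. Because the flag filter here is a guess, its totals may not match the depth GATK reports for the same site.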

  • #2
    Once you know the SNP location, you could copy it to a separate file and use the bedtools intersect command between that location and each of your bam files. If the SNP output is in VCF format, you just need to make a new file containing only the row for your SNP. The bedtools command might look like this:

    Code:
    bedtools intersect -wa -bed -abam bam1.bam -b snp.vcf > bam1hits.bed
    If you leave out the -bed option, it will produce a bam file instead, in case you'd rather keep that format for downstream analysis.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
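The "new file with only the row of your SNP" step can be sketched in Python. This is a minimal sketch, assuming the caller's output is a plain-text (uncompressed) VCF; `extract_snp_records` and `write_snp_vcf` are hypothetical helpers, and `chrom`/`pos` are the SNP's chromosome and 1-based POS value. Header lines (starting with `#`) are kept, on the assumption that bedtools skips them when reading a VCF.

```python
def extract_snp_records(vcf_lines, chrom, pos):
    """Return the VCF header lines plus only the record(s) at chrom:pos."""
    header = [line for line in vcf_lines if line.startswith("#")]
    records = [
        line
        for line in vcf_lines
        # CHROM and POS are the first two tab-separated columns of a data line
        if not line.startswith("#") and line.split("\t")[:2] == [chrom, str(pos)]
    ]
    return header + records


def write_snp_vcf(in_path, out_path, chrom, pos):
    """Write a small VCF (header + matching rows) to use as the -b file of bedtools intersect."""
    with open(in_path) as src:
        kept = extract_snp_records(src.readlines(), chrom, pos)
    with open(out_path, "w") as dst:
        dst.writelines(kept)  # lines keep their original newlines
```

Running the bedtools command once with bam1.bam and once with bam2.bam against the same snp.vcf then gives one set of overlapping reads per file.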



    • #3
      The bam header has one @RG line per read group, and each @RG line carries separate tags for the read group ID (ID) and the sample name (SM). You could extract the header from each bam file using samtools view -H, write a one-liner that sets the same sample name in both headers while keeping the read group IDs different, and use samtools reheader to put the modified header back on the corresponding bam file.
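The "one-liner" step could look like the sketch below, which rewrites only the SM tag on each @RG line of a header and leaves ID and every other tag alone. It assumes the header was saved as SAM text; `set_sample_name` and the sample name passed to it are hypothetical.

```python
def set_sample_name(header_text, sample):
    """Set SM:<sample> on every @RG line, leaving ID and all other tags unchanged."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")
            if any(f.startswith("SM:") for f in fields[1:]):
                # Replace the existing SM tag in place to preserve tag order
                fields = [f"SM:{sample}" if f.startswith("SM:") else f for f in fields]
            else:
                fields.append(f"SM:{sample}")  # add SM if the @RG line lacks one
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

A possible round trip, with hypothetical file names: `samtools view -H bam1.bam > bam1.header.sam`, write `set_sample_name(...)` of that text to `bam1.header.new.sam`, then `samtools reheader bam1.header.new.sam bam1.bam > bam1.reheadered.bam`. Repeat for bam2 with the same sample name.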



      • #4
        Thank you both for the suggestions. I found a way using GATK multi-sample SNP calling, which is very straightforward. Variant calling is run with GATK on both bam files in a single step, which produces a single VCF output file. That file lists every detected variant with its total depth, and alongside it there are separate fields for each bam file giving the number of reads that came from that file.
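Reading those per-file numbers out of the VCF can be sketched as follows. This assumes each bam file was given a different SM sample name (so GATK writes one sample column per file) and that the FORMAT column includes the common DP (per-sample depth) and AD (allelic depths, reference first) fields. The exact read filtering behind these fields depends on the GATK tool and version. `per_sample_read_counts` and the sample names are hypothetical.

```python
def per_sample_read_counts(chrom_header, record):
    """Map each sample column of a VCF record to its FORMAT DP and AD values."""
    samples = chrom_header.rstrip("\n").split("\t")[9:]  # sample names follow FORMAT
    fields = record.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD:DP:GQ:PL

    def to_int(value):
        return None if value == "." else int(value)  # "." marks a missing value

    result = {}
    for name, column in zip(samples, fields[9:]):
        # Trailing FORMAT fields may be dropped, so missing keys fall back to "."
        values = dict(zip(keys, column.split(":")))
        ad = values.get("AD", ".")
        result[name] = {
            "DP": to_int(values.get("DP", ".")),
            "AD": None if ad == "." else [to_int(x) for x in ad.split(",")],
        }
    return result
```

Here `chrom_header` is the `#CHROM ...` line of the VCF and `record` is the data line for the SNP of interest. If both files share the same SM name, they collapse into a single sample column and this split is no longer available.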

