SEQanswers

11-12-2012, 06:27 AM   #1
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54
Multi-sample SNP calling

Hi all,

I have two BAM files, bam1 and bam2, with two different read groups. I would like GATK to treat them as a single sample for SNP calling. This can be done by giving both BAM files the same read group name and calling SNPs after pooling them into a single BAM file.

But what I am really interested in is this: say a SNP is covered by 30 reads. I want to find out how many of those reads came from bam1 and how many came from bam2.

How can we distinguish between the reads from the two BAM files after merging them into a single BAM file with the same read group name?

Is there a tool or any other way to achieve this? Any suggestions are welcome!
11-12-2012, 08:22 AM   #2
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Once you know the SNP location, you could copy it to a separate file and use bedtools' intersect command between that location and each of your BAM files. If the SNP output is in VCF format, you just need to make a new file containing only the row for your SNP. The bedtools command might go like this:

Code:
bedtools intersect -wa -bed -abam bam1.bam -b snp.vcf > bam1hits.bed
If you leave out the -bed option, it will produce a BAM file instead, in case you'd rather keep that format for any downstream analysis.
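To get both counts in one go, the same command can be wrapped in a small loop. This is only a sketch: count_snp_reads is a made-up helper name, the file names are placeholders, and it assumes bedtools is on your PATH and that snp.vcf holds just the one SNP record. It counts every read whose alignment overlaps the position, whatever base the read carries there.

```shell
# Hypothetical helper: count reads overlapping the SNP record in each BAM.
# Usage: count_snp_reads snp.vcf bam1.bam bam2.bam
count_snp_reads() {
    snp_vcf=$1
    shift
    for bam in "$@"; do
        # Same flags as the command above; one BED line is written per overlapping read.
        n=$(bedtools intersect -wa -bed -abam "$bam" -b "$snp_vcf" | wc -l)
        # $((n)) strips the padding that some wc implementations add.
        printf '%s\t%s\n' "$bam" "$((n))"
    done
}
```

Calling count_snp_reads snp.vcf bam1.bam bam2.bam should print one tab-separated line per BAM file with the file name and its read count.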
11-13-2012, 07:15 AM   #3
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 164

Each @RG line in the BAM header has separate tags for the read group ID (ID) and the sample name (SM). You could extract the header from each BAM file using samtools view -H, write a one-liner to set the same sample name while keeping the read group IDs different, and use samtools reheader to write the modified header back to the corresponding BAM file.
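A rough sketch of how that could look, using awk on the header text. The function names set_sample_name and count_by_rg, the sample name "pooled", the file names and the region chr1:12345-12345 are all made up for illustration. The samtools calls are shown only as comments because they depend on your files.

```shell
# Hypothetical helper: set the same SM value on every @RG header line.
# Reads SAM header text on stdin and writes the edited header to stdout.
set_sample_name() {
    awk -v sm="$1" 'BEGIN { FS = OFS = "\t" }
        /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) $i = "SM:" sm }
        { print }'
}

# Hypothetical helper: tally SAM records on stdin by their RG:Z: tag.
count_by_rg() {
    awk 'BEGIN { FS = "\t" }
        {
            rg = "none"
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) rg = substr($i, 6)
            n[rg]++
        }
        END { for (rg in n) printf "%s\t%d\n", rg, n[rg] }' | sort
}

# Usage sketch (placeholder file names and region):
#   samtools view -H bam1.bam | set_sample_name pooled > header1.sam
#   samtools reheader header1.sam bam1.bam > bam1.pooled.bam
#   samtools view merged.bam chr1:12345-12345 | count_by_rg
```

Because the read group IDs stay different, every read in the merged BAM file still carries the RG tag of the file it came from, so the last command should give the per-file read counts at the SNP.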
11-15-2012, 12:28 PM   #4
meher
Member
 
Location: helsinki

Join Date: Jun 2011
Posts: 54

Thank you both for the suggestions. I have figured out a way using GATK multi-sample SNP calling, which is very straightforward. Variant calling is performed with GATK on both BAM files in a single step, which produces a single VCF output file. The output lists every detected variant with its total depth, along with per-sample fields for each BAM file that give the number of reads coming from that file.
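In case it helps anyone, here is a minimal sketch for pulling those per-sample fields out of the VCF. It assumes the two BAM files carry different sample names so that each gets its own column, and that the FORMAT column includes AD (allelic depths) and DP (depth), which may not hold for every GATK version or setting. The function name per_sample_depth and the file name calls.vcf are made up.

```shell
# Hypothetical helper: print CHROM, POS, sample name, AD and DP for each sample column.
# Reads a multi-sample VCF on stdin; a missing field is printed as ".".
per_sample_depth() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^##/ { next }
        /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
        {
            # Column 9 is FORMAT; sample columns start at column 10.
            n = split($9, key, ":")
            for (i = 10; i <= NF; i++) {
                split($i, val, ":")
                ad = "."
                dp = "."
                for (k = 1; k <= n; k++) {
                    if (key[k] == "AD" && (k in val)) ad = val[k]
                    if (key[k] == "DP" && (k in val)) dp = val[k]
                }
                print $1, $2, name[i], ad, dp
            }
        }'
}

# Usage sketch (placeholder file name):
#   per_sample_depth < calls.vcf
```

Note that the per-sample numbers may add up to less than the total depth, since GATK can leave filtered or uninformative reads out of the per-sample counts.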