
**Rocketknight** · 08-01-2012, 03:48 AM

There is indeed, if you're using GATK. Make sure every BAM file has a unique read group (you can use Picard's AddOrReplaceReadGroups tool), mark duplicates with Picard, then merge all the BAM files into one. Each read retains its read group identifier, so you can still distinguish the samples later.
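A minimal Python sketch of the per-sample preparation described above. It only builds the Picard command lines, which you can print or pass to `subprocess.run`. Assumptions not taken from the post: Picard is run as a single `picard.jar` that takes `KEY=VALUE` arguments (older releases shipped one jar per tool), the input BAMs are already coordinate-sorted, and the `RGLB`/`RGPL`/`RGPU` values are hypothetical placeholders to replace with your real library, platform, and run details.

```python
from typing import List


def read_group_cmd(picard_jar: str, sample: str, in_bam: str, out_bam: str) -> List[str]:
    """Picard AddOrReplaceReadGroups: give each sample its own read group."""
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"INPUT={in_bam}", f"OUTPUT={out_bam}",
        # A unique RGID/RGSM per sample keeps reads distinguishable after merging.
        f"RGID={sample}", f"RGSM={sample}",
        f"RGLB={sample}_lib",   # hypothetical library name
        "RGPL=illumina",        # assumed sequencing platform
        f"RGPU={sample}_unit",  # hypothetical platform unit
    ]


def mark_dup_cmd(picard_jar: str, in_bam: str, out_bam: str, metrics: str) -> List[str]:
    """Picard MarkDuplicates on one sample's read-grouped BAM."""
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={in_bam}", f"OUTPUT={out_bam}", f"METRICS_FILE={metrics}",
    ]


def prepare_sample(picard_jar: str, sample: str, bam: str) -> List[List[str]]:
    """Return the two commands for one sample, in the order they must run."""
    rg_bam = f"{sample}.rg.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        read_group_cmd(picard_jar, sample, bam, rg_bam),
        mark_dup_cmd(picard_jar, rg_bam, dedup_bam, f"{sample}.dup_metrics.txt"),
    ]


if __name__ == "__main__":
    for cmd in prepare_sample("picard.jar", "sampleA", "sampleA.bam"):
        # To execute instead of printing: subprocess.run(cmd, check=True)
        print(" ".join(cmd))
```

The duplicate-marked `*.dedup.bam` files are what then get merged.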

After that, run the standard GATK pipeline on this single merged BAM file. This has several advantages over processing each sample separately. First, novel indels found in one sample can help realignment in the other samples. Second, you can call variants for all samples simultaneously with GATK's UnifiedGenotyper. You can then run VQSR on the resulting multi-sample VCF file, which lets it use population-level information (the InbreedingCoeff annotation) to help identify false-positive SNPs.
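To illustrate why a multi-sample call set helps, here is a simplified version of the inbreeding coefficient, F = 1 - (observed heterozygotes / heterozygotes expected under Hardy-Weinberg equilibrium), computed from genotype counts at one biallelic site. This is a sketch of the textbook statistic, not GATK's implementation, which may derive the value differently (for example, from genotype likelihoods).

```python
def inbreeding_coeff(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Simplified F = 1 - observed_het / expected_het for one biallelic site.

    Genotype-count version only; not GATK's exact InbreedingCoeff calculation.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p                              # alternate allele frequency
    expected_het = 2 * p * q * n           # Hardy-Weinberg expected heterozygote count
    if expected_het == 0:
        raise ValueError("monomorphic site: F is undefined")
    return 1 - n_het / expected_het


if __name__ == "__main__":
    print(inbreeding_coeff(0, 10, 0))    # every sample heterozygous
    print(inbreeding_coeff(25, 50, 25))  # Hardy-Weinberg proportions
```

`inbreeding_coeff(0, 10, 0)` gives -1.0: all ten samples are heterozygous, twice the five expected. A strong excess of heterozygotes like this can be a sign of an artifactual site (for example, reads from paralogous regions piling onto one locus), and it is the kind of signal VQSR can learn from when InbreedingCoeff is among its annotations. A single-sample VCF cannot show it.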

**mboursnell** · 08-01-2012, 03:56 AM

Thanks. Do I use picard/MergeSamFiles to do the merging?

**Rocketknight** · 08-01-2012, 03:57 AM

Yep, that will work. (Edit: make sure you set SORT_ORDER=coordinate when you merge, as GATK expects coordinate-sorted BAMs.)
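A sketch of the merge step as a Python command builder, again assuming a single `picard.jar` that takes `KEY=VALUE` arguments. Besides `SORT_ORDER=coordinate` it sets Picard's `CREATE_INDEX=true` option so a `.bai` index is written next to the merged BAM. `header_sort_order` is a hypothetical helper that reads the `SO:` tag from SAM header text (for example, the output of `samtools view -H merged.bam`) so you can confirm the result is coordinate-sorted before handing it to GATK.

```python
from typing import List, Sequence


def merge_cmd(picard_jar: str, bams: Sequence[str], out_bam: str) -> List[str]:
    """Build a Picard MergeSamFiles command for the per-sample BAMs."""
    cmd = ["java", "-jar", picard_jar, "MergeSamFiles"]
    # One INPUT= argument per per-sample, duplicate-marked BAM.
    cmd += [f"INPUT={bam}" for bam in bams]
    cmd += [
        f"OUTPUT={out_bam}",
        "SORT_ORDER=coordinate",  # GATK expects coordinate-sorted BAMs
        "CREATE_INDEX=true",      # also write a .bai index for the merged BAM
    ]
    return cmd


def header_sort_order(header_text: str) -> str:
    """Hypothetical helper: return the SO: value from a SAM @HD line, or 'unknown'."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


if __name__ == "__main__":
    print(" ".join(merge_cmd("picard.jar", ["s1.dedup.bam", "s2.dedup.bam"], "merged.bam")))
```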

**mboursnell** · 08-01-2012, 03:59 AM

Do you use Queue.jar and the DataProcessingPipeline.scala file to run the standard GATK pipeline, or do you write your own pipeline (e.g. in Perl) to do the same thing?

**Rocketknight** · 08-01-2012, 04:54 AM

I tinkered with GATK-Queue, but I had a couple of problems (and I'm not too familiar with Java/Scala), so in the end I just went with a simple Python script to run everything. I used Python's multiprocessing module to process several samples at once, which takes advantage of multiple cores without having to split single samples by region and recombine them. This won't work if you merge all your BAM files into one, though, unless you have several multi-sample BAM files that you'd like to process concurrently.
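A minimal sketch of that approach (not the poster's actual script), using Python's `multiprocessing.Pool` so that each worker processes one whole sample. `sample_commands` is a hypothetical placeholder: here each step just runs the Python interpreter to print a message, and you would replace it with the real per-sample tool commands. A sample's steps run in order and stop at the first non-zero exit code.

```python
import multiprocessing
import subprocess
import sys
from typing import Dict, List, Tuple


def sample_commands(sample: str) -> List[List[str]]:
    # Hypothetical placeholder steps: each one only prints a message via Python.
    # Replace with the real per-sample command lines (Picard, GATK, ...).
    return [
        [sys.executable, "-c", f"print('step 1 for {sample}')"],
        [sys.executable, "-c", f"print('step 2 for {sample}')"],
    ]


def process_sample(sample: str) -> Tuple[str, int]:
    """Run one sample's steps in order; stop at the first failure."""
    for cmd in sample_commands(sample):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return sample, result.returncode
    return sample, 0


def run_all(samples: List[str], workers: int) -> Dict[str, int]:
    """Process whole samples in parallel; returns each sample's exit status."""
    with multiprocessing.Pool(processes=workers) as pool:
        return dict(pool.map(process_sample, samples))


if __name__ == "__main__":
    # The guard is required where workers start by spawning a fresh interpreter.
    print(run_all(["sampleA", "sampleB", "sampleC"], workers=2))
```

Because each worker owns one sample from start to finish, nothing has to be split by region or recombined; the trade-off is that it only helps when there are several independent BAM files to process.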

**mboursnell** · 08-01-2012, 04:57 AM

Would it be possible to have a look at your Python script? It would help me set up my Perl script.

**Rocketknight** · 08-01-2012, 05:22 AM

Sure thing, just sent it there.

**angelinasusan** · 02-03-2013, 08:19 PM

Could I take a look at your script? I badly need some help with a pipeline I am building, and this would be very helpful.

**mrood** · 02-28-2013, 08:31 AM

script please?

Hi, would anyone be willing to send me their script to look at? I am new to programming and would love an example to build my own on!
Thanks in advance!

**Jeremy** · 02-28-2013, 07:03 PM

You could also use samtools mpileup and vcftools; mpileup treats each BAM file as a separate sample.



Data Processing Pipeline question
