I wonder if I'm doing things right, because it takes so long.
I'm going to call SNPs from about 100 Gbp of human genome sequence data.
After alignment, I processed my reads with samtools, Picard and GATK.
1. Converting the SAM files (split for alignment efficiency) to BAM format. [samtools]
~500 min.
2. Sorting the BAM files. [samtools]
~12 h.
3. Merging the split BAM files into one BAM file per sample-run. [picard MergeSamFiles.jar]
4. Adding read groups, which I didn't include in the SAM files. [picard AddOrReplaceReadGroups.jar]
5. Indexing the sorted and merged BAM output. [samtools]
6. Creating suspicious intervals for realignment. [GATK RealignerTargetCreator]
7. Realigning around indels to reduce false-positive SNPs and recover missed (false-negative) indels, if I understand it correctly. [GATK IndelRealigner]
8. Marking likely PCR duplicates. [picard MarkDuplicates.jar]
9. Re-indexing sorted-merged-realigned-deduplicated BAM. [samtools]
10. Recalibrating base quality scores. [GATK CountCovariates, TableRecalibration]
11. Re-indexing sorted-merged-realigned-deduplicated-recalibrated BAM. [samtools]
12. Calling SNPs and indels [-glm BOTH] with the GATK Bayesian caller. [GATK UnifiedGenotyper]
~3.2 hours for my toy data set, which is ~1/500 of my real sample size.
13. Filtering SNVs. ## I haven't reached this step yet.
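To put the step 12 timing in perspective, here is a back-of-the-envelope extrapolation in Python. It is only a sketch: it assumes runtime grows linearly with data size and that the work splits evenly across parallel jobs, neither of which is guaranteed for UnifiedGenotyper. The function name `extrapolate_hours` and the 24-job figure are made up for illustration; only the 3.2 hours and the 1/500 ratio come from my runs.

```python
# Rough linear extrapolation of the step 12 runtime from the toy run to the full data.
TOY_HOURS = 3.2      # measured time for the toy subset (from the post)
SCALE_FACTOR = 500   # toy subset is ~1/500 of the real data (from the post)


def extrapolate_hours(toy_hours: float, scale: float, parallel_jobs: int = 1) -> float:
    """Estimate wall-clock hours, ASSUMING linear scaling and an even split across jobs."""
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    return toy_hours * scale / parallel_jobs


serial_hours = extrapolate_hours(TOY_HOURS, SCALE_FACTOR)
print(f"serial estimate: {serial_hours:.0f} h (~{serial_hours / 24:.0f} days)")

# Hypothetical: the same work split into 24 independent jobs (e.g. one per chromosome).
split_hours = extrapolate_hours(TOY_HOURS, SCALE_FACTOR, parallel_jobs=24)
print(f"24-job estimate: {split_hours:.0f} h (~{split_hours / 24:.1f} days)")
```

Under those assumptions, a single serial run of step 12 alone would take about 1600 hours, which is why I am asking before going any further.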
Is it normal for pre-processing the mapping/alignment results to take this long?
Before I start troubleshooting, I came here to ask for comments.
I hope you can give me some.
Have a great day!!