I am entirely new to next-generation sequencing data analysis and have been working on a project for a week. We are working with human whole-exome, 100 bp paired-end data from an Illumina HiSeq system. Base call quality is good as assessed by FastQC.
I am using only open-source software. Can you please tell me whether this is a good pipeline for processing raw reads before variant calling?
Starting from raw reads:
1. Index reference genome with BWA
2. Align with BWA
3. sampe with BWA, adding the RG line
4. SAM to BAM with samtools
5. Mark and remove PCR duplicates with Picard
6. RealignerTargetCreator and IndelRealigner with GATK, using 1000G and Mills and 1000G as known indels
7. FixMateInformation with Picard
8. Count covariates using dbSNP135 with GATK
9. Base quality score recalibration with GATK
All with default parameters.
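To make the steps concrete, here is a rough sketch of the command lines I have in mind, written as a small Python helper that only builds and prints them (it does not run anything). All file names, the sample name, the jar locations and the `PL:ILLUMINA` read-group field are placeholders I made up for illustration. I am also assuming the GATK 1.x walker names `CountCovariates` and `TableRecalibration` for the recalibration steps, and per-tool Picard jars. Sorting and indexing of the BAM are not shown because they are not in my list above.

```python
import shlex
from typing import List, Tuple

Step = Tuple[str, List[str]]


def preprocessing_commands(ref: str, fq1: str, fq2: str, sample: str,
                           known_indels: List[str], dbsnp: str,
                           picard_dir: str = "picard",
                           gatk_jar: str = "GenomeAnalysisTK.jar") -> List[Step]:
    """Return the pipeline as (step name, argv) pairs; nothing is executed."""
    # bwa sampe -r expects a literal backslash-t between read-group fields.
    rg = "@RG\\tID:{0}\\tSM:{0}\\tPL:ILLUMINA".format(sample)
    gatk = ["java", "-jar", gatk_jar]
    # One -known argument per known-indel VCF (hypothetical file names).
    known = [arg for vcf in known_indels for arg in ("-known", vcf)]

    def picard(tool: str) -> List[str]:
        # Assumes the older layout with one jar per Picard tool.
        return ["java", "-jar", "{}/{}.jar".format(picard_dir, tool)]

    return [
        ("index reference", ["bwa", "index", ref]),
        ("align read 1", ["bwa", "aln", "-f", "r1.sai", ref, fq1]),
        ("align read 2", ["bwa", "aln", "-f", "r2.sai", ref, fq2]),
        ("sampe with RG line", ["bwa", "sampe", "-r", rg, "-f", "aln.sam",
                                ref, "r1.sai", "r2.sai", fq1, fq2]),
        ("SAM to BAM", ["samtools", "view", "-b", "-S",
                        "-o", "aln.bam", "aln.sam"]),
        ("mark/remove duplicates", picard("MarkDuplicates") + [
            "INPUT=aln.bam", "OUTPUT=dedup.bam",
            "METRICS_FILE=dedup.metrics", "REMOVE_DUPLICATES=true"]),
        ("realigner targets", gatk + ["-T", "RealignerTargetCreator",
                                      "-R", ref, "-I", "dedup.bam"]
                                   + known + ["-o", "realign.intervals"]),
        ("indel realignment", gatk + ["-T", "IndelRealigner",
                                      "-R", ref, "-I", "dedup.bam",
                                      "-targetIntervals", "realign.intervals"]
                                   + known + ["-o", "realigned.bam"]),
        ("fix mate information", picard("FixMateInformation") + [
            "INPUT=realigned.bam", "OUTPUT=fixed.bam",
            "SORT_ORDER=coordinate"]),
        ("count covariates", gatk + ["-T", "CountCovariates",
                                     "-R", ref, "-I", "fixed.bam",
                                     "-knownSites", dbsnp,
                                     "-cov", "ReadGroupCovariate",
                                     "-cov", "QualityScoreCovariate",
                                     "-cov", "CycleCovariate",
                                     "-cov", "DinucCovariate",
                                     "-recalFile", "recal.csv"]),
        ("recalibrate base qualities", gatk + ["-T", "TableRecalibration",
                                               "-R", ref, "-I", "fixed.bam",
                                               "-recalFile", "recal.csv",
                                               "-o", "recal.bam"]),
    ]


if __name__ == "__main__":
    # Placeholder inputs; substitute real paths before using any command.
    for name, argv in preprocessing_commands(
            "ref.fa", "tumor_1.fq", "tumor_2.fq", "tumor",
            ["1000G_indels.vcf", "Mills_and_1000G_indels.vcf"],
            "dbsnp_135.vcf"):
        print("# " + name)
        print(shlex.join(argv))
```

Printing the commands instead of running them makes it easy to review the order and the arguments of each step before committing compute time.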
Is there something you would suggest I modify?
In your opinion, what is the best mutation caller for comparing cancer vs. normal exomes in downstream analysis?
Any advice would be appreciated.
Thanks.