Hello everyone,
I created an alignment/variant-calling pipeline for exomes sequenced on a NextSeq 500. Could you review it and share your thoughts?
Are there any missing parts, flaws, or unnecessary steps in this pipeline?
Code:
cat ./*.fastq.gz > ./merged.fastq.gz
bwa aln -t 12 ./refgenome.fa ./merged.fastq.gz > ./raw.sai
bwa samse ./refgenome.fa ./raw.sai ./merged.fastq.gz > ./raw.sam
samtools view -b -S ./raw.sam > ./raw.bam
samtools view -bF 4 ./raw.bam > ./filtered.bam
samtools sort ./filtered.bam ./sorted.bam
rm ./*.sai
rm ./*.sam
java -Xmx1024m -jar Picard/AddOrReplaceReadGroups.jar I= ./sorted.bam O= ./sorted_all.bam SORT_ORDER=coordinate RGID=ID RGLB=${PWD##*/} RGPL=Illumina RGSM=${PWD##*/} RGPU=NXT001 RGCN=Done CREATE_INDEX=True
java -Xmx1024m -jar GATK/gatk.jar -T UnifiedGenotyper -nct 12 -R ./refgenome.fa -I ./sorted_all.bam --dbsnp /dbsnp/dbsnp_138.hg19.vcf -o ./variant.vcf -stand_call_conf 50.0 -stand_emit_conf 10.0 -glm BOTH
rm ./variant.vcf.idx
java -Xmx1024m -jar GATK/gatk.jar -R ./refgenome.fa -T SelectVariants --variant ./variant.vcf -select "DP >= 5.0" -o ./variant_filtered.vcf --intervals exome.bed
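A note on the read-group fields, since the expansion is easy to misread: RGLB and RGSM are set from ${PWD##*/}, which strips everything up to the last slash of the working directory, so the sample name is simply the name of the folder the script runs in. A minimal sketch of that expansion follows; the path and the variable names `dir` and `sample` are hypothetical, chosen only for illustration.

```shell
# Hypothetical directory path, used only for illustration
dir="/data/exomes/sample01"

# "##*/" removes the longest prefix ending in "/", leaving the last path component
sample="${dir##*/}"

echo "$sample"
```

In the pipeline itself the same expansion is applied to $PWD, so the script has to be started from inside each sample's directory for the read group to carry the right name.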
Thank you.