I have aligned Illumina Mate Pair data using bwa. I only care about large structural variation, and would like to discard paired ends with iSize less then 1000. This would get rid of a lot of small fragments that represent artifacts of the library construction. How can I do this?
The steps I have followed to now are summarized below:
# create bwa index for human genome
bwa index -a bwtsw human_g1k_v37.fasta
#Reverse complement fastq to convert from Mate Pair to Paired End
fastx_reverse_complement -i R1_Ill.fastq -o R1.fastq
fastx_reverse_complement -i R2_Ill.fastq -o R2.fastq
#Align individual reads
bwa aln -I human_g1k_v37.fasta R1.fastq>R1.sai
bwa aln -I human_g1k_v37.fasta R2.fastq>R2.sai
# Align paired ends
bwa sampe human_g1k_v37.fasta R1.sai R2.sai R1.fastq R2.fastq>pe.sam
# create BAM file
samtools view -bT human_g1k_v37.fasta -o pe.bam pe.sam
sort pe.bam pe_sort
samtools index pe_sort.bam
# discard concordant pairs or low quality or unmapped or unmapped mate
samtools view -b -q 37 -F 1806 pe_sort.bam -o pe_discordant.bam
The steps I have followed to now are summarized below:
# create bwa index for human genome
bwa index -a bwtsw human_g1k_v37.fasta
#Reverse complement fastq to convert from Mate Pair to Paired End
fastx_reverse_complement -i R1_Ill.fastq -o R1.fastq
fastx_reverse_complement -i R2_Ill.fastq -o R2.fastq
#Align individual reads
bwa aln -I human_g1k_v37.fasta R1.fastq>R1.sai
bwa aln -I human_g1k_v37.fasta R2.fastq>R2.sai
# Align paired ends
bwa sampe human_g1k_v37.fasta R1.sai R2.sai R1.fastq R2.fastq>pe.sam
# create BAM file
samtools view -bT human_g1k_v37.fasta -o pe.bam pe.sam
sort pe.bam pe_sort
samtools index pe_sort.bam
# discard concordant pairs or low quality or unmapped or unmapped mate
samtools view -b -q 37 -F 1806 pe_sort.bam -o pe_discordant.bam
Comment