Hi,
This is my first attempt at aligning SOLiD paired end sequence which is whole exome data.
F3 file has reads minlength=50, F5-BC file has reads minlength=35
Previously I have only worked with Illumina sequence.
The following are the commands I used to do an assembly using BWA.
Unfortunately as you will see the results from samtools flagstat are disappointing. Only 57% reads mapped, & worse still 0.5% properly paired.
Can anyone tell me where I'm going wrong?
Thank you
alig
PS I've played with the bwa aln parameters a bit with no improvement. I'm wondering whether it's the solid2fastq step or even my reference sequence. Any help will be greatly appreciated.
#Created colorspace index
bwa index -a bwtsw -c hg18_noran_mask.fa
#modified solid2fastq.pl script
95 #if (/^>(\d+)_(\d+)_(\d+)_[FR]3/) {
96 if (/^>(\d+)_(\d+)_(\d+)_[F3|R3|F5-P2]/) {
#also renamed the F5-P2 to R3:
solid_data_F5-P2.csfasta -> solid_data_R3.csfasta
solid_data_F5-P2_QV.qual -> solid_data_R3_QV.qual
#converted to fastq.gz
perl ../../BWA_new/bwa-0.5.9/solid2fastq_mod.pl solid_data solid_data
# bwa aln
./../BWA_new/bwa-0.5.9/bwa aln -n 0.04 -o 1 -e -1 -d 5 -i 5 -l 25 -k 2 -t 4 -M 3 -O 11 -E 4 -c -q 0 ../cs_index/hg18_noran_mask.fa solid_data.read1.fastq.gz > solid_data_read1_bwa_hg18.sai
Repeated for fastq 2 file for paired end reads
#bwa sampe
../../BWA_new/bwa-0.5.9/bwa sampe -n 3 -N 10 ../cs_index/hg18_noran_mask.fa solid_data_read1_bwa_hg18.sai solid_data_read2_bwa_hg18.sai solid_data.read1.fastq.gz solid_data.read2.fastq.gz > solid_data.sam
#samtools
samtools view -bT ../cs_index/hg18_noran_mask.fa solid_data.sam -o solid_data.bam (as no .fai file from indexed RefSeq)
samtools sort solid_data.bam solid_data_sorted
samtools index solid_data_sorted.bam
samtools flagstat solid_data.bam
235101678 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
134185711 mapped (57.08%:- nan%)
235101678 paired in sequencing
117550839 read 1
117550839 read 2
1196926 properly paired (0.51%:- nan%)
89122266 with itself & mate mapped
45063445 singletons (19.1%:- nan%)
2798688 with mate mapped to different chr
945434 with mate mapped to a different chr (mapQ>=5)
This is my first attempt at aligning SOLiD paired end sequence which is whole exome data.
F3 file has reads minlength=50, F5-BC file has reads minlength=35
Previously I have only worked with Illumina sequence.
The following are the commands I used to do an assembly using BWA.
Unfortunately as you will see the results from samtools flagstat are disappointing. Only 57% reads mapped, & worse still 0.5% properly paired.
Can anyone tell me where I'm going wrong?
Thank you
alig
PS I've played with the bwa aln parameters a bit with no improvement. I'm wondering whether it's the solid2fastq step or even my reference sequence. Any help will be greatly appreciated.
#Created colorspace index
bwa index -a bwtsw -c hg18_noran_mask.fa
#modified solid2fastq.pl script
95 #if (/^>(\d+)_(\d+)_(\d+)_[FR]3/) {
96 if (/^>(\d+)_(\d+)_(\d+)_[F3|R3|F5-P2]/) {
#also renamed the F5-P2 to R3:
solid_data_F5-P2.csfasta -> solid_data_R3.csfasta
solid_data_F5-P2_QV.qual -> solid_data_R3_QV.qual
#converted to fastq.gz
perl ../../BWA_new/bwa-0.5.9/solid2fastq_mod.pl solid_data solid_data
# bwa aln
./../BWA_new/bwa-0.5.9/bwa aln -n 0.04 -o 1 -e -1 -d 5 -i 5 -l 25 -k 2 -t 4 -M 3 -O 11 -E 4 -c -q 0 ../cs_index/hg18_noran_mask.fa solid_data.read1.fastq.gz > solid_data_read1_bwa_hg18.sai
Repeated for fastq 2 file for paired end reads
#bwa sampe
../../BWA_new/bwa-0.5.9/bwa sampe -n 3 -N 10 ../cs_index/hg18_noran_mask.fa solid_data_read1_bwa_hg18.sai solid_data_read2_bwa_hg18.sai solid_data.read1.fastq.gz solid_data.read2.fastq.gz > solid_data.sam
#samtools
samtools view -bT ../cs_index/hg18_noran_mask.fa solid_data.sam -o solid_data.bam (as no .fai file from indexed RefSeq)
samtools sort solid_data.bam solid_data_sorted
samtools index solid_data_sorted.bam
samtools flagstat solid_data.bam
235101678 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
134185711 mapped (57.08%:- nan%)
235101678 paired in sequencing
117550839 read 1
117550839 read 2
1196926 properly paired (0.51%:- nan%)
89122266 with itself & mate mapped
45063445 singletons (19.1%:- nan%)
2798688 with mate mapped to different chr
945434 with mate mapped to a different chr (mapQ>=5)
Comment