I am trying to get going with paired-end mapping using BFAST 0.7.0a, but I am stalling so far.
I have 50bp F3 reads and 35bp F5-BC reads from a SOLiD4.
This is my current workflow:
- filter the reads to remove poor-quality reads
- combine F3 and F5-BC read files to an interleaved FASTQ file using solid2fastq.
- remove reads that lost their partner in the quality-filtering step,
so that only paired 50bp and 35bp reads remain in the FASTQ, e.g.:
Code:
@2_58_1022
T11030012.13.031.13..220020201100120232000.0010..00
+
8B>@7<A9!A=!A?9!BA!!;B?B:@A?@79A=?7/B??=/!&?.4!!><
@2_58_1022
G03012022110321000220002310222320030
+
.386;BA749?>50<@8591@:=7+6'1)3+&4%/
@2_59_549
T32212131.1203201231302132200022200213130200102.020
+
A?=@@@B@!B?@8@@-?A?@>A@@=<>>6?(//;=*;6?60'1+<1!0'?
@2_59_549
G31302031000200031220202301033020330
+
93<A=B/>7+'525=;896@8/=4/?>:@9/=B6B
- run bfast match
Code:
bfast match -f genome.fa -A 1 -n 1 -t -r interleaved.fastq > interleaved.bmf
- run bfast localalign
Code:
bfast localalign -f genome.fa -m interleaved.bmf -A 1 -n 1 -t > interleaved.baf
- run bfast postprocess
Code:
bfast postprocess -f genome.fa -i interleaved.baf -A 1 -O 1 -n 1 -t -a 2 -Y 0 > interleaved.sam
- convert SAM to BAM and run flagstat
Code:
31104184 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
8402049 + 0 mapped (27.01%:-nan%)
31104184 + 0 paired in sequencing
15552092 + 0 read1
15552092 + 0 read2
0 + 0 properly paired (0.00%:-nan%)
0 + 0 with itself and mate mapped
8402049 + 0 singletons (27.01%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
No valid pairs!
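For reference, the partner-removal step in the workflow above could be sketched roughly as follows. This is not the exact script used, just a minimal Python illustration. It assumes the interleaved FASTQ has four lines per record and that the two reads of a pair are adjacent and share the same name, as in the example records above. The function names `read_fastq` and `keep_complete_pairs` are made up for this sketch.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (name, seq, qual) tuples from an iterable of FASTQ lines.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated trailing record)
        name, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield name, seq, qual


def keep_complete_pairs(records):
    """Yield only adjacent records that share a name.

    Assumes an F3 read is immediately followed by its F5-BC partner;
    any record without a matching neighbour is dropped.
    """
    previous = None
    for record in records:
        if previous is not None and previous[0] == record[0]:
            yield previous
            yield record
            previous = None
        else:
            # The held record (if any) had no partner, so it is discarded.
            previous = record
```

Running `keep_complete_pairs(read_fastq(open("interleaved.fastq")))` would then give only the records that still have both reads, which can be written back out in the same four-line layout.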
If I map the F3 and F5-BC reads individually, I get about 55% mapping for F3 but 0% for F5-BC.
The flags in the SAM file also indicate that the reads are indeed in pairs, but the downstream read does not map (flags 89 for F3 and 165 for F5-BC).
So the F5-BC is not mapping at all for some reason. Am I doing something silly here?!
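To spell out what those two flags mean, here is a small Python sketch that decodes them against the FLAG bit definitions in the SAM specification. The short bit names and the helper `decode_flag` are just labels chosen for this illustration.

```python
# Bit values follow the FLAG field definitions in the SAM specification;
# the short names are labels chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (89, 165):
    print(flag, decode_flag(flag))
```

So 89 is paired, mate unmapped, reverse strand, first in pair, and 165 is paired, read unmapped, mate on reverse strand, second in pair. That matches the F3 read mapping while its F5-BC partner does not.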
Thanks for any guidance!
Ian