I have a set of PacBio reads and I need to find which of them actually bridge two scaffolds of an existing assembly.
So far I've aligned the PacBio reads to the scaffolds of the assembly with BLASR, and I process the resulting report as follows:
sort -k1,1 --stable report.blasr.csv | gawk '$1 != old { first = $0; old = $1; next } first != "" { print first; first = "" } { print }' > sorted.blasr.csv
This gives me a file with the PacBio reads sorted by name, keeping only the reads that have more than one alignment, so I can see which reads map to more than one scaffold. I can then check by hand whether a read aligns to the beginning of one scaffold and the end of another, but that is far too much manual work, and I can't think of a script that would do it. Does anyone know of a tool that will give me this information?
Thanks community
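The manual check described above (does a read hit the end of one scaffold and the end of another?) can be sketched as a short Python script. This is only a minimal sketch under several assumptions, not a tested tool: the column positions in `parse_hits` (read name, scaffold name, target start, target end, target length in columns 0, 1, 6, 7, 8) are a guess at the report layout and must be adjusted to the actual BLASR output format used; the function names and the 1000 bp `margin` are hypothetical; and strand-dependent coordinate conventions are ignored.

```python
from collections import defaultdict


def touched_ends(t_start, t_end, t_length, margin):
    """Return which scaffold ends ('start', 'end') an alignment lies close to."""
    ends = set()
    if t_start <= margin:
        ends.add("start")
    if t_length - t_end <= margin:
        ends.add("end")
    return ends


def parse_hits(lines, cols=(0, 1, 6, 7, 8)):
    """Yield (read, scaffold, t_start, t_end, t_length) from whitespace-delimited lines.

    `cols` is an ASSUMED column layout; change it to match the real report.
    """
    for line in lines:
        fields = line.split()
        # Skip blank, truncated, or header lines (non-numeric start column).
        if len(fields) <= max(cols) or not fields[cols[2]].isdigit():
            continue
        read, scaffold = fields[cols[0]], fields[cols[1]]
        t_start, t_end, t_length = (int(fields[c]) for c in cols[2:])
        yield read, scaffold, t_start, t_end, t_length


def find_bridging_reads(hits, margin=1000):
    """Return {read: {scaffold: [ends]}} for reads touching an end of >= 2 scaffolds."""
    by_read = defaultdict(dict)
    for read, scaffold, t_start, t_end, t_length in hits:
        ends = touched_ends(t_start, t_end, t_length, margin)
        if ends:
            by_read[read].setdefault(scaffold, set()).update(ends)
    return {
        read: {scaffold: sorted(ends) for scaffold, ends in scaffolds.items()}
        for read, scaffolds in by_read.items()
        if len(scaffolds) >= 2
    }
```

Usage would look something like `find_bridging_reads(parse_hits(open("report.blasr.csv")))`. Any pair of scaffold ends is reported (start/end, end/end, start/start), since the relative orientation of the two scaffolds is not known in advance; the candidates still need to be inspected before being trusted.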