Old 10-20-2012, 08:24 PM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

It is pretty remarkable that OpGen doesn't provide this, especially since a workable one can be written in a page of Perl or Python. I'm a bit too tired to do so tonight, but here is an outline (maybe next week I'll actually throw this together & post it):

1) Read the MapSolver placement report file. There is a one-line header, followed by tab-delimited lines with the information of interest; stop scanning when you hit a blank line (later sections of the file hold other information).
2) Parse the lines just read. The optical map ID is column 0, the start and end on the map are columns 1 & 2, and the contig ID is in column 3 (along with the mapping method, which needs to be stripped out by removing the first space and everything after it). The start & end positions on the contig are columns 4 & 5, and the orientation is in column 6. If the placement did not include the entire contig, you need to decide whether to truncate the contig to what MapSolver liked OR cram the entire piece in.
4) Generate the list of contigs plus the intervening gaps. If MapSolver thinks two contigs overlap, you need to decide whether to still pad with a gap or not.
5) Read in the FASTA file with the contigs, saving those that are needed; trim & reverse complement as needed
6) Build scaffold(s)
7) Write scaffold(s)
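
For what it's worth, here is a rough Python sketch of the outline above, using only the standard library. Treat it as a starting point, not a tested tool. It assumes a lot that the outline does not pin down: that map and contig coordinates are integer, 1-based, inclusive base positions; that reverse orientation is flagged by one of the strings in REVERSE_MARKS (a guess, so check your own report); that a partially placed contig is truncated to the placed region; and that overlapping placements get a fixed pad of Ns. All function and variable names are made up for illustration.

```python
from collections import defaultdict, namedtuple

Placement = namedtuple(
    "Placement",
    "map_id map_start map_end contig_id ctg_start ctg_end reverse")

# ASSUMPTION: the exact orientation strings are not given in the post;
# adjust this set after looking at a real placement report.
REVERSE_MARKS = {"-1", "-", "R"}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_placements(lines):
    """Skip the one-line header, parse rows, stop at the first blank line."""
    placements = []
    rows = iter(lines)
    next(rows, None)  # header
    for line in rows:
        if not line.strip():
            break  # later sections of the report hold other information
        f = line.rstrip("\r\n").split("\t")
        placements.append(Placement(
            map_id=f[0],
            map_start=int(f[1]), map_end=int(f[2]),
            # column 3 is "contig_id mapping_method": keep text before 1st space
            contig_id=f[3].split(" ", 1)[0],
            ctg_start=int(f[4]), ctg_end=int(f[5]),
            reverse=f[6].strip() in REVERSE_MARKS))
    return placements


def read_fasta(lines, wanted):
    """Return {id: sequence} for the contig ids in `wanted` only."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            if name in wanted:
                seqs[name] = []
            else:
                name = None
        elif name is not None and line:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def build_scaffolds(placements, contigs, overlap_pad=100):
    """One scaffold per optical map: placed pieces joined by runs of N."""
    by_map = defaultdict(list)
    for p in placements:
        by_map[p.map_id].append(p)
    scaffolds = {}
    for map_id, group in by_map.items():
        parts, prev_end = [], None
        for p in sorted(group, key=lambda q: q.map_start):
            if prev_end is not None:
                gap = p.map_start - prev_end - 1
                # Design choice: overlapping placements still get a fixed pad.
                parts.append("N" * (gap if gap >= 0 else overlap_pad))
            # Design choice: truncate the contig to the placed region.
            piece = contigs[p.contig_id][p.ctg_start - 1:p.ctg_end]
            if p.reverse:
                piece = piece.translate(COMPLEMENT)[::-1]
            parts.append(piece)
            prev_end = p.map_end
        scaffolds[map_id] = "".join(parts)
    return scaffolds


def write_fasta(scaffolds, handle, width=60):
    """Write each scaffold as a FASTA record wrapped at `width` columns."""
    for name, seq in scaffolds.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

To use it, pass open file handles (or lists of lines) to parse_placements and read_fasta, then feed the results to build_scaffolds and write_fasta. If you would rather cram the whole contig in than truncate it, drop the slice in build_scaffolds; if you would rather not pad overlaps, set overlap_pad=0.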

SIMPLE! :-)

Bio::SeqIO & Bio::Seq (from BioPerl) will be essential for doing this in Perl.