06-30-2014, 01:12 PM   #5
Location: Prague, Czech Republic

Join Date: Nov 2010
Posts: 40

Hi, as it happens I did all the development on my own, so currently I only offer data cleanup (or even assembly) as a service. It is not only the code (28k lines of Python) but also a collection of artefacts which I found more 'manually' than by any 'computer-based' approach. They may not be abundant in one dataset, but you may well hit them in some other one later on ...

I am a molecular biologist, and with some datasets (transcriptomes) I had a lot of fun looking for the restriction sites and ligation results, and in particular trying to work out how they emerged and how to generalize queries for them. To date I have developed and tested it on 2227 datasets; better not to count how many times I re-calculated all of them from scratch once I realized something had been escaping me until then. (You wouldn't believe that I am still finding datasets produced by yet another lab protocol with yet another batch of primers/adapters and associated issues.)

It even works on at least some WGS IonTorrent datasets, as the lab protocols are just the same. If I am not mistaken, IonTorrent was started by people who left 454, so some ideas and issues are common to both.

Unfortunately, I cannot share the code or even the queries. You can find the URL in my profile.

For your particular case, I think it is better to get more sequencing data; 43 bp is too short these days, and I doubt it is worth the effort.
martin2