Hi all, I have several data sets of assembled sequences that were generated with different technologies: Sanger, Illumina (assembled with Trinity), and 454 (assembled with iAssembler, or alternatively with MIRA and CAP3). I would like to use all of these data sets in a comparative analysis. The analysis requires that I minimize redundancy within each data set; that is, I would prefer to have alternative splice variants clustered together, with a single consensus chosen for each cluster, rather than kept as separate sequences.
I would highly appreciate any suggestions. Currently, I am thinking about running CAP3 once on each assembled data set, both to reduce redundancy and to make the data sets a bit more homogeneous. I am new to all this, though, so I am not sure whether this is the best option. I did compare CAP3 and CD-HIT on artificial splice variants that I created, and concluded that CAP3 is better for this job.
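For reference, the "run CAP3 once per data set" step could be scripted roughly as below. This is a minimal sketch, not a tested pipeline. It assumes `cap3` is on your `PATH` and is run with default parameters. It also assumes the reduced set is taken as CAP3's contigs plus its singlets, which is a common convention but a choice on my part. CAP3 writes `<input>.cap.contigs` and `<input>.cap.singlets` next to the input file. The `.reduced.fasta` and `.cap.log` file names and the helper function names are hypothetical.

```python
import subprocess
from pathlib import Path


def merge_cap3_outputs(fasta, merged=None):
    """Concatenate CAP3 contigs and singlets into one reduced FASTA file.

    Returns (path_to_merged_file, number_of_records). The output name
    '<stem>.reduced.fasta' is a hypothetical choice.
    """
    fasta = Path(fasta)
    if merged is None:
        merged = fasta.with_name(fasta.stem + ".reduced.fasta")
    merged = Path(merged)
    n_records = 0
    with open(merged, "w") as out:
        for suffix in (".cap.contigs", ".cap.singlets"):
            part = fasta.with_name(fasta.name + suffix)
            if not part.exists():
                continue  # e.g. no singlets were produced
            text = part.read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep records from running together
            out.write(text)
            n_records += sum(1 for line in text.splitlines() if line.startswith(">"))
    return merged, n_records


def run_cap3(fasta, extra_args=()):
    """Run CAP3 on one assembled data set (assumes `cap3` is on PATH).

    CAP3's stdout (its alignment report) is saved to '<input>.cap.log';
    any CAP3 options you want to tune can be passed via extra_args.
    """
    fasta = Path(fasta)
    log = fasta.with_name(fasta.name + ".cap.log")
    with open(log, "w") as out:
        subprocess.run(["cap3", str(fasta), *extra_args], stdout=out, check=True)
    return merge_cap3_outputs(fasta)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python reduce_with_cap3.py sanger.fa trinity.fa mira.fa
    for path in sys.argv[1:]:
        merged, n = run_cap3(path)
        print(f"{path}: {n} sequences after CAP3 -> {merged}")
```

Keeping the singlets matters. Sequences that CAP3 could not merge are still part of the data set, so dropping them would make the comparison between technologies depend on how aggressively each assembly was collapsed.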
Thank you in advance for your help.