Dear all,
I am new to annotating novel genome assemblies.
Many genome annotation workflows carry out a repeat-masking step before gene model construction (http://www.nature.com/nrg/journal/v1...l/nrg3174.html, http://www.gmod.org/wiki/MAKER and https://www.cosmoss.org/physcome_pro...n_pipeline/JGI etc).
These approaches generally claim that repeats generate false-positive annotations. But I cannot see why the masking has to be done before constructing the gene models. Why is it not a good idea to generate the models first and then classify whether they are repeats or genuine gene regions? Is it because that would slow down the analysis, or is the order actually essential to the correct construction of a 'true' gene model?
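To make the "classify later" alternative concrete, here is a minimal sketch of what I have in mind. It is not taken from any of the pipelines above. The function names, the half-open (start, end) coordinates on a single scaffold, and the 0.5 cutoff are all my own assumptions for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_fraction(exons, repeats):
    """Fraction of a gene model's exon bases that fall inside repeats."""
    exons = merge_intervals(exons)
    repeats = merge_intervals(repeats)
    total = sum(end - start for start, end in exons)
    if total == 0:
        return 0.0
    covered = 0
    for e_start, e_end in exons:
        for r_start, r_end in repeats:
            # Length of the overlap between this exon and this repeat, if any.
            covered += max(0, min(e_end, r_end) - max(e_start, r_start))
    return covered / total


def flag_repeat_models(models, repeats, threshold=0.5):
    """Return IDs of gene models whose exons are mostly repeat sequence.

    The threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    return [gene_id for gene_id, exons in models.items()
            if repeat_fraction(exons, repeats) >= threshold]


# Hypothetical gene models (exon coordinates) and repeat annotations.
models = {"g1": [(100, 200), (300, 400)], "g2": [(1000, 1100)]}
repeats = [(150, 450), (1090, 1200)]
flagged = flag_repeat_models(models, repeats)
```

A filter like this can only discard models that are mostly repeat sequence. It would not repair a model in which the gene finder has joined a real exon to a repeat-derived one, which may be part of the answer to my question.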
Thanks!
zx