Dear all,
I am trying to identify repeats in a plant genome assembly using RepeatModeler. My current understanding is that RepeatModeler wraps several repeat-finding tools, some of which identify repeats based on a random subset of the assembly (IIRC, RECON does this).
To test reproducibility, I ran RepeatModeler three times. I then used the generated libraries, individually and combined, to mask the assembly with RepeatMasker. This effectively gives six different masking options:
Run 1
Run 2
Run 3
Run 1 + 2
Run 1 + 3
Run 1 + 2 + 3
I would expect some variation between Runs 1, 2 and 3, and I would expect the total number of masked nucleotides to increase when the libraries from several runs are combined. This seems to be the case for combinations 1 + 2 and 1 + 3. However, the total number of masked nucleotides for 1 + 2 + 3 is lower than for any of the single runs.
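To cross-check these totals independently of the RepeatMasker summary table, the masked bases can be counted directly in the masked FASTA output. The following is only a minimal sketch: it assumes the sequence was either soft-masked (lowercase, as produced with `-xsmall`) or hard-masked with `N`/`X`, and the function name `count_masked` and the tiny inline example are hypothetical. Note that with hard-masking, assembly gaps (`N`) would be counted as masked too.

```python
import io

def count_masked(handle):
    """Return (masked, total) base counts for a FASTA stream.

    Assumption: masked bases are lowercase (soft-masked) or N/X (hard-masked).
    Gap Ns already present in the assembly are indistinguishable from
    hard-masked bases and are counted as masked as well.
    """
    masked = total = 0
    for line in handle:
        if line.startswith(">"):
            continue  # skip FASTA header lines
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for c in seq if c.islower() or c in "NX")
    return masked, total

# Tiny made-up example instead of a real assembly file
example = io.StringIO(">chr1\nACGTacgtNNNN\n>chr2\nACGTACGT\n")
masked, total = count_masked(example)
print(f"{masked} of {total} bp masked ({100 * masked / total:.2f} %)")
```

For real data, the same function could be called on each masked output file, e.g. `with open(path) as fh: count_masked(fh)`, and the results compared across the six masking options.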
Upon closer inspection, the details of which repeats are masked by RepeatMasker also don't add up. For instance:
Run 1: SINEs: 0 0 bp 0.00 %
Run 2: SINEs: 960 479077 bp 0.09 %
Run 1 + 2: SINEs: 833 420888 bp 0.08 %
Any insights would be greatly appreciated, since I am at a loss as to how this is possible.