Hello,
I am trying to assemble datasets that have large differences in read coverage with MIRA. Apparently MIRA has several parameters that can improve the assembly for this particular type of problem.
Does anyone know of other parameters that could help me with this problem? Or a set of parameters that works well for my case?
My command line is:
mira --job=genome,denovo,iontor,accurate -SK:not=20 -AS:urd=no -AS:sd=no -fasta IONTOR_SETTINGS -LR:mxti=no -LR:wqf=no -AS:epoq=no -FN:fai=input
The parameters:
-LR:mxti=no, -AS:epoq=no and -LR:wqf=no are related to quality and have no impact on my problem.
But the parameter:
-AS:urd=no looked like exactly what I needed to solve my problem. Unfortunately, it did not help.
For a better explanation of this parameter, from the MIRA documentation:
[uniform_read_distribution(urd)=on|yes|1, off|no|0] Default is currently yes for genome assemblies and no for EST assemblies or assemblies with Solexa data.
Takes effect only if uniform read distribution ([-AS:urd]) is on. When set to yes, mira will analyse coverage of contigs built at a certain stage of the assembly and estimate an average expected coverage of reads for contigs. This value will be used in subsequent passes of the assembly to ensure that no part of the contig gets significantly more read coverage of reads that were previously identified as repetitive than the estimated average coverage allows for. This switch is useful to disentangle repeats that are otherwise 100% identical and generally allows to build larger contigs. It is expected to be useful for Sanger and 454 sequences. Usage of this switch with Solexa and Ion Torrent data is currently not recommended.
It is a real improvement to disentangle repeats, but has the side-effect of creating some "contig debris" (small and low coverage contigs, things you normally can safely throw away as they are representing sequence that already has enough coverage). This switch must be set to no for EST assembly, assembly of transcripts etc. It is recommended to also switch this off for mapping assemblies.
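To make the idea in the manual text above more concrete, here is a rough Python sketch of what "estimate an expected coverage and flag regions that exceed it" could look like. This is only an illustration and not MIRA's actual code: the function name `flag_excess_coverage`, the `max_factor` threshold and the use of the median as the "expected" coverage are all assumptions made for the example.

```python
from statistics import median


def flag_excess_coverage(coverage, max_factor=2.0):
    """Estimate an expected coverage and report regions far above it.

    coverage   -- list of per-base read depths for one contig
    max_factor -- hypothetical threshold: a base is flagged when its
                  depth exceeds max_factor * expected coverage

    Returns (expected, regions), where regions is a list of half-open
    (start, end) intervals.
    """
    if not coverage:
        return 0.0, []

    # Assumption: the median stands in for the "average expected coverage",
    # because it is less distorted by a few very deep regions.
    expected = median(coverage)
    limit = max_factor * expected

    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth > limit:
            if start is None:
                start = pos  # open a new flagged region
        elif start is not None:
            regions.append((start, pos))  # close the current region
            start = None
    if start is not None:
        regions.append((start, len(coverage)))

    return expected, regions


# Toy contig: roughly 10x coverage with a roughly 40x stretch in the middle.
coverage = [10, 11, 9, 10, 40, 42, 41, 10, 9, 11]
expected, regions = flag_excess_coverage(coverage)
print(expected, regions)  # 10.5 [(4, 7)]
```

With datasets like mine, where coverage legitimately varies a lot, a rule of this kind would flag genuinely deep regions as if they were repeats, which is presumably why the switch matters for this problem.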
Thank you,
Andre