Dear all,
This is my first post to this extremely useful NGS forum (congratulations to everybody, and especially to its creators and maintainers), so I would like to introduce myself to the community. My name is Javier Forment, and I am the head (and only member) of the Bioinformatics Core Facility at the Institute for Molecular and Cell Plant Biology (IBMCP, for 'Instituto de Biologia Molecular y Celular de Plantas', www.ibmcp.upv.es), a research center in Valencia, Spain, that belongs to the Universidad Politecnica de Valencia (www.upv.es) and the Spanish Scientific Research Council (www.csic.es).
I would like to use SAET (http://solidsoftwaretools.com/gf/project/saet/) to improve the quality of the SOLiD data produced by the various research groups at the IBMCP, and I have a question about the parameter that specifies the expected length of the sequence from which the reads were sampled. In one case, this sequence is an unfinished draft genome, so some regions of its scaffolds contain a significant number of N's.
So my question is: should I include these N's in the reference sequence size? Does it really matter? (Size with N's: 794 Mb; size without N's: 721 Mb.)
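For anyone who wants to reproduce the two figures above from their own draft assembly, here is a minimal sketch that counts the reference length with and without N's from a FASTA file. It assumes a plain or gzipped FASTA file and treats only N/n as gap characters (other IUPAC ambiguity codes are counted as bases); the function names are hypothetical and not part of SAET. It does not say which value SAET expects; it just makes both numbers easy to obtain.

```python
import gzip
import sys
from typing import Iterable, Tuple


def reference_lengths(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_length, length_without_N) for FASTA-formatted lines.

    Header lines starting with '>' are skipped. Only 'N'/'n' characters
    (typical gap filler in draft scaffolds) are excluded from the second
    count; other IUPAC ambiguity codes are counted as ordinary bases.
    """
    total = 0
    n_count = 0
    for line in lines:
        if line.startswith(">"):
            continue  # skip sequence headers
        seq = line.strip()
        total += len(seq)
        n_count += seq.count("N") + seq.count("n")
    return total, total - n_count


def reference_lengths_from_file(path: str) -> Tuple[int, int]:
    """Open a plain or gzipped FASTA file and compute both lengths."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return reference_lengths(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    total, without_n = reference_lengths_from_file(sys.argv[1])
    print(f"with N's:    {total} bp")
    print(f"without N's: {without_n} bp")
    print(f"N fraction:  {(total - without_n) / total:.2%}")
```

With the sizes quoted above, the N's account for roughly 9% of the scaffold length (73 Mb out of 794 Mb), so the two candidate values for the parameter differ by about that much.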
Thanks in advance and best regards,
Javier.