-
Simulating longer PE Illumina reads
Hi. I finally found this after searching a bit: the MetaSim website has two configuration files, one for 60 bp and one for 80 bp PE Illumina reads. They contain all the parameters for the Illumina PE error models, and you can load either one as a configuration file. Hope that helps.
kathryn
Comment
-
Originally posted by nilshomer:
Ah, get the source code via git, as there is no release yet:
Code: git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa
I just did a git clone, but I can't configure/make:
$ ./configure
bash: ./configure: No such file or directory
Comment
-
Originally posted by KevinLam:
Is it missing some files?
I just did a git clone, but I can't configure/make:
$ ./configure
bash: ./configure: No such file or directory
Code: sh autogen.sh
Comment
-
Thanks Nils!
Actually, there's only one shell script, so it's quite evident (my bad).
Anyway, I ran that:
Code:
$ sh autogen.sh
Preparing the dnaa build system...please wait
ERROR: Unable to locate GNU Autoconf.
ERROR: To prepare the dnaa build system from scratch, at least version 2.52 of GNU Autoconf must be installed.
autogen.sh does not need to be run on the same machine that will run configure or make. Either the GNU Autotools will need to be installed or upgraded on this system, or autogen.sh must be run on the source code on another system and then transferred to here. -- Cheers!
Is it possible for you to include the autoconf files?
I do not have Autoconf installed on my system.
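A quick way to see which build tools are missing before running autogen.sh is to look them up on the PATH. The following is only a minimal sketch: the function name `missing_tools` is hypothetical, and the default tool list is an assumption (the error message above names GNU Autoconf and the GNU Autotools in general, not an exact list).

```python
import shutil


def missing_tools(tools=("autoconf", "automake")):
    """Return the names in `tools` that cannot be found on the PATH.

    The default tool names are an assumption for illustration; adjust
    them to whatever autogen.sh reports as missing.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All listed tools were found; autogen.sh should get past this check.")
```

This only checks that an executable exists; it does not verify the version (the message above asks for Autoconf 2.52 or later).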
Comment
-
wgsim
Hello,
I am using wgsim to generate simulated 76 bp reads (Solexa).
Is the generated FASTQ in Solexa or Sanger format? Since there is no option to specify the FASTQ type, I assumed it is Sanger. Is that correct?
Thanks,
Srividya
Comment
-
Hello srividya,
I don't know the answer, but you can find out using the ASCII table: http://es.wikipedia.org/wiki/ASCII
Solexa/Illumina FASTQ (pipeline >= 1.3) won't have any quality characters with ASCII values below 64. This means that digits (48 to 57 in decimal ASCII) shouldn't appear in the quality lines of your FASTQ file; if they do, the file is Sanger-encoded.
Greetings,
Leonardo
L. Collado Torres, Ph.D. student in Biostatistics.
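The ASCII check described above can be automated. This is a rough sketch, not part of wgsim: the function name `guess_fastq_encoding` and its return labels are hypothetical, and it assumes plain four-line FASTQ records (no wrapped sequence or quality lines). The thresholds used are the usual ones: Phred+33 (Sanger) starts at ASCII 33, old Solexa scores start at ASCII 59, and Phred+64 (Illumina 1.3+) starts at ASCII 64.

```python
import io


def guess_fastq_encoding(handle, max_records=10000):
    """Guess the quality encoding from the ASCII range of the quality lines.

    Assumes four-line FASTQ records. Returns one of the hypothetical labels
    "phred33", "solexa64", "phred64", "ambiguous" or "unknown".
    """
    lowest, highest = None, None
    for i, line in enumerate(handle):
        if i // 4 >= max_records:
            break
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        codes = [ord(ch) for ch in line.rstrip("\r\n")]
        if not codes:
            continue
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))

    if lowest is None:
        return "unknown"      # no quality characters were read
    if lowest < 59:
        return "phred33"      # characters this low exist only in Sanger FASTQ
    if lowest < 64:
        return "solexa64"     # 59..63 appear only with old Solexa scores
    if highest <= 74:
        return "ambiguous"    # 64..74 is valid in both Phred+33 and Phred+64
    return "phred64"


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\n2222\n")
    print(guess_fastq_encoding(example))
```

To check a real file, pass an open text handle, for example `with open("reads.fq") as fh: print(guess_fastq_encoding(fh))`, where `reads.fq` is a placeholder file name.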
Comment
-
No problem, and I'm glad you were able to solve your question.
Leo
L. Collado Torres, Ph.D. student in Biostatistics.
Comment
-
Questions regarding synthetic data generation
Hi all,
I found this thread about generating synthetic reads for the Illumina platform, and since I need to generate such synthetic data, I am posting my questions here (as opposed to creating a new thread!).
1) Is it possible to generate SE reads rather than PE reads?
2) Does anyone know the advantages/disadvantages of “wgsim” from SAMtools vs. “dwgsim” from the DNAA package? What has been modified in dwgsim? It is not very clear to me, since the README file of the DNAA package says:
“This is a fork of the SAMtools wgsim, since certain assumptions are made that we do not agree with.”
What are these assumptions? What has been modified? Is there any publication that elaborates on these issues?
3) Is there any statistical consideration involved in the generation of the reads? E.g., do larger genes on the genome get more reads? Or is there any distribution-related consideration when shearing the reference genome? Are the errors distributed uniformly in both programs?
4) Any other recommendations for synthetic data generation?
Thank you in advance for any help.
Comment
-
Originally posted by tldgID:
Hi all,
I found this thread about generating synthetic reads for the Illumina platform, and since I need to generate such synthetic data, I am posting my questions here (as opposed to creating a new thread!).
1) Is it possible to generate SE reads rather than PE reads?
2) Does anyone know the advantages/disadvantages of “wgsim” from SAMtools vs. “dwgsim” from the DNAA package? What has been modified in dwgsim? It is not very clear to me, since the README file of the DNAA package says:
“This is a fork of the SAMtools wgsim, since certain assumptions are made that we do not agree with.”
What are these assumptions? What has been modified? Is there any publication that elaborates on these issues?
3) Is there any statistical consideration involved in the generation of the reads? E.g., do larger genes on the genome get more reads? Or is there any distribution-related consideration when shearing the reference genome? Are the errors distributed uniformly in both programs?
4) Any other recommendations for synthetic data generation?
Thank you in advance for any help.
1) Yes, specify "-2 0".
2) The fork was done to provide better color space (SOLiD) support, in particular to include the first color and adapter.
3) Random read placement, errors distributed according to the error rate.
Comment
-
Originally posted by nilshomer:
1) Yes, specify "-2 0".
2) The fork was done to provide better color space (SOLiD) support, in particular to include the first color and adapter.
3) Random read placement, errors distributed according to the error rate.
Thank you Nils!
About Q2: so, if I need Illumina-like synthetic data, it won't make a difference whether I use “wgsim” or “dwgsim”?
About Q3: can you elaborate on “random read placement”? My understanding is that the error rate is pre-specified, and that when the reads are generated, the nucleotide at each position can be changed according to that error rate. Is this related to “random read placement”, or did you mean something else?
Thanks again
Comment
-
Q2: there are a number of differences, including left-justification of indels and small bug fixes. You will notice differences, and I encourage you to test both, as I cannot predict all of them.
Q3: a read's start position is randomly drawn from all possible start positions. Random errors are then introduced according to the per-base error rate.
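To make the Q3 answer concrete, here is a toy sketch of that two-step process: draw a start position uniformly from all valid starts, then apply independent per-base errors. It is not the wgsim or dwgsim code; the function name `simulate_reads` and all of its parameters are hypothetical, and it models substitutions only (no indels, strands, pairs, or quality strings).

```python
import random


def simulate_reads(reference, read_length, n_reads, error_rate, seed=None):
    """Toy single-end read simulator (illustration only).

    Each read starts at a position drawn uniformly from all valid start
    positions; each base is then replaced by a different base with
    probability `error_rate`. Returns a list of (start, sequence) tuples.
    """
    n_starts = len(reference) - read_length + 1
    if n_starts < 1:
        raise ValueError("read_length exceeds the reference length")

    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(n_starts)  # uniform over 0 .. n_starts - 1
        bases = list(reference[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:  # per-base substitution error
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


if __name__ == "__main__":
    for start, seq in simulate_reads("ACGTACGTACGTACGT", 8, 3, 0.05, seed=42):
        print(start, seq)
```

Because start positions are uniform, a longer region of the reference collects proportionally more reads simply because it contains more start positions; nothing in this sketch weights genes or regions differently.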
Comment