Originally posted by chintanspy
View Post
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
This topic is closed.
X
X
-
You attached only the first read in each pair. The second end is found in "test_new_v.bwa.read2.fastq".
The "-S" option controls how pairs are related (same strand/opposite strand). This doesn't preclude the individual reads to mapping to the reverse strand. Perhaps this is where the confusion lies?
Comment
-
Originally posted by nilshomer View PostYou attached only the first read in each pair. The second end is found in "test_new_v.bwa.read2.fastq".
The "-S" option controls how pairs are related (same strand/opposite strand). This doesn't preclude the individual reads to mapping to the reverse strand. Perhaps this is where the confusion lies?
my *.bwa.read2.fastq file is empty as I had used "-2 0".
So "-S" option says "default (opposite strand for Illumina, same strand for SOLiD/Ion Torrent)". For SOLiD/Ion Torrent, reads have to be from same strand, right?
The command that I used is
dwgsim -1 30 -e 0 -E 0 -r 0 -X 0 -y 0 -R 0 -2 0 -c 2 -B -f ATGC test.fa ionData/test
From the above command I expect
1. no errors
2. no mutations (no indels)
3. no reads from opposite strand
I get reads without errors and mutations but there are reads from opposite strand.
My question is why am I getting the reads from negative strand ?
Regards,
Chintan Vora
Comment
-
You are getting reads mapping to the negative strand since the simulator draws reads from either strand. The "-S" option does not affect this behavior. Again, there is no way to specify to have individual templates maps only to one strand or the other. This is drawn randomly (0.5 probability).
The "-S" option specifies the strand relationship between the first end and second end. In this case you have no second end ("-2 0"), so this option has no effect. In the case you had "-2 100" or the like, the "-S" option would say are the first end and second end always on the same strand? For example, if the first end maps to the reverse strand then with "-S 1" the second end would also map to the reverse strand. Perhaps you should search the forum on the meaning of paired ends and mate pairs, since they are used in the description of "-S 1" and "-S 2".
Comment
-
Evaluating mapping
In 'Evaluating mapping - dwgsim_eval' there are columns without "single quote" (i.e. mc, mi, mu, um, uu ) and with "single quote" (i.e mc', mi', mu', um', uu')
I saw that these are cumulative values used for calculating sensitivity, FDR, positive predictive value.
At a given threshold,
sensitivity = True Positives(TP) / True Positives(TP) + False Negatives(FN)
so TP = mc , right ?
FN = um , right ?
can you please elaborate on the fields with "single quote" ?Regards,
Chintan Vora
Comment
-
Originally posted by chintanspy View PostIn 'Evaluating mapping - dwgsim_eval' there are columns without "single quote" (i.e. mc, mi, mu, um, uu ) and with "single quote" (i.e mc', mi', mu', um', uu')
I saw that these are cumulative values used for calculating sensitivity, FDR, positive predictive value.
At a given threshold,
sensitivity = True Positives(TP) / True Positives(TP) + False Negatives(FN)
so TP = mc , right ?
FN = um , right ?
can you please elaborate on the fields with "single quote" ?
Comment
-
Originally posted by nilshomer View PostLet me know if the documentation answers the question. If not, lets work on improving the documentation.
http://sourceforge.net/apps/mediawik...n#Output_Table
"There was an error processing your request ...
Group not found for name: dnaa "Regards,
Chintan Vora
Comment
-
Originally posted by chintanspy View PostThere is an error on the link you provided
"There was an error processing your request ...
Group not found for name: dnaa "
Comment
-
My understanding :
True Positive (TP) Number of reads mapped correctly that should be mapped
False Positive (FP) Number of reads mapped incorrectly that should be mapped at simulation co-ordinate
False Positive (FP) Number of reads mapped that should be unmapped
False Negative (FN) Number of reads unmapped that should be mapped
True Negative (TN) Number of reads unmapped that should be unmapped
Now if I want to calculate sensitivity and specificity at given threshold (for example, at MAPQ 37)
I would calulate:
sensitivity at given threshold = mc / (mc + mu) = TP / (TP + FN)
specificity at given threshold = uu / (mi + um + uu) = TN / (FP + TN)
But, your formula for sensitivity at the threshold is (mc / (mc' + mi' + mu'))
So, I am not sure how much I am correct ?Regards,
Chintan Vora
Comment
-
These values don't fall into TP/FP categories, as you would need four values for a contingency table. We have five, with "mi" not able to be grouped with any other. For example, "mi" is not a false positive. If it was a false positive, that means it should have been a negative. But wait, a negative means it should not be able to be mapped, but in fact, the definition of "mi" is that it should be mapped. This is the critical issue. On the other hand, it is not a false negative, since it was mapped. The issue is that it lies in between: it was mapped (positive), just not to the correct location (so not a true positive).
Again, we can't associate "mi" with a group in the contingency table.
Sensitivity should always be about how often do you map correctly when the read should have mapped correctly.
Specificity should be similar to the positive predictive value, namely, how often do you map it correctly when you map a read.
Edit: I wanted to say that it took me a while to reconcile this fact too, and you can see that in the git repository log, where I changed my definitions to come to this final conclusion during the course of the development of this software.Last edited by nilshomer; 04-26-2012, 10:02 PM.
Comment
-
Originally posted by nilshomer View PostThese values don't fall into TP/FP categories, as you would need four values for a contingency table. We have five, with "mi" not able to be grouped with any other. For example, "mi" is not a false positive. If it was a false positive, that means it should have been a negative. But wait, a negative means it should not be able to be mapped, but in fact, the definition of "mi" is that it should be mapped. This is the critical issue. On the other hand, it is not a false negative, since it was mapped. The issue is that it lies in between: it was mapped (positive), just not to the correct location (so not a true positive).
Again, we can't associate "mi" with a group in the contingency table.
My other question was why mc', mi', mu' have be used for calculating sensitivity at the threshold instead of mc, mi, mu ?Regards,
Chintan Vora
Comment
-
Originally posted by chintanspy View PostMy other question was why mc', mi', mu' have be used for calculating sensitivity at the threshold instead of mc, mi, mu ?
Comment
-
Hi Nils,
just stumbled across dwgsim in my search for a read simulator, love the fact that I can generate Illumina, Solid and Ion-torrent on the same tool as I have data from all platforms. Nice touch as well on including the galaxy wrappers
At the risk of sounding ungrateful, is there an ETA on the dwgsim_pileup_eval.pl? You'd solve all my problems in one go.
Thanks for sharing,
Cheers
Seb
Comment
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
59 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
57 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
51 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
56 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Comment