Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
This topic is closed.
X
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Originally posted by chintanspy View Post
    I will try and let you know the output.



    The following reads map on to opposite strand
    CCCAAAACGTTTTTTCAGATTTTGCATTTT
    TCCCAAAACGTTTTTTCAGATTTTGCATTT
    TTCCCAAAACGTTTTTTCAGATTTTGCATT
    TTTCCCAAAACGTTTTTTCAGATTTTGCAT
    TTTTCCCAAAACGTTTTTTCAGATTTTGCA
    TTTTTCCCAAAACGTTTTTTCAGATTTTGC
    TTTTTTCCCAAAACGTTTTTTCAGATTTTG
    TTTTTTTCCCAAAACGTTTTTTCAGATTTT
    TTTTTTTTCCCAAAACGTTTTTTCAGATTT
    TTTTTTTTTCCCAAAACGTTTTTTCAGATT
    TTTTTTTTTTCCCAAAACGTTTTTTCAGAT
    Can you show the pairs in FASTQ format? I need to see the read names as well.

    Comment


    • #17
      Hi nils,

      I tried the latest code with commit "6c33e0e5c64c816a5b95c7eac155eef2b4b8155c".
      I have attached the outputs of the old and latest code

      each zip file consists of 2 files:
      1. *.bwa.read1.fastq
      2. *.sam

      I am still getting the same output as the previous code.
      Attached Files
      Regards,
      Chintan Vora

      Comment


      • #18
        You attached only the first read in each pair. The second end is found in "test_new_v.bwa.read2.fastq".

        The "-S" option controls how pairs are related (same strand/opposite strand). This doesn't preclude the individual reads to mapping to the reverse strand. Perhaps this is where the confusion lies?

        Comment


        • #19
          Originally posted by nilshomer View Post
          You attached only the first read in each pair. The second end is found in "test_new_v.bwa.read2.fastq".

          The "-S" option controls how pairs are related (same strand/opposite strand). This doesn't preclude the individual reads to mapping to the reverse strand. Perhaps this is where the confusion lies?
          Yes, you are right...mapping is done on both the strands.

          my *.bwa.read2.fastq file is empty as I had used "-2 0".

          So "-S" option says "default (opposite strand for Illumina, same strand for SOLiD/Ion Torrent)". For SOLiD/Ion Torrent, reads have to be from same strand, right?

          The command that I used is
          dwgsim -1 30 -e 0 -E 0 -r 0 -X 0 -y 0 -R 0 -2 0 -c 2 -B -f ATGC test.fa ionData/test

          From the above command I expect
          1. no errors
          2. no mutations (no indels)
          3. no reads from opposite strand

          I get reads without errors and mutations but there are reads from opposite strand.
          My question is why am I getting the reads from negative strand ?

          Regards,
          Chintan Vora

          Comment


          • #20
            You are getting reads mapping to the negative strand since the simulator draws reads from either strand. The "-S" option does not affect this behavior. Again, there is no way to specify to have individual templates maps only to one strand or the other. This is drawn randomly (0.5 probability).

            The "-S" option specifies the strand relationship between the first end and second end. In this case you have no second end ("-2 0"), so this option has no effect. In the case you had "-2 100" or the like, the "-S" option would say are the first end and second end always on the same strand? For example, if the first end maps to the reverse strand then with "-S 1" the second end would also map to the reverse strand. Perhaps you should search the forum on the meaning of paired ends and mate pairs, since they are used in the description of "-S 1" and "-S 2".

            Comment


            • #21
              I got the term 'same strand' misinterpreted. I ignored the 'mate-pair' term.

              Now its clear.

              Thanks a lot for answering patiently.
              Regards,
              Chintan Vora

              Comment


              • #22
                Evaluating mapping

                In 'Evaluating mapping - dwgsim_eval' there are columns without "single quote" (i.e. mc, mi, mu, um, uu ) and with "single quote" (i.e mc', mi', mu', um', uu')

                I saw that these are cumulative values used for calculating sensitivity, FDR, positive predictive value.

                At a given threshold,

                sensitivity = True Positives(TP) / True Positives(TP) + False Negatives(FN)

                so TP = mc , right ?
                FN = um , right ?

                can you please elaborate on the fields with "single quote" ?
                Regards,
                Chintan Vora

                Comment


                • #23
                  Originally posted by chintanspy View Post
                  In 'Evaluating mapping - dwgsim_eval' there are columns without "single quote" (i.e. mc, mi, mu, um, uu ) and with "single quote" (i.e mc', mi', mu', um', uu')

                  I saw that these are cumulative values used for calculating sensitivity, FDR, positive predictive value.

                  At a given threshold,

                  sensitivity = True Positives(TP) / True Positives(TP) + False Negatives(FN)

                  so TP = mc , right ?
                  FN = um , right ?

                  can you please elaborate on the fields with "single quote" ?
                  Let me know if the documentation answers the question. If not, lets work on improving the documentation.
                  Download dnaa for free. DNAA is the DNA analysis package, for analyzing next-generation post-alignment whole genome resequencing data. Specifically, DNAA is able to find structural variation, SNP and indel variants, as well as evaluating the mapping and data quality.

                  Comment


                  • #24
                    Originally posted by nilshomer View Post
                    Let me know if the documentation answers the question. If not, lets work on improving the documentation.
                    http://sourceforge.net/apps/mediawik...n#Output_Table
                    There is an error on the link you provided

                    "There was an error processing your request ...


                    Group not found for name: dnaa "
                    Regards,
                    Chintan Vora

                    Comment


                    • #25
                      Originally posted by chintanspy View Post
                      There is an error on the link you provided

                      "There was an error processing your request ...


                      Group not found for name: dnaa "
                      Looks like sourceforge is having problems, here's another link:
                      Whole Genome Simulator for Next-Generation Sequencing - nh13/DWGSIM

                      Comment


                      • #26
                        My understanding :

                        True Positive (TP) Number of reads mapped correctly that should be mapped
                        False Positive (FP) Number of reads mapped incorrectly that should be mapped at simulation co-ordinate
                        False Positive (FP) Number of reads mapped that should be unmapped
                        False Negative (FN) Number of reads unmapped that should be mapped
                        True Negative (TN) Number of reads unmapped that should be unmapped

                        Now if I want to calculate sensitivity and specificity at given threshold (for example, at MAPQ 37)

                        I would calulate:

                        sensitivity at given threshold = mc / (mc + mu) = TP / (TP + FN)

                        specificity at given threshold = uu / (mi + um + uu) = TN / (FP + TN)

                        But, your formula for sensitivity at the threshold is (mc / (mc' + mi' + mu'))

                        So, I am not sure how much I am correct ?
                        Regards,
                        Chintan Vora

                        Comment


                        • #27
                          These values don't fall into TP/FP categories, as you would need four values for a contingency table. We have five, with "mi" not able to be grouped with any other. For example, "mi" is not a false positive. If it was a false positive, that means it should have been a negative. But wait, a negative means it should not be able to be mapped, but in fact, the definition of "mi" is that it should be mapped. This is the critical issue. On the other hand, it is not a false negative, since it was mapped. The issue is that it lies in between: it was mapped (positive), just not to the correct location (so not a true positive).

                          Again, we can't associate "mi" with a group in the contingency table.

                          Sensitivity should always be about how often do you map correctly when the read should have mapped correctly.

                          Specificity should be similar to the positive predictive value, namely, how often do you map it correctly when you map a read.

                          Edit: I wanted to say that it took me a while to reconcile this fact too, and you can see that in the git repository log, where I changed my definitions to come to this final conclusion during the course of the development of this software.
                          Last edited by nilshomer; 04-26-2012, 10:02 PM.

                          Comment


                          • #28
                            Originally posted by nilshomer View Post
                            These values don't fall into TP/FP categories, as you would need four values for a contingency table. We have five, with "mi" not able to be grouped with any other. For example, "mi" is not a false positive. If it was a false positive, that means it should have been a negative. But wait, a negative means it should not be able to be mapped, but in fact, the definition of "mi" is that it should be mapped. This is the critical issue. On the other hand, it is not a false negative, since it was mapped. The issue is that it lies in between: it was mapped (positive), just not to the correct location (so not a true positive).

                            Again, we can't associate "mi" with a group in the contingency table.
                            I agree with your explanation. There should be someway to resolve this. If I could have the counts of the read with same sequence will that be helpful ? So we could have probability the read 'X' could map to its simulation position with probability 'n' . But I guess it will be complicated. And it will be same case while identifying variations, right?


                            My other question was why mc', mi', mu' have be used for calculating sensitivity at the threshold instead of mc, mi, mu ?
                            Regards,
                            Chintan Vora

                            Comment


                            • #29
                              Originally posted by chintanspy View Post
                              My other question was why mc', mi', mu' have be used for calculating sensitivity at the threshold instead of mc, mi, mu ?
                              This is another subtlety, how do we actually know at which threshold a read should map? The aligner is the one that determines the mapping quality, but I don't know of a good way to have the "true" mapping quality.

                              Comment


                              • #30
                                Hi Nils,
                                just stumbled across dwgsim in my search for a read simulator, love the fact that I can generate Illumina, Solid and Ion-torrent on the same tool as I have data from all platforms. Nice touch as well on including the galaxy wrappers
                                At the risk of sounding ungrateful, is there an ETA on the dwgsim_pileup_eval.pl? You'd solve all my problems in one go.
                                Thanks for sharing,
                                Cheers
                                Seb

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Essential Discoveries and Tools in Epitranscriptomics
                                  by seqadmin




                                  The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                                  04-22-2024, 07:01 AM
                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                59 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                57 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                51 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                56 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X