  • SliderII: High Quality SNP Calling Using Illumina Data at Shallow Coverage

    SliderII is now available from:

    High quality SNP calling using Illumina data at minimal coverage


    Sorry for the delay,

    Nawar

  • #2
    Thanks for this. I have always been fascinated by Slider. I guess this is the first SNP caller that explicitly uses four quality values. James Bonfield and Mark Daly both believe, and have shown in preliminary results, that using four values leads to better SNP calls. Some comments on the figures on your website:

    1. It is interesting to see that you also arrived at using the known allele frequency as a prior, the same as BGI's SNP caller. When I did SNP calling for NA18507, I suggested this too, but everyone else said it was somehow cheating and rejected my suggestion. They prefer to think of two separate problems: SNP discovery and genotyping. For SNP discovery we use only a flat prior, and for genotyping we use the allele frequency.

    2. How does Slider detect paralogous regions? Do you detect CNVs first and then filter out the SNPs inside them? I agree that setting a maximum depth, as MAQ does, is not a good approach.

    3. I am not sure I read your paper properly. As I understand it, only one mutation (not counting sequencing errors) is allowed per read. Is that right?
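
Editor's sketch for point 1: the difference between a flat prior and an allele-frequency prior can be illustrated with a small Bayesian genotype calculation. This is not SliderII's or BGI's actual model. The binomial likelihood, the fixed error rate `err`, and the Hardy-Weinberg prior are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

GENOTYPES = ["AA", "AB", "BB"]  # A = reference allele, B = alternate allele


def genotype_likelihoods(n_ref, n_alt, err=0.01):
    """Binomial likelihood of the observed allele counts under each genotype.

    err is an assumed per-base sequencing error rate (illustrative only).
    """
    p_alt = {"AA": err, "AB": 0.5, "BB": 1.0 - err}
    n = n_ref + n_alt
    return {
        g: math.comb(n, n_alt) * p ** n_alt * (1.0 - p) ** n_ref
        for g, p in p_alt.items()
    }


def flat_prior():
    """Every genotype equally likely (the 'SNP discovery' view)."""
    return {g: 1.0 / len(GENOTYPES) for g in GENOTYPES}


def hardy_weinberg_prior(alt_freq):
    """Genotype prior from a known alternate-allele frequency (assumes HWE)."""
    q = alt_freq
    p = 1.0 - q
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def posterior(likelihoods, prior):
    """Normalize likelihood * prior over the three genotypes."""
    unnorm = {g: likelihoods[g] * prior[g] for g in GENOTYPES}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


if __name__ == "__main__":
    # Shallow coverage: three reads, one of which shows the alternate allele.
    lik = genotype_likelihoods(n_ref=2, n_alt=1)
    print("flat prior:       ", posterior(lik, flat_prior()))
    print("rare-allele prior:", posterior(lik, hardy_weinberg_prior(0.001)))
```

At shallow coverage the choice of prior dominates: with the same three reads, the flat prior favours a heterozygous call, while a prior built from a rare known allele frequency favours homozygous reference.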



    • #3
      step by step

      I checked http://www.bcgsc.ca/platform/bioinfo/software/SliderII
      and I think it performs the alignment in steps:

      # Alignment.Java: Find read locations on the reference sequence with an exact match or a one-off match (one base mismatch) to the prb-derived sequences.
      # Extend.java: Expand reads to include up to 3 mismatches.
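
Editor's sketch of the first step described above (exact and one-off matching). The thread does not describe SliderII's actual data structures, so the hash index, the function names, and the requirement that the read length equal the indexed k-mer length are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer of the reference to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def one_off_variants(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, base in enumerate(seq):
        for alt in "ACGT":
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def find_hits(read, index):
    """Return (exact_positions, one_off_positions) for a read of length k."""
    exact = list(index.get(read, []))
    one_off = []
    for variant in one_off_variants(read):
        one_off.extend(index.get(variant, []))
    return exact, sorted(one_off)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    index = build_index(reference, k=4)
    print(find_hits("ACGT", index))  # read present in the reference
    print(find_hits("ACGA", index))  # read with one mismatch
```

Generating the one-off variants of the read and looking each one up keeps the search to 3k extra hash lookups per read, rather than scanning the reference for near matches.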



      • #4
        Any insight into how Slider results compare with MAQ SNP calling on single-end or paired-end data?

        Originally posted by lh3
        Thanks for this. I have always been fascinated by Slider. I guess this is the first SNP caller that explicitly uses four quality values. James Bonfield and Mark Daly both believe, and have shown in preliminary results, that using four values leads to better SNP calls. Some comments on the figures on your website:

        1. It is interesting to see that you also arrived at using the known allele frequency as a prior, the same as BGI's SNP caller. When I did SNP calling for NA18507, I suggested this too, but everyone else said it was somehow cheating and rejected my suggestion. They prefer to think of two separate problems: SNP discovery and genotyping. For SNP discovery we use only a flat prior, and for genotyping we use the allele frequency.

        2. How does Slider detect paralogous regions? Do you detect CNVs first and then filter out the SNPs inside them? I agree that setting a maximum depth, as MAQ does, is not a good approach.

        3. I am not sure I read your paper properly. As I understand it, only one mutation (not counting sequencing errors) is allowed per read. Is that right?
        --
        bioinfosm



        • #5
          Yes, just check the link:

          High quality SNP calling using Illumina data at minimal coverage


          N.



          • #6
            - Regarding paralogous regions: Slider identifies paralogous SNPs (and contig-edge SNPs) because they are likely to fall at the edges of the reads.
            - Yes, Slider (and SliderII) allows up to one mutation. In addition, it considers all possible bases in the prb data, and when using PET reads, SliderII force-aligns a read if the other end of the pair is aligned.

            Nawar
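
Editor's sketch of what "considering all possible bases in the prb data" could look like. A prb file stores four scores (A, C, G, T) per sequencing cycle. The conversion below assumes Solexa-style log-odds scaling, score = 10 * log10(p / (1 - p)), which the thread does not confirm. The function names and the `min_prob` threshold are hypothetical, and this is not SliderII's implementation.

```python
def prb_to_probs(scores):
    """Convert four log-odds scores (A, C, G, T) into normalized probabilities.

    Assumes score = 10 * log10(p / (1 - p)); renormalized so the four sum to 1.
    """
    raw = [10 ** (s / 10) / (1 + 10 ** (s / 10)) for s in scores]
    total = sum(raw)
    return dict(zip("ACGT", (r / total for r in raw)))


def ranked_bases(scores):
    """Bases ordered from most to least probable for one cycle."""
    probs = prb_to_probs(scores)
    return sorted(probs, key=probs.get, reverse=True)


def candidate_bases(scores, min_prob=0.2):
    """Bases plausible enough to be tried during alignment (threshold assumed)."""
    probs = prb_to_probs(scores)
    return [b for b in ranked_bases(scores) if probs[b] >= min_prob]


if __name__ == "__main__":
    confident_cycle = (40, -40, -40, -40)  # clearly an A
    ambiguous_cycle = (5, 3, -40, -40)     # A or C both plausible
    print(candidate_bases(confident_cycle))
    print(candidate_bases(ambiguous_cycle))
```

Keeping the second-best base for ambiguous cycles is what lets an aligner try alternative read sequences instead of committing to a single base call.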



            • #7
              2. Do you mean you exclude SNPs towards the ends of a read? These are the false SNPs caused by indels. A better strategy would be to filter out SNPs close to predicted indels.

              3. Sorry that I did not read through the whole page. I now realize that this is a seed-and-extend algorithm. You allow at most one mutation in the seed but may extend the seed to allow more. By the way, the page says "the smaller the seed size is, the faster the alignment will be". Is this a typo?



              • #8
                > Hi,
                >
                > I am very interested in your SNP caller SliderII and am trying to use it. I have one question for you: what is the meaning of SNP_in in the config file? I could not find an explanation of this item on the SliderII website.
                >
                > Thank you very much.
                >
                > Rebecca

                SNP_in is the expected number of reference-genome bases per SNP. For the human genome, this number should be 1000.

                Nawar
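
Editor's sketch of the arithmetic behind SNP_in. Reading 1 / SNP_in as a per-base prior probability of a SNP is an assumption; the thread only says that SNP_in is the expected number of bases per SNP. The function names and the round 3 Gb genome length are illustrative.

```python
def snp_prior(snp_in):
    """Per-base probability of a SNP implied by one SNP every snp_in bases."""
    if snp_in <= 0:
        raise ValueError("snp_in must be positive")
    return 1.0 / snp_in


def expected_snps(genome_length, snp_in):
    """Number of SNPs expected across a genome of the given length."""
    return genome_length * snp_prior(snp_in)


if __name__ == "__main__":
    human_snp_in = 1000  # value given in the thread for the human genome
    print(snp_prior(human_snp_in))
    print(expected_snps(3_000_000_000, human_snp_in))  # assumed ~3 Gb genome
```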



                • #9
                  What coordinate system are you using when generating the table of known SNPs to feed into SliderII? Is it 1-based or 0-based? Does anyone have a table for mouse (2007, mm9)?



                  • #10
                    Hi Nix,

                    I used the Ensembl Variation database (version 50) SNPs.
                    You need to adjust the format.

                    Nawar



                    • #11
                      Has the SliderII paper been published?
                      Also, the picture at this link cannot be displayed:
                      High quality SNP calling using Illumina data at minimal coverage
                      Last edited by pengchy; 10-09-2011, 11:52 PM. Reason: more questions

