  • Does a reliable consensus mean more reliable SNPs?

    Dear All,

    I'm relatively new to WGS analysis so please excuse any naivety on my part.

    Before obtaining the WGS data, I confirmed the presence or absence of certain oligonucleotides in various bacterial DNA samples, so I know which of these sequences I should see in the final consensus sequence.
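A quick way to use those known oligonucleotides as a sanity check is to search for each one, on both strands, in the consensus. The sketch below is a minimal illustration, not part of the original post: the function names, oligo names and sequences are hypothetical, and it only looks for exact matches, so an oligo that overlaps an ambiguous base (N) or a consensus error will be reported as absent.

```python
# Minimal sketch: check whether known oligonucleotides occur in a consensus
# sequence on either strand. All names and sequences here are hypothetical.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_oligos(consensus: str, oligos: dict[str, str]) -> dict[str, bool]:
    """For each named oligo, report whether it or its reverse complement
    occurs as an exact substring of the consensus."""
    consensus = consensus.upper()
    found = {}
    for name, oligo in oligos.items():
        oligo = oligo.upper()
        found[name] = oligo in consensus or reverse_complement(oligo) in consensus
    return found


if __name__ == "__main__":
    consensus = "TTGACCGATGCAAGTNNNACGTTAGC"
    oligos = {"oligo_A": "CCGATGCA", "oligo_B": "GCTAACGT", "oligo_C": "GGGGGGGG"}
    print(find_oligos(consensus, oligos))
```

Comparing these results with the lab-confirmed presence/absence calls gives a rough measure of how trustworthy the consensus is before any SNP calling.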

    Is it correct to assume that if I can produce a more reliable consensus sequence, the SNP calls are also likely to be more reliable? I appreciate that many SNP quality filters etc. will also be applied, which can lead to differences between a consensus and a SNP call, but I wanted to get an idea of the overall correlation between the consensus and the SNPs.

    If there is a high correlation between the two, then surely, if I make sure my consensus sequences are as reliable as possible, the SNPs I call from the same mapped reads will also be more reliable?
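The link the question asks about can be seen in a toy model where the consensus and the SNP list are both derived from the same per-position base counts. The sketch below is a simplified illustration under stated assumptions: the pileup is a hypothetical list of base strings (one per reference position), and the `min_depth` and `min_frac` thresholds are arbitrary. Real variant callers use more sophisticated statistical models and filters.

```python
# Simplified sketch: majority-vote consensus and SNP calls from one pileup.
# The pileup format and both thresholds are hypothetical, for illustration only.
from collections import Counter


def call_consensus_and_snps(reference, pileup, min_depth=5, min_frac=0.8):
    """reference: reference sequence string.
    pileup: list of strings, one per reference position, holding the read
    bases observed at that position.
    Returns (consensus_string, [(1-based position, ref_base, alt_base), ...])."""
    consensus = []
    snps = []
    for pos, (ref_base, bases) in enumerate(zip(reference, pileup), start=1):
        counts = Counter(bases.upper())
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too few reads to trust any base
            continue
        base, n = counts.most_common(1)[0]
        if n / depth < min_frac:
            consensus.append("N")  # reads disagree too much
            continue
        consensus.append(base)
        if base != ref_base.upper():
            snps.append((pos, ref_base.upper(), base))  # confident mismatch
    return "".join(consensus), snps


if __name__ == "__main__":
    ref = "ACGTA"
    pileup = ["AAAAA", "CCCCCT", "TTTTTT", "TT", "AAGGG"]
    print(call_consensus_and_snps(ref, pileup))
```

Because every SNP here is simply a confident consensus base that differs from the reference, anything that makes the consensus more reliable (better mapping, more depth) makes the SNP list more reliable in the same way.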

    Apologies if I'm totally wrong about this.

    Best wishes,
    lg36

  • #2
    Hi lg36,

    You're not wrong about this at all; in fact, this is a pretty important factor in SNP discovery.

    Your SNPs can only ever be as good as your reference and your mapping. If your reference contains errors, this will propagate right through into your SNP calls, and similarly if you mismap lots of reads you will also increase your false positive SNP rate.

    I routinely map the reads from the individual used to make the reference back to that reference before I map any other individuals onto it for SNP discovery. I then call SNPs on that self-mapping first, and I always get some SNPs there.

    In a homozygous or haploid organism, this gives you a list of positions where the reference most likely contains errors: in an ideal case there should be zero SNPs when I map the reads back onto a reference that was built from those same reads. I don't know what you work with, but I am fortunate in that I do a lot of work with cultivated barley, which is essentially homozygous, and that obviously simplifies matters.

    I then subtract the list of SNPs called there from any list of SNPs generated with reads from a different individual; it's essentially a way of removing background noise. I guess if you have a heterozygous organism that is well curated, you could probably use a public, curated list of SNPs instead.
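The subtraction step described above is essentially a set difference on genomic positions. The sketch below assumes both call sets are available as VCF text and treats any site called in the self-mapping as suspect, regardless of allele; the `Snp` type and the function names are hypothetical, and only the first five standard VCF columns (CHROM, POS, ID, REF, ALT) are read.

```python
# Minimal sketch: remove "background" SNPs (called when the reference
# individual's own reads are mapped back to its reference) from a sample's
# SNP calls. Type and function names are hypothetical.
from typing import Iterable, Iterator, List, NamedTuple


class Snp(NamedTuple):
    chrom: str
    pos: int  # 1-based, as in VCF
    ref: str
    alt: str


def read_vcf_sites(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield Snp records from VCF text lines, skipping header and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        yield Snp(chrom, int(pos), ref, alt)


def subtract_background(sample: Iterable[Snp], background: Iterable[Snp]) -> List[Snp]:
    """Drop sample SNPs at any (chrom, pos) that was also called in the
    self-mapping, since those sites most likely reflect reference errors."""
    suspect = {(s.chrom, s.pos) for s in background}
    return [s for s in sample if (s.chrom, s.pos) not in suspect]


if __name__ == "__main__":
    background_vcf = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\tG"]
    sample_vcf = ["chr1\t100\t.\tA\tG", "chr1\t250\t.\tC\tT", "chr2\t100\t.\tG\tA"]
    kept = subtract_background(read_vcf_sites(sample_vcf), read_vcf_sites(background_vcf))
    for snp in kept:
        print(snp)
```

Matching on position rather than on position plus allele is a deliberate choice here: if the reference is wrong at a site, any call there in another individual is suspect. Matching on the allele as well would keep more SNPs, at the cost of more false positives.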

    This gives you much cleaner SNP sets and reduces the false positive rate, but the caveat is that you may be increasing your false negative rate (I don't have any data on this yet). It all depends on what your SNPs are for: if reliability is key, this works well. You may also want to remove duplicates from the mapping, which also reduces your FP rate.
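Duplicate removal keeps one read from each group of reads that start at the same 5' position on the same strand, on the assumption that such reads are PCR or optical copies of one original fragment. The sketch below is a heavily simplified, single-end illustration: the `Read` record and the choice to keep the highest mean base quality are assumptions for this example. In practice, dedicated tools such as Picard MarkDuplicates or samtools markdup handle read pairs, soft-clipping and other details properly.

```python
# Simplified single-end duplicate removal. The Read record and the
# "keep highest mean quality" rule are illustrative assumptions.
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int        # leftmost aligned position, 1-based
    end: int          # rightmost aligned position, 1-based
    strand: str       # "+" or "-"
    mean_qual: float


def remove_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, 5' position, strand). For reverse-strand
    reads the 5' end is the rightmost aligned position."""
    best: Dict[Tuple[str, int, str], Read] = {}
    for read in reads:
        five_prime = read.start if read.strand == "+" else read.end
        key = (read.chrom, five_prime, read.strand)
        kept = best.get(key)
        if kept is None or read.mean_qual > kept.mean_qual:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.start, r.name))


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, 199, "+", 30.0),
        Read("r2", "chr1", 100, 180, "+", 35.0),  # duplicate of r1, better quality
        Read("r3", "chr1", 50, 150, "-", 20.0),
        Read("r4", "chr1", 60, 150, "-", 25.0),   # same 5' end as r3 on "-" strand
        Read("r5", "chr1", 100, 199, "-", 30.0),  # other strand, not a duplicate
    ]
    print([r.name for r in remove_duplicates(reads)])
```

Without this step, a single PCR error copied into many duplicate reads can look like well-supported evidence for a SNP.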

    cheers

    Micha
