  • MAQ: Obtaining quality & read depth info for non-variant (reference) sites

    I am trying to compare the genotypes from sequence data (GA2) to those of an Illumina array chip.

    I used MAQ to align the sequence data, which made it very easy to obtain information about read depth and consensus quality for VARIANT sites via the cns.final.snp file.

    Unfortunately, if a SNP is not listed in this file (because it was not covered, was low quality, was homozygous reference, etc.), I have to use the cns2view command to obtain the same information about it. This is fine, but I would also like to pass these sites through the same set of quality filters that produced cns.final.snp.
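A minimal sketch of pulling read depth and consensus quality for arbitrary sites (for example, the array-chip positions) out of the cns2view output. It assumes the output is tab-separated and that its first six columns follow the cns.snp layout (chromosome, position, reference base, consensus base, consensus quality, read depth). The names `Site` and `load_sites` and the example lines are hypothetical, chosen for illustration; they are not part of MAQ.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Site(NamedTuple):
    ref: str        # column 3: reference base
    consensus: str  # column 4: consensus base
    cons_qual: int  # column 5: consensus quality
    depth: int      # column 6: read depth


def load_sites(lines: Iterable[str],
               wanted: Set[Tuple[str, int]]) -> Dict[Tuple[str, int], Site]:
    """Return the records for the (chromosome, position) pairs listed in `wanted`.

    Assumes tab-separated lines whose first six columns follow the cns.snp layout.
    """
    found: Dict[Tuple[str, int], Site] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # skip blank or truncated lines
        key = (cols[0], int(cols[1]))
        if key in wanted:
            found[key] = Site(cols[2], cols[3], int(cols[4]), int(cols[5]))
    return found


# Two made-up lines in the assumed layout (not real MAQ output).
example = [
    "chr1\t100\tA\tA\t45\t12\t1.00\t60\t40\tG\t30\tC\n",
    "chr1\t101\tC\tC\t22\t7\t1.00\t60\t20\tT\t15\tA\n",
]
sites = load_sites(example, {("chr1", 101), ("chr2", 5)})
print(sites)
```

Because the function only iterates over lines, a whole-genome cns.view can be streamed straight from an open file handle (e.g. `load_sites(fh, chip_positions)`, where `chip_positions` is a hypothetical set built from the array manifest) without loading it into memory.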

    So I tried running my cns.view file through SNPFilter with the same parameters that were used to produce cns.final.snp. Two problems resulted. First, far fewer sites remained than in cns.final.snp (108,492 in cns.final.view vs. 186,232 in cns.final.snp). This seems strange, since cns.view contains all the same information as cns.snp and more! Second, I realized that neither of the cns.final.* files is being filtered on consensus quality (column 5 in cns.snp), even though the parameter (-q) is set to 40.

    Can anyone provide insight into these problems?
    Can SNPFilter only be used with variant sites? If so, can someone suggest a way to replicate the filters so that I can apply them to other sites?
    What is up with the -q parameter? Why does it not seem to be doing anything?

    Thanks in advance for any assistance!

  • #2
    SNPfilter explained!

    Alright, I dug through the perl script (maq.pl), and I think I've got it all figured out. Just in case anyone else runs into similar difficulties, I'll go ahead and discuss what I found.

    First, far fewer sites remained in cns.final.view than in cns.final.snp (108,492 vs. 186,232). This seems strange, since cns.view contains all the same information as cns.snp and more!
    The reason I had so many fewer results in cns.final.view than in cns.final.snp is the -N (max SNPs in window) and -W (size of window) parameters. Since the cns.view file has a line for virtually every base, it (often) appeared to SNPfilter that there were too many SNPs (i.e. bases) in the window, so it discarded them all. This is perfectly fine when we're actually dealing with SNPs, but since I was trying to filter non-variant sites, these parameters caused problems.

    I have yet to test this, but I believe it can be avoided by setting N > W, so that even a window in which every base is listed never exceeds the limit. Note: the default value for W is 10.
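To see why a per-base file trips the density filter, here is a hypothetical sketch of a "more than N sites within W bp" rule. It illustrates the idea behind SNPfilter's -N/-W options as described above; it is not a port of maq.pl, whose exact window bookkeeping may differ. The function name `dense_sites`, the N of 2 and the positions are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Iterable, Set


def dense_sites(positions: Iterable[int], max_n: int, window: int) -> Set[int]:
    """Flag every site lying in a `window`-bp stretch that holds more than `max_n` sites.

    Hypothetical illustration of the -N/-W idea, not maq.pl's code.
    `positions` must be sorted and come from a single chromosome.
    """
    flagged: Set[int] = set()
    recent: Deque[int] = deque()
    for pos in positions:
        recent.append(pos)
        # Forget sites that are `window` bp or more behind the current one.
        while pos - recent[0] >= window:
            recent.popleft()
        if len(recent) > max_n:
            flagged.update(recent)  # the whole cluster is discarded
    return flagged


every_base = list(range(1000, 1100))   # cns.view-like: one line per bp
real_snps = [1000, 1050, 1051, 1052, 1090]

print(sorted(dense_sites(real_snps, max_n=2, window=10)))   # [1050, 1051, 1052]
print(len(dense_sites(every_base, max_n=2, window=10)))     # 100: every base dropped
print(len(dense_sites(every_base, max_n=11, window=10)))    # 0: N > W keeps them all
```

On genuine SNP calls only the tight cluster is removed, but on a file listing every base each window is full, so the whole run is discarded unless N is raised above W.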
    Second, I realized that neither of the cns.final.* files is being filtered on consensus quality (column 5 in cns.snp), even though the parameter (-q) is set to 40.
    It seems that there are two ways to pass the consensus quality filter: (1) have a consensus quality greater than the cut-off, or (2) have a reference base that differs from the second-best call AND have (consensus quality + log likelihood ratio of the second- and third-best calls) > the cut-off. In other words, for route (2), column 3 ≠ column 10 and (column 5 + column 11) > cut-off. I don't understand the purpose of this second criterion, but it explains why so many low-consensus-quality sites were making it through.
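The two routes above can be written as a small predicate, which is also one way to apply the same consensus-quality test to sites taken from cns.view. This is a sketch of the rule as reconstructed in this post, not code copied from maq.pl; the function name and the example rows are hypothetical, and the column positions assume the standard cns.snp layout (3 = reference base, 5 = consensus quality, 10 = second-best call, 11 = log likelihood ratio of the second- and third-best calls).

```python
from typing import Sequence


def passes_consensus_quality(cols: Sequence[str], cutoff: float) -> bool:
    """Consensus-quality test as reconstructed in this post, not copied from maq.pl.

    `cols` is one tab-split cns.snp line; Python indices are 0-based,
    so "column 5" is cols[4].
    """
    ref_base = cols[2]              # column 3: reference base
    cons_qual = float(cols[4])      # column 5: consensus quality
    second_best = cols[9]           # column 10: second-best call
    llr_2nd_3rd = float(cols[10])   # column 11: log likelihood ratio, 2nd vs 3rd best call
    if cons_qual > cutoff:
        return True                 # route (1): consensus quality alone clears the cut-off
    # route (2): reference differs from the 2nd-best call AND the sum clears the cut-off
    return ref_base != second_best and cons_qual + llr_2nd_3rd > cutoff


# Made-up rows in the cns.snp layout, tested against -q 40.
high_q = "chr1\t500\tA\tG\t45\t20\t1.00\t60\t40\tA\t10\tC".split("\t")
low_q_rescued = "chr1\t600\tC\tT\t25\t9\t1.00\t60\t20\tY\t20\tC".split("\t")
low_q_dropped = "chr1\t700\tC\tT\t25\t9\t1.00\t60\t20\tC\t20\tG".split("\t")

for name, row in [("high_q", high_q), ("low_q_rescued", low_q_rescued),
                  ("low_q_dropped", low_q_dropped)]:
    print(name, passes_consensus_quality(row, 40))
```

The `low_q_rescued` row shows the surprising case: a consensus quality of 25 is below a cut-off of 40, yet the site passes because 25 + 20 > 40 and the reference base differs from the second-best call.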
