  • use of export/sequence data

    Dear all,
    I am analysing sequencing data from pooled samples for a candidate gene, looking for rare variants. Using the output of the Illumina pipeline, I first took the filtered data in s_N_sequence.txt and mapped it to my candidate gene. If I understand correctly, it is filtered by how well it aligns to the human genome, using certain parameters.
    If I repeat my analysis using the unfiltered data which is s_N_export.txt I get a better depth of coverage.
    Is it OK to use this data, or am I introducing errors?
    Because I already have some PCR-introduced errors, I am filtering very low frequency SNPs out of my data, so any very low frequency errors from the sequencing data will be removed at that step too.

    Any thoughts would be greatly appreciated.

    Best Wishes
    Michelle

  • #2
    Originally posted by mimi_lupton:
    "If I understand correctly It is filtered by how well it aligns to the human genome, using certain parameters."

    Michelle,

    The filtering is independent of alignment; it is based solely on the relative intensity of the fluorescent signals. Illumina uses two measures of relative intensity, called Chastity and Purity. Chastity is the intensity of the most intense base for a cluster divided by the sum of the most intense and second most intense signals. Purity is the most intense signal divided by the sum of all four fluorescent signals. The default parameter used by GERALD when filtering reads is CHASTITY ≥ 0.6. Stated another way (after doing a little algebra), the most intense signal must be at least 1.5 times the second most intense signal. Also, filter passing is based only on the signals over the first 12 cycles. I am not sure whether this means that the value must be ≥ 0.6 for each of those 12 cycles or that the average must be ≥ 0.6.
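
    To make the two ratios concrete, here is a minimal Python sketch; it is not Illumina's actual implementation. The function names, the representation of a read as one tuple of four intensities per cycle, and the assumption that intensities are non-negative are all illustrative. Because it is unclear whether the threshold applies to each of the first 12 cycles or to their average, both readings are shown.

```python
from typing import Sequence

CHASTITY_THRESHOLD = 0.6  # GERALD default quoted in this post
FILTER_CYCLES = 12        # only the first 12 cycles are considered

Cycle = Sequence[float]   # four channel intensities for one cycle (assumed layout)


def chastity(intensities: Cycle) -> float:
    """Brightest signal divided by (brightest + second brightest)."""
    top, second = sorted(intensities, reverse=True)[:2]
    denom = top + second
    return top / denom if denom > 0 else 0.0


def purity(intensities: Cycle) -> float:
    """Brightest signal divided by the sum of all four signals."""
    total = sum(intensities)
    return max(intensities) / total if total > 0 else 0.0


def passes_every_cycle(read: Sequence[Cycle],
                       threshold: float = CHASTITY_THRESHOLD,
                       cycles: int = FILTER_CYCLES) -> bool:
    """Interpretation 1 (assumption): each early cycle must meet the threshold."""
    return all(chastity(c) >= threshold for c in read[:cycles])


def passes_on_average(read: Sequence[Cycle],
                      threshold: float = CHASTITY_THRESHOLD,
                      cycles: int = FILTER_CYCLES) -> bool:
    """Interpretation 2 (assumption): mean chastity over the early cycles."""
    window = read[:cycles]
    if not window:
        return False
    return sum(chastity(c) for c in window) / len(window) >= threshold


if __name__ == "__main__":
    # Brightest signal exactly 1.5 times the second: chastity sits at the cutoff.
    print(chastity((150.0, 100.0, 0.0, 0.0)))
    # Eleven clean cycles plus one ambiguous cycle.
    read = [(900.0, 100.0, 0.0, 0.0)] * 11 + [(500.0, 500.0, 0.0, 0.0)]
    print(passes_every_cycle(read), passes_on_average(read))
```

    With the example read above, the per-cycle reading rejects the read because of the single ambiguous cycle, while the averaged reading accepts it, so the two interpretations can disagree on the same read.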

    You may have confused the read filtering with quality score calculation. Initial quality scores are based on the observed intensities but the scores may then be calibrated based on the alignment of the control sample to its reference sequence. Reads which do not pass filtering will have lower overall quality scores.

    Now, given all that, I don't think I'm the one to answer your real question: can you use unfiltered reads to identify rare variants? I do know that MAQ uses the quality score information when calculating its alignment, but I don't know whether this carries over into its SNP calling algorithm(s). Hopefully someone with more experience in SNP analysis will offer some input.


    • #3
      Thanks for your reply, that makes things clearer.
      Because I am looking at pools of many individuals, I am not using the MAQ SNP calling algorithm but calling my own SNPs using the pileup function. So the main question I am really asking is whether the unfiltered data, once aligned to the reference, is reliable.
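
      As a rough sketch of this kind of per-position calling (not MAQ's pileup output format or any particular tool's method), the following Python computes non-reference allele frequencies from the base calls at one position and drops alleles below a frequency cutoff. The input string of base calls, the function name, and the default `min_freq` and `min_depth` values are hypothetical.

```python
from collections import Counter
from typing import Dict


def candidate_alleles(bases: str,
                      ref_base: str,
                      min_freq: float = 0.01,   # hypothetical cutoff
                      min_depth: int = 50       # hypothetical cutoff
                      ) -> Dict[str, float]:
    """Return non-reference alleles at one position with frequency >= min_freq.

    `bases` is assumed to be the base calls covering the position, already
    reduced to plain A/C/G/T letters; anything else (e.g. N) is ignored.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < min_depth:
        # Too little coverage to estimate a pooled frequency.
        return {}
    ref = ref_base.upper()
    return {
        base: n / depth
        for base, n in counts.items()
        if base != ref and n / depth >= min_freq
    }


if __name__ == "__main__":
    # 100 calls: 95 reference A, 4 G, 1 T, with a 2% frequency cutoff.
    column = "A" * 95 + "G" * 4 + "T"
    print(candidate_alleles(column, "A", min_freq=0.02))
```

      Running the same function on pileups from the filtered and the unfiltered alignments would be one way to see whether the extra reads change which alleles clear the cutoff.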

      Any thoughts would be greatly appreciated.

      Thanks
      Michelle


      • #4
        The current filter is quite strict, in that it may remove a lot of good data. People are arguing a lot about whether and how to use unfiltered data, but I think most of them agree we should apply at least some filters. If you do not want to invest time in studying better filters, I would recommend using the filter implemented in the pipeline.
