Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to mask low coverage positions

    Hello,

    I am looking for a way to mask positions below a certain coverage, i.e., convert all positions below x-fold coverage to N.

    The problem is not calculating the coverage for each position (I can do that with BEDtools genomecov or BBmap pileup.sh), but the actual conversion of the input file. I was looking into BEDtools maskfasta for example, but couldn't find a way to set a specific coverage-threshold.

    Thanks for any suggestion!

  • #2
    The reason why masking performs better than trimming and no quality control in reducing the false-positive rate is apparent because masking maximizes the information content of a raw read while removing the base calls with low qualities. However, masking did not improve for the false-negative calls due to a low coverage. Therefore, we recommend masking as a preprocessing method to remove low quality base calls in NGS since it reduces the false-positive rate without sacrificing the false-negative rate. Masking could be used effectively in reducing the false-positive rate also for the identification of somatic mutations in cancer screening by RNA-seq.
    Clinical Research

    Comment


    • #3
      Try this code to mask coverage positions

      layout(local_size_x = 1, local_size_y = 1) in;
      writeonly uniform uimage2D img_output;

      struct LineSegment
      {
      vec2 mP1;
      vec2 mP2;
      };

      layout(std140) uniform GlyphData
      {
      int mTextureOffsetX; // Target X location in atlas
      int mTextureOffsetY; // Target Y location in atlas
      int mWidth; // Width of glyph
      int mHeight; // Height of glyph
      vec2 mBorderUnits; // Amount of border to preserve
      int mScanlineStartIndices[100]; // Start into the sorted line segment array

      }glyphData;

      layout(std140) uniform OutlineData
      {
      vec2 mSize; // Size of glyph in units
      vec2 mMinimum; // Minimum of glyph in units
      int mNumSortedOutlines; // Amount of outlines
      vec2 mSortedOutlinesP1[1000]; // Top point of outlines
      vec2 mSortedOutlinesP2[1000]; // Bottom point of outlines

      }outlineData;

      void main()
      {
      uint result = 0;

      ivec2 pixelCoords = ivec2(gl_GlobalInvocationID.xy);
      int startIndex = glyphData.mScanlineStartIndices[pixelCoords.y];

      if (startIndex != -1)
      {
      float yUnits = float(pixelCoords.y) / float(glyphData.mHeight);
      float nextYUnits = float(pixelCoords.y + 1) / float(glyphData.mHeight);

      vec2 sizeWithBorder = outlineData.mSize + glyphData.mBorderUnits * 2.0f;
      vec2 minimum = outlineData.mMinimum - glyphData.mBorderUnits;

      for (int index = startIndex; index != outlineData.mNumSortedOutlines; ++index)
      {
      LineSegment line;
      line.mP1 = (outlineData.mSortedOutlinesP1[index] - minimum) / sizeWithBorder;
      line.mP2 = (outlineData.mSortedOutlinesP2[index] - minimum) / sizeWithBorder;

      if (line.mP1.y >= nextYUnits)
      break;

      float dy = line.mP2.y - line.mP1.y;
      if (dy == 0.0)
      continue;

      float dx = line.mP2.x - line.mP1.x;
      float dx_dy = dx / dy;

      for (int subY = 0; subY != 4; ++subY)
      {
      for (int subX = 0; subX != 4; ++subX)
      {
      float subPixelYOffset = float(3 - subX) * (1.0 / 16.0) + (1.0 / 32.0) + float(subY) * 0.25f;
      float subPixelXOffset = float(subY) * (1.0 / 16.0) + (1.0 / 32.0) + float(subX) * 0.25f;

      float sampleY = yUnits + subPixelYOffset / float(glyphData.mHeight);
      if (sampleY >= line.mP1.y && sampleY < line.mP2.y)
      {
      float lineXUnits = line.mP1.x + (sampleY - line.mP1.y) * dx_dy;
      float curX = float(pixelCoords.x + subPixelXOffset) / float(glyphData.mWidth);
      if (curX > lineXUnits)
      {
      uint mask = 1 << ((3 - subX) + subY * 4);
      result ^= mask;
      }
      }
      }
      }
      }
      }

      ivec2 textureOffset = ivec2(pixelCoords.x + glyphData.mTextureOffsetX, pixelCoords.y + glyphData.mTextureOffsetY);
      imageStore(img_output, textureOffset, uvec4(result, 0, 0, 0));
      }
      Clinical Research

      Comment


      • #4
        You're on the right track with BEDtools. Use the 'genomecov' command with the '-bg' flag (BEDGRAPH format) to calculate coverage, filter using 'awk' (awk '$4 < THRESHOLD_VALUE') to retain the low-coverage intervals, then back to BEDtools 'maskfasta' to convert those intervals to Ns.
        Last edited by HESmith; 01-02-2019, 06:41 AM.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM
        • seqadmin
          The Impact of AI in Genomic Medicine
          by seqadmin



          Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
          02-26-2024, 02:07 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 03-14-2024, 06:13 AM
        0 responses
        32 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-08-2024, 08:03 AM
        0 responses
        71 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-07-2024, 08:13 AM
        0 responses
        80 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-06-2024, 09:51 AM
        0 responses
        68 views
        0 likes
        Last Post seqadmin  
        Working...
        X