Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to mask low coverage positions

    Hello,

    I am looking for a way to mask positions below a certain coverage, i.e., convert all positions below x-fold coverage to N.

    The problem is not calculating the coverage for each position (I can do that with BEDtools genomecov or BBmap pileup.sh), but the actual conversion of the input file. I was looking into BEDtools maskfasta for example, but couldn't find a way to set a specific coverage-threshold.

    Thanks for any suggestion!

  • #2
    The reason why masking performs better than trimming and no quality control in reducing the false-positive rate is apparent because masking maximizes the information content of a raw read while removing the base calls with low qualities. However, masking did not improve for the false-negative calls due to a low coverage. Therefore, we recommend masking as a preprocessing method to remove low quality base calls in NGS since it reduces the false-positive rate without sacrificing the false-negative rate. Masking could be used effectively in reducing the false-positive rate also for the identification of somatic mutations in cancer screening by RNA-seq.
    Clinical Research

    Comment


    • #3
      Try this code to mask coverage positions

      layout(local_size_x = 1, local_size_y = 1) in;
      writeonly uniform uimage2D img_output;

      struct LineSegment
      {
      vec2 mP1;
      vec2 mP2;
      };

      layout(std140) uniform GlyphData
      {
      int mTextureOffsetX; // Target X location in atlas
      int mTextureOffsetY; // Target Y location in atlas
      int mWidth; // Width of glyph
      int mHeight; // Height of glyph
      vec2 mBorderUnits; // Amount of border to preserve
      int mScanlineStartIndices[100]; // Start into the sorted line segment array

      }glyphData;

      layout(std140) uniform OutlineData
      {
      vec2 mSize; // Size of glyph in units
      vec2 mMinimum; // Minimum of glyph in units
      int mNumSortedOutlines; // Amount of outlines
      vec2 mSortedOutlinesP1[1000]; // Top point of outlines
      vec2 mSortedOutlinesP2[1000]; // Bottom point of outlines

      }outlineData;

      void main()
      {
      uint result = 0;

      ivec2 pixelCoords = ivec2(gl_GlobalInvocationID.xy);
      int startIndex = glyphData.mScanlineStartIndices[pixelCoords.y];

      if (startIndex != -1)
      {
      float yUnits = float(pixelCoords.y) / float(glyphData.mHeight);
      float nextYUnits = float(pixelCoords.y + 1) / float(glyphData.mHeight);

      vec2 sizeWithBorder = outlineData.mSize + glyphData.mBorderUnits * 2.0f;
      vec2 minimum = outlineData.mMinimum - glyphData.mBorderUnits;

      for (int index = startIndex; index != outlineData.mNumSortedOutlines; ++index)
      {
      LineSegment line;
      line.mP1 = (outlineData.mSortedOutlinesP1[index] - minimum) / sizeWithBorder;
      line.mP2 = (outlineData.mSortedOutlinesP2[index] - minimum) / sizeWithBorder;

      if (line.mP1.y >= nextYUnits)
      break;

      float dy = line.mP2.y - line.mP1.y;
      if (dy == 0.0)
      continue;

      float dx = line.mP2.x - line.mP1.x;
      float dx_dy = dx / dy;

      for (int subY = 0; subY != 4; ++subY)
      {
      for (int subX = 0; subX != 4; ++subX)
      {
      float subPixelYOffset = float(3 - subX) * (1.0 / 16.0) + (1.0 / 32.0) + float(subY) * 0.25f;
      float subPixelXOffset = float(subY) * (1.0 / 16.0) + (1.0 / 32.0) + float(subX) * 0.25f;

      float sampleY = yUnits + subPixelYOffset / float(glyphData.mHeight);
      if (sampleY >= line.mP1.y && sampleY < line.mP2.y)
      {
      float lineXUnits = line.mP1.x + (sampleY - line.mP1.y) * dx_dy;
      float curX = float(pixelCoords.x + subPixelXOffset) / float(glyphData.mWidth);
      if (curX > lineXUnits)
      {
      uint mask = 1 << ((3 - subX) + subY * 4);
      result ^= mask;
      }
      }
      }
      }
      }
      }

      ivec2 textureOffset = ivec2(pixelCoords.x + glyphData.mTextureOffsetX, pixelCoords.y + glyphData.mTextureOffsetY);
      imageStore(img_output, textureOffset, uvec4(result, 0, 0, 0));
      }
      Clinical Research

      Comment


      • #4
        You're on the right track with BEDtools. Use the 'genomecov' command with the '-bg' flag (BEDGRAPH format) to calculate coverage, filter using 'awk' (awk '$4 < THRESHOLD_VALUE') to retain the low-coverage intervals, then back to BEDtools 'maskfasta' to convert those intervals to Ns.
        Last edited by HESmith; 01-02-2019, 06:41 AM.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Strategies for Sequencing Challenging Samples
          by seqadmin


          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
          03-22-2024, 06:39 AM
        • seqadmin
          Techniques and Challenges in Conservation Genomics
          by seqadmin



          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

          Avian Conservation
          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
          03-08-2024, 10:41 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, Yesterday, 06:37 PM
        0 responses
        10 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, Yesterday, 06:07 PM
        0 responses
        9 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-22-2024, 10:03 AM
        0 responses
        51 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 03-21-2024, 07:32 AM
        0 responses
        67 views
        0 likes
        Last Post seqadmin  
        Working...
        X