  • programs for filtering low complexity

    This question was asked elsewhere, but I have the same problem (quoted below). Any help would be appreciated.

    > 1) I have many "low-complexity" reads. Some are simply polyA, polyC,
    > etc. But some others are runs of "ATATAT" or "CACACACA", etc. Previously
    > I would have used "dust" on the command line to filter out this kind of
    > read in a fasta file. Any ideas on how to achieve similar functionality
    > in the ShortRead world?

  • #3
    Hi,

    I am looking for a definition of low complexity reads for reads of variable lengths (about 100 nucleotides long).

    Right now, I am using the following definition:
    - Divide a read into subsegments of 32 nucleotides. (The last subsegment overlaps the one before it.)
    - Count the number of unique tri-nucleotides in each segment.
    - If the number of unique tri-nucleotides is smaller than 5, the segment is of "low complexity".
    - If there is at least one "low complexity" segment, the read is considered "low complexity".
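
The definition above can be sketched in Python roughly as follows. This is only an illustration, not the poster's actual code. It assumes that tri-nucleotides are counted as overlapping 3-mers (a sliding window), that the last segment is shifted back so it ends at the end of the read, and that reads of 32 nt or fewer are treated as a single segment. The function names are made up for this example.

```python
def segments(read, size=32):
    """Split a read into `size`-nt segments; the last one is shifted back
    so that it overlaps the previous segment instead of being shorter."""
    if len(read) <= size:
        return [read]  # assumption: short reads form a single segment
    starts = list(range(0, len(read) - size + 1, size))
    if starts[-1] + size < len(read):
        starts.append(len(read) - size)  # overlapping final segment
    return [read[s:s + size] for s in starts]


def unique_trinucleotides(segment):
    """Count distinct overlapping 3-mers in a segment (assumed reading
    of 'unique tri-nucleotides')."""
    return len({segment[i:i + 3] for i in range(len(segment) - 2)})


def is_low_complexity(read, size=32, min_unique=5):
    """A read is low complexity if any segment has fewer than
    `min_unique` distinct tri-nucleotides."""
    return any(unique_trinucleotides(seg) < min_unique
               for seg in segments(read, size))
```

Under these assumptions, `is_low_complexity("A" * 100)` and `is_low_complexity("CA" * 50)` are both flagged, since each segment contains only one or two distinct tri-nucleotides. Note that any read shorter than 7 nt is flagged automatically, because it cannot contain 5 distinct overlapping tri-nucleotides.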

    Comments on the relevance of this definition would be appreciated.

    Regards, Michael.



    • #4
      Originally posted by mdaskal
      Hi,

      I am looking for a definition of low complexity reads for reads of variable lengths (about 100 nucleotides long).

      Right now, I am using the following definition:
      - Divide a read into subsegments of 32 nucleotides. (The last subsegment overlaps the one before it.)
      - Count the number of unique tri-nucleotides in each segment.
      - If the number of unique tri-nucleotides is smaller than 5, the segment is of "low complexity".
      - If there is at least one "low complexity" segment, the read is considered "low complexity".

      Comments on the relevance of this definition would be appreciated.

      Regards, Michael.

      Why 5?

      How does your operational definition compare to SEG / Dust?
      Homepage: Dan Bolser
      MetaBase the database of biological databases.



      • #5
        Low complexity reads

        Hi Dan,

        I am not familiar with the definition of SEG / Dust.
        Where can I find some details about it?

        Would you suggest a limit other than 5 unique tri-nucleotides (higher or lower)?

        Regards, Michael.
        Last edited by mdaskal; 02-05-2012, 07:07 AM.



        • #6
          I guess they are both in PubMed? The internet is slow here at the moment, or else I'd link...

          I wouldn't suggest an alternative without data, so my question really was: did you analyse / benchmark your metric? I.e., what fraction of reads are flagged as low complexity at thresholds of 3, 4, 5, 6, etc.?
          Homepage: Dan Bolser
          MetaBase the database of biological databases.
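
A benchmark of the kind asked about here could be sketched as below: compute, for each read, the smallest number of distinct tri-nucleotides found in any 32-nt segment, then report the fraction of reads that fall below each candidate threshold. This is a hedged illustration only. The helper names are hypothetical, tri-nucleotides are assumed to be overlapping 3-mers, and the reads would have to be loaded from your own FASTA/FASTQ file (not shown).

```python
def min_unique_trimers(read, size=32):
    """Smallest count of distinct overlapping 3-mers over all segments.
    The final segment is shifted back to end at the end of the read."""
    starts = list(range(0, max(len(read) - size, 0) + 1, size))
    if starts[-1] + size < len(read):
        starts.append(len(read) - size)
    counts = []
    for s in starts:
        seg = read[s:s + size]
        counts.append(len({seg[i:i + 3] for i in range(len(seg) - 2)}))
    return min(counts)


def fraction_flagged(reads, thresholds=(3, 4, 5, 6)):
    """Map each threshold t to the fraction of reads whose worst segment
    has fewer than t distinct tri-nucleotides."""
    scores = [min_unique_trimers(r) for r in reads]
    return {t: sum(s < t for s in scores) / len(scores) for t in thresholds}
```

Calling `fraction_flagged(reads)` on a real data set shows how sensitive the read count is to the choice of cutoff, which gives a basis for preferring 5 over a neighbouring value.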
