  • low mapping efficiency of cd-hit-otu

    Hi,
    I have an Illumina paired-end data set of a couple of million ITS-2 sequences. I am using cd-hit-otu to check for chimeras, merge the read pairs, and cluster the reads into OTUs.

    My mapping rates have been quite low: a mean of 25% across 20 samples, ranging from 1.2% to 53%. Looking through the cd-hit-otu clustering folders, and at link.log in particular, it seems I am losing a lot of reads at the assembly step. Chimeras don't seem to be the problem, as I lose very few reads there.

    There doesn't seem to be much information on how reads are processed into the link.log file. In link.log I see that of the 573,596 cleaned reads I start with, only 180,847 are used. This number seems to match the number of reads in wrong-contigs.ids.
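    A minimal Python sketch of this bookkeeping, for anyone who wants to reproduce the numbers. It is not part of cd-hit-otu: the helper names `count_ids` and `retention_report` are hypothetical, and `count_ids` assumes that wrong-contigs.ids lists one read ID per line, which is an unconfirmed assumption about the file format.

```python
def count_ids(path: str) -> int:
    """Count distinct non-empty IDs in a file.

    ASSUMPTION: one read ID per line (first whitespace-separated field).
    This is not documented cd-hit-otu behaviour.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            token = line.strip()
            if token:
                # Keep only the first field in case extra columns follow the ID.
                ids.add(token.split()[0])
    return len(ids)


def retention_report(cleaned: int, used: int) -> dict:
    """Summarise how many cleaned reads survive the assembly/linking step."""
    if cleaned <= 0:
        raise ValueError("cleaned read count must be positive")
    if not 0 <= used <= cleaned:
        raise ValueError("used must be between 0 and cleaned")
    lost = cleaned - used
    return {
        "used": used,
        "lost": lost,
        "used_pct": 100.0 * used / cleaned,
        "lost_pct": 100.0 * lost / cleaned,
    }


if __name__ == "__main__":
    # Counts taken from link.log as reported in the post.
    report = retention_report(cleaned=573_596, used=180_847)
    print(f"used {report['used']:,} ({report['used_pct']:.1f}%), "
          f"lost {report['lost']:,} ({report['lost_pct']:.1f}%)")
```

    With the counts above this reports roughly 31.5% of cleaned reads used and 392,749 reads (about 68.5%) lost. Comparing `count_ids("wrong-contigs.ids")` against those two numbers shows directly whether that file tracks the used reads or the discarded ones.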

    Does anyone have experience with cd-hit-otu, or with a similar "missing reads" problem? Alternatively, can anyone tell me more about the link.log and wrong-contigs.ids files in cd-hit-otu?

    Unfortunately, the developers no longer seem to support this tool.

    Thanks in advance.
