**DrYak** · 04-07-2016, 06:58 AM

Hi,

Well, I found (to my chagrin) that cd-hit has an aux tools package containing the cd-hit-dup tool.

I do not, however, get the same results using cd-hit-est and cd-hit-dup.

If I use cd-hit-est with the following parameters:

cd-hit-est -i in.fasta -o out -c 0.95 -n 10 -d 0 -T 20

I get:

85497 finished 69413 clusters

i.e. 69413 clusters from 85497 starting sequences.
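To double-check that count independently of the log line, you can count the cluster headers in the .clstr file that cd-hit-est writes next to its output (here presumably out.clstr). Below is a minimal Python sketch. It assumes the usual .clstr layout: a ">Cluster N" header followed by one line per member sequence, with the representative marked by "*". The excerpt is made up for illustration and does not come from in.fasta.

```python
from typing import Iterable, List


def cluster_sizes(lines: Iterable[str]) -> List[int]:
    """Return the number of member sequences in each cluster of a CD-HIT .clstr file.

    Assumes a '>Cluster N' header line followed by one line per member
    (the representative sequence is the one marked with '*').
    """
    sizes: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                 # skip blank lines
        if line.startswith(">Cluster"):
            sizes.append(0)          # start a new, empty cluster
        elif sizes:
            sizes[-1] += 1           # count one member of the current cluster
    return sizes


# Illustrative .clstr excerpt (hypothetical read names, not real output).
example = """>Cluster 0
0	1234nt, >read_a... *
1	1200nt, >read_b... at +/96.50%
>Cluster 1
0	800nt, >read_c... *
"""

sizes = cluster_sizes(example.splitlines())
print(len(sizes), "clusters from", sum(sizes), "sequences")  # 2 clusters from 3 sequences
print("singletons:", sum(1 for s in sizes if s == 1))        # singletons: 1
```

On the real file, cluster_sizes(open("out.clstr")) should report 69413 clusters totalling 85497 sequences if it agrees with the log line above.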

If I use cd-hit-dup with the following parameters:

cd-hit-dup -i in.fasta -o out-nodupes.fasta -m false -e 0.05 -f true

As far as I know, this should apply the same similarity cut-off (95%, i.e. at most 5% mismatches with -e 0.05), discard smaller sequences (-m false) and filter out chimeras (-f true). I get:

Number of reads: 85497
Number of clusters found: 82927
Number of chimeric clusters found: 6

i.e. 82921 non-chimeric clusters (82927 - 6) from 85497 starting sequences.
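Here are the two results side by side, expressed as the number and fraction of input sequences absorbed into another sequence's cluster. The numbers come from the two runs above, with the chimeric clusters subtracted for cd-hit-dup.

```python
from typing import Tuple


def absorbed(n_reads: int, n_clusters: int) -> Tuple[int, float]:
    """Sequences absorbed into another sequence's cluster, as a count and a fraction."""
    count = n_reads - n_clusters
    return count, count / n_reads


READS = 85497
runs = {
    "cd-hit-est": 69413,
    "cd-hit-dup": 82927 - 6,  # all clusters minus the chimeric ones
}

for tool, clusters in runs.items():
    count, frac = absorbed(READS, clusters)
    print(f"{tool}: {count} of {READS} sequences absorbed ({frac:.1%})")
# cd-hit-est: 16084 of 85497 sequences absorbed (18.8%)
# cd-hit-dup: 2576 of 85497 sequences absorbed (3.0%)
```

With these settings, cd-hit-est collapses roughly six times as many sequences as cd-hit-dup.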

Can someone suggest an explanation for such a huge difference?

Thanks in advance.
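Here is one possible reason the two cut-offs might not be equivalent. This is an assumption, not something confirmed in this thread. Suppose cd-hit-dup's -e counts mismatches in an ungapped, base-by-base comparison from the start of the reads. Then a single indel shifts every downstream base and exceeds the mismatch budget. cd-hit-est's -c, by contrast, is an identity computed over a gapped alignment, which can absorb the indel. The toy Python sketch below contrasts the two measures. The "identity" is a simplification (1 minus the Levenshtein distance over the shorter length), not cd-hit's exact formula, and the reads are made up.

```python
def ungapped_mismatch_fraction(a: str, b: str) -> float:
    """Fraction of mismatching positions when two reads are compared base by base
    from their first position, with no gaps, over the length of the shorter read."""
    n = min(len(a), len(b))
    mismatches = sum(1 for x, y in zip(a[:n], b[:n]) if x != y)
    return mismatches / n


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x from a
                           cur[j - 1] + 1,           # insert y into a
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]


def approx_identity(a: str, b: str) -> float:
    """Rough gapped identity: 1 - edit distance / length of the shorter sequence."""
    return 1 - edit_distance(a, b) / min(len(a), len(b))


read = "ACGT" * 10                    # 40 nt toy read
subst = read[:20] + "T" + read[21:]   # one substitution at position 20
indel = read[:20] + read[21:]         # one base deleted at position 20

for name, other in [("substitution", subst), ("deletion", indel)]:
    print(f"{name}: ungapped mismatches {ungapped_mismatch_fraction(read, other):.3f}, "
          f"gapped identity {approx_identity(read, other):.3f}")
```

With one substitution, both measures agree (2.5% mismatches, 97.5% identity). With one deletion, the ungapped comparison reports about 49% mismatches, while the gapped identity stays above 97%.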

**mastal** · 04-07-2016, 07:05 AM

I think what you want is software that calls a consensus sequence from each cluster, rather than dedupe.
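To illustrate the consensus idea, here is a minimal Python sketch of column-wise majority-rule consensus. It assumes the members of one cluster have already been aligned to equal length, for example with a multiple-sequence aligner. This is a generic technique, not the method of any particular tool.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned: Sequence[str], gap: str = "-") -> str:
    """Column-wise majority-rule consensus of pre-aligned, equal-length sequences.

    Ties go to the symbol encountered first in the column (Counter.most_common
    ordering); columns where the gap character wins are dropped.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical aligned members of one cluster.
members = ["ACGTTGCA", "ACGTTGCA", "ACCTTGCA", "ACGT-GCA"]
print(majority_consensus(members))  # ACGTTGCA
```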

Dedupe on assembled RNA-Seq?