Junior Member
Location: Sweden Join Date: Feb 2016
Posts: 5
Hey all,

I have the following problem. I have a plasmid sequence database (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plasmid/) that is heavily redundant. I have been trying to remove the redundancy and obtain a set of representative sequences using cd-hit-est (http://weizhong-lab.ucsd.edu/cd-hit/...hit_user_guide) as follows:

Code:
cd-hit-est -i fastadb -o outfilename -c 0.95 -n 9 -g 1

Now to my problem: removing the redundancy from the database does not seem to work. Two sequences that are 100% identical over 100% of the sequence length (they have the same length) end up in different clusters instead of the same one. I have checked the similarity of the sequences by aligning them with BLAST, and as stated above, the sequences are identical. The output clustering file looks like this:

Code:
>Cluster 39
0 6222nt, >gi|410475454|ref|NC... *
>Cluster 40
0 6211nt, >gi|387504713|ref|NC... at +/98.10%
1 6222nt, >gi|41056918|ref|NC_... *
2 6222nt, >gi|118480566|ref|NC... at +/98.09%
>Cluster 41
0 6222nt, >gi|844749291|ref|NZ... *

Does anyone know what the problem here might be? Am I missing something?

Thanks in advance!
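For anyone who wants to check this kind of result on their own output, here is a minimal Python sketch that parses clustering output in the layout quoted above and reports which cluster each sequence ended up in. It assumes the file has one `>Cluster N` header per cluster followed by one member per line, as in the excerpt; the function names `parse_clstr` and `find_clusters` and the `SAMPLE` text are made up for illustration and are not part of cd-hit. It only inspects the clustering file and does not explain why the sequences were split.

```python
import re

# Matches a member line such as "0 6222nt, >gi|410475454|ref|NC... *".
# Groups: sequence length, (truncated) sequence ID, status text.
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.\s+(.*)$")


def parse_clstr(lines):
    """Return {cluster_number: [member dicts]} from clustering output lines."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            continue  # skip anything that does not look like a member line
        length, seq_id, status = match.groups()
        clusters[current].append(
            {
                "id": seq_id,
                "length": int(length),
                # The representative of a cluster is marked with "*".
                "representative": status.strip() == "*",
            }
        )
    return clusters


def find_clusters(clusters, id_fragment):
    """Return the sorted cluster numbers whose members contain id_fragment."""
    return sorted(
        number
        for number, members in clusters.items()
        if any(id_fragment in member["id"] for member in members)
    )


# Hypothetical sample: the excerpt from the post, one record per line.
SAMPLE = """\
>Cluster 39
0 6222nt, >gi|410475454|ref|NC... *
>Cluster 40
0 6211nt, >gi|387504713|ref|NC... at +/98.10%
1 6222nt, >gi|41056918|ref|NC_... *
2 6222nt, >gi|118480566|ref|NC... at +/98.09%
>Cluster 41
0 6222nt, >gi|844749291|ref|NZ... *
"""

if __name__ == "__main__":
    parsed = parse_clstr(SAMPLE.splitlines())
    for fragment in ("gi|410475454|", "gi|41056918|", "gi|844749291|"):
        print(fragment, "->", find_clusters(parsed, fragment))
```

To run it on a real result, replace `SAMPLE.splitlines()` with the lines of your own clustering file, e.g. `open(path)` where `path` points at it.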
Tags |
cd-hit, clustering failed, fasta, sequence |