CD-HIT doesn't report the total number of sequences correctly; any known fix?
I am using CD-HIT to reduce redundancy in a dataset of 20405 peptides. CD-HIT seems to work fine, but it identifies only 18404 peptides, as shown in the output below:
Code:
Program: CD-HIT, V4.8.1 (+OpenMP), Nov 13 2019, 13:22:53
Command: cd-hit -i BD_final_con_nombres.fasta -o BDpos.fa -c 0.9 -g 1 -T 0 -M 0 -n 5
Started: Mon Dec 16 16:51:57 2019
================================================================
Output
total number of CPUs in the system is 12
Actual number of CPUs to be used: 12
total seq: 18404
longest and shortest : 300 and 11
Total letters: 737624
Sequences have been sorted
Approximated minimal memory consumption:
Sequence : 3M
Buffer : 12 X 10M = 129M
Table : 2 X 65M = 131M
Miscellaneous : 0M
Total : 263M
Table limit with the given memory limit:
Max number of representatives: 744016
Max number of word counting entries: 14908239
# comparing sequences from 0 to 1314
. new table with 840 representatives
# comparing sequences from 1314 to 2534
994 remaining sequences to the next cycle
new table with 187 representatives
# comparing sequences from 1540 to 2744
1023 remaining sequences to the next cycle
new table with 117 representatives
# comparing sequences from 1721 to 2912
1010 remaining sequences to the next cycle
new table with 110 representatives
# comparing sequences from 1902 to 3080
996 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 2084 to 3249
962 remaining sequences to the next cycle
new table with 123 representatives
# comparing sequences from 2287 to 3438
953 remaining sequences to the next cycle
new table with 116 representatives
# comparing sequences from 2485 to 3622
958 remaining sequences to the next cycle
new table with 117 representatives
# comparing sequences from 2664 to 3788
935 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 2853 to 3963
932 remaining sequences to the next cycle
new table with 124 representatives
# comparing sequences from 3031 to 4129
891 remaining sequences to the next cycle
new table with 113 representatives
# comparing sequences from 3238 to 4321
700 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 3621 to 4676
844 remaining sequences to the next cycle
new table with 115 representatives
# comparing sequences from 3832 to 4872
822 remaining sequences to the next cycle
new table with 154 representatives
# comparing sequences from 4050 to 5075
760 remaining sequences to the next cycle
new table with 127 representatives
# comparing sequences from 4315 to 5321
768 remaining sequences to the next cycle
new table with 138 representatives
# comparing sequences from 4553 to 5542
737 remaining sequences to the next cycle
new table with 118 representatives
# comparing sequences from 4805 to 5776
727 remaining sequences to the next cycle
new table with 111 representatives
# comparing sequences from 5049 to 6002
707 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 5295 to 6231
651 remaining sequences to the next cycle
new table with 127 representatives
# comparing sequences from 5580 to 6496
629 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 5867 to 6762
563 remaining sequences to the next cycle
new table with 115 representatives
# comparing sequences from 6199 to 7070
585 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 6485 to 7336
521 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 6815 to 7642
545 remaining sequences to the next cycle
new table with 116 representatives
# comparing sequences from 7097 to 7904
514 remaining sequences to the next cycle
new table with 127 representatives
# comparing sequences from 7390 to 8176
550 remaining sequences to the next cycle
new table with 110 representatives
# comparing sequences from 7626 to 8395
551 remaining sequences to the next cycle
new table with 123 representatives
# comparing sequences from 7844 to 8598
529 remaining sequences to the next cycle
new table with 118 representatives
# comparing sequences from 8069 to 8807
465 remaining sequences to the next cycle
new table with 139 representatives
# comparing sequences from 8342 to 9060
438 remaining sequences to the next cycle
new table with 140 representatives
# comparing sequences from 8622 to 9320
431 remaining sequences to the next cycle
new table with 130 representatives
# comparing sequences from 8889 to 9568
392 remaining sequences to the next cycle
new table with 117 representatives
# comparing sequences from 9176 to 9835
377 remaining sequences to the next cycle
new table with 114 representatives
# comparing sequences from 9458 to 10097
364 remaining sequences to the next cycle
new table with 130 representatives
# comparing sequences from 9733 to 10352
373 remaining sequences to the next cycle
new table with 122 representatives
# comparing sequences from 9979 to 10580
.......... 10000 finished 5044 clusters
326 remaining sequences to the next cycle
new table with 113 representatives
# comparing sequences from 10254 to 10836
296 remaining sequences to the next cycle
new table with 124 representatives
# comparing sequences from 10540 to 11101
285 remaining sequences to the next cycle
new table with 107 representatives
# comparing sequences from 10816 to 11358
260 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 11098 to 11619
245 remaining sequences to the next cycle
new table with 130 representatives
# comparing sequences from 11374 to 11876
277 remaining sequences to the next cycle
new table with 157 representatives
# comparing sequences from 11599 to 12085
246 remaining sequences to the next cycle
new table with 146 representatives
# comparing sequences from 11839 to 12307
223 remaining sequences to the next cycle
new table with 146 representatives
# comparing sequences from 12084 to 12535
225 remaining sequences to the next cycle
new table with 128 representatives
# comparing sequences from 12310 to 12745
225 remaining sequences to the next cycle
new table with 117 representatives
# comparing sequences from 12520 to 12940
184 remaining sequences to the next cycle
new table with 108 representatives
# comparing sequences from 12756 to 13159
190 remaining sequences to the next cycle
new table with 131 representatives
# comparing sequences from 12969 to 13357
180 remaining sequences to the next cycle
new table with 122 representatives
# comparing sequences from 13177 to 13550
154 remaining sequences to the next cycle
new table with 129 representatives
# comparing sequences from 13396 to 13753
167 remaining sequences to the next cycle
new table with 102 representatives
# comparing sequences from 13586 to 13930
149 remaining sequences to the next cycle
new table with 115 representatives
# comparing sequences from 13781 to 14111
143 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 13968 to 14284
99 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 14185 to 14486
112 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 14374 to 14661
69 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 14592 to 14864
78 remaining sequences to the next cycle
new table with 118 representatives
# comparing sequences from 14786 to 15044
76 remaining sequences to the next cycle
new table with 115 representatives
# comparing sequences from 14968 to 15213
72 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 15141 to 15374
51 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 15323 to 15543
53 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 15490 to 15698
9 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 15689 to 15882
.................... new table with 89 representatives
# comparing sequences from 15882 to 16062
1 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 16061 to 16228
2 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 16226 to 16381
11 remaining sequences to the next cycle
new table with 100 representatives
# comparing sequences from 16370 to 16515
.................. new table with 90 representatives
# comparing sequences from 16515 to 16649
................... new table with 77 representatives
# comparing sequences from 16649 to 16774
.................. new table with 73 representatives
# comparing sequences from 16774 to 16890
................... new table with 57 representatives
# comparing sequences from 16890 to 16998
.................. new table with 56 representatives
# comparing sequences from 16998 to 17098
.................. new table with 59 representatives
# comparing sequences from 17098 to 17191
................... new table with 63 representatives
# comparing sequences from 17191 to 17277
................. new table with 47 representatives
# comparing sequences from 17277 to 17357
................ new table with 49 representatives
# comparing sequences from 17357 to 17431
.................. new table with 42 representatives
# comparing sequences from 17431 to 18404
..................... new table with 536 representatives
18404 finished 9584 clusters
Approximated maximum memory consumption: 265M
writing new database
writing clustering information
program completed !

>header
SEQUENCE

Also the command:
Code:
grep -c '>' BD_final_con_nombres.fasta
Does anybody know of a way to fix this?
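
One thing that might be worth checking (this is an assumption on my part, the log does not say it outright): CD-HIT has a -l option, the length of throw-away sequences, which defaults to 10 as far as I know. The log reports the shortest sequence as 11, which would be consistent with shorter peptides being dropped when the file is read. Below is a minimal Python sketch that counts how many records in a FASTA file are at or below such a cutoff. The function names and the "<=" comparison are illustrative choices of mine, not taken from CD-HIT itself.

```python
import sys


def read_fasta_lengths(handle):
    """Yield (header, sequence_length) for each record in a FASTA stream."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)  # sequences may span several lines
    if header is not None:
        yield header, length


def summarize(handle, min_len=10):
    """Count all records and those whose length is <= min_len.

    min_len=10 mirrors what I believe is CD-HIT's default for -l;
    treat both the value and the <= comparison as assumptions.
    """
    total = 0
    short = 0
    for _header, length in read_fasta_lengths(handle):
        total += 1
        if length <= min_len:
            short += 1
    return {"total": total, "short": short, "kept": total - short}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_short.py BD_final_con_nombres.fasta
    with open(sys.argv[1]) as fasta:
        print(summarize(fasta))
```

If "short" comes out close to the difference between the grep count and the "total seq" line, the length cutoff is a likely explanation, and lowering -l (together with a word size -n that suits short peptides) may be worth trying.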
