**seb567** · 07-25-2012, 04:42 AM

You can try digital normalization by the group of C. Titus Brown.

https://github.com/ged-lab/khmer
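For readers unfamiliar with the idea, here is a minimal sketch of digital normalization as it is usually described: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is still below a coverage cutoff. This is an illustration, not khmer's code. The function names, the `k=20` and `cutoff=20` defaults, and the in-memory `Counter` are assumptions made for the example, and it counts forward-strand k-mers only.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, to keep the sketch short)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize(reads, k=20, cutoff=20):
    """Keep a read only while the median count of its k-mers is below cutoff.

    Counts are taken over the reads kept so far, so redundant reads from
    high-coverage regions are dropped once the region is already well covered.
    """
    counts = Counter()  # illustrative stand-in for a memory-efficient counter
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[x] for x in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

A plain dictionary of counts will not fit in memory for 100 Gb of reads, which is why a dedicated tool is the practical choice here.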

Originally posted by Lisa0508

Hi everyone,
I am new to this forum. I have recently been working with 100 Gb of data from the Illumina HiSeq 2000. Before assembly, I want to remove reads that contain sequencing errors or highly repetitive sequence by counting k-mer frequencies. I used Meryl to count the k-mers because it supports k-mer sizes larger than 32. I set k to 59 and output the k-mers that occurred more than 5 times. After that, though, I had no idea how to pick out the reads that the low-abundance k-mers came from.

Should I use cd-hit-est-2d to align the 101 bp reads against the low-abundance k-mers? Given that the reference k-mers (e.g., 59-mers) are shorter than the 101 bp query reads, will it correctly separate the reads into matched and unmatched groups? Could someone kindly give me a suggestion? I am really lost.
Best regards
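One alignment-free way to approach the question above is a set lookup: load the k-mers that passed the abundance threshold into a set, then keep a read only if every one of its k-mers is in that set. The sketch below assumes the passing k-mers are already available as plain strings (how they are dumped from the counter is not covered here), and `split_reads`, `canonical`, and the other names are hypothetical helpers for illustration. K-mers are compared in canonical form so that a read and its reverse complement are treated alike.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmers(seq, k):
    """Yield every k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def split_reads(reads, solid, k):
    """Split reads into (kept, rejected).

    A read is kept only if it is at least k long and all of its canonical
    k-mers are in `solid`, the set of k-mers above the abundance threshold.
    Reads containing N are rejected, since such k-mers never appear in `solid`.
    """
    kept, rejected = [], []
    for read in reads:
        ok = len(read) >= k and all(canonical(x) in solid for x in kmers(read, k))
        (kept if ok else rejected).append(read)
    return kept, rejected
```

Requiring every k-mer to pass is the strictest rule; a softer variant would tolerate a few low-abundance k-mers per read. For a data set of this size the set itself may not fit in memory, so treat this as a description of the logic rather than a production tool.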

**Lisa0508** · 07-25-2012, 06:11 PM

Thanks a lot! I suspect my k-mer size was set too high, so many k-mers fell below the threshold. As a result, we discarded too many reads, which lowered the coverage. Although the khmer package doesn't support k larger than 32, 31-mers might be enough to remove the sequencing errors. By the way, I had some trouble installing Screed. I will try to figure it out. If I still can't, could you give me some advice?

Best Regards,
Lisa

**seb567** · 07-26-2012, 03:52 AM

I believe that the group that created khmer is working on a k>32 version.

For trouble with Screed, you can ask for advice in this thread I guess.

If you want to do a de novo assembly but have too much data, you may want to try Ray. Ray can run on several computers and is really easy to install and use. I am a coauthor of Ray.

Sébastien Boisvert


How to remove the reads whose k-mers are more or less than an abundance threshold
