#1 |
Member
Location: CA Join Date: Sep 2009
Posts: 11
This question was asked elsewhere, but I have the same question (quoted below). Any help would be appreciated.
1) I have many "low-complexity" reads. Some are simply polyA, polyC, etc., but others are runs of "ATATAT" or "CACACACA", etc. Previously I would have used "dust" on the command line to filter out this kind of read in a FASTA file. Any ideas on how to achieve similar functionality in the ShortRead world?
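For readers without the dust binary at hand, here is a minimal Python sketch of a triplet-repeat score loosely modeled on the triplet-counting idea behind DUST. It is not the dust program and not a ShortRead function: the names `triplet_score` and `filter_reads`, the cutoff of 2.0, and the choice to score each read as one window are all assumptions made for illustration.

```python
from collections import Counter


def triplet_score(seq):
    """Simplified DUST-style score (an illustration, not the real tool).

    Sums c * (c - 1) / 2 over the counts c of each overlapping triplet and
    divides by (number of triplets - 1). Repetitive reads score higher.
    """
    n = len(seq) - 2  # number of overlapping triplets
    if n < 2:
        return 0.0  # too short to score; treated as not repetitive
    counts = Counter(seq[i:i + 3] for i in range(n))
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)


def filter_reads(reads, max_score=2.0):
    """Keep reads whose score is at or below max_score (hypothetical cutoff)."""
    return [r for r in reads if triplet_score(r.upper()) <= max_score]
```

With this scoring, a 32-nt polyA read and a 32-nt "ATAT..." read both score well above the cutoff and are dropped, while a read with many different triplets is kept. The cutoff should be tuned on real data before being trusted.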
#2 |
Member
Location: milan, italy Join Date: Aug 2008
Posts: 22
Look at:
https://stat.ethz.ch/pipermail/bioc-...ch/000191.html
http://www.mail-archive.com/bioc-sig.../msg00148.html
Though not a ShortRead package:
http://genome.gsc.riken.jp/osc/engli...rc/tagdust.tgz
Cheers
#3 |
Junior Member
Location: Israel Join Date: Feb 2012
Posts: 2
Hi,
I am looking for a definition of low-complexity reads for reads of variable length (about 100 nucleotides long). Right now, I am using the following definition:
- Divide a read into subsegments of 32 nucleotides (the last subsegment overlaps the one before it).
- Count the number of unique trinucleotides in each segment.
- If the number of unique trinucleotides is smaller than 5, the segment is of "low complexity".
- If there is at least one "low complexity" segment, the read is considered "low complexity".
Comments regarding the relevance of this definition would be appreciated.
Regards, Michael.
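The definition above can be sketched in Python as follows. The function names are hypothetical, and treating a read shorter than 32 nt as a single segment is an assumption, since the post does not say how such reads are handled.

```python
def windows(read, size=32):
    """Split a read into consecutive segments of `size` nt.

    If the read length is not a multiple of `size`, the last segment is
    shifted back so that it overlaps the previous one instead of being short.
    """
    if len(read) <= size:
        return [read]  # assumption: short reads form a single segment
    starts = list(range(0, len(read) - size + 1, size))
    if starts[-1] + size < len(read):
        starts.append(len(read) - size)
    return [read[s:s + size] for s in starts]


def unique_trinucleotides(segment):
    """Number of distinct overlapping trinucleotides in a segment."""
    return len({segment[i:i + 3] for i in range(len(segment) - 2)})


def is_low_complexity(read, size=32, min_unique=5):
    """True if any segment has fewer than `min_unique` distinct trinucleotides."""
    return any(unique_trinucleotides(w) < min_unique for w in windows(read, size))
```

For a 100-nt read this yields four segments starting at positions 0, 32, 64 and 68, so a repetitive tail is still caught by the final, overlapping segment.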
#4
wiki wiki
Location: Cambridge, England Join Date: Jul 2008
Posts: 266
Why 5? How does your operational definition compare to SEG / Dust?
#5 |
Junior Member
Location: Israel Join Date: Feb 2012
Posts: 2
Hi Dan,
I am not familiar with the definition of SEG / Dust. Where can I find some details about it? Would you suggest a limit other than 5 unique trinucleotides (higher? lower?)?
Regards, Michael.
Last edited by mdaskal; 02-05-2012 at 07:07 AM.
#6 |
wiki wiki
Location: Cambridge, England Join Date: Jul 2008
Posts: 266
I guess they are both in PubMed? The Internet is slow here at the moment, or else I'd link...
I won't suggest an alternative without data, so my real question was: did you analyse / benchmark your metric? I.e., what fraction of reads are low complexity at 3, 4, 5, 6, etc.?
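One way to run the benchmark asked about here is to record, for each read, the smallest unique-trinucleotide count over its 32-nt segments, then report the fraction of reads that fall below each candidate cutoff. The following is a minimal Python sketch under those assumptions; the function names are hypothetical and the segmenting follows the definition given earlier in the thread.

```python
def min_unique_trinucleotides(read, size=32):
    """Smallest number of distinct trinucleotides in any segment of the read.

    Segments are consecutive `size`-nt stretches; the last one is shifted
    back to overlap the previous segment when the read length is not a
    multiple of `size`. Reads shorter than `size` form one segment.
    """
    starts = list(range(0, max(len(read) - size, 0) + 1, size))
    if starts[-1] + size < len(read):
        starts.append(len(read) - size)
    return min(
        len({read[i:i + 3] for i in range(s, min(s + size, len(read)) - 2)})
        for s in starts
    )


def fraction_flagged(reads, thresholds=(3, 4, 5, 6), size=32):
    """Fraction of reads called low complexity at each cutoff.

    A read is flagged at cutoff t if some segment has fewer than t
    distinct trinucleotides.
    """
    mins = [min_unique_trinucleotides(r, size) for r in reads]
    return {t: sum(m < t for m in mins) / len(mins) for t in thresholds}
```

Running `fraction_flagged` on a sample of real reads shows how sensitive the flagged fraction is to the cutoff, which is the kind of data needed before arguing for 5 over any other value.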