
  • sklages
    replied
    There are still two things unclear:
    a) cap3 32/64bit?
    b) what type of input data?

    Does this " >2,50,000" mean 250K or 2.5M input reads?

    Another recommendation for a clustering program may be "wcd" (http://code.google.com/p/wcdest/).
    For 454-generated data (as well as for Sanger data) I use MIRA3 (http://www.chevreux.org/projects_mira.html).

    cheers,
    Sven


  • darked89
    replied
    Bharat,

    you may try one of these sequence clustering programs:

    uclust: http://www.drive5.com/uclust/
    CD-HIT: http://www.bioinformatics.org/cd-hit/

    I have only used uclust on a rather small set (30k GSS FASTA sequences from NCBI), but it ran without any problems.

    Darek Kedra


  • gpertea
    replied
    Bharat,
    Dividing the input arbitrarily like that doesn't sound right; you're likely to get redundant or incomplete contigs, etc.
    TGICL or another clustering tool should be used to partition the input data if the assembler is not able to do it by itself.
    I can provide some limited assistance with TGICL, even though I haven't used it in a while and I thought it would be deprecated by now (I wrote those scripts many years ago when I was working on EST clustering myself). So if you can't find a better assembly solution for your data, I would suggest trying to fix TGICL so it works for you. You can start by looking at all the err_* files left around by TGICL (not only in the main working directory, but also in the asm_* subdirectories) and checking them for any suspicious error messages. Perhaps you'll find the exact cause of the TGICL failure and can address it (or let me know what errors you see there and perhaps I can help with fixing them).
    Last edited by gpertea; 01-23-2010, 06:45 PM.
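The log check suggested in this reply can be scripted. The following is only a sketch: the err_* and asm_* name patterns come from the post, while the function name `find_error_logs` and the choice to show the last five lines of each file are assumptions made for illustration, not part of TGICL.

```python
from pathlib import Path


def find_error_logs(workdir, tail=5):
    """Return (path, last_lines) for every non-empty err_* file.

    Looks in the main working directory and in its asm_* subdirectories,
    as suggested in the post. `tail` (how many trailing lines to keep)
    is an arbitrary choice for this sketch.
    """
    root = Path(workdir)
    candidates = list(root.glob("err_*")) + list(root.glob("asm_*/err_*"))
    reports = []
    for path in sorted(candidates):
        # Skip directories and empty logs; only non-empty files are interesting.
        if not path.is_file() or path.stat().st_size == 0:
            continue
        lines = path.read_text(errors="replace").splitlines()
        reports.append((path, lines[-tail:]))
    return reports


def print_error_logs(workdir):
    """Print the tail of each non-empty error log under workdir."""
    for path, last_lines in find_error_logs(workdir):
        print(f"== {path}")
        for line in last_lines:
            print("   ", line)
```

Calling `print_error_logs` on the TGICL working directory would list whichever error logs actually contain text, which is usually a quicker starting point than opening each file by hand.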


  • Bharat
    replied
    Would it be a good approach to divide my data set into files of 50,000 sequences each, run CAP3 on each file, and then concatenate all the resulting singleton and contig files?

    Please advise.
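For reference, the splitting step asked about here could look roughly like the sketch below. The helper names `read_fasta` and `chunk_records` are hypothetical, and the sketch only illustrates the mechanics; as the reply in this thread points out, arbitrary splitting is likely to yield redundant or incomplete contigs, so this is not a recommended workflow.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, size):
    """Group records into consecutive lists of at most `size` records."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

With `size=50000`, each yielded chunk could be written to its own FASTA file and assembled separately, but reads that belong to the same transcript may then end up in different chunks.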


  • Bharat
    replied
    I am using transcriptomic data sets. I have treated my data with "seqclean", which removed the vector contaminants from the sequences. I have also tried TGICL. It generates the ACE file, cluster file and contig file, but it fails to generate the singletons file. The error report file ends with the following message:

    "The clusters are stored in file 'My_data_set.fasta_cl_clusters'

    >>> --- ASSEMBLE [My_data_set.fasta] started at Jan 14 12:06:58 2010

    Process terminated with an error, at step 'ASSEMBLE'!
    tgicl (My_data_set.fasta) encountered an error at step ASSEMBLE
    Working directory was /root/Desktop/Software_Collection/Assembly/tgicl_linux."


    I think there is some problem in the last step, which is failing to make the singletons file.
    Also, this message is displayed with large as well as small data sets.

    I got a similar log and similar results when I switched to a machine running openSUSE.

    I would also like to know: is a matrix constructed over all n sequences when we use TGICL and CAP3, and is that why it takes so much time?
    Last edited by Bharat; 01-22-2010, 08:39 PM.
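As a rough illustration of why the question about an n-by-n matrix matters, the arithmetic below counts the read pairs a naive all-against-all comparison would have to consider. This is only a worked calculation under that assumption; the thread does not say whether TGICL or CAP3 actually stores or computes such a matrix, and the function name `pair_count` is made up for the example.

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


small = pair_count(50_000)    # 1,249,975,000 pairs
large = pair_count(250_000)   # 31,249,875,000 pairs

print(f"{small:,} pairs for 50,000 reads")
print(f"{large:,} pairs for 250,000 reads")
print(f"ratio: {large / small:.1f}x")
```

Under this assumption, five times as many reads means roughly 25 times as many candidate pairs, which is why clustering the reads first and assembling each cluster separately can reduce the work considerably.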


  • jjohnson
    replied
    What type of data is it? 454, I assume, or maybe Sanger? There are several options out there for WGS. I don't have much experience with CAP3; Newbler or Celera Assembler could prove to be viable alternatives.


  • gpertea
    replied
    Obviously CAP3 is running out of memory with your data: too many input reads at once. What kind of data is it, genomic or transcriptome?

    Either way, you might want to try some cleaning and clustering first (with a package like TGICL) so you assemble the clusters instead of the whole thing at once (unless all those reads are supposed to assemble into a single chromosome). Also, I hope you are using the 64-bit version of CAP3, which is able to make use of that memory. If you haven't done so already, check http://seq.cs.iastate.edu/cap3.html for the latest version for your platform and download the version for 64-bit Linux systems with an Intel processor.


  • Bharat
    started a topic Requesting advice regarding CAP3

    Requesting advice regarding CAP3

    Hello everybody,

    These days I am working with the CAP3 program, which assembles sequences into contigs and singletons. When I use the program with a small data set, i.e. an input file of up to 50,000 sequences, it works perfectly and gives me the output files. But when I use a large data set, i.e. an input file of >2,50,000 sequences, the command prompt displays:

    $ ran out of memory, -195789398 bytes requested.

    Could somebody please explain this message and suggest a possible solution? I will have to use even larger data sets.

    I am using an RH Linux operating system with a 1 TB hard disk, 8 GB RAM and a quad-core Xeon processor.
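One plausible reading of the negative byte count in the error message is that the requested size was held in a signed 32-bit integer and wrapped around. The sketch below shows the arithmetic only; it is an interpretation, not something confirmed from the CAP3 source, and the request size of 4,099,177,898 bytes is a hypothetical value chosen because it wraps to the number in the message.

```python
def to_signed32(n):
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


# Hypothetical request of about 3.8 GiB, larger than a signed 32-bit
# integer can represent (the maximum is 2,147,483,647).
requested = 4_099_177_898

print(to_signed32(requested))          # -195789398
print(f"{requested / 2**30:.2f} GiB")  # 3.82 GiB
```

If this reading is right, the program was asking for a single block of nearly 4 GiB, which would fit the advice in the replies to use a 64-bit build of CAP3 or to cluster the reads before assembly.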
