  • How long does velvetg take for large genomes?

    Hi All,
    I have a single lane of Illumina HiSeq 2000 101-bp paired-end reads from a single genome (hop; estimated genome size 2.8 Gb), along with two RNA-Seq experiments (same conditions as the genome sequencing), that I'm attempting to assemble using Velvet (in advance, please don't criticize the use of Velvet...I will be using other assembly packages in future). All three "experiments" were done on different genotypes. I've run just the genome sequence data, with all ambiguities removed, as "single" reads and have successfully completed runs with Velvet. It took approximately 35 hours to complete the assembly, but of course the assembly using the above settings was not great (only 1/3 of the genome covered, with N50=270). I've also run the RNA-Seq experiments as "single" reads and have seen velvetg run to completion in a similar amount of time.

    I have now processed all reads (removing all ambiguities and trimming), removed the orphaned reads that result from paired-end read processing, and combined these orphaned reads into a single fastq.gz file. I submitted all the processed paired-end read files (as "shortPaired" reads) along with the orphaned read files (as "short" reads) from the genome sequencing and the two RNA-Seq experiments on Tuesday (April 2nd) on a machine with 1000 GB of RAM, and velvetg has been running ever since (96 hours). The "top" command on the UNIX machine showed significant changes in the amount of RAM used during the early parts of the assembly, but the last 2 days have shown a constant 640 GB of RAM in use with one processor running at 100%.

    My question is this: is this length of time normal for velvetg when assembling such a large dataset, or has velvetg entered an endless loop that will never run to completion?
    Last edited by genetics_jo; 04-05-2014, 07:36 AM.

  • #2
    I think if you re-compile velvet with 'OPENMP=1' it will be able to use more than 1 processor.


    • #3
      Originally posted by mastal
      I think if you re-compile velvet with 'OPENMP=1' it will be able to use more than 1 processor.
      It's already compiled to use multiple cores, and it did so for the first two days of velvetg, using up to 880 GB of RAM during that time. Now it's only running on one core and using 660 GB of RAM. Someone said it's "trimming" now, and thus not showing many changes in RAM or core use?


      • #4
        Where is the velvet stderr output going?
        Can you tell from that whether it's making any progress or just stuck?


        • #5
          Originally posted by mastal
          Where is the velvet stderr output going?
          Can you tell from that whether it's making any progress or just stuck?
          Unfortunately, I submitted the job as an SGE_Batch script and cannot see the output to determine if it's stuck. In the past when things have gone awry, velvet has simply shut down.

          Biggest thing I need to know is if this length of time (>4 days) is normal for a large genome assembly?


          • #6
            Originally posted by genetics_jo
            Unfortunately, I submitted the job as an SGE_Batch script and cannot see the output to determine if it's stuck. In the past when things have gone awry, velvet has simply shut down.

            Biggest thing I need to know is if this length of time (>4 days) is normal for a large genome assembly?
            I am more familiar with LSF (which has the "bpeek" command to look at the output of ongoing jobs). I do not think there is an analogous command available in SGE (besides qstat -f -u username).

            You can check the specific nodes where your job is running to see if velvet related processes are actively running.
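
One way to check whether a process on the node is active is to compare its cumulative CPU time at two points in time. The sketch below assumes you can log in to the compute node and know the process ID of velvetg; the helper names `cpu_time_of` and `is_using_cpu` are hypothetical and only for illustration. Note that advancing CPU time shows the process is computing, not that it is making progress: a process stuck in a loop would also accumulate CPU time.

```shell
# Hypothetical helpers (not part of velvet or SGE).

# Print the cumulative CPU time of a process as reported by ps.
cpu_time_of() {
    ps -o time= -p "$1" | tr -d ' '
}

# Succeed if the process accumulated CPU time during the interval.
# $1 = process ID, $2 = seconds to wait between samples (default 60).
is_using_cpu() {
    pid="$1"
    interval="${2:-60}"
    before=$(cpu_time_of "$pid")
    sleep "$interval"
    after=$(cpu_time_of "$pid")
    # An empty value means the process was not found.
    [ -n "$before" ] && [ -n "$after" ] && [ "$before" != "$after" ]
}
```

For example, `is_using_cpu 12345 60 && echo "still computing"`, where 12345 stands in for the real process ID.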

            It is difficult to define "normal" since all clusters have different hardware and queue limits, and your dataset is unique to you. If the job is "running" (i.e. not suspended or otherwise), then the best thing to do is wait.
            Last edited by GenoMax; 04-06-2014, 04:11 AM.


            • #7
              Thanks Genomax! I'll hold my finger off the trigger (qdel) as long as possible. My one-week reservation of the cluster runs out on Tuesday so hopefully things will resolve by then.


              • #8
                There may be a way to check on progress without the program output. Try doing
                Code:
                ls -al
                every hour or so to see which files are growing and whether new files are being added. This will tell you what work is being done.
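
Rather than running the listing by hand, the periodic check can be scripted. This is a minimal sketch: the helper name `snapshot_dir`, the directory and the log file are all placeholders, not anything velvet provides.

```shell
# Hypothetical helper: append a timestamped listing of a directory to a log,
# so successive snapshots can be compared for growing or newly created files.
# $1 = directory to watch, $2 = log file (keep it outside the watched directory).
snapshot_dir() {
    dir="$1"
    log="$2"
    {
        date '+%Y-%m-%d %H:%M:%S'
        ls -al "$dir"
    } >> "$log"
}
```

A loop such as `while true; do snapshot_dir ./velvet_out "$HOME/velvet_growth.log"; sleep 3600; done` would take one snapshot per hour, with `./velvet_out` standing in for the real output directory.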


                • #9
                  That's a good idea, but in my experience, velvetg creates most of the output files just before the run finishes.


                  • #10
                    Originally posted by mastal
                    That's a good idea, but in my experience, velvetg creates most of the output files just before the run finishes.
                    That's also what I've observed with the previous runs of velvet. The program is still "running" but RAM use and % of processor have remained the same now for several days. Would have thought if it wasn't going to work it would have crashed?


                    • #11
                      I agree. I think if it's got through the part where it uses lots of memory and lots of processors without crashing it should be OK. Just keep your fingers crossed that it finishes before the time you have booked on the cluster runs out. I haven't assembled large genomes, so have no idea how long it should take, but I think it is not unusual for some assemblers to run for many days.


                      • #12
                        One other question...I've seen some folks say the paired end fastq files need to be merged together into a single file for "shortPaired" use in Velvet...and seen some say that the two paired end files need to be kept separate and let velvet read and coordinate reads. Which one is it? For example if I have files Humulus_lane1_read1_1.fastq and Humulus_lane1_read1_2.fastq, should these two files be merged together or kept separately for velvet to work properly?


                        • #13
                          Originally posted by genetics_jo
                          Unfortunately, I submitted the job as an SGE_Batch script and cannot see the output to determine if it's stuck. In the past when things have gone awry, velvet has simply shut down.

                          Biggest thing I need to know is if this length of time (>4 days) is normal for a large genome assembly?
                          I have also recently used velvet for Illumina reads, but it took only a few seconds for me to generate an assembly. It's bacterial sequencing and a small genome, of course! But after seeing your post I am doubting my assembly time. Please suggest something!


                          • #14
                            @genetics_jo
                            In recent versions of velvet you can use either method, but you need to use the right parameters.

                            If you leave the reads in separate files, you should add the flag -separate,
                            so you would have

                            Code:
                            velveth .....   -fastq -shortPaired -separate read1.fastq  read2.fastq
                            If you don't use the '-separate' flag, then you need to produce a file where the reads are interleaved, using one of the shuffleSequences scripts that are in the contrib subdirectory in velvet.

                            Code:
                            velveth  ..... -fastq -shortPaired read1read2_shuffled.fastq
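
To illustrate what "interleaved" means, the same layout can be produced in plain shell. This is a rough sketch and not the shuffleSequences script that ships with velvet: the function name `interleave_fastq` is hypothetical, and it assumes every FASTQ record is exactly 4 lines and that both files list the pairs in the same order.

```shell
# Hypothetical helper (not part of velvet): write record 1 of file A,
# then record 1 of file B, then record 2 of file A, and so on.
# Assumes 4-line FASTQ records and identical read order in both files.
# $1 = read 1 file, $2 = read 2 file; output goes to stdout.
interleave_fastq() {
    awk -v r2="$2" '{
        # Collect the current 4-line record from the first file.
        rec = $0
        for (i = 1; i < 4; i++) { getline; rec = rec "\n" $0 }
        print rec
        # Emit the matching 4-line record from the second file.
        for (i = 0; i < 4; i++) { getline line < r2; print line }
    }' "$1"
}
```

Usage would look like `interleave_fastq read1.fastq read2.fastq > read1read2_shuffled.fastq`, with the result passed to velveth without the -separate flag.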
                            By the way, did your run finish before the allotted time ran out?


                            • #15
                              @paa6
                              If you're using a very powerful computer, and have only a relatively small number of reads for a small genome, velvet will run very quickly.

                              As long as it produced the right output files, it should be OK.
