  • #16
    Originally posted by sklages View Post
    But format of index files has not changed from version 1 to 2?
    Unfortunately it has. The index contains extra information about the reference and with isaac2 that information has changed. Specifically, in the isaac2 index we are keeping track for each position in the reference genome if there are similar sequences elsewhere in the reference.

    Comment


    • #17
      I did not specify a value for seed-length, so the process is creating all possible combinations [--annotation-seed-lengths arg (=16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80)]. It looks like the end may be in sight today for the process I am running, since the files for 80 are being made now.

      @sven: Expect a multi-day turnaround.

      Comment


      • #18
        I haven't either .. I should have used 32.
        But .. I am optimistic :-)

        Comment


        • #19
          @Semyon/Come: Can one of you confirm if the following files represent the correct isaac2 index for hg19 genome? My isaac-sort-reference job appeared to have finished (no errors) but these are the only files I see in the top level directory (Temp directory is still there with files within)
          Code:
          1.1G 2uniqueness.16bpb.gz
           47G kmer-positions-32-0.dat
           50K sorted-reference.xml

          Comment


          • #20
            Originally posted by sklages View Post
            OK .. index creation is running for hg19 ... I'll report back tomorrow.
            Well, .. for now .. the server crashed overnight, just three hours ago ..
            We now have to investigate what event caused this crash. Maybe it is just "Murphy's Law" .. we'll see.

            Comment


            • #21
              Originally posted by sklages View Post
              Well, .. for now .. the server crashed overnight, just three hours ago ..
              We now have to investigate what event caused this crash. Maybe it is just "Murphy's Law" .. we'll see.
              Well, .. it was indeed Murphy's law :-)
              We had a failure on a network interface .. that made at least one process go into a frenzy and pushed the load beyond 1000...

              So I'll restart indexing today.

              Comment


              • #22
                Originally posted by GenoMax View Post
                @Semyon/Come: Can one of you confirm if the following files represent the correct isaac2 index for hg19 genome? My isaac-sort-reference job appeared to have finished (no errors) but these are the only files I see in the top level directory (Temp directory is still there with files within)
                Code:
                1.1G 2uniqueness.16bpb.gz
                 47G kmer-positions-32-0.dat
                 50K sorted-reference.xml
                This looks correct, but surprising. Did you specify something like "-w 1" on the command line by any chance?

                All the kmers are indexed in one single data file (kmer-positions-32-0.dat), which is not a very good thing as it prevents parallelisation when searching for mapping candidates.

                You can use "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.
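
A quick way to check how the k-mers were split is to count the kmer-positions data files in the index directory. The snippet below is only a sketch: `count_kmer_files` is a hypothetical helper, not part of iSAAC, and it assumes the data files follow the `kmer-positions-*.dat` naming seen in the listing quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not an iSAAC tool): count the k-mer data files in an
# index directory, assuming they are named kmer-positions-*.dat as in the
# listing quoted in this thread.
count_kmer_files() {
    n=0
    for f in "$1"/kmer-positions-*.dat; do
        # When nothing matches, the pattern stays literal and fails the -f test.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Called as `count_kmer_files .` from inside the index directory, it prints the number of data files found; a count of 1 corresponds to the single-file layout described here.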

                Comment


                • #23
                  Originally posted by craczy View Post
                  This looks correct, but surprising. Did you specify something like "-w 1" on the command line by any chance?
                  Thanks for confirming that. I had only done this

                  Code:
                  $ isaac-sort-reference -g /path_to/HG19_UCSC/Sequence/WholeGenomeFasta/genome.fa -o .
                  Is there a better command-line for future reference?

                  Originally posted by craczy View Post
                  You can use "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.
                  I did the isaac-pack-reference thinking that it would "compress" the index but nothing appeared to change except the date stamps.

                  Update: I think I need to move the "Temp" directory out of the way (just realized that and trying it now) for "pack-reference" to work.

                  Comment


                  • #24
                    Well, I can confirm that.

                    It took ~64h on a 48-core "Opteron 6176 SE" (fast local storage, RAID) to build an hg19 index.

                    Code:
                    isaac-sort-reference --genome-file fa_hg19/genome.fa --jobs 1 --output-directory iSAAC2Index.32 --quiet
                    The result is:
                    Code:
                    938M 2015.07.27 06:21:35 2uniqueness.16bpb.gz
                     42G 2015.07.27 06:54:45 kmer-positions-32-0.dat
                     15K 2015.07.27 06:54:51 sorted-reference.xml
                    8.0K 2015.07.27 06:54:51 Temp
                    with 'Temp' being 1.1TiB (!) in size ... (btw, why don't you clean Temp automatically after successfully finishing a job?).

                    Comment


                    • #25
                      @come:

                      I tried the "isaac-unpack-reference" (relevant part of the command line below)

                      Code:
                      $ isaac-unpack-reference -j 8 -w 6 -i .
                      Resulted in this error

                      Code:
                      tar: .: Cannot read: Is a directory
                      tar: At beginning of tape, quitting now
                      tar: Error is not recoverable: exiting now
                      make: *** [Temp/sorted-reference.xml] Error 2
                      @sven: Can you see if it works for you?

                      BTW: "Temp" directory is required for the unpack-reference.

                      Comment


                      • #26
                        Just tried,
                        Code:
                        isaac-unpack-reference -j 1 -w 6 -i . --dry-run
                        This (basically) results in this error:
                        Code:
                        warning: failed to load external entity "Temp/sorted-reference.xml"
                        unable to parse Temp/sorted-reference.xml
                        warning: failed to load external entity "Temp/sorted-reference.xml"
                        unable to parse Temp/sorted-reference.xml
                        Without dry-run:
                        Code:
                        isaac-unpack-reference -j 1 -w 6 -i .
                        tar fails:
                        Code:
                        tar -C Temp --touch -xvf .
                        tar: .: Cannot read: Is a directory
                        tar: At beginning of tape, quitting now
                        tar: Error is not recoverable: exiting now
                        make: *** [Temp/sorted-reference.xml] Error 2
                        Even when I copy sorted-reference.xml to Temp, I get an error:

                        Code:
                        make[1]: Entering directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
                        make[1]: *** No rule to make target `Temp/genome.fa', needed by `/path/to/iSAACindexBuildDir/iSAAC2Index.32/genome.fa'.  Stop.
                        make[1]: Leaving directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
                        make: *** [all] Error 2

                        Comment


                        • #27
                          Originally posted by GenoMax View Post
                          BTW: "Temp" directory is required for the unpack-reference.
                          That's funny though .. under normal circumstances I'd remove this folder as it occupies quite a lot of disk space ..

                          Comment


                          • #28
                            @sven: A new thread has been created for posts related to isaac2 genome index creation.

                            Comment


                            • #29
                              The input file should be the 'sorted-reference.xml', not the current directory:

                              This should work:

                              Code:
                              isaac-unpack-reference -j 1 -w 6 -i sorted-reference.xml
                              Remember to remove the already existing Temp directory, if any

                              Come

                              Comment


                              • #30
                                Originally posted by craczy View Post
                                The input file should be the 'sorted-reference.xml', not the current directory:

                                This should work:

                                Code:
                                isaac-unpack-reference -j 1 -w 6 -i sorted-reference.xml
                                Remember to remove the already existing Temp directory, if any

                                Come
                                This is not working for me:

                                Code:
                                tar: This does not look like a tar archive
                                tar: Skipping to next header
                                tar: Read 4461 bytes from ./sorted-reference.xml
                                tar: Error exit delayed from previous errors
                                make: *** [Temp/sorted-reference.xml] Error 2
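
The tar messages here, together with the `tar -C Temp --touch -xvf .` line in post #26, suggest that whatever is passed to -i gets handed to tar, so a plain XML file or a directory would fail in much the same way. A minimal pre-check, assuming that reading is right (`is_tar_archive` is a hypothetical helper, not an iSAAC command):

```shell
#!/bin/sh
# Hypothetical helper (not an iSAAC tool): succeed only if the argument is a
# regular file that tar can list as an archive.
is_tar_archive() {
    # Reject directories and missing paths before calling tar.
    [ -f "$1" ] || return 1
    # tar -tf lists archive members and exits non-zero for a non-archive.
    tar -tf "$1" >/dev/null 2>&1
}
```

For example, `is_tar_archive sorted-reference.xml || echo "not a tar archive"` would flag the XML file before it is passed to -i.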

                                Comment
