SEQanswers

Isaac2 genome index creation (http://seqanswers.com/forums/showthread.php?t=61539)

sklages 07-24-2015 03:14 AM

Quote:

Originally Posted by sklages (Post 178003)
Well, .. for now .. the server crashed overnight, just three hours ago ..
We now have to investigate what event caused this crash. Maybe it is just "Murphy's Law" .. we'll see.

Well, .. it was indeed Murphy's law :-)
We had a failure on a network interface .. that sent at least one process into a frenzy and pushed the load beyond 1000...

So I'll restart indexing today.

craczy 07-24-2015 10:02 AM

Quote:

Originally Posted by GenoMax (Post 177997)
@Semyon/Come: Can one of you confirm whether the following files represent the correct isaac2 index for the hg19 genome? My isaac-sort-reference job appears to have finished (no errors), but these are the only files I see in the top-level directory (the Temp directory is still there, with files in it):
Code:

1.1G 2uniqueness.16bpb.gz
 47G kmer-positions-32-0.dat
 50K sorted-reference.xml


This looks correct, but it is surprising. Did you specify something like "-w 1" on the command line, by any chance?

All the kmers are indexed in one single data file (kmer-positions-32-0.dat), which is not ideal, as it prevents parallelisation when searching for mapping candidates.

You can use "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.
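A quick way to tell whether an index ended up unsplit, as in the listing above, is to count the kmer-positions-*.dat files in the index directory. The sketch below is a hypothetical helper, not part of Isaac; the function names are made up, and it only inspects file names, so it says nothing about whether the index content itself is valid.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Isaac). They only look at file names.

# Print the number of kmer-positions-*.dat files in an index directory.
count_kmer_files() {
  local dir="${1:-.}" n=0 f
  for f in "$dir"/kmer-positions-*.dat; do
    # With no match the glob stays unexpanded, so check that the file exists.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Return 0 if the index is split into several files, 1 if everything sits in
# a single file (the case discussed above), 2 if no kmer files were found.
check_index_split() {
  local n
  n=$(count_kmer_files "${1:-.}")
  if [ "$n" -eq 0 ]; then
    echo "no kmer-positions-*.dat files found" >&2
    return 2
  fi
  if [ "$n" -eq 1 ]; then
    echo "single kmer file: consider isaac-pack-reference, then isaac-unpack-reference -w 6" >&2
    return 1
  fi
  echo "$n kmer files"
}
```

For example, `check_index_split .` run inside the hg19 index directory from the post above would return 1 and print the repacking hint.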

GenoMax 07-24-2015 10:29 AM

Quote:

Originally Posted by craczy (Post 178035)
This looks correct, but it is surprising. Did you specify something like "-w 1" on the command line, by any chance?

Thanks for confirming that. I had only run this:

Code:

$ isaac-sort-reference -g /path_to/HG19_UCSC/Sequence/WholeGenomeFasta/genome.fa -o .
Is there a better command line for future reference?

Quote:

Originally Posted by craczy (Post 178035)
You can use "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.

I ran isaac-pack-reference thinking that it would "compress" the index, but nothing appeared to change except the timestamps.

Update: I think I need to move the "Temp" directory out of the way for "pack-reference" to work (just realized that and am trying it now).

sklages 07-26-2015 11:13 PM

Well, I can confirm that.

It took ~64 h on a 48-core "Opteron 6176 SE" machine (fast local storage, RAID) to build an hg19 index.

Code:

isaac-sort-reference --genome-file fa_hg19/genome.fa --jobs 1 --output-directory iSAAC2Index.32 --quiet
The result is:
Code:

938M 2015.07.27 06:21:35 2uniqueness.16bpb.gz
 42G 2015.07.27 06:54:45 kmer-positions-32-0.dat
 15K 2015.07.27 06:54:51 sorted-reference.xml
8.0K 2015.07.27 06:54:51 Temp

with 'Temp' being 1.1 TiB (!) in size ... (BTW, why not clean up Temp automatically after a job finishes successfully?)

GenoMax 07-27-2015 04:56 AM

@come:

I tried "isaac-unpack-reference" (relevant part of the command line below):

Code:

$ isaac-unpack-reference -j 8 -w 6 -i .
It resulted in this error:

Code:

tar: .: Cannot read: Is a directory
tar: At beginning of tape, quitting now
tar: Error is not recoverable: exiting now
make: *** [Temp/sorted-reference.xml] Error 2

@sven: Can you see if it works for you?

BTW: the "Temp" directory is required for unpack-reference.

sklages 07-27-2015 05:45 AM

Just tried:
Code:

isaac-unpack-reference -j 1 -w 6 -i . --dry-run
This (basically) results in this error:
Code:

warning: failed to load external entity "Temp/sorted-reference.xml"
unable to parse Temp/sorted-reference.xml
warning: failed to load external entity "Temp/sorted-reference.xml"
unable to parse Temp/sorted-reference.xml

Without dry-run:
Code:

isaac-unpack-reference -j 1 -w 6 -i .
tar fails:
Code:

tar -C Temp --touch -xvf .
tar: .: Cannot read: Is a directory
tar: At beginning of tape, quitting now
tar: Error is not recoverable: exiting now
make: *** [Temp/sorted-reference.xml] Error 2

Even when I copy sorted-reference.xml to Temp, I get an error:

Code:

make[1]: Entering directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
make[1]: *** No rule to make target `Temp/genome.fa', needed by `/path/to/iSAACindexBuildDir/iSAAC2Index.32/genome.fa'.  Stop.
make[1]: Leaving directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
make: *** [all] Error 2


sklages 07-27-2015 11:02 AM

Quote:

Originally Posted by GenoMax (Post 178105)
BTW: the "Temp" directory is required for unpack-reference.

That's funny though .. under normal circumstances I'd remove this folder as it occupies quite a lot of disk space ..

GenoMax 07-27-2015 05:33 PM

@sven: A new thread has been created for posts related to isaac2 genome index creation.

craczy 07-28-2015 07:07 AM

The input file should be the 'sorted-reference.xml', not the current directory:

This should work:

Code:

isaac-unpack-reference -j 1 -w 6 -i sorted-reference.xml
Remember to remove the already existing Temp directory, if any.

Come

GenoMax 07-28-2015 11:58 AM

Quote:

Originally Posted by craczy (Post 178173)
The input file should be the 'sorted-reference.xml', not the current directory:

This should work:

Code:

isaac-unpack-reference -j 1 -w 6 -i sorted-reference.xml
Remember to remove the already existing Temp directory, if any.

Come

This is not working for me:

Code:

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Read 4461 bytes from ./sorted-reference.xml
tar: Error exit delayed from previous errors
make: *** [Temp/sorted-reference.xml] Error 2


craczy 07-28-2015 01:18 PM

Quote:

Originally Posted by GenoMax (Post 178199)
This is not working for me:

Code:

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Read 4461 bytes from ./sorted-reference.xml
tar: Error exit delayed from previous errors
make: *** [Temp/sorted-reference.xml] Error 2


My mistake. Apologies. It is not the sorted-reference.xml but the tarball created by 'isaac-pack-reference':

Code:

rm -rf Temp
isaac-unpack-reference -j 1 -w 6 -i packed-reference.tar.gz
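Both failures earlier in the thread came from giving -i something that tar cannot read as an archive (first the current directory, then sorted-reference.xml). A small pre-flight check can catch that before the make run starts. This is a hypothetical helper, not an Isaac command, and it assumes the packed reference is a gzip-compressed tar file, as its .tar.gz name suggests.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Isaac): confirm that the argument
# intended for `isaac-unpack-reference -i` is a readable gzipped tar archive
# rather than a directory or sorted-reference.xml.
is_packed_reference() {
  local f="$1"
  if [ -d "$f" ]; then
    echo "$f is a directory; pass the tarball from isaac-pack-reference" >&2
    return 1
  fi
  if [ ! -r "$f" ]; then
    echo "$f is not a readable file" >&2
    return 1
  fi
  # 'tar -tzf' only lists the archive; it fails on anything that is not a
  # gzipped tar file, which is what triggered the errors above.
  if ! tar -tzf "$f" >/dev/null 2>&1; then
    echo "$f does not look like a gzipped tar archive" >&2
    return 1
  fi
}
```

Used as a guard, it keeps Temp from being removed when the input is wrong: `is_packed_reference packed-reference.tar.gz && rm -rf Temp && isaac-unpack-reference -j 1 -w 6 -i packed-reference.tar.gz`.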


GenoMax 07-29-2015 04:03 AM

The commands used for the final steps, in a nutshell:

Code:

$ isaac-pack-reference -j 1 -r ./sorted-reference.xml -o ./packed-reference.tar.gz

$ isaac-unpack-reference -j 1 -w 6 -i ./packed-reference.tar.gz
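These steps, plus the `rm -rf Temp` that Come recommended before unpacking, can be wrapped in a small script. This is a sketch under assumptions: the function names and the DRY_RUN switch are hypothetical, only the isaac-* options are taken from the posts, and it assumes it is run from inside the index directory. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pack/unpack sequence from this thread.
# With DRY_RUN=1 (the default) commands are printed instead of executed.

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Re-split an existing index without re-sorting: pack it, remove the
# existing Temp directory, then unpack with the requested mask width.
repack_index() {
  local width="${1:-6}" jobs="${2:-1}"
  local tarball=./packed-reference.tar.gz
  run isaac-pack-reference -j "$jobs" -r ./sorted-reference.xml -o "$tarball" || return 1
  run rm -rf Temp || return 1
  run isaac-unpack-reference -j "$jobs" -w "$width" -i "$tarball"
}
```

`DRY_RUN=0 repack_index 6 1` would execute the steps for real. The `|| return 1` guards stop the sequence if packing fails, so Temp is not deleted after a failed pack.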

The end result was a set of 64 files, kmer-positions-32-00.dat through kmer-positions-32-63.dat, and one 2uniqueness.16bpb.gz file.
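With -w 6 the index came out as 64 files, which matches 2^6; the single-file index earlier in the thread fits -w 0 (2^0 = 1). Assuming that pattern holds for other widths (it is inferred from these two cases only, not from Isaac documentation), a hypothetical helper can list the expected file names and report any that are missing after an unpack. The "32" in the names is treated as a fixed default here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Isaac). ASSUMPTION: mask width w yields
# 2^w files, numbered from 0 and zero-padded to the digits of the largest
# index. Inferred only from -w 0 (kmer-positions-32-0.dat) and -w 6
# (kmer-positions-32-00.dat .. kmer-positions-32-63.dat) in this thread.

expected_kmer_files() {
  local width="$1" kmer="${2:-32}"
  local count=$((1 << width))
  local last=$((count - 1))
  local digits=${#last}
  local i=0
  while [ "$i" -lt "$count" ]; do
    printf "kmer-positions-%s-%0${digits}d.dat\n" "$kmer" "$i"
    i=$((i + 1))
  done
}

# Print each expected kmer file that is missing from an index directory.
missing_kmer_files() {
  local dir="$1" width="$2" kmer="${3:-32}"
  expected_kmer_files "$width" "$kmer" | while IFS= read -r f; do
    if [ ! -e "$dir/$f" ]; then
      echo "$f"
    fi
  done
}
```

An empty result from `missing_kmer_files . 6` means every expected file name is present; it does not check file contents.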

I have started a new isaac2 index creation job for the mm9 genome with the -w 6 option upfront.

sklages 07-29-2015 04:29 AM

Got the same just 5 minutes ago :-)

So the default for isaac-sort-reference should be changed or, alternatively, it should always be called with '--mask-width 6'.

GenoMax 08-03-2015 12:41 PM

I had started an isaac2 index creation job for the mm9 genome (with -w 6). It has been running for a week and is still writing files to the Temp directory.

craczy 08-20-2015 03:17 PM

In an attempt to make Isaac2 easier to use, we will make packed reference indexes for commonly used genomes available on BaseSpace. At the moment, the only two genomes available are hg19 and mm9. Feel free to request other genomes.

Also, the issues and recommendations around indexing genomes are summarized on the isaac2 GitHub wiki page "Reference Indexes".

The link to the already indexed genomes on BaseSpace might change in the future; please refer to the GitHub wiki page for updates.

Hopefully, this will help.

Come

sklages 08-29-2016 01:50 AM

Hello again ;-)

We are now on Isaac3. Cool .. ;-)

Creating indices for grch38 and grcm38 leaves some open questions:

I have run index creation as follows (mask-width 0 is the default, I just put it there as a "reminder" for future index creation runs):

Code:

isaac-sort-reference \
  --output-directory iSAACindex \
  --jobs 1 \
  --mask-width 0 \
  --genome-file genome.fa

That left me with exactly 3 files and a 1.1 TiB Temp folder:

Code:

-rw-rw-r-- 1 klages klages 618M 2016.08.26 01:05:08 2repeatness.8bpb.gz
-rw-rw-r-- 1 klages klages 678M 2016.08.25 22:19:13 2uniqueness.8bpb.gz
-rw-rw-r-- 1 klages klages 108K 2016.08.26 01:05:09 sorted-reference.xml
drwxrwxr-x 2 klages klages 8.0K 2016.08.26 01:05:09 Temp

make reported:
Code:

[all]    INFO: All done!
At least it is "packable" by isaac-pack-reference.

hg19-packed-reference.tar.gz from BaseSpace (BTW, it would be nice to have GRCh38/GRCm38 there too) contains:

Code:

-rwxr-x--- rpetrovski/aladdin 644685308 2014-11-19 21:38 2uniqueness.16bpb.gz
-rw-r--r-- rpetrovski/aladdin 386961748 2014-11-20 13:03 neighbors-1or2-16.1bpb
-rw-r--r-- rpetrovski/aladdin 386961748 2014-11-20 13:06 neighbors-1or2-32.1bpb
-rwxr-xr-- rpetrovski/aladdin 3157608038 2014-11-20 12:53 genome.fa
-rw-r--r-- rpetrovski/aladdin      48044 2014-11-20 12:54 sorted-reference.xml

* Is that a complete and valid index?
* Do I still need Temp for any task after index creation?
* What are the differences compared to isaac2 indices?

best,
Sven

fznajar 10-23-2018 08:14 AM

Dear all,
Can iSAAC run on macOS?


All times are GMT -8.