SEQanswers

Old 07-24-2015, 02:14 AM   #21
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by sklages View Post
Well, .. for now .. the server crashed overnight, just three hours ago ..
We now have to investigate what event caused this crash. Maybe it is just "Murphy's Law" .. we'll see.
Well, .. it was indeed Murphy's law :-)
We had a failure on a network interface, which sent at least one process into a frenzy and pushed the load beyond 1000.

So I'll restart indexing today.
Old 07-24-2015, 09:02 AM   #22
craczy
Junior Member
 
Location: San Diego, CA, USA

Join Date: Jan 2010
Posts: 8
Default

Quote:
Originally Posted by GenoMax View Post
@Semyon/Come: Can one of you confirm if the following files represent the correct isaac2 index for hg19 genome? My isaac-sort-reference job appeared to have finished (no errors) but these are the only files I see in the top level directory (Temp directory is still there with files within)
Code:
1.1G 2uniqueness.16bpb.gz
 47G kmer-positions-32-0.dat
 50K sorted-reference.xml
This looks correct, but surprising. Did you specify something like "-w 1" on the command line by any chance?

All the kmers are indexed in a single data file (kmer-positions-32-0.dat), which is not ideal because it prevents parallelisation when searching for mapping candidates.

You can use "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.
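A quick way to tell whether an existing index ended up in a single data file is to count the kmer-positions files. This is only a sketch: the function name `count_kmer_files` is made up for illustration, and it assumes the data files follow the `kmer-positions-*.dat` naming seen in the listing above.

```shell
# Count the kmer-positions-*.dat files in an index directory
# (defaults to the current directory). Hypothetical helper, not part of iSAAC.
count_kmer_files() (
    dir="${1:-.}"
    n=0
    for f in "$dir"/kmer-positions-*.dat; do
        # An unmatched glob stays literal, so only count paths that exist.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
)
```

Running `count_kmer_files /path/to/index` and getting 1 back would correspond to the single-file situation described in this post.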
Old 07-24-2015, 09:29 AM   #23
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by craczy View Post
This looks correct, but surprising. Did you specify something like "-w 1" on the command line by any chance?
Thanks for confirming that. I had only done this

Code:
$ isaac-sort-reference -g /path_to/HG19_UCSC/Sequence/WholeGenomeFasta/genome.fa -o .
Is there a better command-line for future reference?

Quote:
Originally Posted by craczy View Post
You can use "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.
I did the isaac-pack-reference thinking that it would "compress" the index but nothing appeared to change except the date stamps.

Update: I think I need to move the "Temp" directory out of the way (just realized that and trying it now) for "pack-reference" to work.
Old 07-26-2015, 10:13 PM   #24
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Well, I can confirm that.

It took ~64h on a 48-core "Opteron 6176 SE" machine (fast local storage, RAID) to build an hg19 index.

Code:
isaac-sort-reference --genome-file fa_hg19/genome.fa --jobs 1 --output-directory iSAAC2Index.32 --quiet
The result is:
Code:
938M 2015.07.27 06:21:35 2uniqueness.16bpb.gz
 42G 2015.07.27 06:54:45 kmer-positions-32-0.dat
 15K 2015.07.27 06:54:51 sorted-reference.xml
8.0K 2015.07.27 06:54:51 Temp
with 'Temp' being 1.1TiB (!) in size ... (btw, why don't you clean Temp automatically after successfully finishing a job?).
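Note that the 8.0K shown for Temp in the listing is the size of the directory entry itself, not of its contents; `du` is needed to see the real footprint. A minimal sketch for checking this, where the helper name `temp_size_kib` is made up for illustration and the only assumption is that the index directory contains a subdirectory called Temp:

```shell
# Print the total size of <index_dir>/Temp in KiB, or 0 if there is no Temp.
# Hypothetical helper, not part of iSAAC.
temp_size_kib() (
    index_dir="${1:-.}"
    if [ -d "$index_dir/Temp" ]; then
        # du -sk prints "<KiB><TAB><path>"; keep only the number.
        du -sk "$index_dir/Temp" | awk '{print $1}'
    else
        echo 0
    fi
)
```

For a human-readable figure, `du -sh iSAAC2Index.32/Temp` does the same job interactively.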
Old 07-27-2015, 03:56 AM   #25
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@come:

I tried the "isaac-unpack-reference" (relevant part of the command line below)

Code:
$ isaac-unpack-reference -j 8 -w 6 -i .
This resulted in the following error:

Code:
tar: .: Cannot read: Is a directory
tar: At beginning of tape, quitting now
tar: Error is not recoverable: exiting now
make: *** [Temp/sorted-reference.xml] Error 2
@sven: Can you see if it works for you?

BTW: "Temp" directory is required for the unpack-reference.
Old 07-27-2015, 04:45 AM   #26
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Just tried,
Code:
isaac-unpack-reference -j 1 -w 6 -i . --dry-run
This (basically) results in this error:
Code:
warning: failed to load external entity "Temp/sorted-reference.xml"
unable to parse Temp/sorted-reference.xml
warning: failed to load external entity "Temp/sorted-reference.xml"
unable to parse Temp/sorted-reference.xml
Without dry-run:
Code:
isaac-unpack-reference -j 1 -w 6 -i .
tar fails:
Code:
tar -C Temp --touch -xvf .
tar: .: Cannot read: Is a directory
tar: At beginning of tape, quitting now
tar: Error is not recoverable: exiting now
make: *** [Temp/sorted-reference.xml] Error 2
Even when I copy sorted-reference.xml to Temp, I get an error:

Code:
make[1]: Entering directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
make[1]: *** No rule to make target `Temp/genome.fa', needed by `/path/to/iSAACindexBuildDir/iSAAC2Index.32/genome.fa'.  Stop.
make[1]: Leaving directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
make: *** [all] Error 2
Old 07-27-2015, 10:02 AM   #27
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Quote:
Originally Posted by GenoMax View Post
BTW: "Temp" directory is required for the unpack-reference.
That's funny though .. under normal circumstances I'd remove this folder as it occupies quite a lot of disk space ..
Old 07-27-2015, 04:33 PM   #28
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@sven: A new thread has been created for posts related to isaac2 genome index creation.
Old 07-28-2015, 06:07 AM   #29
craczy
Junior Member
 
Location: San Diego, CA, USA

Join Date: Jan 2010
Posts: 8
Default

The input file should be 'sorted-reference.xml', not the current directory:

This should work:

Code:
isaac-unpack-reference -j 1 -w 6 -i sorted-reference.xml
Remember to remove the already existing Temp directory, if any

Come
Old 07-28-2015, 10:58 AM   #30
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by craczy View Post
The input file should be 'sorted-reference.xml', not the current directory:

This should work:

Code:
isaac-unpack-reference -j 1 -w 6 -i sorted-reference.xml
Remember to remove the already existing Temp directory, if any

Come
This is not working for me:

Code:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Read 4461 bytes from ./sorted-reference.xml
tar: Error exit delayed from previous errors
make: *** [Temp/sorted-reference.xml] Error 2
Old 07-28-2015, 12:18 PM   #31
craczy
Junior Member
 
Location: San Diego, CA, USA

Join Date: Jan 2010
Posts: 8
Default

Quote:
Originally Posted by GenoMax View Post
This is not working for me:

Code:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Read 4461 bytes from ./sorted-reference.xml
tar: Error exit delayed from previous errors
make: *** [Temp/sorted-reference.xml] Error 2
My mistake. Apologies. It is not the sorted-reference.xml but the tarball created by 'isaac-pack-reference':

Code:
rm -rf Temp
isaac-unpack-reference -j 1 -w 6 -i packed-reference.tar.gz
Old 07-29-2015, 03:03 AM   #32
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Commands used for the final steps in a nutshell.

Code:
$ isaac-pack-reference -j 1 -r ./sorted-reference.xml -o ./packed-reference.tar.gz

$ isaac-unpack-reference -j 1 -w 6 -i ./packed-reference.tar.gz
The end result was a set of 64 files

Quote:
kmer-positions-32-00.dat through kmer-positions-32-63.dat
And one

Code:
2uniqueness.16bpb.gz
file.
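The steps above can be wrapped in a small script. This is a sketch only: the function name `resplit_index` and the `RUN` switch are made up for illustration, the iSAAC commands and flags are exactly the ones quoted in this thread, and it assumes you are inside the index directory. Posts in this thread differ on when Temp has to be out of the way; the order here follows post #31 (remove Temp before unpacking). By default the commands are only printed.

```shell
# Re-split an existing index by packing and then unpacking it with a mask width.
# Usage: resplit_index [mask_width] [jobs]   (defaults: 6 and 1)
# Prints the commands only; set RUN=1 to actually execute them.
resplit_index() (
    mask_width="${1:-6}"
    jobs="${2:-1}"
    run() {
        echo "+ $*"
        if [ "${RUN:-0}" = "1" ]; then
            "$@"
        fi
    }
    # Stop at the first failing step so Temp is not removed after a failed pack.
    run isaac-pack-reference -j "$jobs" -r ./sorted-reference.xml -o ./packed-reference.tar.gz &&
    run rm -rf Temp &&
    run isaac-unpack-reference -j "$jobs" -w "$mask_width" -i ./packed-reference.tar.gz
)
```

Calling `resplit_index 6 1` prints the three commands for review; `RUN=1 resplit_index 6 1` executes them, including the `rm -rf Temp`.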

I have started a new isaac2 index creation job for the mm9 genome, this time with the -w 6 option from the start.

Last edited by GenoMax; 07-29-2015 at 03:11 AM.
Old 07-29-2015, 03:29 AM   #33
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Got the same just 5 minutes ago :-)

So the default for isaac-sort-reference should be changed or, alternatively, it should always be called with '--mask-width 6'.
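For reference, the 64 files reported above for a mask width of 6 match 2^6. Assuming the number of kmer-positions files is always 2^mask-width (an inference from this thread, not something confirmed by the developers here), the expected count can be computed like this; the function name is made up for illustration:

```shell
# Expected number of kmer-positions files for a given mask width,
# ASSUMING the count is 2^mask_width (matches 6 -> 64 from this thread).
expected_kmer_files() {
    echo $((1 << $1))
}
```

With that assumption, `expected_kmer_files 6` gives 64.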
Old 08-03-2015, 11:41 AM   #34
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

I started an isaac2 index creation job for the mm9 genome (with -w 6). It has been running for a week and is still creating files in the Temp directory.
Old 08-20-2015, 02:17 PM   #35
craczy
Junior Member
 
Location: San Diego, CA, USA

Join Date: Jan 2010
Posts: 8
Default

In an attempt to make Isaac2 easier to use, we will make packed reference indexes for commonly used genomes available on BaseSpace. At the moment, the only two genomes available are hg19 and mm9. Feel free to request other genomes.

Also, the issues and recommendations around indexing genomes are summarized on the Isaac2 GitHub wiki page "Reference Indexes".

The link to the pre-indexed genomes on BaseSpace might change in the future; please refer to the wiki page on GitHub for updates.

Hopefully, this will help.

Come
Old 08-29-2016, 12:50 AM   #36
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

Hello again ;-)

We are now using Isaac3. Cool .. ;-)

Creating indices for grch38 and grcm38 leaves some open questions:

I ran index creation as follows (mask-width 0 is the default; I just put it there as a "reminder" for future index creation runs):

Code:
isaac-sort-reference \
  --output-directory iSAACindex \
  --jobs 1 \
  --mask-width 0 \
  --genome-file genome.fa
That left me with exactly 3 files and a 1.1TiB Temp folder:

Code:
-rw-rw-r-- 1 klages klages 618M 2016.08.26 01:05:08 2repeatness.8bpb.gz
-rw-rw-r-- 1 klages klages 678M 2016.08.25 22:19:13 2uniqueness.8bpb.gz
-rw-rw-r-- 1 klages klages 108K 2016.08.26 01:05:09 sorted-reference.xml
drwxrwxr-x 2 klages klages 8.0K 2016.08.26 01:05:09 Temp
make reported
Code:
[all]    INFO: All done!
At least it is "packable" by isaac-pack-reference.

hg19-packed-reference.tar.gz from BaseSpace (btw, it would be nice to have grch38/grcm38 there as well) shows:

Code:
-rwxr-x--- rpetrovski/aladdin 644685308 2014-11-19 21:38 2uniqueness.16bpb.gz
-rw-r--r-- rpetrovski/aladdin 386961748 2014-11-20 13:03 neighbors-1or2-16.1bpb
-rw-r--r-- rpetrovski/aladdin 386961748 2014-11-20 13:06 neighbors-1or2-32.1bpb
-rwxr-xr-- rpetrovski/aladdin 3157608038 2014-11-20 12:53 genome.fa
-rw-r--r-- rpetrovski/aladdin      48044 2014-11-20 12:54 sorted-reference.xml
* Is that a complete and valid index?
* Do I still need Temp for any task after index creation?
* What are the differences compared to isaac2 indices?

best,
Sven
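The BaseSpace listing above looks like `tar -tv` output. To compare a locally packed reference against it without extracting anything, something along these lines may help. It is a sketch: the function name `packed_reference_has` is made up, and it assumes the packed reference is a gzip-compressed tarball, as the .tar.gz name suggests.

```shell
# Succeed if <tarball> contains a member named exactly <member>.
# Hypothetical helper, not part of iSAAC.
packed_reference_has() (
    tarball="$1"
    member="$2"
    # tar -tzf lists member names only; strip a leading "./" before matching.
    tar -tzf "$tarball" | sed 's|^\./||' | grep -Fxq -- "$member"
)
```

For example, `packed_reference_has packed-reference.tar.gz sorted-reference.xml && echo present` reports whether the XML descriptor made it into the tarball.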
Old 10-23-2018, 07:14 AM   #37
fznajar
Member
 
Location: Oklahoma

Join Date: Jan 2012
Posts: 34
Default

Dear all,
Can iSAAC run on macOS?
All times are GMT -8.