Originally posted by sklages
Unfortunately it has. The index contains extra information about the reference and with isaac2 that information has changed. Specifically, in the isaac2 index we are keeping track for each position in the reference genome if there are similar sequences elsewhere in the reference.
-
I did not specify a value for seed-length, so the process is creating all possible combinations [--annotation-seed-lengths arg (=16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80)]. It looks like the end may be in sight today for the process I am running, since the files for 80 are being made now.
@sven: Expect a multi-day turnaround.
-
@Semyon/Come: Can one of you confirm whether the following files represent the correct isaac2 index for the hg19 genome? My isaac-sort-reference job appears to have finished (no errors), but these are the only files I see in the top-level directory (the Temp directory is still there, with files in it).
Code:
    1.1G 2uniqueness.16bpb.gz
    47G  kmer-positions-32-0.dat
    50K  sorted-reference.xml
-
Originally posted by sklages:
> Well, .. for now .. the server crashed overnight, just three hours ago ..
> We now have to investigate what event caused this crash. Maybe it is just "Murphy's Law" .. we'll see.
Well, .. it was indeed Murphy's law :-)
We had a failure on a network interface .. that sent at least one process into a frenzy and pushed the load beyond 1000...
So I'll restart indexing today.
-
Originally posted by GenoMax:
> @Semyon/Come: Can one of you confirm if the following files represent the correct isaac2 index for hg19 genome? My isaac-sort-reference job appeared to have finished (no errors) but these are the only files I see in the top level directory (Temp directory is still there with files within)
> Code:
>     1.1G 2uniqueness.16bpb.gz
>     47G  kmer-positions-32-0.dat
>     50K  sorted-reference.xml
This looks correct, but surprising. Did you specify something like "-w 1" on the command line by any chance?
All the k-mers are indexed in one single data file (kmer-positions-32-0.dat), which is not ideal, as it prevents parallelisation when searching for mapping candidates.
You can use "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.
-
Originally posted by craczy:
> This looks correct, but surprising. Did you specify something like "-w 1" on the command line by any chance?
Thanks for confirming that. I had only done this:
Code:
    $ isaac-sort-reference -g /path_to/HG19_UCSC/Sequence/WholeGenomeFasta/genome.fa -o .
Is there a better command line for future reference?
Originally posted by craczy:
> You can use the "isaac-pack-reference" and then "isaac-unpack-reference -w 6" to split the index into smaller files without having to redo the reference sorting.
I ran isaac-pack-reference thinking that it would "compress" the index, but nothing appeared to change except the date stamps.
Update: I think I need to move the "Temp" directory out of the way for "pack-reference" to work (just realized that and am trying it now).
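Moving "Temp" out of the way can be done without deleting it, which matters here because a later post reports that "Temp" is needed by unpack-reference. A minimal sketch, assuming a POSIX shell; the helper name `move_aside` and the `.aside.<timestamp>` suffix are illustrative choices, not part of iSAAC:

```shell
#!/bin/sh
# Rename a directory out of the way instead of deleting it.
# (Hypothetical helper for illustration; not part of iSAAC.)
move_aside() {
    dir=$1
    # Nothing to do if the directory does not exist.
    [ -d "$dir" ] || return 0
    # Timestamped name so repeated runs do not collide (at one-second resolution).
    target="$dir.aside.$(date +%Y%m%d-%H%M%S)"
    # Print the new location on success so the caller can move it back later.
    mv -- "$dir" "$target" && printf '%s\n' "$target"
}

# Example (run from the index directory):
#   saved=$(move_aside Temp)
```

A rename within the same filesystem does not copy any data, so it stays fast even for a very large "Temp".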
-
Well, I can confirm that.
It took ~64h on a 48-core "Opteron 6176 SE" (fast local storage, RAID) to build an hg19 index:
Code:
    isaac-sort-reference --genome-file fa_hg19/genome.fa --jobs 1 --output-directory iSAAC2Index.32 --quiet
The result is:
Code:
    938M 2015.07.27 06:21:35 2uniqueness.16bpb.gz
    42G  2015.07.27 06:54:45 kmer-positions-32-0.dat
    15K  2015.07.27 06:54:51 sorted-reference.xml
    8.0K 2015.07.27 06:54:51 Temp
with 'Temp' being 1.1TiB (!) in size ... (btw, why don't you clean Temp automatically after a job finishes successfully?).
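The 8.0K shown for Temp in the listing is most likely the size of the directory entry itself rather than of its contents, which is how a 1.1TiB directory can go unnoticed in a plain listing. A minimal sketch for checking the real usage with `du`; the helper name `dir_usage_kib` and the example path are assumptions for illustration, not part of iSAAC:

```shell
#!/bin/sh
# Report the disk usage of a directory tree in KiB.
# (Hypothetical helper for illustration; not part of iSAAC.)
dir_usage_kib() {
    # -s: one summary line for the whole tree, -k: 1024-byte units
    du -sk "$1" | awk '{print $1}'
}

# Example (path is an assumption):
#   dir_usage_kib iSAAC2Index.32/Temp
#   du -sh iSAAC2Index.32/Temp    # human-readable variant, where -h is supported
```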
-
@come:
I tried "isaac-unpack-reference" (relevant part of the command line below):
Code:
    $ isaac-unpack-reference -j 8 -w 6 -i .
It resulted in this error:
Code:
    tar: .: Cannot read: Is a directory
    tar: At beginning of tape, quitting now
    tar: Error is not recoverable: exiting now
    make: *** [Temp/sorted-reference.xml] Error 2
@sven: Can you see if it works for you?
BTW: the "Temp" directory is required for unpack-reference.
-
Just tried,
Code:
    isaac-unpack-reference -j 1 -w 6 -i . --dry-run
This (basically) results in this error:
Code:
    warning: failed to load external entity "Temp/sorted-reference.xml"
    unable to parse Temp/sorted-reference.xml
    warning: failed to load external entity "Temp/sorted-reference.xml"
    unable to parse Temp/sorted-reference.xml
Without dry-run:
Code:
    isaac-unpack-reference -j 1 -w 6 -i .
tar fails:
Code:
    tar -C Temp --touch -xvf .
    tar: .: Cannot read: Is a directory
    tar: At beginning of tape, quitting now
    tar: Error is not recoverable: exiting now
    make: *** [Temp/sorted-reference.xml] Error 2
Even when I copy sorted-reference.xml to Temp, I get an error:
Code:
    make[1]: Entering directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
    make[1]: *** No rule to make target `Temp/genome.fa', needed by `/path/to/iSAACindexBuildDir/iSAAC2Index.32/genome.fa'. Stop.
    make[1]: Leaving directory `/path/to/iSAACindexBuildDir/iSAAC2Index.32'
    make: *** [all] Error 2
-
Originally posted by craczy:
> The input file should be the 'sorted-reference.xml', not the current directory.
> This should work:
> Code:
>     isaac-unpack-reference -j 1 -w 6 -i sorted-reference.xml
> Remember to remove the already existing Temp directory, if any.
> Come
This is not working for me:
Code:
    tar: This does not look like a tar archive
    tar: Skipping to next header
    tar: Read 4461 bytes from ./sorted-reference.xml
    tar: Error exit delayed from previous errors
    make: *** [Temp/sorted-reference.xml] Error 2
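The make output earlier in the thread shows the `-i` argument being handed straight to `tar -xvf`, which would explain why both a directory and an XML file are rejected. What the correct input file is remains unconfirmed in this thread. Under that reading, a quick check of whether a candidate input is something tar can read at all might look like the sketch below; the helper name `is_tar_archive` is an assumption for illustration, not part of iSAAC:

```shell
#!/bin/sh
# Return 0 if the argument is a regular file whose contents tar can list.
# (Hypothetical helper for illustration; not part of iSAAC.)
is_tar_archive() {
    # Reject directories and missing paths first, then ask tar to list
    # the archive without extracting anything.
    [ -f "$1" ] && tar -tf "$1" >/dev/null 2>&1
}

# Example:
#   if is_tar_archive sorted-reference.xml; then
#       echo "tar can read this file"
#   else
#       echo "not a tar archive"
#   fi
```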