#1 |
Junior Member
Location: Netherlands Join Date: Jul 2010
Posts: 5
Hi all,
I'm trying to get lastz to compare some sequences to the human reference genome. This is the command I'm using:

    lastz /path/to/data/fastafile.fasta[multiple] /path/to/reference/reference_human.fa --format=difference --identity=90 --coverage=50 --output=/path/to/data/lastzout.fa

This gives the following error:

    FAILURE: call to realloc failed to allocate 135087680 bytes, for add_segment

Does anyone know what this means and what I can do about it?
#2 |
Member
Location: Switzerland Join Date: May 2010
Posts: 19
Hi Esther,
I am getting the identical error message when trying to align a 454 run with ~8000 reads to an 80 kb reference sequence, using:

    ./lastz brca1.fasta 454run.fna --format=sam --step=10 --seed=match12 --notransition --exact=20 --noytrim --match=1,5 --ambiguous=n --coverage=90 --identity=95

Does anyone out there know what we are doing wrong? I would like to replicate the lastz parameters used on the public Galaxy server. Does anyone know where I can find which parameters they use?

Cheers,
David
#3 |
Member
Location: Europe Join Date: May 2013
Posts: 53
I am using lastz for piler and my command is
$lastz unigenes.fa[multiple][unmask] --format=maf > unilastz.maf FAILURE: bad fasta character in unigenes.fa, >Contig10: Y my contig10 is >Contig10 TGAAAACTAATATATATATATGTAGCATACCTGGATTCAACTTTACTGGTGGTTCAACAT GAATCGTTCAAACAATTCCGCCTGTGCTTTTAGTTTCTCCAATTCGCCATTAGTCTTCTT ATTTTCCTCCTTTTGAGCCTCCAACTCCATCTTGAGGGTCTGAATTTCTTCTTTAGCACT CTCAAGCTGAGTTTGAGTCTCATTTATCCCAAAATAGCTATAGCGACGGGGCACCTTCAC CCCCCAACCCATACCTTTTTGATACCCTGATCTCACACCGAGAGTCTCAACCATAATCTC CTCTTCAGTCTTCGAATTTTCTCCATCCATCATAGACGATTCCTCCATTTCAGCCATTTT CTCCTGAAATTTATTATTAATATATGTGAAATTATGGTAAAAGAATAGTTCCAGAATTAT GGTAAAAGAATTGTTGAGGTGTTGAAACACAAGAGGTTTGGTGCAAAATTCCCTATTTTG TGTAGAGCAACTTACATATGTCATTTCAGCCTCTTCGTTCACGAATTTATTCACGAATTG ACCCTTCCTGCGAACATGATTATCCTTCCACAACAGAATCCTACTACGATTAACCTATAA GGAAAACATATACAATTACTAAGTTTCTTACCTCATCACAATATCCAGTATAGATATCTG AATTCAAAAATAGAGCACAAATATAGAAATATTAGATTTCAAAAATTGAAAGTACATACC ATGCTCCGGTAGCGAACAAGGTTTCTGGTCCTTCCTGATGTGTGGGGAAAAAGCATGCAT TTTCTATTTGCTTTATTGATGGCAGACCTCATCTACATCATAAACAATGAACAACAAATC AATTTTTTTTATTTAATAAGAACCTTTGGCCAAATATAATCCCAGCATCCATACAATTAT CTTTTGTCGCAGAAAAAAAGGAGAAACAGAGAGAGTGTTAGAATAAAAAGGCATACCTGA AAACGATCAGACTCAACGTGCTCACATAGTGCATGCCAACTCTCAACTGTGAGTTCTCTC CGCGGAATTGAATTTCTAGGTTGAAGCCCCTTGTTCTTCATATTTGTGTGGATCTTGCCC AACCTATATCTCCAATTTCTGTAATTTGTCTGGAGTGCTTGATCTATTATAGTCTGGATT CGCCAACTTGACAGGTCTTCAATCTCGAACCATTGCTAGTAATCAAAATATCTCTATCAT TACTTGTATATATAAAATAAAAAGAAAAACACTTGGCTTATTTGTTCGAGGAGAATAAAA AACTCACAAACTAAGGTGATTAAAAAGTAAAAATTACCAATAAATTTTCTAGACAAGCTT GCTTATTTGTACTAGAAACTTCCTTCCACTTTGACACGCCTTCAATATCCGCATGTGCTC TTACAGTAACCCCAATAGCATCAGCAAACTCTCCGGTATATTGTGTTGGTATTGTCTAAC AAGGATTAAAAACTAATTTCATCTTCCCATTCTGCGAAACAACATAATAATTAGAGAGTT AAATTCATAAAACAAGATCTCCAATTTTCACAGATTAGAATATTGTACCCGTTTCACCAT CTTATCGAGGTGCAGGCCTCTACCCGGACCGTGAGTATTCTGTTTAGGCTTTGATCCTAA ATAAAAAGTTATTGAAATAGTGTTAATGTTCCCTAAAGATACTAAACTAAATAAATATGT TCCGCAAAGAATAAAGAATAGAATTAAAAAAAAAAAATGAAACCAAACAACATACCAATG GTTGTTCCTGCAGATCCTTCTAGAGTTGCTGAGACAGGAATAGATGGAGGCTGACGCACA 
CTTAAATTAGGAGATGAAGCAGTGGACCTTAGAGTATTTCCAGCAGTGGTGGATGTGGAA GTTGACATATTAGATGCTACATATTTGGATTTTGAAGCAGCTGGAATTGGTTGTTTATTA ACCACTGAAGTGGAAGGCCTGGAAGTTGACGGCTGGAATTTCAAAGTTGGTGGAATTGAT GCAACATTTCCAGACGTAACATGTACCTGGGGCTGTTTATTAACCACTGTAGTAGAAGGT CTGGAAGTTGACGGTTGGAATTTCGAAGTTGGTGGAATTGATGCAACATTTCTAGTCGTA ACATGTACTTGGGGTCGTACAAACCTCTTGAACTTTGGTCCTCCTGACATCTGAAAAAAT AAAAGACAAAAAGATATCAGTATGCATGATATAAAGATGAAGAGTTAAAGTATATGCTAC ATATTCAGTAAAGTGTATTACATTGAATAGGCCGAAAAACAGTTAAAAAAGAAAAACCAA AGCTATAACAGTTCTTGCATATTCTCTTATAACTAATAAAGAATGAGTTAATAAAAATAA ATCCTCCTCTTCTTCTTCTTCTTCTTGTTCATTTCTTCTTCTTCTTCTTCCTCCTCCTCC TCTTCTTCTTCCTCCTCTTCTCCCTCCTCCTCATATGATTGTTCAGTATTTTGATTTCTG GACTTTCTAATGCTTTCAAAGATCATCCTATCAAGCTCTTCAGGTTCTAAATCGTCCCTT GCCAATTGCCGTCGACCATTCTGCTCACTTGCTTCTTCAATTGGATCAACACGAGTAGAT TCATATTGCTGAACAACCTCTGCCACAAATTCTTGTTCATTAATGGTTGTTTTGTCCACA ACATCATTTACAGCTGATTCACAAACATCCCACAAATGCCTATGTGAAACTCTCTGCACA ACCTTCCAATTTTTACCCAGTTTTGTGTCATTCACATAAAACACTTGTTGTGCCTGGACA GGTAATATAAATGGTTGCTCCTTGTACCACCTAGAGCTAACATCAACACTAATGAAATGT TTATCCTCTTGGATAAATTTATTACTTCCAGAGTTGAACCACTCACAATCAAACAAAACC ACTTTTCTACCACCAATATATGTTAACTCCACAACCTTTCTCAAGAAACCGAAAAACTCA ACCGGCTTCCTCTTATGAAATCCTCCAACTACAAGGCCACTATTTTGAGTTTTCGTCGTC TGTCACAATCCCTTGTATGAAACCGGACCTGATTCACCATGCATCCATTATAAAAATCTG AGCTCAGCCTCGAAGGACCATTTGCTAAAGCCCACAGCTCAGTTGTAATTTTGTCTGCTT TCTGAAAATGCAACAGACCCATCTACAAGATAATATAATAGTTAGAACAAAGTATTCTGC TAAAATCATTTAAGTTCAGAGTATAAGTTATCAACTAAAGGAAATAAACTAACATGTTTT TAAACCACTGTGGGAAGGTGTTTTTCTGCCTTGGGTCAAGTTCATCCATGTTACCCACTT CTTCTACGGTAATTCCTTGCAATTCAGCTCTATGCATTCTGAAAAGGGCATAACCATTAG ATTACAAATAAGTTTATGGTTGTCAAGTGTCAACAAAACACTATGTATTACCACAGTGAA CGGGTACATCATACTGCCTCAATCCAAACATGGAATAACATCATAGTTACTTAAGTTTTA CTATCAACAGTAACTATCTAAAATCAGCAGCAGGTATATGCCTAAAGATACTGGAACAAA TCAGTAACTATCTAAAATCAGCAGCAGGTATATGCCCAAAGATACTGGTAGTCATAGTTC AGACAAAACTCGTAGACATAAGATAACAATAGAGAGAGATAAGTATTTCTTACTTTAGAT ATGGCTCAACTTCATCACAATTGCACATTATGTACCAATGTGCCTCATCTAATTCTCCTC 
TAGTCAAAGAACACTTGGGGCTAACTTCCCCAAATGGACGAGTTCTCATTGAAAATACAG AAATTCCTTCCTTTTGTTGACCATAGACACCATCAAAATTCCGTGGAAGGCAATTAAATT TAGTTTCCATGTCCATAGGCAAATGCATAGAAAAAAATGTGATGTTCTCATGCACCAAGT ATTGTTTAGCGATGGAAGCTTCAGGATGATTTCTATTTCGGATAAAGCCTGTGAGAGATC CCAGAAACCTAACAATAAGAAGGAAAGTAAACACCGTAGAATCAGAATCATGATACAGTT AAAATCAGAATCATTGCATCATATATATCTCAAAAATTTCACCTGGCAGAAGCCGCATAA GATGTTCCTTAGACACAATATATATCTCAAAAATTAAGGCTTGAAATCAAGTAAAAATGC AAAATAAATCAAGGCCAAATAAAATGTTACTCTGATTTCATACCACAAACAACACAACTT GTATGAGTACAGATCAGCAGCCAACTTAATAGACCAGAACACAAACTTAATAGACCAGAA CACAGAATCACAAACTTATTCGACAATCACACGAAGGGTCACAATAAAAACTAACCTTTC AAAGGGATACATCCAGCGAAATTGCACGGGTCCTGCAATCTTAGCCTCTTGGGCCAAGTG GACAGATAAATGTACCATAATTGTGAAAAAGTAAGGTGGAAATATCTTCTCCAACTTACA TAAGATAATTGGTATATTTTCATCCAACCGATTCACATCTTCGTCATACAAGGTCTTGGA ACATAATTGCTTAAAGTAATCTCCTAATTCACATAAAGCGTCCTTCACTTCCTTGGTCAA GTAACCACGAATTCCAGCAGGAATTAATCTCTGGAGTAGAATATGACAGTCATGGGACTT CAGTCGTGATAATTTATCCTTTTTCACACGTGCTGAGATGTTAGAAGCAAACCCATCAGG AAACTTAACTAACTTCAGAAACTCCAAGACTTCTCCTCTTTCACTTGGGGATAAGGTAAA ACATGCAGGTGGCCTGAATATTTTATTGCCATTATCTTGCAGGTGCAGCTCCTTCCTAAT ACCCATATCGCGTAAATCTTCTCTTGCAACATTAGAGTCTTTTGTTTTCCCTTTAATATC AAATAATGTACCAAGGACATTATCACATATATTCTTCTCAACGTGCATCGGATCGAGATT GTGCCTTATCTTCAACATGTGCCAATATGGCAGCTGGTAGAAACCACTTATTTTATCCCA ATTTACTTGATCCGAATCACGCTTTCTTTTACGATTTCCAGCTGCAATGAACACATCTTG CAATGAATTAAGCCAATACAATATTTCTTCACCTGAAAGTTGTTTAGGTTTCTCTCTATG CTCCTCTTTACCATTGAAAGTTAAACTTGTTCCTGCGGAATGGGTGTCTTTCGGGTAGAA AACGACGATGTCCCATGTAACATATCTTTGTTCGTAATTGTGTAGAATCTGTCTCATCAA GACAGCATGGGCATGCATAATAACCTGATGCTTGCCACCCTGACAAAAGACTATAACCAG GCAAGTCGCTTATGGTCCACAACAACACAACACGCATTAGAAACTTCTCTTTTGTGAGAG CATCATAAGTTTCGATCCCAACATCCCATAACTCCTTCAAATCTTCGATTAAGGGCTGCA AATACACATCAATGTTTTTCCCAGGATAATTTGGGCCCGGTATAAGCAGCGACAACATAA GAAATGGTTCACGCATACAATCATAAGAGGGCAGATTATAAACAACAAGAATCACGGGCC AAATGGTATGCGAATTCCCGATGTCTGACCATGGAGTGAATCCATCACATGCTAAACCCA ACCTAACATTCCGTGCATCCTTAGAAAACTCAGGAAATAGACTATCAAGGTGTTTCCAAG 
CCAATGAATCGGCTGGATGTCTTGAGACTCCATCCTCAACTTTGCGTTTATCTTTGTGCC AACGCATTTTTGCAGCCGTTTTGGTGAATGCAAAGAGTCTTTGCAACCGGGGCTTCAAGG GGAAGTATCTCAATTTCTTATGAGCAACTCTCTTTATCTTACTACCAAGCGGCTGCCACC TAGATTCTTTACAAACTGGACACTCGTTCATATTTTCATTCTCTTTCCGGAATAATACAC AACCGTTCTTACATGCATCAATCTCTTCATACCCTAGTCCAAGTTCACGCAGCTCCTTCT CAGCACCATATAATGACGATGGAAGAGTTTCACCTTCGGGTAAGATGCCTTTTATAAAGT CCAGCAGCATACTGAAAGACTTGTCCGTCCATTTATCCTTAACTTTCAGATGTAGCAGTT TTAGGGTAGCTAAAAGTGATGAAGTCTTACAGCCCGGATATAATTCATTTCTGGCATCAC CCACCAACTTCTGGAATCTTTCTTCTTCTTCTTCTTCATAGTCGCTGTCCACATGTTGAC CTTCAACCAATTCATCCAAGTACCCATTGATCGAACACATTAAATTCGACATCATCATTA CCTTGATCATTTTCCTCATAATCATCAACACCCTCTGTTTCTGCTAGGAACTCATTTATC TCGTCCTCGGATGGTAAACGAACATCATTATCAGTATCCTCATCATTTATATCAAGCTCT TTCTCTCCATGATTGTTCCAAAGTACATATGTACGAGAAAATCCATTAACGTGTAAATCC TTTTCAACCTCATCTATGGGCTTGTAGAATGAGTTGTTGCAAACCACACATGGACATTTA ATCATGTTATCACTCCCCATCACATTATCAGCAGCAAACATAAGAAAACATTGCACACCA TCCATATAAGAATTCGAAAACTTATTATGCTCTCTAATCCAACTTTTGTCCATTATCATT ACGTTGTACCTAAGTAATAAATAGCTCAGGAAAAATATAACTCTGGCAAGTAGCTCAGGC CTCAAATTAATAAATACCATATAAACATACACAAGTCAAATAATATCTACATAAGTAGAG GAACAATCTGAAGCCAAATAATAAACATAGATACAAATATTCATCTAGAACTTATCAATA TACACATACAGAAACGTGCCTAAATCATAAATTTTAGGAGAATTGAACTACATCAAAGTT TGAAGTACATCATACCTTCTCTAACAAAGGACACAGCACAAACTTCGGATTTGGATAATG ATTTGATCAAAGCTTGCTCCTCCAGATTCAAACGCAGCCACACTTCACAGTTGCTAGCCG ATTCAAACGCGCCACCAGATTCAAACGCGCCACCAGATTGAAACACGATCCATCAGACGC CTATCATTAACCTTCCTTTCCTGAATCTCCGATTGATAGTGTTGTACTGTTGTGTCGTGT GATTAGATTTAGATCGATCGAAATTAGGGTTTTAGTAAAAGGGATTTTTGTTTGTGACTG TGGGTTTCTGTCTAAGTAACGAAGAAGATGAAGTTATCTGATTCCCTTTTTAAAACTTAT CCTTAATTGCAAAAAAGCCCAAAAGTGTTTCTTATTAACAAATATGCCGCCGAATTTTCT TTCTTTCACAATTTTCAAATTTTATCATCTAAGAGCCTATTTTTCTATGATAATTTATTA TATTTGTCATTCTAAAATATTTTTATTTGATAATTTAGTCTAAAAATGTCATCGTAAAGT TTAATTGTGGTGACAAGTTTTAAGGTGCCACCTAATAGTTACTTTTATTTGATAATTTAY CCTAATGTCATTTAAAACTAGTCATCTAACATCCTTTT can anyone tell me why it is showing error and where it is wrong..?? Thanks.... |
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
As the error says, you have the character "Y" in the second-to-last line of the contig you posted above.
You may have to specify the option "--ambiguous=iupac" for lastz. That comes with a caveat, as explained in the README, so be sure to check the explanation for that specific option here: http://www.bx.psu.edu/~rsharris/last...z-1.02.23.html
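To find characters like this before starting a run, a quick scan of the FASTA file can help. In the IUPAC nucleotide code, Y stands for a pyrimidine (C or T). The following is a minimal sketch using standard awk. It is not part of lastz, the function name `fasta_odd_chars` is made up for illustration, and it assumes that anything other than A, C, G, T, or N (in either case) is worth reporting.

```shell
# Report every character in the sequence lines of a FASTA file that is not
# A, C, G, T, or N (upper or lower case). Output is "<sequence name><TAB><char>",
# one line per occurrence. The function name is hypothetical, not a lastz tool.
fasta_odd_chars() {
    awk '
        /^>/ { name = substr($1, 2); next }      # remember the current sequence name
        {
            line = $0
            gsub(/[ACGTNacgtn\r]/, "", line)     # drop expected bases and stray CRs
            for (i = 1; i <= length(line); i++)  # whatever is left is unexpected
                print name "\t" substr(line, i, 1)
        }
    ' "$1"
}
```

Running `fasta_odd_chars unigenes.fa` would list each such character next to the name of the contig containing it, which should include the Y in Contig10.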
#5 |
Member
Location: Europe Join Date: May 2013
Posts: 53
Thank you.
#6 |
Member
Location: Europe Join Date: May 2013
Posts: 53
Hi,
I am running lastz with the command below, but it has been running for the last 4 days and is at around 73.4 GB now. I don't understand why it is still running, or whether I did something wrong. Can anybody guide me?

    lastz unigenes.fa[multiple] unigenes.fa --notrivial --format=maf --ambiguous=iupac > unilastz.maf

Thanks.
#7 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,089
Are you searching the "unigenes.fa" file against itself? If so, how big is that file?
As indicated in the help page for lastz, comparing entire chromosomes takes only a few hours, so 4 days seems long. At this point you are probably too far into the computation to want to cancel it and investigate other options such as --notransition or --step, which are supposed to lower sensitivity and run time. If you are using a cluster, you can always start another job with these options and see whether the second job finishes before the first one.
#8 |
Member
Location: Europe Join Date: May 2013
Posts: 53
Thank you.
Yes, I am searching the "unigenes.fa" file against itself, and the file size is 334 MB. I am running lastz on my own system, which has 16 GB of RAM and 8 CPUs.
#9 | |
Member
Location: USA Join Date: Mar 2014
Posts: 13
Sorry for this late reply. I am new to seqanswers today. Are you still having this problem? If so, please email me so I can figure out why this is happening (you can find my email address in the lastz readme).

That error report is a failsafe and probably means that memory fragmentation is happening. I've never actually seen that error manifest in my own use. My best guess as to the cause is very high repeat content. But that's just a guess.

Bob H (lastz author)
#10 | |
Member
Location: USA Join Date: Mar 2014
Posts: 13
I'm very surprised this could be happening with such a small problem. Could you email me directly (my address is in the lastz readme)? I'd like to run the same sequences here so I can figure out how that memory allocation condition is being triggered.

Bob H (lastz author)
#11 | |
Member
Location: USA Join Date: Mar 2014
Posts: 13
My best guess when this happens is that the input sequences contain a lot of unmasked repetitive sequence. The simplest (probable) solution is to split the target into several runs, for example, a separate run for each of some subset of the target sequences. The other possibility would be to soft-mask repeats in the sequences (assuming they aren't already masked), but this can be time consuming as well.

In detail: internally, lastz processes one query at a time. It identifies HSPs (high-scoring ungapped alignments) between the query and all the target sequences, collects these into a table, and then proceeds to the gapped alignment stage. There is one segment in the table for each HSP. The amount of memory allocated for the table grows as more segments are needed. If allocation fails, a message is generated indicating a memory allocation failure in add_segment.

Even if such a large memory request could be satisfied, this wouldn't really solve the problem. Most likely the program wouldn't finish in a timely fashion.

The cause of such a large number of HSPs is usually something like an unmasked microsatellite: a long sequence with a short repeat period, e.g. AGTAGTAGTAGTAGT. These are problematic because, in the HSP-finding stage, every shift by a multiple of the period looks as good as any other. If your lastz parameters allow for high divergence, then highly diverged microsatellites will have a similar effect.

If you are running several queries, it is often just a few queries that contain the problematic element. You can find out which query is being processed at the point of failure by adding --progress to the command line.

Bob H (lastz author)
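One way to follow the suggestion of splitting the target into several runs is to cut the multi-sequence FASTA file into smaller files first. This is only a sketch using standard awk. The function name `split_fasta`, its arguments, the chunk size, and the output naming scheme are all assumptions made for illustration and are not something lastz provides.

```shell
# Split a multi-FASTA file into chunks of at most <per> sequences each.
# Usage: split_fasta <input.fa> <per> <prefix>
# Writes <prefix>.1.fa, <prefix>.2.fa, ... (hypothetical naming scheme).
split_fasta() {
    awk -v per="$2" -v prefix="$3" '
        /^>/ {
            if (n % per == 0) {                  # time to start a new chunk
                if (out != "") close(out)
                out = prefix "." (int(n / per) + 1) ".fa"
            }
            n++                                  # count sequences seen so far
        }
        { if (out != "") print > out }           # copy header and sequence lines
    ' "$1"
}

# Example (not run here; needs lastz and real data). The file names and the
# chunk size of 1000 are placeholders; the options are the ones from the
# first post plus --progress:
#   split_fasta fastafile.fasta 1000 chunk
#   for f in chunk.*.fa; do
#       lastz "${f}[multiple]" reference_human.fa --format=difference \
#           --identity=90 --coverage=50 --progress --output="${f}.out"
#   done
```

Smaller chunks keep the per-run HSP table smaller, and a failing chunk narrows down where the problematic repeat is.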
#12 |
Junior Member
Location: indianapolis Join Date: Jul 2015
Posts: 2
Hi all,
I am a graduate student in bioinformatics. I am working with lastz, trying to compare two DNA sequence files of sparrow genomes. My command is:

    lastz GCF_000385455.1_Zonotrichia_albicollis-1.0.1_genomic.fa[multiple] Capensis_both_Out.txt.scafSeq --format=difference --output=/net/home/skothapalli/result.fa

The output was:

    lastz: No match.

What does this mean? Can someone help me out?

Thanks in advance,
Sam
#13 | |
Member
Location: USA Join Date: Mar 2014
Posts: 13
This is apparently a message from your shell. The string "no match" isn't something lastz ever outputs.
I've seen this reported a couple of times before, but I've never found a machine that does it. There is more information about that in this thread (search down the page for "no match"): http://seqanswers.com/forums/archive...p/t-20113.html

My second post in that thread (05-15-2014, 07:28 AM) contains a workaround.

Bob H (lastz author)
#14 |
Junior Member
Location: indianapolis Join Date: Jul 2015
Posts: 2
Hi Bob,
I ran ls -al `which lastz` and my output was:

    skothapalli@bioinformatics:/net/common/data/sparrow_project$ ls -al `which lastz`
    -rwxr-xr-x. 1 root root 435735 Jun 17 14:33 /usr/local/bin/lastz

I am working in the following path: /net/common/data/sparrow_project

So, do I have to change the path somewhere?

Thanks in advance,
Sam
#15 |
Member
Location: Planet Earth, approximately Join Date: Feb 2014
Posts: 14
Well I just typed a longer reply but when I submitted it my browser failed. If this post is successful I will try again.
#16 |
Member
Location: Planet Earth, approximately Join Date: Feb 2014
Posts: 14
Sam, I really don't have any idea why your shell is giving you that error.
From what you posted, it looks like the shell is finding the executable file, which ought to rule out path problems. That file has a reasonable size, and it has "x" set for all users. That all looks correct.

My only other guess is that the executable was built for a different machine, but that's truly a wild-ass guess (and I'd hope you'd get a more informative error message than "no match").

It's apparently a rare situation (3 reports in 5 years) and I've never been able to reproduce it. I've tried searching for the situations in which the shell would report "no match", but I've had no luck.

What shell are you using? As another data point, if you try wrapping your command in an executable text file, does that work?

Bob H (lastz author)
#17 |
Member
Location: Planet Earth, approximately Join Date: Feb 2014
Posts: 14
With a little more searching I've found a plausible answer. I think your shell is trying to interpret [multiple] as a wildcard pattern, as part of command-line filename expansion.
If that's the case, I don't know how to prevent it (my shell isn't doing that). You might try putting single or double quotes around [multiple], but I have my doubts that that will change anything.

There's an alternative way to get that multiple option into lastz, without binding it to the file name:

    --action1=[multiple]

But I think the shell is still going to try to expand anything in square brackets. It's going to take someone who understands your shell to solve this. Sorry.

Bob H (lastz author)
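For what it's worth, the "No match." wording is what csh and tcsh print when a word containing wildcard characters such as [ ] matches no existing file. zsh reports a similar "no matches found", while bash and plain sh by default pass an unmatched pattern through unchanged, which could explain why the problem appears on only some machines. In all of these shells, quoting the argument is the usual way to switch filename expansion off. The sketch below uses POSIX sh syntax; `show_args` and the file name `reads.fam` are made up for the demonstration.

```shell
# Print each argument on its own line, exactly as a program would receive it.
show_args() {
    printf '%s\n' "$@"
}

# Scratch directory holding one file that the pattern reads.fa[multiple]
# happens to match: "reads.fa" followed by one of the letters m,u,l,t,i,p,e.
workdir=$(mktemp -d)
touch "$workdir/reads.fam"

# Unquoted: the shell treats the brackets as a character set and replaces the
# argument with the matching file name before the program even starts.
( cd "$workdir" && show_args reads.fa[multiple] )      # prints: reads.fam

# Quoted: the brackets reach the program literally.
( cd "$workdir" && show_args 'reads.fa[multiple]' )    # prints: reads.fa[multiple]
```

Applied to the command earlier in this thread, that would mean writing the first argument in single quotes, as in 'GCF_000385455.1_Zonotrichia_albicollis-1.0.1_genomic.fa[multiple]'.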
#18 |
Member
Location: Planet Earth, approximately Join Date: Feb 2014
Posts: 14
I know this doesn't help you right now, but the next release of lastz (something greater than 1.03.73) will have a workaround for this, so that you can set that kind of file modifier without having to use square brackets. That assumes, still unconfirmed, that this is what the problem is.

Bob H (lastz author)
#19 |
Junior Member
Location: Switzerland Join Date: Aug 2015
Posts: 3
Hello,
I am also having some problems with the option "[multiple]". I am trying to run the following command:

    lastz chicken.chrUn.fa[multiple] turkey.chr1.fa K=3000 L=2200 H=2000 M=50 format=axt > chicken.chrUn.turkey.chr1.axt

where the file chicken.chrUn.fa contains hundreds of sequences/scaffolds and the file turkey.chr1.fa contains one single sequence (always softmasked). I tried this with small test files and everything worked fine, but when running on the actual data I get this error:

    searching for matches in turkey.chr1.fa
    processing anchor #1 (of 63651) (8212754/78681682) 16394104/78681682
    alignment block score=294106 at (8212350/78681278) 16393700/78681278 length 3543/3485
    processing anchor #2 (of 63651) (15263154/71894445) 30514955/71894445
    alignment block score=224990 at (15261448/71892751) 30513249/71892751 length 2933/3040
    processing anchor #3 (of 63651) (19232279/25722160) 38458396/25722160
    alignment block score=497244 at (19226538/25716168) 38452655/25716168 length 6286/6534
    processing anchor #4 (of 63651) (12774217/202502915) 25532709/202502915
    alignment block score=288391 at (12773292/202502037) 25531784/202502037 length 3719/3839
    processing anchor #5 (of 63651) (17840002/111698882) 35673300/111698882
    FAILURE: lookup_partition could not locate position 17833297 in chicken.chrUn.fa

It repeats consistently across all turkey chromosomes, every time pointing to a different location in chicken.chrUn.fa. I do want to take advantage of the possibility of having more than one sequence in the target FASTA, but I continue to fail. So I wonder if someone knows how to fix this.

Thanks!
#20 |
Member
Location: USA Join Date: Mar 2014
Posts: 13
Howdy, ereaye,
This indicates some sort of internal error in lastz. Internally, lastz concatenates all the target sequences into one. Here it is trying to convert a position in that concatenated sequence back to the original sequence coordinates, and something has gone wrong with that.

I'd like to reproduce this here. Are these standard chicken and turkey assemblies? If so, could you tell me which release/assembly they are? Or, if turkey isn't available online, would it be possible to make that chromosome available somewhere?

Also, what version of lastz are you running (lastz --version)?

Thanks,
Bob H (lastz author)
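To make the idea of converting a concatenated position back to per-sequence coordinates concrete, here is a rough illustration in awk. It is not lastz's code: the function name `locate_position` is invented, positions are taken as 0-based, and the sequences are assumed to be joined with nothing in between, whereas lastz's real internal layout is not described in this thread.

```shell
# Map a 0-based position in the concatenation of all sequences of a FASTA file
# back to "<sequence name> <0-based offset within that sequence>".
# Exits with status 1 if the position lies past the end of the last sequence.
# Illustrative only; assumes sequences are concatenated without separators.
locate_position() {
    awk -v pos="$2" '
        /^>/ { name = substr($1, 2); seq_start = total; next }  # new sequence begins here
        {
            total += length($0)                # concatenated length so far
            if (pos < total) {                 # position falls inside this sequence
                print name, pos - seq_start
                found = 1
                exit
            }
        }
        END { if (!found) exit 1 }             # analogous to "could not locate position"
    ' "$1"
}
```

A lookup such as `locate_position chicken.chrUn.fa 17833297` would then name the scaffold covering that position under these assumptions, which may help when checking whether the reported position is even within the file.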