SEQanswers

Old 02-11-2014, 09:05 AM   #1
bsp017
Member
 
Location: Bangor, UK

Join Date: Jul 2013
Posts: 16
MaSuRCA error

Hi everyone,

I've been trying to run MaSuRCA to de novo assemble my Illumina fastq files. However, after I generate the assemble.sh script and run it, I get the following error:

./assemble.sh
Processing pe library reads
Unsupported input format for file '/home/frag_2.fastq'
Unsupported input format for file '/home/frag_1.fastq'
awk: fatal: division by zero attempted
Average PE read length
Illegal division by zero at -e line 1.
choosing kmer size of for the graph
MIN_Q_CHAR: 64
Creating mer database for Quorum.
Error correct PE.
Cutoff computation failed. Pass it explicitly with -p switch.
Error correction of PE reads failed. Check pe.cor.log.

My config file looked like this:

DATA
PE= pe 250 37 /home/frag_1.fastq /home/frag_2.fastq
END

PARAMETERS
GRAPH_KMER_SIZE=auto
USE_LINKING_MATES=1
NUM_THREADS=64
JF_SIZE=600000000
END

Any help would be greatly appreciated!

James
Old 02-12-2014, 01:24 AM   #2
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 620

What do your fastq files look like? It seems that the assembler stumbles over your input files. Maybe you can post a few lines.
Old 02-12-2014, 01:52 AM   #3
bsp017

Yes, here are the first lines of my fastq file:

@M00534:14:000000000-A47HD:1:1101:15878:1324 2:N:0:1
ATCCCGCAGGATGTGGAAAGCGCCGTCATGCCCGCAGAGCTACGGCCATTAACGCCAACCCGCGCCACACAAACCTACCCTTCTCCGGCCTCGGTCGAAAGCGCCGTGGCGCTGTTGCGTGCCGCGCGCAACCCGGGGATTCTGGCCGGGCACGGGGTTGCCAGAACCGGGCATGCGCCGGCGCTGGACGCCTTTGACTGGGGTTACGTCGTTTCGGTCGCACCACCTGTTATTGGGACGGGGGTGAGTCA
+
[email protected][email protected][email protected]<CCC>[email protected]<[email protected]@???CFF.9B-;-./[email protected]:B..-:@--;9;//./9.9---;:;/..:------;:..---..9;////:9-.-.:./.-;../.-..9----..../;//////.;[email protected];@@;../;/
@M00534:14:000000000-A47HD:1:1101:16163:1327 2:N:0:1
GATAGCCAAACGCTCAACCTTTGGGTACAACACCCCGGCCCGCCAGACAGAGGTCCCTTATACACACCCCACCCCGCCAAGCATACCGAAGATTGCCATCTCGGCGGCCGCCGCGTCGTTAAAAAAAAAACGCACGGCGCCCACAAACTGCTCCTCGGCCGGGTAGAACAGAATCCGCACCCGCCCGGGCGGTGGCGCTTCGCGACGTTGTAGGCATAGGTATGGTAGCACTAGCACAGTGTGTAGGGTAA
+
A1>A1F1111111A111100A1D000111D100BEA/AA//////A0/0000BF01F11112221B/B/>////>>/////>B1110/////B1B1B1BG1F0/////>-<><<-:;:;:..00;A-A?-.-------9-;--;9AF-;/9BFEBB----9----99BFF//B///-:-9--;---------;-------------9-;9-//////////;//-///////;9//9--////-/9/-///
@M00534:14:000000000-A47HD:1:1101:17525:1332 2:N:0:1
ATATGCTATCGCGCCAAAACCGTCGCTGAACGGCGTAAGCACATCACGCTTGATTTCAACCCGGAAGGCCTTGGTAATCAGGCAGCGATTGAACGTGCGATTGAATACTTCTTAAAAGACAGCCTGTCCGTGCATCACCCAC

I thought this might be the cause of the error message so I tried the test data from the MaSuRCA ftp site but got the same error.
Old 02-12-2014, 02:03 AM   #4
sklages

Looks like standard MiSeq output. But if even their test data fails to assemble with MaSuRCA, you might have a problem with your MaSuRCA installation.
You may want to check the assemble.sh script to see what MaSuRCA is trying to do.

If there is nothing obvious, you should probably contact the authors.
Old 03-17-2014, 09:13 AM   #5
sagarutturkar
Member
 
Location: Tennessee, USA

Join Date: Sep 2010
Posts: 61

I replaced 2:N:0:1 with \2 in the read names and it worked well for one genome, so it is worth a try.
However, for a different genome it produced a similar error.

If you have been able to resolve the error, please post the solution here.
Old 03-19-2014, 06:18 AM   #6
sagarutturkar
Solved

Hi,

I contacted the developers about this, and they said that read names do not matter during pre-processing of the data. They suggested that I run a test on my fastq file:

Code:
file -b -i jumps.A.fastq
This gave me a result like:

Code:
text/x-python; charset=us-ascii
I emailed the result to the developers, and they explained that the operating system thinks the fastq file is Python code. This is not correct; the type should be text/plain.

Code:
The simple way to fix this:

Look at expand_fastq script under masurca bin folder and replace the line:

    (text/plain*)
with
    (text/*)
 
everything should work afterward.
After this change, I was able to run the assembler correctly, with JF_SIZE set to a very high value (JF_SIZE=1800000000).
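
For anyone who wants to see why widening the pattern helps, here is a minimal Python sketch. It assumes the line in expand_fastq is a shell-style glob pattern (that is my assumption, the script itself is not reproduced here), and the helper names mime_of and accepted are made up for illustration. It runs the same file -b -i command as above and compares the reported type against the old and the new pattern.

```python
import fnmatch
import subprocess


def mime_of(path):
    """Return the MIME string printed by `file -b -i` for path (Linux `file`)."""
    result = subprocess.run(
        ["file", "-b", "-i", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def accepted(mime, pattern):
    """Glob-style match, standing in for the pattern test in expand_fastq."""
    return fnmatch.fnmatchcase(mime, pattern)


if __name__ == "__main__":
    import sys

    # Usage: python check_mime.py reads_1.fastq reads_2.fastq
    for path in sys.argv[1:]:
        mime = mime_of(path)
        print(path, "->", mime)
        print("  matches text/plain*:", accepted(mime, "text/plain*"))
        print("  matches text/*:     ", accepted(mime, "text/*"))
```

A file reported as text/x-python fails the first pattern and passes the second, which matches what I saw before and after editing the script.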

Thanks
Sagar
Old 03-26-2014, 06:14 AM   #7
bsp017

Hi Sagar,

Thanks for your solution. This worked well on my local system. However, when I amended the file on my university's Linux server I got an error. This was a different error from the original and is printed below:

Creating k-unitigs with k=81
[Wed Mar 26 12:34:33 GMT 2014] Computing super reads from PE
Linking PE reads 329036
[Wed Mar 26 12:35:06 GMT 2014] Celera Assembler
ovlMerThreshold=75
Overlap/unitig failed, check output under CA/ and runCA1.out

I have tried this several times, varying the parameters each time and also using the MaSuRCA test data, but I continue to get this error.

James
Old 03-26-2014, 09:48 AM   #8
sagarutturkar

I am not sure how to handle this specific error. However, I received some additional advice from the MaSuRCA developers, as below:

Code:
MaSuRCA works best with coverage up to 150x.  To use 200x+ coverage data you need to set KMER_COUNT_THRESHOLD parameter in the config file to 2 or 3, or simply use part of the data.
You can check this parameter or write to the authors with more specific questions.
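
To judge which side of the 150x mark a data set falls on, a rough coverage estimate is enough. The sketch below is only arithmetic: coverage is total sequenced bases divided by genome size, and keep_fraction says what share of the reads to keep to stay at or below a target. The function names are made up, the 150x default comes from the advice quoted above, and the genome size has to be your own estimate.

```python
def estimated_coverage(n_reads, mean_read_len, genome_size):
    """Fold coverage = total sequenced bases / genome size.

    n_reads counts every read, so both mates of each pair are included.
    """
    return n_reads * mean_read_len / genome_size


def keep_fraction(coverage, target=150.0):
    """Fraction of reads to keep so that coverage does not exceed target."""
    if coverage <= target:
        return 1.0
    return target / coverage


if __name__ == "__main__":
    # Made-up numbers: 4 million 250 bp reads on a 4 Mb genome.
    cov = estimated_coverage(4_000_000, 250, 4_000_000)
    print(f"coverage ~{cov:.0f}x, keep {keep_fraction(cov):.0%} of the reads")
```

If you subsample paired-end data, remember to drop both mates of a pair together so the two files stay in sync.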

Thanks
Sagar

Last edited by sagarutturkar; 03-26-2014 at 09:49 AM. Reason: minor
Old 07-16-2014, 10:58 AM   #9
arthurmelo
Member
 
Location: Durham, NH, US

Join Date: Jul 2012
Posts: 19

Hello people,
I have a problem with the MaSuRCA assembler. I think the problem is in the Jellyfish step.
The error is related to gap closing: "Gap close failed, you can still use pre-gap close files under CA/9-terminator/. Check gapClose.err for problems"

Checking gapClose.err ...

mkdir CA/10-gapclose
outputDirectory = CA/10-gapclose
/usr/local/MaSuRCA-2.0.3.1/bin/getEndSequencesOfContigs.perl /home/lays/jatoba/dados_norm/masurca/CA/9-terminator 51 200
/usr/local/MaSuRCA-2.0.3.1/bin/create_end_pairs.perl /home/lays/jatoba/dados_norm/masurca/CA/9-terminator 51 > /home/lays/jatoba/dados_norm/masurca/CA/10-gapclose/contig_end_pairs.51.fa
/usr/local/MaSuRCA-2.0.3.1/bin/create_end_pairs.perl /home/lays/jatoba/dados_norm/masurca/CA/9-terminator 200 > /home/lays/jatoba/dados_norm/masurca/CA/10-gapclose/contig_end_pairs.200.fa
/usr/local/MaSuRCA-2.0.3.1/bin/getMeanAndStdevForGapsByGapNumUsingCeleraAsmFile.perl /home/lays/jatoba/dados_norm/masurca/CA/9-terminator --contig-end-seq-file /home/lays/jatoba/dados_norm/masurca/CA/10-gapclose/contig_end_pairs.51.fa > gap.insertMeanAndStdev.txt
echo "cc 600 200" > meanAndStdevByPrefix.cc.txt
jellyfish count -s 7000000000 -C -t 12 -m 21 -L 100 -o restrictKmers /home/lays/jatoba/dados_norm/masurca/pA.renamed.fastq /home/lays/jatoba/dados_norm/masurca/pB.renamed.fastq /home/lays/jatoba/dados_norm/masurca/pC.renamed.fastq /home/lays/jatoba/dados_norm/masurca/pD.renamed.fastq
ln -sf restrictKmers_0 restrictKmers
jellyfish dump -L 1000 restrictKmers -c > highCountKmers.txt
jellyfish count -s 1 -C -t 12 -m 21 -o fishingAll /home/lays/jatoba/dados_norm/masurca/CA/10-gapclose/contig_end_pairs.200.fa
terminate called after throwing an instance of 'jellyfish::file_parser::FileParserError'
what(): Empty input file '/home/lays/jatoba/dados_norm/masurca/CA/10-gapclose/contig_end_pairs.200.fa'
child died with signal 6, with coredump

The pipeline doesn't seem to create the file "contig_end_pairs.200.fa" properly (Jellyfish reports it as empty). What does this mean?
Can someone help me?

Thanks a lot ...
Old 06-06-2015, 05:17 AM   #10
freestile
Member
 
Location: Chile

Join Date: Aug 2014
Posts: 11

Quote:
Originally Posted by bsp017
Hi Sagar,

Thanks for your solution. This worked well on my local system. However, when I amended the file on my university's Linux server I got an error. This was a different error from the original and is printed below:

Creating k-unitigs with k=81
[Wed Mar 26 12:34:33 GMT 2014] Computing super reads from PE
Linking PE reads 329036
[Wed Mar 26 12:35:06 GMT 2014] Celera Assembler
ovlMerThreshold=75
Overlap/unitig failed, check output under CA/ and runCA1.out

I have tried this several times, varying the parameters each time and using the masurca test data but I continue to get this error.

James

Were you able to solve the problem? I'm stuck at the same point.

regards
Old 05-09-2016, 09:08 PM   #11
khyatiC
Junior Member
 
Location: india

Join Date: Aug 2014
Posts: 6

Hello,

I am using MaSuRCA to assemble a plant genome with 454, PacBio, and Illumina PE and MP data. When I run the assembly, I get this error:

[Mon May 2 13:51:44 IST 2016] Processing pe library reads
[Mon May 2 13:51:44 IST 2016] Processing sj library reads
Invalid fastq format (Unexpected end of file) in file '/dev/fd/62' around position -1
Invalid fastq format (Unexpected end of file) in file '/dev/fd/62' around position -1
[Mon May 2 14:32:31 IST 2016] Processing pe library reads
[Mon May 2 14:32:31 IST 2016] Processing sj library reads
Average PE read length 166
choosing kmer size of 70 for the graph
MIN_Q_CHAR: 33
[Mon May 2 19:59:07 IST 2016] Creating mer database for Quorum.
[Mon May 2 21:00:21 IST 2016] Error correct PE.
[Wed May 4 23:54:47 IST 2016] Error correct JUMP.
[Thu May 5 03:34:52 IST 2016] Estimating genome size.
Estimated genome size: 1469569156
[Thu May 5 04:52:19 IST 2016] Creating k-unitigs with k=70
panic: memory wrap at -e line 1, <> line 3749709508.
END failed--call queue aborted, <> line 3749709508.
[Thu May 5 21:52:26 IST 2016] Creating k-unitigs with k=31
[Fri May 6 11:54:54 IST 2016] Filtering JUMP.
Assuming outtie orientation
Chimeric/Redundant jump reads:
43056 chimeric_sj.txt
5337014 redundant_sj.txt
5380070 total
Found extra chimeric mates:
37278 work2.1/readsToExclude.txt
[Sun May 8 01:29:48 IST 2016] Creating FRG files
[Sun May 8 03:20:10 IST 2016] Computing super reads from PE
Super reads failed, check super1.err and files in ./work1/

As suggested earlier in this thread, I also ran file -b -i, which reports text/plain; charset=us-ascii for all fastq files.

Can anyone help me out here?
Old 05-09-2016, 09:14 PM   #12
khyatiC

My config file looks like this:

DATA
PE= pa 250 100 300bp_R1.fastq 300bp_R2.fastq
PE= pb 300 150 500bp_R1.fastq 500bp_R2.fastq
PE= pc 400 300 Miseq_R1.fastq Miseq_R2.fastq

JUMP= sa 1700 1000 2kb1_R1.fastq 2kb1_R2.fastq
JUMP= ha 1700 1000 2kb2_R1.fastq 2kb2_R2.fastq
JUMP= ga 1700 1000 2kb3_R1.fastq 2kb3_R2.fastq
JUMP= sb 1700 1000 2kb4_R1.fastq 2kb4_R2.fastq
JUMP= hb 1700 1000 2kb5_R1.fastq 2kb5_R2.fastq
JUMP= gb 1700 1000 2kb6_R1.fastq 2kb6_R2.fastq
JUMP= sc 1700 1000 2kb7_R1.fastq 2kb7_R2.fastq
JUMP= hc 1700 1000 2kb8_R1.fastq 2kb8_R2.fastq

JUMP= gc 2500 1000 4kb1_R1.fastq 4kb1_R2.fastq
JUMP= sd 2500 1000 4kb2_R1.fastq 4kb2_R2.fastq
JUMP= hd 2500 1000 4kb3_R1.fastq 4kb3_R2.fastq
JUMP= gd 2500 1000 4kb4_R1.fastq 4kb4_R2.fastq

JUMP= se 3000 1000 6kb1_R1.fastq 6kb1_R2.fastq
JUMP= he 3000 1000 6kb2_R1.fastq 6kb2_R2.fastq
JUMP= ge 3000 1000 6kb3_R1.fastq 6kb3_R2.fastq

JUMP= mc 5500 1000 8kb1_R1.fastq 8kb1_R2.fastq
JUMP= md 5500 1000 8kb2_R1.fastq 8kb2_R2.fastq
JUMP= me 11500 1000 20kb1_R1.fastq 20kb1_R2.fastq
JUMP= mf 11500 1000 20kb2_R1.fastq 20kb2_R2.fastq


OTHER=SG_combined.frg
OTHER=IIWSK1V01.frg
OTHER=IIWSK1V02.frg
OTHER=IJK1LD202.frg
OTHER=pacbio.frg
END

PARAMETERS
#this is k-mer size for deBruijn graph values between 25 and 101 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE=auto
#set this to 1 for Illumina-only assemblies and to 0 if you have 1x or more long (Sanger, 454) reads, you can also set this to 0 for large data sets with high jumping clone coverage, e.g. >50x
USE_LINKING_MATES=0
#this parameter is useful if you have too many jumping library mates. Typically set it to 60 for bacteria and something large (300) for mammals
LIMIT_JUMP_COVERAGE= 300
#these are the additional parameters to Celera Assembler. do not worry about performance, number or processors or batch sizes -- these are computed automatically. for mammals do not set cgwErrorRate above 0.15!!!
CA_PARAMETERS = ovlMerSize=30 cgwErrorRate=0.15 ovlMemory=4GB
#minimum count k-mers used in error correction 1 means all k-mers are used. one can increase to 2 if coverage >100
KMER_COUNT_THRESHOLD = 1
#auto-detected number of cpus to use
NUM_THREADS=58
#this is mandatory jellyfish hash size
JF_SIZE=35000000000
#this specifies if we do (1) or do not (0) want to trim long runs of homopolymers (e.g. GGGGGGGG) from 3' read ends, use it for high GC genomes
DO_HOMOPOLYMER_TRIM=0
END
Old 05-09-2016, 11:02 PM   #13
sklages

Well, looks funny ;-)

What system are you running on? Mac/Linux? How much memory?
Is your data stored locally or on an NFS mount?
What is your expected coverage?

Is this the very first output containing an error message?

The first error seems to be related to reading fastq data from the pipes.
Check that the fastq files are not abnormally truncated.

Something like
Code:
tail -n 4 FILE.fq
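
Looking at the tail only shows the last record. For a fuller check, something along these lines could walk the whole file. This is a minimal sketch that assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names are made up for illustration. It reports the number of complete records and the first problem it finds, if any.

```python
def check_fastq_lines(lines):
    """Return (complete_record_count, problem); problem is None if all is fine."""
    count = 0
    record = []
    for raw in lines:
        record.append(raw.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, sep, qual = record
        if not header.startswith("@"):
            return count, f"record {count + 1}: header does not start with '@'"
        if not sep.startswith("+"):
            return count, f"record {count + 1}: third line does not start with '+'"
        if len(seq) != len(qual):
            return count, f"record {count + 1}: sequence and quality lengths differ"
        count += 1
        record = []
    # Leftover non-empty lines mean the file stops in the middle of a record.
    if any(record):
        return count, f"file ends inside record {count + 1} ({len(record)} of 4 lines)"
    return count, None


def check_fastq_file(path):
    """Run the same check on a file on disk."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return check_fastq_lines(handle)


if __name__ == "__main__":
    import sys

    # Usage: python check_fastq.py lib_R1.fastq lib_R2.fastq
    for path in sys.argv[1:]:
        print(path, check_fastq_file(path))
```

For paired libraries it is also worth comparing the record counts of the R1 and R2 files, since the mates have to line up.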
The second error (3 days later!?) suggests that something is trying to allocate more memory than is available on your system. Check whether you have enough memory, e.g. with top.
It still continues ...

Another three days later it finally dies in another step of the pipeline ...

Hmm, no solution, but try to start searching at the very beginning ...
Old 05-10-2016, 12:26 AM   #14
khyatiC

Hi sklages,

I am running the script on a Linux system. The data is stored on the cluster. Expected coverage is 50x.

Yes, this is the very first output containing an error message.

I checked the fastq files. They are not abnormally truncated. Regarding memory, the requirements specified in the manual are met, so I am not sure whether that is the issue.
Old 05-10-2016, 02:12 AM   #15
sklages

How are you accessing the data? Via NFS? I am asking as we had some issues with I/O in our cluster environment.

Just have a look at how much memory your job consumes. You seem to run 58 threads, .. no idea if the unitigging step is multi-threaded and how much RAM it really needs in total. Then you'll see if this is an issue.
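
If you would rather log the numbers than watch top, a small script can read them from /proc. This is a Linux-only sketch: it parses VmRSS (current) and VmHWM (peak) resident memory from /proc/PID/status for one process. The function names are made up, and you would have to supply the PID of whichever pipeline step is running, since the assembler starts several child processes.

```python
import re


def parse_status_kb(status_text, key):
    """Return the kB value for a key such as 'VmRSS' or 'VmHWM', or None."""
    match = re.search(
        rf"^{re.escape(key)}:\s+(\d+)\s+kB", status_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def memory_gib(pid):
    """Current (VmRSS) and peak (VmHWM) resident memory of a process in GiB."""
    with open(f"/proc/{pid}/status") as handle:
        text = handle.read()
    return {
        key: (parse_status_kb(text, key) or 0) / 1024 ** 2
        for key in ("VmRSS", "VmHWM")
    }


if __name__ == "__main__":
    import sys

    # Usage: python mem_check.py PID [PID ...]
    for pid in sys.argv[1:]:
        print(pid, memory_gib(pid))
```

Comparing the peak value with the RAM on the node should show whether the step is running close to the limit.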

You should probably start from step "Creating k-unitigs with k=70" .. all intermediate files should be available ..
Old 05-10-2016, 02:29 AM   #16
khyatiC

I am running the script on the cluster itself. It has 1 TB of RAM and 64 cores, of which 58 are made available to MaSuRCA.
How do I restart from the step "Creating k-unitigs with k=70"? I am not sure how to do that.

Thanks.
Old 05-10-2016, 02:43 AM   #17
sklages

Still you need to check how much RAM your 58 threads consume on your machine.
Check if IO bandwidth is sufficient; are you the only one reading from / writing to the filesystem with your job?

Restarting is described in http://www.genome.umd.edu/docs/MaSuR...StartGuide.pdf -- "Restarting failed assembly".
Old 05-10-2016, 03:30 AM   #18
khyatiC

Yes, I am the only one reading from / writing to the file system with this job. I have re-run the assembly from the "Creating k-unitigs with k=70" step. Let's see.

Thanks.
Old 05-15-2016, 10:23 PM   #19
khyatiC

Hi sklages,

Is there a parameter that defines how much memory the process can take, or how much memory we want to allot to MaSuRCA? I ask because, as you mentioned before, I think the error is related to memory.

Thanks.

Tags
bacterial assembly, masurca
