SEQanswers

09-21-2015, 04:54 AM   #1
standonn
Member
 
Location: UK

Join Date: Nov 2014
Posts: 14
Ray Error: miscount of the number of reads

Dear all,

I am trying to run Ray to assemble a nematode genome de novo.
The run stops with the following error:

Code:
mpirun \
-n 32 \
/mnt/Programs/Ray-2.3.1/Ray \
-k 81 \
-o rayk81 \
-p ../../clean_reads/PE_1.noCont.ec.fa ../../clean_reads/PE_2.noCont.ec.fa \
-p ../../clean_reads/MP3_1.noCont.ec.fa ../../clean_reads/MP3_2.noCont.ec.fa \
-p ../../clean_reads/MP5_1.noCont.ec.fa ../../clean_reads/MP5_2.noCont.ec.fa \
-p ../../clean_reads/MP8_1.noCont.ec.fa ../../clean_reads/MP8_2.noCont.ec.fa

[.....]

Rank 7: File ../../clean_reads/MP8_2.noCont.ec.fa (Number 7) has 10233322 sequences
Rank 6: File ../../clean_reads/MP8_1.noCont.ec.fa (Number 6) has 10231913 sequences
Rank 5: File ../../clean_reads/MP5_2.noCont.ec.fa (Number 5) has 10722610 sequences
Rank 2: File ../../clean_reads/MP3_1.noCont.ec.fa (Number 2) has 14151655 sequences
Rank 4: File ../../clean_reads/MP5_1.noCont.ec.fa (Number 4) has 10722031 sequences
Rank 3: File ../../clean_reads/MP3_2.noCont.ec.fa (Number 3) has 14152522 sequences
Rank 0: File ../../clean_reads/PE_1.noCont.ec.fa (Number 0) has 100860164 sequences
Rank 1: File ../../clean_reads/PE_2.noCont.ec.fa (Number 1) has 100860164 sequences
Rank 0 wrote rayk81/NumberOfSequences.txt
Rank 0 wrote rayk81/SequencePartition.txt

Rank 0 : Error, ../../clean_reads/MP3_1.noCont.ec.fa contains 14151655 sequences and ../../clean_reads/MP3_2.noCont.ec.fa contains 14152522 sequences (must be the same)
Ray reports that the left and right read files contain different numbers of sequences, but that is not true. Ray does not seem to count the sequences correctly; grep finds the same number of headers in both files, and far fewer than Ray reports:

Code:
grep -c '^>' ../../clean_reads/MP3_1.noCont.ec.fa
9763950
grep -c '^>' ../../clean_reads/MP3_2.noCont.ec.fa  
9763950
The output of head on my read files looks completely normal (regular multi-line FASTA).
I have also run other assemblers successfully on this data, so I know there is no format problem with the files.
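
For anyone who wants to repeat the check on every pair, here is a rough sketch of the helpers I would use. The function names are made up for illustration and are not part of Ray. It counts header lines the same way as the grep above, and also looks for two things that might make a parser count differently from grep: '>' characters that are not at the start of a line, and lines ending in a carriage return. I am only assuming these are worth ruling out, not claiming either one is the cause.

```shell
# Sketch only: helper names are hypothetical, not part of Ray.

# Number of FASTA records = number of lines that begin with '>'.
# '|| true' keeps the exit status 0 when grep finds no match (it still prints 0).
count_headers() {
    grep -c '^>' "$1" || true
}

# Lines that contain '>' somewhere other than the first column.
count_stray_gt() {
    grep -c '^[^>].*>' "$1" || true
}

# Lines that end in a carriage return (Windows-style line endings).
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Compare the header counts of a left/right pair and print both.
# The body runs in a subshell, so 'left' and 'right' do not leak out.
check_pair() (
    left=$(count_headers "$1")
    right=$(count_headers "$2")
    if [ "$left" -eq "$right" ]; then
        echo "OK $left $right"
    else
        echo "MISMATCH $left $right"
        exit 1
    fi
)
```

After sourcing these, a pair can be checked with, for example, check_pair ../../clean_reads/MP3_1.noCont.ec.fa ../../clean_reads/MP3_2.noCont.ec.fa, and count_stray_gt or count_cr_lines can be run on each file separately.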

Any insights on what could be causing this problem?

Best Wishes,
Sophie

Tags
de novo assembly, genome assembly, ray
