SEQanswers

07-21-2014, 01:25 AM   #1
haudi
Junior Member
 
Location: Austria

Join Date: Jul 2014
Posts: 6
MIRA 4.0 denovo PacBio FastQ

Hello!
I'm new to the field of Bioinformatics (I'm studying Molecular Biology in my 3rd year) and I'm currently doing an internship at a company.
I got FASTQ and FASTA (PacBio) files and am supposed to do a de novo assembly (of Aeromonas salmonicida pectinolytica) with them. The files are 400 MB each and contain about 68,000 reads of 35 to 18,000 bases. I first tried the PacBio SMRT Analysis/SMRT Portal tools, but they need bas.h5 data, which I don't have. So I am now using MIRA 4.0.
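To check figures like these (read count and the 35 to 18,000 base length range) yourself, a small Python sketch can summarise a FASTQ file. It assumes plain 4-line FASTQ records with one sequence line per record; the function names are illustrative and not part of any tool mentioned in this thread.

```python
import gzip
import sys
from typing import Iterator, Tuple


def read_fastq_lengths(path: str) -> Iterator[int]:
    """Yield the sequence length of each record in a plain or gzipped FASTQ file.

    Assumes simple 4-line records (header, sequence, '+', quality) with no
    line-wrapped sequences.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            if not header.startswith("@"):
                raise ValueError(f"Malformed FASTQ header: {header!r}")
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            yield len(seq)


def summarize(path: str) -> Tuple[int, int, int, int]:
    """Return (number of reads, total bases, shortest read, longest read)."""
    lengths = list(read_fastq_lengths(path))
    if not lengths:
        return 0, 0, 0, 0
    return len(lengths), sum(lengths), min(lengths), max(lengths)


if __name__ == "__main__":
    # Usage: python fastq_stats.py XX.fastq XX2.fastq
    for fastq in sys.argv[1:]:
        n, total, shortest, longest = summarize(fastq)
        print(f"{fastq}: {n} reads, {total} bases, {shortest}-{longest} bp")
```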

Syntax:
Quote:
./mira manifest.conf>log_assembly.txt
Manifest:
Quote:
project = MyFirstAssembly
job = genome,denovo,draft
parameters = PCBIOHQ_SETTINGS -CO:mrpg=5
readgroup = L4466_Track data = XX.fastq XX2.fastq technology = sanger
segment_placement= FR
Output:
Quote:
On: Linux vk10464 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 18:00:34 UTC 2012 x86_64 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compiled with ENABLE64 activated.
Runtime settings (sorry, for debug):
Size of size_t : 8
Size of uint32 : 4
Size of uint32_t: 4
Size of uint64 : 8
Size of uint64_t: 8
Current system: Linux annapurna 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


Fatal error (may be due to problems of the input data or parameters):

********************************************************************************
* Oooops, the readgroup 'L4466_Track data = XX.fastq *
* XX2.fastq technology=sanger' has no sequencing *
* technology defined, nor is it defined as reference (which would excuse the *
* missing technology definition). *
********************************************************************************
->Thrown: void ReadGroupLib::fillInSensibleDefaults(rgid_t libid)
->Caught: main

Aborting process, probably due to error in the input data or parametrisation.
Please check the output log for more information.
For help, please write a mail to the mira talk mailing list.
Subscribing / unsubscribing to mira talk, see: http://www.freelists.org/list/mira_talk

CWD: /home/haudum/Project/Program/Mira/mira_4.0.2_linux-gnu_x86_64_static/bin
Thank you for noticing that this is *NOT* a crash, but a
controlled program stop.
Your system seems to be older or have some quirks with locale settings.
Using the LC_ALL=C workaround.
If you don't want that, fix your system ;-)
Failure, wrapped MIRA process aborted.
It fails every time. It looks like MIRA doesn't recognize the technology. I also tried pcbiohq, which didn't work either!

Thank you very much, everyone, for your help. I'm a real beginner in this topic.
07-21-2014, 01:12 PM   #2
JohnN
Member
 
Location: Toronto

Join Date: Jan 2011
Posts: 30

I recommend two things:

1. Try the mira list where they are very helpful: http://www.freelists.org/list/mira_talk

2. Ask your PacBio sequencing provider for the metadata.xml, bas.h5 and bax.h5 files and run them through SMRT Portal.

I'm sorry I cannot be more helpful, but it's a start.
07-22-2014, 04:18 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Your manifest as shown is missing some newlines: the sequencing technology of the read group, for example, should be on its own line.
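The kind of mistake described above, several `key = value` entries run together on one line, can be spotted with a short check. This is a hedged sketch, not a MIRA tool: it simply flags manifest lines that contain more than one `key =`, and it assumes that manifest keys are plain words and that `#` starts a comment.

```python
import re
from typing import List, Tuple

# A manifest key: a word at the start of a line or right after whitespace,
# followed by '='. Option strings such as "-CO:mrpg=5" do not match because
# the key must follow whitespace directly.
KEY_PATTERN = re.compile(r"(?:^|\s)([A-Za-z_][A-Za-z0-9_]*)\s*=")


def keys_on_line(line: str) -> List[str]:
    """Return every 'key =' found on a single manifest line."""
    return KEY_PATTERN.findall(line)


def check_manifest(text: str) -> List[Tuple[int, List[str]]]:
    """Return (line number, keys) for each line holding more than one 'key = value' pair."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        content = line.split("#", 1)[0]  # assumption: '#' starts a comment
        keys = keys_on_line(content)
        if len(keys) > 1:
            problems.append((number, keys))
    return problems
```

Run on the manifest from the first post, it flags the `readgroup` line with the keys readgroup, data and technology; putting each of those on its own line, as in the second manifest further down the thread, makes the check pass.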
07-22-2014, 05:41 AM   #4
haudi
Junior Member
 
Location: Austria

Join Date: Jul 2014
Posts: 6

Thanks a lot for your help!

I tried it with:
Quote:
project = MyFirstAssembly
job = genome,denovo,accurate
parameters = COMMON_SETTINGS -NW:cmrnl=no SANGER_SETTINGS -CO:mrpg=5
readgroup = Sanger
data =xxxx.fastq xxxx2.fastq
technology = sanger
rename_prefix=HWI-ST330:422:C4AVHACXX clostraur
It has been running for 6 h now; I hope the result will be OK.

Silly question: to view the results, should I use gap4 or gap5, or is there a better program?

yours,
haudi
07-22-2014, 06:30 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Good luck

I personally convert MIRA version 4 output to SAM (using miraconvert) and then into a sorted, indexed BAM file using samtools (optionally with 'samtools depad'). Then you can use the BAM viewer of your choice; e.g. Tablet should show MIRA's contig annotation.
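The samtools half of that workflow can be scripted. The sketch below only builds and runs the sort and index commands; it assumes samtools 1.3 or newer on the PATH (where `samtools sort` takes `-o` and accepts SAM input) and leaves out both the MIRA-to-SAM conversion step and the optional `samtools depad`, since their exact options aren't given here. File names are placeholders.

```python
import shutil
import subprocess
from typing import List


def sam_to_indexed_bam_commands(sam_path: str, bam_path: str) -> List[List[str]]:
    """Build the samtools commands that turn a SAM file into a sorted, indexed BAM.

    Assumes samtools 1.3 or later. The optional 'samtools depad' step is not included.
    """
    return [
        ["samtools", "sort", "-o", bam_path, sam_path],  # coordinate-sort and write BAM
        ["samtools", "index", bam_path],                  # writes bam_path + ".bai"
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_commands(sam_to_indexed_bam_commands("assembly.sam", "assembly.sorted.bam"))` leaves `assembly.sorted.bam` and its `.bai` index side by side, ready to open in a BAM viewer.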

If you intend to edit your alignment, gap5 is probably the best choice.
07-22-2014, 11:09 PM   #6
haudi
Junior Member
 
Location: Austria

Join Date: Jul 2014
Posts: 6

I don't know the results yet. MIRA has now been running for 24 h and is using 85% of the RAM (40 GB). I thought that, given the small genome size, it wouldn't need that much.

OK, with gap5 I can edit the alignment. I think I'll have a look at it first; hopefully the alignment is good.

Edit: I will now test it on a cluster with 500 GB of RAM. Does anyone know how to tell MIRA how many CPUs it should use?

Last edited by haudi; 07-23-2014 at 12:31 AM.
07-23-2014, 02:05 AM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

You can set the number of threads in the MIRA v4 manifest, or at the command line, e.g. for eight threads use:

$ mira -t 8 my_manifest.txt

See http://mira-assembler.sourceforge.ne...ideToMIRA.html

Note that not all parts of MIRA take advantage of multiple threads.
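If MIRA is launched from a script, the thread count can be filled in from what the machine (or cluster job) actually provides. This minimal sketch only assembles the `mira -t N manifest` command shown above; `available_cpus` and `mira_command` are hypothetical helper names, and CPU detection uses Python's `os.sched_getaffinity` where available (Linux), otherwise `os.cpu_count`.

```python
import os
from typing import List, Optional


def available_cpus() -> int:
    """CPUs this process may use: respects CPU affinity on Linux, else all CPUs."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # sched_getaffinity is not available on every platform
        return os.cpu_count() or 1


def mira_command(manifest: str, threads: Optional[int] = None) -> List[str]:
    """Build 'mira -t N manifest', defaulting N to the CPUs available to this process."""
    if threads is None:
        threads = available_cpus()
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["mira", "-t", str(threads), manifest]
```

`subprocess.run(mira_command("my_manifest.txt", 8), check=True)` runs the same command as in the post. On a shared cluster node whose scheduler does not pin CPUs, the detected number may be the whole node, so passing `threads` explicitly is safer.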
07-24-2014, 12:00 PM   #8
lhon
Junior Member
 
Location: bay area

Join Date: Dec 2011
Posts: 5
PBcR

Hi, from the looks of it you probably have uncorrected PacBio reads as input, but MIRA 4.0 can only assemble PacBio reads that have gone through some kind of preassembly/correction. See here:

http://mira-assembler.sourceforge.ne...sect_pd_pacbio

To assemble from the subreads.fastq directly, I would suggest trying PBcR, a tool that is part of Celera Assembler:

http://wgs-assembler.sourceforge.net...index.php/PBcR

In particular, the 8.2 beta should let you comfortably assemble your genome on a single node using the MHAP algorithm.

The bas.h5 files would be required for polishing to get high consensus accuracy (by running through Quiver).
07-27-2014, 11:38 PM   #9
haudi
Junior Member
 
Location: Austria

Join Date: Jul 2014
Posts: 6

Thanks!
How do I know whether I have corrected or uncorrected reads?
07-29-2014, 10:10 AM   #10
lhon
Junior Member
 
Location: bay area

Join Date: Dec 2011
Posts: 5
subreads

One way to tell is that the uncorrected files will have the word "subreads" in the filename, such as filtered_subreads.fasta. A subread corresponds to a single pass across some or all of the physical insert.
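That filename rule is easy to apply to a whole directory listing. The sketch below is only as reliable as the naming convention: a renamed file will fool it, and a file without "subreads" in its name is not thereby proven to be corrected. The helper names are illustrative.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def looks_uncorrected(filename: str) -> bool:
    """Heuristic: raw PacBio subread files carry 'subreads' in their file name.

    Only the file name is inspected (not the directory), case-insensitively.
    """
    return "subreads" in Path(filename).name.lower()


def classify(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Split file names into likely-uncorrected subreads and everything else."""
    groups: Dict[str, List[str]] = {"subreads": [], "other": []}
    for name in filenames:
        groups["subreads" if looks_uncorrected(name) else "other"].append(name)
    return groups
```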
07-29-2014, 11:05 PM   #11
haudi
Junior Member
 
Location: Austria

Join Date: Jul 2014
Posts: 6

OK, I have just 2 subread files :-/ I searched but didn't find any program to convert them to corrected reads (pacBioToCA needs both long and short reads). Does anyone have a good solution for my problem?
My MIRA output folder has different MAF files. Which one is the right one: *.LargeContigs_out.maf or *.out.maf?
When I use Tablet to view my results, I mostly see no more than 2 aligned 'reads' (?) and over 2,200 contigs. The example data sets in Tablet have much higher coverage.

I also used Celera Assembler (runCA) to assemble my reads, and now I have .asm output. Can I use ca2ace.pl?
Thanks everyone!

Last edited by haudi; 07-30-2014 at 01:01 AM. Reason: additional information added
07-30-2014, 12:39 PM   #12
lhon
Junior Member
 
Location: bay area

Join Date: Dec 2011
Posts: 5
PBcR

Use PBcR as per my earlier post to correct and then assemble the subreads. The latest versions can do self-correction, which is equivalent to the preassembly step in HGAP.
08-01-2014, 12:08 AM   #13
haudi
Junior Member
 
Location: Austria

Join Date: Jul 2014
Posts: 6

Thanks again. Sorry for the number of questions, but I'm really new to the topic and don't know what is possible. I read through the MIRA manual, but how Celera, MIRA and the other programs fit together is still a little hard for me.

It worked, and I then ran MIRA with the 2 self-corrected read files. How can I influence (for example with the manifest file) the fact that I get lots of contigs (973), ranging in size from 2,300,000 bp down to 600 bp? Are contigs pieces that cannot be aligned?
I already know the genome size because it's Aeromonas salmonicida. How can I use a scaffold to tell MIRA how to align the contigs?
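One way to put numbers on how fragmented an assembly is: count the contigs, look at the size range, and compare the total length with the genome size you expect. This sketch assumes the contigs have been exported as FASTA and that any `*` characters are gap pads; `expected_genome_size` is a value you supply yourself, as none is given in the thread.

```python
from typing import Dict, List, Optional


def fasta_lengths(path: str) -> List[int]:
    """Return the length of each sequence in a FASTA file.

    '*' characters are not counted, on the assumption that they are gap pads
    (as in a padded contig export); an unpadded file is unaffected.
    """
    lengths: List[int] = []
    current: Optional[int] = None
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.replace("*", ""))
    if current is not None:
        lengths.append(current)
    return lengths


def contig_report(lengths: List[int], expected_genome_size: int) -> Dict[str, float]:
    """Summarise the contigs and compare them with an expected genome size in bp."""
    total = sum(lengths)
    largest = max(lengths, default=0)
    return {
        "contigs": len(lengths),
        "largest_bp": largest,
        "smallest_bp": min(lengths, default=0),
        "total_bp": total,
        "total_vs_genome": total / expected_genome_size,
        "largest_vs_genome": largest / expected_genome_size,
    }
```

The two ratios show how much of the expected genome the contigs as a whole, and the largest contig alone, account for.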

Last edited by haudi; 08-01-2014 at 01:51 AM. Reason: defined contig amount
All times are GMT -8.