
  • MIRA 4.0 denovo PacBio FastQ

    Hello!
    I'm new to the field of bioinformatics (I'm in the 3rd year of a Molecular Biology degree) and I'm currently doing an internship at a company.
    I got FASTQ and FASTA (PacBio) files and should do a de novo assembly (of Aeromonas salmonicida pectinolytica) with them. The files are 400 MB each and contain about 68,000 reads of 35-18,000 bases. I first tried the PacBio SMRT Analysis/SMRT Portal tool, but it needs bas.h5 data, which I don't have. So I am now using MIRA 4.0.

    Syntax:
    ./mira manifest.conf>log_assembly.txt
    Manifest:
    project = MyFirstAssembly
    job = genome,denovo,draft
    parameters = PCBIOHQ_SETTINGS -CO:mrpg=5
    readgroup = L4466_Track data = XX.fastq XX2.fastq technology = sanger
    segment_placement= FR
    output:
    On: Linux vk10464 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 18:00:34 UTC 2012 x86_64 GNU/Linux
    Compiled in boundtracking mode.
    Compiled in bugtracking mode.
    Compiled with ENABLE64 activated.
    Runtime settings (sorry, for debug):
    Size of size_t : 8
    Size of uint32 : 4
    Size of uint32_t: 4
    Size of uint64 : 8
    Size of uint64_t: 8
    Current system: Linux annapurna 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


    Fatal error (may be due to problems of the input data or parameters):

    ********************************************************************************
    * Oooops, the readgroup 'L4466_Track data = XX.fastq *
    * XX2.fastq technology=sanger' has no sequencing *
    * technology defined, nor is it defined as reference (which would excuse the *
    * missing technology definition). *
    ********************************************************************************
    ->Thrown: void ReadGroupLib::fillInSensibleDefaults(rgid_t libid)
    ->Caught: main

    Aborting process, probably due to error in the input data or parametrisation.
    Please check the output log for more information.
    For help, please write a mail to the mira talk mailing list.
    Subscribing / unsubscribing to mira talk, see: http://www.freelists.org/list/mira_talk

    CWD: /home/haudum/Project/Program/Mira/mira_4.0.2_linux-gnu_x86_64_static/bin
    Thank you for noticing that this is *NOT* a crash, but a
    controlled program stop.
    Your system seems to be older or have some quirks with locale settings.
    Using the LC_ALL=C workaround.
    If you don't want that, fix your system ;-)
    Failure, wrapped MIRA process aborted.
    But it fails every time. It sounds like MIRA doesn't recognize the technology. I also tried pcbiohq, which didn't work either!

    Thank you very much, everyone, for your help. I'm a real beginner in this topic.

  • #2
    I recommend two things:

    1. Try the mira list where they are very helpful: http://www.freelists.org/list/mira_talk

    2. Ask your PacBio sequencing provider for the metadata.xml, bas.h5 and bax.h5 files and run them through the SMRTportal.

    I'm sorry I cannot be more helpful but it's a start.



    • #3
      Your manifest as shown is missing some newlines: the sequencing technology of the read group should be on its own line, for example.
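To make the missing-newline problem concrete, here is a minimal Python sketch that flags manifest lines carrying more than one `key = value` pair. The function name `find_merged_lines` and the key list are assumptions for illustration; the keys are only those visible in the manifests posted in this thread, not a complete list from the MIRA documentation.

```python
import re

# Keys seen in the manifests posted in this thread (assumed, not exhaustive).
KNOWN_KEYS = ("project", "job", "parameters", "readgroup", "data",
              "technology", "segment_placement", "rename_prefix")

# Matches a known key followed by '=' anywhere in a line.
KEY_PATTERN = re.compile(r"\b(" + "|".join(KNOWN_KEYS) + r")\s*=")


def find_merged_lines(manifest_text):
    """Return (line_number, keys) for every line holding more than one key."""
    problems = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        keys = KEY_PATTERN.findall(line)
        if len(keys) > 1:
            problems.append((number, keys))
    return problems


# The manifest from the first post, with the read group squeezed onto one line.
broken = """project = MyFirstAssembly
job = genome,denovo,draft
readgroup = L4466_Track data = XX.fastq XX2.fastq technology = sanger
segment_placement= FR
"""

for number, keys in find_merged_lines(broken):
    print(f"line {number}: several keys on one line: {', '.join(keys)}")
```

Run on the manifest from the first post, this reports line 3, where `readgroup`, `data` and `technology` share one line. Splitting them onto separate lines removes the report.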



      • #4
        Thanks a lot for your help!

        I tried it with:
        project = MyFirstAssembly
        job = genome,denovo,accurate
        parameters = COMMON_SETTINGS -NW:cmrnl=no SANGER_SETTINGS -CO:mrpg=5
        readgroup = Sanger
        data =xxxx.fastq xxxx2.fastq
        technology = sanger
        rename_prefix=HWI-ST330:422:C4AVHACXX clostraur
        and it has been running for 6 h now... I hope the result is OK.

        Silly question: to view the results, should I use gap4 or gap5, or is there a better program?

        yours,
        haudi



        • #5
          Good luck

          I personally convert MIRA version 4 output to SAM (using mira_convert) and then into a sorted, indexed BAM file using samtools (optionally with 'samtools depad'). Then you can use the BAM viewer of your choice, e.g. Tablet should show MIRA's contig annotation.
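The SAM-to-BAM part of that workflow could be sketched as below. This only builds the samtools command lines; it assumes the SAM file already exists, leaves out the MIRA conversion and the optional depad step, and uses hypothetical file names. The `sort -o` form is the samtools 1.x syntax; older 0.1.x releases expected an output prefix instead, so check the installed version.

```python
import shlex


def bam_commands(sam_path, prefix):
    """Build the samtools calls that turn a SAM file into a sorted, indexed BAM."""
    unsorted_bam = f"{prefix}.unsorted.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Convert SAM text to binary BAM.
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],
        # Sort by coordinate (samtools 1.x syntax).
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        # Write the .bai index next to the sorted BAM.
        ["samtools", "index", sorted_bam],
    ]


# Print the commands for review; nothing is executed here.
for command in bam_commands("assembly.sam", "assembly"):
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the commands look right for the local samtools installation.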

          If you intend to edit your alignment, gap5 is probably the best choice.



          • #6
            I don't know the results yet. MIRA has now been running for 24 h and is taking up 85% (= 40 GB) of RAM. I thought that, due to the small genome size, it wouldn't need that much.

            OK, with gap5 I can edit the alignment. I think I'll have a look at it first, and hopefully the alignment is good.

            Edit: I will now test it on a cluster with 500 GB of RAM. Does anyone know how to tell MIRA how many CPUs it should use?
            Last edited by haudi; 07-23-2014, 12:31 AM.



            • #7
              You can set the number of threads in the MIRA v4 manifest, or at the command line, e.g. for eight threads use:

              $ mira -t 8 my_manifest.txt

              See http://mira-assembler.sourceforge.ne...ideToMIRA.html

              Note that not all parts of MIRA take advantage of multiple threads.



              • #8
                PBcR

                Hi, from the looks of it, you probably have uncorrected PacBio reads as input, but MIRA 4.0 can only assemble PacBio reads that have gone through some kind of preassembly/correction. See here:



                To assemble from the subreads.fastq directly, I would suggest trying PBcR, a tool that is part of Celera Assembler:



                In particular, the 8.2 beta should let you comfortably assemble your genome on a single node using the MHAP algorithm.

                The bas.h5 files would be required for polishing to get high consensus accuracy (by running through Quiver).



                • #9
                  Thanks!
                  How do I know if I have corrected or uncorrected reads?



                  • #10
                    subreads

                    One way to tell is that the uncorrected files will have the word "subreads" in the filename, such as filtered_subreads.fasta. A subread corresponds to a single pass across some or all of the physical insert.
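As a small illustration of that filename rule, here is a Python sketch. The helper name `looks_uncorrected` and the second example filename are hypothetical, and the check is only the heuristic described above: a file without "subreads" in its name is not thereby proven to be corrected.

```python
from pathlib import Path


def looks_uncorrected(filename):
    """Heuristic from this thread: 'subreads' in the file name suggests raw subreads."""
    return "subreads" in Path(filename).name.lower()


for name in ("filtered_subreads.fasta", "corrected_reads.fastq"):
    label = "probably uncorrected" if looks_uncorrected(name) else "no 'subreads' marker"
    print(f"{name}: {label}")
```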



                    • #11
                      OK, I have just 2 subread files :-/ I searched but didn't find any program to convert them to corrected reads (pacBioToCA needs long and short reads). Does anyone have a good solution for my problem?
                      My MIRA output folder has different MAF files. Which one is the right one: *.LargeContigs_out.maf or *.out.maf?
                      When I use Tablet to show my results, I have hardly more than 2 aligned 'reads'(?) and over 2200 contigs. The examples from Tablet have a much higher rate.

                      I also used Celera (runCA) to assemble my reads. Now I have asm data. Can I use ca2ace.pl?
                      Thanks, everyone!
                      Last edited by haudi; 07-30-2014, 01:01 AM. Reason: additional information added



                      • #12
                        PBcR

                        Use PBcR as per my earlier post to correct and then assemble the subreads. The latest versions can do self-correction, which is equivalent to the preassembly step in HGAP.



                        • #13
                          Thanks again. Sorry for the number of questions, but I'm really new to the topic and don't know what is possible. I read through the MIRA manual, but the connection between Celera, MIRA and other programs is still a little hard to follow.

                          It worked, and I have now run MIRA with the 2 self-corrected read files. How can I influence (for example via the manifest file) the fact that I get lots of contigs (973), ranging in size from 2,300,000 bp down to 600 bp? Are contigs pieces which cannot be aligned?
                          I already know the genome size because it's from Aeromonas salmonicida. How can I use a scaffold to tell MIRA to align the contigs?
                          Last edited by haudi; 08-01-2014, 01:51 AM. Reason: defined contig amount
