  • Assembling .sff files from 454 and finishing

    Hi, can anybody suggest good assembly programs, other than Newbler and MIRA, that can take .sff files directly as input rather than FASTA?

    Also, I have generated an .ace file from Newbler which is not fully compatible with consed (I can open the file in consed, but for some reason the contig numbers look different). Could anybody suggest good programs that I can use to finish a 454-generated genome? Something that will allow me to view the scaffolds and join or break them where needed.
    I've tried consed and Staden; any others would be greatly appreciated!


    Thanks in advance!

    Raj

  • #2
    ...I was informed yesterday that the new version of consed (v18) should now be fully compatible with 454 data.
    The proposed release of Gap5 should also resolve the incompatibility issues that many programs seem to have when trying to finish 454-generated data.

    MIRA and Newbler seem to be the best options for assembling 454 data, since they can take full advantage of the paired-end data.

    Finishing is still the bottleneck, which I hope the new versions of consed and Gap can resolve...

    • #3
      Yes, consed 18 has been out for a few weeks; you need to update phrap as well.
      I did not have any problems with installation (32-bit Fedora 10).

      Anyway, it does not perform de novo assembly of 454 reads, right? However, it reads Newbler .sff files and allows you to assemble 454 reads against a reference sequence.

      Please correct me if I am wrong...

      • #4
        ...and it can directly read Newbler-created ace files. So if you like Newbler, no problem.
        Maybe it's a good starting point for finishing a (shotgun) project if there is no Sanger
        backbone.

        A good alternative might be MIRA, which writes a CAF file (which can easily be converted
        to gap4). But gap4 might slow down if you have a huge dataset...

        For larger assemblies you might want to have a look at Celera Assembler, which in our
        hands does a good job with Sanger/454 (FLX) hybrid assemblies in the bacterial genome
        size range.

        Just my 2p,
        Sven

        • #5
          assembly issues

          Has anyone assembled 454 data with consed package version 19? I'm having some issues with reading the .sff files and am wondering if anyone has completed an assembly of 454 data (not using .ace files produced by the Roche software). I'm using "add454Reads.perl reference.ace sff.fof reference.fa", where the fof specifies the location and names of the sff files to assemble. Although the script runs, I get an error "doesn't existile /shared/BNFinal/mapping/consed/sff_dir/FPDLD6P02.sff", and the 454 reads are not brought into the assembly; it basically assembles with only the reference sequence. Someone mentioned needing to update phrap, which I will look into, but any other thoughts on this?
          Thanks,
          Liz

          • #6
            Hi Liz,

            Well, it seems that there is no /shared/BNFinal/mapping/consed/sff_dir/FPDLD6P02.sff. Have you checked the location of your SFF file(s)?

            You should update to the current version of phrap, since cross_match is updated along with it. Phrap itself is not involved in aligning 454 reads against your refseq; cross_match is used for that.
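
One way to run that location check is sketched below in Python. It is only an illustration and not part of consed: the function name `check_fof` and the `search_dirs` argument are made up for this example. It assumes the fof is a plain text file with one file name per line, and it reports hidden characters (carriage returns, stray whitespace, non-ASCII characters) as well as entries that cannot be found. Whether any of these is behind the garbled message above is not something the sketch can confirm.

```python
from pathlib import Path


def check_fof(fof_path, search_dirs=(".",)):
    """Report hidden characters and missing files for each entry in a fof.

    Hypothetical helper, not a consed tool. Returns a list of
    (cleaned_name, problems) tuples, one per non-empty line.
    """
    report = []
    # Read bytes so that carriage returns are not silently translated away.
    for raw_line in Path(fof_path).read_bytes().split(b"\n"):
        if not raw_line.strip():
            continue  # skip blank lines
        text = raw_line.decode("utf-8", errors="replace")
        name = text.strip()
        problems = []
        if "\r" in text:
            problems.append("carriage return")
        without_cr = text.replace("\r", "")
        if without_cr != without_cr.strip():
            problems.append("leading or trailing whitespace")
        if not name.isascii():
            problems.append("non-ASCII character")
        # An entry counts as found if it is a regular file in any search dir
        # (an absolute path in the fof is checked as-is).
        if not any((Path(d) / name).is_file() for d in search_dirs):
            problems.append("not found")
        report.append((name, problems))
    return report
```

Run from the directory that holds the fof, a call such as `check_fof("sff.fof", search_dirs=[".", "../sff_dir"])` lists each entry with whatever problems were detected; the directory names here are examples and should be replaced with your own layout.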

            cheers,
            Sven

            • #7
              Hi Sven, thanks for the post. I checked that a few times to make sure I'm not going crazy, and yes, the sff file is where I specified in the fof. Here are the steps I'm following. Any help much appreciated:

              1. Ran gsMapper (through the UI) using the option to create a complete consed folder.

              2. Deleted the .consedrc file that Newbler created in edit_dir (per v19 instructions).

              3. Deleted the phd.ball link in edit_dir (per v19 instructions).

              4. Checked that the current version of sff2scf is the one being used: typing "sff2scf -v" gives "080714".

              5. Created an .ace file from the appropriate FASTA-format reference sequence: fasta2Ace.perl reference.fa

              6. Created an sff.fof containing the names of the appropriate sff files. I used a single .sff file, so the sff.fof contains only the name of the .sff file “ FMAAUWB12.sff “; no path etc. The sff.fof file is located in edit_dir, and from there the FMAAUWB12.sff file is in ../sff_dir.

              7. Added the reads; from the edit_dir directory, ran: add454Reads.perl reference.ace sff.fof reference.fa

              8. Got:
              doesn't existile FMAAUWB12.sff
              0.0 minutes to until done with alignments
              now using alignments to add reads to ace file
              executing: /usr/local/genome/bin/consed -ace reference.ace -addReads alignmentFiles090603_134426.fof -chem 454
              -addReads will be run.
              no ~/.consedrc file so no user resources will be used--that's ok
              no ./.consedrc file so no project-specific resources--that's ok
              couldn't open readOrder.txt--that's ok
              50% done. 1 reads read so far...
              Now setting quality values
              opening ../phdball_dir/phd.ball.1
              read phd files in ../phdball_dir/phd.ball.1 found: 1 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 2 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 3 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 4 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 5 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 6 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 7 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 8 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 9 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 1000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 2000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 3000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 4000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 5000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 6000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 7000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 8000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 9000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 10,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 20,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 30,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 40,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 50,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 60,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 70,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 80,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 90,000 totals: used: 0 need: 1
              read phd files in ../phdball_dir/phd.ball.1 found: 100,000 totals: used: 0 need: 1
              Number of phd blocks used from ../phdball_dir/phd.ball.1: 0
              exception thrown: RatReninRegion has no phd file

              ace file: RatReninRegion.ace
              Version 19.0 (090206)
              RatReninRegion has no phd file

              Version 19.0 (090206)
              ace file: RatReninRegion.ace
              Number of individual phd files read: 0
              Total reads in assembly: 1
              Finished setting quality values in 3 seconds
              total errors on consed startup: 1
              now saving assembly... 3
              writing ./RatReninRegion.ace.1
              See new ace file RatReninRegion.ace.1
              done 0
              0.0 minutes cross_match and fasta time
              0.1 minutes consed time
              0.1 minutes total time

              Again, any assistance much appreciated,
              Liz
