
  • multiple read mapping with the same read set with newbler

    Hi everyone,

    I'm using newbler to map around 2.25 million reads to around 20,000 different references. As this takes a lot of computing time, I want to cut down the steps performed for each mapping, in particular the indexing step. Since I'm using the same set of reads for every mapping, I was wondering whether it is possible to save the index built during the first mapping and reuse it for the following ones instead of recomputing it each time. I already save some time by reusing the trimmed reads from the first mapping, but it still takes a lot of time. If there are other ways to make newbler faster, I'd be glad to hear about them!

    Vince

  • #2
    This may be possible. If you set up newbler not using runMapping but with
    newMapping
    addRun
    setRef
    runProject

    then you may be able to copy the relevant files afterwards, run
    removeRun
    addRun
    setRef (again) for the next reference. But I've never tried this. See http://contig.wordpress.com/2010/06/...novo-assembly/ for an explanation tailored towards runAssembly (I haven't described runMapping on my blog...)

    Alternatively, if setRef cannot be used more than once, you may need to edit some xml file to point to the new reference...
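The command sequence above can be sketched as a dry run in Python. This is only an illustration: the argument layout (project directory first, then the read or reference file) is an assumption that should be checked against the newbler manual, and `mapping_commands`, the project name, and the file names are hypothetical. It also assumes that setRef can simply be called again for each new reference, which the post above says is untested.

```python
def mapping_commands(project_dir, reads, references):
    """Build the newbler command lines for one read set and many references.

    Nothing is executed here; the function only returns argument lists.
    The argument order for each command is an assumption, not taken
    from the newbler documentation.
    """
    project = str(project_dir)
    commands = [
        ["newMapping", project],          # create the mapping project once
        ["addRun", project, str(reads)],  # add the read set once
    ]
    for ref in references:
        # Point the existing project at the next reference and rerun it.
        commands.append(["setRef", project, str(ref)])
        commands.append(["runProject", project])
        # In a real run, the result files would have to be copied out of
        # the project directory here, before the next setRef.
    return commands


if __name__ == "__main__":
    for cmd in mapping_commands(
        "mapping_project", "reads.sff", ["ref_0001.fasta", "ref_0002.fasta"]
    ):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the argument order has been verified on a small test project.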



    • #3
      I tried a few things using newMapping, but none of them seemed to work.

      removeRun doesn't work on the reference sequence, but calling setRef again did the job. However, the read indexing step is still performed and takes a lot of time.

      I tried trimming my reads in the first mapping and using those trimmed reads with the -notrim option, but the indexing step was still there.

      I tried placing the previously made 454TrimStatus.txt in the folder of my new mapping, but it was overwritten. I did this because this file is the only one (apart from 454NewblerProgress.txt) that is edited during the indexing step (I checked the hidden files too).

      I wonder if it's possible to tell newbler to use a previously made 454TrimStatus.txt instead of making a new one for each mapping. As I'm always using the same set of reads, the indexing should always give the same result...



      • #4
        I know it may look strange to do all those mappings with the same set of reads and thousands of different references. What I really want to do is correct PacBio sequencing reads using newbler. I already used PacBioToCA and it did a great job, but newbler has always given me better assemblies than any other assembler I have used. So I want to map my Illumina reads to my PacBio reads to correct them, and see whether those corrected reads give me better assemblies than the ones from PacBioToCA. I need complete genome sequences, and I want to close as many gaps as possible in my assemblies before sequencing the remaining gaps the hard way with Sanger reads.



        • #5
          I like your idea... :-)

          What about this strategy:
          - map all Illumina reads to all PacBio reads using blasr from PacBio (https://github.com/PacificBiosciences/blasr) in one go; make sure to get at least the top X hits, where X is something like 2x your coverage in PacBio reads (each sequence from your reference will on average be represented as many times in the PacBio reads as your coverage of them)
          - make separate files with the hits for each PacBio read (parsing the sam file)
          - do a separate newbler mapping for each PacBio read with the Illumina reads that mapped to it using blasr

          Let me know if this works, it actually may!
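The second step of this strategy (one set of hits per PacBio read) could look roughly like the sketch below. It assumes the Illumina reads are the queries and the PacBio reads are the references in the SAM output, and it relies only on the standard SAM columns (QNAME, FLAG, RNAME) and the 0x4 "unmapped" flag bit. The function name `reads_by_reference` is made up for this example.

```python
from collections import defaultdict


def reads_by_reference(sam_lines):
    """Group query (Illumina) read names by the reference (PacBio read) they hit.

    sam_lines is any iterable of SAM text lines, e.g. an open file.
    Returns a dict mapping reference name -> set of query names.
    """
    hits = defaultdict(set)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        hits[rname].add(qname)
    return hits


if __name__ == "__main__":
    example = [
        "@SQ\tSN:pb1\tLN:1000",
        "r1\t0\tpb1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
        "r1\t256\tpb2\t40\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r3\t16\tpb1\t90\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    ]
    for ref, names in sorted(reads_by_reference(example).items()):
        print(ref, sorted(names))
```

The read names collected for each PacBio read can then be used to pull the matching Illumina reads into a separate file for that read's newbler mapping.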



          • #6
            Thanks for the help! In the end I parallelized the mappings by splitting them into 500 batches and ran them on a supercomputer. It still took around an entire day of computing, but it worked, and I was able to keep a lot more data than with PacBioToCA. The corrected reads are a little shorter, because some reads were split where their coverage was low, but overall I kept more bases and got better draft assemblies from these corrected reads than from the PacBioToCA ones!
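For anyone wanting to do the same, splitting the references into batches could be sketched like this. It assumes "500 batches" means 500 roughly equal groups of references, each submitted as its own job. `split_into_batches` and the file names are hypothetical, and how each batch is submitted depends on the cluster's scheduler.

```python
def split_into_batches(items, n_batches):
    """Split items into at most n_batches contiguous, near-equal batches."""
    items = list(items)
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    size, extra = divmod(len(items), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches get one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty batches when there are few items
            batches.append(items[start:end])
        start = end
    return batches


if __name__ == "__main__":
    references = [f"ref_{i:05d}.fasta" for i in range(20000)]
    batches = split_into_batches(references, 500)
    print(len(batches), "batches,", len(batches[0]), "references in the first one")
```

Each batch can then be mapped by an independent job, so the per-mapping indexing cost is still paid but spread over many nodes.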



            • #7
              Looks like I need to give this a try...

              Now that you have your draft assembly, if you have something like 5-6x coverage in uncorrected PacBio reads, you could polish the bases in your contigs/scaffolds using Quiver...
