
  • NextSeq - too much data for BaseSpace and any de novo assemblers

    Dear all,
    this is my first post, and it will probably show how new I am to this.
    So far I have prepared a few WES runs on a NextSeq machine and analysed them with the FASTQ / Isaac Enrichment workflow. But that is not my problem right now.

    Recently I prepared an additional group of 4 samples (bacterial genomes, estimated size near 4 Mbp). I used the Nextera XT kit for library preparation and calculated the pooling so that these libraries would make up only 1% of a human WES run in high-output mode.
    A mistake happened, and my bacterial genomes got much more data than I had planned. Each sample gives me nearly 5 Gb of data after FASTQ generation. I imported them into BaseSpace (the cloud version; I also have BaseSpace Onsite) because I wanted to use the free de novo assembly apps there (like Velvet or SPAdes). Unfortunately, these apps in BaseSpace cannot handle that much data (Velvet needs coverage below 100, and SPAdes needs below 1G of bases).
    Any suggestions?
    I ran an analysis with another app (from DNAStar), and two of my bacterial DNA samples are similar to Bacillus cereus.
    In that case I would like to perform a resequencing analysis too. What can you suggest?
    Regards,
    L.
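To put the numbers in the post side by side, here is a rough back-of-the-envelope sketch. It assumes that "5 Gb" means about 5 gigabases per sample (it could also mean file size, in which case the figures change) and uses the estimated 4 Mbp genome size. The function names are made up for illustration.

```python
def estimated_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage = sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_for_coverage(target_coverage: float, genome_size: float) -> float:
    """Number of bases needed to reach a given average coverage."""
    return target_coverage * genome_size


# Assumed inputs taken from the post: ~5 gigabases per sample, ~4 Mbp genome.
TOTAL_BASES = 5e9
GENOME_SIZE = 4e6

coverage = estimated_coverage(TOTAL_BASES, GENOME_SIZE)
print(f"Estimated coverage: {coverage:.0f}x")

# Limits quoted in the post for the BaseSpace apps.
velvet_max_bases = bases_for_coverage(100, GENOME_SIZE)  # coverage below 100
spades_max_bases = 1e9                                   # below 1G of bases
print(f"Velvet app limit: about {velvet_max_bases:.0e} bases")
print(f"SPAdes app limit: about {spades_max_bases / GENOME_SIZE:.0f}x coverage")
```

Under these assumptions each sample is over the quoted limits by roughly an order of magnitude for Velvet and about fivefold for SPAdes.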

  • #2
    You can subsample or normalize to reduce the data volume. Error correction also decreases data volume in k-mer space, and NextSeq has a high error rate, so that may be worthwhile. Velvet, SPAdes, or whatever assembler you pick will generally run much faster and use less memory once you have fewer reads with fewer errors. Those are free whether you use them through BaseSpace or not.

    As for analysis... what kind of analysis are you trying to do?
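The subsampling idea above can be sketched in plain Python. This is only a minimal illustration, not the method of any particular tool: it assumes standard 4-line FASTQ records and paired files whose reads are in the same order, and all function names are hypothetical. Dedicated tools do this faster and also offer normalization and error correction.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, ...]  # header, sequence, '+', qualities


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes no wrapped sequence lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if len(chunk) < 4:
            return
        yield tuple(line.rstrip("\n") for line in chunk)


def keep_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep to go from current to target coverage."""
    return min(1.0, target_coverage / current_coverage)


def subsample_pairs(
    r1_lines: Iterable[str],
    r2_lines: Iterable[str],
    fraction: float,
    seed: int = 0,
) -> Iterator[Tuple[Record, Record]]:
    """Keep each read pair with probability `fraction`.

    One random draw is made per pair, so R1 and R2 stay in sync.
    """
    rng = random.Random(seed)
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if rng.random() < fraction:
            yield rec1, rec2


def write_record(handle, record: Record) -> None:
    """Write one FASTQ record back out as four lines."""
    handle.write("\n".join(record) + "\n")
```

For example, going from an assumed 1250x down to 80x means `keep_fraction(1250, 80)`, i.e. keeping about 6% of the pairs. The input files can be opened with `gzip.open(path, "rt")` and passed in directly, since file handles iterate over lines.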



    • #3
      @leiga: Are you comfortable working outside a GUI, on the command line? At some point there are going to be limitations on what you can do inside BaseSpace.

      Is your "BaseSpace Onsite" install running on limited hardware? I am a bit surprised you are having trouble with 5Gb of data, since the standard configuration included in this document looks decent: http://res.illumina.com/documents/pr...ace-onsite.pdf



      • #4
        @GenoMax
        Yes, you are right. I think there is no data limit on the "on site" platform from Illumina, but at the same time there are not many applications there. That's why I uploaded the data to the online version of BaseSpace. There are many more apps there, but with the limitation mentioned before.

        Bacterial genome - de novo sequencing and resequencing.
        I think I have made some progress. I am now using newbie-friendly software (UGENE).
        I chose the bowtie2 alignment engine, chose a RefSeq reference (probably similar to the sequenced bacteria), and finally, yes, I have the sequence of "my bacteria". However...
        1. Somewhere I lost the data about possible plasmid inclusions (when I performed de novo sequencing I found that some parts of the examined genome were very similar to plasmids that exist in Bacillus sp.). Do you know how I can save/retrieve this kind of information?
        2. Which engine for automated annotation can you recommend? I have the list of CDS in the RefSeq reference I chose, but I am not sure how I can use it in this case. For now I am waiting for a response from the BASys system for annotation of bacterial genomes.
        Last edited by leiga; 01-26-2015, 01:50 PM.

