
  • Complete assemblies with raw data

    I'm looking for a few public data sets of genomic and transcriptomic assemblies (preferably complete) where the source reads for the assemblies are available for download. The more the better.

    I've been trying to navigate NCBI's and EBI's websites with little success. Wherever I look, it's always one or the other: the reads or the assembly, never both.

    Would appreciate any nudge in the correct direction!
    Thanks

  • #2
    Your best bet is to find the papers you like and then backtrack and find the datasets at SRA using the SRA/GEO accession numbers.
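
    A minimal sketch of the backtracking step, assuming you have a paper's text (for example its data availability statement) as a plain string. The regular expression is my own approximation of common SRA/ENA/DDBJ and GEO accession shapes, and `find_accessions` and the sample text are hypothetical, so treat this as a starting point rather than an official validator.

```python
import re

# Assumed accession shapes (an approximation, not an official grammar):
#   SRA/ENA/DDBJ: SRP/SRX/SRR/SRS..., ERR..., DRR... followed by 6+ digits
#   GEO:          GSE... (series) or GSM... (sample) followed by digits
ACCESSION_RE = re.compile(r"\b(?:[SED]R[APRSXZ]\d{6,}|GS[EM]\d+)\b")


def find_accessions(text):
    """Return unique accession-like tokens in order of first appearance."""
    seen = []
    for token in ACCESSION_RE.findall(text):
        if token not in seen:
            seen.append(token)
    return seen


# Made-up example text; the accession numbers are placeholders.
sample = (
    "Reads were deposited under SRP012345 (runs SRR1234567 and SRR1234568); "
    "expression data: GSE54321. See also SRR1234567."
)
print(find_accessions(sample))
```

    The extracted accessions can then be pasted into the SRA or GEO search box to reach the raw reads.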



    • #3
      I was afraid of that, though it never hurt anyone to comb through quality papers. I appreciate the answer.

      Having said that, from a general point of view, wouldn't it make sense for the databases to link the reads and the assemblies?



      • #4
        Look at the first three datasets here: http://www.ncbi.nlm.nih.gov/sra/?term=platinum

        These are the "platinum" genomes that Illumina made available for Coriell samples.

        Are you looking for a specific genome? Otherwise, a query like this brings up many datasets: http://www.ncbi.nlm.nih.gov/sra/?ter...ptome+assembly
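
        The same kind of SRA search can be scripted. This is a rough sketch that assumes NCBI's E-utilities `esearch` endpoint with `db=sra` and a JSON response; the helper names `build_sra_search_url` and `search_sra` are my own, and only the "platinum" term comes from the link above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (assumed here as the scripted
# counterpart of the web search box).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(term, retmax=20):
    """Build an esearch URL that queries the SRA database for `term`."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def search_sra(term, retmax=20):
    """Run the search and return the list of matching SRA record IDs.

    Needs network access; not called in this sketch.
    """
    with urlopen(build_sra_search_url(term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


print(build_sra_search_url("platinum"))
```

        Calling `search_sra("platinum")` should return internal record IDs, which still have to be resolved to run accessions before downloading reads.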



        • #5
          Thanks. While those read files look solid, I was hoping to find such files along with de novo assemblies made from them (am I not seeing them?).

          Why? I'm looking to assess the effects of raw data quality and characteristics on assembly results: reproducing and comparing assemblies made with different preprocessing and assembly methods to gauge their overall quality and differences. While that can be done without looking at previous assemblies, I'd find it more reassuring to have them, especially since they often include manual gap filling etc.



          • #6
            It's much simpler to study these things in the context of lower organisms, such as bacteria. Or, for the more aggressive... unicellular haploid eukaryotes. Then diploids such as small plants and animals.

            You can get a lot of raw data at JGI's MycoCosm, Phytozome, and other places on the website. Unfortunately we generally don't study animals, but there should be a lot of raw C. elegans and Drosophila data floating around.

            To clarify, studying the effects of data quality and so forth on assembly is easiest in the context of low-repeat haploids, which means bacteria. You can also do it for low-het-rate diploids. The smaller the genome, the better.
            Last edited by Brian Bushnell; 11-14-2015, 10:57 PM.



            • #7
              Thanks, and right, fungal and bacterial haploids would be more than enough.

              I'm not sure how to navigate JGI's website; I get 404s when I try to access the data for e.g. Amaranthus hypochondriacus.

              I'm afraid that for the model organisms, any assembly made would have benefited from earlier ones, and I'd prefer not to retrace the complexity involved in mapping assemblies. On the other hand, those reads would be well suited to treatment without comparisons to previous (direct) assemblies.



              • #8
                It is probably going to be difficult to find both the raw data and the assemblies in public databases. Some people may submit both, but most probably submit only the raw data, since that is all the journals require.

                Another option could be to find the raw data/published papers that go with it and then ask the authors directly if they can share the assembly, if you can't find it in a public resource.
