  • Complete assemblies with raw data

    I'm looking for a few public data sets of genomic and transcriptomic assemblies (preferably complete) where the source reads for the assemblies are available for download. The more the better.

    I've been trying to navigate NCBI's and EBI's websites with little success. Wherever I look, it's always one or the other: the assembly or the reads, never both.

    Would appreciate any nudge in the correct direction!
    Thanks

  • #2
    Your best bet is to find the papers you like, then backtrack to the datasets in SRA using the SRA/GEO accession numbers the papers cite.

    • #3
      Was afraid of that, though it never hurt anyone to comb through quality papers. Appreciate the answer.

      Having said that, from a general point of view, wouldn't it make sense to link assemblies and their source reads in the databases?

      • #4
        Look at the first three datasets here: http://www.ncbi.nlm.nih.gov/sra/?term=platinum

        These are the "Platinum" genomes that Illumina made available for Coriell samples.

        Are you looking for a specific genome? Otherwise, a query like this brings up many datasets: http://www.ncbi.nlm.nih.gov/sra/?ter...ptome+assembly
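The search-then-backtrack approach in this thread can also be scripted. A minimal sketch, assuming NCBI's E-utilities service (ESearch to find record IDs, ELink to follow cross-database links) with JSON output; the helper names are hypothetical, and whether an SRA record is actually linked to any assembly depends entirely on what the submitters deposited.

```python
import json
import urllib.parse
import urllib.request

# Base URL of NCBI's E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL returning the UIDs of records in `db` matching `term`."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def elink_url(dbfrom: str, db: str, uids: list[str]) -> str:
    """Build an ELink URL asking which records in `db` link to `uids` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(uids), "retmode": "json"}
    return EUTILS + "elink.fcgi?" + urllib.parse.urlencode(params)


def esearch_ids(payload: str) -> list[str]:
    """Extract the UID list from an ESearch JSON response."""
    return json.loads(payload)["esearchresult"]["idlist"]


def elink_ids(payload: str) -> list[str]:
    """Collect linked UIDs from an ELink JSON response (empty if nothing is linked)."""
    found: list[str] = []
    for linkset in json.loads(payload).get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            found.extend(linksetdb.get("links", []))
    return found


def fetch(url: str) -> str:
    """Download a URL and return the body as text (performs a network request)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

A possible chain is `esearch_ids(fetch(esearch_url("sra", "platinum")))`, then `elink_url("sra", "bioproject", ...)`, then `elink_url("bioproject", "assembly", ...)`; an empty result at the last step is exactly the "reads but no assembly" situation described above. NCBI asks that clients without an API key stay under three requests per second.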

        • #5
          Thanks. While those read files look solid, I was hoping to find such files along with de novo assemblies made from them (or am I just not seeing them?).

          Why? I'm looking to assess how raw data quality and characteristics affect assembly results, by reproducing and comparing assemblies made with different preprocessing and assembly methods and assessing their overall quality and differences. While that can be done without looking at previously published assemblies, I'd find it more reassuring to compare against them, especially since they often include manual gap filling etc.
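Comparing assemblies built from the same reads needs at least some per-assembly summary numbers. A minimal sketch, not taken from the thread: it reads contig lengths from FASTA text and reports count, total size, longest contig, and N50 (one common contiguity metric, chosen here as an illustrative assumption). The function names are hypothetical.

```python
def contig_lengths(fasta_text: str) -> list[int]:
    """Return the length of each sequence in FASTA-formatted text."""
    lengths: list[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def summarize(lengths: list[int]) -> dict[str, int]:
    """Basic contiguity summary for comparing assemblies of the same reads."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }
```

Running the same summary over a published assembly and over reassemblies of its reads gives a like-for-like contiguity comparison; it says nothing about correctness, which needs alignment against a reference or the published assembly.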

          • #6
            It's much simpler to study these things in the context of lower organisms, such as bacteria. Or, for the more aggressive... unicellular haploid eukaryotes. Then diploids such as small plants and animals.

            You can get a lot of raw data at JGI's MycoCosm, Phytozome, and other places on the website. Unfortunately we generally don't study animals, but there should be a lot of raw C. elegans and Drosophila data floating around.

            To clarify, studying the effects of data quality and so forth on assembly is easiest in the context of low-repeat haploids, which means bacteria. You can also do it for low-het-rate diploids. The smaller the genome, the better.
            Last edited by Brian Bushnell; 11-14-2015, 10:57 PM.

            • #7
              Thanks, and right, fungal and bacterial haploids would be more than enough.

              Not sure how to navigate JGI's website; it returns 404s when I try to access the data for, e.g., Amaranthus hypochondriacus.

              I'm afraid that for the model organisms, any assembly would have benefited from earlier ones, and I'd prefer not to retrace the complexity involved in mapping assemblies. On the other hand, those reads would be well suited to analyses that don't compare against previous (direct) assemblies.

              • #8
                It is probably going to be difficult to find both the raw data and the assemblies in public databases. Some people may submit both but most probably only submit the raw data since that is all the journals require.

                If you can't find the assembly in a public resource, another option is to find the raw data and the published papers that go with it, then ask the authors directly whether they can share the assembly.
