  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    Sample short read data set?

    Anyone have a sample data set that they'd be willing to share? I'll host!
  • apfejes
    Senior Member
    • Feb 2008
    • 236

    #2
    A sample of what type of data?
    The more you know, the more you know you don't know. —Aristotle

    • ECO
      --Site Admin--
      • Oct 2007
      • 1360

      #3
      Ideally for my application, short-read genomic data (i.e., Solexa/ABI). I'm playing around with some of the software packages listed in this forum and need some data!

      • apfejes
        Senior Member
        • Feb 2008
        • 236

        #4
        Sorry, I should have been clearer in my message: what experiment type? ChIP-Seq, Genome shotgun, Transcriptome shotgun, etc. Getting generic data is easy - getting results for a particular type of data may not be.

        • ECO
          --Site Admin--
          • Oct 2007
          • 1360

          #5
          Genome shotgun, or better yet amplicon-enriched genomic reads, i.e., not transcriptome data or ChIP.

          Actually, now that I think about it, it would be nice to have sample data sets for all applications, but I only have 6TB of bandwidth per month!

          • apfejes
            Senior Member
            • Feb 2008
            • 236

            #6
            Hrm.. I'm not sure we have many (good quality) genome shotgun data sets kicking around that I'd be able to get permission to release. At least, I personally don't have any, yet. I'll poke around, though, and maybe I can find something for you. If you don't mind poor quality, just for playing around, that might be feasible.

            • ECO
              --Site Admin--
              • Oct 2007
              • 1360

              #7
              That would be great. I'm not concerned with base quality or recalling, etc. I'm working on analysis platforms...and it's no fun to just randomly generate short reads.

              Let me know!
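
For anyone who needs placeholder input in the meantime, the random generation mentioned above is easy to sketch. This is only a minimal illustration, assuming a made-up random reference and a hypothetical helper named `simulate_reads`; it samples error-free, fixed-length substrings and does not model any real instrument's error profile or output format.

```python
import random


def simulate_reads(reference, read_length=36, n_reads=5, seed=0):
    """Sample fixed-length reads uniformly from a reference string.

    Hypothetical helper for illustration only: no sequencing errors,
    no quality scores, forward strand only.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)  # seeded so runs are repeatable
    reads = []
    for _ in range(n_reads):
        # Pick a start position that leaves room for a full-length read.
        start = rng.randrange(len(reference) - read_length + 1)
        reads.append((start, reference[start:start + read_length]))
    return reads


# Assumption: a 1000-base random sequence stands in for a real genome.
ref_rng = random.Random(1)
reference = "".join(ref_rng.choice("ACGT") for _ in range(1000))

reads = simulate_reads(reference, read_length=36, n_reads=5)
for start, seq in reads:
    print(start, seq)
```

Reads produced this way have no biological structure, which is why real data is still preferable for testing an analysis platform.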

              • apfejes
                Senior Member
                • Feb 2008
                • 236

                #8
                Hi ECO,

                I brought this up yesterday, and was told that there is no point in making any data available. Supposedly there are open repositories (NCBI?) collecting and making this data available. I spent the last 15 minutes looking for said repositories, but couldn't find anything remotely like I expected.

                On the other hand, doing a Google search for ".seq.txt", which is the common file name of sequences produced using the Illumina pipeline, I came up with a set of Histone ChIP experiments that the BC Genome Science Centre has made available anyhow:



                I did confirm that they were intentionally released, so I'm sure there's no problem with using them. On the other hand, I don't know a lot about these particular sets of data. I do know they're not new: they were analysed a while ago, and I've seen several presentations on this information over the last year or so.

                The files themselves are post-base calling, but not yet aligned. They may be good for testing aligners or whole pipelines. (Then again, they're old, so they may not be a good test for the latest Illumina software; interested parties can try that themselves.)

                I suspect the wig files (where available) were created with Findpeaks 2.1.x, though I haven't verified this.

                Cheers,

                Anthony

                • sci_guy
                  Member
                  • Jan 2008
                  • 83

                  #9
                  Data DVD

                  Applied Biosystems have a sample data DVD of S. suis reads together with a few compiled executables (for UNIX), some Perl code and a workflow document.

                  • ECO
                    --Site Admin--
                    • Oct 2007
                    • 1360

                    #10
                    Anthony & sci_guy, thanks much! I'll take a look!

                    I'll probably be starting a thread soon about the best OSS solutions for putting together one's own analysis platform. Not really to support an instrument, but to analyze a small number of runs for a specific project.
