SEQanswers

02-25-2008, 07:06 PM   #1
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358
Sample short read data set?

Anyone have a sample data set that they'd be willing to share? I'll host!
02-27-2008, 12:15 AM   #2
apfejes
Senior Member
 
Location: Oakland, California

Join Date: Feb 2008
Posts: 236

A sample of what type of data?
02-27-2008, 08:32 AM   #3
ECO

Ideally for my application, short read genomic data (i.e., Solexa/ABI). I'm playing around with some of the software packages listed in this forum and need some data!
02-27-2008, 08:47 AM   #4
apfejes

Sorry, I should have been clearer in my message: what experiment type? ChIP-Seq, Genome shotgun, Transcriptome shotgun, etc. Getting generic data is easy - getting results for a particular type of data may not be.
02-27-2008, 09:17 AM   #5
ECO

Genome shotgun, or better yet, amplicon-enriched genomic reads, i.e., neither transcriptome data nor ChIP.

Actually, now that I think about it, it would be nice to have sample data sets for all applications, but I only have 6TB of bandwidth per month!
02-27-2008, 12:25 PM   #6
apfejes

Hrm.. I'm not sure we have many (good quality) genome shotgun data sets kicking around that I'd be able to get permission to release. At least, I personally don't have any, yet. I'll poke around, though, and maybe I can find something for you. If you don't mind poor quality, just for playing around, that might be feasible.
02-27-2008, 12:45 PM   #7
ECO

That would be great. I'm not concerned with base quality or base re-calling, etc. I'm working on analysis platforms, and it's no fun to just randomly generate short reads.

Let me know!
02-28-2008, 12:48 PM   #8
apfejes

Hi ECO,

I brought this up yesterday, and was told that there is no point in making any data available. Supposedly there are open repositories (NCBI?) collecting and making this data available. I spent the last 15 minutes looking for said repositories, but couldn't find anything remotely like what I expected.

On the other hand, doing a Google search for ".seq.txt", which is the common file name of sequences produced using the Illumina pipeline, I came up with a set of Histone ChIP experiments that the BC Genome Science Centre has made available anyhow:

http://www.bcgsc.ca/downloads/histone

I did confirm that they were intentionally released, so I'm sure there's no problem with using them. On the other hand, I don't know a lot about these particular sets of data. I do know they're not new: they were analysed a while ago, and I've seen several presentations on this information over the last year or so.

The files themselves are post-base-calling but not yet aligned, so they may be good for testing aligners or whole pipelines. (Then again, they're old, so they may not be a good test for the latest Illumina software. Interested parties can try that themselves.)
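For anyone who wants to feed unaligned files like these into an aligner, here is a minimal Python sketch that converts them to FASTA. It rests on an assumption not confirmed in this thread: that each line of a seq.txt file holds five whitespace-separated fields (lane, tile, x, y, sequence) with uncalled bases written as ".". The function names and the sample reads are made up for illustration; check a real file before relying on this.

```python
import io


def seq_txt_to_records(lines):
    """Yield (read_name, sequence) pairs from seq.txt-style lines.

    Assumed layout (hypothetical, verify against a real file):
    lane, tile, x, y, sequence, separated by whitespace.
    """
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(
                f"line {line_number}: expected 5 fields, got {len(fields)}"
            )
        lane, tile, x, y, seq = fields
        # Assumption: '.' marks an uncalled base; write it as 'N' instead.
        yield f"{lane}_{tile}_{x}_{y}", seq.replace(".", "N").upper()


def write_fasta(records, out):
    """Write (read_name, sequence) pairs to a file-like object as FASTA."""
    for name, seq in records:
        out.write(f">{name}\n{seq}\n")


# Two invented reads, used only to show the conversion.
sample = (
    "1\t1\t120\t455\tGATTACAGATTACA.GATTACAGATTA\n"
    "1\t1\t98\t12\tACGT.ACGTACGTACGTACGTACGTAC\n"
)
buffer = io.StringIO()
write_fasta(seq_txt_to_records(sample.splitlines()), buffer)
fasta_text = buffer.getvalue()
print(fasta_text, end="")
```

To convert a real file, open it and pass the file object to `seq_txt_to_records` in place of `sample.splitlines()`, then write to an output file instead of the in-memory buffer.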

I suspect the wig files (where available) were created with FindPeaks 2.1.x, though I haven't verified this.

Cheers,

Anthony
__________________
The more you know, the more you know you don't know. —Aristotle
02-28-2008, 04:52 PM   #9
sci_guy
Member
 
Location: Sydney, Australia

Join Date: Jan 2008
Posts: 83
Data DVD

Applied Biosystems have a sample data DVD of S. suis reads together with a few compiled executables (for UNIX), some Perl code and a workflow document.
Data DVD
02-28-2008, 07:53 PM   #10
ECO

Anthony & sci_guy, thanks much! I'll take a look!

I'll probably be starting a thread soon about the best OSS solutions for putting together one's own analysis platform. Not really to support an instrument, but to analyze a small number of runs for a specific project.
All times are GMT -8.

