SEQanswers

Old 02-25-2014, 08:28 AM   #1
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Education problem: Clicktools for non-bioinformaticians

Hey @all,

first post here (I hope I can find the time to give some answers in other threads), and I come with a problem that doesn't really fit anywhere.

My problem is that I need to find some tools that a) can be used by people who can't program and b) can be turned into a project that will keep someone busy for 6 weeks.

Why that? My university has a course, in which some of the bio* classes are merged. There might be some bioinformaticians in there, but most likely not.
The content of the course is...to work with a PhD student for the time of the course (6 weeks) on a specific problem, to get an insight into the field.
The students get assigned to the different projects, which the participating departments submitted. If someone doesn't get the project he/she wanted, he/she is assigned to one at random.

That's in general not a bad idea.
Except that in this case it is.
We're a systems biology department, our professor participates in that "course", so another PhD student and I now have to make up a project for 4 people who most likely know Facebook and Google, and nothing else.
Last year we put them in front of PathwayTools, and let them curate some genomes.
That sort of works, but well...not great, and it's not fun, doesn't keep someone efficiently busy for 6 weeks, and we shouldn't do that again.

So I wonder if someone here might have some idea, what would be an efficient and interesting task for someone to do 6 weeks long (interesting = not counting the GC content of a genome by hand, or similar suggestions ^^).
We have different (meta)*omics datasets, but for the life of me, I don't know what I could let them do with them, given that they don't have the skills to process them in bulk.
I still have 2.5 weeks time to think about something, but I'm a bit stuck.
The students probably have to waste one week to get into Linux, and 2 weeks of PathwayTools should again be possible, but I'd like to have something else before, or after that.
Obvious choice would be a genome assembly to get it into PathwayTools, but I think that directly fails again at the missing computer skills.

So...if anyone has an idea...it would be highly appreciated.
Old 02-25-2014, 08:38 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667
Default

How about Galaxy?

There are also various packages aimed at biologists which are not free or open source, like Ingenuity (Pathway Analysis, Variant Analysis), Geneious, Sequencher, Partek, CLC bio.
Old 02-25-2014, 09:05 AM   #3
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Default

No money to spend, I'm afraid.
I know about Ingenuity, but AFAIK the...er...it targets roughly the same type of work as PathwayTools.
We have a running Galaxy server here, and *somewhere* in the next department (building is shared between 2, our department previously belonged to the other) is a CLC bio license.

I haven't used either yet.
Before I registered here I clicked through the available galaxy tools, and I'm not sure if there's a good way to make a longer workflow out of them, which will produce something valuable. (not meant as trashing Galaxy; the tools are useful, but can I spend weeks working with them, and mainly/only with them? That's not really the purpose of the Galaxy project, right?)

Last edited by bastianwur; 02-25-2014 at 09:10 AM.
Old 02-25-2014, 10:23 AM   #4
mgogol
Senior Member
 
Location: Kansas City

Join Date: Mar 2008
Posts: 197
Default

You could do a lot with Galaxy + public data sets.
Old 03-02-2014, 09:54 AM   #5
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Default

I believe that, I would just need some idea for the direction.
Guess I'll check the Galaxy website to see if they have a list of publications where they're mentioned. At least somewhere there should be a longer workflow.
Old 03-02-2014, 06:22 PM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

For six weeks? That's a long time to spend doing bioinformatics for a single thing, particularly if you're using toy datasets that only take a few minutes for different processes to complete. You're going to need to do multiple projects, or get some wet-lab work done in order to fill that time up.
Old 03-03-2014, 12:59 AM   #7
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Default

Doesn't have to be small datasets.
We have some servers available, so assigning computing power to the students is not *that* much of a problem, but making them able to use it is one.

Multiple projects is a possibility, but it should at the end be one coherent study project.
Since we're already doing stuff in PathwayTools (most likely), we'll let them curate 2 genomes and compare them. Not great either, but uses up more time. Not all of it though, which is the problem.

Wet lab can't be done. We don't have the capacity for that. We don't have many wet lab people (group isn't that old, still getting built up), and they also get some students from that course.

If someone wants to tell me that this isn't a great setup for a course...yeah...I know, but can't do anything about it. Not my idea, not my choice :/.
Old 03-03-2014, 02:41 PM   #8
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
Doesn't have to be small datasets.
We have some servers available, so assigning computing power to the students is not *that* much of a problem, but making them able to use it is one.
The reason for small datasets is not the computing power issue, it's the problem of instant gratification. It is much more informative / educational if you can do multiple tasks within the course of one lesson. Waiting for 3 hours while your Bowtie mapping of one sample is carried out will lead quickly to boredom and lack of interest.
Old 03-04-2014, 01:25 AM   #9
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Default

True, true.
Some of our inhouse samples are not *that* big though (MiSeq data), so mapping time doesn't have to be a problem.

My current plan (since it's getting urgent) is to
a) let them QC some raw data, with specific questions ("luckily" I have data with different sorts of problems)
b) let them do an assembly. I already have a more complicated SOP, so time is not that much of a problem
c) differential expression analysis via Cuffdiff (that should be doable as well, but I'm not sure if I can find some fitting data)
d) PathwayTools.

Still probably not enough to consume all the time :/.
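
For step a), here is a minimal sketch of the kind of check a QC exercise could start from. It assumes Phred+33 encoded FASTQ with exactly four lines per record; the function names, the threshold of 20 and the two example reads are made up for illustration. In practice a dedicated tool such as FastQC would do this job, so treat it as a teaching aid rather than a pipeline.

```python
from io import StringIO

def mean_quality(quality_string, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)

def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual

def low_quality_reads(handle, threshold=20):
    """Names of reads whose mean quality is below the threshold."""
    return [name for name, _, qual in read_fastq(handle)
            if mean_quality(qual) < threshold]

# Made-up example data: 'I' is Q40 and '#' is Q2 in Phred+33.
example = StringIO("@good\nACGT\n+\nIIII\n@bad\nACGT\n+\n####\n")
flagged = low_quality_reads(example)
print(flagged)
```

A "specific question" for the students could then be why a given read was flagged, and whether trimming would rescue it.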
Old 03-04-2014, 02:36 AM   #10
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

FWIW, you can stretch out the assembly part quite a lot:
  • try different error correction methods
  • different assemblers
  • different k-mer values (for de Bruijn graph assemblers)
  • different post-assembly scaffolding
  • mapping reads back to the assembly
  • comparing the assembly with another similar reference genome
  • annotating
  • finding ORFs/transcripts
  • finding likely protein sequences, ....
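
To compare the outputs of different assemblers or k-mer values, one common summary statistic is N50. The sketch below is only an illustration: the assembly names and contig lengths are invented, and in a real run the lengths would be read from each assembler's contig FASTA file.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

# Hypothetical contig lengths from two assembly runs of the same data.
assemblies = {
    "assembler_A_k31": [100, 80, 50, 30, 20],
    "assembler_B_k55": [150, 60, 40, 20, 10],
}
summary = {name: n50(lengths) for name, lengths in assemblies.items()}
for name, value in summary.items():
    print(name, "N50 =", value)
```

N50 alone says nothing about correctness, which is why mapping reads back and comparing against a reference are still on the list.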
Old 03-04-2014, 02:56 AM   #11
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98
Default

Just had a crisis meeting with the other guy who's also supposed to do this, and we ended up going in the same direction.

Problem for me personally is that it feels so redundant. I have the SOP for assembly, and we have an inhouse pipeline for annotation, so doing anything in there is useless...but well...it'll fill up time.

But okay, we'll do that.
We'll give them different assemblers, and let them compare the output (via e.g. Mauve and BRIG). Scaffolding + gap filling would potentially be included in that. Comparison to reference genomes as well.
Hadn't thought about different ORF prediction programs, but we can do that as well.

Problem is just the mass comparisons...which they likely can't do...at least not when we get them. Gonna try to shove "Python for Biologists" in before that (just a few chapters), maybe they'll learn something out of it.

What do you mean error correction methods? (only thing which I don't really know about)

EDIT: Since they're probably not able to really use Linux, this means another day I'll have to use to set up one of the servers with ALL the programs. Hell lot of fun ^^.
Old 03-04-2014, 03:18 AM   #12
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

Quote:
Originally Posted by bastianwur View Post
What do you mean error correction methods? (only thing which I don't really know about)
Error correction prior to assembly tends to improve the assembly. The more recent assemblers have correction steps built into the standard process, but you can usually split that bit out to only do correction. Two assemblers that I can think of that are like this are SGA (a *very* manual process) and SPAdes. There's also Quake (only error correction), and probably many more.
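
As a toy illustration of the idea behind k-mer based correction: a sequencing error tends to create k-mers that are seen only once or twice, while true genomic k-mers are seen many times at decent coverage. The sketch below only flags such rare k-mers and does not correct anything. It is not how SGA, SPAdes or Quake are actually implemented, and the reads, the value of k and the cutoff are made up.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def suspicious_kmers(reads, k, min_count=2):
    """k-mers seen fewer than min_count times, i.e. candidates for errors."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n < min_count}

# Three identical reads plus one with a single substitution (A -> T at position 4).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
rare = suspicious_kmers(reads, k=4)
print(sorted(rare))
```

Every flagged k-mer here overlaps the substituted base, which is the signal a real corrector would use to decide where and how to fix the read.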
Old 03-04-2014, 09:24 AM   #13
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 521
Default

Sorry if this is a thread hijack, but I have a related question. I teach an undergrad course in genomics and as part of it they take a cheek swab and get 2-5M reads back. What are some fun, quick things that can be done with skim sequencing of 20 people? I'm asking here because some answers could be done in bastianwur's group with the existing data. The students are also not very computer savvy.

I'm planning to have them grab 100 reads or so and run them through NCBI BLAST to check for any dominant microbial species in their mouth, align to the mitochondrial genome and identify heteroplasmy, align to human and calculate Tajima's D between themselves, submit their alignments to the Ensembl Variant Effect Predictor, and maybe try to find some "heritage" SNPs that put them on a map. Any other ideas? Ideally a web-based interface and a mix of immediate results (BLAST) with steps that may take a while to finish.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 03-04-2014, 12:18 PM   #14
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838
Default

For human DNA, 2-5M reads should be plenty to make a pseudo-SNPchip from NGS reads at sub-1X coverage. Map to the genome, convert to a VCF file to summarise SNPs, and impute other SNPs from a public data source (e.g. 1000genomes) -- it would probably make sense to do all that behind the scenes given a lack of savviness. After that you can do things like calculating class-wide allele frequencies, haplotype analysis, and maybe even ancestry estimation.
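
The class-wide allele frequency step could look roughly like this once genotypes are available. This is a sketch under assumptions: genotypes are given as VCF-style diploid GT strings such as 0/1, missing calls are written as a dot, any non-reference allele is counted as alternate, and the SNP names and calls are invented.

```python
def alt_allele_frequency(genotypes):
    """Alternate allele frequency from GT strings like '0/1'; missing alleles ('.') are skipped."""
    alt = 0
    total = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":  # any non-reference allele counts as alternate
                alt += 1
    return alt / total if total else None

# Hypothetical genotype calls for four students at two SNPs.
class_genotypes = {
    "snp_1": ["0/0", "0/1", "1/1", "./."],
    "snp_2": ["0/0", "0/0", "0|1", "0/0"],
}
frequencies = {snp: alt_allele_frequency(gts)
               for snp, gts in class_genotypes.items()}
for snp, freq in frequencies.items():
    print(snp, freq)
```

With sub-1X coverage most raw calls will be missing, which is why the imputation step comes before anything like this.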
Old 03-05-2014, 05:11 AM   #15
TiborNagy
Senior Member
 
Location: Budapest

Join Date: Mar 2010
Posts: 329
Default

Avi Ma'ayan's systems biology course contains a lot of interesting homework and exercises using web-based tools.
Old 03-27-2014, 03:59 PM   #16
GenePool
Registered Vendor
 
Location: San Francisco, CA

Join Date: Mar 2014
Posts: 18
Default

Hi, you might want to check out GenePool as a resource for supporting genomics education. It is built with a very intuitive interface that makes working with genomics fairly approachable for students and folks with little to no command-line experience. Also, the free data available in the growing genomics reference library, including various SRA sequence variation projects, TCGA RNA-Seq cohorts, and GTEx RNA-Seq data, makes for quite compelling teaching projects. We've heard of many courses that are generating sequence variation data for the students in the classroom, and GenePool is also an excellent place to work with that data. It picks up data from BAMs, VCFs, and tab-delimited text files (supporting RNA-Seq, genome variation, exome variation, more targeted panels, as well as other assays). Please feel free to shoot me a private message if you're interested in discussing how we could work together to power up genomics and informatics education.

If you're interested in learning more, please check out GenePool's growing genomics reference library that is freely available to the community: http://www.stationxinc.com/reference-library

Cheers!

------------------------------
GenePool is making genomics data management, analysis, and sharing easier!
Products @ www.stationxinc.com

Last edited by GenePool; 03-27-2014 at 04:25 PM. Reason: grammar change
All times are GMT -8.