
**mastal** · 02-25-2014, 08:38 AM

How about Galaxy?

There are also various packages aimed at biologists that are neither free nor open source, like Ingenuity (Pathway Analysis, Variant Analysis), Geneious, Sequencher, Partek, and CLC bio.

**bastianwur** · 02-25-2014, 09:05 AM

No money to spend, I'm afraid.
I know about Ingenuity, but AFAIK the...er...it targets roughly the same type of work as PathwayTools.
We have a running Galaxy server here, and *somewhere* in the next department (building is shared between 2, our department previously belonged to the other) is a CLC bio license.

I haven't used either yet.
Before I registered here I clicked through the available Galaxy tools, and I'm not sure if there's a good way to make a longer workflow out of them that will produce something valuable. (Not meant as trashing Galaxy; the tools are useful, but can I spend weeks working with them, and mainly/only with them? That's not really the purpose of the Galaxy project, right?)

**mgogol** · 02-25-2014, 10:23 AM

You could do a lot with Galaxy + public data sets.

**bastianwur** · 03-02-2014, 09:54 AM

I believe that, just would need some idea for the direction.
Guess I'll check the Galaxy website, to see if they have a list of publications where they're mentioned. At least somewhere there should be a longer workflow.

**gringer** · 03-02-2014, 06:22 PM

For six weeks? That's a long time to spend doing bioinformatics for a single thing, particularly if you're using toy datasets that only take a few minutes for different processes to complete. You're going to need to do multiple projects, or get some wet-lab work done in order to fill that time up.

**bastianwur** · 03-03-2014, 12:59 AM

Doesn't have to be small datasets.
We have some servers available, so assigning computing power to the students is not *that* much of a problem, but making them able to use it is one.

Multiple projects is a possibility, but it should at the end be one coherent study project.
Since we're already doing stuff in Pathway Tools (most likely), we'll let them curate 2 genomes and compare them. Not great either, but uses up more time. Not all of it though, which is the problem.

Wet lab can't be done. We don't have the capacities for that. We don't have many wet lab people (group isn't that old, still getting built up), and they also get some students from that course.

If someone wants to tell me that this isn't a great setup for a course...yeah...I know, but can't do anything about it. Not my idea, not my choice :/.

**gringer** · 03-03-2014, 02:41 PM

Originally posted by bastianwur:

Doesn't have to be small datasets.
We have some servers available, so assigning computing power to the students is not *that* much of a problem, but making them able to use it is one.

The reason for small datasets is not the computing power issue, it's the problem of instant gratification. It is much more informative / educational if you can do multiple tasks within the course of one lesson. Waiting for 3 hours while your Bowtie mapping of one sample is carried out will lead quickly to boredom and lack of interest.

**bastianwur** · 03-04-2014, 01:25 AM

True, true.
Some of our inhouse samples are not *that* big though (MiSeq data), so mapping time doesn't have to be a problem.

My current plan (since it's getting urgent) is to
a) let them QC some raw data, with specific questions ("luckily" I have data with different sorts of problems)
b) let them do an assembly. I already have a more complicated SOP, time not that much of a problem
c) differential expression analysis via Cuffdiff (that should be doable as well, not sure if I can find some fitting data)
d) Pathway Tools.

Still probably not enough to consume all the time :/.

**gringer** · 03-04-2014, 02:36 AM

FWIW, you can stretch out the assembly part quite a lot:

try different error correction methods
different assemblers
different k-mer values (for de Bruijn graph assemblers)
different post-assembly scaffolding
mapping reads back to the assembly
comparing the assembly with another similar reference genome
annotating
finding ORFs/transcripts
finding likely protein sequences, ....
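
If the students end up comparing several assemblers and k-mer values, a small script can put the results side by side. The sketch below computes N50 and a few basic statistics; N50 is not mentioned above, it is just one commonly used contiguity metric, and the assembly names and contig lengths are made-up placeholders rather than real results.

```python
def n50(contig_lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def summarise(assemblies):
    """Build one (name, n_contigs, total_bp, longest, n50) row per assembly."""
    rows = []
    for name, lengths in assemblies.items():
        rows.append((name, len(lengths), sum(lengths), max(lengths), n50(lengths)))
    return rows


# Hypothetical contig lengths for three runs on the same read set.
assemblies = {
    "assembler_A_k31": [5000, 4000, 3000, 2000, 1000],
    "assembler_A_k55": [9000, 3000, 2000, 1000],
    "assembler_B": [2000, 2000, 2000, 2000, 2000, 2000],
}

summary = summarise(assemblies)
print("name\tcontigs\ttotal_bp\tlongest\tN50")
for row in summary:
    print("\t".join(str(field) for field in row))
```

In a real exercise the length lists would come from the FASTA output of each assembler; a higher N50 alone does not mean a better assembly, which is a useful discussion point in itself.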

**bastianwur** · 03-04-2014, 02:56 AM

Just had a crisis meeting with the other guy who's also supposed to do this, and we ended up going in the same direction.

Problem for me personally is that it feels so redundant. I have the SOP for assembly, and we have an inhouse pipeline for annotation, so doing anything in there is useless...but well...it'll fill up time.

But okay, we'll do that.
We'll give them different assemblers, and let them compare the output (via e.g. Mauve and BRIG). Scaffolding + gapfilling would potentially be included in that. Comparison to reference genomes as well.
Hadn't thought about different ORF prediction programs, but can do that as well.

Problem is just the mass comparisons...which they likely can't do...at least not when we get them. Gonna try to shove "Python for Biologists" in before that (just a few chapters), maybe they'll learn something out of it.

What do you mean error correction methods? (only thing which I don't really know about)

EDIT: Since they're probably not able to really use Linux, this means another day I'll have to use to set up one of the servers with ALL the programs. Hell lot of fun ^^.

**gringer** · 03-04-2014, 03:18 AM

Originally posted by bastianwur:

What do you mean error correction methods? (only thing which I don't really know about)

Error correction prior to assembly tends to improve the assembly. The more recent assemblers have correction steps built into the standard process, but you can usually split that bit out to only do correction. Two assemblers that I can think of that are like this are SGA (a *very* manual process) and SPAdes. There's also Quake (only error correction), and probably many more.
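
For students, the intuition behind k-mer based correction can be shown with a toy example: with enough coverage, a k-mer seen only once is more likely a sequencing error than real sequence. The sketch below only flags such rare k-mers. It is a classroom illustration with made-up reads and an arbitrary threshold, not how SGA, SPAdes, or Quake are actually implemented.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count=2):
    """Return k-mers seen fewer than min_count times (candidate errors)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


# Toy reads: three identical copies plus one read with a single
# substitution (A -> T at the fifth base). Purely hypothetical data.
reads = ["ACGTACGGA", "ACGTACGGA", "ACGTACGGA", "ACGTTCGGA"]

counts = count_kmers(reads, k=4)
suspicious = weak_kmers(counts, min_count=2)
print(sorted(suspicious))
```

All four flagged k-mers overlap the substituted base, which is the signal a real corrector would use to propose a fix.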

**SNPsaurus** · 03-04-2014, 09:24 AM

Sorry if this is a thread hijack, but I have a related question. I teach an undergrad course in genomics and as part of it they take a cheek swab and get 2-5M reads back. What are some fun, quick things that can be done with skim sequencing of 20 people? I'm asking here because some answers could be done in bastianwur's group with the existing data. The students are also not very computer savvy.

I'm planning to have them grab 100 reads or so and run NCBI BLAST to check for any dominant microbial species in their mouth, align to the mitochondrial genome and identify heteroplasmy, align to human and calculate Tajima's D between themselves, submit their alignments to the Ensembl Variant Effect Predictor, and maybe try to find some "heritage" SNPs that put them on a map. Any other ideas? Ideally a web-based interface and a mix of immediate results (BLAST) with steps that may take a while to finish.
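
Grabbing 100 reads for a web BLAST search could be prepared behind the scenes with something like the sketch below. It assumes plain four-line FASTQ records; the function names, the seed, and the toy input are all hypothetical, and it uses reservoir sampling so that the whole file never has to sit in memory.

```python
import io
import random


def read_fastq(handle):
    """Yield (name, sequence) pairs, assuming 4 lines per FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored here
        yield header.strip()[1:], seq


def sample_reads(handle, n, seed=0):
    """Pick n reads uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = record
    return sample


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text for pasting into BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy input standing in for a student's FASTQ file.
toy_fastq = "".join(f"@read_{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(10))
picked = sample_reads(io.StringIO(toy_fastq), n=3, seed=42)
print(to_fasta(picked), end="")
```

For a real file, `io.StringIO(toy_fastq)` would be replaced by an opened FASTQ file (or `gzip.open(..., "rt")` for compressed data) and `n` set to 100.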

**gringer** · 03-04-2014, 12:18 PM

For human DNA, 2-5M reads should be plenty to make a pseudo-SNPchip from NGS reads at sub-1X coverage. Map to the genome, convert to a VCF file to summarise SNPs, and impute other SNPs from a public data source (e.g. 1000genomes) -- it would probably make sense to do all that behind the scenes given a lack of savviness. After that you can do things like calculating class-wide allele frequencies, haplotype analysis, and maybe even ancestry estimation.
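
Once genotypes are available, the class-wide allele frequency step is simple enough to show directly. The sketch below assumes diploid, biallelic calls written as VCF-style GT strings (`0/0`, `0/1`, `1/1`, `./.` for missing); the SNP names and genotypes are invented for illustration and do not come from any real data set.

```python
def alt_count(gt):
    """Count non-reference alleles in a GT string, or None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def alt_allele_frequency(genotypes):
    """Alternate allele frequency over called genotypes (assumes diploid)."""
    called = [c for c in (alt_count(gt) for gt in genotypes) if c is not None]
    if not called:
        return None  # nobody has a call at this site
    return sum(called) / (2 * len(called))


# Hypothetical genotypes for five students at three SNPs.
class_genotypes = {
    "snp_1": ["0/0", "0/1", "1/1", "0|1", "./."],
    "snp_2": ["0/0", "0/0", "0/0", "0/1", "0/0"],
    "snp_3": ["./.", "./.", "./.", "./.", "./."],
}

frequencies = {
    snp: alt_allele_frequency(gts) for snp, gts in class_genotypes.items()
}
for snp, freq in frequencies.items():
    print(snp, freq)
```

With only 20 students and sub-1X coverage the estimates will be noisy, so comparing them against public population frequencies makes a reasonable follow-up exercise.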

**TiborNagy** · 03-05-2014, 05:11 AM

Avi Ma'ayan's systems biology course contains a lot of interesting homework and practice exercises using web-based tools.


Education problem: Clicktools for non-bioinformaticians
