
  • Education problem: Clicktools for non-bioinformaticians

    Hey @all,

    first post here (hope that I can find the time to give some answers in other threads), and I come with a problem, which doesn't really fit anywhere.

    My problem is that I need to find some tools, which a) can be used by people who can't program and b) make a project out of this, which will keep someone busy for 6 weeks.

    Why that? My university has a course, in which some of the bio* classes are merged. There might be some bioinformaticians in there, but most likely not.
    The content of the course is...to work with a PhD student for the time of the course (6 weeks) on a specific problem, to get an insight into the field.
    The students get assigned to the different projects, which the participating departments submitted. If someone doesn't get the project he/she wanted to, then he/she is randomly distributed.

    That's in general not a bad idea.
    Except that in this case it is.
    We're a systems biology department, our professor participates in that "course", so me and another PhD student now have to make up a project for 4 people, who most likely know Facebook and Google, and nothing else.
    Last year we put them in front of PathwayTools, and let them curate some genomes.
    That sort of works, but well...not great, and it's not fun, doesn't keep someone efficiently busy for 6 weeks, and we shouldn't do that again.

    So I wonder if someone here might have some idea, what would be an efficient and interesting task for someone to do 6 weeks long (interesting = not counting the GC content of a genome by hand, or similar suggestions ^^).
    We have different (meta)*omics datasets, but for the life of me, I don't know what I could let them do with them, given that they don't have the skills to mass process them.
    I still have 2.5 weeks time to think about something, but I'm a bit stuck.
    The students probably have to waste one week to get into Linux, and 2 weeks of PathwayTools should again be possible, but I'd like to have something else before, or after that.
    Obvious choice would be a genome assembly to get it into PathwayTools, but I think that directly fails again at the missing computer skills.

    So...if anyone has an idea...it would be highly appreciated.

  • #2
    How about Galaxy?

    There are also various packages aimed at biologists which are not free or open source, like Ingenuity (Pathway Analysis, Variant Analysis), Geneious, Sequencher, Partek, CLC bio.



    • #3
      No money to spend, I'm afraid.
      I know about Ingenuity, but AFAIK the...er...it targets roughly the same type of work as PathwayTools.
      We have a running Galaxy server here, and *somewhere* in the next department (building is shared between 2, our department previously belonged to the other) is a CLC bio license.

      I haven't used either yet.
      Before I registered here I clicked through the available galaxy tools, and I'm not sure if there's a good way to make a longer workflow out of them, which will produce something valuable. (not meant as trashing Galaxy; the tools are useful, but can I spend weeks working with them, and mainly/only with them? That's not really the purpose of the Galaxy project, right?)
      Last edited by bastianwur; 02-25-2014, 09:10 AM.



      • #4
        You could do a lot with Galaxy + public data sets.



        • #5
          I believe that, I'd just need some idea for the direction.
          Guess I'll check the Galaxy website, to see if they have a list of publications where they're mentioned. At least somewhere there should be a longer workflow.



          • #6
            For six weeks? That's a long time to spend doing bioinformatics for a single thing, particularly if you're using toy datasets that only take a few minutes for different processes to complete. You're going to need to do multiple projects, or get some wet-lab work done in order to fill that time up.



            • #7
              Doesn't have to be small datasets.
              We have some servers available, so assigning computing power to the students is not *that* much of a problem, but making them able to use it is one.

              Multiple projects is a possibility, but it should at the end be one coherent study project.
              Since we're already doing stuff in PathwayTools (most likely), we'll let them curate 2 genomes and compare them. Not great either, but uses up more time. Not all of it though, which is the problem.

              Wet lab can't be done. We don't have the capacity for that. We don't have many wet lab people (group isn't that old, still getting built up), and they also get some students from that course.

              If someone wants to tell me that this isn't a great setup for a course...yeah...I know, but can't do anything about it. Not my idea, not my choice :/.



              • #8
                "Doesn't have to be small datasets.
                We have some servers available, so assigning computing power to the students is not *that* much of a problem, but making them able to use it is one."
                The reason for small datasets is not the computing power issue, it's the problem of instant gratification. It is much more informative / educational if you can do multiple tasks within the course of one lesson. Waiting for 3 hours while your Bowtie mapping of one sample is carried out will lead quickly to boredom and lack of interest.



                • #9
                  True, true.
                  Some of our inhouse samples are not *that* big though (MiSeq data), so mapping time doesn't have to be a problem.

                  My current plan (since it's getting urgent) is to
                  a) let them QC some raw data, with specific questions ("luckily" I have data with different sorts of problems)
                  b) let them do an assembly. I already have a more complicated SOP, time not that much of a problem
                  c) differential expression analysis via cuffdiff (that should be doable as well, not sure if I can find some fitting data)
                  d) PathwayTools.

                  Still probably not enough to consume all the time :/.
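For step a), a question like "which reads look bad?" can be answered with very little code. The following is only a minimal sketch, assuming uncompressed four-line FASTQ records with Phred+33 quality encoding; the function names and the threshold of 20 are made up for illustration, and a dedicated QC tool would report far more than this.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        # Line 1 is "@id", line 2 the bases, line 3 "+", line 4 the qualities.
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def mean_quality(quality_string, offset=33):
    """Mean Phred score of one read (assumes Phred+33 encoding)."""
    return mean(ord(ch) - offset for ch in quality_string)


def flag_low_quality(lines, threshold=20):
    """Return the IDs of reads whose mean quality is below the threshold."""
    return [
        read_id
        for read_id, _seq, qual in parse_fastq(lines)
        if mean_quality(qual) < threshold
    ]


# Two toy reads: one with high qualities ("I"), one with very low ones ("#").
EXAMPLE_FASTQ = """@read1
ACGT
+
IIII
@read2
ACGT
+
####
"""

low_quality_reads = flag_low_quality(EXAMPLE_FASTQ.splitlines())
```

Students could run this on a small subset of each problematic dataset and then compare what they see with the report of a full QC program.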



                  • #10
                    FWIW, you can stretch out the assembly part quite a lot:
                    • try different error correction methods
                    • different assemblers
                    • different k-mer values (for de Bruijn graph assemblers)
                    • different post-assembly scaffolding
                    • mapping reads back to the assembly
                    • comparing the assembly with another similar reference genome
                    • annotating
                    • finding ORFs/transcripts
                    • finding likely protein sequences, ....
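Comparing assemblers or k-mer values needs some number to compare. One common summary is N50: the contig length at which the contigs of that length or longer contain at least half of all assembled bases. A minimal sketch, assuming the contig lengths have already been extracted from each assembly; the run labels and lengths below are invented.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


def summarise(assemblies):
    """Map each assembly label to (number of contigs, total bases, N50)."""
    return {
        label: (len(lengths), sum(lengths), n50(lengths))
        for label, lengths in assemblies.items()
    }


# Hypothetical contig lengths for two runs of the same data.
runs = {
    "run_k31": [100, 80, 60, 40, 20],
    "run_k55": [250, 30, 20],
}
summary = summarise(runs)
```

N50 alone says nothing about correctness, so it is best read together with the other checks in the list, such as mapping reads back or comparing against a reference.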



                    • #11
                      Just had a crisis meeting with the other guy, who's also supposed to do that, and we ended up going in the same direction.

                      Problem for me personally is that it feels so redundant. I have the SOP for assembly, and we have an inhouse pipeline for annotation, so doing anything in there is useless...but well...it'll fill up time.

                      But okay, we'll do that.
                      We'll give them different assemblers, and let them compare the output (via e.g. Mauve and BRIG). Scaffolding + gapfilling would potentially be included in that. Comparison to reference genomes as well.
                      Hadn't thought about different ORF prediction programs, but can do as well.
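To give students a feel for what an ORF prediction program has to do before they compare real ones, a naive finder is enough. This is only a sketch under simplifying assumptions: forward strand only, ATG as the sole start codon, and no scoring of any kind, which is much cruder than what actual gene predictors do. The function name and the `min_codons` parameter are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs, forward strand only.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons from ATG up to, but not including, the stop.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence with one ORF (ATG AAA TTT GGG TAA) starting at position 2.
toy_orfs = find_orfs("CCATGAAATTTGGGTAACC")
```

Running the same sequence through this and through a real predictor makes the differences easy to discuss, for example reverse-strand genes and alternative start codons, which this sketch misses.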

                      Problem is just the mass comparisons...which they likely can't do...at least not when we get them. Gonna try to shove "Python for Biologists" in before that (just a few chapters), maybe they'll learn something from it.

                      What do you mean error correction methods? (only thing which I don't really know about)

                      EDIT: Since they're probably not able to really use Linux, this means another day I'll have to spend setting up one of the servers with ALL the programs. Hell of a lot of fun ^^.



                      • #12
                        Originally posted by bastianwur
                        What do you mean error correction methods? (only thing which I don't really know about)
                        Error correction prior to assembly tends to improve the assembly. The more recent assemblers have correction steps built into the standard process, but you can usually split that bit out to only do correction. Two assemblers that I can think of that are like this are SGA (a *very* manual process) and SPAdes. There's also Quake (only error correction), and probably many more.
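One common idea behind read error correction is the k-mer spectrum: with enough coverage, k-mers from the real genome show up many times, while k-mers created by a sequencing error tend to be rare. The sketch below only flags rare k-mers and does not correct anything; it is not the algorithm of any particular tool, and the function names and the `min_count` cutoff are assumptions for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspicious_kmers(reads, k, min_count=2):
    """Return the k-mers seen fewer than min_count times (possible errors)."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n < min_count}


# Three identical toy reads plus one with a single substitution (T -> A).
toy_reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
rare = suspicious_kmers(toy_reads, k=4)
```

A single wrong base creates several rare k-mers at once, which is what makes the error position detectable. Students could plot the k-mer count histogram before and after running a real correction step.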



                        • #13
                          Sorry if this is a thread hijack, but I have a related question. I teach an undergrad course in genomics and as part of it they take a cheek swab and get 2-5M reads back. What are some fun, quick things that can be done with skim sequencing of 20 people? I'm asking here because some answers could be done in bastianwur's group with the existing data. The students are also not very computer savvy.

                          I'm planning to have them grab 100 reads or so and NCBI BLAST them to check for any dominant microbial species in their mouth, align to the mitochondrial genome and identify heteroplasmy, align to human and calculate Tajima's D between themselves, submit their alignments to the Ensembl Variant Effect Predictor, maybe try to find some "heritage" SNPs that put them on a map. Any other ideas? Ideally a web-based interface and a mix of immediate results (BLAST) with steps that may take a while to finish.
                          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
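For the Tajima's D step, the statistic itself is short enough to compute by hand in a script. The sketch below uses the standard formula (mean pairwise differences versus the number of segregating sites, scaled by the usual constants) and assumes a clean input that skim sequencing will not give directly: equal-length, gap-free aligned sequences, one per sampled chromosome, with no missing data. The function name is made up for illustration.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences; None if undefined."""
    n = len(sequences)
    if n < 4:
        return None  # the normalising variance is zero for n = 2 or 3
    # Segregating sites: alignment columns with more than one allele.
    seg_sites = sum(1 for col in zip(*sequences) if len(set(col)) > 1)
    if seg_sites == 0:
        return None
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignment where every variant is a singleton, so D comes out negative.
toy_d = tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"])
```

With low-coverage data the hard part is getting trustworthy sequences or genotypes into this shape, so the number from a class exercise is better treated as a talking point than as a result.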



                          • #14
                            For human DNA, 2-5M reads should be plenty to make a pseudo-SNPchip from NGS reads at sub-1X coverage. Map to the genome, convert to a VCF file to summarise SNPs, and impute other SNPs from a public data source (e.g. 1000genomes) -- it would probably make sense to do all that behind the scenes given a lack of savviness. After that you can do things like calculating class-wide allele frequencies, haplotype analysis, and maybe even ancestry estimation.
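Once the mapping and imputation have been done behind the scenes, the class-wide allele frequency part is simple enough for students to follow line by line. A minimal sketch, assuming each genotype has already been reduced to the number of alternate alleles (0, 1 or 2) with `None` for a missing call; the SNP labels and genotypes below are invented, and no VCF parsing is shown.

```python
def allele_frequency(genotypes):
    """Alternate allele frequency from 0/1/2 genotype codes (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # nobody in the class has a call at this site
    # Each called diploid genotype contributes two alleles.
    return sum(called) / (2 * len(called))


def class_frequencies(genotype_table):
    """Map each SNP label to its class-wide alternate allele frequency."""
    return {snp: allele_frequency(gts) for snp, gts in genotype_table.items()}


# Hypothetical genotypes for five students at three sites.
toy_table = {
    "snp_a": [0, 1, 2, None, 1],
    "snp_b": [0, 0, 0, 0, 0],
    "snp_c": [None, None, None, None, None],
}
toy_frequencies = class_frequencies(toy_table)
```

Comparing these class frequencies with the frequencies reported by a public resource for the same sites would be a natural follow-up question.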



                            • #15
                              Avi Ma'ayan's systems biology course contains a lot of interesting homework and exercises using web-based tools.
