  • Starting out in Illumina NGS analysis

    Hi all,

    I am wondering what the best way is to get started with Illumina NGS analysis. I have a background in sequence analysis from Sanger sequencing, but my experience in NGS is currently zero. I have come up with some thoughts/questions, which I have included here. Please edit them or let me know your thoughts.

    1) Choose desired applications to learn
    - Reference assembly
    - De novo assembly
    - RNA-Seq

    2) Determine the best/most cutting edge algorithm for each application
    - Can someone advise what these are?

    3) Acquire test data sets
    - Is there a resource for publicly available NGS data?

    4) Analyse data
    - Is there a good resource for learning this analysis?


    My ideas are still probably quite vague so forgive me. Any help would be SINCERELY appreciated.

    Thanks in advance,

    Gavin

  • #2
    Without wanting to sound patronising, how about starting with a scientific question you wish to answer? Sometimes the other questions become rather easier to answer when you do that.



    • #3
      That said, the answer to question 3 is yes: you can access the NCBI Short Read Archive or the EBI European Nucleotide Archive for example datasets.



      • #4
        Lol

        Starting with a single scientific question would indeed be ideal but I don't actually have one.

        I have been tasked with 'gaining experience' in the 3 areas mentioned i.e. de novo and reference assembly and RNA-Seq.

        Initially I guess I was hoping that there may be preferred algorithms for each application. Having determined these I would try to get hold of some data sets, perform some initial exploration and then try to form an idea of a 'project' or question in each area.

        *Edit* Perhaps it would even be wise to get hold of some public data and try to re-perform some published analysis on it?



        • #5
          OK. It depends a bit on your organism of interest. Bacterial genomes like different tools to human genomes for example.

          But here are my tips:

          1) Alignment to reference. Check out BWA, Bowtie, SSAHA2, SAMtools, VarScan and their respective papers. Cutting edge is probably Burrows-Wheeler Transform.

          2) De novo assembly of short reads. Check out Velvet, SOAPdenovo. Cutting edge is probably de Bruijn graphs.

          3) RNA-Seq. Usually one or both of the above methods combined with read-counting software. I don't do much of this myself, but you could start with Anthony Fejes' FindPeaks.
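Point 1 above mentions the Burrows-Wheeler Transform, which underlies the indexes used by aligners such as BWA and Bowtie. As a rough illustration only, here is a minimal Python sketch of the transform and its inverse. It uses the naive "sort all rotations" construction, which is an assumption made for readability; real aligners build the index far more efficiently and add further structures on top. The function names `bwt` and `inverse_bwt` are made up for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for teaching)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Every cyclic rotation of the text, sorted lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row ending in the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("GATTACA")
restored = inverse_bwt(transformed)
```

The transform tends to group identical characters together, and, more importantly for alignment, it can be searched for exact substring matches without decompressing it. That search step is not shown here.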

          I guess if you wanted to set yourself a project to try learning this stuff you could start by trying to find variations in genomes from the public 1000 Genomes Project.
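Point 2 above mentions de Bruijn graphs, the data structure behind assemblers such as Velvet and SOAPdenovo. The following Python sketch shows the basic idea on toy data: reads are cut into k-mers, each k-mer becomes an edge between its (k-1)-base prefix and suffix, and a contig is read off by following an unbranched path. It assumes error-free reads and a single strand, and the function names are invented for this example; real assemblers also handle sequencing errors, reverse complements, repeats, and coverage.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unbranched(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Three overlapping toy reads drawn from the sequence ACGTCAG.
reads = ["ACGTC", "CGTCA", "GTCAG"]
graph = de_bruijn_graph(reads, k=3)
contig = walk_unbranched(graph, "AC")
```

Here the walk starts from a node chosen by hand. Picking start nodes automatically, and deciding what to do at branches, is where most of the real work in an assembler lies.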



          • #6
            Thanks a lot - hopefully I'll manage to formulate some clear questions soon enough.



            • #7
              For RNA-seq data, you could look at:

              TopHat/Cufflinks
              FASTX-Toolkit
              R: ShortRead, DEGseq
              genome browsers: UCSC with bigWig/BAM, IGV
              Galaxy, BEDTools

              I think finding some public data and trying to do something with it is a good idea.



              • #8
                We run a training course on the downstream analysis of next gen data, and the course material and example datasets (including ChIP-Seq and mRNA-Seq) are all available if you want to have a play with them.

                The course is oriented around our software, but you could try the example data in other packages as well and could hopefully pick up some useful hints from working through the exercises.

                http://www.bioinformatics.bbsrc.ac.u...g.html#seqmonk



                • #9
                  Thanks a lot guys - this has been a real help



                  • #10
                    this post is very helpful for me as well!

                    Thanks.



                    • #11
                      @simon

                      Have you tried viewing your course SAM files in IGV?

                      I played around with SeqMonk for a while successfully, but when I tried to view them in IGV to get a feel for another browser I had no luck.



                      • #12
                        Originally posted by gavin.oliver:
                        "Have you tried viewing your course SAM files in IGV?

                        I played around with SeqMonk for a while successfully, but when I tried to view them in IGV to get a feel for another browser I had no luck."

                        Sorry, no, I've never tried those files in IGV, but they're pretty standard SAM files. They're taken directly from TopHat and I'm sure plenty of other people will have used that with IGV.

                        Maybe if you post the errors you get someone with more IGV experience will chip in with a solution.



                        • #13
                          I also generated some SAM files with TopHat and I'm having no luck displaying them in IGV either. There are no error messages, just no reads to be seen!
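The thread does not say what caused the empty display. Two commonly reported causes of this symptom are alignments that are not coordinate-sorted and indexed, and reference names in the file that do not match the genome loaded in the browser (for example "1" versus "chr1"). Assuming one of these might apply, the Python sketch below reads the SAM header to report the declared sort order (`SO` on the `@HD` line) and the reference names (`SN` on `@SQ` lines). The function name and the toy header are made up for illustration.

```python
def summarize_sam_header(lines):
    """Return (sort_order, reference_names) parsed from SAM header lines."""
    sort_order = None
    references = []
    for line in lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        fields = line.rstrip("\n").split("\t")
        # Header fields after the record type look like TAG:VALUE.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO")
        elif fields[0] == "@SQ" and "SN" in tags:
            references.append(tags["SN"])
    return sort_order, references


# Toy header standing in for a real file (hypothetical content).
example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "@SQ\tSN:chr2\tLN:2000\n",
    "read1\t0\tchr1\t100\t255\t36M\t*\t0\t0\t*\t*\n",
]
sort_order, references = summarize_sam_header(example)
```

On a real file you could pass an open file handle instead of the list. If the sort order is not `coordinate`, sorting and indexing the alignments (for example with `samtools sort` and `samtools index`) before loading is worth trying, and it is also worth comparing the reported reference names with those of the genome selected in IGV.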



                          • #14
                            I'm just wondering whether people feel the best analysis approach is to stick to command-line tools like Bowtie and only use visualisation software towards the end of the process?
