  • gavin.oliver
    Senior Member
    • Jan 2010
    • 110

    Starting out in Illumina NGS analysis

    Hi all,

    I am wondering what the best way to go about starting out in Illumina NGS analysis is. I have historically been involved in sequence analysis from Sanger sequencing but my experience in NGS is currently zero. I have come up with some thoughts/questions which I have included here. Please edit/let me know your thoughts.

    1) Choose desired applications to learn
    - Reference assembly
    - De novo assembly
    - RNA-Seq

    2) Determine the best/most cutting edge algorithm for each application
    - Can someone advise what these are?

    3) Acquire test data sets
    - Is there a resource for publicly available NGS data?

    4) Analyse data
    - Is there a good resource for learning this analysis?


    My ideas are still probably quite vague so forgive me. Any help would be SINCERELY appreciated.

    Thanks in advance,

    Gavin
  • nickloman
    Senior Member
    • Jul 2009
    • 355

    #2
    Without wanting to sound patronising, how about starting with a scientific question you wish to answer? Sometimes the other questions become rather easier to answer when you do that.

    • nickloman
      Senior Member
      • Jul 2009
      • 355

      #3
      The answer to question 3 is yes, though: you can get example datasets from the NCBI Short Read Archive (since renamed the Sequence Read Archive) or the EBI European Nucleotide Archive.

      • gavin.oliver
        Senior Member
        • Jan 2010
        • 110

        #4
        Lol

        Starting with a single scientific question would indeed be ideal but I don't actually have one.

        I have been tasked with 'gaining experience' in the 3 areas mentioned i.e. de novo and reference assembly and RNA-Seq.

        Initially I guess I was hoping that there may be preferred algorithms for each application. Having determined these I would try to get hold of some data sets, perform some initial exploration and then try to form an idea of a 'project' or question in each area.

        *Edit* Perhaps it would even be wise to get hold of some public data and try to re-perform some published analyses on them?

        • nickloman
          Senior Member
          • Jul 2009
          • 355

          #5
          OK. It depends a bit on your organism of interest. Bacterial genomes call for different tools than human genomes, for example.

          But here are my tips:

          1) Alignment to reference. Check out BWA, Bowtie, SSAHA2, SAMtools, VarScan and their respective papers. Cutting edge is probably Burrows-Wheeler Transform.

          2) De novo assembly of short reads. Check out Velvet, SOAPdenovo. Cutting edge is probably de Bruijn graphs.

          3) RNA-Seq. Usually one or both of the above methods combined with read-counting software. I don't do much of this myself, but you could start with Anthony Fejes' FindPeaks.
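
For anyone new to the two ideas named in tips 1 and 2, here is a minimal Python sketch on toy strings. It only illustrates the concepts and is not how BWA, Bowtie, Velvet or SOAPdenovo are implemented: real tools build compressed indexes and cleaned-up graphs rather than sorting rotations or storing raw edge lists. The function names `bwt` and `de_bruijn_edges`, the `$` sentinel and the example sequences are all assumptions made for this illustration.

```python
from collections import defaultdict


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    s = text + "$"  # "$" is a sentinel that sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    print(bwt("GATTACA"))
    print(de_bruijn_edges(["ACGTAC", "GTACG"], k=3))
```

Walking a path through the edges returned by `de_bruijn_edges` is the basic idea behind reconstructing a longer sequence from overlapping short reads.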

          I guess if you wanted to set yourself a project to learn this stuff, you could start by trying to find variants in genomes from the public 1000 Genomes Project.

          • gavin.oliver
            Senior Member
            • Jan 2010
            • 110

            #6
            Thanks a lot - hopefully I'll manage to formulate some clear questions soon enough.

            • mgogol
              Senior Member
              • Mar 2008
              • 197

              #7
              For RNA-seq data, you could look at:

              TopHat/Cufflinks
              FASTX-Toolkit
              R: ShortRead, DEGseq
              genome browsers: UCSC with bigWig/BAM, IGV
              Galaxy, BEDTools

              I think finding some public data and trying to do something with it is a good idea.
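
As a rough idea of what the "counting" step in RNA-Seq means, here is a minimal Python sketch that tallies aligned reads per gene. It is not what Cufflinks or DEGseq do internally; the function name `count_reads_per_gene`, the gene coordinates and the read positions are hypothetical, and a read is assigned to a gene purely by where its alignment starts.

```python
from collections import Counter


def count_reads_per_gene(genes, reads):
    """Count aligned reads whose start position falls inside each gene.

    genes: dict of gene name -> (chrom, start, end), 1-based inclusive
    reads: iterable of (chrom, pos) alignment start positions
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in reads:
        for name, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and start <= pos <= end:
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical annotation and alignments, for illustration only.
    genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 500, 600)}
    reads = [("chr1", 150), ("chr1", 199), ("chr1", 550), ("chr2", 150)]
    print(count_reads_per_gene(genes, reads))
```

The nested loop is fine for a toy example; with real data you would want an interval index or an existing tool such as BEDTools.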

              • simonandrews
                Simon Andrews
                • May 2009
                • 870

                #8
                We run a training course on the downstream analysis of next gen data, and the course material and example datasets (including ChIP-Seq and mRNA-Seq) are all available if you want to have a play with them.

                The course is oriented around our software, but you could try the example data in other packages as well and could hopefully pick up some useful hints from working through the exercises.

                • gavin.oliver
                  Senior Member
                  • Jan 2010
                  • 110

                  #9
                  Thanks a lot guys - this has been a real help

                  • glacierbird
                    Member
                    • Dec 2009
                    • 15

                    #10
                    This post is very helpful for me as well!

                    Thanks.

                    • gavin.oliver
                      Senior Member
                      • Jan 2010
                      • 110

                      #11
                      @simon

                      Have you tried viewing your course SAM files in IGV?

                      I played around with SeqMonk for a while successfully, but when I tried to view them in IGV to get a feel for another browser I had no luck.

                      • simonandrews
                        Simon Andrews
                        • May 2009
                        • 870

                        #12
                        Originally posted by gavin.oliver
                        Have you tried viewing your course SAM files in IGV?

                        I played around with SeqMonk for a while successfully, but when I tried to view them in IGV to get a feel for another browser I had no luck.

                        Sorry, no, I've never tried those files in IGV, but they're pretty standard SAM files. They're taken directly from TopHat and I'm sure plenty of other people will have used that with IGV.

                        Maybe if you post the errors you get someone with more IGV experience will chip in with a solution.

                        • gavin.oliver
                          Senior Member
                          • Jan 2010
                          • 110

                          #13
                          I also generated some SAM files with TopHat, and I'm having no luck displaying them in IGV either. There are no error messages - just no reads to be seen!
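
One check that might help narrow this down (a guess at a possible cause, not a diagnosis) is to compare the reference names used in the SAM file with the chromosome names of the genome loaded in the browser, since a mismatch such as `chr1` versus `1` could leave the tracks empty. The Python sketch below relies only on the SAM text layout: header lines start with `@`, `@SQ` lines carry an `SN:` name, and the third column of each alignment record is the reference name. The function name `sam_reference_summary` and the example records are hypothetical.

```python
from collections import Counter


def sam_reference_summary(sam_lines):
    """Return (header_names, read_counts) for SAM text given as lines.

    header_names: reference names declared in @SQ header lines (SN: tag)
    read_counts:  Counter of RNAME (column 3) over alignment records
    """
    header_names = []
    read_counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        header_names.append(field[3:])
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            read_counts[fields[2]] += 1  # "*" means the read is unmapped
    return header_names, read_counts


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "@SQ\tSN:chr1\tLN:1000",
        "read1\t0\tchr1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    names, counts = sam_reference_summary(example)
    print(names, dict(counts))
```

To run it on a real file, pass an open file handle, for example `sam_reference_summary(open("my_alignments.sam"))`, where the path is a placeholder for your own output.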

                          • gavin.oliver
                            Senior Member
                            • Jan 2010
                            • 110

                            #14
                            I'm just wondering whether people feel the best analysis approach is to stick to command-line tools like Bowtie and only use visualisation software towards the end of the process?
