  • Help getting started with data analysis

    Hi all,

    I am new to SEQanswers. I work on the experimental side of Illumina library prep, and I am also taking my first steps towards data analysis, so I am looking for help in this direction.
    Arun

  • #2
    You should probably get some basic skills in Perl or Python, the Linux shell, and R.

    Then look at the How-to sections on the Wiki:


    • #3
      While I agree with 'gringer' that basic computing skills (programming languages and shells) will be useful in the long run, I suspect that you can also go a long way using web-based tools such as 'Galaxy'. There are tutorials on the main Galaxy website that can get you started.


      • #4
        Perl is nice, and it is what I started with years ago. Believe it or not, the book that did it all for me was "Beginning Perl for Bioinformatics"; it gave me a strong foundation in Perl programming. As the years went on, though, I began to care much more about speed and performance, so C/C++ might be something to consider further down the line. Datasets will only get larger, and the nice, conventional Perl scripts that annotated your SNPs/indels before may need to run much, much faster. That is, if you are considering taking a programmatic approach at all.

        I don't think you need an advanced statistical background unless you plan to work in a bioinformatics shop doing research and publishing papers, but a sound understanding of the common distributions and of summary statistics such as the mean, standard deviation, median, and IQR would be helpful. R is far more powerful than Excel, and it is fairly easy to understand once you get to know it.

        Some great data resources are UCSC, NCBI, and EBI. If you want a one-stop place to analyze your data, there are some decent open-source web apps like Galaxy, but there are far more powerful proprietary packages, and you really do get what you pay for in terms of support and functionality.

        Lastly, read articles. If you have access to journals, read everything related to exon capture, RNA-Seq, ChIP-Seq, resequencing, etc. Have fun with it. Best of luck.
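To make the summary statistics mentioned above concrete, here is a minimal Python sketch (R's built-in mean(), sd(), median() and IQR() do the same job). The read-depth values are made up for illustration, and the quartiles use the "inclusive" interpolation method of Python's statistics module, which as far as I know matches R's default quantile() (type 7); other quartile conventions give slightly different IQRs.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Basic summary statistics for a list of numbers (needs at least two values)."""
    # Quartiles with linear interpolation between data points ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    # Hypothetical per-position read depths over a small region, with one outlier.
    depths = [12, 15, 9, 30, 14, 13, 11, 16, 14, 120]
    for name, value in summarize(depths).items():
        print(f"{name:>6}: {value:.2f}")
```

With these made-up depths, the single deep position (120) pulls the mean up to 25.4 while the median stays at 14, which is why robust statistics such as the median and IQR are worth knowing for coverage data.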


        • #5
          Arunkh,

          Since you are working with an Illumina instrument, I would also suggest that you take a look at the CASAVA pipeline, which is used to analyze Illumina data.

          Praful


          • #6
            "For analyzing data, you should have extensive knowledge about the particular subject."
            This is useful to take note of, but it should probably be made more specific, given that the OP has said the work will relate primarily to an Illumina sequencing system. Everything else in that post seemed like advertising (including the link, which doesn't seem to have anything to do with sequencing).

            Following on this track, I spent the first couple of weeks in my job reading papers and browsing SEQanswers. This was because I had very minimal knowledge of NGS (I was previously doing a bit of SNPchip analysis), and needed to find out the common gotchas in this line of work.

            However, I didn't really get my head around the process until I'd seen sequences from the first sequencing run (and had some incentive to produce usable results). If you can get your hands on some sequencing data beforehand (ideally previous stuff done by the institution you're at, but stuff from the SRA is much better than nothing), then have a go reanalysing that data first.


            • #7
              I strongly agree with gringer: reading can be useful, but getting your hands on the data (or some sample data) and spending a week playing with it is probably the easiest way to learn. Using some of the common open-source tools and visualising their outputs really seems to help.

              So, ideally, find a Linux PC and install BWA or RTG (commercial but free for individual use, and very easy to use), GATK or FreeBayes (FreeBayes is simpler for variant calling; GATK has far more options and analyses), FastQC, Picard tools, SAMtools, VCFtools, and a simple viewer like IGV. Get some data in FASTQ files and a FASTA reference genome, and have a go with the data.

              RTG has a decent little example included with their software, and the GATK wiki has a couple of best-practice workflows for sequence data.
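The order in which the tools listed above are typically chained can be sketched as a small Python driver. This is a hedged outline under several assumptions: the file names (ref.fa, sample_R1.fq, sample_R2.fq, prefix "sample") are hypothetical, BWA, SAMtools, FastQC and FreeBayes are assumed to be on your PATH, and FreeBayes stands in for GATK only because the post calls it the simpler option. By default it just prints the commands (a dry run), so you can compare them with each tool's own manual before running anything.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# Each step is (command, file that receives stdout, or None).
Step = Tuple[List[str], Optional[str]]


def build_pipeline(ref: str, reads: List[str], prefix: str) -> List[Step]:
    """Commands for a basic QC -> align -> sort -> index -> call run (hypothetical names)."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        (["fastqc", *reads], None),                    # read-quality report per FASTQ
        (["bwa", "index", ref], None),                 # build the BWA index (once per reference)
        (["samtools", "faidx", ref], None),            # .fai index used by callers and viewers
        (["bwa", "mem", ref, *reads], sam),            # align; bwa mem writes SAM to stdout
        (["samtools", "sort", "-o", bam, sam], None),  # coordinate-sort into a BAM file
        (["samtools", "index", bam], None),            # .bai index for IGV and variant callers
        (["freebayes", "-f", ref, bam], vcf),          # call variants; VCF goes to stdout
    ]


def run_pipeline(steps: List[Step], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd, out in steps:
        print(shlex.join(cmd) + (f" > {out}" if out else ""))
        if dry_run:
            continue
        if out:
            with open(out, "w") as fh:
                subprocess.run(cmd, stdout=fh, check=True)
        else:
            subprocess.run(cmd, check=True)  # check=True stops at the first failing step


if __name__ == "__main__":
    steps = build_pipeline("ref.fa", ["sample_R1.fq", "sample_R2.fq"], "sample")
    run_pipeline(steps)  # dry run: only prints the commands
```

Once the dry-run output looks right, calling run_pipeline(steps, dry_run=False) executes the steps in order, and you can open the sorted BAM and the VCF in IGV to see what each step produced.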


              • #8
                Whatever route you take in terms of analysis and programming, make sure to start with a program that can cut your input size down considerably. You want to be able to play with the steps of an analysis pipeline in real time, rather than waiting a long time at each step while the programs crunch through all of your data. Once you have refined a method on smaller datasets and scripted the pipeline, you can let it run on all of your data while you are asleep or over the weekend.
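One way to cut a FASTQ file down to a random subset of reads in a single pass is reservoir sampling, sketched below in Python. It assumes plain four-line FASTQ records (sequence and quality not wrapped over several lines) and a single-end file; for paired-end data the same reads must be kept from both files, which this sketch does not handle. Simply taking the first N reads is quicker, but those reads are not a random sample of the run. Dedicated tools (for example seqtk's `sample` subcommand) do the same job.

```python
import io
import random
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequence and quality lines are unwrapped."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # clean end of file
        if not lines[3]:
            raise ValueError(f"truncated FASTQ record starting {lines[0]!r}")
        if not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record starting {lines[0]!r}")
        yield (lines[0].rstrip("\n"), lines[1].rstrip("\n"),
               lines[2].rstrip("\n"), lines[3].rstrip("\n"))


def reservoir_sample(records: Iterable[Record], k: int,
                     seed: Optional[int] = None) -> List[Record]:
    """Keep k records chosen uniformly at random, without knowing the total count."""
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = rec  # replace with probability k / (i + 1)
    return sample


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    for rec in records:
        handle.write("\n".join(rec) + "\n")


if __name__ == "__main__":
    # Tiny in-memory FASTQ standing in for a real (hypothetical) reads.fastq file.
    demo = "".join(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(1000))
    subset = reservoir_sample(read_fastq(io.StringIO(demo)), k=5, seed=42)
    out = io.StringIO()
    write_fastq(subset, out)
    print(out.getvalue(), end="")
```

For a real file you would pass an open handle instead, e.g. `with open("reads.fastq") as fh: subset = reservoir_sample(read_fastq(fh), k=100000, seed=1)` (gzip.open(path, "rt") for compressed input). Memory use stays proportional to k, not to the size of the run.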


                • #9
                  A lot of great advice, and absolutely none of it sounds like advertising. Sorry gringer, I really don't understand where you are coming from on this one, unless there was some spam that has since been deleted. My two cents: don't learn anything until you need to use it; otherwise, it's all just too boring. Of course, as scientists, reading outside our expertise is good practice, but learning skills with no experiment in mind is a waste of time.
                  --------------
                  Ethan


                  • #10
                    Hi all. Sorry for the late reply; I was busy with the experiment and just couldn't find the time.

                    First of all, let me thank everyone above for your invaluable advice. I have certainly got a few points to start with, and I look forward to many more discussions with all of you from now on :-)
                    Arun
