SEQanswers

Old 12-21-2011, 10:16 PM   #1
arunkh
Junior Member
 
Location: Bangalore

Join Date: Dec 2011
Posts: 4
Help in beginning to understand data analysis

Hi all,

I am a newbie here on SEQanswers. I work on the experimental side of Illumina library prep, and I am also taking my first few steps towards data analysis, so I am looking for help in this direction.
__________________
Arun
Old 12-22-2011, 12:03 AM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

You should probably get some basic skills in Perl / Python, Linux shell, and R.

Then look at the How-to sections on the Wiki:

http://seqanswers.com/wiki/How-to
Old 12-22-2011, 08:03 AM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

While I agree with 'gringer' that skills in the computer basics (languages and shells) will be useful in the long run, I suspect that you can also go a long way by using web-based tools such as 'Galaxy'. There are tutorials on the main Galaxy web site that can get you started.
Old 12-22-2011, 08:44 AM   #4
JohnK
Senior Member
 
Location: Los Angeles, China.

Join Date: Feb 2010
Posts: 106

Quote:
Originally Posted by arunkh View Post
Hi all,

I am newbie here to SEQanswers. I work on experimental part of Illumina library prep. And I am also beginning my first few steps towards analysis. So looking for help in this direction.
Perl is nice, and it is what I started with years ago. Believe it or not, the book that did it all for me was "Beginning Perl for Bioinformatics", which gave me a strong foundation in Perl programming. As the years went on, though, I began to really care about speed and performance, so C/C++ might be something to consider looking into down the road. After all, datasets will only get larger, and those nice, conventional Perl scripts that annotated your SNPs/indels before might need to run much, much faster. That is, if you are considering taking a programmatic approach.

I don't think you need any sort of advanced statistical background unless you plan on working in a bioinformatics shop doing research and publishing papers, but a sound understanding of the common distributions and of the mean, s.d., median, IQR, etc. would be helpful. R is far more powerful than Excel, and it's pretty easy to comprehend once you get to know it.

Some great data resources are UCSC, NCBI, and EBI. If you want a one-stop place to analyze your data, there are some alright open-source web apps like Galaxy, but there are far more powerful proprietary packages, and you really do get what you pay for as far as support and functionality go.

Lastly, read articles. If you have access to journals, read everything related to exon capture, RNA-Seq, ChIP-Seq, resequencing, etc. Have fun with it. Best of luck.
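The summary statistics named in this post (mean, s.d., median, IQR) can be tried out in a few lines. The following is only a minimal sketch using Python's standard library; the coverage values are made up for illustration and do not come from any real run.

```python
import statistics

# Hypothetical per-position coverage values, invented purely for illustration.
coverage = [12, 15, 9, 30, 14, 13, 11, 16, 45, 10]

mean = statistics.mean(coverage)
sd = statistics.stdev(coverage)                   # sample standard deviation
median = statistics.median(coverage)
q1, _, q3 = statistics.quantiles(coverage, n=4)   # quartile cut points (Python 3.8+)
iqr = q3 - q1                                     # interquartile range

print(f"mean={mean:.1f} sd={sd:.1f} median={median} IQR={iqr:.2f}")
```

In this toy example the mean (17.5) sits well above the median (13.5) because two high values pull it up, which is the kind of skew that is common in sequencing coverage and a reason to look at the median and IQR as well as the mean.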
Old 12-22-2011, 11:21 AM   #5
aggp11
Member
 
Location: Wisconsin

Join Date: Jun 2011
Posts: 87

Arunkh,

Since you are working with an Illumina instrument, I would also suggest that you take a look at the CASAVA pipeline that is used to analyze Illumina data.

Praful
Old 01-11-2012, 02:43 PM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

Quote:
For analyzing data, you should have extensive knowledge about particular subject
This is useful to take note of, but should probably be made more specific given that the OP has identified that work will be primarily relating to an Illumina sequencing system. Everything else in that post seemed like advertising (including the link, which doesn't seem to have anything to do with sequencing).

Following on this track, I spent the first couple of weeks in my job reading papers and browsing SEQanswers. This was because I had very minimal knowledge of NGS (I was previously doing a bit of SNPchip analysis), and needed to find out the common gotchas in this line of work.

However, I didn't really get my head around the process until I'd seen sequences from the first sequencing run (and had some incentive to produce usable results). If you can get your hands on some sequencing data beforehand (ideally previous stuff done by the institution you're at, but stuff from the SRA is much better than nothing), then have a go reanalysing that data first.
Old 01-13-2012, 03:10 PM   #7
aeonsim
Member
 
Location: Belgium

Join Date: Jun 2011
Posts: 45

I'd strongly agree with gringer: reading can be useful, but getting your hands on the data, or some sample data, and spending a week playing with it is probably the easiest way to learn. Using some of the common open-source tools and visualising their outputs seems to really help.

So ideally find a Linux PC and install BWA or RTG (commercial but free for individual use, and very easy to use), GATK or FreeBayes (FreeBayes is simpler for variant calling; GATK has far more options and analyses), FastQC, Picard tools, SAMtools, VCFtools and a simple viewer like IGV. Get some data in FASTQ files and a FASTA reference genome, and have a go with the data.

RTG has a decent little example included with their software, and the GATK wiki has a couple of best-practice workflows for sequence data.
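To give a feel for how a few of these tools chain together, here is a rough sketch that only builds and prints the command lines for a minimal single-end workflow; it does not run anything. It assumes reasonably recent releases of FastQC, BWA (the `mem` subcommand), SAMtools and FreeBayes are installed. The function name `build_pipeline` and the file names `ref.fa`, `reads.fastq` and `sample` are hypothetical placeholders, and a real analysis would need more steps (duplicate marking, filtering and so on).

```python
import shlex

def build_pipeline(ref, reads, prefix):
    """Return (command, stdout_file) pairs for a minimal single-end workflow.

    stdout_file is None when the tool writes its own output files.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        (["fastqc", reads], None),                     # read-quality report
        (["bwa", "index", ref], None),                 # index the reference once
        (["bwa", "mem", ref, reads], sam),             # align; SAM is written to stdout
        (["samtools", "sort", "-o", bam, sam], None),  # coordinate-sort into a BAM file
        (["samtools", "index", bam], None),            # index the BAM for viewers such as IGV
        (["freebayes", "-f", ref, bam], vcf),          # call variants; VCF is written to stdout
    ]

# Print the commands as they might be typed in a shell.
for cmd, out in build_pipeline("ref.fa", "reads.fastq", "sample"):
    line = shlex.join(cmd)
    print(f"{line} > {out}" if out else line)
```

Printing the commands first makes it easy to check each step by hand before scripting the whole thing.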
Old 01-13-2012, 10:01 PM   #8
jiaco
Member
 
Location: GMT +1

Join Date: May 2010
Posts: 35

Whatever route you go in terms of analysis and programming, make sure to start with a program that can cut your input size down considerably. You want to be able to play with the steps in an analysis pipeline in real time, not wait a long time at each step while the programs crunch all of your data. Once you have refined a method using smaller datasets and have scripted the pipeline, you can let it run on all your data while you are asleep or away for the weekend.
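One simple way to cut the input down is to keep only the first few thousand reads of a FASTQ file. The sketch below assumes the common layout of exactly four lines per record (no wrapped sequence lines) and uncompressed input; the function name `head_fastq` and the demo file names are hypothetical.

```python
import itertools
import os
import tempfile

def head_fastq(in_path, out_path, n_reads):
    """Copy the first n_reads records of a 4-line-per-record FASTQ file.

    Returns the number of complete records written.
    """
    lines_written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        # islice stops after 4 * n_reads lines, so the rest of the file is never read.
        for line in itertools.islice(src, 4 * n_reads):
            dst.write(line)
            lines_written += 1
    return lines_written // 4

# Demo on a tiny made-up FASTQ file containing five reads.
tmpdir = tempfile.mkdtemp()
full = os.path.join(tmpdir, "full.fastq")
small = os.path.join(tmpdir, "small.fastq")
with open(full, "w") as fh:
    for i in range(5):
        fh.write(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n")

written = head_fastq(full, small, 2)
print(written, "reads written to", small)
```

Note that the first reads in a file are not a random sample and may not be representative of the whole run, so this is best treated as a way to test that a pipeline works, not as a basis for conclusions.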
Old 01-14-2012, 05:20 AM   #9
ETHANol
Senior Member
 
Location: Western Australia

Join Date: Feb 2010
Posts: 308

A lot of great advice and absolutely none of it sounds like advertising. Sorry gringer, really don't understand where you are coming from on this one unless there was some spam that has been deleted. My two cents is not to learn anything until you need to use it. Otherwise, it's all just too boring. Of course as scientists reading outside our expertise is a good practice, but learning skills with no experiment in mind is a waste of time.
__________________
--------------
Ethan
Old 01-20-2012, 01:47 AM   #10
arunkh
Junior Member
 
Location: Bangalore

Join Date: Dec 2011
Posts: 4

Hi all. Sorry for the late reply; I was busy with an experiment and just couldn't find time to reply.

First of all, let me thank all the members above for your invaluable advice. I've surely got a few points to start with, and I look forward to many more discussions with all of you from now on :-)
__________________
Arun