SEQanswers

12-07-2011, 10:35 AM #1
CHoyt
Junior Member
Location: New York
Join Date: Dec 2011
Posts: 1

Looking for a few NGS-ers willing to share a bad experience about NGS data analysis

Hi, everyone! I'm looking for a few people who would be willing to share a bad experience regarding NGS data analysis. Any takers?

Thanks!
-Carlton
12-07-2011, 03:47 PM #2
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,539

I'll bite, one of the less potentially embarrassing ones:

We've had several successful little projects doing de novo assembly of phage genomes with 454, but in one case all we got was host contamination and what looked like human mitochondria. Moral: do more QC on the sample before sequencing. Otherwise you can waste your sequencing money and some analysis time.

Semi-anonymous user names may discourage posts, though. I'm sure people here could share horror stories of colleagues coming to them with "We've just done some sequencing, could you assemble it for us please?" with no idea of the scale of the problem or how much analysis time they should have budgeted for. The best warnings are probably saved for off-the-record conversations at the pub/bar at conferences!
12-07-2011, 03:50 PM #3
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,539

Another one for you (not first hand): We updated tool X and repeated the analysis and now all the results have changed almost beyond recognition. I can think of some threads here along those lines discussing differential gene expression from RNA-Seq data.

e.g. http://seqanswers.com/forums/showthread.php?t=15896

Edit: To make my point more explicit (thanks Simon): you should be diligent in your record keeping (an electronic lab book or whatever works for you) and include the version numbers of key packages and datasets/databases, since these can sometimes make a surprising difference to the results. This goes beyond high-throughput sequencing and applies to bioinformatics as a whole.
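One lightweight way to follow that advice is to write a small provenance record next to each set of results. The sketch below assumes a Python workflow: it records installed package versions via `importlib.metadata` and SHA-256 checksums of input files (reference genomes, annotations, databases), so a rerun against a different version is detectable. The function names and the example package list are hypothetical; versions of external command-line tools would have to be captured separately, from whatever version string each tool reports.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def package_versions(names):
    """Return {package: version} for installed Python packages (None if not installed)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of a file, so a reference or database can be identified exactly later."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(packages, data_files):
    """Build a JSON-serialisable note of software versions and input datasets."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "packages": package_versions(packages),
        "data_files": {str(p): file_sha256(p) for p in data_files},
    }


if __name__ == "__main__":
    # Hypothetical example: package names are placeholders, no data files listed.
    record = provenance_record(["biopython", "numpy"], [])
    print(json.dumps(record, indent=2))
```

Saving this JSON alongside each results directory (or pasting it into the lab book) makes it much easier to explain later why an updated tool produced different results.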

Last edited by maubp; 12-08-2011 at 02:32 AM.
12-07-2011, 04:46 PM #4
adaptivegenome
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437

My favorite one of all time:
http://www.ncbi.nlm.nih.gov/pubmed/21102452

Check out supplementary table 1
12-08-2011, 01:10 AM #5
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK
Join Date: May 2009
Posts: 869

Quote:
Originally Posted by maubp
Another one for you (not first hand): We updated tool X and repeated the analysis and now all the results have changed almost beyond recognition. I can think of some threads here along those lines discussing differential gene expression from RNA-Seq data.

To try to make a wider point - this is why we advocate getting our users to visualise and explore their data. Running a tool, however good it may be, tends to make people too trusting of the results it produces. If you can actually view those results in a number of different ways, then you get a much better feel for how much confidence you can have in the hits you see.

For example - you might find that changing an analysis threshold by a small amount can hugely change the number of hits you get, but if you can see a scatterplot of your data with the threshold you're using on the edge of a huge cloud of points then you can see exactly why this happens.
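A scatterplot remains the best way to see this, but a quick numerical complement is to sweep the threshold and watch how the hit count responds. The sketch below uses simulated, hypothetical scores (a large cloud of unchanged genes centred on zero plus a small group of genuine changes); the function names and numbers are illustrative only and do not come from any real dataset.

```python
import random


def hits_per_threshold(scores, thresholds):
    """Return (threshold, count) pairs: how many items have |score| >= threshold."""
    return [(t, sum(1 for s in scores if abs(s) >= t)) for t in thresholds]


def simulate_scores(n_background=10000, n_true=50, seed=1):
    """Hypothetical data: a big cloud of unchanged genes plus a few real changes."""
    rng = random.Random(seed)
    background = [rng.gauss(0.0, 0.5) for _ in range(n_background)]
    true_hits = [rng.gauss(3.0, 0.3) for _ in range(n_true)]
    return background + true_hits


if __name__ == "__main__":
    scores = simulate_scores()
    # Thresholds well clear of the cloud give stable counts; near its edge they explode.
    for t, n in hits_per_threshold(scores, [2.5, 2.0, 1.5, 1.2, 1.0, 0.8]):
        print(f"|score| >= {t:<4}: {n} hits")
```

Counts that barely move between neighbouring thresholds suggest the cut-off sits in a gap in the data; counts that jump sharply mean the threshold is cutting into the cloud, which is exactly what the scatterplot would show.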
12-08-2011, 04:24 AM #6
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,539

Quote:
Originally Posted by simonandrews
For example - you might find that changing an analysis threshold by a small amount can hugely change the number of hits you get, but if you can see a scatterplot of your data with the threshold you're using on the edge of a huge cloud of points then you can see exactly why this happens.

Excellent advice. Another related point is to avoid pre-determined e-values as thresholds, since they change radically with things like dataset size (e.g. for BLAST matches, whereas the bitscore is stable). i.e. a discriminatory e-value for one dataset can be quite inappropriate for another.
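The reason is that an e-value is essentially the bit score scaled by the size of the search space. In the simplified Karlin-Altschul form, E = m * n * 2^(-S'), where S' is the bit score and m and n are the (effective) query and database lengths. The sketch below ignores BLAST's effective-length and composition corrections and uses made-up lengths; it only shows that one and the same bit score maps to e-values 10,000-fold apart when the database grows 10,000-fold.

```python
def evalue_from_bitscore(bitscore, query_len, db_len):
    """Simplified Karlin-Altschul relation: E = m * n * 2**(-S').

    query_len (m) and db_len (n) stand in for BLAST's effective lengths;
    real BLAST output applies further corrections not modelled here.
    """
    return query_len * db_len * 2.0 ** (-bitscore)


if __name__ == "__main__":
    bitscore = 50.0   # hypothetical alignment bit score
    query_len = 300   # hypothetical query length (residues)
    cutoff = 1e-5     # a fixed e-value threshold chosen in advance
    for db_len in (1e6, 1e10):  # a small and a large database, in residues
        e = evalue_from_bitscore(bitscore, query_len, db_len)
        verdict = "passes" if e <= cutoff else "fails"
        print(f"db_len={db_len:.0e}  E={e:.2e}  {verdict} E <= {cutoff:g}")
```

Filtering on the bit score (or re-deriving the e-value cut-off for each search space) avoids the same alignment being kept in one analysis and discarded in another.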
12-09-2011, 02:52 PM #7
polyatail
Member
Location: New York, NY
Join Date: Dec 2010
Posts: 25

Quote:
Originally Posted by genericforms
My favorite one of all time:
http://www.ncbi.nlm.nih.gov/pubmed/21102452

Check out supplementary table 1

Buccal swab?
12-09-2011, 04:30 PM #8
adaptivegenome
Super Moderator
Location: US
Join Date: Nov 2009
Posts: 437

Quote:
Originally Posted by polyatail
Buccal swab?

LOL! Must have been!
12-10-2011, 12:06 AM #9
simonandrews
Simon Andrews
Location: Babraham Inst, Cambridge, UK
Join Date: May 2009
Posts: 869

Quote:
Originally Posted by maubp
Excellent advice. Another related point is to avoid pre-determined e-values as thresholds when they will alter radically based on things like dataset size (e.g. BLAST matches - whereas the bitscore is stable). i.e. A discriminatory e-value for one dataset can be quite inappropriate on another.

As if to prove a point, I saw this tweet this morning.
All times are GMT -8.