**maubp** · 12-07-2011, 03:47 PM

I'll bite, one of the less potentially embarrassing ones:

We've had several successful little projects doing de novo assembly of phage genomes with 454, but in one case all we got was host contamination and what looked like human mitochondria. Moral: do more QC on the sample before sequencing. Otherwise you can waste your sequencing money and some analysis time.

Semi-anonymous user names may discourage posts though. I'm sure people here could share horror stories of colleagues coming to them with "We've just done some sequencing, could you assemble it for us please" with no idea of the scale of the problem nor how much analysis time they should have budgeted for. Probably the best warnings would be saved for off the record conversations at the pub/bar at conferences!

**maubp** · 12-07-2011, 03:50 PM

Another one for you (not first hand): We updated tool X and repeated the analysis and now all the results have changed almost beyond recognition. I can think of some threads here along those lines discussing differential gene expression from RNA-Seq data.

e.g. http://seqanswers.com/forums/showthread.php?t=15896

Edit: To make my point more explicit (thanks Simon), the point is you should be diligent in your record keeping (electronic lab book or whatever works for you) and include the version number of key packages and datasets/databases since this can sometimes make a surprising difference to the results. This goes beyond high throughput sequencing, and applies to Bioinformatics as a whole.
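As a rough illustration of that record keeping, here is a minimal Python sketch that writes out the versions of the packages an analysis depends on. The function name `snapshot_versions` and the package names in the example are placeholders, not anything from a specific pipeline; external tools and database releases would still need to be noted by hand or by calling each tool's own version option.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def snapshot_versions(packages):
    """Return a dict recording interpreter, platform and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; list whatever the analysis really uses.
    record = snapshot_versions(["numpy", "biopython"])
    print(json.dumps(record, indent=2))
```

Saving that JSON next to each set of results makes it much easier to explain later why a rerun gave different numbers.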

**adaptivegenome** · 12-07-2011, 04:46 PM

My favorite one of all time:

http://www.ncbi.nlm.nih.gov/pubmed/21102452

Check out supplementary table 1

**simonandrews** · 12-08-2011, 01:10 AM

Originally posted by maubp View Post

Another one for you (not first hand): We updated tool X and repeated the analysis and now all the results have changed almost beyond recognition. I can think of some threads here along those lines discussing differential gene expression from RNA-Seq data.

To try to make a wider point - this is why we advocate getting our users to visualise and explore their data. Running a tool, however good it may be, tends to make people too trusting of the results produced. If you can actually view those results in a number of different ways then you get a much better feel for how much confidence you can have in the hits you see.

For example - you might find that changing an analysis threshold by a small amount can hugely change the number of hits you get, but if you can see a scatterplot of your data with the threshold you're using on the edge of a huge cloud of points then you can see exactly why this happens.
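The same effect can be checked numerically alongside the plot. This is only a sketch on made-up scores (a dense cloud around 2.0 plus a few clear outliers); the function names and the data are assumptions for illustration, not output from any real tool.

```python
import random


def count_hits(scores, threshold):
    """Number of scores at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)


def threshold_sweep(scores, thresholds):
    """Map each candidate threshold to the number of hits it yields."""
    return {t: count_hits(scores, t) for t in thresholds}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Made-up data: a big cloud of points near 2.0 plus three strong outliers.
    scores = [rng.gauss(2.0, 0.1) for _ in range(10000)] + [5.0, 6.0, 7.5]
    for t, n in threshold_sweep(scores, [1.9, 2.0, 2.1, 3.0]).items():
        print(f"threshold {t:.1f}: {n} hits")
```

Thresholds that sit inside the cloud give wildly different hit counts for tiny changes, while a threshold well clear of the cloud is stable.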

**maubp** · 12-08-2011, 04:24 AM

Originally posted by simonandrews View Post

For example - you might find that changing an analysis threshold by a small amount can hugely change the number of hits you get, but if you can see a scatterplot of your data with the threshold you're using on the edge of a huge cloud of points then you can see exactly why this happens.

Excellent advice. Another related point is to avoid pre-determined e-values as thresholds, since they change radically with things like dataset size (e.g. for BLAST matches, whereas the bit score is stable). An e-value that discriminates well on one dataset can be quite inappropriate on another.
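To put numbers on that, the standard relationship between a BLAST bit score S' and its e-value is E = m * n * 2^(-S'), where m is the query length and n the database length. The sketch below uses raw lengths for simplicity; BLAST itself applies effective-length corrections, so treat the values as approximate. The function name and example sizes are made up.

```python
def evalue_from_bitscore(bit_score, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    bit_score = 50.0  # identical alignment quality in both searches
    query_len = 300
    for db_len in (1_000_000, 1_000_000_000):
        e = evalue_from_bitscore(bit_score, query_len, db_len)
        print(f"database of {db_len} residues: E = {e:.3g}")
```

The same alignment gets an e-value a thousand times larger against the thousand-fold bigger database, so a fixed e-value cutoff means something different in each search.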

**polyatail** · 12-09-2011, 02:52 PM

Originally posted by genericforms View Post

My favorite one of all time:

http://www.ncbi.nlm.nih.gov/pubmed/21102452

Check out supplementary table 1

Buccal swab?

**adaptivegenome** · 12-09-2011, 04:30 PM

Originally posted by polyatail View Post

Buccal swab?

LOL! Must have been!

**simonandrews** · 12-10-2011, 12:06 AM

Originally posted by maubp View Post

Excellent advice. Another related point is to avoid pre-determined e-values as thresholds, since they change radically with things like dataset size (e.g. for BLAST matches, whereas the bit score is stable). An e-value that discriminates well on one dataset can be quite inappropriate on another.

As if to prove a point, I saw this tweet this morning.

Looking for a few NGS-ers willing to share a bad experience about NGS data analysis
