Multi-Genome Alignment for QC...

james hadfield

Moderator
Cambridge, UK
Community Forum

Join Date: Feb 2008

Posts: 220


08-17-2010, 07:51 AM

In a previous post on our HiSeq I mentioned that we were running a multi-genome alignment (MGA) as a QC tool. Comments made me think it would be an interesting topic to post in the Bioinformatics section, not one I usually post in!

The work for this was done by Matt Eldridge, our head of bioinformatics. Big thanks to him for doing it!

The MGA takes a sample of sequence reads from a lane and aligns the first 36bp of each read using Bowtie. The sampling lets the MGA run fast, and it is part of our normal data pipeline: we get to see the report in our LIMS alongside the Gerald report (which I think we will soon be ditching entirely).
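
The sampling and trimming step could be sketched roughly as below. This is a minimal illustration, not the actual MGA code: the function names `read_fastq` and `sample_and_trim`, the use of reservoir sampling, and the fixed seed are all assumptions. Only the 36bp trim length comes from the description above. The trimmed reads would then be handed to the aligner.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def sample_and_trim(records, sample_size, trim_length=36, seed=0):
    """Reservoir-sample up to sample_size reads, keeping the first trim_length bases.

    Reservoir sampling is an assumed choice here; it gives a uniform random
    sample in a single pass without knowing the number of reads in advance.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, (header, seq, qual) in enumerate(records):
        trimmed = (header, seq[:trim_length], qual[:trim_length])
        if i < sample_size:
            reservoir.append(trimmed)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = trimmed
    return reservoir
```

Usage would look something like `sample_and_trim(read_fastq(open("lane1.fastq")), 100000)`, where the file name and sample size are placeholders.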

Of course reads can align to multiple genomes (conserved regions). If this happens we assign the read to the genome with most reads. This approach should show up cases of genome contamination and maximise the difference between first and second genomes in the list.
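
One way to read the assignment rule above is sketched here. It assumes that "the genome with most reads" means the genome with the highest total number of aligned reads across the whole sample, counting every hit, and that ties are broken alphabetically; both are my assumptions, and the function name `assign_reads` and the input layout are hypothetical.

```python
from collections import Counter


def assign_reads(hits):
    """Assign each read to one genome.

    hits maps a read ID to the set of genome names it aligned to
    (a hypothetical layout; an empty set means the read did not align).
    """
    # First pass: count how many reads hit each genome, counting all hits.
    totals = Counter()
    for genomes in hits.values():
        totals.update(genomes)

    # Second pass: give each read to its best-supported genome.
    assigned = {}
    for read_id, genomes in hits.items():
        if genomes:
            # sorted() makes ties deterministic: alphabetically first wins.
            assigned[read_id] = max(sorted(genomes), key=lambda g: totals[g])
    return assigned
```

Because shared reads are pulled towards the genome that is already ahead, the gap between the first and second genomes in the list widens, which is the effect described above.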

We also use Exonerate to identify sequences containing Illumina adapters.

Currently we run against: Human, Mouse, Rat, Xenopus, Arabidopsis, C.elegans, Yeast, Bacteria and Viruses (the last two being amalgamations of >1500 genomes each). There are other genomes as well, specific to projects in our lab. I guess at some level it would be possible to run against all genomes?

The output is a list of genomes in descending order of the number of aligned reads, expressed as a percentage. Hopefully the genome the user was expecting is at the top! We did have a case about three years ago where one user accidentally sequenced, to 80x coverage, the genome of an organism that was also growing in his lab. It took a little time to work out what was wrong with his experiment and I believe the data was handed over to that community. Serendipity at its best!

There are often unaligned reads, and the initial assumption was that these were junk, low-quality reads. Running this kind of aligner might let us see whether that assumption is true, but we have not looked at this yet.
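
The report itself could be produced along these lines. This is only a sketch: `genome_report` and its inputs are hypothetical names, and it assumes the percentages are taken over all sampled reads so that the unaligned fraction can be shown alongside the genomes.

```python
from collections import Counter


def genome_report(assigned, total_sampled):
    """Summarise per-genome read counts as a descending percentage list.

    assigned maps a read ID to the single genome it was assigned to
    (a hypothetical layout); total_sampled is the number of reads sampled,
    including those that aligned nowhere.
    """
    if total_sampled <= 0:
        raise ValueError("total_sampled must be positive")
    counts = Counter(assigned.values())
    # most_common() returns genomes sorted by descending read count.
    rows = [(genome, n, 100.0 * n / total_sampled)
            for genome, n in counts.most_common()]
    unaligned = total_sampled - sum(counts.values())
    return rows, 100.0 * unaligned / total_sampled


if __name__ == "__main__":
    example = {"r1": "human", "r2": "human", "r3": "human", "r4": "mouse"}
    rows, unaligned_pct = genome_report(example, total_sampled=5)
    for genome, n, pct in rows:
        print(f"{genome}\t{n}\t{pct:.1f}%")
    print(f"unaligned\t{unaligned_pct:.1f}%")
```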

The reason I wanted this MGA in our pipeline was to see how much PhiX was in lanes where we had not actually put it. The assumption was that any sloppy practices in a lab where all flowcells are set up would show up here. It was immediately clear that the level of PhiX ‘contamination’ from lane to lane was very low. We identified two or three flowcells where there was a potential issue, but this was out of many hundred. We were also able to get run reports and data from another large centre nearby and they had similar results. All in all I was very happy with the low contamination from lane to lane, and I am satisfied that the protocols are reasonably robust.

PhiX must be being breathed in as aerosols in labs the world over; might we get some Cronenberg-style PhiX-Human hybrid? Let me know if you see one...

Let me know what you think.