Converting an early genome assembly to current coordinates?

rareaquaticbadger

Junior Member

Join Date: Jun 2011

Posts: 6
#1

10-24-2013, 11:54 AM

Hi there,

I have been involved in a project where I have aligned my sequence data to MGSCv3, the first build of the mouse genome, which consists of ~250,000 contigs. My project is testing whether I can use a sequencing technique to order and reconstitute these contigs into a more 'complete' genome.

As such, I would like to see how accurately I have ordered my MGSCv3 data by comparing it to the actual locations of each of the 250k contigs in the latest build of the mouse genome (GRCm38 / mm10). I initially took a 'dirty' approach of taking just the first 100 nt of each contig and aligning them to the latest build with bwa aln, but I would like to get more accurate localizations.
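For reference, the 'first 100 nt of each contig' shortcut described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `read_fasta` and `contig_prefixes` are made up for this sketch, and the input is assumed to be a plain (uncompressed) multi-FASTA file of the contigs.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain multi-FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None and line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contig_prefixes(handle: TextIO, n: int = 100) -> Iterator[Tuple[str, str]]:
    """Yield (header, first n bases) for every contig; short contigs are kept whole."""
    for name, seq in read_fasta(handle):
        yield name, seq[:n]


if __name__ == "__main__":
    # Tiny in-memory example standing in for the real contig file.
    demo = io.StringIO(">contig1\nACGTACGT\nACGT\n>contig2\nGG\n")
    for name, prefix in contig_prefixes(demo, n=5):
        print(f">{name}\n{prefix}")
```

The prefixes written this way can then be given to whichever aligner is in use. Anchoring a contig by a single end says nothing about its orientation or full extent, which is one reason this approach is only a rough localization.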

Initially I thought I could just find the current mm10 coordinates of the MGSCv3 accession numbers or gi numbers in NCBI, but I can't locate such a table.

Then I thought I could use LiftOver to find the coordinates, but the assembly versions don't go back far enough in UCSC (they only support liftOvers from mm7 onward). Then I tried BLAT or BLAST, but the online versions couldn't handle the number of records I want to analyze, and I couldn't find a good way to implement a local installation to do this.

Finally, I've been looking at NCBI remap, but again the web-based version cannot handle the number of records, and I can't find a way to run it locally. Also, the identifiers remap uses for MGSCv3 differ from the identifiers I have. In the build downloaded from NCBI, each FASTA record header is in the format

"gi|20564479|emb|CAAA01000001.1|,9601"

while remap wants the location in the format

"chrMmUn_WIFeb01_42457:1 -9600"
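To make the mismatch between the two identifier styles concrete, here is a hedged Python sketch that parses both example strings quoted above into their parts. The names `NcbiHeader`, `parse_ncbi_header` and `parse_remap_location` are hypothetical, and the meaning of the trailing number in the NCBI header is not stated here, so the sketch stores it without interpreting it. The sketch does not derive one identifier from the other; that would still need an external lookup table.

```python
import re
from typing import NamedTuple, Optional, Tuple


class NcbiHeader(NamedTuple):
    gi: str                 # gi number, e.g. "20564479"
    db: str                 # source database tag, e.g. "emb"
    accession: str          # versioned accession, e.g. "CAAA01000001.1"
    trailing: Optional[int]  # number after the final "|," (meaning not assumed)


# Matches headers shaped like: gi|20564479|emb|CAAA01000001.1|,9601
_NCBI_RE = re.compile(r"^>?gi\|(\d+)\|([A-Za-z]+)\|([^|]+)\|,?(\d*)$")

# Matches locations shaped like: chrMmUn_WIFeb01_42457:1 -9600
_REMAP_RE = re.compile(r"^([^:\s]+):\s*(\d+)\s*-\s*(\d+)$")


def parse_ncbi_header(header: str) -> NcbiHeader:
    m = _NCBI_RE.match(header.strip())
    if m is None:
        raise ValueError(f"unrecognised NCBI header: {header!r}")
    gi, db, acc, tail = m.groups()
    return NcbiHeader(gi, db, acc, int(tail) if tail else None)


def parse_remap_location(location: str) -> Tuple[str, int, int]:
    m = _REMAP_RE.match(location.strip())
    if m is None:
        raise ValueError(f"unrecognised remap location: {location!r}")
    name, start, end = m.groups()
    return name, int(start), int(end)


if __name__ == "__main__":
    print(parse_ncbi_header("gi|20564479|emb|CAAA01000001.1|,9601"))
    print(parse_remap_location("chrMmUn_WIFeb01_42457:1 -9600"))
```

Once both sides are parsed into fields, a two-column table linking accessions to the remap sequence names (if one can be found) could be joined on the accession.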

I was wondering if this community has any ideas on how to convert bulk records from a very early reference assembly to a later version? Or if there are any repositories that would contain this information?

Any advice would be greatly appreciated!
Tags: build, liftover, reference, reference assembly, remap
GenoMax

Senior Member

Join Date: Feb 2008

Posts: 7140
#2

10-24-2013, 12:09 PM

LiftOver files for the older genome builds for mouse are available via UCSC archives: http://genome-archive.cse.ucsc.edu/downloads.html
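If suitable chain files can be found in the archive, stepping through several builds with the command-line liftOver tool might look like the sketch below. It only builds the command lines, using the tool's usual argument order (`liftOver oldFile map.chain newFile unMapped`). Everything else is an assumption: the helper name `liftover_commands`, the BED input, and the chain file names in the example are placeholders, not files confirmed to exist in the archive.

```python
from typing import List, Sequence


def liftover_commands(
    bed_in: str,
    chain_files: Sequence[str],
    prefix: str = "step",
) -> List[List[str]]:
    """Build one liftOver command per chain file, feeding each output into the next step."""
    commands = []
    current = bed_in
    for i, chain in enumerate(chain_files, start=1):
        lifted = f"{prefix}{i}.bed"
        unmapped = f"{prefix}{i}.unmapped.bed"
        # Argument order: oldFile map.chain newFile unMapped
        commands.append(["liftOver", current, chain, lifted, unmapped])
        current = lifted
    return commands


if __name__ == "__main__":
    # Placeholder chain file names; substitute whatever the archive actually provides.
    chains = ["oldBuildToMiddleBuild.over.chain", "middleBuildToMm10.over.chain"]
    for cmd in liftover_commands("contigs.bed", chains):
        print(" ".join(cmd))
        # To actually run a step: subprocess.run(cmd, check=True)
```

Each step writes an unmapped file, so records lost at every hop can be counted before trusting the final coordinates.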