Randomly sampling loci from multiple resequenced

stickleback

Junior Member

Join Date: Feb 2013

Posts: 8
#1

02-04-2014, 04:49 PM

Hello all,

I'm about to start on a bit of a bioinformatics endeavour for my population genomics study, and before I do I wondered if anyone had any pointers or suggestions.

I have access to the resequenced genomes of ~25 individuals. Further along I want to do some more in-depth analysis, but right now I would just like to randomly sample the genomes for independent loci to get simple estimates of some basic population genomic parameters (e.g. theta). Ideally I would like loci of 500-1000 bp, spaced approximately 100 kb apart (to ensure independence).
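As a rough illustration of that sampling scheme, here is a minimal Python sketch that picks locus coordinates only; it does not touch any sequence data. The function name sample_loci, the jitter parameter, and the chromosome names and lengths are all hypothetical. It assumes chromosome lengths are known (for example from the first two columns of the reference's .fai index) and treats "approximately 100 kb apart" as a gap of at least 100 kb plus a small random jitter.

```python
import random


def sample_loci(chrom_lengths, min_len=500, max_len=1000,
                min_gap=100_000, max_jitter=20_000, seed=None):
    """Return a list of (chrom, start, end) tuples, 1-based and inclusive.

    Consecutive loci on the same chromosome are separated by at least
    min_gap bases (measured from the end of one locus to the start of the next).
    """
    rng = random.Random(seed)
    loci = []
    for chrom, chrom_len in chrom_lengths.items():
        # Random initial offset so the first locus is not always at position 1.
        start = rng.randint(1, min_gap)
        while True:
            length = rng.randint(min_len, max_len)
            end = start + length - 1
            if end > chrom_len:
                break  # the locus would run off the end of the chromosome
            loci.append((chrom, start, end))
            # The next locus starts at least min_gap bases past this locus' end.
            start = end + 1 + min_gap + rng.randint(0, max_jitter)
    return loci


if __name__ == "__main__":
    # Hypothetical chromosome names and lengths, for illustration only.
    lengths = {"chrI": 2_000_000, "chrII": 1_500_000}
    for chrom, start, end in sample_loci(lengths, seed=1):
        print(f"{chrom}:{start}-{end}")
```

Because every genome is mapped to the same reference, one list of coordinates can be reused for all individuals. Whether a 100 kb gap is enough for independence depends on how far linkage disequilibrium extends in the population, so min_gap is worth adjusting.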

At the moment, all of the genomes have been assembled and mapped to a reference genome. So my question is, what is the best way to go about extracting loci? One idea I had was to align the consensus sequences using a whole genome aligner and then use a tool like Phylomarker to extract loci from orthologous blocks.

However, since the genomes have all been aligned to the same reference sequence, that seems a bit computationally wasteful. My other idea was to take the BAM files from each of the alignments and extract loci fitting my requirement from those. For what it's worth, I'm not afraid of scripting in Perl or R (and maybe even Python) if it's required to get the job done.

Any input would be very much appreciated!
TiborNagy

Senior Member

Join Date: Mar 2010

Posts: 329
#2

02-05-2014, 02:58 AM

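You can use samtools to pull specific regions straight out of your BAM files (they need to be coordinate-sorted and indexed with samtools index first):
samtools view input.bam chr1:12311212-12311312
You can then write a Perl script to generate the region coordinates and pass them to samtools.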
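
The same idea can be scripted in Python instead of Perl. The sketch below only builds the samtools view command lines for a set of regions and BAM files; the helper names, BAM file names, and second region are hypothetical, and the commands are printed rather than executed.

```python
import shlex


def region_string(chrom, start, end):
    """Format a region the way samtools expects it: chrom:start-end (1-based)."""
    return f"{chrom}:{start}-{end}"


def samtools_commands(bam_paths, loci):
    """Build one 'samtools view' argument list per (BAM file, locus) pair."""
    commands = []
    for bam in bam_paths:
        for chrom, start, end in loci:
            commands.append(
                ["samtools", "view", bam, region_string(chrom, start, end)]
            )
    return commands


if __name__ == "__main__":
    # Hypothetical inputs: two BAM files and two loci.
    bams = ["ind01.bam", "ind02.bam"]
    loci = [("chr1", 12311212, 12311312), ("chr1", 12450000, 12450800)]
    for cmd in samtools_commands(bams, loci):
        # Print the command; to run it, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Note that samtools view returns the reads overlapping each region, not a per-individual sequence, so a further consensus or genotype-calling step would still be needed before estimating theta.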