
  • Randomly sampling loci from multiple resequenced

    Hello all,

    I'm about to start on a bit of a bioinformatics endeavour for my population genomics study, and before I do, I wondered if anyone had any pointers or suggestions.

    I have access to the resequenced genomes of ~25 individuals. Further along I want to do some more in-depth analysis, but right now I would just like to randomly sample the genomes for independent loci to get simple estimates of some basic population genomic parameters (e.g. theta). Ideally, I would like loci of 500-1000 bp, approximately 100 kb apart (to ensure independence).

    At the moment, all of the genomes have been assembled and mapped to a reference genome. So my question is, what is the best way to go about extracting loci? One idea I had was to align the consensus sequences using a whole genome aligner and then use a tool like Phylomarker to extract loci from orthologous blocks.

    However, since the genomes have all been aligned to the same reference sequence, that seems a bit computationally wasteful. My other idea was to take the BAM files from each of the alignments and extract loci fitting my requirement from those. For what it's worth, I'm not afraid of scripting in Perl or R (and maybe even Python) if it's required to get the job done.

    Any input would be very much appreciated!

  • #2
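    You can use samtools to extract specific locations from your BAM files (each BAM must be coordinate-sorted and indexed, e.g. with samtools index):
    samtools view input.bam chr1:12311212-12311312
    This returns the reads overlapping that region. You can write a Perl script to generate the region coordinates and pass them to samtools.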
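
As a rough illustration of scripting the coordinates, here is a minimal sketch in Python (rather than Perl) that draws loci of 500-1000 bp separated by at least 100 kb and formats them as samtools region strings. The function names, the `jitter` parameter, and the chromosome lengths are hypothetical and only for illustration; real chromosome names and lengths would come from the reference (for example, the BAM header).

```python
import random


def sample_loci(chrom_lengths, min_len=500, max_len=1000,
                min_gap=100_000, jitter=20_000, seed=None):
    """Pick loci along each chromosome, at least `min_gap` bp apart.

    chrom_lengths: dict mapping chromosome name -> length in bp.
    Returns a list of (chrom, start, end) tuples, 1-based inclusive.
    """
    rng = random.Random(seed)
    loci = []
    for chrom, length in chrom_lengths.items():
        # Random offset so the first locus is not always at position 1.
        start = 1 + rng.randint(0, jitter)
        while True:
            locus_len = rng.randint(min_len, max_len)
            end = start + locus_len - 1
            if end > length:
                break  # ran off the end of this chromosome
            loci.append((chrom, start, end))
            # Next locus begins at least `min_gap` bp past this one,
            # plus a random extra offset of up to `jitter` bp.
            start = end + 1 + min_gap + rng.randint(0, jitter)
    return loci


def to_region_string(chrom, start, end):
    """Format a locus the way samtools expects: chr:start-end."""
    return f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    # Hypothetical chromosome lengths, for illustration only.
    lengths = {"chr1": 1_000_000, "chr2": 500_000}
    for locus in sample_loci(lengths, seed=1):
        print("samtools view input.bam " + to_region_string(*locus))
```

Using the same seed for every individual gives the same set of regions, so the loci extracted from each BAM file are directly comparable because all genomes were mapped to the same reference.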
