  • Similarity programs for SNP files

    Hi,

    I'm a newbie to this field so please forgive the use of very basic language.

    I've got two files. Each contains position and nucleotide information on each line.

    Each file holds just the SNP information for one person. For example, for person A the file would look like:
    Chromosome 6: 133,088,927, G
    Chromosome 6: 133,088,928, A
    and so on.

    The second file too has nucleotide information for the exact same locations.

    Is there a utility somewhere that will show me the similarity between the two files? Something along the lines of BLAST, which of course requires the full sequence information and not just SNPs.

    Prompt help will be much appreciated.

    Thanks.

    PS. The location information might not be formatted exactly as shown; I've oversimplified it for the sake of clarity.
    Last edited by leofixings; 05-03-2011, 07:38 AM. Reason: Missed out some information.

  • #2
    comm

    Type "man comm" at the command line and see if you can use the "comm" command. Simply piping the output to "wc" might be a crude but effective measure.

    COMM(1) User Commands COMM(1)

    NAME
    comm - compare two sorted files line by line

    SYNOPSIS
    comm [OPTION]... FILE1 FILE2

    DESCRIPTION
    Compare sorted files FILE1 and FILE2 line by line.

    With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains
    lines unique to FILE2, and column three contains lines common to both files.

    -1 suppress lines unique to FILE1

    -2 suppress lines unique to FILE2

    -3 suppress lines that appear in both files
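
Note that "comm" compares whole lines and expects both files to be sorted, so it only reports exact line matches. If a per-position comparison is more useful, the idea can be sketched in Python as below. This is only a rough sketch: it assumes every line looks exactly like the example in the question ("Chromosome 6: 133,088,927, G"), and the names parse_snps and snp_similarity, as well as the sample data for the second person, are made up for illustration.

```python
import re

# Assumed line format, copied from the example in the question:
#   "Chromosome 6: 133,088,927, G"
# Real files may differ, so this pattern would need adjusting.
LINE_RE = re.compile(r"^\s*Chromosome\s+(\S+):\s*([\d,]+),\s*([ACGTacgt])\s*$")


def parse_snps(lines):
    """Map (chromosome, position) -> nucleotide for each line that parses."""
    snps = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        chrom, pos, base = match.groups()
        # Strip the thousands separators before converting the position.
        snps[(chrom, int(pos.replace(",", "")))] = base.upper()
    return snps


def snp_similarity(lines_a, lines_b):
    """Return (matching, shared, fraction) over positions present in both inputs."""
    a = parse_snps(lines_a)
    b = parse_snps(lines_b)
    shared = a.keys() & b.keys()
    matching = sum(1 for key in shared if a[key] == b[key])
    fraction = matching / len(shared) if shared else 0.0
    return matching, len(shared), fraction


if __name__ == "__main__":
    # Person A's lines come from the question; person B's are invented sample data.
    person_a = ["Chromosome 6: 133,088,927, G", "Chromosome 6: 133,088,928, A"]
    person_b = ["Chromosome 6: 133,088,927, G", "Chromosome 6: 133,088,928, T"]
    print(snp_similarity(person_a, person_b))  # (1, 2, 0.5)
```

To run it on real files, pass open file handles instead of lists, e.g. snp_similarity(open("personA.txt"), open("personB.txt")), where the file names are placeholders. Unlike "comm", this does not need sorted input, because positions are matched through a dictionary rather than line by line.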
