  • SOAPsnp sort script

    Hi all,
    For anyone familiar with the SOAP package...
    SOAPsnp takes the SOAPaligner output as its input. The catch is that the SOAPaligner output has to be sorted first: by chromosome name (alphabetically), then by chromosome coordinate (numerically). The output contains 13 tab-delimited columns; the chromosome name is the 8th column and the coordinate is the 9th.

    My Perl skills are still infantile and I'm having a tough time formatting my data.
    Does anyone have a script they wouldn't mind sharing or a solution to this?

    Here is an example line of the output:
    SRR003674.68 GATTAAATAAATATATAGATACCTTTTCCTACTTAT ,)4E8)*;/,.914+-+,&+&)+(+$%($"($#$&# 1 a 36 + Scer_gi|93117368|ref|NC_001136.8| 1272821 2 T->31C-28 T->28C-28 36M 28T2T4
    SRR003674.113 GATATGCTTGAGGATGAACGAGAAGCTAATATAGTC '%&%%%"&$&#)*)(++(+-0+1+'++(104-2-)0 1 a 36 - Scer_gi|93117368|ref|NC_001136.8| 1277159 2 G->6C-30 A->7T-26 36M 6GA28


    Thanks in advance...
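The required order can be checked without sorting anything by using sort -c, which exits with status 0 only if the input is already in order. A minimal sketch, assuming a tab-delimited file; the file name example.soap and the placeholder values in every column other than 8 and 9 are hypothetical:

```shell
# Build a tiny tab-delimited sample in the 13-column SOAPaligner layout
# (chromosome name in column 8, coordinate in column 9).
# Read names, sequences, qualities and other fields are hypothetical placeholders.
TAB=$(printf '\t')
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'r1\tACGT\tIIII\t1\ta\t4\t+\tchr_2\t150\t0\tx\t4M\t4\n' >  example.soap
printf 'r2\tACGT\tIIII\t1\ta\t4\t-\tchr_1\t900\t0\tx\t4M\t4\n' >> example.soap
printf 'r3\tACGT\tIIII\t1\ta\t4\t+\tchr_1\t20\t0\tx\t4M\t4\n'  >> example.soap

# sort -c only checks the order: exit status 0 if sorted, non-zero otherwise.
# LC_ALL=C makes the alphabetical comparison on column 8 byte-wise.
if LC_ALL=C sort -c -t "$TAB" -k8,8 -k9,9n example.soap 2>/dev/null; then
    status=sorted
else
    status=unsorted
fi
echo "example.soap is $status"
```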

  • #2
    This has been working for me in a Unix environment:

    sort -k [column number],[column number] [filename] > outfile
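One caveat on the general form: -k N on its own defines a key that starts at column N and runs to the end of the line, and sort splits fields on runs of blanks unless -t says otherwise. A small sketch of a hypothetical helper, sort_by_column, that sorts a tab-delimited file on exactly one column:

```shell
# Hypothetical helper: sort a tab-delimited file on a single column.
#   $1 = column number, $2 = input file; any further arguments are
#   passed to sort as extra options (e.g. -n for a numeric column).
sort_by_column() {
    col=$1
    file=$2
    shift 2
    # -t sets the separator to a tab; -k N,N limits the key to column N.
    # Without ",N" the key would run from column N to the end of the line.
    sort -t "$(printf '\t')" -k "$col,$col" "$@" "$file"
}

# Usage on a small hypothetical two-column file, sorting column 2 numerically.
workdir=$(mktemp -d)
printf 'b\t30\na\t100\nc\t4\n' > "$workdir/cols.txt"
sort_by_column 2 "$workdir/cols.txt" -n > "$workdir/sorted.txt"
cat "$workdir/sorted.txt"
```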



    • #3
      In case anyone is interested, here's a way using Unix commands:

      (1) From your SOAPaligner output, make a new file for each chromosome using grep:

      grep -w "chr_1" OUTPUT.soap > chr1_OUTPUT.soap
      grep -w "chr_2" OUTPUT.soap > chr2_OUTPUT.soap
      grep -w "chr_3" OUTPUT.soap > chr3_OUTPUT.soap
      etc...
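grep -w matches the chromosome name anywhere on the line, not only in column 8, and each name has to be typed out by hand. A sketch of an alternative that reads the distinct names from column 8 and keeps only exact matches on that column with awk. It assumes tab-delimited input and chromosome names without a '/' (each name becomes part of a file name, e.g. chr_1_OUTPUT.soap); the sample rows are hypothetical:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical sample rows: 13 tab-delimited columns, chromosome in column 8.
printf 'r1\tACGT\tIIII\t1\ta\t4\t+\tchr_2\t150\t0\tx\t4M\t4\n' >  OUTPUT.soap
printf 'r2\tACGT\tIIII\t1\ta\t4\t-\tchr_1\t900\t0\tx\t4M\t4\n' >> OUTPUT.soap
printf 'r3\tACGT\tIIII\t1\ta\t4\t+\tchr_1\t20\t0\tx\t4M\t4\n'  >> OUTPUT.soap

# cut splits on tabs by default; sort -u gives each chromosome name once.
cut -f8 OUTPUT.soap | sort -u | while IFS= read -r chr; do
    # Keep only the lines whose 8th field equals this name exactly.
    awk -F'\t' -v c="$chr" '$8 == c' OUTPUT.soap > "${chr}_OUTPUT.soap"
done
```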

      (2) Sort each chromosome file by the chromosomal coordinate (the 9th column in this case) using sort:

      sort -k 9 -n chr1_OUTPUT.soap > chr1_OUTPUT_sorted.soap
      sort -k 9 -n chr2_OUTPUT.soap > chr2_OUTPUT_sorted.soap
      sort -k 9 -n chr3_OUTPUT.soap > chr3_OUTPUT_sorted.soap
      etc...
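Step 2 can also be written as a loop over the per-chromosome files instead of one command per chromosome. A sketch, assuming files named like chr1_OUTPUT.soap as above; -k9,9n limits the numeric key to column 9 and -t pins the separator to a tab (the sample rows are hypothetical):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical per-chromosome files (coordinate in column 9).
printf 'r2\tACGT\tIIII\t1\ta\t4\t-\tchr_1\t900\t0\tx\t4M\t4\n'  > chr1_OUTPUT.soap
printf 'r3\tACGT\tIIII\t1\ta\t4\t+\tchr_1\t20\t0\tx\t4M\t4\n'  >> chr1_OUTPUT.soap
printf 'r1\tACGT\tIIII\t1\ta\t4\t+\tchr_2\t150\t0\tx\t4M\t4\n' > chr2_OUTPUT.soap

TAB=$(printf '\t')
for f in chr*_OUTPUT.soap; do
    # chr1_OUTPUT.soap -> chr1_OUTPUT_sorted.soap
    out="${f%.soap}_sorted.soap"
    # Numeric sort on column 9 only.
    sort -t "$TAB" -k9,9n "$f" > "$out"
done
```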

      (3) Concatenate the outputs into a single file
      cat chr1_OUTPUT_sorted.soap > OUTPUT_sorted.soap
      cat chr2_OUTPUT_sorted.soap >> OUTPUT_sorted.soap
      cat chr3_OUTPUT_sorted.soap >> OUTPUT_sorted.soap

      OUTPUT_sorted.soap can now be used as the input file in SOAPsnp
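Because each per-chromosome file is already sorted by coordinate, an alternative to the cat calls is sort -m, which merges inputs that are already sorted on the same keys without re-sorting them. The merged file then follows the column 8 order regardless of the order in which the files are listed. A sketch with hypothetical sample files; LC_ALL=C makes the name comparison byte-wise:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
TAB=$(printf '\t')
# Hypothetical per-chromosome files, each already sorted by column 9.
printf 'r3\tACGT\tIIII\t1\ta\t4\t+\tchr_1\t20\t0\tx\t4M\t4\n'   > chr1_OUTPUT_sorted.soap
printf 'r2\tACGT\tIIII\t1\ta\t4\t-\tchr_1\t900\t0\tx\t4M\t4\n' >> chr1_OUTPUT_sorted.soap
printf 'r1\tACGT\tIIII\t1\ta\t4\t+\tchr_2\t150\t0\tx\t4M\t4\n'  > chr2_OUTPUT_sorted.soap

# -m merges pre-sorted inputs; files are deliberately listed out of order here.
LC_ALL=C sort -m -t "$TAB" -k8,8 -k9,9n \
    chr2_OUTPUT_sorted.soap chr1_OUTPUT_sorted.soap > OUTPUT_sorted.soap
```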

      Functional but not that elegant...



      • #4
        You can do this with a single unix sort command.

        Code:
        sort -k8,8 -k9,9n OUTPUT.soap > OUTPUT_sorted.soap
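Two details can matter with real files: without -t, sort splits fields on runs of blanks rather than on tabs, and the alphabetical comparison on column 8 follows the current locale. A variant of the same command that pins both, run on hypothetical sample rows:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
TAB=$(printf '\t')
# Hypothetical sample rows in the 13-column layout (chromosome in 8, coordinate in 9).
printf 'r1\tACGT\tIIII\t1\ta\t4\t+\tchr_2\t150\t0\tx\t4M\t4\n' >  OUTPUT.soap
printf 'r2\tACGT\tIIII\t1\ta\t4\t-\tchr_1\t900\t0\tx\t4M\t4\n' >> OUTPUT.soap
printf 'r3\tACGT\tIIII\t1\ta\t4\t+\tchr_1\t20\t0\tx\t4M\t4\n'  >> OUTPUT.soap

# Column 8 compared byte-wise (C locale), then column 9 numerically.
LC_ALL=C sort -t "$TAB" -k8,8 -k9,9n OUTPUT.soap > OUTPUT_sorted.soap
```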



        • #5
          Thanks!

          When sorting on both keys at once there seems to be a memory issue. Breaking the output into individual chromosomes seemed to ease that.
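If the single-command sort runs short of memory, GNU coreutils sort (an assumption: other implementations may differ) can be told how large a main-memory buffer to use with -S and where to write its temporary files with -T, for example a disk with more free space. A sketch; the 256M buffer size and the tmp_sort directory name are arbitrary placeholders, and the sample rows are hypothetical:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
TAB=$(printf '\t')
# Hypothetical sample rows in the 13-column layout (chromosome in 8, coordinate in 9).
printf 'r1\tACGT\tIIII\t1\ta\t4\t+\tchr_2\t150\t0\tx\t4M\t4\n' >  OUTPUT.soap
printf 'r2\tACGT\tIIII\t1\ta\t4\t-\tchr_1\t900\t0\tx\t4M\t4\n' >> OUTPUT.soap
printf 'r3\tACGT\tIIII\t1\ta\t4\t+\tchr_1\t20\t0\tx\t4M\t4\n'  >> OUTPUT.soap

# Directory for sort's temporary files (placeholder name).
mkdir -p tmp_sort
# GNU sort: -S sets the size of the main memory buffer,
# -T sets the directory for temporary files when the input spills to disk.
LC_ALL=C sort -S 256M -T tmp_sort -t "$TAB" -k8,8 -k9,9n OUTPUT.soap > OUTPUT_sorted.soap
```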
