Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with While []... done in bash

    Hello all!! I need help converting tab to fasta... I am a newbie and know only a little bash scripting. I need to convert a tab delimited SNP file into either a single fasta file or a multiple fasta files for each column using the first line as identifier.

    The closest I got was a script that generates the required .fasta files but enters a loop and can only be stopped by ctrl-C.


    #!/bin/bash


    echo ">" > carat.txt

    counter=1
    #My tab file has 64 columns

    while : [ $counter -lt 64]
    do

    less <SNP.txt |awk "{print$"$counter"}"| cat carat.txt - >$counter.fa
    counter=$(($counter +1))

    done

    exit



    SNP_001 SNP_002 SNP_003....
    T T T T
    C C C C
    C C C C
    C C C C
    A A A A
    A A A A
    T T T T
    T T T T
    C C C C
    G G G G
    G G G G
    C C C C

  • #2
    See if this thread helps: http://stackoverflow.com/questions/1...a-file-in-bash

    Comment


    • #3
      Is your input the following:

      T T T T
      C C C C
      C C C C
      C C C C
      A A A A
      A A A A
      T T T T
      T T T T
      C C C C
      G G G G
      G G G G
      C C C C

      and do you want to get

      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC

      back as a result?
      Last edited by blakeoft; 03-31-2014, 05:00 AM. Reason: nitpicky spacing

      Comment


      • #4
        I think @musta1234 wants the matrix transposed and then converted to a multi-fasta file.

        >SNP_001
        TCCCAATTCGGC
        >SNP_002
        TCCCAATTCGGC
        >SNP_003
        TCCCAATTCGGC

        Comment


        • #5
          Thats right

          Sorry for the sloppy explanation, but all the nucleotides are from a tab delimited file and Genomax stated the way I want it perfectly.


          >SNP_001
          TCCCAATTCGGC
          >SNP_002
          TCCCAATTCGGC
          >SNP_003
          TCCCAATTCGGC

          ......

          SNP_XXX
          ATGCATGCATGC

          Thanks

          Comment


          • #6
            This is a bash shell script based on a solution in the stackoverflow thread I had posted above.

            Save the code in a file (script.sh in example below) and then run as follows:

            Code:
            $ sh script.sh your_data file
            Code:
            #!/bin/bash 
            declare -a array=( )                      # we build a 1-D-array
            
            read -a line < "$1"                       # read the headline
            
            COLS=${#line[@]}                          # save number of columns
            
            index=0
            while read -a line; do
                for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
                    array[$index]=${line[$COUNTER]}
                    ((index++))
                done
            done < "$1"
            
            for (( ROW = 0; ROW < COLS; ROW++ )); do
                    printf ">"
              for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
                printf "%s" ${array[$COUNTER]}
                if [ $COUNTER == $ROW ]
                then
                    printf "\n"
                fi
              done
              printf "\n" 
            done

            Comment


            • #7
              Thanks

              I will definitely give it a try...

              Comment


              • #8
                Works GREAT!!!

                Hey Genomax and all!!

                The code works great... handles a file with 160 columns and 128,000 lines very well.

                Thanks

                Originally posted by GenoMax View Post
                This is a bash shell script based on a solution in the stackoverflow thread I had posted above.




                Save the code in a file (script.sh in example below) and then run as follows:

                Code:
                $ sh script.sh your_data file
                Code:
                #!/bin/bash 
                declare -a array=( )                      # we build a 1-D-array
                
                read -a line < "$1"                       # read the headline
                
                COLS=${#line[@]}                          # save number of columns
                
                index=0
                while read -a line; do
                    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
                        array[$index]=${line[$COUNTER]}
                        ((index++))
                    done
                done < "$1"
                
                for (( ROW = 0; ROW < COLS; ROW++ )); do
                        printf ">"
                  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
                    printf "%s" ${array[$COUNTER]}
                    if [ $COUNTER == $ROW ]
                    then
                        printf "\n"
                    fi
                  done
                  printf "\n" 
                done

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Advancing Precision Medicine for Rare Diseases in Children
                  by seqadmin




                  Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                  12-16-2024, 07:57 AM
                • seqadmin
                  Recent Advances in Sequencing Technologies
                  by seqadmin



                  Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                  Long-Read Sequencing
                  Long-read sequencing has seen remarkable advancements,...
                  12-02-2024, 01:49 PM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 12-17-2024, 10:28 AM
                0 responses
                22 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-13-2024, 08:24 AM
                0 responses
                42 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-12-2024, 07:41 AM
                0 responses
                28 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 12-11-2024, 07:45 AM
                0 responses
                42 views
                0 likes
                Last Post seqadmin  
                Working...
                X