  • Help with While []... done in bash

    Hello all!! I need help converting tab to FASTA... I am a newbie and know only a little bash scripting. I need to convert a tab-delimited SNP file into either a single FASTA file or multiple FASTA files, one per column, using the first line as the identifier.

    The closest I got was a script that generates the required .fa files but never leaves the loop; it can only be stopped with Ctrl-C.


    #!/bin/bash


    echo ">" > carat.txt

    counter=1
    #My tab file has 64 columns

    while : [ $counter -lt 64]
    do

    less <SNP.txt |awk "{print$"$counter"}"| cat carat.txt - >$counter.fa
    counter=$(($counter +1))

    done

    exit



    SNP_001 SNP_002 SNP_003....
    T T T T
    C C C C
    C C C C
    C C C C
    A A A A
    A A A A
    T T T T
    T T T T
    C C C C
    G G G G
    G G G G
    C C C C
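For reference, here is why the loop above never ends. `:` is a shell builtin that does nothing and always succeeds, and everything after it on that line (including the `[ ... ]`) is just an ignored argument, so `while : ...` is an infinite loop. The test has to be the loop condition itself, with a space before `]`. Note also that `-lt 64` starting from 1 would stop after column 63. Below is a minimal sketch of the same one-file-per-column idea with those points fixed. It assumes a truly tab-delimited file with the SNP names in the first row, it counts the columns from the header instead of hard-coding 64, and it puts the header cell on the `>` line so each file is valid FASTA. `split_to_fasta` is a hypothetical helper name, not something from the thread.

```shell
#!/bin/bash
# split_to_fasta FILE (hypothetical helper): write one FASTA file per column
# of a tab-delimited table whose first row holds the SNP names.
# Output files are named 1.fa, 2.fa, ... as in the original attempt.
split_to_fasta() {
    local file=$1
    local ncols counter=1
    # Count the columns from the header line instead of hard-coding 64.
    ncols=$(head -n 1 "$file" | awk -F'\t' '{ print NF }')
    # The test itself is the loop condition; 'while : [ ... ]' would run
    # the always-successful ':' builtin and loop forever.
    while [ "$counter" -le "$ncols" ]; do
        awk -F'\t' -v col="$counter" '
            NR == 1 { print ">" $col; next }   # header cell -> FASTA identifier
            { printf "%s", $col }              # concatenate the bases
            END { print "" }                   # terminate the sequence line
        ' "$file" > "$counter.fa"
        counter=$((counter + 1))
    done
}
```

Usage, assuming the table is in SNP.txt: `split_to_fasta SNP.txt`.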

  • #2
    See if this thread helps: http://stackoverflow.com/questions/1...a-file-in-bash



    • #3
      Is your input the following:

      T T T T
      C C C C
      C C C C
      C C C C
      A A A A
      A A A A
      T T T T
      T T T T
      C C C C
      G G G G
      G G G G
      C C C C

      and do you want to get

      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC
      TCCCAATTCGGC

      back as a result?
      Last edited by blakeoft; 03-31-2014, 05:00 AM. Reason: nitpicky spacing



      • #4
        I think @musta1234 wants the matrix transposed and then converted to a multi-fasta file.

        >SNP_001
        TCCCAATTCGGC
        >SNP_002
        TCCCAATTCGGC
        >SNP_003
        TCCCAATTCGGC
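The transpose-then-FASTA step can also be done in a single pass with awk. Here is a minimal sketch, assuming tab-separated input whose first row holds the SNP names. `table_to_multifasta` is a hypothetical helper name. It keeps every sequence in memory until the end of the file and writes one multi-FASTA record per column to stdout.

```shell
#!/bin/bash
# table_to_multifasta FILE (hypothetical helper): transpose a tab-delimited
# table (first row = SNP names) and print it as a multi-FASTA on stdout.
table_to_multifasta() {
    awk -F'\t' '
        # Header row: remember the column count and each column name.
        NR == 1 { n = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
        # Data rows: append each base to its column sequence.
        { for (i = 1; i <= n; i++) seq[i] = seq[i] $i }
        # After the last row: one record per column.
        END { for (i = 1; i <= n; i++) printf ">%s\n%s\n", name[i], seq[i] }
    ' "$1"
}
```

Usage, assuming the table is in SNP.txt: `table_to_multifasta SNP.txt > SNP.fasta`.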



        • #5
          That's right.

          Sorry for the sloppy explanation. All the nucleotides come from a tab-delimited file, and GenoMax stated what I want perfectly.


          >SNP_001
          TCCCAATTCGGC
          >SNP_002
          TCCCAATTCGGC
          >SNP_003
          TCCCAATTCGGC

          ......

          >SNP_XXX
          ATGCATGCATGC

          Thanks



          • #6
            This is a bash shell script based on a solution in the stackoverflow thread I had posted above.

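            Save the code in a file (script.sh in the example below) and run it with bash rather than sh. The script relies on bash-only features such as declare -a, read -a and (( )), which a plain POSIX sh (e.g. dash) does not support.

            Code:
            $ bash script.sh your_data_file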
            Code:
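            #!/bin/bash
            declare -a array=( )                      # the whole table as one 1-D array

            read -r -a line < "$1"                    # read the header line (SNP names)

            COLS=${#line[@]}                          # number of columns

            # Store every cell, header row included, in the array row after row.
            index=0
            while read -r -a line; do
                for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
                    array[$index]=${line[$COUNTER]}
                    ((index++))
                done
            done < "$1"

            # For each column ROW, start at index ROW and step COLS elements at a time:
            # the first element is the SNP name, the rest are that SNP's bases.
            for (( ROW = 0; ROW < COLS; ROW++ )); do
                printf ">"
                for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
                    printf "%s" "${array[$COUNTER]}"
                    if (( COUNTER == ROW )); then
                        printf "\n"                   # end of the header line
                    fi
                done
                printf "\n"                           # end of the sequence line
            done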
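The script stores the whole table, header row included, in one flat array, row after row. The cell in row r and column c therefore sits at index r*COLS + c, and printing a column means starting at index c and stepping COLS at a time. The small illustration below uses a hypothetical two-column table. `print_column` is an illustrative helper, not part of the script above.

```shell
#!/bin/bash
# Hypothetical table (header + 2 data rows, 2 columns) flattened row by row,
# the same layout the script builds: cell (row r, col c) is at r*COLS + c.
COLS=2
array=(SNP_001 SNP_002
       T       A
       C       G)

# print_column C (hypothetical helper): print column C as one FASTA record.
print_column() {
    local c=$1 i
    printf ">%s\n" "${array[$c]}"              # row 0 holds the SNP name
    for (( i = c + COLS; i < ${#array[@]}; i += COLS )); do
        printf "%s" "${array[$i]}"             # rows 1, 2, ... hold the bases
    done
    printf "\n"
}

print_column 0   # prints ">SNP_001" then "TC"
print_column 1   # prints ">SNP_002" then "AG"
```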



            • #7
              Thanks

              I will definitely give it a try...



              • #8
                Works GREAT!!!

                Hey GenoMax and all!!

                The code works great... handles a file with 160 columns and 128,000 lines very well.

                Thanks
