SEQanswers


DCseq 10-05-2018 06:13 AM

Script to make sequences equal length
 
Hi!
I would like to run a motif analysis on my ChIP-Seq data with the MEME Suite.
The program requires input sequences of equal length.
I have the nucleotide sequences of the regions that our protein of interest bound in the ChIP-Seq experiment. These regions vary in length.
I would like to make them equal in length by finding the longest sequence and adding 'N's to both ends of the remaining sequences.
Has anyone done this, or does anyone have an R script?
Many thanks in advance for your feedback!

Richard Finney 10-05-2018 06:59 AM

This does not "add to both ends"; I think I need a little more information to do that.

Here's a solution that appends "N"s to the end of each sequence instead.
It's not R; it's an awk "one-liner" (a long one):

cat yourseqfile | awk '{ a[count++] = $0; if (length($0)>l)l=length($0) } END { for (i=0; i<count; i++) { printf("%s",a[i]); for (j=length(a[i]); j<l; j++) printf "N"; printf("\n"); } }'

example:
bash-3.2$ cat yourseqfile
START

A
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTAC
ACGT
DONE

bash-3.2$ cat yourseqfile | awk '{ a[count++] = $0; if (length($0)>l)l=length($0) } END { for (i=0; i<count; i++) { printf("%s",a[i]); for (j=length(a[i]); j<l; j++) printf "N"; printf("\n"); } }'
STARTNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNN
ANNNNNNNNNNNNNNNNNNNNNNN
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACNNNNNNNNNN
ACGTNNNNNNNNNNNNNNNNNNNN
DONENNNNNNNNNNNNNNNNNNNN
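
For padding on both ends, as originally asked, the same two-pass idea can be adapted. The following is only a sketch under assumptions the thread does not confirm: the input has one sequence per line with no FASTA header lines, and when the number of missing bases is odd, the extra "N" goes on the right. The function name pad_both_ends is made up for illustration.

```shell
# pad_both_ends: hypothetical helper name, not from the thread.
# Reads one sequence per line on stdin and pads each line with "N" on
# both sides until it is as long as the longest line in the input.
# Assumes plain sequences only (no FASTA ">" header lines).
pad_both_ends() {
    awk '
        # First pass: store every line and track the longest length seen.
        { seq[n++] = $0; if (length($0) > max) max = length($0) }
        END {
            for (i = 0; i < n; i++) {
                missing = max - length(seq[i])
                left = int(missing / 2)     # any odd remainder goes to the right
                right = missing - left
                line = seq[i]
                for (j = 0; j < left; j++)  line = "N" line
                for (j = 0; j < right; j++) line = line "N"
                print line
            }
        }
    '
}
```

It could be run as `pad_both_ends < yourseqfile > padded.txt`. With the three input lines ACGT, ACGTACGT and A, it prints NNACGTNN, ACGTACGT and NNNANNNN. Like the one-liner above, it holds the whole file in memory, which should be fine for a typical set of ChIP-Seq peak sequences.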

