| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| program which can make a pair end to have equal number of sequence | dejavu2010 | Bioinformatics | 29 | 10-07-2019 06:35 PM |
| script for SOAP de novo to make use of illumina short read data | muralis.bio | Bioinformatics | 0 | 09-05-2015 08:48 AM |
| Perl script: Make Statistics Of Mirna Abundances For Many Samples | pony2001mx | Bioinformatics | 6 | 05-04-2014 06:12 PM |
| how to use bash script to make iterative loop through directory with two file types | jddavis | Bioinformatics | 2 | 05-23-2013 08:02 AM |
| < Script to compute distribution length of sequences > | Giorgio C | Bioinformatics | 8 | 08-23-2012 03:29 AM |
#1 |
Junior Member
Location: Germany Join Date: Jul 2017
Posts: 6
Hi!
I would like to run a motif analysis on my ChIP-Seq data with the MEME Suite. The program requires sequences of equal length. I have the nucleotide sequences of the regions bound by our protein of interest in the ChIP-Seq experiment, and these regions vary in length. I would like to make them all the same length by finding the longest sequence and adding 'N's to both ends of the remaining sequences. Has anyone done this, or does anyone have an R script for it? Many thanks in advance for your feedback!
#2 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
This does not "add to both ends"; I think I need a little more information to do that.
Here's a solution that appends "N"s. It's not R; it's an awk "one-liner" (a long one):

```
cat yourseqfile | awk '{ a[count++] = $0; if (length($0)>l)l=length($0) } END { for (i=0; i<count; i++) { printf("%s",a[i]); for (j=length(a[i]); j<l; j++) printf "N"; printf("\n"); } }'
```

Example:

```
bash-3.2$ cat yourseqfile
START

A
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTAC
ACGT
DONE
bash-3.2$ cat yourseqfile | awk '{ a[count++] = $0; if (length($0)>l)l=length($0) } END { for (i=0; i<count; i++) { printf("%s",a[i]); for (j=length(a[i]); j<l; j++) printf "N"; printf("\n"); } }'
STARTNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNN
ANNNNNNNNNNNNNNNNNNNNNNN
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACNNNNNNNNNN
ACGTNNNNNNNNNNNNNNNNNNNN
DONENNNNNNNNNNNNNNNNNNNN
```

Last edited by Richard Finney; 10-05-2018 at 07:18 AM.
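The one-liner above pads every line (headers included) on the right only. A possible sketch of the "both ends" variant asked for in the question is below, again in bash + awk rather than R. It assumes FASTA input with headers starting with '>', joins wrapped sequence lines, leaves headers untouched, and, when the total padding is odd, puts the extra N on the right. The function name `pad_fasta_center` is hypothetical, not an existing tool.

```shell
#!/usr/bin/env bash
# pad_fasta_center (hypothetical helper): read FASTA on stdin and write FASTA
# on stdout with each sequence padded with N on both ends to the length of
# the longest sequence. Assumptions: headers start with '>', wrapped sequence
# lines are joined into one line, odd padding puts the extra N on the right.
pad_fasta_center() {
  awk '
    # return a string of k copies of "N" (s is a local variable)
    function ns(k,   s) { s = ""; while (k-- > 0) s = s "N"; return s }
    # a header line starts a new record
    /^>/ { hdr[++n] = $0; seq[n] = ""; next }
    # sequence line: strip whitespace/CR and append to the current record
    n > 0 { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }
    END {
      max = 0
      for (i = 1; i <= n; i++) if (length(seq[i]) > max) max = length(seq[i])
      for (i = 1; i <= n; i++) {
        pad = max - length(seq[i])
        left = int(pad / 2)          # floor, so any extra N goes on the right
        print hdr[i]
        print ns(left) seq[i] ns(pad - left)
      }
    }'
}
```

Usage would be something like `pad_fasta_center < peaks.fa > peaks.padded.fa` (file names are placeholders). Reading everything into memory first is what makes the two passes possible, since the maximum length is only known after the last record.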
Tags |
chip-seq, meme-suite, motif analysis, r script |