SEQanswers

10-05-2018, 06:13 AM   #1
DCseq
Junior Member
 
Location: Germany

Join Date: Jul 2017
Posts: 6
Script to make sequences equal length

Hi!
I would like to run a motif analysis on my ChIP-Seq data with the MEME Suite.
The program requires input sequences of equal length.
I have the nucleotide sequences of the regions that our protein of interest bound in the ChIP-Seq experiment. These regions vary in length.
I would like to make them equal in length by finding the longest sequence and adding 'N's to both ends of the remaining sequences.
Has anyone done this, or does anyone have an R script?
Many thanks in advance for your feedback!
10-05-2018, 06:59 AM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 699

This does not "add to both ends"; I think I would need a little more information to do that.

Here's a solution that appends "N"s to the end of each sequence until it matches the longest line.
It's not R; it's an awk "one-liner" (a long one):

cat yourseqfile | awk '{ a[count++] = $0; if (length($0)>l)l=length($0) } END { for (i=0; i<count; i++) { printf("%s",a[i]); for (j=length(a[i]); j<l; j++) printf "N"; printf("\n"); } }'

example:
bash-3.2$ cat yourseqfile
START

A
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTAC
ACGT
DONE

bash-3.2$ cat yourseqfile | awk '{ a[count++] = $0; if (length($0)>l)l=length($0) } END { for (i=0; i<count; i++) { printf("%s",a[i]); for (j=length(a[i]); j<l; j++) printf "N"; printf("\n"); } }'
STARTNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNN
ANNNNNNNNNNNNNNNNNNNNNNN
ACGTACGTACGTACGTACGTACGT
ACGTACGTACGTACNNNNNNNNNN
ACGTNNNNNNNNNNNNNNNNNNNN
DONENNNNNNNNNNNNNNNNNNNN
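
For padding on both ends, as the original question asks, the same idea can be sketched as below. This is only one possible reading of "both ends": it assumes the missing length is split evenly between left and right, with the extra 'N' going to the right when the difference is odd. The function name pad_both_ends is a made-up helper, not part of any tool.

```shell
# Hypothetical helper (an assumption, not from the one-liner above):
# pad every input line with N on both ends up to the longest line length.
pad_both_ends() {
  awk '
    BEGIN { max = 0 }
    # Store each line and track the longest length seen.
    { seq[n++] = $0; if (length($0) > max) max = length($0) }
    END {
      for (i = 0; i < n; i++) {
        gap   = max - length(seq[i])
        left  = int(gap / 2)   # assumption: extra N goes right when gap is odd
        right = gap - left
        for (j = 0; j < left; j++)  printf "N"
        printf "%s", seq[i]
        for (j = 0; j < right; j++) printf "N"
        printf "\n"
      }
    }' "$@"
}
```

It could be called as `pad_both_ends yourseqfile > padded.txt` or fed from a pipe. Like the one-liner above, it treats every line as a sequence, so FASTA header lines (if any) would be padded too and would need to be handled separately.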

Last edited by Richard Finney; 10-05-2018 at 07:18 AM.

Tags
chip-seq, meme-suite, motif analysis, r script
