SEQanswers





Old 02-23-2017, 09:14 PM   #1
Illusive Man
Member
 
Location: GA

Join Date: Sep 2013
Posts: 14
Fastest way to add Ns to variable length sequences to get uniform length

I have a FASTA file with a thousand sequences whose lengths range from 100 to 150 bp. I would like to add Ns to every sequence shorter than 150 bp so that all of them end up the same length. I know it is possible, but so far I have not found an easy way to do it. Please help. Thanks!
Old 02-23-2017, 10:44 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Cross-posted:

https://www.biostars.org/p/238635/

Please link other sites when you cross-post, so people don't waste their time answering a question that has already been answered.
Old 02-24-2017, 09:15 AM   #3
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

Not fast, but something like:

Code:
perl -e 'while(<>) {if (/^>/) {print;next}; chomp; print $_,"N" x (150-length($_)),"\n"}'  in.fasta
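The one-liner above pads each line separately, so it assumes every sequence sits on a single line. Since the thread is tagged python, here is a rough Python sketch of the same idea that first joins wrapped sequence lines and then pads the whole record. The function name `pad_fasta` and its parameters are made up for illustration; sequences already at or above the target length are left unchanged, and any sequence text before the first header is ignored.

```python
def pad_fasta(lines, target_len=150, pad_char="N"):
    """Yield header and sequence lines, right-padding each sequence to target_len.

    Hypothetical helper: wrapped (multi-line) sequences are joined into one
    line before padding; sequences longer than target_len pass through as-is.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")  # tolerate both Unix and Windows line endings
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one
                yield header
                yield "".join(chunks).ljust(target_len, pad_char)
            header, chunks = line, []
        elif line:
            chunks.append(line)  # skip blank lines, collect sequence pieces
    if header is not None:
        # Emit the final record
        yield header
        yield "".join(chunks).ljust(target_len, pad_char)
```

To use it on a file, iterate over an open handle, for example `for out in pad_fasta(open("in.fasta")): print(out)`, and redirect the output to a new file.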
Old 02-24-2017, 10:05 AM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 823

That looks like it should be plenty fast; it is more likely to be limited by the read speed of the hard drive than by the speed of the code.

Tags
biopython, fasta file, metagenomics, python, sequences
