SEQanswers

02-15-2010, 04:21 PM   #1
kmkocot
Member
 
Location: Alabama

Join Date: Jun 2009
Posts: 48
Script to remove gap-only sites from FASTA alignment?

Hi all,

I am trying to find a script or a program that I can call in a pipeline that can remove gap-only sites in a multiple sequence alignment. Can you think of anything I can use? I'd also like to delete the columns at either end of each alignment where there are only 2 or fewer sequences with non-gap characters, if you can think of anything that can do that.

Thanks!
Kevin
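A minimal pure-Python sketch of the gap-only filter, offered as one possible approach rather than an established tool. It assumes aligned FASTA input in which every sequence has the same length and '-' is the only gap character; the function names and the `GAP_CHARS` setting are hypothetical. It does not use Biopython.

```python
"""Remove gap-only columns from an aligned FASTA file (minimal sketch).

Assumptions: '-' is the only gap character (add '.' etc. to GAP_CHARS if
your aligner uses others) and every sequence has the same aligned length.
"""
import sys

GAP_CHARS = set("-")  # hypothetical default; adjust for your aligner


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_gap_only_columns(records, gap_chars=GAP_CHARS):
    """Return records with every column that is gaps in all rows removed."""
    seqs = [seq for _, seq in records]
    if len(set(map(len, seqs))) > 1:
        raise ValueError("sequences are not all the same length")
    # Keep a column index if at least one row has a non-gap character there.
    keep = [i for i, column in enumerate(zip(*seqs))
            if any(c not in gap_chars for c in column)]
    return [(h, "".join(s[i] for i in keep)) for h, s in records]


def write_fasta(records, out, width=60):
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        out.write(f">{header}\n")
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python drop_gap_columns.py aligned.fasta > out.fasta
    with open(sys.argv[1]) as handle:
        write_fasta(drop_gap_only_columns(read_fasta(handle)), sys.stdout)
```

Because the filter only reads from a file argument and writes to standard output, it can be dropped into a shell pipeline. Alignments of the size described in this thread fit comfortably in memory, so the column-wise `zip(*seqs)` approach is not a concern.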
02-23-2010, 02:19 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

I can picture how I would solve this using Biopython, provided the whole alignment can be loaded into RAM.

How big are the alignments (number of columns, number of rows)?
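Once the whole alignment is held in memory as a list of equal-length strings, trimming the sparse ends is a matter of counting non-gap characters per column and scanning inward from each side. The sketch below is plain Python rather than Biopython code. The `min_occupancy` parameter is a hypothetical name: a value of 3 trims end columns with 2 or fewer non-gap characters, as requested in the first post. '-' is assumed to be the gap character. Only leading and trailing columns are removed; sparse interior columns are kept.

```python
"""Trim poorly occupied columns from both ends of an alignment (sketch).

Assumptions: the alignment is already in memory as equal-length strings
(zip() would silently truncate unequal ones), '-' is the gap character,
and an end column is trimmed while it has fewer than min_occupancy
non-gap characters.
"""

MIN_OCCUPANCY = 3  # hypothetical default: "2 or fewer" residues => trim


def occupancy(seqs, gap_chars="-"):
    """Return the number of non-gap characters in each column."""
    return [sum(c not in gap_chars for c in column) for column in zip(*seqs)]


def trim_ends(seqs, min_occupancy=MIN_OCCUPANCY, gap_chars="-"):
    """Drop leading/trailing columns with fewer than min_occupancy residues.

    Interior columns are never touched, even if they are sparse.
    """
    counts = occupancy(seqs, gap_chars)
    start = 0
    while start < len(counts) and counts[start] < min_occupancy:
        start += 1
    end = len(counts)
    while end > start and counts[end - 1] < min_occupancy:
        end -= 1
    return [s[start:end] for s in seqs]
```

For example, `trim_ends(["M-LAV--", "MKL-VQ-", "-KL-VQR", "--L-V--"])` keeps only the three central columns. The sparse interior column containing the single `A` survives, because trimming stops at the first well-occupied column on each side.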
02-23-2010, 09:16 AM   #3
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by kmkocot
Hi all,

I am trying to find a script or a program that I can call in a pipeline that can remove gap-only sites in a multiple sequence alignment. Can you think of anything I can use? I'd also like to delete the columns at either end of each alignment where there are only 2 or fewer sequences with non-gap characters, if you can think of anything that can do that.

Thanks!
Kevin
If your alignments are in SAM format, you can remove all alignments containing indels with the C program "dbamfilter -i". You could also easily parse the CIGAR field of each SAM record for any "I" (insertion) or "D" (deletion) operators.

What format are you working with?
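For SAM input, the CIGAR check mentioned above could be sketched in Python as follows. This is an illustration only, not a reimplementation of dbamfilter. It assumes uncompressed SAM text (BAM would first need converting, for example with `samtools view -h`). The function names are hypothetical. Header lines starting with '@' are passed through unchanged. A CIGAR of `*` (no alignment information) is treated as having no indels. Lines with fewer than six tab-separated fields are dropped as malformed.

```python
"""Drop SAM alignment records whose CIGAR contains insertions or deletions.

Sketch only: operates on uncompressed SAM text lines.
"""
import re

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar):
    """Return True if the CIGAR string contains an I or D operation."""
    if cigar == "*":  # CIGAR unavailable (e.g. unmapped read)
        return False
    return any(op in ("I", "D") for _, op in CIGAR_OP.findall(cigar))


def without_indels(sam_lines):
    """Yield header lines and alignment lines whose CIGAR has no I/D."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # CIGAR is the 6th mandatory SAM field (index 5); skip short lines.
        if len(fields) > 5 and not has_indel(fields[5]):
            yield line
```

Used as a filter, `sys.stdout.writelines(without_indels(sys.stdin))` would stream a SAM file through unchanged, apart from the records containing indels.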
02-23-2010, 09:52 AM   #4
kmkocot
Member
 
Location: Alabama

Join Date: Jun 2009
Posts: 48

Hi maubp,

The alignments vary in size, but all have fewer than 30 taxa and fewer than 1000 amino acids.

Thanks,
Kevin
All times are GMT -8.