SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
bias in merging sequence data? quinne5 RNA Sequencing 0 11-01-2012 09:17 AM
fastest pairwise alignment method Trilliput Bioinformatics 0 10-19-2012 11:59 AM
Pairwise comparisons in DEXSeq Julien Roux Illumina/Solexa 7 07-20-2012 03:29 AM
R mismatch position in pairwise alignment NicoBxl Bioinformatics 4 10-27-2010 08:36 AM
Performing many pairwise alignments fbarreto Bioinformatics 4 02-27-2010 04:36 AM

Reply
 
Thread Tools
Old 12-16-2012, 11:37 PM   #1
loba17
Member
 
Location: Switzerland

Join Date: Sep 2011
Posts: 19
Default Merging Pairwise Sequence Alignment

Dear All,

I am looking for a way to "merge" two large sequence files. The first sequence is the reference sequence and the second is a consensus sequence containing ambiguous nucleotides (Ns). The merged sequence should correspond to the consensus sequence but the missing nucleotides should be used from the reference.

Example:

BAM-consensus: CCCTANNNNNNNNCATCTACATGG
Reference: GTATAGATATCATCATCTACATCC
Output => CCCTAGATATCATCATCTACATGG


I tried Emboss (cons, consambig, megamerger) but the problems are ambiguous nucleotides. I wrote a simple unix script and it worked just fine but not with large sequences (100MB). Any ideas that would work for larger files?

Thanks for considering my question!
loba17 is offline   Reply With Quote
Old 12-17-2012, 12:38 AM   #2
RickBioinf
Member
 
Location: Leiden, The Netherlands

Join Date: Sep 2012
Posts: 28
Default

Maybe try and right a Perl file, I think that would be your best option for this.
I personally don't know a tool for this.
RickBioinf is offline   Reply With Quote
Old 12-17-2012, 01:23 AM   #3
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
Default

m1:a=fgetc(file1);
b=fgetc(file2);
if(a=='N')a=b;
fputc(a,file3);
if(feof(file1)==0)goto m1;
gsgs is offline   Reply With Quote
Old 12-17-2012, 02:42 AM   #4
loba17
Member
 
Location: Switzerland

Join Date: Sep 2011
Posts: 19
Default

thanks gsgs

for your helpful reply. unfortunately I am not familiar with fgetc. I am on a Linux RedHat system.
loba17 is offline   Reply With Quote
Old 12-17-2012, 03:12 AM   #5
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
Default

it's a pity with all those different systems ...
do you have a C-compiler ? The simpler the better
gsgs is offline   Reply With Quote
Old 12-17-2012, 03:23 AM   #6
loba17
Member
 
Location: Switzerland

Join Date: Sep 2011
Posts: 19
Default

the system was not my choice

I do have a c compiler an it is :
gcc version 4.4.6 20120305 (Red Hat 4.4.6-4)

does this help in anyway?
loba17 is offline   Reply With Quote
Old 12-17-2012, 03:33 AM   #7
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
Default

yes, that could work in theory ...

have you ever successfully compiled a simple C-program ?

I'll test it (on Windows) and post it here later (or PM or email if you prefer)

are the 2 files in fasta-format ?
gsgs is offline   Reply With Quote
Old 12-17-2012, 03:53 AM   #8
loba17
Member
 
Location: Switzerland

Join Date: Sep 2011
Posts: 19
Default

I guess I would be able to handle compiling a simple C program using:
gcc infile.c -o outfile

I have one file with two sequences but I can easily split into two.

I could also try to compile the lines you send me myself if you could help me with the C formating:

/* File: merger.c
Function: Simple program to merge a reference sequence and with another sequence.
*/

#include <stdio.h>

void main ()
{
m1:a=fgetc(file1);
b=fgetc(file2);
if(a=='N')a=b;
fputc(a,file3);
if(feof(file1)==0)goto m1;
}

???

Thanks again for your help!
loba17 is offline   Reply With Quote
Old 12-17-2012, 04:01 AM   #9
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
Default

/* I'm not sure about Linux, (ascii 10 for end of line rather than
ascii 13 and ascii 10 for windows
it prints the result, redirect to a file via ">" - piping :
loba oldfile.fa ref.fa > newfile.fa

fasta-file with one sequence only, same length and lines and aligned to the reference
should it also handle multi-sequence-fasta-files ?

*/


#include <stdio.h>
int a,b;
FILE *file1,*file2;

int main(int argc,char*argv[]){
if(argc<3){printf("\nusage:loba file1 file2\n\n");
printf("replaces N in file1 by what file2 has at that position\n");
exit(1);}


if((file1=fopen(argv[1],"rb"))==NULL){printf("\ncan't open file %s\n",argv[1]);exit(1);}
if((file2=fopen(argv[2],"rb"))==NULL){printf("\ncan't open file %s\n",argv[2]);exit(1);}

a=0;while(a!=10){a=fgetc(file1);printf("%c",a);} /* first line printed */
a=0;while(a!=10)a=fgetc(file2); /* first line ignored (fasta-format ?)*/

m1:;
a=fgetc(file1);
b=fgetc(file2);
if(a=='N')a=b;
printf("%c",a);
if(feof(file1)==0)goto m1;
fclose(file1);fclose(file2);

}

Last edited by gsgs; 12-17-2012 at 04:14 AM.
gsgs is offline   Reply With Quote
Old 12-17-2012, 04:49 AM   #10
loba17
Member
 
Location: Switzerland

Join Date: Sep 2011
Posts: 19
Default

it works - thanks !

the compiler finishes with warning but no errors (see below) - I guess that is ok?
I tested it and it works now I have to test it with a big file !!

Thank you very much for your help !

merger.c: In function ‘main’:
merger.c:21: warning: incompatible implicit declaration of built-in function ‘exit’
merger.c:23: warning: incompatible implicit declaration of built-in function ‘exit’
merger.c:24: warning: incompatible implicit declaration of built-in function ‘exit’
loba17 is offline   Reply With Quote
Old 12-17-2012, 04:59 AM   #11
gsgs
Senior Member
 
Location: germany

Join Date: Oct 2009
Posts: 140
Default

if it works, the warnings are no problem
maybe it should be exit() instead of exit(1) for your compiler
gsgs is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 11:13 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO