SEQanswers

08-14-2018, 01:20 AM   #1
giampe
Member
 
Location: Bari, Italy

Join Date: Aug 2009
Posts: 22
Create a concatemer of sequences

Dear all,
I was wondering if anyone could help me obtain a concatenation of sequences in the way shown below.
I have several multi-FASTA files, each holding the sequences of one gene (ABC, GHJ…) from different organisms (>182680572, >749299147…).

Gene ABC
>182680572
ATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATA
>749299147
ATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCA
>584117620
ATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCTCACGACCTAA
>985743106
ATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTCCCGCCAGAAGATGTTCATACTTGGTTAAGACCTTTACAAGCCGACCAACGTGGTGACAGTGTCGTCCTTTACGCACCGAATCCCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGACGTCTTCGGGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGG



Gene GHJ

>182680572
ATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATA
>749299147
ATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCA
>584117620
ATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCTCACGACCTAA
>985743106
ATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTCCCGCCAGAAGATGTTCATACTTGGTTAAGACCTTTACAAGCCGACCAACGTGGTGACAGTGTCGTCCTTTACGCACCGAATCCCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGACGTCTTCGGGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGG

For each organism, I then want to obtain a single concatenated sequence of its genes, with the genes in the same order for every organism, as shown below:

>182680572
ATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATAATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATA
>749299147
ATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCAATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCA
>584117620
ATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCTCACGACCTAAATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTTCCACCAGAAGATGTTCATACTTGGTTGAGACCTTTACAAGCTGACCAACGCGGTGACAGTGTCATCCTTTACGCACCCAATACCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGGCGTCTTCGAGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGCTCACGACCTAA
>985743106
ATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTCCCGCCAGAAGATGTTCATACTTGGTTAAGACCTTTACAAGCCGACCAACGTGGTGACAGTGTCGTCCTTTACGCACCGAATCCCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGACGTCTTCGGGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGGATGACAACATTGATGGAATCTTGGTCCCGTTGCCTGGAACGTCTTGAAACTGAATTCCCGCCAGAAGATGTTCATACTTGGTTAAGACCTTTACAAGCCGACCAACGTGGTGACAGTGTCGTCCTTTACGCACCGAATCCCTTTATCATTGAACTAGTAGAAGAGCGATACTTAGGACGTCTTCGGGAATTGTTATCCTATTTTTCAGGAATACGTGAAGTAGTCCTTGCAATTGG

Does anyone know how to do this with a Perl/Python script or bioinformatics software?
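
One possible way to do this in plain Python, offered as a minimal sketch rather than a tested tool: read the gene files in the order the genes should appear, and append each sequence to the running string stored under its header ID. The function names (parse_fasta, concatenate_genes, concatenate_files, format_fasta) are made up for illustration. The sketch assumes that the ID is the first word after ">" and that the same organism carries exactly the same ID in every file. An organism that is missing from one file simply ends up with a shorter concatenation, so that is worth checking separately if it matters.

```python
def parse_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited word after '>'.
            record_id, chunks = (line[1:].split() or [""])[0], []
        elif record_id is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


def concatenate_genes(gene_files):
    """Concatenate sequences that share an ID.

    gene_files is a list with one entry per gene, in the desired gene order;
    each entry is an iterable of FASTA lines (an open file or a list of strings).
    """
    combined = {}  # dicts keep insertion order, so IDs follow the first file
    for lines in gene_files:
        for record_id, seq in parse_fasta(lines):
            combined[record_id] = combined.get(record_id, "") + seq
    return combined


def concatenate_files(paths):
    """Same as concatenate_genes, but reading each gene from a file path."""
    gene_files = []
    for path in paths:
        with open(path) as handle:
            gene_files.append(handle.readlines())
    return concatenate_genes(gene_files)


def format_fasta(records):
    """Return the records as FASTA text, one unwrapped sequence per record."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records.items())
```

With hypothetical file names, print(format_fasta(concatenate_files(["ABC.fasta", "GHJ.fasta"])), end="") would print one record per organism, with the ABC sequence first and the GHJ sequence second. The order of the paths decides the gene order, and the order of records inside each file does not matter.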
08-14-2018, 11:13 PM   #2
fec2
Junior Member
 
Location: Malaysia

Join Date: Aug 2018
Posts: 1

Hi,

I have a similar situation, but the sequence IDs differ between files:

File A:
>My_bacteriaA_GeneA
atgatg

>My_bacteriaB_GeneA
atgaag

>My_bacteriaC_GeneA
atgatg


File B:
>My_bacteriaB_GeneB
atggtc

>My_bacteriaC_GeneB
atggtc

>My_bacteriaA_GeneB
atggta

...

File Z:
>My_bacteriaA_GeneZ
atggta

>My_bacteriaC_GeneZ
atggta

>My_bacteriaB_GeneZ
atggtg

I would like a concatenated FASTA file that combines all the core genes for each bacterium, as below, to build a phylogenomic tree:

>My_bacteriaA
atgatg(GeneA)atggta(GeneB)...atggta(GeneZ)

>My_bacteriaB
atgaag(GeneA)atggtc(GeneB)...atggtg(GeneZ)

>My_bacteriaC
atgatg(GeneA)atggtc(GeneB)...atggta(GeneZ)

Please note that the records in each FASTA file are in random order, not in any particular order.

Thank you!
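
A rough Python sketch of one way to handle this case: strip the gene part from each header so that every file yields the same organism key, then join the genes in file order. The helper names (parse_fasta, organism_key, concatenate_core_genes) are hypothetical, and the sketch assumes the gene name is always the last underscore-separated field of the ID, as in My_bacteriaA_GeneA. Since these are meant to be core genes, it raises an error if an organism is missing from any file instead of silently producing a shorter sequence.

```python
def parse_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            record_id, chunks = (line[1:].split() or [""])[0], []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def organism_key(record_id):
    """Assumption: 'My_bacteriaA_GeneA' -> 'My_bacteriaA' (drop the last '_' field)."""
    return record_id.rsplit("_", 1)[0]


def concatenate_core_genes(gene_files):
    """Join one sequence per organism from every gene file, in file order.

    gene_files is a list with one entry per gene; each entry is an iterable
    of FASTA lines. Record order inside a file is irrelevant.
    """
    per_gene = []
    for lines in gene_files:
        seqs = {}
        for record_id, seq in parse_fasta(lines):
            key = organism_key(record_id)
            if key in seqs:
                raise ValueError(f"organism {key!r} appears twice in one file")
            seqs[key] = seq
        per_gene.append(seqs)
    if not per_gene:
        return {}
    organisms = sorted(per_gene[0])
    for index, seqs in enumerate(per_gene):
        if set(seqs) != set(organisms):
            raise ValueError(f"gene file {index} has a different set of organisms")
    return {org: "".join(seqs[org] for seqs in per_gene) for org in organisms}
```

The result is a dict from organism name to concatenated sequence, which can then be written out as ">name" followed by the sequence. If the inputs are per-gene alignments for tree building, all sequences within a file should have the same length; the sketch does not check this.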