SEQanswers

Old 10-01-2015, 02:08 AM   #1
vineetha
Junior Member
 
Location: India

Join Date: Jun 2015
Posts: 2
Create Perl program

Hi,

I have two sequence files. I want only the matching sequences that are present in both files, saved to a third file. Can you please help me create a Perl program for this?

FILE1:

>Contig1
TTCAAAAACTCATATGGGTGGTACAATGCGTCTTGGATCTAGGAGAACATATTTTCAAGTTGCAGATTGTAAATCTGCAAAATTATATGGTAACCAGAGCTTTGTAGATGAGAGGCATCGACACAGATATGAGGTGAACCCCGACATGGTGCAGC

>Contig2
GACTTGAAGATGCTGGTCTTTCTTTCACTGGCAAAGATGAAAGTGGTCATCGCATGGAGATTGTTGAGCTGCCGAGTCATCCTTACTTCATCGGAGTTCAATTTCATCCAGAATTTAAATCAAGGCCAGGAACCCCTTCAGCCCTGTTT

>Contig3
CTAGGACTTATAGCCGCAGCAACTGGGCAACTTGAAACTCTCTTGAAGAAGGGTGTTCCCAAAACATGGGGGTTGAGCAATGGTACGTCAGGACTAAAATCACATCGATATGTAAATGGGACAAAACTGTTTAATGGATCATTAGATG

>Contig4
GCATTTATTGCAATGGGAATGGTATACATGTTTAAAGGAAACAGTAACATATGTTGTGGGCGCTTGGCCCCGGATTTTTGATAATCAAATTTTGCTACTGCATTTTTTTTAAAG

>Contig5
CCCCCCCTTATTTGTCGTTTTTGATAATCAAATTTTGCTACTGCATTTTTTTTAAAG

FILE2:

>Contig1
TTCAAAAACTCATATGGGTGGTACAATGCGTCTTGGATCTAGGAGAACATATTTTCAAGTTGCAGATTGTAAATCTGCAAAATTATATGGTAACCAGAGCTTTGTAGATGAGAGGCATCGACACAGATATGAGGTGAACCCCGACATGGTGCAGC

>Contig3

CTAGGACTTATAGCCGCAGCAACTGGGCAACTTGAAACTCTCTTGAAGAAGGGTGTTCCCAAAACATGGGGGTTGAGCAATGGTACGTCAGGACTAAAATCACATCGATATGTAAATGGGACAAAACTGTTTAATGGATCATTAGATG

>Contig4
GCATTTATTGCAATGGGAATGGTATACATGTTTAAAGGAAACAGTAACATATGTTGTGGGCGCTTGGCCCCGGATTTTTGATAATCAAATTTTGCTACTGCATTTTTTTTAAAG

>Contig6
GCATTTATTGCAATGTTTTGATAATCAAATTTTGCTACTGCATTTTTTTTAAAGCCAGAGCTTTGTAGATGAGAGGCATGGTACAATGCGTCTTG

EXPECTED OUTPUT:

>Contig1
TTCAAAAACTCATATGGGTGGTACAATGCGTCTTGGATCTAGGAGAACATATTTTCAAGTTGCAGATTGTAAATCTGCAAAATTATATGGTAACCAGAGCTTTGTAGATGAGAGGCATCGACACAGATATGAGGTGAACCCCGACATGGTGCAGC

>Contig3
CTAGGACTTATAGCCGCAGCAACTGGGCAACTTGAAACTCTCTTGAAGAAGGGTGTTCCCAAAACATGGGGGTTGAGCAATGGTACGTCAGGACTAAAATCACATCGATATGTAAATGGGACAAAACTGTTTAATGGATCATTAGATG

>Contig4
GCATTTATTGCAATGGGAATGGTATACATGTTTAAAGGAAACAGTAACATATGTTGTGGGCGCTTGGCCCCGGATTTTTGATAATCAAATTTTGCTACTGCATTTTTTTTAAAG
Old 10-01-2015, 04:30 AM   #2
lindenb
Senior Member
 
Location: France

Join Date: Apr 2010
Posts: 143

Cross-posted: https://www.biostars.org/p/160027/
Old 10-01-2015, 04:33 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,976

@vineetha: Do you want to match the actual sequences, or do you just want to find matching headers? If a header matches, is the sequence always identical in both files?
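
If matching on headers alone turns out to be enough, something along these lines might do as a starting point. It is only a sketch: `common_headers` and the tiny `a.fa`/`b.fa` files are made-up names for illustration, and it assumes a header line starts with `>` and that the first file is not empty. It prints the records of the second file whose header also occurs in the first file, without comparing the sequences at all.

```shell
#!/bin/sh
# Hypothetical helper: print records of file 2 whose header line also occurs in file 1.
# Sequences are NOT compared here, only the ">" header lines.
common_headers() {
    awk '
        { sub(/\r$/, "") }                          # tolerate Windows line endings
        NR == FNR { if (/^>/) ids[$0] = 1; next }   # first file: remember every header
        /^>/      { keep = ($0 in ids) }            # second file: decide once per record
        keep && NF                                  # print kept records, skipping blank lines
    ' "$1" "$2"
}

# Made-up sample data, not the contigs from the question.
tmp=$(mktemp -d)
cat > "$tmp/a.fa" <<'EOF'
>Contig1
AAAA
>Contig2
GGGG
EOF
cat > "$tmp/b.fa" <<'EOF'
>Contig1
AAAA

>Contig6
CCCC
>Contig2
GGGA
EOF

headers_result=$(common_headers "$tmp/a.fa" "$tmp/b.fa")
printf '%s\n' "$headers_result"
rm -rf "$tmp"
```

In this sample Contig2 is printed with the sequence from the second file (GGGA) even though the first file has GGGG, which is exactly why the header-versus-sequence question matters.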
Old 10-01-2015, 12:03 PM   #4
cmccabe
Senior Member
 
Location: chicago

Join Date: Jul 2012
Posts: 353

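@vineetha: Can awk be used for this? A one-liner like the one below should work on your example: it stores every line of file1.txt, then prints each line of file2.txt that was also in file1.txt. Note that it compares individual lines, so headers and sequences are matched independently of each other.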

Code:
awk 'NR==FNR{a[$0];next}$0 in a{print $0}' file1.txt file2.txt
>Contig1
TTCAAAAACTCATATGGGTGGTACAATGCGTCTTGGATCTAGGAGAACATATTTTCAAGTTGCAGATTGTAAATCTGCAAAATTATATGGTAACCAGAGCTTTGTAGATGAGAGGCATCGACACAGATATGAGGTGAACCCCGACATGGTGCAGC

>Contig3
CTAGGACTTATAGCCGCAGCAACTGGGCAACTTGAAACTCTCTTGAAGAAGGGTGTTCCCAAAACATGGGGGTTGAGCAATGGTACGTCAGGACTAAAATCACATCGATATGTAAATGGGACAAAACTGTTTAATGGATCATTAGATG

>Contig4
GCATTTATTGCAATGGGAATGGTATACATGTTTAAAGGAAACAGTAACATATGTTGTGGGCGCTTGGCCCCGGATTTTTGATAATCAAATTTTGCTACTGCATTTTTTTTAAAG
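The Perl equivalent would be something like the following. While file1.txt is being read, @ARGV still holds file2.txt, so each line gets a 1 appended to its entry in %seen; while file2.txt is being read, @ARGV is empty and a 0 is appended. An entry ending in 10 therefore marks the first time a line from file1.txt turns up in file2.txt: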

Code:
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/'  file1.txt file2.txt
>Contig1
TTCAAAAACTCATATGGGTGGTACAATGCGTCTTGGATCTAGGAGAACATATTTTCAAGTTGCAGATTGTAAATCTGCAAAATTATATGGTAACCAGAGCTTTGTAGATGAGAGGCATCGACACAGATATGAGGTGAACCCCGACATGGTGCAGC

>Contig3
CTAGGACTTATAGCCGCAGCAACTGGGCAACTTGAAACTCTCTTGAAGAAGGGTGTTCCCAAAACATGGGGGTTGAGCAATGGTACGTCAGGACTAAAATCACATCGATATGTAAATGGGACAAAACTGTTTAATGGATCATTAGATG
>Contig4
GCATTTATTGCAATGGGAATGGTATACATGTTTAAAGGAAACAGTAACATATGTTGTGGGCGCTTGGCCCCGGATTTTTGATAATCAAATTTTGCTACTGCATTTTTTTTAAAG
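
One caveat with both one-liners: they compare line by line, so a header whose sequence differs between the files would still be printed on its own, and a sequence that is wrapped differently in the two files would be missed. A record-based variant might look like the sketch below. It is not a definitive implementation: `common_records` and the small `a.fa`/`b.fa` files are made-up names for illustration, and it assumes headers start with `>`, that header text must match exactly, and that the first file is not empty. Wrapped sequence lines are joined and blank lines are ignored before comparing.

```shell
#!/bin/sh
# Hypothetical helper: print records whose header AND full sequence occur in both files.
common_records() {
    awk '
        # Handle the record collected so far: store it (first file) or test it (second file).
        function emit(   key) {
            if (hdr == "") return
            key = hdr SUBSEP seq
            if (first) seen[key] = 1
            else if (key in seen) print hdr "\n" seq
        }
        BEGIN { first = 1 }
        { sub(/\r$/, "") }                                       # tolerate Windows line endings
        FNR == 1 && NR > 1 { emit(); hdr = ""; seq = ""; first = 0 }   # second file starts
        /^>/ { emit(); hdr = $0; seq = ""; next }                # new header closes the previous record
        NF   { seq = seq $0 }                                    # join wrapped lines, skip blank ones
        END  { emit() }
    ' "$1" "$2"
}

# Made-up sample data, not the contigs from the question.
tmp=$(mktemp -d)
cat > "$tmp/a.fa" <<'EOF'
>Contig1
ACGT
ACGT

>Contig2
GGGG
>Contig3
TTTT
EOF
cat > "$tmp/b.fa" <<'EOF'
>Contig1
ACGTACGT
>Contig3

TTTT
>Contig2
GGGA
>Contig6
CCCC
EOF

result=$(common_records "$tmp/a.fa" "$tmp/b.fa")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data, Contig1 and Contig3 are printed, while Contig2 is dropped because its sequence differs between the two files. To save the result in a third file, redirect the output, e.g. `common_records file1.txt file2.txt > file3.txt`.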

Last edited by cmccabe; 10-01-2015 at 12:59 PM. Reason: added perl