SEQanswers

Old 11-26-2011, 05:02 PM   #1
Pradhaun
Member
 
Location: US

Join Date: Nov 2011
Posts: 16
Perl - how to get the last few elements in a string

Hi folks,

Could anyone please tell me how to get the last few characters from a string using Perl?

For example, $string = "ATTGGCTACC";

If I want to print the last 3 characters of the above string, how can I script it?


Thank you,

Pradhaun.
Old 11-26-2011, 05:43 PM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Why don't you try this: http://lmgtfy.com/?q=perl+substring
Old 11-27-2011, 03:47 AM   #3
heiya
Member
 
Location: China

Join Date: Nov 2011
Posts: 14

$substring = substr($string, 7, 3);   # offset 7 is only right for this 10-character string
or, for a string of any length:
$substring = substr($string, -3, 3);
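
A minimal sketch of the negative-offset form, wrapped in a helper so that short strings and a zero count are handled explicitly. The name last_n is made up for illustration and is not a Perl built-in; the length argument is left off because substr returns everything up to the end of the string when it is omitted.

```perl
use strict;
use warnings;

# last_n is a hypothetical helper, not part of Perl itself.
# It returns the last $n characters of $string, or the whole
# string when it is no longer than $n.
sub last_n {
    my ($string, $n) = @_;
    return ''      if $n <= 0;                 # substr($string, -0) would return everything
    return $string if length($string) <= $n;   # nothing to trim
    return substr($string, -$n);               # negative offset counts from the end
}

my $string = "ATTGGCTACC";
print last_n($string, 3), "\n";   # prints ACC
print last_n("AC", 3), "\n";      # prints AC
```

The guard for $n <= 0 is there because -0 is just 0, so an unguarded substr($string, -$n) would hand back the entire string instead of an empty one.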
Old 11-27-2011, 04:34 AM   #4
Pradhaun
Member
 
Location: US

Join Date: Nov 2011
Posts: 16

Thank you, heiya!! It worked out!!
Thank you so much!!
Old 11-28-2011, 04:42 AM   #5
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

substr always seems alien to me, whereas simple regexes seem natural.

my ($substring) = $string =~ /(...)$/;

speaks to me: "capture the last 3 characters before the end of the string and assign them to $substring". (Okay, the parentheses around $substring are not very intuitive. They put the match in list context, so the captured characters themselves are assigned to $substring rather than just whether the match succeeded.)
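
To make the list-versus-scalar point concrete, here is a small sketch with made-up variable names. It also shows two side effects of the pattern: a string shorter than three characters yields no match at all, and `$` is allowed to match just before a trailing newline, so an un-chomped line still gives the last three bases.

```perl
use strict;
use warnings;

my $string = "ATTGGCTACC";

# List context (parentheses on the left): the match returns the captured text.
my ($captured) = $string =~ /(...)$/;

# Scalar context (no parentheses): the match returns only true or false.
my $matched = $string =~ /(...)$/;

# Fewer than three characters: the match fails and nothing is captured.
my ($short) = "AC" =~ /(...)$/;

# Trailing newline: $ matches before it, and . does not match the newline.
my ($from_line) = "ATTGGCTACC\n" =~ /(...)$/;

print "captured:  $captured\n";                                # captured:  ACC
print "matched:   $matched\n";                                 # matched:   1
print "short:     ", (defined $short ? $short : "no match"), "\n";   # short:     no match
print "from_line: $from_line\n";                               # from_line: ACC
```

Note that substr($line, -3) on the un-chomped line would return "CC" plus the newline, so the two approaches only agree once the line has been chomped.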

And if I want to capture the last three characters only when they are all valid base characters, only a minor change is required:

my ($substring) = $string =~ /([ACGTacgtNn]{3})$/;

But I have heard that substr is faster.
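
Rather than rely on hearsay, the two can be timed with the core Benchmark module. This is only a sketch: the 10,000-base test sequence is invented, and the numbers will vary with Perl version, machine, and string length, so no results are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = "ATTGGCTACC" x 1000;   # made-up 10,000-base test sequence

# A negative count tells Benchmark to run each snippet for at least
# that many CPU seconds, then print a table comparing the rates.
cmpthese(-1, {
    substr => sub { my $s = substr($string, -3) },
    regex  => sub { my ($s) = $string =~ /(...)$/ },
});
```

Changing the repeat count on the test sequence is an easy way to see whether the gap between the two depends on string length.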

--
Phillip
Old 11-29-2011, 10:16 AM   #6
Pradhaun
Member
 
Location: US

Join Date: Nov 2011
Posts: 16

Thank you pmiguel! I tried your suggestion. It worked well! Thank you so much!

-pradhaun
Old 11-29-2011, 10:54 AM   #7
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Hi Pradhaun,

Welcome to the world of regular expressions!
Although I should caution you with the old Perl joke:

Say you have a problem and think "I will use regular expressions to solve that". Now, you have two problems...

Also, Rick thought I should have shortened the line to:

my ($substring) = $string =~ /([ACGTN]{3})$/i;
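
A quick way to check that the /i version behaves like the longer character class is to run both over a few test strings. The helper names and the test strings are made up for this comparison, and the // (defined-or) operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Both helper names are hypothetical; each returns the captured
# three characters, or undef when the pattern does not match.
sub tail_explicit { my ($s) = @_; my ($t) = $s =~ /([ACGTacgtNn]{3})$/; return $t }
sub tail_flagged  { my ($s) = @_; my ($t) = $s =~ /([ACGTN]{3})$/i;     return $t }

for my $string ("ATTGGCTACC", "attggctacc", "ATTGGCTAnN", "ATTGGCTA-C") {
    my $explicit = tail_explicit($string) // 'no match';
    my $flagged  = tail_flagged($string)  // 'no match';
    print "$string: $explicit / $flagged\n";
}
```

The first three strings give the same three-character tail from both patterns, and the last one gives no match from either because of the "-" among its final three characters.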

--
Phillip
Old 11-30-2012, 12:15 PM   #8
Pradhaun
Member
 
Location: US

Join Date: Nov 2011
Posts: 16
How to estimate the total copy number of genes from FPKM values?

Hello all,

I am using TopHat and Cufflinks to estimate the copy number of genes in an organism by comparing them to the genes of a closely related organism. I got a BAM file and a GTF file as outputs from TopHat and Cufflinks, and I viewed both the alignment (TopHat output) and the GTF file in IGV. I can see the FPKM value for every single gene, but my question is: how can I know whether these two organisms are closely related? For example, could I say "There are x genes present in both organisms"? Or is there any way to know the overall FPKM or copy number?

I am new to this kind of study and am not sure whether my thinking is applicable. So I would appreciate it if you could provide your suggestions...

Thank you,
Pradhaun
Old 11-30-2012, 12:19 PM   #9
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Quote:
Originally Posted by pmiguel
[...]
But I have heard that substr is faster.
I also think substr will work on really huge strings (like a chromosome-length sequence); regexes not so much, in my experience.
Old 12-03-2012, 08:43 AM   #10
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

Quote:
Originally Posted by Pradhaun
[...] how can I know whether these two organisms are closely related? [...]
Maybe if you put your question into a new thread instead of an old thread about Perl programming then you will receive some meaningful comments. It would be interesting to see what other people have to say. My feeling is that there is too much variation in normal transcriptome studies to give a feel for relatedness. Not to mention the entirely fuzzy definition of 'related'.