SEQanswers

Old 07-28-2010, 07:12 AM   #1
perencia
Junior Member
 
Location: Spain

Join Date: Jun 2010
Posts: 6
Large RNA sequences? Does it make any sense?

Hi!

First, I'm a computer scientist who has only recently started exploring the bioinformatics field, so please forgive me if I say something really stupid.

Basically, I'm studying the possibility of implementing the Nussinov-Jacobson algorithm on GPUs to speed it up, if possible, by orders of magnitude. For that to pay off, though, the RNA sequence has to be very large. I was wondering whether that makes sense, since most RNA sequences I've seen are about 200 bases long.

Thanks!
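
For readers unfamiliar with the algorithm named above, here is a minimal Python sketch of Nussinov-style base-pair maximization. It is only an illustration under stated assumptions: the function names, the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) and the choice to allow G-U wobble pairs are not from this thread. The triple loop is what makes the running time grow roughly with the cube of the sequence length, which is why long sequences are the interesting case for acceleration.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    O(n^3) time and O(n^2) memory for a sequence of length n.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span so shorter subproblems are ready first.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov("GGGAAAUCC"))
```

This only counts pairs; recovering the structure itself would need a traceback over the same table.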
Old 07-28-2010, 07:34 AM   #2
raela
Member
 
Location: Ithaca, NY

Join Date: Apr 2010
Posts: 39

How long do you mean by 'very large'? It depends on the sequencing technology used and the length ordered. Even 200 is somewhat in the 'long' range for NGS (I believe).
Old 07-28-2010, 07:47 AM   #3
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

What kind of RNA do you mean? There are entire RNA genomes of viruses, and normal mRNAs are some hundreds to thousands of nucleotides long.
From quickly googling Nussinov-Jacobson I learned that you can do RNA folding prediction with it. That only makes sense for small RNAs.
Old 07-28-2010, 07:52 AM   #4
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

454 has 400+ base reads & PacBio is promising reads that long or much longer.

Folding of longer RNAs could be interesting, as secondary structure is sometimes involved in the stability, localization or utilization of an RNA.

It's a niche, but that doesn't mean it isn't interesting.
Old 07-28-2010, 09:17 AM   #5
mrawlins
Member
 
Location: Retirement - Not working with bioinformatics anymore.

Join Date: Apr 2010
Posts: 63

Many RNAs are long, but current sequencing technologies fragment them prior to sequencing, since they perform better on shorter sequences. SOLiD works up to 50 bp, Illumina works up to about 100 bp, and 454 can get a few hundred bp. If you want something larger you'll have to piece together multiple reads into a longer consensus sequence.
It seems to me, though, that if you need a longer sequence you could just derive it from the genomic sequence of some interesting genes (the transcript matches the coding strand, with U in place of T). If your goal is to demonstrate an algorithmic speedup using a GPU-based approach, it seems that it wouldn't be important to have cutting-edge RNA data; it would be better to use a well-studied RNA (like ribosomal RNA or tRNA) for your comparison.
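
As a rough illustration of getting an RNA test sequence from genomic DNA, the Python sketch below converts a gene's forward-strand genomic sequence into the corresponding RNA. The function name and the `strand` argument are hypothetical, and the sketch assumes an unspliced gene (for example a bacterial rRNA or tRNA gene) with known coordinates and strand.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def transcribe(genomic, strand="+"):
    """Return the RNA sequence for a gene given its forward-strand DNA.

    For a gene on the minus strand, the forward-strand sequence is
    reverse-complemented first; T is then replaced by U.
    """
    dna = genomic.upper()
    if strand == "-":
        dna = dna.translate(COMPLEMENT)[::-1]
    return dna.replace("T", "U")


print(transcribe("ATGC"))        # plus-strand gene
print(transcribe("ATGC", "-"))   # minus-strand gene
```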
Old 07-28-2010, 01:06 PM   #6
perencia
Junior Member
 
Location: Spain

Join Date: Jun 2010
Posts: 6

Quote:
Originally Posted by mrawlins View Post
Many RNAs are long, but current sequencing technologies fragment them prior to sequencing, since they perform better on shorter sequences. SOLiD works up to 50 bp, Illumina works up to about 100 bp, and 454 can get a few hundred bp. If you want something larger you'll have to piece together multiple reads into a longer consensus sequence.
It seems to me, though, that if you need a longer sequence you could just derive it from the genomic sequence of some interesting genes (the transcript matches the coding strand, with U in place of T). If your goal is to demonstrate an algorithmic speedup using a GPU-based approach, it seems that it wouldn't be important to have cutting-edge RNA data; it would be better to use a well-studied RNA (like ribosomal RNA or tRNA) for your comparison.
Ok.

And how many nucleotides can a ribosomal RNA or a tRNA have?
Old 07-28-2010, 01:31 PM   #7
mrawlins
Member
 
Location: Retirement - Not working with bioinformatics anymore.

Join Date: Apr 2010
Posts: 63

In Shewanella the longest ribosomal sequence is about 2900 bases long. Human ribosome sequences may be a bit longer. The tRNAs (in Shewanella) are about 76 bases long. I seem to recall that tRNAs have some of the best documented secondary structure, though, so while they may not make a good test of your algorithm's speed, they might be a good test of accuracy.
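
One possible way to run such an accuracy test is to compare predicted base pairs against a documented reference structure. The Python sketch below assumes both structures are available in dot-bracket notation (a common text format, not something specified in this thread) and reports sensitivity and positive predictive value over base pairs; the function names are hypothetical, and balanced brackets are assumed.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)            # remember the opening position
        elif ch == ")":
            pairs.add((stack.pop(), pos))  # close the most recent opening
    return pairs


def pair_accuracy(predicted, reference):
    """Return (sensitivity, ppv) of predicted pairs against the reference."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    shared = len(pred & ref)
    sensitivity = shared / len(ref) if ref else 0.0
    ppv = shared / len(pred) if pred else 0.0
    return sensitivity, ppv


print(pair_accuracy("(.....)", "((...))"))
```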
Old 07-29-2010, 03:50 AM   #8
perencia
Junior Member
 
Location: Spain

Join Date: Jun 2010
Posts: 6

Quote:
Originally Posted by mrawlins View Post
In Shewanella the longest ribosomal sequence is about 2900 bases long. Human ribosome sequences may be a bit longer. The tRNAs (in Shewanella) are about 76 bases long. I seem to recall that tRNAs have some of the best documented secondary structure, though, so while they may not make a good test of your algorithm's speed, they might be a good test of accuracy.
Thanks!

I wonder, what about searching for structures across an entire genome?
Old 07-29-2010, 05:16 AM   #9
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by perencia View Post
I wonder, what about searching for structures across an entire genome?
Only some viruses have an RNA genome (most organisms use DNA), and their genomes tend not to be very big (not big enough to worry about GPU optimisations I would guess).
Old 07-29-2010, 05:43 AM   #10
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

Have you had a chance to take a look at Rfam and Sean Eddy's Infernal?
Old 07-29-2010, 07:05 AM   #11
perencia
Junior Member
 
Location: Spain

Join Date: Jun 2010
Posts: 6

Quote:
Originally Posted by Bruins View Post
Have you had a chance to take a look at Rfam and Sean Eddy's Infernal?
No, but I'll look at them now.

I've been searching a little more, and found this report

http://hal.archives-ouvertes.fr/docs...ufold_ICCS.pdf

and this implementation

http://www.cc.gatech.edu/~bader/papers/GTfold.html

The former is a GPU implementation of the UNAFold algorithm
( http://mfold.bioinfo.rpi.edu/ )

It seems that GPU optimisation may have a place in structure prediction for multiple RNAs, as in the first report (a set of 11 picornaviral sequences, 7124 to 8214 nucleotides).
I'll post what I find.

Anyway, I'd like to know what the benefits of such procedures are and where they have an impact. Bioinformatics is a large field, I guess.

Last edited by perencia; 07-29-2010 at 07:12 AM.
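
On the GPU question raised in this thread: in Nussinov-style dynamic programming, each table cell (i, j) depends only on cells that cover shorter spans, so all cells with the same span (one anti-diagonal) can be computed independently of each other. The Python sketch below only enumerates that fill order; `diagonal_batches` and `min_loop` are hypothetical names, and this is not taken from the linked report or from GTfold.

```python
def diagonal_batches(n, min_loop=3):
    """Yield, span by span, the (i, j) cells of an n x n folding table.

    Cells within one yielded batch do not depend on each other, so a
    parallel implementation could assign each cell to its own thread and
    synchronize between batches.
    """
    for span in range(min_loop + 1, n):
        yield [(i, i + span) for i in range(n - span)]


for batch in diagonal_batches(6):
    print(batch)
```

The batches shrink as the span grows, so the available parallelism drops toward the end of a single fold. That is one reason why folding many independent sequences at once, as in the set of 11 sequences mentioned above, may suit a GPU better than folding a single short sequence.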