Fun with the Vienna RNAfold and ffmpeg programs ...
Human Non Protein Coding RNA Folding - A Video on YouTube
Human Non Protein Coding RNA Folding - Part 1 http://www.youtube.com/watch?v=EoKMjZaGCdg
Human Non Protein Coding RNA Folding - Part 2 http://youtu.be/9Tgfz90RZ-8
Human Non Protein Coding RNA Folding - Part 3 http://youtu.be/NLC4s3XFBS0
Human Non Protein Coding RNA Folding - Part 4 http://youtu.be/Lja214LrK8g
Human Non Protein Coding RNA Folding - Part 5 http://youtu.be/FKCjzrIWEBo
While studying the results of sequencing tumor samples in various cancers, I noticed
1) that there was strange transcription going on in "the middle of nowhere" and
2) that there are often mutations in "long intergenic non-protein coding" RNAs, sometimes called "lincRNAs" or "Long ncRNAs".
This small project is an attempt to figure out what these RNAs do. The initial focus is on lincRNAs. They are a subset of what is labeled as "miscellaneous RNAs" (or "miscRNAs").
If you find a non-silent mutation in a protein-coding gene, you might be able to deduce its impact by noting the gene's function and looking up its SIFT or PolyPhen score. You could fold the resulting protein and eye the damage. You could check what the gene's protein binds to or what network it's in.
With non-coding RNA, your path to enlightenment is not well lit. There are new insights suggesting that some are involved in transcription, post-transcriptional modifications and epigenetic control. But many non-coding RNAs are poorly understood. Mystery remains.
Because little is known, I decided to "fold" all the "miscRNAs" in RefSeq and make them into a movie. These include lincRNAs, pseudogenes, anti-sense RNAs, and many "uncharacterized" RNA transcripts. The resulting video is ordered by these groups and then by gene name.
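As a rough illustration of that ordering, here is a minimal Python sketch. The keyword heuristics, the group labels, and the record descriptions are all assumptions made for the example (they are not necessarily how the original grouping was done), and the description strings are illustrative rather than copied from RefSeq.

```python
# Order in which groups appear in the video (the "other" bucket is an assumption).
GROUP_ORDER = ["lincRNA", "pseudogene", "antisense", "uncharacterized", "other"]

# Hypothetical keyword heuristics applied to a record's description line.
KEYWORDS = [
    ("lincRNA", "long intergenic non-protein coding"),
    ("pseudogene", "pseudogene"),
    ("antisense", "antisense"),
    ("uncharacterized", "uncharacterized"),
]


def classify(description):
    """Return the first group whose keyword occurs in the description."""
    text = description.lower()
    for group, keyword in KEYWORDS:
        if keyword in text:
            return group
    return "other"


def playback_order(records):
    """Sort (gene, description) pairs by group first, then by gene name."""
    def key(record):
        gene, description = record
        return (GROUP_ORDER.index(classify(description)), gene.upper())
    return sorted(records, key=key)


if __name__ == "__main__":
    # Illustrative records only; the descriptions are made up for the example.
    demo = [
        ("LOC100506123", "uncharacterized LOC100506123"),
        ("HOXA11-AS", "HOXA11 antisense RNA 1"),
        ("LINC00467", "long intergenic non-protein coding RNA 467"),
        ("PTENP1", "phosphatase and tensin homolog pseudogene 1"),
    ]
    for gene, _ in playback_order(demo):
        print(gene)
```

Sorting on a (group index, gene name) tuple keeps each group contiguous in the video while leaving the genes inside a group in alphabetical order.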
The quick idea: use the human brain to see patterns. Are there motifs that can be tied to some meaning? Later, perhaps, I could use image processing techniques to subset the images, extract recurrent patterns and annotate these genes. An unknown gene might show a pattern that also occurs in a better-understood gene, giving us a clue. For now, this is just a first pass at trying to understand what these RNAs do. Regardless, I wanted a broad overview of the landscape ahead.
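One cheap way to start looking for recurring patterns, before any image processing, is to work on the dot-bracket strings that RNAfold prints alongside its plots. The sketch below is a swapped-in, text-based stand-in for the image approach mentioned above, not something that was done for the videos: it only lists hairpin loop sizes and compares them between two structures. The function names are made up for the example.

```python
import re
from collections import Counter

# A hairpin loop in dot-bracket notation: an opening bracket, one or more
# unpaired bases (dots), then the matching closing bracket.
HAIRPIN = re.compile(r"\((\.+)\)")


def hairpin_loop_sizes(structure):
    """Return the number of unpaired bases in each hairpin loop, left to right."""
    return [len(match.group(1)) for match in HAIRPIN.finditer(structure)]


def shared_hairpins(structure_a, structure_b):
    """Multiset intersection of hairpin loop sizes found in both structures."""
    return Counter(hairpin_loop_sizes(structure_a)) & Counter(hairpin_loop_sizes(structure_b))


if __name__ == "__main__":
    print(hairpin_loop_sizes("((..((...))..((....))..))"))
    print(shared_hairpins("((...))", "(((...)))..((....))"))
```

Loop sizes alone are a very coarse signature; they ignore stem lengths, bulges and multiloops, so a match is at best a hint worth a closer look.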
The data was gathered from GenBank flat files and processed using Vienna RNAfold, GIMP, ImageMagick and ffmpeg. All tools are available for Linux systems.
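For anyone wanting to try something similar, here is a minimal Python sketch of how the RNAfold, ImageMagick and ffmpeg steps could be chained. It is not the exact procedure used for the videos: the helper names, frame naming, density and frame rate are assumptions, the GIMP step is left out, and it assumes RNAfold writes one `<name>_ss.ps` plot per FASTA record. Converting PostScript with ImageMagick also needs Ghostscript, and newer ImageMagick installs may call the program `magick` instead of `convert`.

```python
import subprocess
from pathlib import Path


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def convert_cmd(ps_path, png_path, density=150):
    """ImageMagick command that rasterizes one PostScript plot to PNG."""
    return ["convert", "-density", str(density), str(ps_path), str(png_path)]


def ffmpeg_cmd(frame_pattern, out_path, fps=2):
    """ffmpeg command that joins numbered PNG frames into a video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),      # input frame rate for the image sequence
        "-i", frame_pattern,         # e.g. frame_%05d.png
        # Round width and height down to even numbers, which yuv420p needs.
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
        "-pix_fmt", "yuv420p",
        str(out_path),
    ]


def run_pipeline(records, workdir, out_path="folds.mp4", fps=2):
    """Fold every record, rasterize the plots in order, and build the video.

    records must be a list of (name, sequence) pairs whose names contain
    no whitespace, so that the assumed <name>_ss.ps file names line up.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # RNAfold reads FASTA on stdin and writes its plots into the working directory.
    subprocess.run(["RNAfold"], input=to_fasta(records), text=True,
                   cwd=workdir, stdout=subprocess.DEVNULL, check=True)
    for index, (name, _seq) in enumerate(records):
        ps_path = workdir / f"{name}_ss.ps"
        png_path = workdir / f"frame_{index:05d}.png"
        subprocess.run(convert_cmd(ps_path, png_path), check=True)
    subprocess.run(ffmpeg_cmd(str(workdir / "frame_%05d.png"), out_path, fps),
                   check=True)
```

Numbering the frames in playback order, rather than naming them after the genes, is what lets ffmpeg pick them up as a single image sequence.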
Though the plot is thin, and the characters are one-dimensional (well ... actually two-dimensional), the videos are here ...
Human Non Protein Coding RNA Folding - Part 1 http://www.youtube.com/watch?v=EoKMjZaGCdg
Human Non Protein Coding RNA Folding - Part 2 http://youtu.be/9Tgfz90RZ-8
Human Non Protein Coding RNA Folding - Part 3 http://youtu.be/NLC4s3XFBS0
Human Non Protein Coding RNA Folding - Part 4 http://youtu.be/Lja214LrK8g
Human Non Protein Coding RNA Folding - Part 5 http://youtu.be/FKCjzrIWEBo
The folding data covers all RefSeq "miscRNA" non-coding RNAs as of January 2013. There are 3996 RNAs in the series of videos.
I hope there's some meaning to get from viewing all the foldings. I welcome any comments, even "dumb" ones. Sometimes they're not so dumb.