SEQanswers

Old 04-14-2015, 12:05 AM   #1
xunong
Junior Member
 
Location: china

Join Date: Sep 2014
Posts: 9
Default How to identify known lncRNA from RNA-seq data?

Hi guys,

I have some RNA-seq data and have already run TopHat and Cufflinks to get the transcripts.gtf file. How do I identify the already annotated lncRNAs from the GTF file?
Also, I have two groups of data: a control group and a gene knockdown group. Can I perform DE analysis of the lncRNAs after identifying them?

I'm new to lncRNAs. Do you have any suggestions?
Old 04-14-2015, 07:25 AM   #2
cascoamarillo
Senior Member
 
Location: MA

Join Date: Oct 2010
Posts: 158
Default

I'm also interested in this kind of lncRNA analysis. Check out this recent paper:
http://biorxiv.org/content/biorxiv/e...17889.full.pdf


Quote:
Originally Posted by xunong View Post
then how do I identify the already annotated lncRNAs from the gtf file?
So, are they already annotated?
Old 04-14-2015, 05:49 PM   #3
xunong
Junior Member
 
Location: china

Join Date: Sep 2014
Posts: 9
Default

Well, I downloaded a BED file of known lncRNAs from the NONCODE database (http://www.bioinfo.org/noncode/index.php). For my GTF file, can I just pick the transcripts that overlap this BED file, based on chromosome and start/end coordinates?
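
For what it's worth, a coordinate-overlap check like the one asked about can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a tested pipeline: the function names read_bed and overlapping_transcripts are made up for illustration, the GTF is assumed to contain "transcript" rows with a transcript_id attribute (as Cufflinks output does), and chromosome names are assumed to match between the two files (e.g. both use "chr1"). BED coordinates are 0-based and half-open while GTF coordinates are 1-based and inclusive, so the GTF start is shifted by one before comparing.

```python
import re
from collections import defaultdict


def read_bed(lines):
    """Collect BED intervals per chromosome as 0-based, half-open (start, end)."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def overlapping_transcripts(gtf_lines, bed_intervals):
    """Return transcript_id of every GTF 'transcript' row overlapping a BED interval."""
    hits = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        # GTF is 1-based inclusive; convert to 0-based half-open to match BED.
        start, end = int(fields[3]) - 1, int(fields[4])
        candidates = bed_intervals.get(fields[0], [])
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            match = re.search(r'transcript_id "([^"]+)"', fields[8])
            if match:
                hits.append(match.group(1))
    return hits
```

Overlap by coordinates alone ignores strand and exon structure, so antisense or only partially overlapping transcripts would also be reported. The hits are best treated as candidates to check further.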
Old 11-18-2015, 07:55 AM   #4
kongantik
Junior Member
 
Location: central

Join Date: Jan 2011
Posts: 9
Default

Xunong,

I wrote a pipeline that does what you are looking for.

http://git.io/SaFh1g
Old 04-16-2017, 01:47 AM   #5
shakilvet
Junior Member
 
Location: srinagar

Join Date: Jan 2017
Posts: 1
Default

Is it possible to identify lncRNAs from RNA-seq data, given that the data consist only of cDNA molecules synthesised from mRNA?
Old 04-17-2017, 10:16 AM   #6
kongantik
Junior Member
 
Location: central

Join Date: Jan 2011
Posts: 9
Default

Quote:
Originally Posted by shakilvet View Post
Is it possible to identify lncRNAs from RNA-seq data, given that the data consist only of cDNA molecules synthesised from mRNA?
If you do total RNA sequencing at high depth, you will be able to identify ncRNA. If you do mRNA-Seq, then no. Illumina provides kits for total RNA-seq for human, mouse and rat, with which you can remove rRNA and then sequence the rest of the RNA.
Old 12-11-2017, 04:27 AM   #7
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

Hello,

I want to detect known and novel lncRNAs from RNA-Seq data. I used HISAT2 for alignment and StringTie for transcript assembly, as described in this paper [https://www.nature.com/articles/nprot.2016.095]. Once I had stringtie_merged.gtf from StringTie (merging transcripts from all samples), I used gffcompare (similar to Cuffcompare from the Tuxedo protocol) to examine how the transcripts compare with the reference annotation. This gives me the annotated protein-coding transcripts and the already known lncRNAs.

My question is: can I apply your pipeline, lncRNApipe, to detect novel lncRNAs?

Thank you
Old 12-11-2017, 10:34 AM   #8
kongantik
Junior Member
 
Location: central

Join Date: Jan 2011
Posts: 9
Default

From what I see, gffcompare produces new class codes for known ncRNA. The pipeline is flexible, so you can run the rest of the modules manually, one at a time.

1. Make a new output directory where you want the lncRNApipe output to be stored. Let's call it lncRNApipe_12_11_2017_run1 for this example.
2. Inside the lncRNApipe_12_11_2017_run1 directory, make a new directory named exactly cuffcompare.
3. Move all the files produced by gffcompare to lncRNApipe_12_11_2017_run1/cuffcompare. The important file lncRNApipe looks for is the *.tracking file. Make sure that the basename of the files produced by gffcompare is lncRNApipe_cuffcmp. To be safe, you can recreate those files with the command gffcompare -o /your/complete/path/to/lncRNApipe_12_11_2017_run1/cuffcompare/lncRNApipe_cuffcmp .......
4. Now run lncRNApipe, instructing it not to identify known ncRNAs and telling it which class codes you want subjected to novel lncRNA discovery. By default, transcripts with class codes i, o, u, x and e are considered for lncRNA discovery. You can add other class codes or remove some of them (see below for how to specify class codes).
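
Steps 1 to 3 could be scripted roughly as follows. This is a hedged sketch rather than anything shipped with the pipeline: stage_gffcompare_output is a hypothetical helper name, and it only assumes what the steps state, namely a subdirectory called cuffcompare and gffcompare files whose basename is lncRNApipe_cuffcmp.

```python
import shutil
from pathlib import Path


def stage_gffcompare_output(src_dir, run_dir, prefix="lncRNApipe_cuffcmp"):
    """Move gffcompare files named '<prefix>.*' from src_dir into run_dir/cuffcompare."""
    target = Path(run_dir) / "cuffcompare"
    # Steps 1 and 2: create the run directory and its cuffcompare subdirectory.
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    # Step 3: move every file that carries the expected basename.
    for path in sorted(Path(src_dir).glob(prefix + ".*")):
        shutil.move(str(path), str(target / path.name))
        moved.append(path.name)
    # The .tracking file is the one the post says lncRNApipe looks for.
    if not (target / (prefix + ".tracking")).exists():
        raise FileNotFoundError(f"{prefix}.tracking not found in {target}")
    return moved
```

Raising an error when the .tracking file is missing is a design choice of this sketch, so that a wrong basename is caught before the pipeline is started.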


If you have installed lncRNApipe successfully, the following command should work (do not use params.yaml here, as we are running manually). Please look at the options of each module and adjust them for your experiment (e.g. for the categorize module, --min-exons -len -max-len --linc-rna-prox etc.). To list those options, run lncRNApipe --h cat, lncRNApipe --h fetch etc.

lncRNApipe --cpu 12 --run /your/complete/path/to/lncRNApipe_12_11_2017_run1 --cat-ncRNAs "--extract-pat 'i|o|u|x|e' --ignore-genePred-err -sample-names 'your_sample_name'" --fetch "-local /your/full/path/to/reference/genome/fasta/you/used/for/transcript/assembly.fa" --cpc --rna --inf --cov-inf 0 &> lncRNApipe.`date +"%m_%d_%y_%H-%M-%S"`.log
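
To preview which transcripts a pattern like 'i|o|u|x|e' would pick up, the class codes can be read straight from the .tracking file. A minimal sketch, assuming the usual gffcompare/cuffcompare layout in which the first four tab-separated columns are transfrag ID, locus ID, reference gene|transcript and class code. select_by_class_code is a hypothetical name, and this is not lncRNApipe's own code.

```python
def select_by_class_code(tracking_lines, codes=("i", "o", "u", "x", "e")):
    """Return (transfrag_id, class_code) for rows whose class code is in `codes`."""
    wanted = set(codes)
    selected = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        transfrag_id, class_code = fields[0], fields[3]
        if class_code in wanted:
            selected.append((transfrag_id, class_code))
    return selected
```

Called on the open lncRNApipe_cuffcmp.tracking file, this would list the candidates per class code before running the pipeline.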

If you encounter any error, send me the log file:

lncRNApipe --send-run-report '-email [email protected] -log lncRNApipe.log -m "What error you encountered"'

Best.

Last edited by kongantik; 12-11-2017 at 10:37 AM.
Old 12-28-2017, 02:07 AM   #9
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

Hello,

I need your help.

The output from StringTie merge is "stringtie_merged.gtf" (6 samples). As you said, I ran the gffcompare command on the output of StringTie merge.

gffcompare -r /path/annotation_data/b37/Homo_sapiens.GRCh37.82.chr_patch_hapl_scaff.gtf -G -o lncRNApipe_cuffcmp /path/stringtie_merged.gtf

The above command gave me these files: lncRNApipe_cuffcmp.annotated.gtf, lncRNApipe_cuffcmp.loci, lncRNApipe_cuffcmp.stats, lncRNApipe_cuffcmp.tracking

Now I need both the known lncRNAs and the novel lncRNAs.

In the command you gave in your previous comment there is an option
"-local /your/full/path/to/reference/genome/fasta/you/used/for/transcript/assembly.fa"

I don't have any .fa FASTA file.

What should I do now?
Old 12-28-2017, 04:39 AM   #10
kongantik
Junior Member
 
Location: central

Join Date: Jan 2011
Posts: 9
Default

Do you have the genome file in FASTA format for Homo sapiens? Use that file here with the -local option.

Look for the genome FASTA file in /path/annotation_data/b37

gffcompare already gives you the known lncRNAs. Run the command with the genome FASTA.
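
One possible way to read known lncRNAs off the gffcompare output is sketched below. It rests on several assumptions that are not confirmed in this thread: that "known" means a class code of "=" (exact match to a reference transcript) in the .tracking file, that column 3 of that file holds reference gene|transcript, that the reference GTF carries an Ensembl-style transcript_biotype attribute, and that biotypes such as lincRNA and antisense count as lncRNA. The names reference_biotypes, known_lncrnas and LNC_BIOTYPES are hypothetical.

```python
import re

# Assumption: which reference biotypes count as lncRNA; adjust to your annotation.
LNC_BIOTYPES = {"lincRNA", "antisense", "lncRNA"}


def reference_biotypes(ref_gtf_lines):
    """Map reference transcript_id -> transcript_biotype from 'transcript' rows."""
    biotypes = {}
    for line in ref_gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        tid = re.search(r'transcript_id "([^"]+)"', fields[8])
        biotype = re.search(r'transcript_biotype "([^"]+)"', fields[8])
        if tid and biotype:
            biotypes[tid.group(1)] = biotype.group(1)
    return biotypes


def known_lncrnas(tracking_lines, biotypes, lnc_biotypes=LNC_BIOTYPES):
    """Return (transfrag_id, ref_transcript_id) for '=' matches to lncRNA references."""
    known = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[3] != "=" or "|" not in fields[2]:
            continue
        ref_tid = fields[2].split("|")[-1]
        if biotypes.get(ref_tid) in lnc_biotypes:
            known.append((fields[0], ref_tid))
    return known
```

If the reference GTF stores the biotype elsewhere (for example in the source column), reference_biotypes would need to be adapted.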
Old 12-28-2017, 04:45 AM   #11
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

No, I don't have any genome file in FASTA format. The reference genome I used for HISAT2 is from [https://ccb.jhu.edu/software/hisat2/manual.shtml]: H. sapiens GRCh37 (genome_snp_tran).

What should I do if I don't have the genome file in FASTA format?

And one more question: gffcompare gives the known lncRNAs, right? Which file should I look in?

Last edited by bvk; 12-28-2017 at 04:51 AM.
Old 12-28-2017, 05:18 AM   #12
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

Quote:
Originally Posted by kongantik View Post
Do you have the genome file in FASTA format for Homo sapiens? Use that file here with the -local option.

Look for the genome FASTA file in /path/annotation_data/b37

gffcompare already gives known lncRNAs. Run the command with the genome FASTA
Hello Karthik,

Could you please tell me what I should do if I don't have a .fa reference file? From the HISAT2 manual, it looks like no .fa files are provided. Thank you.
Old 12-28-2017, 06:34 AM   #13
kongantik
Junior Member
 
Location: central

Join Date: Jan 2011
Posts: 9
Default

From what I see in the build script, you can download the genome for GRCh37 from ftp://ftp.ensembl.org/pub/release-75...assembly.fa.gz

Then decompress it with the gunzip command and provide the resulting file as the FASTA.
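
If gunzip is not at hand, the decompression step can also be done with Python's standard library. A small sketch; gunzip_file is a hypothetical helper name, and unlike the gunzip command it leaves the original .gz file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dest=None):
    """Decompress a .gz file; by default write next to it without the .gz suffix."""
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")
    # Stream in chunks so a multi-gigabyte genome is never held in memory at once.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

The returned path is what would then be passed to the -local option.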

Let me know if that does not work.
Old 12-28-2017, 06:39 AM   #14
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

Yes, I just got that from the Ensembl site, and I am running the command with that file. I will let you know if there are any errors.

BTW, could you please tell me which gffcompare output file contains the known lncRNAs?

Thank you
Old 12-28-2017, 07:27 AM   #15
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

Quote:
Originally Posted by kongantik View Post
From what I see in the build script, you can download the genome for GRCh37 from ftp://ftp.ensembl.org/pub/release-75...assembly.fa.gz

and then decompress using gunzip command and then you can provide that as FASTA

Let me know if that does not work.
I ran the command as you suggested, and now I get an error.

perl lncRNApipe --cpu 12 --run /path/lncRNApipe_28_12_2017_run1 --cat-ncRNAs "--extract-pat 'i|o|u|x|e' --ignore-genePred-err -sample-names 'STB133, STB236, STB34, STB36, STB65, STB79'" --fetch "-local /path/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa" --cpc --rna --inf --cov-inf 0 &> lncRNApipe.`date +"%m_%d_%y_%H-%M-%S"`.log

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Thu Dec 28 17:18:29 2017 Validating options...

Thu Dec 28 17:18:29 2017 Starting ☲☴ lncRNApipe Pipeline...

Thu Dec 28 17:18:30 2017 ########################### Module 2: Running categorize_ncRNAs.pl ###################################

Making output directory for categorize_ncRNAs.pl [ /path/lncRNApipe_28_12_2017_run1/categorize_ncRNAs ]


Command call:
-------------
/path/categorize_ncRNAs.pl -cpu 12 -cuffcmp /path/lncRNApipe_28_12_2017_run1/cuffcompare/lncRNApipe_cuffcmp.tracking -out /path/lncRNApipe_28_12_2017_run1/categorize_ncRNAs -bin /path/.lncRNApipe.depbin/linux_gtfToGenePred --extract-pat 'i|o|u|x|e' --ignore-genePred-err -sample-names 'STB133, STB236, STB34, STB36, STB65, STB79'


perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

See /path/categorize_ncRNAs.pl -h for options.

At the end, this is what I see:

AUTHOR
Kranti Konganti, <[email protected]>.

COPYRIGHT
This program is distributed under the Artistic License.

DATE
Jan-26-2016



Thu Dec 28 17:55:10 2017 ☲☴ lncRNApipe Pipeline aborted(?)

Last edited by bvk; 12-28-2017 at 08:02 AM.
Old 12-29-2017, 02:55 AM   #16
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

Hello Kranti,

Could you please reply to my comment above?
Old 12-30-2017, 11:30 PM   #17
bvk
Member
 
Location: czech

Join Date: May 2015
Posts: 65
Default

Quote:
Originally Posted by kongantik View Post
From what I see in the build script, you can download the genome for GRCh37 from ftp://ftp.ensembl.org/pub/release-75...assembly.fa.gz

and then decompress using gunzip command and then you can provide that as FASTA

Let me know if that does not work.
Hello Kranti,

Please check the above comment. That is what I got when I used the command you gave.

All times are GMT -8.