Seqanswers
  • How to identify known lncRNA from RNA-seq data?

    Hi guys,

    I have some RNA-seq data and have already run TopHat-Cufflinks to get the transcripts.gtf file. Then how do I identify the already annotated lncRNAs from the gtf file?
    Also, I have two groups of data, a control group and a gene knock-down group. Can I perform DE analysis of the lncRNAs after identifying them?

    I'm new to lncRNAs. Do you guys have any suggestions?

  • #2
    I'm also interested in this kind of lncRNA analysis. Check out this recent paper:



    Originally posted by xunong:
    then how do I identify the already annotated lncRNAs from the gtf file?
    So, are they already annotated?



    • #3
      Well, I downloaded a BED file of known lncRNAs from the NONCODE database (http://www.bioinfo.org/noncode/index.php). For my gtf file, can I just pick out the transcripts that overlap this BED file based on their chr/start/end positions?
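A minimal Python sketch of the overlap idea in the question above. It assumes the assembled GTF contains transcript lines with a transcript_id attribute (Cufflinks and StringTie output do) and that the BED file and the GTF use the same chromosome naming (NONCODE-style "chr1" versus Ensembl-style "1" would silently give no hits). The helper names are hypothetical. Note the coordinate conversion: BED is 0-based half-open, GTF is 1-based inclusive. Strand is ignored here, and an overlap alone does not prove the transcript is the annotated lncRNA; for genome-wide files, bedtools intersect does the same job much faster.

```python
from collections import defaultdict


def read_bed(lines):
    """Return {chrom: [(start, end), ...]} in 1-based, inclusive coordinates."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        # BED is 0-based half-open; adding 1 to the start gives GTF-style coordinates.
        intervals[chrom].append((int(start) + 1, int(end)))
    return intervals


def transcripts_from_gtf(lines):
    """Yield (transcript_id, chrom, start, end) for each 'transcript' feature."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript" or 'transcript_id "' not in fields[8]:
            continue
        tid = fields[8].split('transcript_id "', 1)[1].split('"', 1)[0]
        yield tid, fields[0], int(fields[3]), int(fields[4])


def overlapping_transcripts(gtf_lines, bed_lines):
    """Return IDs of transcripts that overlap any BED interval (strand is ignored)."""
    known = read_bed(bed_lines)
    hits = []
    for tid, chrom, start, end in transcripts_from_gtf(gtf_lines):
        # Two closed intervals overlap when each one starts before the other ends.
        if any(s <= end and start <= e for s, e in known.get(chrom, [])):
            hits.append(tid)
    return hits


# Usage (file names are placeholders):
# with open("transcripts.gtf") as gtf, open("noncode_lncRNA.bed") as bed:
#     print(overlapping_transcripts(gtf, bed))
```

The linear scan per transcript keeps the sketch short; an interval tree or a sorted sweep would be the usual choice for large files.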



      • #4
        Xunong,

        I wrote a pipeline that does what you are looking for.



        • #5
          Is it possible to identify lncRNAs from RNA-seq data, given that the data consist only of cDNA molecules synthesised from mRNA?



          • #6
            Originally posted by shakilvet:
            Is it possible to identify lncRNAs from RNA-seq data, given that the data consist only of cDNA molecules synthesised from mRNA?
            If you do total RNA sequencing at high depth, you will be able to identify ncRNAs. If you do mRNA-seq, then no. Illumina provides kits for total RNA-seq for human, mouse and rat, with which you can remove rRNA and then sequence the rest of the RNA.



            • #7
              Hello,

              I'm looking to detect known and novel lncRNAs from RNA-seq data. I used HISAT2 for alignment and StringTie for transcript assembly, as described in this paper [https://www.nature.com/articles/nprot.2016.095]. After getting stringtie_merged.gtf from StringTie (merging transcripts from all samples), I used gffcompare (similar to Cuffcompare in the Tuxedo protocol) to examine how the transcripts compare with the reference annotation. This gives me the annotated protein-coding transcripts and the already known lncRNAs.

              My question is: can I apply your pipeline, lncRNApipe, to detect novel lncRNAs?

              Thank you



              • #8
                From what I see, gffcompare produces new class codes for known ncRNAs. The pipeline is flexible, so you can manually run the rest of the modules individually.

                1. Make a new output directory where you want the lncRNApipe output to be stored. Let's call it lncRNApipe_12_11_2017_run1 for this example.
                2. Inside the lncRNApipe_12_11_2017_run1 directory, make a new directory named exactly cuffcompare.
                3. Now, move all the files produced by gffcompare to lncRNApipe_12_11_2017_run1/cuffcompare. The important file lncRNApipe looks for is the *.tracking file. Make sure that the basename of the files produced by gffcompare is lncRNApipe_cuffcmp. To be safe, you can recreate those files with the command gffcompare -o /your/complete/path/to/lncRNApipe_12_11_2017_run1/cuffcompare/lncRNApipe_cuffcmp .......
                4. Now run lncRNApipe, instructing it not to identify known ncRNAs and specifying the class codes you want subjected to novel lncRNA discovery. For example, by default, transcripts with class codes i, o, u, x and e are considered for lncRNA discovery. You can add other class codes or remove these (see below for how to specify class codes).
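Steps 1 to 3 above can be sketched in Python as follows. This is only a hypothetical helper (prepare_run_dir is not part of lncRNApipe); it assumes the gffcompare output files already exist and uses the directory name cuffcompare and the basename lncRNApipe_cuffcmp given in the post. It refuses files with a different basename rather than renaming them, since rerunning gffcompare with the right -o prefix is the cleaner fix.

```python
import shutil
from pathlib import Path

# Basename the post says lncRNApipe expects for the gffcompare output files.
PREFIX = "lncRNApipe_cuffcmp"


def prepare_run_dir(run_dir, gffcompare_files):
    """Steps 1-3: create <run_dir>/cuffcompare, move the gffcompare output
    files into it, and return the path of the .tracking file lncRNApipe reads."""
    cuffcmp_dir = Path(run_dir) / "cuffcompare"  # must be named exactly 'cuffcompare'
    cuffcmp_dir.mkdir(parents=True, exist_ok=True)
    for f in map(Path, gffcompare_files):
        if not f.name.startswith(PREFIX + "."):
            raise ValueError(
                f"{f.name}: basename should be {PREFIX}; rerun gffcompare with -o .../{PREFIX}"
            )
        shutil.move(str(f), str(cuffcmp_dir / f.name))
    tracking = cuffcmp_dir / f"{PREFIX}.tracking"
    if not tracking.is_file():
        raise FileNotFoundError(f"{tracking} not found; lncRNApipe needs the .tracking file")
    return tracking


# Usage (paths are placeholders):
# prepare_run_dir("/your/complete/path/to/lncRNApipe_12_11_2017_run1",
#                 Path(".").glob("lncRNApipe_cuffcmp.*"))
```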


                If you have installed lncRNApipe successfully, the following command should work (do not use params.yaml here, as we are running manually). Please look at all the options of each module to adjust them to your experiment (e.g. for the categorize module: --min-exons -len -max-len --linc-rna-prox etc.). To get those options, run lncRNApipe --h cat, lncRNApipe --h fetch, etc.

                lncRNApipe --cpu 12 --run /your/complete/path/to/lncRNApipe_12_11_2017_run1 --cat-ncRNAs "--extract-pat 'i|o|u|x|e' --ignore-genePred-err -sample-names 'your_sample_name'" --fetch "-local /your/full/path/to/reference/genome/fasta/you/used/for/transcript/assembly.fa" --cpc --rna --inf --cov-inf 0 &> lncRNApipe.`date +"%m_%d_%y_%H-%M-%S"`.log
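To illustrate what a class-code pattern such as the one passed to --extract-pat selects, here is a small Python sketch that filters a gffcompare .tracking file. It is not lncRNApipe's own implementation; it assumes the usual tab-separated .tracking layout (transfrag ID, locus ID, reference gene|transcript or "-", class code, then one column per sample). In gffcompare's class codes, i is a transfrag fully contained within a reference intron, o is other same-strand overlap with reference exons, u is unknown/intergenic, x is exonic overlap on the opposite strand, and e is a single-exon transfrag partially covering an intron.

```python
import re


def select_by_class_code(lines, pattern="i|o|u|x|e"):
    """Return transfrag IDs whose class code (column 4) fully matches pattern."""
    code_re = re.compile(f"(?:{pattern})")
    selected = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and code_re.fullmatch(fields[3]):
            selected.append(fields[0])
    return selected


# Usage (path is a placeholder):
# with open("lncRNApipe_cuffcmp.tracking") as fh:
#     candidates = select_by_class_code(fh)
```

Using fullmatch means a pattern like "i|o" cannot accidentally match a longer code string.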

                If you encounter any error, send me the log file:

                lncRNApipe --send-run-report '-email [email protected] -log lncRNApipe.log -m "What error you encountered"'

                Best.
                Last edited by kongantik; 12-11-2017, 11:37 AM.



                • #9
                  Hello,

                  I need your help.

                  The output from StringTie merge is stringtie_merged.gtf (6 samples). As you said, I ran gffcompare on the output of StringTie merge:

                  gffcompare -r /path/annotation_data/b37/Homo_sapiens.GRCh37.82.chr_patch_hapl_scaff.gtf -G -o lncRNApipe_cuffcmp /path/stringtie_merged.gtf

                  The above command gave me these files: lncRNApipe_cuffcmp.annotated.gtf, lncRNApipe_cuffcmp.loci, lncRNApipe_cuffcmp.stats, lncRNApipe_cuffcmp.tracking

                  Now I need both known lncRNAs and novel lncRNAs.

                  In the command you gave in your previous comment, there is an option
                  "-local /your/full/path/to/reference/genome/fasta/you/used/for/transcript/assembly.fa"

                  I don't have any .fa FASTA file.

                  What should I do now?



                  • #10
                    Do you have the genome file in FASTA format for Homo sapiens? Use that file here with the -local option.

                    Look for the genome FASTA file in /path/annotation_data/b37

                    gffcompare already gives known lncRNAs. Run the command with the genome FASTA.



                    • #11
                      No, I don't have any genome file in FASTA format. The reference genome I used for HISAT2 is from [https://ccb.jhu.edu/software/hisat2/manual.shtml]: H. sapiens GRCh37 (genome_snp_tran).

                      What should I do if I don't have a genome file in FASTA format?

                      And one more question: gffcompare gives known lncRNAs, right? Which file should I look in?
                      Last edited by bvk; 12-28-2017, 05:51 AM.
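One possible way to pull the known lncRNAs out of the gffcompare output (an assumption, not something stated in the thread) is to take the .tracking rows with class code = (intron chain matches a reference transcript) and look up each reference transcript's biotype in the Ensembl GTF passed to -r. The sketch assumes the GTF lines carry a transcript_biotype attribute (check your Ensembl release) and that the .tracking reference column has the form gene|transcript_id. The helper names and the LNC_BIOTYPES set are hypothetical choices to adjust.

```python
import re

# Hypothetical choice of Ensembl transcript biotypes to treat as lncRNA; adjust as needed.
LNC_BIOTYPES = {"lincRNA", "antisense", "sense_intronic", "sense_overlapping", "lncRNA"}


def ref_biotypes(gtf_lines):
    """Map reference transcript_id -> transcript_biotype from any GTF feature line."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if "transcript_id" in attrs and "transcript_biotype" in attrs:
            biotypes[attrs["transcript_id"]] = attrs["transcript_biotype"]
    return biotypes


def known_lncrnas(tracking_lines, gtf_lines, lnc_biotypes=LNC_BIOTYPES):
    """Return (transfrag_id, ref_transcript_id, biotype) for '=' matches to lncRNA biotypes."""
    biotypes = ref_biotypes(gtf_lines)
    hits = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        # Column 4 is the class code; '=' means the intron chain matches a reference transcript.
        if len(fields) < 4 or fields[3] != "=" or "|" not in fields[2]:
            continue
        ref_tx = fields[2].rsplit("|", 1)[1]  # column 3 is "gene|transcript_id"
        biotype = biotypes.get(ref_tx)
        if biotype in lnc_biotypes:
            hits.append((fields[0], ref_tx, biotype))
    return hits


# Usage (paths are placeholders):
# with open("lncRNApipe_cuffcmp.tracking") as trk, open("Homo_sapiens.GRCh37.82.gtf") as gtf:
#     for row in known_lncrnas(trk, gtf):
#         print(*row, sep="\t")
```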



                      • #12
                        Originally posted by kongantik:
                        Do you have the genome file in FASTA format for Homo sapiens? Use that file here with the -local option.

                        Look for the genome FASTA file in /path/annotation_data/b37

                        gffcompare already gives known lncRNAs. Run the command with the genome FASTA.
                        Hello Karthik,

                        Could you please tell me what I should do if I don't have a .fa reference file? As far as I can see in the HISAT2 manual, there aren't any .fa files. Thank you



                        • #13
                          From what I see in the build script, you can download the genome for GRCh37 from ftp://ftp.ensembl.org/pub/release-75...assembly.fa.gz

                          and then decompress it with gunzip and provide that as the FASTA file.

                          Let me know if that does not work.
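Once the genome FASTA is downloaded, it can help to confirm that its sequence names match the chromosome names in the assembled GTF before passing it to -local (Ensembl files use names like 1, while UCSC-style files use chr1). A minimal Python sketch with hypothetical helper names; it reads either the .fa.gz or the gunzipped .fa, and scanning a full human genome this way is slow but workable.

```python
import gzip


def fasta_names(path):
    """Return the first word of every '>' header in a FASTA file (optionally gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        return {line[1:].split()[0] for line in fh if line.startswith(">")}


def gtf_seqnames(path):
    """Return the set of column-1 sequence names used in a GTF file."""
    with open(path) as fh:
        return {line.split("\t", 1)[0] for line in fh if line.strip() and not line.startswith("#")}


def missing_from_genome(fasta_path, gtf_path):
    """Sequence names used in the GTF that do not appear in the FASTA."""
    return sorted(gtf_seqnames(gtf_path) - fasta_names(fasta_path))


# Usage (paths are placeholders):
# print(missing_from_genome("Homo_sapiens.GRCh37.75.dna.primary_assembly.fa",
#                           "stringtie_merged.gtf"))
```

An empty list means every chromosome in the GTF has a matching sequence in the FASTA.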



                          • #14
                            Yes, I just got that from the Ensembl site, and I'm running the command with that file. I will tell you if there are any errors.

                            BTW, could you please tell me in which gffcompare output file I can find the known lncRNAs?

                            Thank you



                            • #15
                              Originally posted by kongantik:
                              From what I see in the build script, you can download the genome for GRCh37 from ftp://ftp.ensembl.org/pub/release-75...assembly.fa.gz

                              and then decompress it with gunzip and provide that as the FASTA file.

                              Let me know if that does not work.
                              I ran the command as you said, and now I get an error:

                              perl lncRNApipe --cpu 12 --run /path/lncRNApipe_28_12_2017_run1 --cat-ncRNAs "--extract-pat 'i|o|u|x|e' --ignore-genePred-err -sample-names 'STB133, STB236, STB34, STB36, STB65, STB79'" --fetch "-local /path/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa" --cpc --rna --inf --cov-inf 0 &> lncRNApipe.`date +"%m_%d_%y_%H-%M-%S"`.log

                              perl: warning: Setting locale failed.
                              perl: warning: Please check that your locale settings:
                              LANGUAGE = (unset),
                              LC_ALL = (unset),
                              LC_CTYPE = "UTF-8",
                              LANG = "en_US.UTF-8"
                              are supported and installed on your system.
                              perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

                              Thu Dec 28 17:18:29 2017 Validating options...

                              Thu Dec 28 17:18:29 2017 Starting ☲☴ lncRNApipe Pipeline...

                              Thu Dec 28 17:18:30 2017 ########################### Module 2: Running categorize_ncRNAs.pl ###################################

                              Making output directory for categorize_ncRNAs.pl [ /path/lncRNApipe_28_12_2017_run1/categorize_ncRNAs ]


                              Command call:
                              -------------
                              /path/categorize_ncRNAs.pl -cpu 12 -cuffcmp /path/lncRNApipe_28_12_2017_run1/cuffcompare/lncRNApipe_cuffcmp.tracking -out /path/lncRNApipe_28_12_2017_run1/categorize_ncRNAs -bin /path/.lncRNApipe.depbin/linux_gtfToGenePred --extract-pat 'i|o|u|x|e' --ignore-genePred-err -sample-names 'STB133, STB236, STB34, STB36, STB65, STB79'


                              perl: warning: Setting locale failed.
                              perl: warning: Please check that your locale settings:
                              LANGUAGE = (unset),
                              LC_ALL = (unset),
                              LC_CTYPE = "UTF-8",
                              LANG = "en_US.UTF-8"
                              are supported and installed on your system.
                              perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

                              See /path/categorize_ncRNAs.pl -h for options.

                              At the end, this is what I see:

                              AUTHOR
                              Kranti Konganti, <[email protected]>.

                              COPYRIGHT
                              This program is distributed under the Artistic License.

                              DATE
                              Jan-26-2016



                              Thu Dec 28 17:55:10 2017 ☲☴ lncRNApipe Pipeline aborted(?)
                              Last edited by bvk; 12-28-2017, 09:02 AM.
