  • LINE (long interspersed nuclear element) RNA-Seq

    Dear All,

    I would like to analyze the expression of LINE1 transcripts (called L1 ORF) in an RNA-Seq dataset.
    I would welcome any ideas on the pipeline I could use.

    I tried aligning with TopHat, passing the LINE1.gtf file via the -G option, but I do not think this is the correct way to do it.

    Software that could help me do this has recently been developed, but to be honest it is poorly documented:

    http://nerettilab.com/software/repenrich/

    Could you please help me?

    Many Thanks,
    paolo

  • #2
    Using RepEnrich

    Hi,

    I would encourage you to look at the full tutorial available here for RepEnrich:

    RepEnrich is a method to estimate repetitive element enrichment using high-throughput sequencing data. - nskvir/RepEnrich



    Best wishes,
    Steven



    • #3
      Hi,

      For the downstream analysis of RepEnrich results, and more specifically for normalization, I was wondering whether it would make sense to normalize the count for each repeat by its average length across the genome. RepEnrich accounts for the occurrence of the repeats but not their length, so I am not sure.

      Any ideas on this would be very helpful.

      Best wishes
      shruti
      Last edited by shruti; 09-16-2015, 06:19 AM.



      • #4
        Hi Shruti,

        We explored a normalization approach similar to the one you suggest, which would be analogous to FPKM for gene analysis. We decided that it was best to compare the same repeat across conditions using edgeR, rather than to compare different repeat subfamilies with each other after a length normalization.
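
A minimal sketch of why a same-repeat comparison sidesteps the length question: when one subfamily is compared with itself across two conditions, any per-subfamily length factor appears in both numerator and denominator and cancels. The example uses plain counts-per-million and a log2 ratio, which is a simplification and not edgeR's statistical model. The counts, library sizes, and function names are hypothetical.

```python
import math


def counts_per_million(counts, library_size):
    """Scale raw subfamily counts by library size (reads per million)."""
    return {name: c * 1e6 / library_size for name, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """Per-subfamily log2(B / A); the pseudocount avoids log(0)."""
    return {
        name: math.log2((cpm_b[name] + pseudocount) / (cpm_a[name] + pseudocount))
        for name in cpm_a
        if name in cpm_b
    }


# Hypothetical subfamily counts for two conditions (not real data).
control = {"L1HS": 1200.0, "L1PA2": 800.0, "AluY": 5000.0}
treated = {"L1HS": 2600.0, "L1PA2": 900.0, "AluY": 5200.0}

cpm_control = counts_per_million(control, library_size=20_000_000)
cpm_treated = counts_per_million(treated, library_size=22_000_000)

lfc = log2_fold_change(cpm_control, cpm_treated)
for name, value in sorted(lfc.items()):
    print(f"{name}\t{value:+.2f}")
```

For real data, a dedicated differential-expression tool handles replicates and dispersion, which this sketch ignores.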

        The problem with using the average length of the repeat is that repeats often display very specific biases in their length distribution, which can differ depending on the type of repeat. For example, human L1 retrotransposons such as L1Hs display a 3' bias in the genome because the 5' end tends to be truncated during retrotransposition. Since the length of an L1 copy can vary dramatically, from 6 kb down to only ~100 bp, we decided that a normalization based on average repeat length would likely be problematic.
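
To make the concern concrete, here is a small sketch with made-up copy lengths for a single truncation-prone subfamily. It shows how strongly an FPKM-like value depends on which length summary is chosen when the distribution is skewed. The lengths, the count, and the helper name are all hypothetical.

```python
from statistics import mean, median


def length_normalized(count, lengths, summary=mean):
    """Reads per kilobase of a chosen length summary (FPKM-like, without
    the per-million library scaling)."""
    return count / (summary(lengths) / 1000.0)


# Hypothetical copy lengths (bp): one full-length element and several
# 5'-truncated copies, mimicking a skewed L1 length distribution.
l1_lengths = [6000, 900, 300, 100, 100]
count = 740.0

by_mean = length_normalized(count, l1_lengths, mean)      # mean length is 1480 bp
by_median = length_normalized(count, l1_lengths, median)  # median length is 300 bp

print(f"normalized by mean length:   {by_mean:.1f}")
print(f"normalized by median length: {by_median:.1f}")
```

The two values differ several-fold for the same count, so the result says as much about the chosen summary statistic as about expression.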

        Best wishes,
        Steven



        • #5
          Hi Steven,

          I understand your point. It would be good if we could get the coverage of the repeats by reads (unique bases covered by reads). Did you try something along these lines?

          Also, is it possible to get this information from RepEnrich?

          Thanks,
          shruti



          • #6
            Hi,

            RepEnrich will provide count estimates at the level of repeat subfamily. You can choose to use only unique reads in the estimate by passing the --allcountmethod option and taking the _unique_counts.txt output. However, we found the fraction_counts.txt output (the default) to be the best predictor of the true abundance.
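
A toy sketch of how a unique-only count and a fractional count can differ. It assumes that fractional counting gives a read aligning to N subfamilies a weight of 1/N for each of them; that weighting is an assumption made for illustration and is not taken from the RepEnrich source code. The read names, hits, and function name are hypothetical.

```python
from collections import defaultdict


def count_reads(read_hits):
    """read_hits maps a read id to the set of subfamilies it aligns to.

    Returns (unique_counts, fraction_counts): unique counts use only reads
    hitting exactly one subfamily; fractional counts split each read evenly
    among all subfamilies it hits.
    """
    unique = defaultdict(float)
    fraction = defaultdict(float)
    for subfamilies in read_hits.values():
        if not subfamilies:
            continue  # unaligned read, contributes nothing
        if len(subfamilies) == 1:
            unique[next(iter(subfamilies))] += 1.0
        share = 1.0 / len(subfamilies)
        for name in subfamilies:
            fraction[name] += share
    return dict(unique), dict(fraction)


# Hypothetical alignments: two reads are ambiguous between L1HS and L1PA2.
hits = {
    "read1": {"L1HS"},
    "read2": {"L1HS", "L1PA2"},
    "read3": {"L1PA2"},
    "read4": {"L1HS", "L1PA2"},
}

unique_counts, fraction_counts = count_reads(hits)
print("unique:  ", unique_counts)
print("fraction:", fraction_counts)
```

In this toy case the unique-only counts discard half of the reads, while the fractional counts still sum to the total number of aligned reads.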

            Something we also tried, as an independent way of doing the analysis, is aligning reads to the consensus sequence for a repeat subfamily to produce a coverage plot (see Figure 6 in our paper: http://www.biomedcentral.com/1471-2164/15/583). When doing this analysis we also align the genomic sequences annotated for the subfamily by RepeatMasker back to the consensus, to provide an understanding of the background distribution of subfamily lengths in the genome. We have found this second method useful for examining whether specific sub-regions of the repeat subfamily are contributing to differential enrichment.
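
The consensus-coverage idea can be sketched as a per-base depth profile. The example assumes alignments have already been reduced to 0-based, half-open (start, end) intervals on the consensus; how those intervals are obtained from an aligner is outside the sketch. The toy coordinates and the function name are hypothetical.

```python
def depth_profile(consensus_length, alignments):
    """Per-base depth along a consensus from 0-based half-open intervals."""
    diff = [0] * (consensus_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(consensus_length, end)
        if start < end:
            diff[start] += 1  # depth rises where an alignment begins
            diff[end] -= 1    # and falls just past where it ends
    depth = []
    running = 0
    for delta in diff[:consensus_length]:
        running += delta
        depth.append(running)
    return depth


# Toy 10 bp consensus. Reads and genomic copies are both expressed as
# intervals on the consensus; the genomic copies show a 3' bias.
reads = [(0, 4), (2, 6), (8, 10)]
genomic_copies = [(5, 10), (7, 10), (0, 10)]

read_depth = depth_profile(10, reads)
background_depth = depth_profile(10, genomic_copies)
print("reads:     ", read_depth)
print("background:", background_depth)
```

Plotting the read profile next to the genomic-copy profile shows whether a region with many reads simply reflects how many genomic copies contain that region.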

            Best wishes,
            Steven C



            • #7
              Hi Steven,

              Maybe I was not clear, or I am misunderstanding something here.
              I was asking whether it is possible to get the number of unique bases of the repeats covered by at least 1 read, so that one can normalize the read count by the number of unique bases covered rather than by the average length of the repeat.
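
A minimal sketch of the quantity being asked about: the number of distinct bases covered by at least one read, computed by merging overlapping alignment intervals. It assumes all intervals lie on the same reference sequence and are 0-based and half-open. The coordinates, the count, and the function name are hypothetical, and this is not a RepEnrich feature.

```python
def covered_bases(intervals):
    """Count distinct positions covered by 0-based half-open intervals."""
    total = 0
    current_start = None
    current_end = None
    for start, end in sorted(intervals):
        if start >= end:
            continue  # skip empty or inverted intervals
        if current_end is None or start > current_end:
            # No overlap with the running block: close it and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical read alignments on one repeat-containing sequence.
read_alignments = [(100, 150), (120, 170), (400, 450)]
read_count = len(read_alignments)

covered = covered_bases(read_alignments)
reads_per_covered_kb = read_count / (covered / 1000.0)
print(covered, reads_per_covered_kb)
```

For alignments spread over several chromosomes or copies, the merge would need to be done per reference sequence and the totals summed.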

              Thanks
              shruti
              Last edited by shruti; 09-17-2015, 08:13 AM.



              • #8
                Hi Shruti,

                I could interpret your question in two ways, depending on how you picture your analysis: either using genomic copies of the repeats or using a consensus sequence approach. There are polymorphisms in the genomic copies of repeats that do allow for some unambiguous alignments. However, these are the minority of cases. I think that using only these unique alignments would introduce bias, and others have also shown this (an example being http://www.sciencedirect.com/science...4580715000362#).

                At the consensus level there are unique bases that distinguish subfamilies. For example, the consensus sequences for L1HS and L1PA2 are for the most part similar, but do have some unique sequence that can be used to distinguish these subfamilies. However, using only this small subset of sequence to examine these repeats would again introduce bias. Furthermore, because of the polymorphisms, many genomic copies can differ from the consensus sequence.

                -Steven C

