  • I am confused about transcript start site identification using RNA-Seq. Please help me

    I am a newbie to RNA-Seq.
    Currently, I am working on a bacterium.
    I have paired-end RNA-Seq data sequenced on Illumina.

    I mapped the data to the genome using bowtie and filtered out the unmapped and ambiguously mapped reads.
    The remaining reads are all unambiguously mapped, and I generated a pileup from the BAM files.

    The first thing I want to do is determine the transcript start site.
    So I looked at the region upstream of each predicted gene.
    At first I thought that if the start site of a predicted gene was wrong, I would find continuous read coverage at that point, and all I would need to do is extend the start site upstream to the point where the coverage is 0.

    In practice, I get a very long sequence if I do it that way.
    Maybe there is some noise, and I should set the cutoff higher than 0, e.g. 1, 2, 3...
    However, I don't know which number is suitable.

    In my opinion, the reads are 90 bp, so if a read can be mapped to the genome, it shouldn't be noise. And if more than one read maps to a region, is that region really transcribed? I am very confused.

    Please help me.

  • #2
    Hello hanifk,

    Welcome to the wonderful world of RNA-seq analysis.

    RNA-seq can have very good signal-to-noise ratios compared to other expression analysis methods (e.g. microarrays). However, there is always some noise... Furthermore, if you sequence an RNA-seq library deep enough you will eventually sequence every mappable base in the genome.
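
    As a rough illustration of the depth point, the sketch below assumes background reads start uniformly at random along the genome (a Poisson model, which real libraries only approximate). The function name and all numbers are hypothetical, chosen only to show the trend.

```python
import math


def expected_uncovered_fraction(n_background_reads, read_length, genome_size):
    """Expected fraction of bases with zero coverage under a Poisson model.

    Assumes background reads start uniformly at random along the genome,
    which is a simplification of real library behaviour.
    """
    mean_depth = n_background_reads * read_length / genome_size
    return math.exp(-mean_depth)


# Hypothetical numbers: a 5 Mb bacterial genome and 90 bp reads.
for n_reads in (10_000, 100_000, 1_000_000):
    frac = expected_uncovered_fraction(n_reads, 90, 5_000_000)
    print(f"{n_reads:>9} background reads -> {frac:.3g} of bases uncovered")
```

    Under these assumptions the uncovered fraction is exp(-mean depth), so it shrinks very quickly as sequencing depth grows.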

    In a good RNA-seq library you can expect most of your reads to map to annotated genes. But a lot of reads will still map to introns and outside of known genes. This 'noise' has many recognized sources. For example, low-level genomic DNA contamination and random transcription can give you sporadic reads throughout the genome. Unprocessed RNA contamination will give you additional noise signal within introns.

    In other words, it is not a good strategy to look for the point where the coverage drops off to 0. This will not be a very good estimate of transcription start points. I would guess that for a deep library this will mostly give you positions where unmappable regions of the genome start...
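
    A toy sketch of why the zero-coverage rule over-extends. The per-base depths are made up, the function name is hypothetical, and the cutoff of 10 is arbitrary, which is exactly the difficulty with a fixed threshold.

```python
def extend_upstream(coverage, start, min_cov=1):
    """Walk upstream (toward index 0) from `start` while depth >= min_cov.

    `coverage` is a list of per-base read depths for a forward-strand gene.
    Returns the leftmost index that is still at or above the cutoff.
    """
    pos = start
    while pos > 0 and coverage[pos - 1] >= min_cov:
        pos -= 1
    return pos


# Toy depths: sporadic background (1-2 reads), then a real transcript.
coverage = [1, 2, 1, 1, 2, 1, 1, 2, 30, 35, 40, 38, 41, 39]
annotated_start = 10  # hypothetical predicted gene start

# Stopping only at depth 0 never stops: background reads carry it to the edge.
print(extend_upstream(coverage, annotated_start, min_cov=1))

# A higher cutoff stops where the depth jumps, but the value 10 is a guess.
print(extend_upstream(coverage, annotated_start, min_cov=10))
```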

    Your goal is being actively pursued by various approaches, many of which are described in detail throughout this forum. For example, the concept of 'peak finding' may be useful to you in identifying transcription start sites. Basically, you look for the point where read coverage diverges from the random signal of sporadic reads to the 'true' signal corresponding to actual transcription.
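
    One very simple way to look for that divergence is to compare the mean depth just upstream and just downstream of each position and pick the sharpest rise. This is only a sketch under that assumption, not the method of any published peak finder; the function name, window size, pseudocount and toy depths are all illustrative.

```python
def find_coverage_jump(coverage, window=3):
    """Return (position, ratio) where mean depth rises most sharply.

    For each position, compares the mean depth of the `window` bases
    downstream with the mean of the `window` bases upstream. A pseudocount
    of 1 keeps the ratio finite when the upstream depth is zero.
    """
    best_pos, best_ratio = None, 0.0
    for pos in range(window, len(coverage) - window + 1):
        upstream = sum(coverage[pos - window:pos]) / window
        downstream = sum(coverage[pos:pos + window]) / window
        ratio = (downstream + 1.0) / (upstream + 1.0)
        if ratio > best_ratio:
            best_pos, best_ratio = pos, ratio
    return best_pos, best_ratio


# Toy depths: sporadic background (1-2 reads), then a real transcript.
coverage = [1, 2, 1, 1, 2, 1, 1, 2, 30, 35, 40, 38, 41, 39]
jump_pos, jump_ratio = find_coverage_jump(coverage, window=3)
print(jump_pos, round(jump_ratio, 1))
```

    On real data the window would need to be much larger, and strand and replicate information should be taken into account before trusting any single position.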

    Tools geared towards deriving transcripts from RNA-seq data may be a place to start. For example, read the papers for ERANGE, Cufflinks and Scripture and see whether one of those approaches seems appropriate. This paper may also be useful: 'Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli'

    • #3
      I am not a native English speaker, and I was not confident that I could express my confusion clearly.

      Luckily, you understood my problem and explained it patiently.
      I am grateful for your help, and I promise I will keep pursuing this wonderful world.
      Thank you very much.

      • #4
        Hello malachig,

        I have a question I would like help with. I'm working on a mammalian project now, and I have RNA-seq data from several different tissues, sequenced on an Illumina GA II. I don't know how to get the transcript start sites of genes from the RNA-seq data. Is there any method that can figure this out?
        Thanks

        • #5
          Hello malachig,

          I have a question I would like help with. I'm working on a mammalian genome project now, and I have RNA-seq data from several different tissues, sequenced on an Illumina GA II. I don't know how to get the transcript start sites of genes from the RNA-seq data. Is there any method that can figure this out?
          Thanks

          • #6
            Is anybody out there able to answer this question? I am looking for an answer too.



