
  • #91
    genomic distribution of reads

    Simon,
    I just posted my question on this. I wonder whether SeqMonk can identify where my differentially expressed reads are, e.g. whether they fall in a UTR, a promoter region and so on. I don't have SeqMonk at the moment. I am looking for a tool that will list the read counts for selected differentially expressed regions of RNA-seq data in tabular form, so that I can see whether the reads are more enriched in promoter, UTR or other regions.

    Thanks

    Comment


    • #92
      Originally posted by mathew View Post
      Simon,
      I just posted my question on this. I wonder whether SeqMonk can identify where my differentially expressed reads are, e.g. whether they fall in a UTR, a promoter region and so on. I don't have SeqMonk at the moment. I am looking for a tool that will list the read counts for selected differentially expressed regions of RNA-seq data in tabular form, so that I can see whether the reads are more enriched in promoter, UTR or other regions.

      Thanks
      Kind of. SeqMonk has independent annotation tracks for gene, mRNA and CDS features and can annotate against any of these (including noting if the hit is upstream or downstream of the feature within prescribed limits). It doesn't have an internal gene model which connects these features, so it can't annotate things like 5'UTR from the information it has.

      The feature annotation would be accessed by creating an annotated report from the reports menu. If you select "close to" as the annotation type then the resulting report will tell you which feature was hit and whether the hit was overlapping, upstream or downstream, and if it didn't overlap it will say how far away it was.
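
The kind of classification such a report gives (overlapping, upstream or downstream, plus a distance) can be sketched outside the program as follows. This is only an illustration of the idea, not SeqMonk's implementation: the `Feature` type, the `classify_hit` function, the "not close" label and the `max_distance` cutoff are all hypothetical names and values chosen for the example, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple, Tuple


class Feature(NamedTuple):
    """Hypothetical annotation feature (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_hit(hit_start: int, hit_end: int, feature: Feature,
                 max_distance: int = 2000) -> Tuple[str, int]:
    """Return (relation, distance in bp) of a hit relative to a feature."""
    # Any shared base counts as an overlap.
    if hit_start <= feature.end and hit_end >= feature.start:
        return ("overlapping", 0)
    if hit_end < feature.start:
        # Hit lies at lower coordinates than the feature.
        distance = feature.start - hit_end
        relation = "upstream" if feature.strand == "+" else "downstream"
    else:
        # Hit lies at higher coordinates than the feature.
        distance = hit_start - feature.end
        relation = "downstream" if feature.strand == "+" else "upstream"
    if distance > max_distance:
        # Assumed cutoff standing in for the "prescribed limits".
        return ("not close", distance)
    return (relation, distance)


gene = Feature("geneA", 1000, 2000, "+")
print(classify_hit(1500, 1550, gene))
print(classify_hit(800, 850, gene))
print(classify_hit(2100, 2150, gene))
```

Upstream and downstream are interpreted relative to the strand of the feature, so the same hit position flips from upstream to downstream for a reverse-strand gene.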

      The other alternative is to create separate annotation tracks for promoters, 3'/5'UTRs and exons to go alongside the existing features. You can make promoters and exons within the program, but you'd have to get a list of UTRs from another source and import them. You could then use the feature filter to get counts for how many of your hits overlapped with each of these different kinds of gene region.
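
To make the counting idea concrete, here is a minimal sketch of tallying how many hits overlap each kind of region. It is a generic interval-overlap count written for illustration, not the SeqMonk feature filter itself; the track names, coordinates and the `count_hits_by_region` function are made-up examples.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and a_end >= b_start


def count_hits_by_region(hits, tracks):
    """Count hits overlapping at least one feature in each track.

    hits:   list of (chromosome, start, end)
    tracks: dict mapping region type -> list of (chromosome, start, end)
    """
    counts = Counter()
    for chrom, start, end in hits:
        for region_type, features in tracks.items():
            if any(f_chrom == chrom and overlaps(start, end, f_start, f_end)
                   for f_chrom, f_start, f_end in features):
                counts[region_type] += 1
    return counts


# Hypothetical annotation tracks and hits.
tracks = {
    "promoter": [("chr1", 500, 999)],
    "5'UTR": [("chr1", 1000, 1100)],
    "exon": [("chr1", 1000, 1400), ("chr1", 1800, 2000)],
}
hits = [
    ("chr1", 600, 650),
    ("chr1", 1050, 1080),
    ("chr1", 1900, 1950),
    ("chr2", 600, 650),
]
counts = count_hits_by_region(hits, tracks)
for region_type in tracks:
    print(region_type, counts[region_type])
```

A hit can be counted under more than one region type (a 5'UTR hit is also an exon hit here), which is worth remembering when comparing the totals.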

      Comment


      • #93
        SeqMonk downloads old version of S. cerevisiae genome ?

        I wonder why I can't get hold of the latest version of S. cerevisiae genome from SeqMonk.

        Just a quick comparison of chrII from SeqMonk and the version available at ensembl or ncbi gives me different results.

        SeqMonk - 813178
        Ensembl - 813184
        NCBI - 813184

        Also at ebi I can see that the size of chrII is the right one so I'm puzzled a bit whether this is some silly setting at my place or something else?
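
For anyone wanting to repeat this kind of length comparison, a small sketch that sums sequence lengths per FASTA record is shown here. It is a generic helper, not part of SeqMonk; the function name and the toy records are invented for the example, and a real file would be passed in as an open file handle instead of a list.

```python
def fasta_lengths(lines):
    """Return {record name: sequence length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy example; with a real file use: with open(path) as handle: fasta_lengths(handle)
example = [">II toy chromosome", "ACGT", "AC", ">III", "GG"]
print(fasta_lengths(example))
```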

        Btw, at your website you mirror plenty of genomes.
        What is the difference between these two?:
        SGD1.01.zip 13-Aug-2009 11:54 1.1M
        SGD1.zip 11-Jan-2008 14:17 1.8M

        This one is not working at all:
        SGD1_new.zip 11-Jan-2008 14:17 292K

        Any help would be appreciated.

        Comment


        • #94
          Originally posted by mjp View Post
          I wonder why I can't get hold of the latest version of S. cerevisiae genome from SeqMonk.
          I've just been to have a look and it seems that our latest S.cerevisiae annotation set was one assembly behind the latest available on Ensembl. I've just kicked off a build of the EF4 assembly and it should be processed in a few hours, and available on our site some time on Monday.

          We try to keep these genomes up to date but I'm afraid it's easy for some new builds to slip through without us noticing. If you ever see that the latest build available at Ensembl doesn't match what is available in SeqMonk then let me know so I can add it in.

          Originally posted by mjp View Post
          What is the difference between these two?:
          SGD1.01.zip 13-Aug-2009 11:54 1.1M
          SGD1.zip 11-Jan-2008 14:17 1.8M

          This one is not working at all:
          SGD1_new.zip 11-Jan-2008 14:17 292K
          S.cerevisiae was one of the earliest species we supported. From memory I think the SGD1 builds were performed before we'd automated building annotation sets from Ensembl and were doing them manually. They're only really there for compatibility with people who used them from early on and you probably shouldn't use them for new projects any more.

          Comment


          • #95
            Thanks for taking care of it.

            Comment


            • #96
              Originally posted by mjp View Post
              Thanks for taking care of it.
              No problem. The EF4 assembly should be available now.

              Comment


              • #97
                Originally posted by simonandrews View Post
                The EF4 assembly should be available now.
                Yes it is.

                No warnings now. Thanks

                Comment


                • #98
                  I think my RNAseq data looks weird.

                  This is the first time I have used seqmonk to visualize my RNAseq data. I did it according to the Seqmonk course manual and the video guide. I imported a BAM file that was generated through CASAVA. The manual shows the reads clustering along the boxes in the mRNA track but my reads are throughout the whole gene which I think means that they include the introns. Can someone tell me if there is something wrong with what I did? I will attach a pic of a region.
                  Attached Files

                  Comment


                  • #99
                    Originally posted by shawpa View Post
                    This is the first time I have used seqmonk to visualize my RNAseq data. I did it according to the Seqmonk course manual and the video guide. I imported a BAM file that was generated through CASAVA. The manual shows the reads clustering along the boxes in the mRNA track but my reads are throughout the whole gene which I think means that they include the introns. Can someone tell me if there is something wrong with what I did? I will attach a pic of a region.
                    It sounds like you did the right kind of import. For RNA-Seq the only extra step you need to take is to tick the box which says "Split spliced reads" when importing your BAM file.

                    Looking at your data it looks like there's a bit of enrichment over exons, but that you're seeing data through a large proportion of your transcript. Do you also see data between transcripts or are the reads just within them? Is your library normal RNA-Seq or something where you'd expect to see unspliced RNA as well (nuclear RNA etc)?

                    It's maybe also worth pointing out that the RNA-Seq analysis video is now a bit outdated as there are improved tools to do this kind of analysis in SeqMonk. I'll have to do a new video to demonstrate the new recommended workflow, which is basically:
                    1. Import spliced reads
                    2. [Optional] Create filtered set of transcripts to remove splicing artefacts and pseudogenes
                    3. Use the RPKM quantitation pipeline to quantitate the filtered transcripts, but don't correct for read length
                    4. Do a Cumulative distribution plot to check the data normalisation and fix if necessary using the normalisation tools under data quantitation.
                    5. Identify changing transcripts using the Intensity difference filter
                    6. Additionally compare replicate sets if you have them using the replicate set stats filter
                    7. Deduplicate the results to get the most changing transcript for each gene


                    If your dataset contains unspliced transcripts then you might be better off doing a simple read count quantitation over genes to get your initial quantitation.
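
For reference, the standard RPKM definition (reads per kilobase of feature per million mapped reads) behind step 3 can be sketched as below. This is the textbook formula written out for illustration, not SeqMonk's own pipeline code; the function name and the example numbers are made up.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # count / (length in kb) / (total reads in millions)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical transcript: 200 reads over 2 kb in a 10 million read library.
value = rpkm(read_count=200, feature_length_bp=2000, total_mapped_reads=10_000_000)
print(value)
```

Dividing by feature length is what makes long and short transcripts comparable, and dividing by the library size is what makes different samples comparable.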

                    Comment


                    • Originally posted by simonandrews View Post
                      It sounds like you did the right kind of import. For RNA-Seq the only extra step you need to take is to tick the box which says "Split spliced reads" when importing your BAM file.

                      Looking at your data it looks like there's a bit of enrichment over exons, but that you're seeing data through a large proportion of your transcript. Do you also see data between transcripts or are the reads just within them? Is your library normal RNA-Seq or something where you'd expect to see unspliced RNA as well (nuclear RNA etc)?

                      It's maybe also worth pointing out that the RNA-Seq analysis video is now a bit outdated as there are improved tools to do this kind of analysis in SeqMonk. I'll have to do a new video to demonstrate the new recommended workflow, which is basically:
                      1. Import spliced reads
                      2. [Optional] Create filtered set of transcripts to remove splicing artefacts and pseudogenes
                      3. Use the RPKM quantitation pipeline to quantitate the filtered transcripts, but don't correct for read length
                      4. Do a Cumulative distribution plot to check the data normalisation and fix if necessary using the normalisation tools under data quantitation.
                      5. Identify changing transcripts using the Intensity difference filter
                      6. Additionally compare replicate sets if you have them using the replicate set stats filter
                      7. Deduplicate the results to get the most changing transcript for each gene


                      If your dataset contains unspliced transcripts then you might be better off doing a simple read count quantitation over genes to get your initial quantitation.
                      Honestly I am now more confused than before. I went through the RPKM pipeline and the values seem to be both negative and positive for the probes which confuses me. I don't know what the cumulative distribution plot is supposed to show me. I used the scatter plot and filtered for values above -3 since the graph is more linear after that. Then I did a difference filter for anything more than 2. I think it is log scaled. Now the intensity values are negative or positive rather than all positive. Your help is greatly appreciated.
                      Attached Files

                      Comment


                      • Having looked at the results you posted I'm a little confused too! The probes you showed don't look like they line up against the transcripts. Which feature track did you select when you ran the pipeline? I just did the same thing on the same region of one of our datasets and got probes which looked like they were in the right place, so I'm not sure what you selected.

                        The other plots look mostly OK though. Don't worry about positive vs negative values. Since you're log transforming your data the transition from positive to negative simply indicates switching from a read density of >1 read per million to < 1 read per million. Since read values < 1 read per million are generally very low indeed then we normally just switch the display to show positive values and ignore the really low ones (they'll still be there when you run the quantitations).
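
The sign change is easy to check with a couple of lines of Python. This only illustrates how a log2 transform behaves around 1, using made-up read densities; it is not SeqMonk output.

```python
import math

# Hypothetical read densities in reads per million.
densities = [8.0, 1.0, 0.25]

# Values above 1 give positive logs, values below 1 give negative logs.
log_values = [math.log2(d) for d in densities]
print(log_values)
```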

                        The cumulative distribution plot simply shows how your datasets progress from the lowest quantitated value to the highest. The important point here is that the two curves should be pretty much the same (which they are in your case), so that your overall distribution of values is similar and it's only the positions of specific transcripts which change. The reason you have a somewhat complex curve is that your quantitation was performed on a base rather than a read level, so the first part of your curve will simply be places where you had a fraction of a read. You can fix this by entering the read length from your sequencing run when you do the RPKM quantitation.
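
As a rough sketch of what such a plot compares, the code below builds a cumulative distribution for each dataset and reports the largest gap between the two at a few quantiles. It is an illustration with invented values and helper names, not the calculation SeqMonk performs; the nearest-rank quantile used here is a deliberately simple assumption.

```python
def cumulative_distribution(values):
    """Return (value, fraction of data <= value) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def quantile(sorted_values, q):
    """Simple nearest-rank style quantile for an already sorted list."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


def max_quantile_gap(values_a, values_b, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Largest absolute difference between two datasets at the given quantiles."""
    a, b = sorted(values_a), sorted(values_b)
    return max(abs(quantile(a, q) - quantile(b, q)) for q in quantiles)


# Hypothetical log2 quantitations; sample_b is sample_a shifted up by 1.
sample_a = [-3.0, -1.0, 0.5, 2.0, 4.0]
sample_b = [v + 1.0 for v in sample_a]
print(cumulative_distribution(sample_a))
print(max_quantile_gap(sample_a, sample_b))
```

A gap near zero across quantiles corresponds to two curves lying on top of each other; a constant offset like the one in this toy data is the sort of thing a normalisation step would be expected to remove.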

                        Your scatter plot also looks OK. You have some noise at the bottom end but the data values are generally pretty consistent with a few outliers which might prove to be interesting. You should be able to use the intensity difference filter to identify the outliers (but do run the quantitation again, this time supplying a read length, since this will sort out the noise profile at the bottom end of the plot).

                        Comment


                          I am pretty sure I chose the mRNA track. Actually when I was importing the data I did get about 4 errors saying certain reads couldn't be aligned because they exceeded the genome length by 22 or 23bp. I used the UCSC mm9 genome from illumina's website but maybe it doesn't match the NCBI37 in seqmonk exactly. I have had issues with illumina's provided genomes before. Could this be one of my problems?

                          Comment


                          • Originally posted by shawpa View Post
                            I am pretty sure I chose the mRNA track. Actually when I was importing the data I did get about 4 errors saying certain reads couldn't be aligned because they exceeded the genome length by 22 or 23bp.
                            If it's only that small a number of reads over that small a length it's nothing to worry about. It might be they're using a different mitochondrial genome than ensembl (there seems to be some variation around), or they could be allowing circular matches which might crop up very occasionally.

                            Originally posted by shawpa View Post
                            I used the UCSC mm9 genome from illumina's website but maybe it doesn't match the NCBI37 in seqmonk exactly. I have had issues with illumina's provided genomes before. Could this be one of my problems?
                            No, those should be OK, and the fact that your reads line up against the features in SeqMonk suggests that the assembly is OK. It's the quantitation in the program which would set up the probes. If you're not sure you can right-click on the 'All probes' list and select 'View list', which will show you what settings you used. I was just surprised not to see a probe spanning the whole transcript in the screenshot, since that should have had the highest quantitation.

                            Comment


                            • Hi Simon,
                              I'm just wondering about the best way to cite seqmonk in a paper !
                              thanks for any hints
                              pbseq

                              Comment


                              • Originally posted by pbseq View Post
                                Hi Simon,
                                I'm just wondering about the best way to cite seqmonk in a paper !
                                thanks for any hints
                                There is no publication for SeqMonk at the moment - I guess that's the perennial problem of software which is always under development! We'd normally say that the easiest thing to do is to simply cite the project's web site.

                                Cheers

                                Simon.

                                Comment
