
  • miRNA aligning/counting

    Hello all. Recently our core lab ran the Illumina small RNA protocol on a sample (human RNA) to collect miRNA sequences. The primary purpose is to analyze differential expression of the known miRNA species in the various samples run. It is now up to me to do the bioinformatics part. I know that I can get a set of known miRNA sequences from miRBase. I also know that I will have to either mask or trim the adapter portion of the read prior to aligning. Can anyone who has done anything similar offer some advice? What tools do you think are best for the job? Should I be trying to align to just the miRNA sequences or the whole genome?

    Thanks in advance.

  • #2
    Hi kmcarr,

    I have a similar project on the go with A. thaliana, where I initially aligned all my reads to the whole genome and then intersected those results with the known locations of miRBase mature and precursor sequences. I found it easier this way because at a later stage I can look for potentially novel miRNAs.

    I used novoalign (www.novocraft.com) to simultaneously align and strip off the 3' adaptor sequence. The parameters are:

    novoalign -d genome -f <reads in fastq|prb format> -s<adaptor sequence> > output

    SOAP2 and MAQ may also be used for this purpose, but I found that novoalign offered favourable performance and sensitivity. Bowtie may also do a good job, but I haven't tried it for this work.

    Once I had the alignments, I sorted them by genome sequence and ascending position. I then cross-referenced these positions against the locations of precursor microRNAs with a Perl script. At this stage I had counts for each miRBase miRNA from my short reads, which I can convert to reads-per-million counts.
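For anyone who wants to try the cross-referencing step, here is a minimal Python sketch of the idea (it is not the Perl script mentioned above). It assumes the alignments have already been reduced to (sequence name, start, end) tuples and that the precursor coordinates were taken from the miRBase GFF file. The function names and the `mir-A`/`mir-B` entries are made up for illustration.

```python
from collections import Counter


def count_mirna_hits(alignments, precursors):
    """Count aligned reads that fall entirely inside each precursor.

    alignments: iterable of (chrom, start, end), 1-based inclusive.
    precursors: list of (name, chrom, start, end), 1-based inclusive.
    """
    counts = Counter()
    for chrom, start, end in alignments:
        for name, p_chrom, p_start, p_end in precursors:
            if chrom == p_chrom and start >= p_start and end <= p_end:
                counts[name] += 1
    return counts


def reads_per_million(counts, total_reads):
    """Scale raw counts to reads per million of total_reads."""
    return {name: n * 1_000_000 / total_reads for name, n in counts.items()}


# Hypothetical coordinates, for illustration only.
precursors = [("mir-A", "chr1", 100, 180), ("mir-B", "chr2", 500, 590)]
alignments = [
    ("chr1", 110, 131),
    ("chr1", 112, 133),
    ("chr2", 520, 541),
    ("chr3", 10, 31),  # aligned, but not inside any known precursor
]

counts = count_mirna_hits(alignments, precursors)
rpm = reads_per_million(counts, total_reads=len(alignments))
print(counts, rpm)
```

The nested loop is fine for a few hundred precursors; for anything larger, sorting both lists and sweeping through them once would be the obvious improvement. Which total to use as the denominator (all reads, or mapped reads only) is a choice the sketch leaves open.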

    Contact me privately if you would like more info.

    It would be nice if other people doing similar work could share their protocols for this type of bioinformatics analysis. We could all learn something new.
    Last edited by zee; 01-15-2009, 08:07 PM.



    • #3
      miRNA alignment

      I use a very similar approach, but I first collapse identical reads before aligning (to avoid aligning the same let-7 and other abundant miRNA reads hundreds of thousands of times). You can then count how many times each unique read occurs in the original file to generate counts. The only problem with this is that you lose the sequence quality information (if you have a need for that).
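A rough Python sketch of read collapsing, assuming plain 4-line FASTQ records. The header convention `readN_xCOUNT` is just one way to carry the copy number through the aligner and is an assumption, not something prescribed in this thread.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def collapse_reads(sequences):
    """Map each unique read sequence to the number of times it occurs."""
    return Counter(sequences)


def to_fasta(counts):
    """Format unique reads as FASTA, most abundant first, count in the header."""
    records = []
    for idx, (seq, n) in enumerate(counts.most_common(), start=1):
        records.append(f">read{idx}_x{n}\n{seq}")
    return "\n".join(records)


# Toy input: three reads, two of them identical.
fastq_lines = [
    "@r1", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
    "@r2", "AACCCGTAGATCCGAACTTGTG", "+", "I" * 22,
    "@r3", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
]

collapsed = collapse_reads(read_fastq_sequences(fastq_lines))
print(to_fasta(collapsed))
```

As noted above, the quality strings are dropped here, so this only suits pipelines that do not need them after collapsing.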

      Ryan



      • #4
        I agree. Collapsing the reads to unique examples is a very useful step, as miRNA Solexa runs are heavily over-sampled; e.g. 3M reads often represent only 200k unique reads.

        I tend to remove adaptor tags and quality filter reads before matching to miRBase. This also reduces the search space significantly.
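A minimal Python sketch of adapter removal followed by a quality filter. The adapter string is a placeholder to be replaced with the one from your kit, and the thresholds (`min_len`, `min_mean_q`, `min_overlap`) and the Phred offset of 33 are assumptions; older Illumina pipelines used a different quality encoding.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Cut the read at an exact adapter match.

    Also handles an adapter prefix that runs off the 3' end of the read,
    provided at least min_overlap bases of it are present.
    """
    pos = seq.find(adapter)
    if pos == -1:
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def passes_filter(seq, qual, min_len=16, min_mean_q=20, phred_offset=33):
    """Keep reads that are long enough, have no N, and a good mean quality."""
    if not seq or len(seq) < min_len or "N" in seq:
        return False
    mean_q = sum(ord(c) - phred_offset for c in qual) / len(qual)
    return mean_q >= min_mean_q


ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # placeholder; use your kit's 3' adapter
insert = "TGAGGTAGTAGGTTGTATAGTT"

# A 36-nt read: 22-nt insert followed by the first 14 bases of the adapter.
read = insert + ADAPTER[:14]
trimmed_seq, trimmed_qual = trim_adapter(read, "I" * len(read), ADAPTER)
print(trimmed_seq, passes_filter(trimmed_seq, trimmed_qual))
```

Only exact matches are handled here, so a sequencing error inside the adapter would leave the read untrimmed.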



        • #5
          You guys are correct; I forgot to add that after my first analysis I started collapsing reads.
          When I do my miRBase counts, I have an option to factor in the frequency of each tag.
          I recently looked at software for counting tags that overlap miRNAs. I found ERANGE and am still trying to make it work on my genome of interest.
          Anybody care to share what they're using? I have a very crude pipeline in Perl that automates the counting and graphs miRNA matches.



          • #6
            I have my own Perl scripts for handling the raw data and managing searches of the reads against miRBase, etc.

            Then I load the data into MySQL for analysis. It allows the easy tracking of the 'abundance' of each read following collapsing of the data.
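To illustrate the database idea, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL, so it runs without a server. The two-table layout (`reads` holding each unique sequence with its abundance, `hits` linking sequences to miRNAs) and all names in it are assumptions for illustration, not the schema used above.

```python
import sqlite3


def abundance_by_mirna(reads, hits):
    """Sum read abundance per miRNA.

    reads: list of (sequence, abundance) for collapsed reads.
    hits:  list of (sequence, mirna_name) from the search step.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reads (seq TEXT PRIMARY KEY, abundance INTEGER NOT NULL)"
    )
    conn.execute("CREATE TABLE hits (seq TEXT NOT NULL, mirna TEXT NOT NULL)")
    conn.executemany("INSERT INTO reads VALUES (?, ?)", reads)
    conn.executemany("INSERT INTO hits VALUES (?, ?)", hits)
    rows = conn.execute(
        "SELECT h.mirna, SUM(r.abundance) AS total "
        "FROM hits h JOIN reads r ON r.seq = h.seq "
        "GROUP BY h.mirna ORDER BY total DESC, h.mirna"
    ).fetchall()
    conn.close()
    return rows


# Made-up collapsed reads and hits.
reads = [
    ("TGAGGTAGTAGGTTGTATAGTT", 1200),
    ("TGAGGTAGTAGGTTGTATAGT", 300),
    ("AACCCGTAGATCCGAACTTGTG", 50),
]
hits = [
    ("TGAGGTAGTAGGTTGTATAGTT", "mir-A"),
    ("TGAGGTAGTAGGTTGTATAGT", "mir-A"),
    ("AACCCGTAGATCCGAACTTGTG", "mir-B"),
]

rows = abundance_by_mirna(reads, hits)
print(rows)
```

Keeping abundance on the collapsed read means length variants of the same miRNA are summed by the join rather than by hand.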



            • #7
              I also use my own script to process the results. I usually predict miRNAs first and then classify them as known or novel in the last step. Aligning to miRBase is a trivial issue once you have candidate miRNAs.

              DD



              • #8
                Hi all,
                I have a question similar to the one posted by kmcarr. We need to align miRNA sequences obtained on the Solexa/Illumina platform, and we are not interested (for now) in discovering new miRNA species. Is there a precompiled or assembled short sequence comprising the sequences of all miRNA species (mature and hairpin) that one can use for alignment instead of the whole genome? I am thinking of something like
                ----seqMir-1....seqMir2....seqmir3.....-----
                I think the alignment algorithm should run faster on it.

                Has any of you thought of such a solution? Would it work? How can I assemble such a sequence automatically?
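One way such a sequence could be assembled automatically is sketched below in Python: join the entries of a multi-FASTA file with runs of N and keep a table of where each miRNA ends up. The spacer length and the function names are assumptions; the U-to-T conversion is there because miRBase distributes RNA sequences while most aligners expect DNA.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) from multi-FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def concatenate(records, spacer_len=50, to_dna=True):
    """Join sequences with N spacers; return (reference, offsets).

    offsets holds (name, start, end) in 1-based inclusive coordinates
    on the concatenated reference.
    """
    pieces, offsets = [], []
    cursor = 0
    for name, seq in records:
        if to_dna:
            seq = seq.upper().replace("U", "T")
        offsets.append((name, cursor + 1, cursor + len(seq)))
        pieces.append(seq)
        cursor += len(seq) + spacer_len
    return ("N" * spacer_len).join(pieces), offsets


# Tiny made-up input with a short spacer so the result is easy to read.
fasta_text = ">a first toy entry\nACGU\n>b\nGGCC\n"
reference, offsets = concatenate(parse_fasta(fasta_text), spacer_len=3)
print(reference, offsets)
```

The spacer should be at least as long as a read so that no read can align across two neighbouring miRNAs; hits are then mapped back to a miRNA by looking up the offset table.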

                Thanks.
                Andrea



                • #9
                  Andrea,

                  miRBase has what you are looking for.

                  Go to the Download tab and you will find fasta files with either the hairpin or mature sequences. They also provide GFF files with the genome coordinates of the miRNAs.

                  Happy mapping.



                  • #10
                    Dear kmcarr,
                    yes, I know that miRBase has the sequences and GFF coordinates, but they are in multi-FASTA format and not a single-sequence file (I am thinking of Mapview, which requires a single FASTA sequence...).
                    I will try to use Bowtie and SAMtools to align and view the sequences, but I do not know which format they require.
                    Do you have any idea?

                    Thanks and bye for now.
                    Andrea



                    • #11
                      I've also used SOAP to get rid of adapters and map reads, but right now I need something to do fuzzy identification and trimming of adapters on Windows (for teaching purposes). I've finally found a mapper that works on Windows (PASS), but it won't cut the adapters.

                      Any ideas gratefully received

                      David



                      • #12
                        Andrea,

                        If you're using miRBase to search for miRNAs, I'd recommend you use the hairpin.fasta file only, as many search algorithms cope badly where the search sequence is shorter than the query, as is often the case when searching against the mature sequences. You then need to parse the miRNA.dat file to determine whether your hairpin matches align to known mature regions.

                        All this is simple to do with the data in a database.
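The second half of that procedure, deciding whether a hairpin match lands on a known mature region, can be sketched in Python as below. It assumes the mature coordinates within each hairpin have already been extracted from miRNA.dat into a dictionary; the parsing itself is not shown, and the `hp1` entries and the 0.8 coverage cutoff are made up for illustration.

```python
def overlap_fraction(hit_start, hit_end, mat_start, mat_end):
    """Fraction of the mature region covered by the hit (1-based inclusive)."""
    overlap = min(hit_end, mat_end) - max(hit_start, mat_start) + 1
    return max(overlap, 0) / (mat_end - mat_start + 1)


def assign_mature(hit, mature_regions, min_fraction=0.8):
    """Return the best-covered mature miRNA for a hairpin hit, or None.

    hit: (hairpin_id, start, end) in hairpin coordinates.
    mature_regions: {hairpin_id: [(mature_name, start, end), ...]}.
    """
    hairpin_id, start, end = hit
    best = None
    for name, m_start, m_end in mature_regions.get(hairpin_id, []):
        frac = overlap_fraction(start, end, m_start, m_end)
        if frac >= min_fraction and (best is None or frac > best[1]):
            best = (name, frac)
    return best[0] if best else None


# Hypothetical hairpin with two annotated mature products.
mature_regions = {"hp1": [("hp1-5p", 6, 27), ("hp1-3p", 57, 77)]}

print(assign_mature(("hp1", 6, 27), mature_regions))   # exact 5p match
print(assign_mature(("hp1", 58, 78), mature_regions))  # 3p, shifted by one base
print(assign_mature(("hp1", 30, 50), mature_regions))  # loop region, no mature
```

Allowing partial coverage rather than requiring exact coordinates keeps reads that are a base or two longer or shorter than the annotated mature form.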
                        Cheers,

                        Chris



                        • #13
                          David,

                          Have you tried Cygwin on Windows? The vast majority of code is available for Linux only, so it's probably best to try that avenue rather than look for things available for Windows, as you may miss out on the best applications.
                          Cheers,

                          Chris



                          • #14
                            David,

                            The EMBOSS package contains a program called fuzznuc, which does what you want: fuzzy identification of nucleotide sequences (http://embossgui.sourceforge.net/dem...l/fuzznuc.html).

                            EMBOSS is a huge package and primarily supported for Unix and Unix-like environments, but there is a native Windows port (ftp://emboss.open-bio.org/pub/EMBOSS/windows/). I have never used the Windows port, but if it is anything like the Unix versions it will require some commitment to get it installed and running properly.
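For teaching purposes, the idea behind fuzzy adapter matching can also be shown in a few lines of plain Python, which runs on Windows without any extra installation. This is a sketch of mismatch-tolerant matching in general, not of how fuzznuc works internally; the adapter string is a placeholder, and the 10% mismatch rate and 5-base minimum overlap are assumed defaults.

```python
def hamming(a, b):
    """Number of mismatches between a and b over their shared length."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter_fuzzy(seq, adapter, max_mismatch_rate=0.1, min_overlap=5):
    """Return the leftmost position where the adapter starts, or -1.

    Mismatches are tolerated up to max_mismatch_rate of the compared
    length. Near the 3' end only the adapter prefix that fits is compared,
    as long as at least min_overlap bases remain.
    """
    for pos in range(len(seq) - min_overlap + 1):
        window = seq[pos:pos + len(adapter)]
        allowed = int(len(window) * max_mismatch_rate)
        if hamming(window, adapter) <= allowed:
            return pos
    return -1


def trim_fuzzy(seq, adapter, **kwargs):
    """Remove the adapter and everything after it, if it is found."""
    pos = find_adapter_fuzzy(seq, adapter, **kwargs)
    return seq if pos == -1 else seq[:pos]


ADAPTER = "TCGTATGCCGTCTTCTGCTTG"          # placeholder adapter
ADAPTER_WITH_ERROR = "TCGTATGCCGACTTCTGCTTG"  # same, with one base changed
insert = "TGAGGTAGTAGGTTGTATAGTT"

print(trim_fuzzy(insert + ADAPTER_WITH_ERROR, ADAPTER))
print(trim_fuzzy(insert + ADAPTER[:8], ADAPTER))
```

Only substitutions are handled; insertions and deletions inside the adapter would need an edit-distance alignment instead of a Hamming comparison.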



                            • #15
                              Hello,

                              What are you guys doing for the actual statistical model once you know the abundance of each miRNA in each sample? Are you doing a pooled comparison as in SAGE, or are you taking a linear model approach as in limma?

                              If you take the second approach, which off-the-shelf programs are you using?

