  • remove rRNA/tRNA from miRNA seq data

    Hi All,

    I am beginner at miRNA analysis. Now I am working on miRNA annotation.I have been working on it for several days but still have no idea how to do it.
    What I want is to annotate and remove rRNA/tRNA from my total data. Could anybody give me some guidelines or provide some resources?

    I know I should blast against Rfam. And probably I should use blastn. But I haven't been able to get what I mentioned above. I would appreciate if somebody can help me on it.

    Thanks!

  • #2
    I am really stuck here. It would be very helpful if anybody could provide some examples of how to BLAST against Rfam.



    • #3
      Hi there, the rRNA sequences should be available from NCBI if your genome is sequenced. You can get your tRNA sequences from Rfam, which is available through BioMart here: http://xfam-biomart.sanger.ac.uk/ (download the tRNA sequences for your genome).

      Then create a multi-FASTA file with all the rRNA and tRNA sequences inside it, build a Bowtie index of this multi-FASTA file, and align your reads against it.

      You can then keep the unaligned reads, which are the ones that do not map to rRNA or tRNA.

      For example:

      bowtie --best --un <unaligned_reads_file.fq> index_of_rRNAs_and_tRNAs <input_fastq_file>

      That should do the trick. Then align unaligned_reads_file.fq against your genome.
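The steps described in this post can be sketched as two small shell functions. This is only an illustration: the function names, `contaminants_index`, and every file name are hypothetical, and it assumes Bowtie 1 (`bowtie-build` and `bowtie`) is installed and that each input FASTA file ends with a newline.

```shell
# Concatenate rRNA and tRNA FASTA files into one multi-FASTA reference.
# Usage: build_contaminant_fasta <output.fa> <input1.fa> [<input2.fa> ...]
# Assumes every input file ends with a newline, so records do not run together.
build_contaminant_fasta() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Build a Bowtie 1 index from the combined FASTA, align the reads against it,
# and write the reads that do NOT align (the non-rRNA/tRNA reads) to a file.
# Usage: filter_contaminant_reads <combined.fa> <reads.fq> <unaligned_out.fq>
# "contaminants_index" is a hypothetical index basename.
filter_contaminant_reads() {
    bowtie-build "$1" contaminants_index > /dev/null
    bowtie --best --un "$3" contaminants_index "$2" > /dev/null
}
```

For example, `build_contaminant_fasta rRNA_tRNA.fa rRNA.fa tRNA.fa` followed by `filter_contaminant_reads rRNA_tRNA.fa reads.fq unaligned_reads_file.fq` would leave the filtered reads in `unaligned_reads_file.fq`, ready to align against the genome.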



      • #4
        Hi Adam,

        Your guidelines are very helpful! I am new to this area and have been looking for ways to download rRNA/tRNA data. Bowtie can do the job, but some of the literature I found suggests that blastn should perform better. Is that correct? Do you have any idea how to use blastn for this task?



        • #5
          Bowtie2 ... not bowtie ... will do what you want much faster and probably better than BLAST. Bowtie2, as adam_ie suggests, will create an output file of unaligned sequences. BLAST will not do this, so you would have to run additional steps to parse the BLAST output.

          If you are looking at literature that is over 2 years old and/or does not talk specifically about bowtie2 versus blastn then I suggest just ignoring that literature's advice.
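To give a sense of the "additional steps" a BLAST-based approach would need, here is a rough sketch. It assumes the reads were converted to FASTA, searched with `blastn -outfmt 6` (tabular output, query ID in the first column), and that any read with at least one hit should be discarded. The function name and file names are hypothetical, and no score or identity threshold is applied.

```shell
# Print the FASTA records whose IDs do NOT appear as a query in the BLAST table.
# Usage: remove_blast_hits <hits.tsv> <reads.fa>
#   hits.tsv : blastn tabular output (-outfmt 6); column 1 is the query ID
#   reads.fa : the FASTA file that was used as the BLAST query
remove_blast_hits() {
    awk '
        # First file: remember every query ID that produced a hit.
        FILENAME == ARGV[1] { hit[$1] = 1; next }
        # FASTA header: strip ">" and decide whether this record is kept.
        /^>/ { id = substr($1, 2); keep = !(id in hit) }
        # Print header and sequence lines of the records being kept.
        keep
    ' "$1" "$2"
}
```

Something like `remove_blast_hits hits.tsv reads.fa > reads_without_rRNA.fa` would then play the role that `--un` plays in Bowtie.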



          • #6
            Hi, thank you for your suggestions. I will try Bowtie 2 instead. But I am still not sure, because according to the manual (http://bowtie-bio.sourceforge.net/bo...-isnt-bowtie-2) Bowtie and Bowtie 2 work best when aligning to large genomes. So can Bowtie 2 simply replace BLAST for this job?



            • #7
              I see what you mean -- the manual suggests that bowtie2 is good for large genomes while Blast (and others) are good for shorter genomes. The rRNA/tRNAs would indeed be a short "genome".

              Really either program can be used either way -- the manual just says "... [short genomes] can be done with Bowtie 2 but you may want to consider ... Blast ..."

              All I can say is that I use bowtie2 for rRNA removal and am satisfied with it. It may not catch every rRNA, but unless I know exactly which rRNAs I have, 95+% accuracy is good enough. Bowtie is quick to run and easy to use since it generates the desired output file.



              • #8
                Hi Westerman,

                I am using bowtie2 for rRNA removal:

                Step 1: Create the index

                # bowtie2-build rRNA.fasta rRNA.index

                Step 2: Align to the rRNA index in order to get an rRNA-free FASTQ file.
                The input FASTQ file is not that big.

                # bowtie2 -p 2 -M 1 -q -U /filter_clean.fastq --un fasqFileWithoutrRNA -x rRNA.index --al aligned to rRNA.fastq

                The alignment is taking a long time (more than 1 hour and still running).

                Can you please shed some light on this? Am I doing something wrong?

                I would really appreciate your help.

                Thanks,
                Naresh



                • #9
                  I don't recognize the '-M' option. Assuming it is valid then my suggestion is to raise the number of threads (-p) if your system has more than 2 processors in it. Also check to make sure you are not running out of memory. The command 'free' can help with this.

                  You can also look at the size of your output files. Are they increasing in size? The unaligned plus the aligned will eventually get to the size of the input file. Honestly, 1+ hour is nothing for anything sizable. If the process goes over 12 hours then you have a problem.
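The file-size check suggested here can be scripted roughly as follows. The function names and file names are hypothetical; the idea is just to print the combined size of the two output files at intervals, so a total that never grows is easy to spot. On Linux, `nproc` reports the processor count for choosing `-p`, and `free` shows memory use.

```shell
# Size of a file in bytes (0 if the file does not exist yet).
size_bytes() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Print the combined size of the unaligned and aligned output files
# every <interval> seconds, <count> times.
# Usage: watch_outputs <unaligned.fq> <aligned.fq> <interval> <count>
watch_outputs() {
    i=0
    while [ "$i" -lt "$4" ]; do
        total=$(( $(size_bytes "$1") + $(size_bytes "$2") ))
        printf '%s bytes written so far\n' "$total"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$4" ]; then
            sleep "$3"
        fi
    done
    return 0
}

# Example (hypothetical file names): sample once a minute for ten minutes.
# watch_outputs unaligned.fastq aligned.fastq 60 10
```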



                  • #10
                    Hi Westerman,

                    Thanks for your prompt reply. I used the -M option for the reason below.

                    -M <int> look for up to <int>+1 alns; report best, with MAPQ (5 for --end-to-end, 2 for --local)

                    I checked memory, and I think there is still some memory left:
                    [root@Victor-20 ~]# free
                                 total       used       free     shared    buffers     cached
                    Mem:      49457456   48185040    1272416          0   15368076   29318996
                    -/+ buffers/cache:    3497968   45959488
                    Swap:     70364156      73684   70290472

                    Input file has 17 million reads.

                    The output file sizes (unaligned and aligned) are not increasing. They are still the same as when the program started, i.e.
                    450 KB and 32 KB. That's why I am a little bit worried.

                    Thanks,
                    Naresh
                    Last edited by nareshvasani; 12-11-2014, 08:47 AM.
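A note on reading the `free` output in this post: in this older layout (columns `total used free shared buffers cached`), the `used` figure on the `Mem:` line includes buffers and page cache, and the `-/+ buffers/cache:` line shows usage with those subtracted. The sketch below redoes that arithmetic. The function name is hypothetical, and it assumes the old column layout; newer versions of `free` print different columns.

```shell
# Read `free` output (older layout: total used free shared buffers cached)
# on standard input and print, in kB:
#   - memory used by programs   = used - buffers - cached
#   - memory still obtainable   = free + buffers + cached
summarize_free() {
    awk '$1 == "Mem:" {
        printf "used_by_programs_kb=%d\n", $3 - $6 - $7
        printf "available_kb=%d\n", $4 + $6 + $7
    }'
}

# Example: free | summarize_free
```

Applied to the numbers above, this gives 3497968 kB used by programs and 45959488 kB obtainable, matching the `-/+ buffers/cache:` line.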



                    • #11
                      You must have a different version than I do (bowtie2-2.2.4), since '-M' doesn't show up in my help output, nor do I see it in the online manual (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).

                      You do have some sort of problem. The output file sizes should be increasing. You have used up almost all of your memory which is rarely a good sign. I would stop the process and try something else. My suggestions:

                      1) Remove the '-M 1'. As far as I can tell it was an option in older bowtie2 and bowtie but is no longer needed.

                      2) Probably more important, make sure you are running the latest version of bowtie2.
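Putting the two suggestions together, a run without `-M` might look like the sketch below. The function name and all file names are hypothetical. It uses only the options already shown in this thread (`-p`, `-x`, `-U`, `--un`, `--al`) plus `-S /dev/null` to discard the SAM output, which is an assumption about what is wanted here. Output file names should not contain spaces.

```shell
# Check which version is being run before starting a long job:
# bowtie2 --version

# Align single-end reads to the rRNA index and split them into two FASTQ files.
# Usage: split_rrna_reads <index_basename> <reads.fastq> <non_rRNA_out.fastq> \
#                         <rRNA_out.fastq> <threads>
split_rrna_reads() {
    bowtie2 -p "$5" -x "$1" -U "$2" --un "$3" --al "$4" -S /dev/null
}

# Example (hypothetical file names):
# split_rrna_reads rRNA.index filter_clean.fastq without_rRNA.fastq rRNA_reads.fastq 4
```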



                      • #12
                        Hey Westerman,

                        I installed the new version but could not get it to work, so I decided to stick with the older version (Bowtie 2 version 2.0.0-beta5).

                        One interesting thing happened: when I terminated the process, the sizes of the output files changed to 285.7 MB and 3.2 GB.

                        Any suggestions on how I can uninstall the older version of bowtie2?

                        Thanks,
                        Naresh



                        • #13
                          As for the sudden increase in file size ... maybe the program was working and your 'ls' just wasn't keeping up? That would be strange but I suppose possible in some circumstances.

                          As for an uninstall ... gee, I hate sysadmin questions since they are dependent on your particular setup and your expertise in system administration -- but I suspect that using an 'rm' on your old version would get rid of it. :-)

                          Basically if you need more help in the uninstall/install process you will have to tell us more about your system and how you tend to install programs in the first place.



                          • #14
                            Oh, okay.

                            I don't have much expertise in system administration.
                            Actually, I installed the new bowtie2 version but was not able to execute it. Whenever I type bowtie2, only the older version runs, not the new one, so I thought I would uninstall the older version.

                            Thanks,
                            naresh



                            • #15
                              Removal should be good. I suspect a PATH problem. Either your path is not pointing to the newer version, or the path points to the older version before it points to the new version. You could also not have the execute bit set on the new version. Once again, this is sysadmin trivia that you'll need to solve on your own.
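One way to check for the PATH problem described here is to list every copy of the program that the shell could find, in the order it searches. This is a minimal sketch; `list_on_path` is a hypothetical helper, and the directory in the commented examples is a placeholder for wherever the new version was actually installed.

```shell
# List every executable file with the given name found on PATH, in search order.
# The first line printed is the copy the shell will actually run.
# Usage: list_on_path <program_name>
list_on_path() {
    name="$1"
    old_ifs="$IFS"
    IFS=:
    for dir in $PATH; do
        candidate="${dir:-.}/$name"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS="$old_ifs"
    return 0
}

# Examples (the directory is a placeholder, not a real install location):
# list_on_path bowtie2                          # which copies exist, and in what order
# chmod +x /path/to/new/bowtie2-dir/bowtie2     # set the execute bit if it is missing
# export PATH="/path/to/new/bowtie2-dir:$PATH"  # make the new version come first
```

If the old version is listed first, either removing it or putting the new directory earlier in PATH should make `bowtie2` resolve to the new version.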
