  • Can I filter out rRNA and tRNA using TopHat or Cufflinks?

    My RNA-seq data was mapped with TopHat, but the rRNA and tRNA reads were not removed. Can TopHat or Cufflinks remove reads that match rRNA or tRNA?

  • #2
    Why would you want to filter them out, anyway?


    • #3
      Let me share my experience on this. When I was analyzing a set of microbial RNA-seq data, Cufflinks got stuck at "99% complete" for days. It is a known issue; check the Cufflinks FAQ. The authors suggest removing rRNA and mtDNA. So I removed those entries from the GTF file and the run finished in a few hours. I believe the rRNA genes usually have excessive coverage, which may choke Cufflinks.


      • #4
        I asked because if you just want to do simple counting in your next analysis step, you would just get a few extra count values, which you can then ignore. Isn't Cufflinks a bit too sophisticated for prokaryotic genomes, anyway? Wouldn't it spend all its time trying to assemble multi-exonic transcripts, of which there aren't any, or can you tell it not to bother with splice junctions?


        • #5
          Thank you, DZhang.
          I checked the Cufflinks FAQ and, as it suggests, ran Cufflinks with -M rRNA.gtf, but the calculation still takes more than a day.
          So I wonder whether there is a tool that can filter reads the way ABI's BioScope does: discard the reads that map to a filter reference, then align the remaining reads to the genome?


          • #6
            Mart555,

            To answer your question directly, yes, you can. Map the reads to your filter reference and extract the unmapped reads for further processing. Bowtie/BWA can do the former part and samtools can do the latter.

            My understanding of your challenge is that you do not know which part of the reference sequences is taking up too many reads, or even whether that is the root cause in your case. I assume your job is done by now, although it took a bit longer. Can you describe your setup so everybody understands it better?

            Thank you,
            Douglas
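
A minimal Python sketch of the "extract the unmapped reads" step, assuming single-end reads and an uncompressed SAM file produced by mapping against the filter reference. The function name `unmapped_reads_as_fastq` is made up for illustration. It relies only on the SAM convention that bit 0x4 of the FLAG field marks an unmapped read, which is the same bit that `samtools view -f 4` selects on.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads_as_fastq(sam_lines):
    """Yield one FASTQ record per read that did NOT map to the filter reference.

    sam_lines: any iterable of SAM text lines (for example an open file).
    Assumes single-end reads; paired-end data would need mate-aware handling.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & UNMAPPED_FLAG:
            yield f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Toy input: r1 is unmapped (flag 4), r2 hit the filter reference (flag 0).
    toy_sam = [
        "@HD\tVN:1.0\n",
        "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
        "r2\t0\trRNA\t10\t255\t4M\t*\t0\t0\tTTTT\tIIII\n",
    ]
    for record in unmapped_reads_as_fastq(toy_sam):
        print(record, end="")
```

The FASTQ records written this way are the reads to keep; they can then be aligned to the genome as a normal input file.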


            • #7
              Douglas,
              Thank you for your answer.
              As you suggested, I now want to build a Bowtie index of rRNA + tRNA + mtRNA, and I think I can estimate the percentage of this junk RNA by mapping with Bowtie against that index.
              But I still cannot figure out how to extract the unmapped reads using samtools. And if Bowtie generates a SAM file against the rRNA index, how can the unmapped reads be remapped to the genomic sequence?

              My situation:
              I did a RIP. As the 2100 shows, the mock RNA has peaks that represent rRNA, but the RIP RNA has none.
              Then I sequenced my RNA on a HiSeq 2000 and used TopHat to map the reads to mm9.
              Mapping the RIP reads took about 8 h, whereas the mock reads took more than 24 h.
              So I want to filter these reads out, which should make my analysis much faster.
              Last edited by mart555; 07-08-2011, 07:34 AM.


              • #8
                Hi Mart555,

                check this post:
                Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                Regards,
                Douglas


                • #9
                  Hi guys,

                  In your case, this is what I do:
                  Map the reads against the junk/undesired reference (rRNA, mt, ...) with Bowtie, using the --un option to save a FASTQ/FASTA file with the unmapped reads (the desired reads).
                  Then you can take this file and run it through Bowtie/TopHat/Cufflinks with your reference.

                  Hope it helps.
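
The two steps above could be scripted roughly as follows. This is only a sketch that builds the command lines without running them, assuming Bowtie 1, single-end FASTQ input, and prebuilt indexes. The function name and all file and index names (`reads.fastq`, `rRNA_tRNA_mt`, `mm9`, and the output names) are placeholders.

```python
import shlex


def build_filter_commands(reads_fastq, filter_index, genome_index,
                          kept_fastq="filtered_reads.fastq",
                          filter_sam="filter_hits.sam",
                          tophat_outdir="tophat_out"):
    """Return (bowtie_cmd, tophat_cmd) as argument lists.

    Step 1: Bowtie maps the reads to the junk reference; --un writes the
            reads that failed to align (the ones to keep) to kept_fastq.
    Step 2: TopHat aligns only those kept reads to the genome index.
    """
    bowtie_cmd = ["bowtie", "-S", "--un", kept_fastq,
                  filter_index, reads_fastq, filter_sam]
    tophat_cmd = ["tophat", "-o", tophat_outdir, genome_index, kept_fastq]
    return bowtie_cmd, tophat_cmd


if __name__ == "__main__":
    bowtie_cmd, tophat_cmd = build_filter_commands(
        "reads.fastq", "rRNA_tRNA_mt", "mm9")
    # Print the commands for inspection; pass the lists to subprocess.run()
    # to actually execute them once the tools and indexes are in place.
    print(shlex.join(bowtie_cmd))
    print(shlex.join(tophat_cmd))
```

Keeping the SAM output of the first step also makes it easy to count how many reads hit the filter reference.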


                  • #10
                    Hi all,
                    Thank you for your help, I really appreciate it.
                    I have now finished filtering against rRNA/tRNA/mtDNA.
                    About 50% of the IgG reads and 26% of the RIP reads were filtered out, which seems reasonable.

                    But another question is: where can I get the correct rRNA sequences?
                    Some people recommended getting rRNA sequences from http://www.arb-silva.de/
                    I searched for "Mus musculus", downloaded the high-quality sequences (about 70 records) in FASTA format, and converted "U" to "T". But these sequences don't work.

                    So I searched for mouse rRNA sequences in GenBank, and I got only 4 records:
                    gi|262231778|ref|NR_030686.1| Mus musculus 5S RNA (Rn5s), ribosomal RNA
                    gi|120444901|ref|NR_003280.1| Mus musculus 5.8S ribosomal RNA (LOC790956), ribosomal RNA
                    gi|120444900|ref|NR_003279.1| Mus musculus 28S ribosomal RNA (28s), ribosomal RNA
                    gi|328447215|ref|NR_003278.2| Mus musculus 18S ribosomal RNA (Rn18s), ribosomal RNA

                    Combining these four sequences with tRNA and mtDNA, I successfully filtered my reads, but I still wonder: are these four sequences enough?
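
For the "U" to "T" conversion of RNA FASTA records mentioned above, one pitfall is a global search-and-replace that also rewrites the header lines (for example turning "Mus musculus" into "Mts mtsctlts"). A small Python sketch that touches only sequence lines is shown here; the function name is made up, and whether header damage played any part in the SILVA records not working is not something the thread establishes.

```python
def rna_fasta_to_dna(fasta_lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only.

    Header lines (starting with '>') are passed through unchanged so that
    organism names and accessions are not corrupted.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("U", "T").replace("u", "t")


if __name__ == "__main__":
    toy_fasta = [">seq1 Mus musculus 18S", "ACGUUGCA", "uuga"]
    for out_line in rna_fasta_to_dna(toy_fasta):
        print(out_line)
```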


                    • #11
                      I have now finished my filtering with rRNA sequences downloaded from GenBank and SILVA.

                      Thanks for help, all of you!


                      • #12
                        Hi DZhang and other SeqAnswer frequenters,

                        I want to filter rRNA and mtDNA genes from GTF files.

                        I am using RSEM to map and count reads per gene for a class project with RNA-seq data from various publications. Then I am comparing the performance of edgeR and DESeq on the RSEM output. I believe the excess coverage of rRNA and possibly mtDNA is messing up my differential expression results.

                        I downloaded my mouse and human GTF files from the UCSC genome browser and converted a GFF file from Arabidopsis to GTF.

                        How can I filter the rRNA and/or mtDNA out of the GTF file? Is there a list of gene IDs somewhere? I can write scripts in Perl, by the way, so I can do it myself if someone points me in the right direction. I would actually probably use the rRNA/mtDNA gene ID list to filter the RSEM results.

                        Thanks so much,
                        Clayton
                        Last edited by cutcopy11; 11-26-2011, 07:17 PM.


                        • #13
                          Hi Clayton,

                          Since you have the GTF file, you can search it for any gene/transcript name containing "rRNA" or "ribosomal RNA", and review each entry to confirm before removing it. For mtDNA it is even easier, as you can tell from the chromosome ID.

                          Cheers,
                          Douglas
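
The search described above might look like this in Python (a Perl version would follow the same logic). It is a sketch under several assumptions: the annotation is tab-delimited GTF with a `gene_id "..."` attribute, rRNA entries actually carry "rRNA" or "ribosomal RNA" somewhere in the attribute column, and the mitochondrial chromosome is named as in the `MITO_CHROMS` set. The names `split_gtf`, `RRNA_PATTERN`, and `MITO_CHROMS` are made up for illustration, and the flagged IDs are meant to be reviewed by hand before anything is discarded.

```python
import re

RRNA_PATTERN = re.compile(r"rRNA|ribosomal RNA", re.IGNORECASE)
GENE_ID_PATTERN = re.compile(r'gene_id "([^"]+)"')
MITO_CHROMS = {"chrM", "chrMT", "MT", "M"}  # assumed names; adjust per genome


def split_gtf(gtf_lines, mito_chroms=MITO_CHROMS):
    """Return (kept_lines, flagged_gene_ids).

    A record is flagged when it lies on a mitochondrial chromosome or when
    its attribute column mentions rRNA. Comment and blank lines are kept.
    """
    kept_lines = []
    flagged_gene_ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            kept_lines.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0]
        attributes = fields[8] if len(fields) > 8 else ""
        if chrom in mito_chroms or RRNA_PATTERN.search(attributes):
            match = GENE_ID_PATTERN.search(attributes)
            if match:
                flagged_gene_ids.add(match.group(1))
        else:
            kept_lines.append(line)
    return kept_lines, flagged_gene_ids
```

The `kept_lines` can be written out as a filtered GTF, while `flagged_gene_ids` gives a gene ID list that could instead be used to drop rows from the RSEM results.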


                          • #14
                            Thanks, Douglas, for your quick response. Where do you recommend searching for those rRNA gene IDs? Thanks again, Clay


                            • #15
                              Hi Clay, you should do your search in your GTF/GFF file. The overall idea is to remove the rRNA/mt genes from your GTF/GFF file so the program does not process the excessive reads mapped to those genes.
