  • cascoamarillo
    Senior Member
    • Oct 2010
    • 164

    filter sequences from rRNA, tRNA

    Hi all,

    As a first check, I did a quick clone survey of my small RNA library (for Illumina), and things seem to be working. But I want to measure the level of contamination from other sequences, such as degraded rRNA, tRNA or even E. coli sequences. I think a good starting point would be to BLAST my sequences against specific ribosomal RNA databases and/or an E. coli database (instead of the whole nt db).
    So my question is: where can I find these databases? I've been looking for a while through NCBI/EMBL but couldn't find anything. Or does anyone have a better idea for checking this contamination?
  • NicoBxl
    not just another member
    • Aug 2010
    • 264

    #2
    Rfam: http://rfam.sanger.ac.uk/

    • cascoamarillo
      Senior Member
      • Oct 2010
      • 164

      #3
      Thanks for the information!

      But I have another question. When I have Illumina reads and want to filter out contaminating sequences (e.g. E. coli), that is, remove the unwanted reads, how can I do that? Is there a program or script for that purpose?

      Thanks!

      • NicoBxl
        not just another member
        • Aug 2010
        • 264

        #4
        The best way is to keep the reads that align with your reference genome.

        • Gators
          Member
          • Feb 2011
          • 22

          #5
          Originally posted by NicoBxl
          The best way is to keep the reads that align with your reference genome.
          That's the best way, but you could always align your reads against E. coli, or whatever your putative contaminant is, and use the Bowtie parameter --un, which saves the unaligned reads.
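
A minimal sketch of the approach described above, assuming Bowtie 1 is installed and an index of the contaminant sequences has already been built with bowtie-build. The index name `ecoli_index` and all file names are hypothetical placeholders. The reads written by `--un` are the ones that did not align to the contaminant, i.e. the ones to keep. Option handling differs slightly between Bowtie releases, so check `bowtie --help` for your version before relying on this.

```python
import shlex
import subprocess


def build_bowtie_filter_cmd(index, reads_fastq, clean_fastq, hits_sam, threads=1):
    """Assemble a Bowtie 1 command that aligns reads to a contaminant index.

    Reads that fail to align are written to `clean_fastq` via --un.
    All index and file names are placeholders supplied by the caller.
    """
    return [
        "bowtie",
        "-q",                 # input reads are FASTQ
        "-p", str(threads),   # number of alignment threads
        "-S",                 # write alignments in SAM format
        "--un", clean_fastq,  # reads that did NOT align (the ones to keep)
        index,                # basename of the contaminant index
        reads_fastq,          # raw reads
        hits_sam,             # alignments to the contaminant
    ]


def run_filter(cmd):
    """Run the command (needs bowtie on PATH) and return what it wrote to stderr."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stderr


# Only print the command here; call run_filter(cmd) to actually execute it.
cmd = build_bowtie_filter_cmd(
    "ecoli_index", "reads.fastq", "reads.clean.fastq", "ecoli_hits.sam"
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.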

          • carmeyeii
            Senior Member
            • Mar 2011
            • 137

            #6
            Originally posted by Gators
            That's the best way, but you could always align your reads against E. coli, or whatever your putative contaminant is, and use the Bowtie parameter --un, which saves the unaligned reads.
            Of course. Very clever, thanks!

            • carmeyeii
              Senior Member
              • Mar 2011
              • 137

              #7
              I am analyzing some Illumina libraries that appear to have a lot of ribosomal RNA contamination.

              I'm using Bowtie to align the reads only to a specific set of sequences. Because the amount of rRNA contamination differs between samples, each sample maps a different percentage of its reads to this set (some map half of what others do), ranging from 0.3% to 1%.

              I wonder whether the amount of rRNA contamination in a library preparation can affect the apparent expression level of a gene, even when one normalizes its counts against the total number of reads that mapped.

              What's your opinion on this?

              Carmen
              Last edited by carmeyeii; 12-21-2012, 08:22 AM.
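
One way to reason about the question above is a toy calculation. The sketch below uses made-up numbers (not data from this thread) and assumes that rRNA reads never map to the target set and that the non-rRNA part of both libraries has the same composition. The function name and the shares (125 target reads per 10,000 non-rRNA reads, one tenth of those from the gene of interest) are hypothetical.

```python
def simulate_library(total_reads, rrna_reads):
    """Toy model of one library with a given number of rRNA reads."""
    non_rrna = total_reads - rrna_reads
    mapped = non_rrna * 125 // 10_000  # reads mapping to the target set (assumed share)
    gene = mapped // 10                # reads from the gene of interest (assumed share)
    return {
        "pct_mapped": 100 * mapped / total_reads,
        "gene_per_million_total": gene * 1_000_000 / total_reads,
        "gene_per_million_mapped": gene * 1_000_000 / mapped,
    }


# Two hypothetical libraries of the same depth, differing only in rRNA content.
low_rrna = simulate_library(total_reads=1_000_000, rrna_reads=200_000)
high_rrna = simulate_library(total_reads=1_000_000, rrna_reads=600_000)

for name, stats in (("low rRNA", low_rrna), ("high rRNA", high_rrna)):
    print(name, stats)
```

In this toy model the percentage of mapped reads halves in the more contaminated library, and so does the gene's count per million total reads, while the count per million mapped reads is unchanged. Real libraries are noisier: heavy contamination leaves fewer informative reads, so estimates for low-count genes become less precise even when the normalization itself is unbiased.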

              • alisrpp
                Member
                • Dec 2010
                • 40

                #8
                Hi,

                I'm trying to use Bowtie, as Gators suggested, to clean rRNA contamination from my raw reads, but because it's my first time using Bowtie I'm a little lost.
                Can anyone suggest a script for that using the --un parameter?
                Thanks,

                • alisrpp
                  Member
                  • Dec 2010
                  • 40

                  #9
                  Hi all,

                  I finally managed to write a Bowtie script that works. To generate my indexes I downloaded all the Porifera rRNA sequences from NCBI.
                  Now I have two questions:

                  - As input, should I use my raw FASTQ reads (without any adapter or quality trimming), or is it better to clip the adapters and filter by quality first, and then remove the rRNA contamination?

                  - I'm trying to print the general statistics using the -t parameter, but I guess that because I'm submitting the script to a queue I'm not getting anything. How can I obtain the general statistics?

                  Time loading forward index: 00:00:00
                  Time loading mirror index: 00:00:00
                  Seeded quality full-index search: 00:00:00
                  # reads processed: 1000
                  # reads with at least one reported alignment: 699 (69.90%)
                  # reads that failed to align: 301 (30.10%)
                  Reported 699 alignments to 1 output stream(s)
                  Time searching: 00:00:00
                  Overall time: 00:00:00

                  Thanks,
                  Alicia.
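
Bowtie writes this summary to standard error rather than standard output, so in a queued job it usually ends up in the scheduler's error log, or it can be captured by redirecting stderr to a file (for example `2> bowtie.log`, a hypothetical file name). A minimal Python sketch for pulling the numbers out of such a log, assuming the text has the same layout as the summary quoted above:

```python
import re

# Sample text copied from the summary quoted in the post above.
SAMPLE_LOG = """\
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)
"""

# One pattern per statistic; each captures the read count as an integer.
PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "unaligned": r"# reads that failed to align: (\d+)",
}


def parse_bowtie_summary(text):
    """Return the read counts found in a Bowtie summary as a dict of ints."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1))
    return stats


stats = parse_bowtie_summary(SAMPLE_LOG)
print(stats)
```

To use it on a real run, read the captured log with `open("bowtie.log").read()` and pass the text to `parse_bowtie_summary`. The "unaligned" count is the number of reads that `--un` would have written out.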
