
  • FastQ Screen: Does your library contain what you think it does?

    I've just released the first version of a simple little program which allows you to screen a FastQ file against a panel of sequence databases so you can quickly see if your libraries contain the types of sequence you think they do, and if not then what sources of contamination might be in there.

    I take no credit at all for the idea behind this which I saw in the CRI QC pipeline, and which looked so useful I wrote an implementation for our sequencing facility. We've been running our historical data through it and we've already found a few issues we didn't know we had up until now.

    The code is pretty new, so please use with caution, and file bugs if you hit problems.

    You can see example output and get the code from:

    http://www.bioinformatics.bbsrc.ac.u.../fastq_screen/

  • #2
    Sorry, I could not find the code.



    • #3
      Originally posted by ttnguyen
      Sorry, I could not find the code.
      Try pressing shift+refresh on our download page. Our downstream cache sometimes likes to hold on to old versions of some of our pages.



      • #4
        Colour space usage

        Looks like a great tool. I am using colour space data, so when I use my colour space indexes I get the following message:

        Error: -C was not specified when running bowtie, but index is in colorspace. If
        your reads are in colorspace, please use the -C option. If your reads are not
        in colorspace, please use a normal index (one built without specifying -C to
        bowtie-build).

        How do I specify this?

        Apologies if this is simple, but it is Friday afternoon.

        Ian



        • #5
          Originally posted by idonaldson
          Looks like a great tool. I am using colour space data, so when I use my colour space indexes I get the following message:

          Error: -C was not specified when running bowtie, but index is in colorspace.
          The current version of the script doesn't include colorspace support - I'll put that into the next release.

          A quick fix would be to edit the script and just add -C to the list of bowtie options to force colorspace mode on all analyses. Let me know if this doesn't work and I'll send out a proper fix.



          • #6
            This looks pretty nice. As I understand it, the user supplies the libraries to screen against. I wonder how one would go about setting up a library of adaptors/vectors/contaminants. Are there any commonly used collections of such sequences?



            • #7
              Originally posted by gaffa
              This looks pretty nice. As I understand it, the user supplies the libraries to screen against. I wonder how one would go about setting up a library of adaptors/vectors/contaminants. Are there any commonly used collections of such sequences?
              Yes, the sequences to screen against are left up to the user since different sets of sequences will be applicable for different facilities. If you look in the example config file shipped with the application then you can see in the comments the set of libraries we're using and where we got them from.

              Basically I'm trying to cover the species which we commonly work with in our institute, plus vectors, adapters and other common sources of contamination (e.g. E. coli) which could come from any molecular biology lab.

              Any suggestions for other sources to screen against would be welcome.



              • #8
                Thanks - I got it working for colorspace by changing the bowtie path line in the script to:
                my $path_to_bowtie = 'bowtie -C';
                Last edited by idonaldson; 04-11-2011, 01:28 AM. Reason: typo



                • #9
                  Originally posted by gaffa
                  This looks pretty nice. As I understand it, the user supplies the libraries to screen against. I wonder how one would go about setting up a library of adaptors/vectors/contaminants. Are there any commonly used collections of such sequences?
                  It sounds like you are looking for UniVec to screen for contaminants. For the adaptor sequences you will want to ask your sequencing center about your particular project, because custom adaptors and combinations of them are commonly used.



                  • #10
                    I've just put an updated version of fastq screen up onto our website. This version adds a new mode of analysis where the screening results are reported as the percentage of sequences which map to only one of the screen libraries, and the percentage which could map to more than one. This then allows you to see if you're seeing unexpected hits which are specific to the wrong species, or if you just have low complexity sequence which could have mapped anywhere.

                    The new release also fixes a few bugs and adds support for colorspace encoded reads.
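To make the unique versus multiple reporting described above concrete, here is a rough Python sketch. It is not the fastq_screen code (which is a Perl script); the function name `summarise_hits` and the input format (one set of library names per read) are assumptions made purely for illustration.

```python
from collections import Counter


def summarise_hits(read_hits, libraries):
    """Summarise, per library, the percentage of reads that hit only that
    library ("unique") and the percentage that hit it along with at least
    one other library ("multiple").

    read_hits: list of sets, one per read, naming the libraries it mapped to
               (an empty set means the read mapped nowhere).
    libraries: list of library names to report on.
    """
    total = len(read_hits)
    unique = Counter()
    multiple = Counter()

    for hits in read_hits:
        if len(hits) == 1:
            # The read is specific to a single library.
            unique[next(iter(hits))] += 1
        else:
            # The read could have come from any of these libraries
            # (this loop does nothing for reads with no hits).
            for lib in hits:
                multiple[lib] += 1

    summary = {}
    for lib in libraries:
        summary[lib] = {
            "unique_pct": 100.0 * unique[lib] / total if total else 0.0,
            "multiple_pct": 100.0 * multiple[lib] / total if total else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical toy data: five reads screened against three libraries.
    reads = [{"Human"}, {"Human"}, {"Human", "Mouse"}, {"Ecoli"}, set()]
    for lib, pct in summarise_hits(reads, ["Human", "Mouse", "Ecoli"]).items():
        print(lib, pct)
```

In this toy data a high "unique" figure for an unexpected library would point to real contamination, whereas hits that only ever show up as "multiple" are more likely low complexity sequence that could map anywhere.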



                    • #11
                      I've just put up v0.2.1 of fastq screen to fix a bug which affected v0.2 if you were running multilib searches on paired end data. In these cases the percentage hits reported were twice as high as the true value.

                      This bug didn't affect v0.1, nor did it affect searches on single end data, or searches not using the --multilib option.



                      • #12
                        What if my reads are long, like 100 bp? Will it still work?



                        • #13
                          Originally posted by husamia
                          What if my reads are long, like 100 bp? Will it still work?
                          Yes, that should work. The screen uses bowtie behind the scenes, so any data you could search with bowtie will work. The only problem you might have with really long reads is that a significant proportion of your library might read through into adapter. In that case you can pass the --trim3 bowtie option in the fastq_screen extra bowtie parameters option to limit how much of your reads you use to determine the match.
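As a hedged illustration of the trimming suggestion above, the following Python sketch only assembles a bowtie command line with `--trim3` (and optionally `-C` for colorspace reads). It does not show how fastq_screen itself passes extra parameters to bowtie; the function name `build_bowtie_args`, the index name and the file name are all hypothetical.

```python
def build_bowtie_args(index, fastq, trim3=0, colorspace=False, bowtie="bowtie"):
    """Return a bowtie (version 1) argument list for a single-end FastQ file.

    trim3:      number of bases to trim from the 3' end of every read
                before alignment (bowtie's -3/--trim3 option).
    colorspace: add -C for colorspace reads and a colorspace index.
    """
    args = [bowtie, "-q"]  # -q: the input file is FastQ
    if colorspace:
        args.append("-C")
    if trim3 > 0:
        args += ["--trim3", str(trim3)]
    args += [index, fastq]
    return args


if __name__ == "__main__":
    # Hypothetical example: use only the first 50 bases of 100 bp reads.
    print(" ".join(build_bowtie_args("human_index", "reads.fastq", trim3=50)))
```

Trimming 50 bases from 100 bp reads means only the first 50 bases are used to decide the match, which reduces the chance that adapter read-through at the 3' end prevents an alignment.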



                          • #14
                            Hi, I'm new to the forum and very interested in this tool.

                            Simon, are there any sequence libraries (on top of those you recommend in your sample config files) one would want to search against when checking generic human Illumina ChIP-seq reads? Thanks a lot for your work.



                            • #15
                              The choice of libraries really comes down to what other types of library are likely to be around in your facility, or other common sources of contamination. If there are a load of people doing Drosophila work then I'd have a Drosophila library.

                              The only common ones would be the vectors/adapters, PhiX and E. coli, as everyone is likely to have those around somewhere.

                              If people have found other common sources of contamination then I'd be interested to hear which species they found. The only odd one we've had (that we figured out at least) was Acinetobacter, which we think came from the beads used for our ChIP (the OmpA protein on the beads comes from this organism).
