  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    FastQ Screen: Does your library contain what you think it does?

    I've just released the first version of a simple little program which allows you to screen a FastQ file against a panel of sequence databases so you can quickly see if your libraries contain the types of sequence you think they do, and if not then what sources of contamination might be in there.

    I take no credit at all for the idea behind this which I saw in the CRI QC pipeline, and which looked so useful I wrote an implementation for our sequencing facility. We've been running our historical data through it and we've already found a few issues we didn't know we had up until now.

    The code is pretty new, so please use with caution, and file bugs if you hit problems.

    You can see example output and get the code from:

  • ttnguyen
    Member
    • Mar 2010
    • 41

    #2
    Sorry, I could not find the code.

    • simonandrews
      Simon Andrews
      • May 2009
      • 870

      #3
      Originally posted by ttnguyen:
      "Sorry, I could not find the code."

      Try pressing shift+refresh on our download page. Our downstream cache sometimes likes to hold on to old versions of some of our pages.

      • idonaldson
        Member
        • Oct 2009
        • 37

        #4
        Colour space usage

        Looks like a great tool. I am using colour space data, so when I use my colour space indexes I get the following message:

        Error: -C was not specified when running bowtie, but index is in colorspace. If
        your reads are in colorspace, please use the -C option. If your reads are not
        in colorspace, please use a normal index (one built without specifying -C to
        bowtie-build).

        How do I specify this?

        Apologies if this is simple, but it is Friday afternoon.

        Ian

        • simonandrews
          Simon Andrews
          • May 2009
          • 870

          #5
          Originally posted by idonaldson:
          "Looks like a great tool. I am using colour space data, so when I use my colour space indexes I get the following message:

          Error: -C was not specified when running bowtie, but index is in colorspace."

          The current version of the script doesn't include colorspace support - I'll put that into the next release.

          A quick fix would be to edit the script and just add -C to the list of bowtie options to force colorspace mode on all analyses. Let me know if this doesn't work and I'll send out a proper fix.
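
As a rough illustration of the quick fix above (this is not the script's actual code, just a Python sketch of the idea), the colorspace flag is simply one more entry in the list of options passed to bowtie. The function name, index path and read file below are hypothetical; `-C` is the bowtie option named in the error message.

```python
import shlex


def build_bowtie_command(index, reads, colorspace=False, extra_options=""):
    """Return the argument list for a bowtie run (hypothetical helper)."""
    command = ["bowtie"]
    if colorspace:
        # -C tells bowtie that both the reads and the index are in colorspace
        command.append("-C")
    # Any further user-supplied options are split shell-style and appended
    command.extend(shlex.split(extra_options))
    command.extend([index, reads])
    return command


# Hypothetical index and read file names, for illustration only
example = build_bowtie_command("human_colorspace_index", "sample.csfastq", colorspace=True)
print(" ".join(example))
```

Forcing the flag on for every analysis, as suggested, corresponds to always calling this with `colorspace=True`, which will only work if all of the screen indexes were built in colorspace.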

          • gaffa
            Member
            • Oct 2010
            • 82

            #6
            This looks pretty nice. As I understand it, the user supplies the libraries to screen against - I wonder how one would go about setting up a library of adaptors/vectors/contaminants. Are there any commonly used collections of such sequences?

            • simonandrews
              Simon Andrews
              • May 2009
              • 870

              #7
              Originally posted by gaffa:
              "This looks pretty nice. As I understand it, the user supplies the libraries to screen against - I wonder how one would go about setting up a library of adaptors/vectors/contaminants. Are there any commonly used collections of such sequences?"

              Yes, the sequences to screen against are left up to the user, since different sets of sequences will be applicable for different facilities. If you look in the example config file shipped with the application, the comments list the set of libraries we're using and where we got them from.

              Basically I'm trying to cover the species which we commonly work with in our institute, plus vectors, adapters and other common sources of contamination (e.g. E. coli) which could come from any molecular biology lab.

              Any suggestions for other sources to screen against would be welcome.

              • idonaldson
                Member
                • Oct 2009
                • 37

                #8
                Thanks - I got it working for colour space by changing the bowtie path in the script to:
                my $path_to_bowtie = 'bowtie -C';
                Last edited by idonaldson; 04-11-2011, 01:28 AM. Reason: typo

                • SES
                  Senior Member
                  • Mar 2010
                  • 275

                  #9
                  Originally posted by gaffa:
                  "This looks pretty nice. As I understand it, the user supplies the libraries to screen against - I wonder how one would go about setting up a library of adaptors/vectors/contaminants. Are there any commonly used collections of such sequences?"

                  It sounds like you are looking for UniVec to screen for contaminants. For the adaptor sequences, you will want to ask your sequencing center about your particular project, because custom adaptors and combinations of them are commonly used.

                  • simonandrews
                    Simon Andrews
                    • May 2009
                    • 870

                    #10
                    I've just put an updated version of fastq screen up onto our website. This version adds a new mode of analysis where the screening results are reported as the percentage of sequences which map to only one of the screen libraries, and the percentage which could map to more than one. This then allows you to see if you're seeing unexpected hits which are specific to the wrong species, or if you just have low complexity sequence which could have mapped anywhere.

                    The new release also fixes a few bugs and adds support for colorspace encoded reads.
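
To make the unique versus multiple distinction above concrete, here is a minimal Python sketch of how such percentages could be tallied. It is not taken from fastq screen itself; the function name, the input structure (a mapping from read ID to the set of libraries that read hit) and the example library names are all assumptions for illustration.

```python
from collections import Counter


def summarise_hits(read_hits, total_reads):
    """Per library, report the percentage of reads hitting only that library
    and the percentage hitting that library plus at least one other."""
    unique = Counter()
    multiple = Counter()
    for libraries in read_hits.values():
        if len(libraries) == 1:
            # The read maps to exactly one screen library
            unique[next(iter(libraries))] += 1
        elif len(libraries) > 1:
            # The read could have come from any of these libraries
            for library in libraries:
                multiple[library] += 1
    summary = {}
    for library in sorted(set(unique) | set(multiple)):
        summary[library] = {
            "unique_pct": 100.0 * unique[library] / total_reads,
            "multiple_pct": 100.0 * multiple[library] / total_reads,
        }
    return summary


# Hypothetical data: five reads were screened, one of which hit nothing
hits = {
    "read1": {"Human"},
    "read2": {"Human", "Mouse"},
    "read3": {"Ecoli"},
    "read4": {"Human"},
}
for library, stats in summarise_hits(hits, total_reads=5).items():
    print(library, stats)
```

A library with a high unique percentage points to real material from that source, whereas a library with only multiple hits is more likely picking up sequence that could have mapped anywhere.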

                    • simonandrews
                      Simon Andrews
                      • May 2009
                      • 870

                      #11
                      I've just put up v0.2.1 of fastq screen to fix a bug which affected v0.2 if you were running multilib searches on paired end data. In these cases the percentage hits reported were twice as high as the true value.

                      This bug didn't affect v0.1, nor did it affect searches on single end data, or searches not using the --multilib option.

                      • husamia
                        Member
                        • Apr 2010
                        • 66

                        #12
                        What if my reads are long, e.g. 100 bp? Will it still work?

                        • simonandrews
                          Simon Andrews
                          • May 2009
                          • 870

                          #13
                          Originally posted by husamia:
                          "What if my reads are long, e.g. 100 bp? Will it still work?"

                          Yes, that should work. The screen uses bowtie behind the scenes, so any data you could search with bowtie will work. The only problem you might have with really long reads is that a significant proportion of your library might read through into adapter. In that case you can pass the --trim3 bowtie option in the fastq_screen extra bowtie parameters option to limit how much of each read is used to determine the match.
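
For anyone unsure what trimming from the 3' end does to a read, the following Python sketch mimics the effect on a single FastQ record. It only illustrates the behaviour (bowtie does the trimming itself when given --trim3); the function name and the example read are made up.

```python
def trim_three_prime(sequence, quality, bases_to_trim):
    """Remove bases_to_trim bases from the 3' (right-hand) end of a read,
    keeping the quality string the same length as the sequence."""
    if bases_to_trim < 0:
        raise ValueError("bases_to_trim must not be negative")
    keep = max(len(sequence) - bases_to_trim, 0)
    return sequence[:keep], quality[:keep]


# Hypothetical 10 bp read whose last 4 bases are assumed to be adapter
trimmed_sequence, trimmed_quality = trim_three_prime("ACGTACGTAC", "IIIIIIIIII", 4)
print(trimmed_sequence, trimmed_quality)
```

Only the leading part of each read is then used for matching, so adapter sequence at the far end no longer prevents an otherwise good hit.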

                          • albireo
                            Member
                            • Sep 2012
                            • 39

                            #14
                            Hi, I'm new to the forum and very interested in this tool.

                            Simon, are there any sequence libraries (on top of those you recommend in your sample config files) one would want to search against when checking generic human Illumina ChIP-seq reads? Thanks a lot for your work.

                            • simonandrews
                              Simon Andrews
                              • May 2009
                              • 870

                              #15
                              The choice of libraries really comes down to what other types of library are likely to be around in your facility, or other common sources of contamination. If there are a load of people doing Drosophila work then I'd have a Drosophila library.

                              The only common ones would be the vectors/adapters, PhiX and E. coli, as everyone is likely to have those around somewhere.

                              If people have found other common sources of contamination then I'd be interested to hear which species they found. The only odd one we've had (that we figured out at least) was Acinetobacter, which we think came from the beads used for our ChIP (the OmpA protein on the beads comes from this organism).
