  • Zapages
    Member
    • Oct 2012
    • 98

    #46
    Thank you GenoMax. That worked perfectly. So whatever I name the bowtie1/2 index database should be the name used for the database location at the end. I will definitely remember that.

    I have tested two sets. In the first set, I know the sequences belong to the genome I am providing as the database, but everything comes back as Unmapped. This is strange; am I doing something wrong?

    The bowtie1/2 indexes were made using bowtie-build on the reference genome found on NCBI.

    In the other sample, there was Arabidopsis contamination (somewhere between 0.2% and 2%), and I am trying to extract the reads that are not contaminated by using the --nohits option.

    The same thing occurred: everything came back as Unmapped, which is strange.

    Version that should be good:
    Code:
    fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --paired /Users/ZainA/Downloads/Dmel_520/forward.fastq /Users/ZainA/Downloads/Dmel_520/reverse.fastq --outdir Output
    Contamination version:
    Code:
    fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Arabidopsis/Arabidopsis_gnomon_mRNA.conf --paired /Users/ZainA/Downloads/Dmel_520/Arabidopsis/forward_1p3.fastq /Users/ZainA/Downloads/Dmel_520/Arabidopsis/reverse_2p3.fastq --nohits --outdir output
    Any ideas what is happening and why everything is coming back unmapped?

    EDIT: Is there a method to filter out the reads that actually match a certain genome or genomes, as one or separate files (paired-end or single-end FASTQ reads)?
    Last edited by Zapages; 08-08-2014, 08:17 AM.

    Comment

    • simonandrews
      Simon Andrews
      • May 2009
      • 870

      #47
      Sorry to get to this late - have been away from internet access for a week so am still catching up.

      Without seeing your files it's difficult to know why they might not be mapping. The first suspicion with any paired end files is that there's a problem with the pairing in your data. Could you try running the screen with just one of your files and remove the --paired option. Depending on whether that gives any hits will determine where you next look for problems.
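One way to test the pairing suspicion directly is to compare the read IDs of the two files record by record. The following is a minimal sketch, not part of FastQ Screen: it assumes plain, uncompressed 4-line FASTQ records, headers without tab characters, and mates that share an ID apart from an optional trailing /1 or /2. The function name `check_pairing` is made up for this example.

```shell
# Compare the header lines of two FASTQ files in lockstep.
# Succeeds only if every record pair carries the same read ID.
check_pairing() {
    paste "$1" "$2" | awk -F '\t' '
        function read_id(h) {
            sub(/^@/, "", h)       # drop the leading @
            sub(/ .*$/, "", h)     # drop any comment after the first space
            sub(/\/[12]$/, "", h)  # drop an old-style /1 or /2 suffix
            return h
        }
        NR % 4 == 1 {              # header line of each 4-line record
            n++
            if (read_id($1) != read_id($2)) {
                printf "record %d: \"%s\" vs \"%s\"\n", n, $1, $2
                bad = 1
                exit
            }
        }
        END {
            if (bad) exit 1
            print n + 0, "read pairs with matching IDs"
        }'
}

# Usage: check_pairing forward.fastq reverse.fastq
```

Only header lines are compared, so a file that is truncated in the middle of its last record would not be caught by this check.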

      Comment

      • Zapages
        Member
        • Oct 2012
        • 98

        #48
        Originally posted by simonandrews View Post
        Sorry to get to this late - have been away from internet access for a week so am still catching up.

        Without seeing your files it's difficult to know why they might not be mapping. The first suspicion with any paired end files is that there's a problem with the pairing in your data. Could you try running the screen with just one of your files and remove the --paired option. Depending on whether that gives any hits will determine where you next look for problems.
        Thank you for the advice on dropping --paired. Unfortunately, I still get the same result, with everything unmapped, which is strange.

        Code:
        fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf forward.fastq reverse.fastq --outdir output_single
        Results:
        Code:
        Using fastq_screen v0.4.4
        Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
        Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
        Adding database Dmel520_index_bowtie2
        Processing forward.fastq
        Searching forward.fastq against Dmel520_index_bowtie2
        Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
        Perl module GD::Graph::bars not installed, skipping charts
        Processing reverse.fastq
        Searching reverse.fastq against Dmel520_index_bowtie2
        Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
        Perl module GD::Graph::bars not installed, skipping charts
        Processing complete


        I have now tried the following to check whether bowtie2 is working correctly, and it is. I have used these control sequences before with the Tuxedo pipeline (TopHat2 > Cufflinks2 > Cuffdiff2 > cummeRbund) and everything worked fine.

        Code:
        bowtie2 -p 8 -x  /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/Dmel520_index_bowtie2 -1 forward.fastq -2 reverse.fastq -S Dmel_test.sam
        Results as expected:
        Code:
        27704187 reads; of these:
          27704187 (100.00%) were paired; of these:
            4441889 (16.03%) aligned concordantly 0 times
            21592908 (77.94%) aligned concordantly exactly 1 time
            1669390 (6.03%) aligned concordantly >1 times
            ----
            4441889 pairs aligned concordantly 0 times; of these:
              330640 (7.44%) aligned discordantly 1 time
            ----
            4111249 pairs aligned 0 times concordantly or discordantly; of these:
              8222498 mates make up the pairs; of these:
                5063321 (61.58%) aligned 0 times
                2980532 (36.25%) aligned exactly 1 time
                178645 (2.17%) aligned >1 times
        90.86% overall alignment rate
        Also, I used this to add bowtie2, bowtie, and other bioinformatics tools to the PATH in OS X.

        If you have any advice on what is happening here and how to make FastQ Screen work properly, I would really appreciate it.

        Thank you in advance,

        -Zapages

        Comment

        • StevenW
          Member
          • May 2011
          • 15

          #49
          No Hits Problem

          Hi,

          I am also responsible for developing FastQ Screen. I believe the problem is caused by the path to Bowtie2 in your configuration file being incorrect.

          The Fastq Screen output states:

          Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory

          I believe /Users/ZainA/Downloads/bowtie2-2.2.3 is the path to the folder where Bowtie2 is kept; you need the path to the executable file, e.g.

          /Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2

          (or something similar).
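A quick way to catch this before running a screen is to check what the BOWTIE2 line of the configuration file points at. The sketch below is not part of FastQ Screen; it assumes the `BOWTIE2 <path>` line format shown elsewhere in this thread and a path without spaces, and the function names are invented for this example.

```shell
# Print the path given on the BOWTIE2 line of a fastq_screen config file.
bowtie2_path_from_conf() {
    awk '$1 == "BOWTIE2" { print $2; exit }' "$1"
}

# Succeed only if the path is a regular file with execute permission.
is_executable_file() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Report whether the configured aligner path is usable.
check_bowtie2_conf() {
    aligner=$(bowtie2_path_from_conf "$1")
    if [ -z "$aligner" ]; then
        echo "no BOWTIE2 line found in $1" >&2
        return 1
    elif [ -d "$aligner" ]; then
        echo "$aligner is a directory; point BOWTIE2 at the executable inside it" >&2
        return 1
    elif ! is_executable_file "$aligner"; then
        echo "$aligner is not an executable file" >&2
        return 1
    fi
    echo "BOWTIE2 -> $aligner"
}

# Usage: check_bowtie2_conf /Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf
```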

          Regards,

          Steven

          Comment

          • Zapages
            Member
            • Oct 2012
            • 98

            #50
            Originally posted by StevenW View Post
            Hi,

            I am also responsible for developing FastQ Screen. I believe the problem is caused by the path to Bowtie2 in your configuration file being incorrect.

            The Fastq Screen output states:

            Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory

            I believe /Users/ZainA/Downloads/bowtie2-2.2.3 is the path to the folder where Bowtie2 is kept; you need the path to the executable file, e.g.

            /Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2

            (or something similar).

            Regards,

            Steven
            Thank you Steven, that worked perfectly. I really appreciate the help.

            Are the no-hit output reads (paired end) still in matching order, or do I have to re-pair them? If so, what program do you recommend for this task? Thank you in advance.

            I was wondering whether this could be included in a future release of FastQ Screen as a method to remove only contaminated reads, unless it is already possible with the current version.

            For example:

            You have single or paired-end reads and are heading towards a de novo assembly of either a whole genome or a transcriptome.

            Suppose we use the --nohits option in FastQ Screen against the contaminating species.

            This will flag both true and false positive matches among the reads.

            Now suppose we create indexes (bowtie/bowtie2) for a bunch of species closely related to the one our de novo reads come from. I really wish there was an option to retain hits based on a specific database.

            An example would be:

            We could state which set of organisms to keep reads for while at the same time eliminating reads from the contaminating organism. But when a read matches both the contaminating organism and the other set of organisms, keep it. It's sort of like a metagenomics approach to eliminating contamination.

            This way only the contaminated reads will be removed and the good reads will be kept.

            Is this possible with the current version of the application?

            Thank you for creating this awesome program and being so helpful in the whole process.

            Comment

            • StevenW
              Member
              • May 2011
              • 15

              #51
              Fastq_screen

              Hi,

              Glad that worked.

              In paired-end mode the program writes the forward and reverse reads to two separate 'nohits' output files. The reads will be in order with respect to one another in the input and output files.

              There isn't a feature that does exactly what you requested, but perhaps you could create 2 configuration files? One setup would map against all genomes and the other against just the contaminants (with --nohits selected).

              i.e.

              A : all libraries in config file, --subset 100000 (only some of the reads analysed - which is quicker)

              B: contaminant libraries only in config file, --nohits, and all reads analysed
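The two setups above could be scripted roughly as follows. This is a sketch under assumptions: the two config file names and the output directory names are hypothetical, the option spellings are the ones used in commands earlier in this thread (plus --subset as named above), and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical config names; neither file comes from the thread.
ALL_CONF=all_genomes.conf            # setup A: every library
CONTAM_CONF=contaminants_only.conf   # setup B: contaminant libraries only

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

screen_two_pass() {
    r1=$1
    r2=$2
    # A: quick overview of where reads map, on a 100000-read subset.
    run fastq_screen --aligner bowtie2 --conf="$ALL_CONF" --subset 100000 \
        --paired "$r1" "$r2" --outdir screen_overview || return 1
    # B: all reads against the contaminants only; --nohits writes the unmapped reads.
    run fastq_screen --aligner bowtie2 --conf="$CONTAM_CONF" --nohits \
        --paired "$r1" "$r2" --outdir screen_nohits
}

# Usage: screen_two_pass forward.fastq reverse.fastq             (prints only)
#        DRY_RUN=0 screen_two_pass forward.fastq reverse.fastq   (runs both)
```

Run A is only a report on a sample of the reads; the filtered reads come from run B.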

              Regards,

              Steven

              Comment

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #52
                Originally posted by Zapages View Post
                We could state which set of organisms to keep reads for while at the same time eliminating reads from the contaminating organism. But when a read matches both the contaminating organism and the other set of organisms, keep it. It's sort of like a metagenomics approach to eliminating contamination.

                This way only the contaminated reads will be removed and the good reads will be kept.
                It is possible to do this now with BBMap. See this thread for an example: http://seqanswers.com/forums/showthread.php?t=45661

                Comment

                • Zapages
                  Member
                  • Oct 2012
                  • 98

                  #53
                  Hey guys,

                  Thank you GenoMax and everyone. Please let me know whether this sounds right for removing contaminant reads.

                  I think I have figured out a method for what I was discussing earlier with FastQ Screen. Maybe this will be helpful for everyone here.

                  1) Conduct a metagenomic analysis, searching different mammals, fish species, species closely related to your experimental genomes, or a list of known conserved genes (e.g. beta actin, cytochrome, etc.) against the contaminant genomes (Arabidopsis and maize are examples). This will be done with megaBLAST and nBLAST.

                  2) Wherever megaBLAST and nBLAST agree, remove those sequences from the contaminant genomes (FASTA files). This will remove any conserved genes that are found across the different plants, mammals, and fish (false positives).

                  3) Run FastQ Screen and take the reads that do not map to the contaminants (Arabidopsis and maize are examples) as the uncontaminated reads (contamination-free reads). The reads that map to Arabidopsis and/or maize are the truly contaminated reads, which will not be output (true positive contaminated reads).

                  Then continue with the bioinformatic analysis, as your reads are no longer contaminated with Arabidopsis, maize, or any other possible contaminants.

                  Hopefully, this will give users reads that are as close as possible to contamination-free.
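Step 2 of the method above, dropping the flagged sequences from a contaminant FASTA file, can be sketched with awk. This assumes the consensus BLAST hits have already been reduced to a plain list of sequence IDs, one per line (a hypothetical `ids.txt`), and that an ID is the header text up to the first space. It removes whole records only, not sub-regions of a sequence, and the function name is made up for this example.

```shell
# drop_fasta_records IDS FASTA
# Print FASTA without the records whose ID is listed in IDS (one ID per line).
drop_fasta_records() {
    awk '
        FILENAME == ARGV[1] { drop[$1] = 1; next }          # first file: IDs to remove
        /^>/ { id = substr($1, 2); keep = !(id in drop) }   # header: decide for this record
        keep                                                # print lines of kept records
    ' "$1" "$2"
}

# Usage: drop_fasta_records ids.txt contaminant.fa > contaminant_filtered.fa
```

The filtered FASTA would then need to be re-indexed before it is used as a FastQ Screen database.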

                  Comment

                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    #54
                    Zapages,

                    Have you considered BBSplit? It is based on BBMap, but designed for a slightly different role; specifically, decontaminating or binning reads from multiple organisms. It maps simultaneously to all references and outputs reads to one file per reference. Each output file will only get reads that map best to that reference. Depending on your ambiguity settings, reads from conserved regions will either be written to the files of ALL references they map to equally well, or just one, or discarded. The output is fasta or fastq.

                    Comment

                    • abmmki
                      Junior Member
                      • Nov 2009
                      • 5

                      #55
                      configure fastq_screen.config

                      Hi,

                      I would like to use fastq_screen against the Drosophila, human, mouse, and E. coli genomes. I have downloaded the pre-built Bowtie index files and the corresponding genome sequences (single FASTA files).

                      I have prepared the config file below and run the following command, but got an error:

                      #-------- Config file:

                      BOWTIE /data/users/bin/bowtie
                      BOWTIE2 /data/users/bin/bowtie2-2.2.4

                      THREADS 12
                      DATABASE Drosophila /data/users/Bowtie-Prebuilt-Index/dme_ucsc BOWTIE
                      DATABASE Human /data/users/Bowtie-Prebuilt-Index/hg19 BOWTIE
                      DATABASE Mouse /data/users/Bowtie-Prebuilt-Index/mm9 BOWTIE
                      DATABASE Ecoli /data/users/Bowtie-Prebuilt-Index/e_coli BOWTIE

                      #--------------- Command

                      fastq_screen --threads 12 --aligner bowtie --bowtie "-m 2 -g 1 --butterfly-search" $fq/MT1.fq $fq/MT2.fq $fq/MT3.fq $fq/MT4.fq $fq/MT5.fq $fq/MT6.fq $fq/MT7.fq $fq/MT8.fq

                      #-------------- Error

                      Using fastq_screen v0.4.4

                      Reading configuration from '/data/users/bin/fastq_screen_v0.4.4/fastq_screen.conf'

                      Using '/data/users/bin/bowtie/bowtie' as bowtie path

                      Using 12 threads for searches

                      Skipping DATABASE 'Drosophila' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/dme_ucsc'

                      Skipping DATABASE 'Human' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/hg19'

                      Skipping DATABASE 'Mouse' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/mm9'

                      Skipping DATABASE 'Ecoli' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/e_coli'

                      No search libraries were configured at /data/users/bin/fastq_screen_v0.4.4/fastq_screen line 124.



                      ## But I see that the Bowtie pre-built index files are present in the paths mentioned above, for example:

                      ls /data/users/Bowtie-Prebuilt-Index/hg19

                      hg19.1.ebwt
                      hg19.2.ebwt
                      hg19.3.ebwt
                      hg19.4.ebwt
                      hg19.fa
                      hg19.rev.1.ebwt
                      hg19.rev.2.ebwt

                      # The final directory names are the same as the prefixes of the pre-built index names, so this is not the issue discussed already.

                      # This shows that the Bowtie index and the corresponding genome sequence files are present in the directory. I have also already used these index files for mapping without problems.

                      # I have GD::Graph installed properly.

                      thanks

                      Comment

                      • simonandrews
                        Simon Andrews
                        • May 2009
                        • 870

                        #56
                        I sent you a direct mail about this, but just so the information stays in the post, I think the problem here is that you are only specifying the path to the directory which contains your indices, and not the full path to the actual database. In this case it's a little confusing in that the name of the database and the name of the folder it's in are the same (which makes sense, but since it doesn't have to be like that you need to explicitly tell the program).

                        I think the fix is simply to append the database name to the end of the paths, so instead of:

                        /data/users/khademul/Bowtie-Prebuilt-Index/hg19

                        ..you'd have

                        /data/users/khademul/Bowtie-Prebuilt-Index/hg19/hg19
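To check a whole configuration file for this mistake, one can test whether each DATABASE path is an index prefix rather than a folder. The sketch below is not part of FastQ Screen. It assumes the `DATABASE <name> <path> ...` line format shown earlier in this thread, paths without spaces, and the standard index file naming (`<prefix>.1.ebwt` for Bowtie, `<prefix>.1.bt2` for Bowtie2, with an extra `l` for large indexes). The function names are invented for this example.

```shell
# Succeed if PREFIX names a Bowtie or Bowtie2 index,
# i.e. if PREFIX.1.ebwt, PREFIX.1.bt2 or a large-index variant exists.
has_index() {
    for ext in 1.ebwt 1.ebwtl 1.bt2 1.bt2l; do
        [ -f "$1.$ext" ] && return 0
    done
    return 1
}

# Report every DATABASE line in a config file; fail if any path is not an index prefix.
check_databases() {
    status=0
    while read -r keyword name prefix rest; do
        [ "$keyword" = DATABASE ] || continue
        if has_index "$prefix"; then
            echo "ok       $name  $prefix"
        elif [ -d "$prefix" ]; then
            echo "DIR      $name  $prefix is a folder; append the index name"
            status=1
        else
            echo "MISSING  $name  $prefix"
            status=1
        fi
    done < "$1"
    return $status
}

# Usage: check_databases /data/users/bin/fastq_screen_v0.4.4/fastq_screen.conf
```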

                        Comment

                        • cjdoherty
                          Junior Member
                          • Jun 2015
                          • 2

                          #57
                          Citing FastQ Screen

                          Just want to make sure I'm not missing a publication, is there a preferred way to cite FastQ screen?
                          The program was so helpful we really appreciate it.
                          Thanks!
                          Last edited by cjdoherty; 06-27-2015, 05:57 PM.

                          Comment

                          • simonandrews
                            Simon Andrews
                            • May 2009
                            • 870

                            #58
                            Originally posted by cjdoherty View Post
                            Just want to make sure I'm not missing a publication, is there a preferred way to cite FastQ screen?
                            The program was so helpful we really appreciate it.
                            Thanks!
                            There isn't a publication for fastq_screen. We recommend just citing the project URL.

                            Comment

                            • cjdoherty
                              Junior Member
                              • Jun 2015
                              • 2

                              #59
                              Originally posted by simonandrews View Post
                              There isn't a publication for fastq_screen. We recommend just citing the project URL.
                              Thank you. Will do!

                              Comment

                              • touchsk
                                Junior Member
                                • Aug 2015
                                • 7

                                #60
                                Remove only 'one-hit/one-library' hits

                                I am trying to use FASTQ Screen to remove contaminated sequences from my data and have a question. I was looking at the options provided with the tool and was wondering how I could set up something like this:
                                Screen my human data against potential contaminants (EColi, Yeast, Adapters,..) and only remove the hits that are classified as 'one-hit/one-library' AND 'multiple-hits/one-library'. I see that this feature is built-in as part of the plots, but was not clear if it could be (and how to) set up.

                                Thanks
                                SK

                                Comment
