  • #46
    Thank you GenoMax. That worked perfectly. So whatever I name the bowtie1/2 index database should be the name used for the database location at the end. I will definitely remember that.

    I have tested two data sets. In the first, I know the sequences belong to the genome I am providing as the database, but every read comes back as unmapped. This is strange. Is there something I am doing wrong?

    The bowtie1/2 indexes were built with bowtie-build from the reference genome found on NCBI.

    In the other sample there was Arabidopsis contamination (somewhere between 0.2% and 2%), and I am trying to keep only the uncontaminated reads by using the --nohits option.

    The same thing happened there: everything came back as unmapped.

    Version that should map cleanly:
    Code:
    fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --paired /Users/ZainA/Downloads/Dmel_520/forward.fastq /Users/ZainA/Downloads/Dmel_520/reverse.fastq --outdir Output
    Contamination version:
    Code:
    fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Arabidopsis/Arabidopsis_gnomon_mRNA.conf --paired /Users/ZainA/Downloads/Dmel_520/Arabidopsis/forward_1p3.fastq /Users/ZainA/Downloads/Dmel_520/Arabidopsis/reverse_2p3.fastq --nohits --outdir output
    Any ideas what is happening and why everything is coming back unmapped?

    EDIT: Is there a method to filter out the reads that actually match a certain genome or genomes, as one file or as separate files (paired-end or single-end FASTQ reads)?
    Last edited by Zapages; 08-08-2014, 08:17 AM.



    • #47
      Sorry to get to this late - have been away from internet access for a week so am still catching up.

      Without seeing your files it's difficult to know why they might not be mapping. The first suspicion with any paired-end files is that there's a problem with the pairing in your data. Could you try running the screen with just one of your files and without the --paired option? Whether that gives any hits will determine where to look next for problems.
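
One quick way to test the pairing suspicion is to confirm that both files hold the same reads in the same order. The sketch below is not part of FastQ Screen: `check_pairing` is a hypothetical helper, and it assumes uncompressed four-line FASTQ records whose read names differ only by a trailing /1 or /2 or by the text after the first space.

```shell
# check_pairing FORWARD REVERSE
# Hypothetical helper (not part of FastQ Screen). Walks both files in lockstep
# and compares the read name on every header line.
check_pairing() {
    awk -v rev="$2" '
        BEGIN { bad = 0 }
        {
            # Read the matching line from the reverse file.
            if ((getline mate < rev) <= 0) {
                print "reverse file is shorter than forward file (or unreadable)"
                bad = 1
                exit
            }
            if (NR % 4 == 1) {
                # Keep the first token of each header and drop a trailing /1 or /2.
                split($0, f, /[ \t]/)
                split(mate, r, /[ \t]/)
                sub(/\/[12]$/, "", f[1])
                sub(/\/[12]$/, "", r[1])
                if (f[1] != r[1]) {
                    print "name mismatch at record " ((NR + 3) / 4) ": " f[1] " vs " r[1]
                    bad = 1
                    exit
                }
            }
        }
        END {
            if (!bad && (getline mate < rev) > 0) {
                print "reverse file is longer than forward file"
                bad = 1
            }
            if (!bad) print "pairing looks consistent"
            exit bad
        }
    ' "$1"
}

# Example call (file names taken from the commands in this thread):
# check_pairing forward.fastq reverse.fastq
```

The function returns a non-zero status on the first problem it finds, so it can be used in a script before launching a long screen.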



      • #48
        Originally posted by simonandrews
        Sorry to get to this late - have been away from internet access for a week so am still catching up.

        Without seeing your files it's difficult to know why they might not be mapping. The first suspicion with any paired-end files is that there's a problem with the pairing in your data. Could you try running the screen with just one of your files and without the --paired option? Whether that gives any hits will determine where to look next for problems.

        Thank you for the advice on dropping the --paired option. Unfortunately, I still get the same result: everything is unmapped, which is strange.

        Code:
        fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf forward.fastq reverse.fastq --outdir output_single
        Results:
        Code:
        Using fastq_screen v0.4.4
        Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
        Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
        Adding database Dmel520_index_bowtie2
        Processing forward.fastq
        Searching forward.fastq against Dmel520_index_bowtie2
        Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
        Perl module GD::Graph::bars not installed, skipping charts
        Processing reverse.fastq
        Searching reverse.fastq against Dmel520_index_bowtie2
        Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
        Perl module GD::Graph::bars not installed, skipping charts
        Processing complete


        I then tried the following to check that bowtie2 itself works correctly, and it does. I have run these control sequences through the Tuxedo pipeline before (TopHat2 > Cufflinks2 > Cuffdiff2 > cummeRbund) and everything worked fine.

        Code:
        bowtie2 -p 8 -x  /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/Dmel520_index_bowtie2 -1 forward.fastq -2 reverse.fastq -S Dmel_test.sam
        Results as expected:
        Code:
        27704187 reads; of these:
          27704187 (100.00%) were paired; of these:
            4441889 (16.03%) aligned concordantly 0 times
            21592908 (77.94%) aligned concordantly exactly 1 time
            1669390 (6.03%) aligned concordantly >1 times
            ----
            4441889 pairs aligned concordantly 0 times; of these:
              330640 (7.44%) aligned discordantly 1 time
            ----
            4111249 pairs aligned 0 times concordantly or discordantly; of these:
              8222498 mates make up the pairs; of these:
                5063321 (61.58%) aligned 0 times
                2980532 (36.25%) aligned exactly 1 time
                178645 (2.17%) aligned >1 times
        90.86% overall alignment rate
        Also, I used this to add bowtie2, bowtie, and other bioinformatics tools to my PATH in OS X.



        If you have any advice on what is happening here and how to get FastQ Screen working properly, I would really appreciate it.

        Thank you in advance,

        -Zapages



        • #49
          No Hits Problem

          Hi,

          I am also responsible for developing FastQ Screen. I believe the problem is caused by the path to Bowtie2 in your configuration file being incorrect.

          The Fastq Screen output states:

          Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory

          I believe /Users/ZainA/Downloads/bowtie2-2.2.3 is the path to the folder where Bowtie2 is kept; you need the path to the executable file, e.g.

          /Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2

          (or something similar).
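
A quick shell check can tell a folder apart from the executable before the path goes into the configuration file. This is only a sketch: `check_aligner_path` is a hypothetical helper, not something FastQ Screen provides.

```shell
# check_aligner_path PATH
# Hypothetical helper: reports whether PATH is a directory, an executable
# file, or neither. Returns 0 only for an executable file.
check_aligner_path() {
    if [ -d "$1" ]; then
        echo "directory, not an executable"
        return 1
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "executable"
    else
        echo "missing or not executable"
        return 1
    fi
}

# Example calls, using the paths from this thread:
# check_aligner_path /Users/ZainA/Downloads/bowtie2-2.2.3
# check_aligner_path /Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2
```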

          Regards,

          Steven



          • #50
            Originally posted by StevenW
            Hi,

            I am also responsible for developing FastQ Screen. I believe the problem is caused by the path to Bowtie2 in your configuration file being incorrect.

            The Fastq Screen output states:

            Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory

            I believe /Users/ZainA/Downloads/bowtie2-2.2.3 is the path to the folder where Bowtie2 is kept; you need the path to the executable file, e.g.

            /Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2

            (or something similar).

            Regards,

            Steven

            Thank you Steven, that worked perfectly. I really appreciate the help.

            Are the paired-end reads in the no-hits output still in matching order, or do I have to re-pair them? If so, what program do you recommend for this task? Thank you in advance.



            I was also wondering whether the following could be included in a future release of FastQ Screen as a way to remove only contaminated reads, unless it is already possible with the current version.

            For example:

            You have single-end or paired-end reads and are heading towards a de novo assembly, either whole-genome or transcriptome.

            Suppose we run FastQ Screen with the --nohits option against the contaminating species.

            The reads that match will include both true and false positives.

            Now suppose we also build indexes (bowtie/bowtie2) for a set of species closely related to the organism we are assembling. I really wish there were an option to retain hits to specific databases.

            An example would be:

            We could state which set of organisms to keep reads for, while at the same time eliminating reads from the contaminating organism. But when a read matches both the contaminating organism and one of the organisms to keep, the read is kept. It's sort of like a metagenomics approach to eliminating contamination.

            This way only the contaminated reads would be removed and the good reads kept.

            Is this possible with the current version of the application?

            Thank you for creating this awesome program and being so helpful in the whole process.



            • #51
              Fastq_screen

              Hi,

              Glad that worked.

              In paired-end mode the program writes the forward and reverse reads to two separate 'nohits' output files. The reads will be in order with respect to one another in the input and output files.

              There is no feature that does specifically what you requested, but perhaps you could create two configuration files? One setup would map against all genomes and the other against just the contaminants (with --nohits selected).

              i.e.

              A: all libraries in config file, --subset 100000 (only some of the reads analysed - which is quicker)

              B: contaminant libraries only in config file, --nohits, and all reads analysed
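
The two setups could be laid out roughly as follows. This is a sketch under assumptions: the database names and index paths are placeholders, the files are written to a scratch directory so nothing real is overwritten, and the fastq_screen commands are left commented out. Only options that already appear in this thread are used.

```shell
# Scratch location for the two example configuration files.
conf_dir=$(mktemp -d)

# Config A: every genome of interest plus the contaminants (overview run).
# Index paths are placeholders; each must end in the index prefix.
cat > "$conf_dir/all_genomes.conf" <<'EOF'
DATABASE Drosophila /path/to/indices/Dmel/Dmel_index
DATABASE Arabidopsis /path/to/indices/Arabidopsis/Arabidopsis_index
EOF

# Config B: contaminants only, for the --nohits run.
cat > "$conf_dir/contaminants.conf" <<'EOF'
DATABASE Arabidopsis /path/to/indices/Arabidopsis/Arabidopsis_index
EOF

echo "Wrote example configs to $conf_dir"

# A: quick overview on a subset of reads, all libraries.
# fastq_screen --aligner bowtie2 --conf="$conf_dir/all_genomes.conf" --subset 100000 --paired forward.fastq reverse.fastq
# B: all reads against the contaminants only, writing the reads with no hits.
# fastq_screen --aligner bowtie2 --conf="$conf_dir/contaminants.conf" --nohits --paired forward.fastq reverse.fastq
```

Run A is only there to see where the reads go; the cleaned reads come from run B.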

              Regards,

              Steven



              • #52
                Originally posted by Zapages
                We could state which set of organisms to keep reads for, while at the same time eliminating reads from the contaminating organism. But when a read matches both the contaminating organism and one of the organisms to keep, the read is kept. It's sort of like a metagenomics approach to eliminating contamination.

                This way only the contaminated reads would be removed and the good reads kept.

                It is possible to do this now with BBMap. See this thread for an example: http://seqanswers.com/forums/showthread.php?t=45661



                • #53
                  Hey guys,

                  Thank you Genomax and everyone. Please let me know whether this sounds reasonable for removing contaminant reads.

                  I think I have figured out a method for what I was discussing earlier with FastQ Screen. Maybe it will be helpful for everyone here.

                  1) Conduct a metagenomic-style analysis: search sequences from different mammals, fish species, species closely related to your experimental genome, or a list of known conserved genes (e.g. beta-actin, cytochrome) against the contaminant genomes (Arabidopsis and maize are examples). This would be done with megaBLAST and nBLAST.

                  2) Wherever megaBLAST and nBLAST agree on a match, remove those sequences from the contaminant genomes (FASTA files). This removes any conserved genes that are found across the different plants, mammals, and fish (false positives).

                  3) Run FastQ Screen against the contaminants (Arabidopsis and maize are examples) and take the unmapped output as the uncontaminated sequences (contamination-free reads). The sequences that map to Arabidopsis and/or maize are the truly contaminated reads, which will not be output (true-positive contaminated reads).

                  Then continue with the bioinformatic analysis, as your reads are no longer contaminated with Arabidopsis and/or maize or any other possible contaminants.

                  Hopefully, this will let users get as close as possible to contamination-free reads.



                  • #54
                    Zapages,

                    Have you considered BBSplit? It is based on BBMap, but designed for a slightly different role; specifically, decontaminating or binning reads from multiple organisms. It maps simultaneously to all references and outputs reads to one file per reference. Each output file will only get reads that map best to that reference. Depending on your ambiguity settings, reads from conserved regions will either be written to the files of ALL references they map to equally well, or just one, or discarded. The output is fasta or fastq.



                    • #55
                      configure fastq_screen.config

                      Hi,

                      I would like to use fastq_screen against the Drosophila, human, mouse, and E. coli genomes. I have downloaded the Bowtie prebuilt index files and the corresponding genome sequences (a single FASTA file each).

                      I prepared the config file below and ran the following command, but got an error:

                      #-------- Config file:

                      BOWTIE /data/users/bin/bowtie
                      BOWTIE2 /data/users/bin/bowtie2-2.2.4

                      THREADS 12
                      DATABASE Drosophila /data/users/Bowtie-Prebuilt-Index/dme_ucsc BOWTIE
                      DATABASE Human /data/users/Bowtie-Prebuilt-Index/hg19 BOWTIE
                      DATABASE Mouse /data/users/Bowtie-Prebuilt-Index/mm9 BOWTIE
                      DATABASE Ecoli /data/users/Bowtie-Prebuilt-Index/e_coli BOWTIE

                      #--------------- Command

                      fastq_screen --threads 12 --aligner bowtie --bowtie "-m 2 -g 1 --butterfly-search" $fq/MT1.fq $fq/MT2.fq $fq/MT3.fq $fq/MT4.fq $fq/MT5.fq $fq/MT6.fq $fq/MT7.fq $fq/MT8.fq

                      #-------------- Error

                      Using fastq_screen v0.4.4

                      Reading configuration from '/data/users/bin/fastq_screen_v0.4.4/fastq_screen.conf'

                      Using '/data/users/bin/bowtie/bowtie' as bowtie path

                      Using 12 threads for searches

                      Skipping DATABASE 'Drosophila' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/dme_ucsc'

                      Skipping DATABASE 'Human' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/hg19'

                      Skipping DATABASE 'Mouse' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/mm9'

                      Skipping DATABASE 'Ecoli' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/e_coli'

                      No search libraries were configured at /data/users/bin/fastq_screen_v0.4.4/fastq_screen line 124.



                      ## But I can see that the Bowtie prebuilt index files are present in the paths mentioned above, for example:

                      ls /data/users/Bowtie-Prebuilt-Index/hg19

                      hg19.1.ebwt
                      hg19.2.ebwt
                      hg19.3.ebwt
                      hg19.4.ebwt
                      hg19.fa
                      hg19.rev.1.ebwt
                      hg19.rev.2.ebwt

                      # The final directory names are the same as the prefixes of the prebuilt index names, so this is not the issue already discussed.

                      # The listing shows that the Bowtie index and the corresponding genome sequence files are present in the directory. I have also already used these index files for mapping without any problem.

                      # I have GD::Graph installed properly.

                      thanks



                      • #56
                        I sent you a direct mail about this, but just so the information stays in the post, I think the problem here is that you are only specifying the path to the directory which contains your indices, and not the full path to the actual database. In this case it's a little confusing in that the name of the database and the name of the folder it's in are the same (which makes sense, but since it doesn't have to be like that you need to explicitly tell the program).

                        I think the fix is simply to append the database name to the end of the paths, so instead of:

                        /data/users/khademul/Bowtie-Prebuilt-Index/hg19

                        ..you'd have

                        /data/users/khademul/Bowtie-Prebuilt-Index/hg19/hg19
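
To see whether a DATABASE path really points at an index prefix rather than at its folder, something like the following can help. It is a sketch, not FastQ Screen's own check: `index_exists` is a hypothetical helper, and it only looks for the first index file of a standard Bowtie (.ebwt, as in the listing above) or Bowtie2 (.bt2) index.

```shell
# index_exists PREFIX
# Hypothetical helper: succeeds if PREFIX.1.ebwt (Bowtie) or PREFIX.1.bt2
# (Bowtie2) exists, i.e. PREFIX is an index prefix and not just a folder.
index_exists() {
    prefix="$1"
    if [ -f "$prefix.1.ebwt" ] || [ -f "$prefix.1.bt2" ]; then
        echo "index found at $prefix"
    else
        echo "no index at $prefix"
        return 1
    fi
}

# Example calls, using the two paths above:
# index_exists /data/users/khademul/Bowtie-Prebuilt-Index/hg19
# index_exists /data/users/khademul/Bowtie-Prebuilt-Index/hg19/hg19
```

With the directory layout shown in the question, the first call would be expected to fail and the second to succeed.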



                        • #57
                          Citing FastQ Screen

                          Just want to make sure I'm not missing a publication: is there a preferred way to cite FastQ Screen?
                          The program was so helpful, we really appreciate it.
                          Thanks!
                          Last edited by cjdoherty; 06-27-2015, 05:57 PM.



                          • #58
                            Originally posted by cjdoherty
                            Just want to make sure I'm not missing a publication: is there a preferred way to cite FastQ Screen?
                            The program was so helpful, we really appreciate it.
                            Thanks!

                            There isn't a publication for fastq_screen. We recommend just citing the project URL.



                            • #59
                              Originally posted by simonandrews
                              There isn't a publication for fastq_screen. We recommend just citing the project URL.

                              Thank you. Will do!



                              • #60
                                Remove only 'one-hit/one-library' hits

                                I am trying to use FastQ Screen to remove contaminating sequences from my data and have a question. Looking at the options provided with the tool, I was wondering how I could set up something like this:
                                Screen my human data against potential contaminants (E. coli, yeast, adapters, ...) and remove only the hits that are classified as 'one-hit/one-library' or 'multiple-hits/one-library'. I see that this classification is built in as part of the plots, but it was not clear to me whether (and how) it could be set up as a filter.

                                Thanks
                                SK
