#41
Senior Member
Location: Vancouver, BC Join Date: Mar 2010
Posts: 275
You can install GD::Graph with the CPAN shell (you might need 'sudo' depending on your setup):
Code:
perl -MCPAN -e 'install GD::Graph'
#42
Member
Location: NJ Join Date: Oct 2012
Posts: 97
I actually built the index offline on my laptop (OS X) using bowtie2-build genome.fasta genome_index_bowtie2. Previously, I was using the bowtie2 index made from iPlant. Unfortunately, I still received the same error as before.
Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --outdir /Output --paired /Users/ZainA/Downloads/Dmel_520/S4A15_SRR070422_1p.fastq /Users/ZainA/Downloads/Dmel_520/S4A15_SRR070422_2p.fastq
Using fastq_screen v0.4.4
Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
No search libraries were configured at /Users/ZainA/Downloads/fastq_screen_v0.4.4/fastq_screen line 124.
I get the following error:
Code:
Warning: prerequisite GD 1.18 not found.
Warning: prerequisite GD::Text 0.80 not found.
only nested arrays of non-refs are supported at /System/Library/Perl/5.12/ExtUtils/MakeMaker.pm line 664
Warning: No success on command[/usr/bin/perl Makefile.PL]
'YAML' not installed, will not store persistent state
RUZ/GDGraph-1.48.tar.gz
/usr/bin/perl Makefile.PL -- NOT OK
Running make test
Make had some problems, won't test
Running make install
Make had some problems, won't install
#43
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
The reason it's saying that you don't have any libraries configured is that you're specifying that you should use bowtie2, but your library isn't marked as a bowtie2 library in your config file. Have a look at the example config file we ship with the distribution to see what the syntax looks like.
For GD::Graph - you didn't say what OS you were using. Assuming some type of Linux, the easiest way is to install it from your package repository. On CentOS/Fedora I'd do:
Code:
yum install perl-GD-Graph
..but there's likely to be something similar on whatever OS you're using. If that's not an option then using the perl CPAN module is the next easiest way:
Code:
perl -MCPAN -e 'install GD::Graph'
Hope this helps
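Whichever install route is used, it can help to confirm afterwards that Perl can actually load the modules. The following is a minimal sketch, not part of FastQ Screen; the function name `perl_module_installed` is hypothetical, and it assumes the `perl` on your PATH is the same one FastQ Screen runs under.

```shell
#!/bin/sh
# Return 0 if the named Perl module can be loaded by the perl on PATH.
# (Hypothetical helper, written for illustration only.)
perl_module_installed() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report on the modules named in the CPAN warnings and the FastQ Screen log.
for module in GD GD::Text GD::Graph GD::Graph::bars; do
    if perl_module_installed "$module"; then
        echo "$module: installed"
    else
        echo "$module: missing"
    fi
done
```

If GD itself is reported missing, GD::Graph cannot build, which would be consistent with the "prerequisite GD 1.18 not found" warning quoted earlier in the thread.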
#44
Member
Location: NJ Join Date: Oct 2012
Posts: 97
I am trying to run this on OS X Mountain Lion. I don't think yum will work on it. The other command generates the same error as before when trying to get GD::Graph to work.
I tried to rearrange the configuration file to match the example version as closely as possible. Unfortunately, I still get this error. I used the following to build the index database for my reference genome.
Code:
bowtie2-build reference.fasta reference_index_bowtie2
Code:
bowtie-build reference.fasta reference_index_bowtie
Code:
BOWTIE /Users/ZainA/Downloads/bowtie-1.1.0
BOWTIE2 /Users/ZainA/Downloads/bowtie2-2.2.3
##Dmel_5_20 - sequences available from /Users/ZainA/Downloads/Dmel_520/dmel-all-chromosome-r5.20.fasta
DATABASE Dmel520_Bowtie /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie BOWTIE
DATABASE Dmel520_Bowtie2 /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2 BOWTIE2
Code:
Dmel_520 ZainA$ fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --outdir /Output --paired /Users/ZainA/Downloads/Dmel_520/forward.fastq /Users/ZainA/Downloads/Dmel_520/reverse.fastq
Code:
Using fastq_screen v0.4.4
Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
Skipping DATABASE 'Dmel520_Bowtie2' since no bowtie index was found at '/Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2'
No search libraries were configured at /Users/ZainA/Downloads/fastq_screen_v0.4.4/fastq_screen line 124.
Thank you in advance.
#45
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
You appear to be using the directory as the base name for the database in the config file. Your bowtie2 index base name, as indicated by your bowtie2-build command line, is "reference_index_bowtie2", so the conf file should have this line:
Code:
DATABASE Dmel520_Bowtie2 /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/reference_index_bowtie2 BOWTIE2
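If you are unsure what the base name of an existing index is, it can be read off the index files themselves. A minimal sketch follows, assuming small indexes whose first file ends in `.1.bt2` (Bowtie2) or `.1.ebwt` (Bowtie); large indexes with `.bt2l` or `.ebwtl` suffixes are not handled, and the function name `list_index_basenames` is hypothetical.

```shell
#!/bin/sh
# Print the base name (full path minus the index suffix) of every Bowtie or
# Bowtie2 index found in a directory. Hypothetical helper for illustration.
list_index_basenames() {
    dir=$1
    for f in "$dir"/*.1.bt2 "$dir"/*.1.ebwt; do
        [ -e "$f" ] || continue                    # pattern matched nothing
        case "$f" in
            *.rev.1.bt2|*.rev.1.ebwt) continue ;;  # reverse index shares the base name
        esac
        case "$f" in
            *.1.bt2)  echo "${f%.1.bt2}" ;;
            *.1.ebwt) echo "${f%.1.ebwt}" ;;
        esac
    done
}

# Usage (path taken from the thread):
#   list_index_basenames /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2
```

Each printed path is what would go in the location column of a DATABASE line.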
#46
Member
Location: NJ Join Date: Oct 2012
Posts: 97
Thank you GenoMax. That worked perfectly. So whatever I name the bowtie1/2 index should be the name used at the end of the database location. I will definitely remember that.
I have tested two sets. In the first set, I know the sequences belong to the genome I am providing as the database, but everything comes back as Unmapped. This is strange; am I doing something wrong? The bowtie1/2 indexes were made using bowtie-build on the reference genome found on NCBI. In the other sample, there was Arabidopsis contamination (somewhere between 0.2% and 2%), and I am trying to extract the reads that are not contaminated by using the --nohits option. The same thing occurred: everything came back as Unmapped, which is strange.
Should be good version:
Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --paired /Users/ZainA/Downloads/Dmel_520/forward.fastq /Users/ZainA/Downloads/Dmel_520/reverse.fastq --outdir Output
Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Arabidopsis/Arabidopsis_gnomon_mRNA.conf --paired /Users/ZainA/Downloads/Dmel_520/Arabidopsis/forward_1p3.fastq /Users/ZainA/Downloads/Dmel_520/Arabidopsis/reverse_2p3.fastq --nohits --outdir output
EDIT: Is there a method to filter out reads that actually match a certain genome or genomes, as one or separate files (paired-end or single-end reads, FASTQ)?
Last edited by Zapages; 08-08-2014 at 09:17 AM.
#47
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
Sorry to get to this late - have been away from internet access for a week so am still catching up.
Without seeing your files it's difficult to know why they might not be mapping. The first suspicion with any paired-end files is that there's a problem with the pairing in your data. Could you try running the screen with just one of your files, without the --paired option? Whether that gives any hits will determine where to look next for problems.
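Pairing can also be checked directly by comparing read names in the two files. The sketch below is one way to do that under stated assumptions: plain four-line FASTQ records, mate names that differ only by a trailing /1 or /2 or by a comment after the first space, and enough memory to hold the names of the first file (for very large files, run it on the first few million lines of each). The function name `check_pairing` is hypothetical.

```shell
#!/bin/sh
# Compare the read names of two FASTQ files record by record.
# Prints "in sync", or the first problem found, and returns non-zero on a problem.
check_pairing() {
    awk '
        function read_id(line) {
            sub(/^@/, "", line)        # drop the leading @
            sub(/[ \t].*$/, "", line)  # drop any comment after the name
            sub(/\/[12]$/, "", line)   # drop a trailing /1 or /2
            return line
        }
        FNR % 4 != 1 { next }          # only look at header lines
        FILENAME == ARGV[1] { ids[FNR] = read_id($0); n1++; next }
        {
            n2++
            if (ids[FNR] != read_id($0)) {
                printf "mismatch at record %d: %s vs %s\n", (FNR + 3) / 4, ids[FNR], read_id($0)
                bad = 1
                exit
            }
        }
        END {
            if (bad) exit 1
            if (n1 != n2) { printf "different read counts: %d vs %d\n", n1, n2; exit 1 }
            print "in sync"
        }
    ' "$1" "$2"
}

# Usage:
#   check_pairing forward.fastq reverse.fastq
```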
#48
Member
Location: NJ Join Date: Oct 2012
Posts: 97
Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf forward.fastq reverse.fastq --outdir output_single
Code:
Using fastq_screen v0.4.4
Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
Adding database Dmel520_index_bowtie2
Processing forward.fastq
Searching forward.fastq against Dmel520_index_bowtie2
Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
Perl module GD::Graph::bars not installed, skipping charts
Processing reverse.fastq
Searching reverse.fastq against Dmel520_index_bowtie2
Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
Perl module GD::Graph::bars not installed, skipping charts
Processing complete
I now tried the following to see if bowtie2 was working correctly, and it is. I have used these control sequences before with the Tuxedo package (Tophat2 > Cufflinks2 > Cuffdiff2 > CummeRbund) and everything worked out fine.
Code:
bowtie2 -p 8 -x /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/Dmel520_index_bowtie2 -1 forward.fastq -2 reverse.fastq -S Dmel_test.sam
Code:
27704187 reads; of these:
  27704187 (100.00%) were paired; of these:
    4441889 (16.03%) aligned concordantly 0 times
    21592908 (77.94%) aligned concordantly exactly 1 time
    1669390 (6.03%) aligned concordantly >1 times
    ----
    4441889 pairs aligned concordantly 0 times; of these:
      330640 (7.44%) aligned discordantly 1 time
    ----
    4111249 pairs aligned 0 times concordantly or discordantly; of these:
      8222498 mates make up the pairs; of these:
        5063321 (61.58%) aligned 0 times
        2980532 (36.25%) aligned exactly 1 time
        178645 (2.17%) aligned >1 times
90.86% overall alignment rate
http://architectryan.com/2012/10/02/.../#.U-l6Zv2z65O
If you have any advice on what is happening here and how to make FastQ Screen work properly, I would really appreciate it.
Thank you in advance,
-Zapages
#49
Member
Location: UK Join Date: May 2011
Posts: 13
Hi,
I am also responsible for developing FastQ Screen. I believe the problem is caused by the path to Bowtie2 in your configuration file being incorrect. The FastQ Screen output states:
Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
I believe /Users/ZainA/Downloads/bowtie2-2.2.3 is the path to the folder where Bowtie2 is kept; you need the path to the executable file, e.g. /Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2 (or something similar).
Regards,
Steven
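The distinction between a folder and an executable can be tested from the shell before editing the config file. This is a minimal sketch using only standard `test` operators; the function name `check_aligner_path` is hypothetical and nothing here is part of FastQ Screen.

```shell
#!/bin/sh
# Classify a candidate BOWTIE/BOWTIE2 path: a directory cannot be executed,
# a regular file with the execute bit set can.
check_aligner_path() {
    if [ -d "$1" ]; then
        echo "directory"
        return 1
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "executable"
        return 0
    else
        echo "missing or not executable"
        return 1
    fi
}

# Usage (paths taken from the thread):
#   check_aligner_path /Users/ZainA/Downloads/bowtie2-2.2.3
#   check_aligner_path /Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2
```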
#50
Member
Location: NJ Join Date: Oct 2012
Posts: 97
Are the reads in the no-hit output (paired end) still arranged in the proper order, or do I have to re-match them as paired-end reads? If so, what program do you recommend for this task? Thank you in advance.
I was wondering if this could be included in a future release of FastQ Screen as a method to remove only contaminated reads, unless this is already possible with the current version. For example: you have single or paired-end reads and are heading towards a de novo assembly of either a whole genome or a transcriptome. If we use the --nohits option in FastQ Screen with the contaminating species as the database, the reads it removes will include both true and false positive matches. Now suppose we also create indexes (bowtie/bowtie2) for a bunch of species closely related to the one we are assembling. I really wish there was an option to retain hits based on a specific database. For example, we could state which set of organisms to keep reads for while at the same time eliminating reads from the contaminating organism. But when a read matches both the contaminating organism and the other set of organisms, the read is kept. It's sort of like a metagenomics approach to eliminating contamination. This way only the contaminated reads are removed and the good reads are kept. Is this possible with the current version of the application?
Thank you for creating this awesome program and being so helpful in the whole process.
#51
Member
Location: UK Join Date: May 2011
Posts: 13
Hi,
Glad that worked. In paired-end mode the program writes the forward and reverse reads to two separate 'nohits' output files. The reads will be in the same order with respect to one another in the input and output files.
There isn't a feature that does specifically what you requested, but perhaps you could create 2 configuration files? One setup would map against all genomes and the other against just the contaminants (with --nohits selected), i.e.
A: all libraries in config file, --subset 100000 (only some of the reads analysed - which is quicker)
B: contaminant libraries only in config file, --nohits, and all reads analysed
Regards,
Steven
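Setups A and B could be laid out roughly as follows. This is only a sketch: every path, database name and file name is a hypothetical placeholder, and the fastq_screen commands are left as comments, using only options that already appear in this thread.

```shell
#!/bin/sh
# Write the two configuration files for setups A and B into a directory.
# All paths and database names below are hypothetical placeholders.
write_screen_configs() {
    outdir=$1
    cat > "$outdir/all_libraries.conf" <<'EOF'
BOWTIE2 /path/to/bowtie2/bowtie2
DATABASE Target /path/to/indexes/target/target_index BOWTIE2
DATABASE Arabidopsis /path/to/indexes/arabidopsis/arabidopsis_index BOWTIE2
EOF
    cat > "$outdir/contaminants_only.conf" <<'EOF'
BOWTIE2 /path/to/bowtie2/bowtie2
DATABASE Arabidopsis /path/to/indexes/arabidopsis/arabidopsis_index BOWTIE2
EOF
}

# A: survey a subset of the reads against every library (quick overview).
#   fastq_screen --aligner bowtie2 --conf=all_libraries.conf --subset 100000 --paired forward.fastq reverse.fastq
# B: screen all reads against the contaminants only and keep the reads with no hit.
#   fastq_screen --aligner bowtie2 --conf=contaminants_only.conf --nohits --paired forward.fastq reverse.fastq
```

Run A first to see which libraries attract hits, then use the 'nohits' files from B as the cleaned input for downstream analysis.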
#52
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,087
#53
Member
Location: NJ Join Date: Oct 2012
Posts: 97
Hey guys,
Thank you GenoMax and everyone. Please let me know whether this sounds right for removing contaminant reads. I think I have figured out a method for what I was discussing earlier with FastQ Screen. Maybe this will be helpful for everyone here.
1) Conduct a metagenomic analysis, using megaBLAST and nBLAST, of different mammals, fish species, species closely related to your experimental genome, or a list of known conserved genes (e.g. beta actin, cytochrome, etc.) against the contaminant genomes (Arabidopsis and maize are examples).
2) Wherever there is consensus between megaBLAST and nBLAST, remove those sequences from the contaminant genomes (FASTA files). This will remove any conserved genes that are found across the different plants, mammals, and fish (false positives).
3) Run FastQ Screen and take the reads that do not map to the contaminants (Arabidopsis and maize are examples) as the uncontaminated sequences (contamination-free reads). The reads that map to Arabidopsis and/or maize are the truly contaminated reads, which will not be output (true positive contaminated reads).
Then continue with the bioinformatic analysis, as your reads are no longer contaminated with Arabidopsis and/or maize or any other possible contaminants. Hopefully, this will give users reads that are as close as possible to contamination-free.
#54
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Zapages,
Have you considered BBSplit? It is based on BBMap, but designed for a slightly different role; specifically, decontaminating or binning reads from multiple organisms. It maps simultaneously to all references and outputs reads to one file per reference. Each output file will only get reads that map best to that reference. Depending on your ambiguity settings, reads from conserved regions will either be written to the files of ALL references they map to equally well, or just one, or discarded. The output is fasta or fastq.
#55
Junior Member
Location: Africa Join Date: Nov 2009
Posts: 5
Hi,
I would like to use fastq_screen against the Drosophila, human, mouse and E. coli genomes. I have downloaded the Bowtie pre-built index files and the corresponding genome sequences (single FASTA file). I have prepared a config file as below and run the command as follows, but got an error:
#-------- Config file:
BOWTIE /data/users/bin/bowtie
BOWTIE2 /data/users/bin/bowtie2-2.2.4
THREADS 12
DATABASE Drosophila /data/users/Bowtie-Prebuilt-Index/dme_ucsc BOWTIE
DATABASE Human /data/users/Bowtie-Prebuilt-Index/hg19 BOWTIE
DATABASE Mouse /data/users/Bowtie-Prebuilt-Index/mm9 BOWTIE
DATABASE Ecoli /data/users/Bowtie-Prebuilt-Index/e_coli BOWTIE
#--------------- Command
fastq_screen --threads 12 --aligner bowtie --bowtie "-m 2 -g 1 --butterfly-search" $fq/MT1.fq $fq/MT2.fq $fq/MT3.fq $fq/MT4.fq $fq/MT5.fq $fq/MT6.fq $fq/MT7.fq $fq/MT8.fq
#-------------- Error
Using fastq_screen v0.4.4
Reading configuration from '/data/users/bin/fastq_screen_v0.4.4/fastq_screen.conf'
Using '/data/users/bin/bowtie/bowtie' as bowtie path
Using 12 threads for searches
Skipping DATABASE 'Drosophila' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/dme_ucsc'
Skipping DATABASE 'Human' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/hg19'
Skipping DATABASE 'Mouse' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/mm9'
Skipping DATABASE 'Ecoli' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/e_coli'
No search libraries were configured at /data/users/bin/fastq_screen_v0.4.4/fastq_screen line 124.
## But I see that the Bowtie pre-built index files are present in the above-mentioned paths, for example:
ls /data/users/Bowtie-Prebuilt-Index/hg19
hg19.1.ebwt hg19.2.ebwt hg19.3.ebwt hg19.4.ebwt hg19.fa hg19.rev.1.ebwt hg19.rev.2.ebwt
# The final directory names are the same as the prefix of the pre-built index names, so this is not the issue discussed already.
# It shows that the Bowtie index and the corresponding genome sequence files are present in the directory.
Also, I have already used these index files for mapping without problems.
# I have GD::Graph installed properly.
Thanks
#56
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
I sent you a direct mail about this, but just so the information stays in the post, I think the problem here is that you are only specifying the path to the directory which contains your indices, and not the full path to the actual database. In this case it's a little confusing in that the name of the database and the name of the folder it's in are the same (which makes sense, but since it doesn't have to be like that you need to explicitly tell the program).
I think the fix is simply to append the database name to the end of the paths, so instead of:
/data/users/khademul/Bowtie-Prebuilt-Index/hg19
..you'd have
/data/users/khademul/Bowtie-Prebuilt-Index/hg19/hg19
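A config file can be checked for this mistake before running a screen. The sketch below reads each DATABASE line and looks for index files at the given base name; it is not part of FastQ Screen, the function name `check_config_databases` is hypothetical, and it assumes paths without spaces and small indexes (`.1.ebwt` for Bowtie, `.1.bt2` for Bowtie2).

```shell
#!/bin/sh
# For every DATABASE line in a FastQ Screen config file, report whether index
# files exist at the given base name. Returns non-zero if any entry looks wrong.
check_config_databases() {
    status=0
    while read -r keyword name base aligner; do
        [ "$keyword" = "DATABASE" ] || continue      # skip comments and other settings
        if [ -e "$base.1.ebwt" ] || [ -e "$base.1.bt2" ]; then
            echo "$name: index found at $base"
        elif [ -d "$base" ]; then
            echo "$name: $base is a directory, append the index base name"
            status=1
        else
            echo "$name: no index found at $base"
            status=1
        fi
    done < "$1"
    return $status
}

# Usage (config path taken from the thread):
#   check_config_databases /data/users/bin/fastq_screen_v0.4.4/fastq_screen.conf
```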
#57
Junior Member
Location: NC Join Date: Jun 2015
Posts: 2
Just want to make sure I'm not missing a publication: is there a preferred way to cite FastQ Screen?
The program was so helpful; we really appreciate it. Thanks!
Last edited by cjdoherty; 06-27-2015 at 06:57 PM.
#58
Simon Andrews
Location: Babraham Inst, Cambridge, UK Join Date: May 2009
Posts: 871
#59
Junior Member
Location: NC Join Date: Jun 2015
Posts: 2
#60
Junior Member
Location: San Francisco, CA Join Date: Aug 2015
Posts: 7
I am trying to use FastQ Screen to remove contaminated sequences from my data and have a question. I was looking at the options provided with the tool and was wondering how I could set up something like this:
Screen my human data against potential contaminants (E. coli, yeast, adapters, ...) and only remove the hits that are classified as 'one-hit/one-library' AND 'multiple-hits/one-library'. I see that this classification is built in as part of the plots, but it was not clear whether this could be set up (and how).
Thanks
SK
Tags
contamination, quality, screening, search