SEQanswers

Old 08-01-2014, 09:36 AM   #41
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 274
Default

You can install GD::Graph with the CPAN shell (you might need 'sudo' depending on your setup):
Code:
perl -MCPAN -e 'install GD::Graph'
For the Fastq_screen issue, make sure your database is indexed with bowtie-build (or bowtie2-build for Bowtie2) before running the program (it's not clear whether that is the case).
Old 08-01-2014, 09:44 PM   #42
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

I actually built the index offline on my laptop (OSX) using bowtie2-build genome.fasta genome_index_bowtie2 . Previously, I was using the bowtie2 index made from iPlant. Unfortunately, I still received the same error as before.

Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --outdir /Output --paired /Users/ZainA/Downloads/Dmel_520/S4A15_SRR070422_1p.fastq /Users/ZainA/Downloads/Dmel_520/S4A15_SRR070422_2p.fastq 
Using fastq_screen v0.4.4
Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
No search libraries were configured at /Users/ZainA/Downloads/fastq_screen_v0.4.4/fastq_screen line 124.
As for GD::Graph, I get the following error:

Code:
Warning: prerequisite GD 1.18 not found.
Warning: prerequisite GD::Text 0.80 not found.
only nested arrays of non-refs are supported at /System/Library/Perl/5.12/ExtUtils/MakeMaker.pm line 664
Warning: No success on command[/usr/bin/perl Makefile.PL]
'YAML' not installed, will not store persistent state
  RUZ/GDGraph-1.48.tar.gz
  /usr/bin/perl Makefile.PL -- NOT OK
Running make test
  Make had some problems, won't test
Running make install
  Make had some problems, won't install
How should I go about this? Thank you in advance for the help.
Old 08-02-2014, 03:09 AM   #43
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

The reason it's saying that you don't have any libraries configured is that you're specifying that you should use bowtie2, but your library isn't marked as a bowtie2 library in your config file. Have a look at the example config file we ship with the distribution to see what the syntax looks like.

For GD graph - you didn't say what OS you were using. Assuming some type of linux, the easiest way is to install it from your package repository. On CentOS/Fedora I'd do:

yum install perl-GD-Graph

..but there's likely to be something similar on whatever OS you're using.

If that's not an option then using the perl CPAN module is the next easiest way:

perl -MCPAN -e 'install GD::Graph'

Hope this helps
Old 08-02-2014, 05:47 AM   #44
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

I am trying to run this on OSX Mountain Lion, so I don't think yum will work. The CPAN command generates the same error as before when installing GD::Graph.

I rearranged the configuration file to match the example version as closely as possible. Unfortunately, I still get this error.

I used the following to build the index database for my reference genome.

Code:
bowtie2-build reference.fasta reference_index_bowtie2
and
Code:
bowtie-build  reference.fasta reference_index_bowtie
My configuration file looks like this:

Code:
BOWTIE /Users/ZainA/Downloads/bowtie-1.1.0
BOWTIE2 /Users/ZainA/Downloads/bowtie2-2.2.3

##Dmel_5_20 - sequences available from
/Users/ZainA/Downloads/Dmel_520/dmel-all-chromosome-r5.20.fasta

DATABASE Dmel520_Bowtie /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie BOWTIE
DATABASE Dmel520_Bowtie2 /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2 BOWTIE2
The command I ran is:

Code:
Dmel_520 ZainA$ fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --outdir /Output --paired /Users/ZainA/Downloads/Dmel_520/forward.fastq /Users/ZainA/Downloads/Dmel_520/reverse.fastq
The error:
Code:
Using fastq_screen v0.4.4
Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
Skipping DATABASE 'Dmel520_Bowtie2' since no bowtie index was found at '/Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2'
No search libraries were configured at /Users/ZainA/Downloads/fastq_screen_v0.4.4/fastq_screen line 124.
If you could kindly help, I would really appreciate it.
Thank you in advance.
Old 08-02-2014, 08:28 AM   #45
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,489
Default

You appear to be using the directory as the base name for the database in the config file. Your bowtie2 index base name, as indicated by your bowtie2-build command line, is "reference_index_bowtie2", so the conf file should have this line:
Code:
DATABASE Dmel520_Bowtie2 /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/reference_index_bowtie2 BOWTIE2
This assumes that your bowtie2 index files are in the "/Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/" directory. If they are not there, replace it with the appropriate path.
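
A quick way to confirm that a path is an index base name rather than a directory is to look for the first index file next to it. The sketch below assumes the usual Bowtie2 (.bt2 / .bt2l) and Bowtie1 (.ebwt / .ebwtl) file suffixes; the function name index_kind is made up for illustration and is not part of fastq_screen.

```shell
# Print which kind of index exists for a given base name.
# index_kind is a hypothetical helper, not a fastq_screen command.
index_kind() {
    base=$1
    if [ -f "$base.1.bt2" ] || [ -f "$base.1.bt2l" ]; then
        echo bowtie2
    elif [ -f "$base.1.ebwt" ] || [ -f "$base.1.ebwtl" ]; then
        echo bowtie
    else
        # A bare directory ends up here, because "<dir>.1.bt2" does not exist
        echo none
        return 1
    fi
}

# Example (path taken from the config line above):
# index_kind /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/reference_index_bowtie2
```

If this prints "none" for the path in a DATABASE line, that line will probably be skipped in the same way as in the error output above.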
Old 08-02-2014, 12:08 PM   #46
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

Thank you GenoMax. That worked perfectly. So whatever base name I give the bowtie1/2 index should be appended to the end of the database location. I will definitely remember that.

I have tested two sets. In the first set, I know the sequences belong to the genome I am providing as the database, but everything comes back as Unmapped. This is strange; am I doing something wrong?

The bowtie1/2 indexes were made using bowtie-build on the reference genome found on NCBI.

In the other sample, there was Arabidopsis contamination (somewhere between 0.2% and 2%), and I am trying to keep only the reads that are not contaminated by using the --nohits option.

The same thing occurred: everything came back as Unmapped, which is strange.

The version that should be clean:
Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf --paired /Users/ZainA/Downloads/Dmel_520/forward.fastq /Users/ZainA/Downloads/Dmel_520/reverse.fastq --outdir Output
Contamination version:
Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Arabidopsis/Arabidopsis_gnomon_mRNA.conf --paired /Users/ZainA/Downloads/Dmel_520/Arabidopsis/forward_1p3.fastq /Users/ZainA/Downloads/Dmel_520/Arabidopsis/reverse_2p3.fastq --nohits --outdir output
Any ideas what is occurring and why everything is coming back unmapped?

EDIT: Is there a method to filter out reads that actually match a certain genome or genomes, as one file or as separate files (paired-end or single-end FASTQ)?

Last edited by Zapages; 08-08-2014 at 08:17 AM.
Old 08-10-2014, 11:55 PM   #47
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

Sorry to get to this late - have been away from internet access for a week so am still catching up.

Without seeing your files it's difficult to know why they might not be mapping. The first suspicion with any paired-end files is that there's a problem with the pairing in your data. Could you try running the screen with just one of your files and without the --paired option? Whether that gives any hits will determine where to look next for problems.
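
One way to check the pairing directly is to compare the read names in the two files record by record. This is only a sketch: it assumes plain 4-line FASTQ records, read names that differ at most by a trailing /1 or /2, and the helper names fastq_ids and count_pair_mismatches are made up for illustration.

```shell
# Print one read ID per record: the first word of each header line,
# without the leading "@" and without a trailing /1 or /2.
fastq_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/^@/, "", id); sub(/\/[12]$/, "", id); print id }' "$1"
}

# Print how many record positions have different IDs in the two files.
# A length difference also counts, because paste pads the shorter file.
count_pair_mismatches() {
    ids1=$(mktemp)
    ids2=$(mktemp)
    fastq_ids "$1" > "$ids1"
    fastq_ids "$2" > "$ids2"
    paste "$ids1" "$ids2" | awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }'
    rm -f "$ids1" "$ids2"
}

# Example: count_pair_mismatches forward.fastq reverse.fastq
```

A result of 0 suggests the files are in matching order, so the problem is likely somewhere else.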
Old 08-11-2014, 06:25 PM   #48
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

Quote:
Originally Posted by simonandrews View Post
Sorry to get to this late - have been away from internet access for a week so am still catching up.

Without seeing your files it's difficult to know why they might not be mapping. The first suspicion with any paired-end files is that there's a problem with the pairing in your data. Could you try running the screen with just one of your files and without the --paired option? Whether that gives any hits will determine where to look next for problems.
Thank you for the advice on dropping the --paired option. Unfortunately, I still get the same result of everything being unmapped, which is strange.

Code:
fastq_screen --threads 8 --aligner bowtie2 --conf=/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf forward.fastq reverse.fastq --outdir output_single
Results:
Code:
Using fastq_screen v0.4.4
Reading configuration from '/Users/ZainA/Downloads/Dmel_520/Dmel5_20.conf'
Using '/Users/ZainA/Downloads/bowtie2-2.2.3' as bowtie2 path
Adding database Dmel520_index_bowtie2
Processing forward.fastq
Searching forward.fastq against Dmel520_index_bowtie2
Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
Perl module GD::Graph::bars not installed, skipping charts
Processing reverse.fastq
Searching reverse.fastq against Dmel520_index_bowtie2
Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory
Perl module GD::Graph::bars not installed, skipping charts
Processing complete


I then ran the following to see if bowtie2 was working correctly, and it is. I have used these control sequences before with the Tuxedo suite (TopHat2 > Cufflinks2 > Cuffdiff2 > cummeRbund) and everything worked fine.

Code:
bowtie2 -p 8 -x  /Users/ZainA/Downloads/Dmel_520/Genomes/Dmel520_Bowtie2/Dmel520_index_bowtie2 -1 forward.fastq -2 reverse.fastq -S Dmel_test.sam
Results as expected:
Code:
27704187 reads; of these:
  27704187 (100.00%) were paired; of these:
    4441889 (16.03%) aligned concordantly 0 times
    21592908 (77.94%) aligned concordantly exactly 1 time
    1669390 (6.03%) aligned concordantly >1 times
    ----
    4441889 pairs aligned concordantly 0 times; of these:
      330640 (7.44%) aligned discordantly 1 time
    ----
    4111249 pairs aligned 0 times concordantly or discordantly; of these:
      8222498 mates make up the pairs; of these:
        5063321 (61.58%) aligned 0 times
        2980532 (36.25%) aligned exactly 1 time
        178645 (2.17%) aligned >1 times
90.86% overall alignment rate
Also, I used this guide to add bowtie2, bowtie, and other bioinformatics tools to the PATH on OSX:

http://architectryan.com/2012/10/02/.../#.U-l6Zv2z65O

If you have any advice on what is happening here and how to make FastQ Screen work properly, I would really appreciate it.

Thank you in advance,

-Zapages
Old 08-12-2014, 12:46 AM   #49
StevenW
Member
 
Location: UK

Join Date: May 2011
Posts: 11
Default No Hits Problem

Hi,

I am also responsible for developing FastQ Screen. I believe the problem is caused by the path to Bowtie2 in your configuration file being incorrect.

The Fastq Screen output states:

Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory

I believe /Users/ZainA/Downloads/bowtie2-2.2.3 is the path to the folder where Bowtie2 is kept. You need the path to the executable file, e.g.

/Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2

(or something similar).
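
To tell the two cases apart before editing the config file, a small check like the one below may help. It is a sketch only; check_aligner_path is a made-up helper name, not something shipped with FastQ Screen.

```shell
# Report whether a path is a directory, an executable file, or neither.
# check_aligner_path is a hypothetical helper for illustration.
check_aligner_path() {
    p=$1
    if [ -d "$p" ]; then
        # This is the case that produces "is a directory" in the output
        echo directory
    elif [ -f "$p" ] && [ -x "$p" ]; then
        echo executable
    else
        echo missing
    fi
}

# Example: check_aligner_path /Users/ZainA/Downloads/bowtie2-2.2.3
```

The BOWTIE2 line in the config file should point at a path for which this prints "executable".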

Regards,

Steven
Old 08-12-2014, 05:39 PM   #50
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

Quote:
Originally Posted by StevenW View Post
Hi,

I am also responsible for developing FastQ Screen. I believe the problem is caused by the path to Bowtie2 in your configuration file being incorrect.

The Fastq Screen output states:

Bowtie/Bowtie2 warning: sh: /Users/ZainA/Downloads/bowtie2-2.2.3: is a directory

I believe /Users/ZainA/Downloads/bowtie2-2.2.3 is the path to the folder where Bowtie2 is kept. You need the path to the executable file, e.g.

/Users/ZainA/Downloads/bowtie2-2.2.3/bowtie2

(or something similar).

Regards,

Steven
Thank you Steven, that worked perfectly. I really appreciate the help.

Are the --nohits output reads (paired end) still in matching order, or do I have to re-pair them? If so, what program do you recommend for this task? Thank you in advance.



I was wondering if a method to remove only contaminated reads could be included in a future release of FastQ Screen, unless this is already possible with the current version.

For example:

You have single or paired-end reads, and you are heading towards a de novo assembly of either a whole genome or a transcriptome.

Suppose we use the --nohits option in FastQ Screen against the contaminating species.

The reads that hit the contaminant will include both true and false positive matches.

Now suppose we create indexes (bowtie/bowtie2) for a bunch of species closely related to the one our de novo reads come from. I really wish there was an option to retain hits based on a specific database.

An example would be:

We could state which set of organisms to keep reads for while at the same time eliminating reads from the contaminating organism. But when a read matches both the contaminating organism and the other set of organisms, the read is kept. It's sort of like a metagenomics approach to eliminating contamination.

This way only the contaminated reads will be removed and the good reads will be kept.

Is this possible with the current version of the application?

Thank you for creating this awesome program and being so helpful in the whole process.
Old 08-12-2014, 11:46 PM   #51
StevenW
Member
 
Location: UK

Join Date: May 2011
Posts: 11
Default Fastq_screen

Hi,

Glad that worked.

In paired-end mode the program writes the forward and reverse reads to two separate 'nohits' output files. The reads will be in order with respect to one another in the input and output files.

The specific feature you requested does not exist, but perhaps you could create two configuration files? One setup would map against all genomes and the other against just the contaminants (with --nohits selected).

i.e.

A : all libraries in config file, --subset 100000 (only some of the reads analysed - which is quicker)

B: contaminant libraries only in config file, --nohits, and all reads analysed
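
As a rough sketch, the two runs could be written out as below. The config file names all.conf and contaminants.conf are hypothetical, the function only prints the command lines (a dry run) so they can be inspected first, and the options used are the ones already mentioned in this thread.

```shell
# Print the two fastq_screen command lines for setups A and B (dry run).
# all.conf and contaminants.conf are hypothetical config file names.
screen_plan() {
    fwd=$1
    rev=$2
    # A: every library in the config, but only a subset of the reads
    echo "fastq_screen --aligner bowtie2 --conf all.conf --subset 100000 --paired $fwd $rev"
    # B: contaminant libraries only, all reads, keeping the reads with no hits
    echo "fastq_screen --aligner bowtie2 --conf contaminants.conf --nohits --paired $fwd $rev"
}

# Example: screen_plan forward.fastq reverse.fastq
```

Once the printed commands look right, they can be pasted into the terminal and run one at a time.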

Regards,

Steven
Old 08-13-2014, 03:19 AM   #52
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,489
Default

Quote:
Originally Posted by Zapages View Post
We could state which set of organisms to keep reads for while at the same time eliminating reads from the contaminating organism. But when a read matches both the contaminating organism and the other set of organisms, the read is kept. It's sort of like a metagenomics approach to eliminating contamination.

This way only the contaminated reads will be removed and the good reads will be kept.
It is possible to do this now with BBMap. See this thread for an example: http://seqanswers.com/forums/showthread.php?t=45661
Old 08-27-2014, 10:42 AM   #53
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Default

Hey guys,

Thank you GenoMax and everyone. Please let me know whether this sounds reasonable for removing contaminant reads.

I think I have figured out a method for what I was discussing earlier with FastQ Screen. Maybe this will be helpful for everyone here.

1) Conduct a metagenomic-style comparison of different mammals, fish species, species closely related to your experimental genome, or a list of known conserved genes (e.g. beta actin, cytochrome, etc.) against the contaminant genomes (Arabidopsis and maize are examples). This is done with megaBLAST and blastn.

2) Wherever megaBLAST and blastn agree on a hit, remove those sequences from the contaminant genomes (FASTA files). This removes any conserved genes that are found across the different plants, mammals, and fish (false positives).

3) Run FastQ Screen and take the reads that do not map to the contaminants (Arabidopsis and maize are examples) as the contamination-free reads. The reads that map to Arabidopsis and/or maize are the true contaminated reads, which will not be output (true positives).

Then continue with the bioinformatic analysis, as your reads are no longer contaminated with Arabidopsis, maize, or any other likely contaminants.

Hopefully, this will get users as close as possible to contamination-free reads.
Old 08-27-2014, 10:55 AM   #54
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695
Default

Zapages,

Have you considered BBSplit? It is based on BBMap, but designed for a slightly different role; specifically, decontaminating or binning reads from multiple organisms. It maps simultaneously to all references and outputs reads to one file per reference. Each output file will only get reads that map best to that reference. Depending on your ambiguity settings, reads from conserved regions will either be written to the files of ALL references they map to equally well, or just one, or discarded. The output is fasta or fastq.
Old 11-08-2014, 06:59 PM   #55
abmmki
Junior Member
 
Location: Africa

Join Date: Nov 2009
Posts: 5
Default configure fastq_screen.config

Hi,

I would like to use fastq_screen against the Drosophila, human, mouse, and E. coli genomes. I have downloaded the pre-built Bowtie index files and the corresponding genome sequences (a single FASTA file each).

I prepared the config file below and ran the following command, but got an error:

#-------- Config file:

BOWTIE /data/users/bin/bowtie
BOWTIE2 /data/users/bin/bowtie2-2.2.4

THREADS 12
DATABASE Drosophila /data/users/Bowtie-Prebuilt-Index/dme_ucsc BOWTIE
DATABASE Human /data/users/Bowtie-Prebuilt-Index/hg19 BOWTIE
DATABASE Mouse /data/users/Bowtie-Prebuilt-Index/mm9 BOWTIE
DATABASE Ecoli /data/users/Bowtie-Prebuilt-Index/e_coli BOWTIE

#--------------- Command

fastq_screen --threads 12 --aligner bowtie --bowtie "-m 2 -g 1 --butterfly-search" $fq/MT1.fq $fq/MT2.fq $fq/MT3.fq $fq/MT4.fq $fq/MT5.fq $fq/MT6.fq $fq/MT7.fq $fq/MT8.fq

#-------------- Error

Using fastq_screen v0.4.4

Reading configuration from '/data/users/bin/fastq_screen_v0.4.4/fastq_screen.conf'

Using '/data/users/bin/bowtie/bowtie' as bowtie path

Using 12 threads for searches

Skipping DATABASE 'Drosophila' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/dme_ucsc'

Skipping DATABASE 'Human' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/hg19'

Skipping DATABASE 'Mouse' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/mm9'

Skipping DATABASE 'Ecoli' since no bowtie index was found at '/data/users/Bowtie-Prebuilt-Index/e_coli'

No search libraries were configured at /data/users/bin/fastq_screen_v0.4.4/fastq_screen line 124.



## But I see that the pre-built Bowtie index files are present in the paths mentioned above, for example:

ls /data/users/Bowtie-Prebuilt-Index/hg19

hg19.1.ebwt
hg19.2.ebwt
hg19.3.ebwt
hg19.4.ebwt
hg19.fa
hg19.rev.1.ebwt
hg19.rev.2.ebwt

# The final directory names are the same as the prefixes of the pre-built index names, so this is not the issue discussed already.

# It shows that the Bowtie index and the corresponding genome sequence files are present in the directory. I have also already used these index files for mapping without problems.

# I have GD::Graph installed properly.

thanks
Old 11-10-2014, 12:14 AM   #56
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

I sent you a direct mail about this, but just so the information stays in the post, I think the problem here is that you are only specifying the path to the directory which contains your indices, and not the full path to the actual database. In this case it's a little confusing in that the name of the database and the name of the folder it's in are the same (which makes sense, but since it doesn't have to be like that you need to explicitly tell the program).

I think the fix is simply to append the database name to the end of the paths, so instead of:

/data/users/khademul/Bowtie-Prebuilt-Index/hg19

..you'd have

/data/users/khademul/Bowtie-Prebuilt-Index/hg19/hg19
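
If it is unclear what to append, the base names can be read off the index files themselves. The sketch below assumes the standard .1.ebwt (Bowtie) and .1.bt2 (Bowtie2) file names; list_index_basenames is a made-up helper name for illustration.

```shell
# Print the full index base name for every Bowtie/Bowtie2 index in a directory.
# list_index_basenames is a hypothetical helper for illustration.
list_index_basenames() {
    for f in "$1"/*.1.ebwt "$1"/*.1.bt2; do
        # Skip patterns that matched nothing
        [ -f "$f" ] || continue
        # The reverse index files belong to the same base name; skip them
        case $f in
            *.rev.1.ebwt|*.rev.1.bt2) continue ;;
        esac
        f=${f%.1.ebwt}
        f=${f%.1.bt2}
        echo "$f"
    done
}

# Example: list_index_basenames /data/users/Bowtie-Prebuilt-Index/hg19
```

Each printed line is what should go into the path column of a DATABASE entry.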
Old 06-27-2015, 02:05 PM   #57
cjdoherty
Junior Member
 
Location: NC

Join Date: Jun 2015
Posts: 2
Default Citing FastQ Screen

Just want to make sure I'm not missing a publication, is there a preferred way to cite FastQ screen?
The program was so helpful we really appreciate it.
Thanks!

Last edited by cjdoherty; 06-27-2015 at 05:57 PM.
Old 06-29-2015, 12:49 AM   #58
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

Quote:
Originally Posted by cjdoherty View Post
Just want to make sure I'm not missing a publication, is there a preferred way to cite FastQ screen?
The program was so helpful we really appreciate it.
Thanks!
There isn't a publication for fastq_screen. We recommend just citing the project URL.
Old 06-29-2015, 05:43 AM   #59
cjdoherty
Junior Member
 
Location: NC

Join Date: Jun 2015
Posts: 2
Default

Quote:
Originally Posted by simonandrews View Post
There isn't a publication for fastq_screen. We recommend just citing the project URL.
Thank you. Will do!
Old 08-14-2015, 11:55 AM   #60
touchsk
Junior Member
 
Location: San Francisco, CA

Join Date: Aug 2015
Posts: 5
Question Remove only 'one-hit/one-library' hits

I am trying to use FastQ Screen to remove contaminated sequences from my data and have a question. I was looking at the options provided with the tool and was wondering how I could set up something like this:
Screen my human data against potential contaminants (E. coli, yeast, adapters, ...) and remove only the hits that are classified as 'one-hit/one-library' and 'multiple-hits/one-library'. I see that this classification is built in as part of the plots, but it was not clear whether (and how) it could be set up as a filter.

Thanks
SK

Tags
contamination, quality, screening, search
