**albireo** · 11-09-2012, 08:39 AM

Thanks for the reply Simon. Could you also advise on how to feed the fastqc "contaminants.txt" data to the program?

**simonandrews** · 11-09-2012, 08:51 AM

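You'd need to convert it into a FASTA file. The script below should do this:

Code:

#!/usr/bin/perl
use warnings;
use strict;

# Convert FastQC's tab-delimited contaminant list (name<TAB>sequence) to FASTA.
open (my $in, '<', 'contaminant_list.txt') or die "Can't read contaminant_list.txt: $!";
open (my $out, '>', 'contaminant_list.fa') or die "Can't write contaminant_list.fa: $!";

while (<$in>) {
  next if (/^\#/);                  # skip comment lines
  chomp;
  next unless ($_);                 # skip blank lines
  my ($name,$seq) = split(/\t+/);   # name and sequence are separated by one or more tabs
  next unless ($seq);               # skip lines with no sequence
  $name =~ s/\s+/_/g;               # FASTA IDs end at the first space, so use underscores
  print $out ">$name\n$seq\n";
}
close $in;
close $out or die $!;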

Once you have that you can index it with bowtie-build using something like:

bowtie-build -f contaminant_list.fa contaminants

You can then put the contaminants database into fastq_screen.
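As a rough sketch of that last step: this assumes the config file takes tab-delimited `DATABASE`, name, index-basename lines, as in the example fastq_screen.conf. Check the example config shipped with your version, since the expected columns may differ between releases. The helper name `add_database` and the paths are made up for illustration.

```shell
# Append a DATABASE entry to a fastq_screen config file.
# add_database is a hypothetical helper, not part of fastq_screen.
add_database() {
    name=$1         # label shown in the report, e.g. Contaminants
    index_base=$2   # bowtie index basename (the last argument given to bowtie-build)
    conf=$3         # path to fastq_screen.conf
    printf 'DATABASE\t%s\t%s\n' "$name" "$index_base" >> "$conf"
}

# Example (paths are placeholders):
# add_database Contaminants /path/to/contaminants fastq_screen.conf
```

The index basename is the prefix of the files bowtie-build wrote, not the name of any single index file.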

Hope this helps

**albireo** · 11-09-2012, 09:10 AM

Hello Simon, it works perfectly, thank you. It actually detected adapter contamination in some of my libraries.

**albireo** · 12-06-2012, 04:34 AM

Hi,

I have a problem when running fastq_screen on mouse paired-end ChIP-seq data. For all four of my libraries, I'm getting more than 99% no hits in the final fastq_screen graph.

Code:

Mmus    99.96   0.02    0.02    0.00    0.00

The sequences I'm checking my libraries against are human, mouse, rat, fly, vectors and adapters. I downloaded the mouse mm9 FASTA from UCSC and generated the bowtie index with bowtie 0.12.7. The same version of bowtie is used in the fastq_screen.conf file.

The reads are 51 bp paired-end and I call the program as follows:

Code:

fastq_screen --nohits --conf=fastq_screen.conf --paired <library>_2_sequence.fastq.gz <library>_1_sequence.fastq.gz

I also tried using the --bowtie="--trim5 10" option, as well as --trim3 but this didn't affect the 99% to 100% nohits results.

Separately, I had used bwa to align the reads against mm9, and the sequences did align. This is the output of samtools flagstat for one of the four BAM files:

Code:

78666176 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
76266600 + 0 mapped (96.95%:-nan%)
78666176 + 0 paired in sequencing
39333088 + 0 read1
39333088 + 0 read2
74908040 + 0 properly paired (95.22%:-nan%)
75455201 + 0 with itself and mate mapped
811399 + 0 singletons (1.03%:-nan%)
346117 + 0 with mate mapped to a different chr
130284 + 0 with mate mapped to a different chr (mapQ>=5)

Any idea on what I might be doing wrong? Apologies if I'm missing something really obvious.

**simonandrews** · 12-06-2012, 05:30 AM

I can't immediately see why this would be going wrong from the data you've provided. If you run the screen against just the first of your paired reads, do you find any hits? If you don't, then there's probably something odd going on in the search. If you find hits when analysing each of the files as single end but not when you pair them, that suggests either that something is going wrong in the pairing of sequences or that you have oddly separated pairs.

If you can put a subset of your sequences up somewhere where we can see them (just 100k or so would be plenty) then we could take a look and see what's happening with your data.
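One possible way to cut such a subset from a gzipped file is sketched below. It assumes the usual four-lines-per-read FASTQ layout (no wrapped sequences), and the helper name `fastq_head` and the file names are hypothetical.

```shell
# Write the first N reads of a gzipped FASTQ file to a new gzipped file.
# fastq_head is a hypothetical helper; it assumes 4 lines per read.
fastq_head() {
    in_file=$1    # input gzipped FASTQ
    n_reads=$2    # number of reads to keep
    out_file=$3   # output gzipped FASTQ
    gzip -dc "$in_file" | head -n "$((n_reads * 4))" | gzip > "$out_file"
}

# Example (file names are placeholders):
# fastq_head library_1_sequence.fastq.gz 100000 library_1_subset.fastq.gz
# fastq_head library_2_sequence.fastq.gz 100000 library_2_subset.fastq.gz
```

Taking the same number of reads from the start of both files should keep the pairs in step, provided the two files list their reads in the same order.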

**albireo** · 12-06-2012, 07:17 AM

Originally posted by simonandrews

If you run the screen against just the first of your paired reads do you find any hits from that? If you don't then there's probably something odd going on in the search.

Hi Simon, no, I don't find any hits even when using just one of the paired reads.

**albireo** · 12-07-2012, 02:41 AM

Hello,

The problem had to do with the gzip compression of my fastq files. When I unpacked the gz files and used the .fastq files as input instead, the program ran correctly. Any idea why that should be the case?

By the way, the .fastq files are very large, ranging from 7 to 12 GB. I'm actually using the sampling function in fastq_screen to operate on 5000000 reads only, but I completed one successful run without subsampling as well.

**simonandrews** · 12-10-2012, 03:08 AM

Sorry to take a while to get back to you. I tried your sequences and they worked fine on my system. I also tried gzipping them and that worked OK too.

When fastq_screen runs it simply pipes the original file through zcat, so in terms of the searches there's nothing different between normal and gzipped files. Could it simply be that you don't have zcat installed on your system? It's a standard part of gzip, so it should be on most Unix systems.

Can you try running 'which zcat' and see if that finds anything? If it doesn't then this is the problem, but I'd have thought that would have returned a more sensible error message.

**albireo** · 12-10-2012, 03:16 AM

Hi Simon, zcat is there. I'm not an expert on gzip, but I wonder whether there are alternative algorithms/encodings around?

**simonandrews** · 12-10-2012, 03:27 AM

Should be. A simple test would be to run:

zcat [some file which failed] > /dev/null

...and see if that produces any errors. You might also want to check whether the disk you're using was getting close to being full. If you analyse a large file, the temp files it makes could be pretty big. You could try running the screen with --subset 100000 to see if that works (which is what we'd normally do anyway).
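The checks above could be bundled into something like the following sketch. `check_gz` is a hypothetical helper name, and `gzip -t` is used here as a stand-in for piping zcat to /dev/null (both decompress the whole file and report errors without keeping the output).

```shell
# check_gz is a hypothetical helper: confirm zcat is on the PATH and that
# a gzipped file decompresses cleanly, without keeping the output.
check_gz() {
    f=$1
    if ! command -v zcat >/dev/null 2>&1; then
        echo "zcat not found on PATH" >&2
        return 1
    fi
    if ! gzip -t "$f" 2>/dev/null; then
        echo "gzip integrity check failed: $f" >&2
        return 1
    fi
    echo "ok: $f"
}

# Example (file name is a placeholder):
# check_gz library_1_sequence.fastq.gz
# df -h .    # free space on the filesystem holding the current directory
```

If the file passes but the screen still reports no hits, the free-space figure from `df` for wherever the temp files are written is the next thing to look at.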

**albireo** · 12-10-2012, 03:33 AM

Ok thanks a lot, will try this and report back.

**chenz123** · 07-10-2013, 06:19 AM

fastq screen search libraries

Hello all,
I am having a problem with fastq screen version 0.4.1 while trying to execute this command:

Code:

fastq_screen --subset 1000000  --illumina1_3 --threads 22 --outdir /someOutDirPath  --paired  /pathTo/raw_data/SomeFastq_L008_R1_001.fastq  /pathTo/raw_data/SomeRealtedFastq_R2_001.fastq

I have downloaded all the databases that I needed and configured them in the fastq_screen.conf file.
This is the output I get when I try to execute the command:

Code:

Reading configuration from '/fastq_screen_v0.4.1/fastq_screen.conf'
Using 8 threads for searches
No search libraries were configured at /fastq_screen_v0.4.1//fastq_screen line 119.

From a quick peek at the code, it seems that the "libraries" variable is never initialised. Maybe it needs to be hard-coded somehow, or set within the configuration file?

any help would be appreciated.
Cheers, Chen

**simonandrews** · 07-10-2013, 06:51 AM

It sounds like a problem with your configuration file. Could you message me the contents of your /fastq_screen_v0.4.1/fastq_screen.conf file so we can see what's going wrong?

**chenz123** · 07-10-2013, 08:28 AM

I've sent you a message containing the content of the configuration file.

Thanks for the help.
Cheers.

**simonandrews** · 07-10-2013, 01:31 PM

If you're using the latest release (0.4.1) then you'll need to pass the option --aligner bowtie2, since all of the indices you have are bowtie2. This is probably the reason it's failing.

We should actually handle this better. I'll get it set up so that if your config file doesn't contain both bowtie1 and bowtie2 indices then it will automatically select the correct one for your run.

Let me know if this fixes things.
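If it's unclear which aligner an index was built for, the file extensions give it away: bowtie 1 index files end in .ebwt and bowtie2 index files end in .bt2 (with an extra "l" for large indices). A minimal sketch of that check, with `index_type` being a hypothetical helper name:

```shell
# index_type is a hypothetical helper: report which aligner an index
# basename appears to belong to, judged by the index file extensions.
index_type() {
    base=$1   # index basename as written in the DATABASE line
    if [ -e "$base.1.bt2" ] || [ -e "$base.1.bt2l" ]; then
        echo bowtie2
    elif [ -e "$base.1.ebwt" ] || [ -e "$base.1.ebwtl" ]; then
        echo bowtie
    else
        echo none
    fi
}

# Example (path is a placeholder):
# index_type /path/to/Mouse/mm9
```

An answer of "none" usually means the basename in the config file doesn't match the files on disk.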
