Thanks for the reply Simon. Could you also advise on how to feed the fastqc "contaminants.txt" data to the program?
-
Originally posted by albireo:
Thanks for the reply Simon. Could you also advise on how to feed the fastqc "contaminants.txt" data to the program?
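Code:
#!/usr/bin/perl
use warnings;
use strict;

# Convert the tab-delimited contaminant list into FASTA for bowtie-build.
open (IN,'contaminant_list.txt') or die $!;
open (OUT,'>','contaminant_list.fa') or die $!;

while (<IN>) {
    next if (/^\#/);      # skip comment lines
    chomp;
    next unless ($_);     # skip blank lines
    my ($name,$seq) = split(/\t+/);
    next unless ($seq);   # skip lines with no sequence column
    $name =~ s/\s+/_/g;   # replace whitespace in names with underscores
    print OUT ">$name\n$seq\n";
}

close OUT or die $!;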
bowtie-build -f contaminant_list.fa contaminants
You can then put the contaminants database into fastq_screen.
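Registering the new index could look something like the sketch below. It assumes the config file takes tab-separated `DATABASE` lines (a label, then the index basename), as in the example config shipped with fastq_screen. The label `Contaminants` and both paths are placeholders, and the exact column layout may differ between versions, so compare against the existing entries in your own config first.

```shell
#!/bin/sh
# Sketch only: register the bowtie index built above in a fastq_screen config.
# CONF and INDEX are placeholder paths (assumptions), not values from the thread.
CONF="${CONF:-./fastq_screen.conf}"
INDEX="${INDEX:-$PWD/contaminants}"   # basename that was given to bowtie-build

# Append one tab-separated DATABASE entry: label, then index basename.
printf 'DATABASE\t%s\t%s\n' "Contaminants" "$INDEX" >> "$CONF"

# Show the database entries now present in the config.
grep '^DATABASE' "$CONF"
```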
Hope this helps
-
Hi,
I have a problem when running fastq_screen on mouse paired-end ChIP-seq data. For all four of my libraries, I'm getting more than 99% no hits in the final fastq_screen graph.
Code:Mmus 99.96 0.02 0.02 0.00 0.00
The reads are 51 bp paired-end and I call the program as follows:
Code:fastq_screen --nohits --conf=fastq_screen.conf --paired <library>_2_sequence.fastq.gz <library>_1_sequence.fastq.gz
Separately, I used bwa to align the reads against mm9, and the sequences did align. This is the output of samtools flagstat for one of the four bam files:
Code:
78666176 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
76266600 + 0 mapped (96.95%:-nan%)
78666176 + 0 paired in sequencing
39333088 + 0 read1
39333088 + 0 read2
74908040 + 0 properly paired (95.22%:-nan%)
75455201 + 0 with itself and mate mapped
811399 + 0 singletons (1.03%:-nan%)
346117 + 0 with mate mapped to a different chr
130284 + 0 with mate mapped to a different chr (mapQ>=5)
-
Originally posted by albireo:
Hi,
Any idea on what I might be doing wrong? Apologies if I'm missing something really obvious.
If you can put a subset of your sequences up somewhere where we can see them (just 100k or so would be plenty) then we could take a look and see what's happening with your data.
-
Originally posted by simonandrews:
If you run the screen against just the first of your paired reads do you find any hits from that? If you don't then there's probably something odd going on in the search.
Last edited by albireo; 12-06-2012, 08:19 AM.
-
Hello,
The problem had to do with the gzip compression of my fastq files. When I unpacked the gz files and used the .fastq files as input instead, the program ran correctly. Any idea why that should be the case?
By the way, the .fastq files are very large, ranging from 7 to 12 GB. I'm actually using the sampling function in fastq_screen to operate on 5000000 reads only, but I completed one successful run without subsampling as well.
-
Sorry to take a while to get back to you. I tried your sequences and they worked fine on my system. I also tried gzipping them and that worked OK too.
When fastq_screen runs it simply pipes the original file through zcat, so in terms of the searches there's nothing different between normal and gzipped files. Could it simply be that you don't have zcat installed on your system? (It's a standard part of gzip, so it should be on most unix systems.)
Can you try running 'which zcat' and see if that finds anything? If it doesn't then this is the problem, but I'd have thought that it would have returned a more sensible error message.
-
Should be. A simple test would be to run:
zcat [some file which failed] > /dev/null
..and see if that produces any errors. You might also want to check if the disk you're using was getting close to being full. If you analyse a large file the temp files it makes could be pretty big. You could try running the screen with --subset 100000 to see if that works (which is what we'd normally do anyway).
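The checks suggested above could be bundled roughly as follows. This is only a sketch: `check_gz` is a made-up helper name, and `df` is pointed at the current directory on the assumption that the temp files end up there, so adjust it to wherever your run actually writes.

```shell
#!/bin/sh
# Sketch only: check_gz is a hypothetical helper, not part of fastq_screen.
check_gz() {
    file="$1"
    # 1. Is zcat on the PATH at all?
    if ! command -v zcat >/dev/null 2>&1; then
        echo "zcat not found on PATH" >&2
        return 2
    fi
    # 2. Does the file decompress cleanly from start to end?
    #    Any error messages from zcat are left visible on stderr.
    if ! zcat "$file" >/dev/null; then
        echo "zcat failed on $file" >&2
        return 1
    fi
    # 3. Free space on the current filesystem (assumed temp file location).
    df -h .
    echo "OK: $file"
}

# Usage: check_gz some_library_1_sequence.fastq.gz
```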
-
fastq screen search libraries
Hello all,
I am having a problem with fastq screen version 0.4.1 while trying to execute this command:
Code:fastq_screen --subset 1000000 --illumina1_3 --threads 22 --outdir /someOutDirPath --paired /pathTo/raw_data/SomeFastq_L008_R1_001.fastq /pathTo/raw_data/SomeRealtedFastq_R2_001.fastq
This is the output I get when I try to execute the command:
Code:
Reading configuration from '/fastq_screen_v0.4.1/fastq_screen.conf'
Using 8 threads for searches
No search libraries were configured at /fastq_screen_v0.4.1//fastq_screen line 119.
Any help would be appreciated.
Cheers, Chen
-
If you're using the latest release (0.4.1) then you'll need to pass the option --aligner bowtie2, since all of the indices you have are bowtie2. This is probably the reason it's failing.
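Applied to the command from the question, that might look like the sketch below. It only prints the command for review rather than running it. `screen_cmd` is a made-up helper name, and the paths are the placeholders from the original post.

```shell
#!/bin/sh
# Sketch only: screen_cmd is a hypothetical helper that prints the original
# command with the aligner stated explicitly. Paths are placeholders.
screen_cmd() {
    echo fastq_screen --aligner bowtie2 \
        --subset 1000000 --illumina1_3 --threads 22 \
        --outdir /someOutDirPath --paired "$1" "$2"
}

# Print the command for inspection; copy it to the prompt once the paths are real.
screen_cmd /pathTo/raw_data/SomeFastq_L008_R1_001.fastq \
           /pathTo/raw_data/SomeRealtedFastq_R2_001.fastq
```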
We should actually handle this better. I'll get it set up so that if your config file doesn't contain both bowtie1 and bowtie2 indices then it will automatically select the correct one for your run.
Let me know if this fixes things.