  • #16
    Thanks for the reply, Simon. Could you also advise on how to feed the FastQC "contaminants.txt" data to the program?



    • #17
      Originally posted by albireo
      Thanks for the reply, Simon. Could you also advise on how to feed the FastQC "contaminants.txt" data to the program?
      You'd need to convert it into a fasta file. The script below should do this:

      Code:
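      #!/usr/bin/perl
      use warnings;
      use strict;

      # Convert FastQC's contaminant_list.txt (name<TAB>sequence) into FASTA
      open (my $in, '<', 'contaminant_list.txt') or die "Can't read contaminant_list.txt: $!";
      open (my $out, '>', 'contaminant_list.fa') or die "Can't write contaminant_list.fa: $!";

      while (<$in>) {
        next if (/^\#/);          # skip comment lines
        chomp;
        next unless ($_);         # skip blank lines
        my ($name,$seq) = split(/\t+/);
        next unless ($seq);       # skip lines without a sequence column
        $name =~ s/\s+/_/g;       # spaces would truncate the FASTA ID
        print $out ">$name\n$seq\n";
      }
      close $in;
      close $out or die $!;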

      Once you have that, you can index it with bowtie-build using something like:

      bowtie-build -f contaminant_list.fa contaminants

      You can then add the resulting contaminants index to your fastq_screen.conf file as another database.
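      To make that last step concrete, here is a small shell sketch that appends the new index to a config file. It assumes the tab-separated "DATABASE <name> <index basename>" layout used in the example fastq_screen.conf; the helper name add_screen_db and the paths are hypothetical, so check the comments in your own copy of the config before relying on it.

```shell
# Hypothetical helper: register an index with fastq_screen by appending
# "DATABASE<TAB>name<TAB>index basename" to a config file.
# ASSUMPTION: this column layout matches your version's example config.
add_screen_db() {
  local conf="$1" name="$2" base="$3"
  # Don't add a second entry under the same database name
  if grep -Eq "^DATABASE[[:space:]]+${name}[[:space:]]" "$conf" 2>/dev/null; then
    echo "database '$name' is already listed in $conf" >&2
    return 1
  fi
  printf 'DATABASE\t%s\t%s\n' "$name" "$base" >> "$conf"
}

# Example (paths are placeholders):
# add_screen_db fastq_screen.conf Contaminants /path/to/contaminants
```

      The index basename is the same prefix given to bowtie-build (here "contaminants"), not the name of one of the .ebwt files.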

      Hope this helps



      • #18
        Hello Simon, it works perfectly, thank you. It actually detected adapter contamination in some of my libraries.



        • #19
          Hi,

          I have a problem when running fastq_screen on mouse paired-end ChIP-seq data. For all four of my libraries, I'm getting more than 99% no hits in the final fastq_screen graph.

          Code:
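          Library  %Unmapped  %One_hit_one_library  %Multiple_hits_one_library  %One_hit_multiple_libraries  %Multiple_hits_multiple_libraries
          Mmus     99.96      0.02                  0.02                        0.00                         0.00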

          The sequences I'm checking my libraries against are human, mouse, rat, fly, vectors and adapters. I downloaded the mouse mm9 FASTA from UCSC and generated the bowtie index with bowtie 0.12.7. The same version of bowtie is set in the fastq_screen.conf file.

          The reads are 51 bp paired-end, and I call the program as follows:

          Code:
          fastq_screen --nohits --conf=fastq_screen.conf --paired <library>_2_sequence.fastq.gz <library>_1_sequence.fastq.gz

          I also tried the --bowtie="--trim5 10" option, as well as --trim3, but neither changed the 99-100% no-hits result.

          Separately, I had used bwa to align the reads against mm9, and they did align. This is the output of samtools flagstat for one of the four BAM files:

          Code:
          78666176 + 0 in total (QC-passed reads + QC-failed reads)
          0 + 0 duplicates
          76266600 + 0 mapped (96.95%:-nan%)
          78666176 + 0 paired in sequencing
          39333088 + 0 read1
          39333088 + 0 read2
          74908040 + 0 properly paired (95.22%:-nan%)
          75455201 + 0 with itself and mate mapped
          811399 + 0 singletons (1.03%:-nan%)
          346117 + 0 with mate mapped to a different chr
          130284 + 0 with mate mapped to a different chr (mapQ>=5)
          Any idea on what I might be doing wrong? Apologies if I'm missing something really obvious.



          • #20
            Originally posted by albireo
            Hi,
            Any idea on what I might be doing wrong? Apologies if I'm missing something really obvious.
            I can't immediately see why this would be going wrong from the data you've provided. If you run the screen against just the first file of your paired reads, do you find any hits from that? If you don't, then there's probably something odd going on in the search. If you find hits when analysing each file as single-end, but not when you pair them, then either something is going wrong in the pairing of sequences or you have oddly separated pairs.
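            One way to test the "oddly separated pairs" idea before re-running the screen is to check that the two files list their mates in the same order. The hypothetical helper check_pairs below is a sketch that assumes uncompressed 4-line FASTQ records and mate IDs that differ at most by a trailing /1 or /2; for .gz input, decompress first.

```shell
# Hypothetical helper: walk two paired FASTQ files in step and report the first
# record whose read IDs disagree, or a length mismatch between the files.
# ASSUMPTIONS: uncompressed 4-line records; mate IDs are identical apart from
# an optional trailing /1 or /2 (anything after the first space is ignored).
check_pairs() {
  awk -v r2="$2" '
    {
      # Read R2 line by line alongside R1
      if ((getline line2 < r2) <= 0) {
        print "R2 ends early, at line " NR; bad = 1; exit
      }
      if (NR % 4 == 1) {              # header line of each record
        id1 = $1
        split(line2, f); id2 = f[1]
        sub(/\/[12]$/, "", id1); sub(/\/[12]$/, "", id2)
        if (id1 != id2) {
          print "mismatch in record " (NR + 3) / 4 ": " id1 " vs " id2
          bad = 1; exit
        }
      }
    }
    END {
      # Any R2 lines left over mean R1 was the shorter file
      if (!bad && (getline line2 < r2) > 0) { print "R1 ends early"; bad = 1 }
      exit bad
    }' "$1"
}

# Example: check_pairs lib_1_sequence.fastq lib_2_sequence.fastq
```

            It prints nothing and returns zero if every record pairs up; otherwise it prints the first problem and returns non-zero.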

            If you can put a subset of your sequences up somewhere we can see them (100k or so would be plenty), we can take a look at what's happening with your data.
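            A quick way to cut out such a subset is to take the first N reads, at 4 lines per FASTQ record. The sketch below assumes plain 4-line records and decides whether to decompress from the .gz file name; the function name fastq_head is hypothetical. Note that the head of a file is not a random sample, just a convenient chunk to share.

```shell
# Hypothetical helper: print the first N reads (default 100000) of a FASTQ file.
# ASSUMPTIONS: 4-line records; files ending in .gz are gzip-compressed.
fastq_head() {
  local file="$1" nreads="${2:-100000}"
  case "$file" in
    *.gz) gzip -dc -- "$file" ;;
    *)    cat -- "$file" ;;
  esac | head -n "$((nreads * 4))"
}

# Example: fastq_head lib_1_sequence.fastq.gz 100000 > lib_1_subset.fastq
```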



            • #21
              Originally posted by simonandrews
              If you run the screen against just the first of your paired reads do you find any hits from that? If you don't then there's probably something odd going on in the search.
              Hi Simon, no, I don't find any hits even when using just one of the paired files.
              Last edited by albireo; 12-06-2012, 08:19 AM.



              • #22
                Hello,

                The problem had to do with the gzip compression of my fastq files. When I unpacked the .gz files and used the .fastq files as input instead, the program ran correctly. Any idea why that should be the case?

                By the way, the .fastq files are very large, ranging from 7 to 12 GB. I'm actually using the sampling function in fastq_screen to operate on only 5,000,000 reads, but I also completed one successful run without subsampling.



                • #23
                  Sorry to take a while to get back to you. I tried your sequences and they worked fine on my system. I also tried gzipping them and that worked OK too.

                  When fastq_screen runs, it simply pipes the original file through zcat, so in terms of the searches there's no difference between normal and gzipped files. Could it simply be that you don't have zcat installed on your system? It's a standard part of gzip, so it should be on most Unix systems.
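                  Since both routes should feed identical reads to the search, one way to rule out a damaged archive is to check that the gzipped file decompresses to exactly the same bytes as the unpacked copy. A minimal sketch, using gzip -dc (which is what zcat runs with GNU gzip); same_content is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if a .gz file decompresses to exactly the
# same bytes as its unpacked copy. A truncated or corrupt archive makes the
# decompressed stream differ, so cmp fails.
same_content() {
  gzip -dc -- "$1" | cmp -s - "$2"
}

# Example: same_content lib_1_sequence.fastq.gz lib_1_sequence.fastq && echo identical
```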

                  Can you try running 'which zcat' and see whether it finds anything? If it doesn't, then this is the problem, although I'd have expected that to produce a more sensible error message.
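                  The same check can be run for every external program involved in one go. This sketch uses command -v, the POSIX counterpart of which; check_tools is a hypothetical helper name, and the program list in the example comment is only an illustration.

```shell
# Hypothetical helper: print where each named program is found on the PATH,
# and return non-zero if any of them is missing.
check_tools() {
  local tool found missing=0
  for tool in "$@"; do
    if found=$(command -v "$tool"); then
      printf '%-12s %s\n' "$tool" "$found"
    else
      printf '%-12s NOT FOUND\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_tools zcat gzip bowtie bowtie2
```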



                  • #24
                    Hi Simon, zcat is there. I'm not an expert on gzip, but I wonder whether there are alternative algorithms/encodings around?



                    • #25
                      Should be. A simple test would be to run:

                      zcat [some file which failed] > /dev/null

                      ...and see if that produces any errors. You might also want to check whether the disk you're using was getting close to full: if you analyse a large file, the temporary files it makes can be pretty big. You could also try running the screen with --subset 100000 to see if that works (which is what we'd normally do anyway).
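                      Both suggestions can be scripted. The sketch below uses gzip -t, which decompresses each file and verifies its checksum without writing any output (an alternative to zcat ... > /dev/null), and df -Pk to print the free space, in kilobytes, on the filesystem holding a given directory. The helper names check_gz and free_kb are hypothetical.

```shell
# Hypothetical helper: test each compressed file's integrity with gzip -t.
# Non-gzip input also counts as a failure.
check_gz() {
  local f status=0
  for f in "$@"; do
    if gzip -t -- "$f" 2>/dev/null; then
      echo "OK      $f"
    else
      echo "FAILED  $f"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical helper: free space (KB) on the filesystem holding a directory.
# The POSIX output format (-P) keeps each filesystem on a single line.
free_kb() {
  df -Pk "${1:-.}" | awk 'NR == 2 {print $4}'
}

# Example: check_gz *_sequence.fastq.gz; free_kb /path/to/output_dir
```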



                      • #26
                        Ok thanks a lot, will try this and report back.



                        • #27
                          fastq screen search libraries

                          Hello all,
                          I am having a problem with fastq_screen version 0.4.1 when trying to execute this command:

                          Code:
                          fastq_screen --subset 1000000  --illumina1_3 --threads 22 --outdir /someOutDirPath  --paired  /pathTo/raw_data/SomeFastq_L008_R1_001.fastq  /pathTo/raw_data/SomeRelatedFastq_R2_001.fastq

                          I have downloaded all the databases I needed and configured them in the fastq_screen.conf file.
                          This is the output I get when I try to execute the command:
                          Code:
                          Reading configuration from '/fastq_screen_v0.4.1/fastq_screen.conf'
                          Using 8 threads for searches
                          No search libraries were configured at /fastq_screen_v0.4.1//fastq_screen line 119.

                          From a quick peek at the code, it seems the "libraries" variable is never initialised. Maybe it needs to be hard-coded somehow, or configured within the configuration file?

                          Any help would be appreciated.
                          Cheers, Chen



                          • #28
                            Originally posted by chenz123
                            No search libraries were configured at /fastq_screen_v0.4.1//fastq_screen line 119.
                            From a quick peek at the code, it seems the "libraries" variable is never initialised. Maybe it needs to be hard-coded somehow, or configured within the configuration file?
                            It sounds like a problem with your configuration file. Could you message me the contents of your /fastq_screen_v0.4.1/fastq_screen.conf file and we can see what's going wrong.
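                            A quick way to see what a config file actually defines is to list its DATABASE lines. This is a sketch assuming entries of the form "DATABASE <name> <index basename>" with whitespace-separated columns and # marking comments; the helper list_screen_dbs is hypothetical, and fastq_screen's own parser may behave differently.

```shell
# Hypothetical helper: print "name<TAB>index basename" for each uncommented
# DATABASE line, and fail if there are none.
list_screen_dbs() {
  local key name base rest n=0
  # The "|| [ -n ... ]" also catches a last line with no trailing newline
  while read -r key name base rest || [ -n "$key" ]; do
    [ "$key" = "DATABASE" ] || continue   # skips comments and other settings
    printf '%s\t%s\n' "$name" "$base"
    n=$((n + 1))
  done < "$1"
  if [ "$n" -eq 0 ]; then
    echo "no DATABASE lines found in $1" >&2
    return 1
  fi
}

# Example: list_screen_dbs /fastq_screen_v0.4.1/fastq_screen.conf
```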



                            • #29
                              I've sent you a message containing the content of the configuration file.

                              Thanks for the help.
                              Cheers.



                              • #30
                                If you're using the latest release (0.4.1), you'll need to pass the option --aligner bowtie2, since all of the indices you have are bowtie2 indices. This is probably why it's failing.

                                We should actually handle this better. I'll set it up so that if your config file contains only bowtie1 or only bowtie2 indices, fastq_screen automatically selects the matching aligner for your run.
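                                Until then, you can tell which kind of index a basename points to from the files next to it: bowtie-build writes <basename>.1.ebwt and bowtie2-build writes <basename>.1.bt2 (large indices use .ebwtl and .bt2l). The sketch below, with the hypothetical helper index_type, reports which are present as a hint for the --aligner option; it does not read the config file itself.

```shell
# Hypothetical helper: report which aligner built the index files for a
# basename: "bowtie", "bowtie2", "bowtie+bowtie2", or "none".
index_type() {
  local base="$1" found=""
  if [ -e "$base.1.ebwt" ] || [ -e "$base.1.ebwtl" ]; then
    found="bowtie"
  fi
  if [ -e "$base.1.bt2" ] || [ -e "$base.1.bt2l" ]; then
    found="${found:+$found+}bowtie2"
  fi
  echo "${found:-none}"
}

# Example: [ "$(index_type /path/to/mm9)" = bowtie2 ] && echo "use --aligner bowtie2"
```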

                                Let me know if this fixes things.
