  • trim galore error

    Hi,
    I tried to run Trim Galore but received an error message (pasted below). It says "cutadapt ... failed at /usr/local/bin/trim_galore line 420". Does anyone know what it means?

    Thanks


    ###############################
    $ No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)


    SUMMARISING RUN PARAMETERS
    ==========================
    Input filename: Sample_C1.R1.fastq.gz
    Quality Phred score cutoff: 20
    Quality encoding type selected: ASCII+33
    Adapter sequence: 'GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG'
    Maximum trimming error rate: 0.1 (default)
    Minimum required adapter overlap (stringency): 20 bp
    Minimum required sequence length before a sequence gets removed: 20 bp
    Running FastQC on the data once trimming has completed
    Running FastQC with the following extra arguments: '--outdir ./'
    Output file will be GZIP compressed

    Writing final adapter and quality trimmed output to Sample_C1.R1_trimmed.fq


    >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG' from file Sample_C1.R1.fastq.gz <<<
    open3: exec of cutadapt -f fastq -e 0.1 -q 20 -O 20 -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG Sample_C1.R1.fastq.gz failed at /usr/local/bin/trim_galore line 420

    RUN STATISTICS FOR INPUT FILE: Sample_C1.R1.fastq.gz
    =============================================
    0 sequences processed in total
    Illegal division by zero at /usr/local/bin/trim_galore line 506.
    ^C
    [3]- Exit 255 trim_galore --fastqc_args "--outdir ./" -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG -s 20 Sample_C1.R1.fastq.gz

  • #2
    In order to run Trim Galore you need to install Cutadapt first and set the filepath to the cutadapt executable within Trim Galore. Have you done that?



    • #3
      Our IT admin installed Trim Galore. Let me ask him.
      But when I tried $cutadapt -h it said command not found. Is there any way I can check myself?

      I probably need to install FastQC too, I'm afraid?



      • #4
        Not sure if you have the permissions to install Cutadapt on your system, but once this is done you need to edit a line at the top of Trim Galore to specify the path.
        Running FastQC within Trim Galore is optional (Cutadapt is mandatory), but should you choose to install it you need to modify the path likewise
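One way to check this yourself without admin rights is to ask whether the executables are visible on your PATH. The sketch below is only an illustration in Python, using the standard-library `shutil.which`; the helper name `find_tool` is made up here, and `cutadapt` and `fastqc` are simply the usual command names, which may differ on your system.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    # Tool names are assumptions; adjust them to your installation.
    for tool in ("cutadapt", "fastqc"):
        path = find_tool(tool)
        if path is None:
            print(f"{tool}: not found on PATH")
        else:
            print(f"{tool}: {path}")
```

A tool reported as not found may still be installed somewhere outside PATH; in that case its full path is what needs to be set in Trim Galore.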



        • #5
          I actually don't.

          but I will pass this to him. thanks.



          • #6
            Hi Felix,

            Our IT fixed the problem. It is running now. Thanks for the pointers.

            I have a few more questions for you after reading the documentation:
            1. -s option: what would be a good overlap length for trimming 50 bp single-end (SE) reads? I chose 20; what would you use?
            2. -e option: if we use -s 20, does 0.1 mean that two mismatches are allowed?

            Can Trim Galore run FastQC without doing any trimming? I just wonder whether someone who doesn't want to trim can still use FastQC through Trim Galore. I tried $fastqc -h, but the command doesn't seem to be recognized.

            thanks!




            • #7
              It just finished running. It is quite fast. I trimmed off about 5.6% of reads, which I think is about right. The FastQC report says I have 5.1% TruSeq adaptor index 3 (100% match). Given that we allow 10% mismatches and -s 20, it seems right.

              I used fastx_clipper a few days ago; it trimmed off 20% of reads, which I think is a bit too much.

              Just one slight problem: I can't seem to unzip the FastQC report. It is in a folder, and the file was named x.gz (I didn't name it). When I tried $gunzip x.gz, it said "gzip: x.gz: not in gzip format". Can you help please?

              Thanks for the great program!



              • #8
                Hi JQL,

                (1) The stringency you want to use depends a bit on your application. The Cutadapt default is 3, meaning that if it finds 3 bases at the 3' end that look like adapter it will trim them. For BS-Seq applications, for which Trim Galore was initially intended, any kind of adapter sequence is detrimental to mapping, methylation calling, or both. I have thus lowered the default to 1 so that it trims off virtually anything that looks like adapter. Choosing a value of 20 will probably remove hardly any adapter sequence at all. For non-bisulfite applications you can probably get away with 3 or so, but I would personally use the default of 1.

                (2) The calculation is sound.

                About your last comment, if you don't want to do any trimming and run FastQC, why not run FastQC alone? Your IT guys should know where it is installed (maybe you can try a "locate fastqc"?)
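The arithmetic behind (2) can be sketched as follows. This assumes that the number of tolerated errors is the error rate multiplied by the length of the matched adapter stretch, rounded down, which is how Cutadapt's documentation describes it as far as I can tell; the function name `allowed_mismatches` is just for illustration.

```python
import math
from fractions import Fraction


def allowed_mismatches(error_rate: str, overlap: int) -> int:
    """Errors tolerated for a match of `overlap` bases, rounded down.

    The rate is passed as a string (e.g. "0.1") so it can be converted
    exactly, avoiding floating-point rounding surprises.
    """
    return math.floor(Fraction(error_rate) * overlap)


if __name__ == "__main__":
    for overlap in (3, 10, 20):
        print(overlap, allowed_mismatches("0.1", overlap))
```

Under that assumption, -e 0.1 tolerates two errors across a 20-base match, while a short 3-base overlap at the end of a read tolerates none.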



                • #9
                  FastQC does not produce any .gz files. Instead it produces one folder with all files in the correct folder structure, and a .zip archive of it. The two outputs should look something like this:

                  H1.fq_fastqc
                  H1.fq_fastqc.zip
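If a file's name and contents disagree, as seems to be the case with the x.gz above, the contents can be checked directly. The following is a rough sketch using only the Python standard library; `sniff_archive` is a made-up helper name, and whether x.gz really is a mis-named ZIP archive is only a guess, not something this thread confirms.

```python
import zipfile


def sniff_archive(path: str) -> str:
    """Guess the container type from the file contents, not the file name."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":        # gzip streams start with these two bytes
        return "gzip"
    if zipfile.is_zipfile(path):    # checks for a valid ZIP directory record
        return "zip"
    return "unknown"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, sniff_archive(name))
```

If a file is reported as "zip", unzip rather than gunzip is the tool to try on it.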



                  • #10
                    Yep, he just installed FastQC. I will try later.

                    I don't think I quite get the -s option yet. Sorry.

                    The TruSeq adaptor index from Illumina has 63 bases, as follows:
                    GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG

                    My RNA-seq is 50 SE. So, let's say I use -s 3, and I have a read like this: 5' xxxx....xxxxTTG 3' (x: any base). Since it matches three bases at the 3' end (TTG, the last three bases of the adaptor), is this read trimmed? And in another case, if a read looks like this: xxx...xxxGCC (GCC, from the middle of the adaptor), will this one also be trimmed?

                    If this is true, would it be a bit too non-specific? Or have I misunderstood?


                    thanks
                    John
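One rough way to put a number on that specificity concern: if bases were uniformly random and independent (an idealisation, real reads are not), the last k bases of a read would match a given k-base adapter start purely by chance with probability (1/4)^k. The sketch below just evaluates that for exact matches; the function name and the one-million-read library size are made up for illustration.

```python
def chance_match_probability(overlap: int) -> float:
    """Probability that `overlap` random bases equal a fixed sequence exactly."""
    return 0.25 ** overlap


if __name__ == "__main__":
    reads = 1_000_000  # hypothetical library size
    for overlap in (1, 3, 20):
        p = chance_match_probability(overlap)
        print(f"overlap {overlap:>2}: p = {p:.3g}, "
              f"about {p * reads:.3g} of {reads:,} reads by chance")
```

Under this idealisation a short overlap costs some reads a few genuine bases, whereas an overlap of 20 almost never matches by chance but also misses adapters that only partly extend into the read.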



                    • #11
                      Your first example is correct: if the read is 5' xxxx....xxxxTTG 3' and -s is 3, the last 3 bases would be trimmed. If it was 5' xxxx....xxxxTTT 3', the read would not be trimmed, as the 3'-most bases need to overlap. Similarly, if some part of the adapter is found within the read, the read is only trimmed at that position if the full adapter sequence is found in the rest of the read (this is where -e applies). So your xxx...xxxGCC would not be trimmed.

                      As another note: you need to select the other end of the primer to remove, here GATCGGAAGAG. As Illumina fragments are A-tailed, you need to add an additional A at the start, resulting in the sequence: AGATCGGAAGAG.

                      You will also find that this is already the adapter sequence Trim Galore uses by default. I have tried to make Trim Galore a pretty straightforward tool that does the right thing automatically, so if you just run

                      ./trim_galore your_file.fq

                      you should find that it is doing exactly the right thing in one simple command. I would only modify the parameters if I would like it to do something extra special.
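To make the overlap rule concrete, here is a simplified sketch of 3' adapter trimming in Python. It is not Cutadapt's actual algorithm: it only accepts exact matches (no -e error tolerance, no indels), and it assumes the adapter is written 5' to 3' so that the 3' end of the read is compared with the start of the adapter, as with AGATCGGAAGAG above. The function name `trim_3prime_adapter` and the example reads are made up.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 1) -> str:
    """Remove an exactly matching 3' adapter (simplified illustration)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")

    # Case 1: the full adapter occurs inside the read; cut from there on.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: only the start of the adapter was sequenced. Look for the
    # longest read suffix that equals an adapter prefix of >= min_overlap.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    return read  # nothing adapter-like found at the 3' end


if __name__ == "__main__":
    adapter = "AGATCGGAAGAG"
    for read in ("ACGTACGTAGA", "ACGTACGTAGT", "ACGTACGTA"):
        for stringency in (1, 3, 20):
            print(read, stringency, trim_3prime_adapter(read, adapter, stringency))
```

In this sketch a read ending in AGA is trimmed at a stringency of 1 or 3 but left alone at 20, and a read ending in a single A is only trimmed at a stringency of 1.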



                      • #12
                        I just ran FastQC. It shows a great improvement over the untrimmed reads for several parameters, except the sequence duplication level, which increased from 61% to 71%.

                        I will try -s 3 and see the difference.



                        • #13
                          Thanks Felix! That's a very nice explanation.



                          • #14
                            Hi Felix,

                            After running trim_galore and FastQC, I compared the FastQC reports from before and after. I have a couple of questions. I ran trim_galore with all default settings.

                            1. In one of my 8 samples, the TruSeq adaptor reads were not removed completely. See the attached PDF file. That adaptor read starts with 5' AGAGCxxx..., which overlaps with the last five bases of the default 13 bp adaptor sequence. I can't figure out why this one was not trimmed while the others were. Can you explain? In the other 7 samples, TruSeq adaptor reads are no longer reported in the FastQC report.

                            2. After trimming, in all 8 samples, I start to see big changes in the last 3 bases. The sequence and GC content behave strangely. Do you suggest removing the last three bases from all reads? See pics.

                            thanks John


                            • #15
                              The one sequence that escaped differs slightly from the standard adapter sequence and was thus probably missed. I'm not sure why this is, but you could either run it again using the end of that very sequence, or just not bother, because a full-length adapter sequence is not going to align anyway.

                              The ratio of the last 3 bases changes due to the trimming: if you remove e.g. the A from the read, then the A content will go down while the content of the other bases goes up. The bias at the start looks very much like the bias from random hexamer priming, which is quite typical and has been discussed in several threads already. Overall I think you should be good to start aligning your reads now!
