
  • Trying to remove nextera transposase sequence using cutadapt and fastqc

    Hello everybody,

    We recently did a NextSeq 500 run.
    I analyzed it with FastQC and got a failure on the adapter content plot. The Nextera transposase sequence content seems too high (see the attached files).

    So I tried to trim my reads with cutadapt. Because my data are paired-end, I used the following commands:

    Python-2.7.9/python ~/cutadapt-1.7.1/bin/cutadapt -q 30 -b CTGTCTCTTATACACATCTGACGCTGCCGACGA --minimum-length 20 --overlap=5 -o tmpl1.1.fastq --paired-output tmpl1.2.fastq myRead_S1_L001_R1_001.fastq myRead_S1_L001_R2_001.fastq

    Python-2.7.9/python ~/cutadapt-1.7.1/bin/cutadapt -b CTGTCTCTTATACACATCTCCGAGCCCACGAGAC --minimum-length 20 -q 30 --overlap=5 -o myReads_S1_L001_R2_001.trimmed.fastq --paired-output myReads_S1_L001_R1_001.trimmed.fastq tmpl1.2.fastq tmpl1.1.fastq

    Then I checked my results like this:

    /FastQC/fastqc myReads_S1_L001_R1_001.trimmed.fastq -t 4 -o FASTQTRY/

    In the end I got the same plots, both for adapter content and for quality!

    How is that possible? How can some reads still have quality scores below 30?
    Last edited by ClemBuntu; 01-21-2015, 08:04 AM.

  • #2
    Hmm, have you tried repeating the run with -a instead of -b? When we trim Nextera adapters we run Cutadapt with -a CTGTCTCTTATA, which does the job nicely (in fact I did this only an hour ago). If you want to use Trim Galore (a wrapper around Cutadapt), you can use the attached version (v0.3.8) with the option --nextera. Usage is simply:
    Code:
    trim_galore --paired --nextera myRead_S1_L001_R1_001.fastq myRead_S1_L001_R2_001.fastq
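    To illustrate what 3' adapter trimming with -a amounts to, here is a minimal Python sketch. It is not Cutadapt's implementation: it assumes exact matching only (Cutadapt tolerates mismatches), and the function name, the example insert sequence and the `min_overlap` default are made up for illustration.

```python
# Simplified illustration of 3' adapter trimming (exact matching only).
# Cutadapt itself tolerates mismatches; this sketch does not.

NEXTERA_PREFIX = "CTGTCTCTTATA"  # sequence quoted in the post above


def trim_3prime_adapter(read, adapter=NEXTERA_PREFIX, min_overlap=5):
    """Remove the adapter and everything after it from the 3' end of a read.

    Hypothetical helper, not part of Cutadapt or Trim Galore.
    """
    # Case 1: the whole adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with only the first k bases of the adapter.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    # No adapter found: leave the read unchanged.
    return read


insert = "ACGTACGTAC"  # made-up genomic sequence
print(trim_3prime_adapter(insert + "CTGTCTCTTATACACATCT"))  # full adapter present
print(trim_3prime_adapter(insert + "CTGTCTC"))              # partial adapter at the end
print(trim_3prime_adapter(insert))                          # no adapter
```

    In all three cases the made-up insert sequence is what remains, because anything from the start of the adapter onwards is discarded.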


    • #3
      Hi,
      I used the -b option because it handles both 5' and 3' adapters, whereas -a handles only 3' adapters.

      By the way, the adapters were removed fine:
      Cutadapt output:
      === Adapter 1 ===

      Sequence: CTGTCTCTTATACACATCTGACGCTGCCGACGA; Type: variable 5'/3'; Length: 33; Trimmed: 2306073 times.

      === Adapter 1 ===

      Sequence: CTGTCTCTTATACACATCTCCGAGCCCACGAGAC; Type: variable 5'/3'; Length: 34; Trimmed: 2036017 times.
      So the results I obtained with FastQC are very strange, right? Did I use it wrong? Maybe I forgot an option, but that seems odd.

      I've tried to use trim_galore. I changed my .bashrc to make it work, like this:
      alias cutadapt='/home/myhome/Python-2.7.9/python /home/myhome/cutadapt-1.7.1/bin/cutadapt'
      export PATH=$PATH:/home/myhome/FastQC/

      And I got this error:

      >>> Now performing quality (cutoff 30) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATA' from file myReads_S1_L001_R1_001.fastq <<<
      Traceback (most recent call last):
      File "/home/myhome/cutadapt-1.7.1/bin//cutadapt", line 9, in ?
      from cutadapt.scripts import cutadapt
      File "/home/myhome/cutadapt-1.7.1/cutadapt/__init__.py", line 9
      except ImportError as e:
      ^
      SyntaxError: invalid syntax
      I got the same error when I tried to run cutadapt with an old Python (v2.4), but the alias in my .bashrc should fix that...
      Last edited by ClemBuntu; 01-22-2015, 02:08 AM.



      • #4
        What do you get when you run:
        Code:
        alias cutadapt='/home/myhome/Python-2.7.9/python /home/myhome/cutadapt-1.7.1/bin/cutadapt'
        and then
        Code:
        cutadapt
        You need to get this command to work, or it won't work within Trim Galore. You could also supply '/home/myhome/Python-2.7.9/python /home/myhome/cutadapt-1.7.1/bin/cutadapt' (or rather a version of it that is working) as the path to Cutadapt in one of the first lines of Trim Galore.
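        One likely reason the alias is not enough: shell aliases are normally expanded only by the interactive shell that defines them, so a program that launches `cutadapt` itself looks the name up on PATH instead. An alternative to editing Trim Galore is a small wrapper script placed in a directory on PATH. The sketch below writes such a wrapper and checks that a PATH lookup finds it; the helper name is hypothetical, the two paths are the ones from the post, and a temporary directory stands in for a real bin directory.

```python
import os
import shutil
import tempfile


def write_wrapper(bin_dir, name, command):
    """Create an executable shell wrapper called `name` in `bin_dir`.

    Hypothetical helper: the wrapper simply runs `command` and passes
    all of its own arguments through.
    """
    path = os.path.join(bin_dir, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
        handle.write('exec %s "$@"\n' % command)
    os.chmod(path, 0o755)  # make the wrapper executable
    return path


# Paths taken from the post above; the temporary directory stands in for
# a directory that would be added to PATH (for example ~/bin).
bin_dir = tempfile.mkdtemp()
wrapper = write_wrapper(
    bin_dir,
    "cutadapt",
    "/home/myhome/Python-2.7.9/python /home/myhome/cutadapt-1.7.1/bin/cutadapt",
)

# Unlike an alias, the wrapper is visible to an ordinary PATH lookup.
found = shutil.which("cutadapt", path=bin_dir)
print(found)
```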



        • #5
          Running the 'cutadapt' alias or the full path gives me the same result.

          Anyway, I changed the Trim Galore source code as you suggested and now it works.

          According to FastQC the Nextera adapter was removed successfully.
          Now my second question. I ran Trim Galore like this:
          ~/trim_galore --paired -q 30 --nextera myReads_S1_L001_R1_001.fastq myReads_S1_L001_R2_001.fastq -o TrimGaloreTry/
          I then ran FastQC on the output files and obtained the quality plots I attached.
          My question is: with the "-q 30" option, all reads should have Phred scores greater than or equal to 30, but that's not what FastQC shows me. Why?

          (Edit: I also ran cutadapt "manually" and FastQC gave me the same plots.)
          Last edited by ClemBuntu; 01-22-2015, 05:56 AM.



          • #6
            Glad to hear that you got the adapter trimming sorted. I suppose the reason why there are still qualities lower than 30 in the file is that Cutadapt doesn't truncate a sequence as soon as a base falls below the threshold; instead it uses the following algorithm:
            Code:
            -q CUTOFF, --quality-cutoff=CUTOFF
                                    Trim low-quality ends from reads before adapter
                                    removal. The algorithm is the same as the one used by
                                    BWA (Subtract CUTOFF from all qualities; compute
                                    partial sums from all indices to the end of the
                                    sequence; cut sequence at the index at which the sum
                                    is minimal) (default: 0)
            So if there is only a single dip in a sequence and all base calls after it are fine again, the low-quality base may still survive trimming. Does that make sense?
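            The help text above can be turned into a short Python sketch to see why a base below the cutoff can survive. This follows the quoted description literally (subtract the cutoff, sum from each index to the end, cut where the sum is smallest); real implementations may differ in details. The function name and the quality values are made up for illustration.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    Literal reading of the quoted help text: subtract `cutoff` from every
    quality, compute the sum from each index to the end of the read, and
    cut at the index where that sum is minimal. Hypothetical helper.
    """
    best_index = len(qualities)  # cutting here removes nothing
    best_sum = 0                 # sum over the empty suffix
    running = 0
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running < best_sum:
            best_sum = running
            best_index = i
    return best_index


# Made-up Phred scores: one dip to 15 in the middle, a bad tail at the end.
quals = [35, 36, 34, 15, 35, 36, 37, 20, 10]
cut = quality_trim_index(quals, cutoff=30)
kept = quals[:cut]
print(cut)        # 7: only the last two bases are removed
print(min(kept))  # 15: the isolated dip is still in the trimmed read
```

            The bad tail (20, 10) is cut off, but the single Q15 base in the middle stays because the good bases after it outweigh it in the sum.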



            • #7
              Ok that makes sense thanks.
              But do you think it's "normal" that the boxplot whiskers go this low, i.e. as low as 14 for positions 84-150 bp?
              I have used cutadapt before and the boxplots I obtained were much better than this one, but this is my first NextSeq run, so maybe the quality is lower than on HiSeq and MiSeq?



              • #8
                Hmm, hard to tell. But reads that long always show a similar decline towards the 3' end, and I don't think it's much different for MiSeq, to be honest. I would just go ahead with your analysis and see how that goes. You could always come back and perform something more stringent afterwards.



                • #9
                  Your read qualities are fine. They could be a lot worse. There were some systematic problems with R2 qualities due to NextSeq reagent problems; you can search these forums for more detail.

                  You are getting this much transposase though because your insert size is too small. You are essentially sequencing all the way to the other end's adaptor. You need larger fragments for sure.
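                  The read-through effect described here is simple arithmetic, sketched below in Python. The 150 bp read length is an assumption based on the position range mentioned earlier in the thread, the insert lengths are made up, and the function name is hypothetical.

```python
def adapter_bases_in_read(read_length, insert_length):
    """Number of bases at the 3' end of a read that come from the adapter.

    When the insert is shorter than the read, the sequencer runs off the
    end of the insert and into the adapter on the other side.
    """
    return max(0, read_length - insert_length)


READ_LENGTH = 150  # assumed read length

for insert_length in (80, 150, 300):  # made-up insert sizes
    n = adapter_bases_in_read(READ_LENGTH, insert_length)
    print(insert_length, n)
```

                  With an 80 bp insert, 70 of the 150 bases in each read would be adapter; with inserts of 150 bp or more, none would be.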

                  And most people don't use read trimming anymore because most modern aligners do read soft-clipping.



                  • #10
                    Originally posted by fkrueger
                    Hmm, have you tried repeating the run with -a instead of -b? When we trim Nextera adapters we run Cutadapt with -a CTGTCTCTTATA which does the job just nicely (in fact I have done this only an hour ago). If you wanted to use Trim Galore (a wrapper around Cutadapt) you can use the version attached (v0.3.8) with the option --nextera. Usage is simply:
                    Code:
                    trim_galore --paired --nextera myRead_S1_L001_R1_001.fastq myRead_S1_L001_R2_001.fastq
                    I second this comment - TrimGalore successfully removed the Nextera adapters (that could not be removed by cutadapt).
