  • LizBent
    Member
    • Jan 2012
    • 31

    Process to remove primers, adapters, etc. from Illumina data

    Hi all,

    I have some Illumina paired-end (100 bp) read data and have seen this very useful page: http://intron.ccam.uchc.edu/groups/t...Sequences.html

    It lists adapter sequences and primer sequences. My question is: do I have to remove any sequences besides the ones on this page and the primers I used to amplify my cDNA, such as the index primers? Is the index primer sequence the same as the PE sequencing or PCR primer sequence given in the link above, with just the index tag added?

    Should I worry about reverse complementing all of these and removing those sequences as well?

    Liz
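To look for adapters or primers in both orientations, you first need their reverse complements. A minimal Python sketch, assuming plain A/C/G/T/N sequences (the helper name `reverse_complement` is hypothetical, not taken from any tool mentioned in this thread):

```python
# Minimal sketch: reverse-complement an adapter or primer so that both
# orientations can be searched for. Assumes plain A/C/G/T/N input; other
# IUPAC ambiguity codes are passed through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # 13-mer shared by the Illumina adapters
    print(adapter, "->", reverse_complement(adapter))
```

For the 13-mer discussed later in this thread, AGATCGGAAGAGC, the reverse complement is GCTCTTCCGATCT.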
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    It is always a good idea to start by running QC on your data, and there are several tools for this.

    FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) are good ones to start with; the FASTX-Toolkit also includes utilities that remove adapters if they are present in your sequences. Another alternative is cutadapt (http://code.google.com/p/cutadapt/).
    Last edited by GenoMax; 01-27-2012, 05:45 AM.
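FastQC and the FASTX-Toolkit compute far more than this, but as a rough illustration of what a basic QC pass looks at, here is a minimal Python sketch. It assumes unwrapped four-line FASTQ records and Phred+33 quality encoding (older Illumina pipelines used Phred+64, so check which one your data uses); `read_fastq` and `basic_stats` are hypothetical helper names, not part of either tool.

```python
import io
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) from unwrapped 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between/after records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def basic_stats(records: Iterable[Record], offset: int = 33) -> dict:
    """Read count, mean read length and mean base quality.

    offset=33 assumes Phred+33 encoding; use 64 for older Illumina data.
    """
    n_reads = n_bases = qual_sum = 0
    for _, seq, qual in records:
        n_reads += 1
        n_bases += len(seq)
        qual_sum += sum(ord(c) - offset for c in qual)
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "mean_quality": qual_sum / n_bases if n_bases else 0.0,
    }


# Toy input: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n!!!!!!\n")
print(basic_stats(read_fastq(example)))
```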


    • LizBent
      Member
      • Jan 2012
      • 31

      #3
      Hi, I'm aware of the FASTX-Toolkit and the other tools mentioned. My question is not how to remove adapters and primers, but whether I also need to reverse complement the adapters and primers and remove those reverse-complemented sequences.
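One way to answer this empirically is to count how many of your reads contain the adapter as given versus its reverse complement. A minimal sketch, assuming exact matches and uppercase reads; `orientation_counts` is a hypothetical helper and the reads below are made-up toy sequences, not real data:

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def orientation_counts(seqs: Iterable[str], adapter: str) -> Counter:
    """Classify reads by whether they contain the adapter as given, its
    reverse complement, both, or neither (exact matches only)."""
    rc = adapter.translate(_COMPLEMENT)[::-1]
    counts = Counter()
    for seq in seqs:
        fwd, rev = adapter in seq, rc in seq
        if fwd and rev:
            counts["both"] += 1
        elif fwd:
            counts["forward"] += 1
        elif rev:
            counts["reverse_complement"] += 1
        else:
            counts["neither"] += 1
    return counts


# Toy reads for illustration only.
reads = [
    "TTTTTAGATCGGAAGAGCACAC",  # adapter as given, near the 3' end
    "GCTCTTCCGATCTAAAAAAAAA",  # reverse complement at the 5' end
    "ACGTACGTACGTACGTACGTAC",  # no adapter
]
print(orientation_counts(reads, "AGATCGGAAGAGC"))
```

If the reverse-complement count on real reads is negligible compared with the forward count, searching for the reverse complement adds little.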


      • fkrueger
        Senior Member
        • Sep 2009
        • 627

        #4
        We find that using the first 13bp of the Illumina adapter ('AGATCGGAAGAGC') efficiently removes adapter contamination for both paired-end files (the adapters on both sides share this sequence before they fork, and any of the Illumina multiplex barcodes should be further downstream of that).
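The idea of trimming at the adapter can be sketched in Python as below. This is a simplified illustration, not Cutadapt's actual algorithm: it only does exact matching (Cutadapt also tolerates some mismatches, controlled by its -e error rate), and `trim_3prime_adapter` with its `min_overlap` parameter is a hypothetical stand-in for the -O minimum-overlap behaviour.

```python
def trim_3prime_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter and everything after it (exact matching only).

    1. If the full adapter occurs anywhere, cut at its first occurrence.
    2. Otherwise, if a prefix of the adapter at least `min_overlap` bases
       long matches the end of the read, cut that partial adapter off.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest partial overlap first, down to min_overlap bases.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


adapter = "AGATCGGAAGAGC"
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTT", adapter))  # full adapter
print(trim_3prime_adapter("ACGTACGTACAGATC", adapter))          # 5-base partial
print(trim_3prime_adapter("ACGTACGTACGTAG", adapter))           # 2 bases: kept
```

The third read ends in "AG", the first two adapter bases, but that is shorter than the minimum overlap, so it is left alone; a smaller minimum overlap trims more aggressively and removes more genuine sequence that happens to match by chance.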

        A typical command for Cutadapt could be

        ./cutadapt -f fastq -O $stringency -q 20 -a AGATCGGAAGAGC input_file.fastq

        $stringency sets the minimum overlap with the adapter required before sequence is trimmed from the 3' end; the default is 3, I believe. This command removes low-quality sequence (-q 20) as well as adapter sequence from your FastQ file.
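The -q trimming follows the BWA-style algorithm described in the Cutadapt documentation: walking in from the 3' end, subtract the cutoff from each base quality and keep a running sum, stop once that sum turns positive, and cut at the position where it was lowest. A minimal Python sketch, assuming Phred+33 qualities; `quality_trim_index` and `quality_trim` are hypothetical names:

```python
from typing import Tuple


def quality_trim_index(qual: str, cutoff: int, offset: int = 33) -> int:
    """Return the index to cut at; the read keeps positions [0, index)."""
    running = 0
    lowest = 0
    cut = len(qual)  # default: trim nothing
    for i in range(len(qual) - 1, -1, -1):
        running += (ord(qual[i]) - offset) - cutoff
        if running > 0:
            break  # good-quality sequence has outweighed the bad tail
        if running < lowest:
            lowest = running
            cut = i
    return cut


def quality_trim(seq: str, qual: str, cutoff: int, offset: int = 33) -> Tuple[str, str]:
    """Trim a read and its quality string from the 3' end."""
    cut = quality_trim_index(qual, cutoff, offset)
    return seq[:cut], qual[:cut]


# 'I' is Q40 and '#' is Q2 under Phred+33; cutoff 20 as in -q 20.
print(quality_trim("ACGTACGT", "IIIII###", 20))
print(quality_trim("ACGTAC", "II#I##", 20))
```

Note that a single high-quality base inside a low-quality tail (the second example) does not stop trimming, because it only partly offsets the negative sum accumulated so far.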

        Be careful with the option that discards reads which become too short after trimming, though: dropping a read from one file but not its mate from the other throws off the read-by-read order of the paired-end files, which many aligners require.

        I hope this helps


        • LizBent
          Member
          • Jan 2012
          • 31

          #5
          Thanks! I'm trying to figure out how to QC data before trying to use it in Velvet and Trinity.


          • LizBent
            Member
            • Jan 2012
            • 31

            #6
            I've tried the 13-mer adapter sequence with the FASTX-Toolkit (Clip), and it didn't remove any reads. However, when I use the full primer sequences, reads are clipped and removed. I'm going to try clipping the 13-mer with Cutadapt, but I thought I would mention it in case anyone can explain the difference between how these programs work.
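To see what is actually in the reads, independent of either tool, you can tabulate where the 13-mer starts in each read. This does not reproduce the logic of the FASTX-Toolkit or Cutadapt; it is a hypothetical diagnostic helper (`adapter_start_positions`) that uses exact matching only, applied here to toy reads:

```python
from collections import Counter
from typing import Iterable


def adapter_start_positions(seqs: Iterable[str],
                            adapter: str = "AGATCGGAAGAGC") -> Counter:
    """Histogram of 0-based positions where the adapter first occurs.

    Reads with no full-length exact match are counted under the key None.
    Partial adapters at the very 3' end (shorter than the full 13-mer) are
    NOT detected by this check.
    """
    hist = Counter()
    for seq in seqs:
        pos = seq.find(adapter)
        hist[pos if pos != -1 else None] += 1
    return hist


toy_reads = ["TTTTTAGATCGGAAGAGCAA", "AGATCGGAAGAGC", "ACGTACGT"]
for pos, n in sorted(adapter_start_positions(toy_reads).items(),
                     key=lambda kv: (kv[0] is None, kv[0] or 0)):
    print(pos, n)
```

If real reads show many hits but a tool clips none, that points at the tool's settings rather than the data.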


            • rahularjun86
              Member
              • Jan 2011
              • 58

              #7
              Hi,
              You could also use a simple grep to get a rough idea of how many reads contain the adapter sequence (the three commands count lines that start with, end with, or contain it, respectively):
              grep -c "^GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
              grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG$" *.fastq
              grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
              You can also try the 13-mer sequence (AGATCGGAAGAGC).
              Thanks,
              Rahul
              Rahul Sharma,
              Ph.D
              Frankfurt am Main, Germany
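A Python version of the grep counts above, restricted to the sequence line of each record. grep -c counts matching lines anywhere in the file, so header or quality lines could in principle match too; this sketch assumes unwrapped four-line FASTQ records, and `count_adapter_hits` is a hypothetical name:

```python
import io
from itertools import islice
from typing import Dict, Iterable


def count_adapter_hits(fastq_lines: Iterable[str], adapter: str) -> Dict[str, int]:
    """Count sequence lines that start with, end with, or contain the adapter."""
    counts = {"starts": 0, "ends": 0, "contains": 0}
    # Every 4th line starting from the 2nd is a sequence line.
    for line in islice(fastq_lines, 1, None, 4):
        seq = line.rstrip("\n")
        counts["starts"] += seq.startswith(adapter)
        counts["ends"] += seq.endswith(adapter)
        counts["contains"] += adapter in seq
    return counts


toy = io.StringIO(
    "@r1\nAGATCGGAAGAGCTTTT\n+\n" + "I" * 17 + "\n"
    "@r2\nTTTTAGATCGGAAGAGC\n+\n" + "I" * 17 + "\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
print(count_adapter_hits(toy, "AGATCGGAAGAGC"))
```

On a real file this would be used as `with open("reads.fastq") as fh: print(count_adapter_hits(fh, "AGATCGGAAGAGC"))`, where reads.fastq is a placeholder file name.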
