
  • Process to remove primers, adapters, etc. from Illumina data

    Hi all,

    I have some Illumina paired-end (100 bp) read data and have seen this very useful page: http://intron.ccam.uchc.edu/groups/t...Sequences.html

    It lists several adapter and primer sequences. My question is: do I need to remove any sequences other than the ones on this page and the primers I used to amplify my cDNA, such as the index primers? Are the index primer sequences the same as the PE sequencing or PCR primers given in the link above, with just the index tag added?

    Should I worry about reverse complementing all of these and removing those sequences as well?

    Liz
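A minimal Python sketch (an illustration added here, not part of the original thread) of what "reverse complementing" an adapter means: complement each base, then reverse the string. It uses the 13-mer adapter prefix suggested later in this thread as the example; the function name `reverse_complement` is illustrative and only handles A, C, G, T and N.

```python
# Map each base to its complement; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the whole string.
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    adapter_prefix = "AGATCGGAAGAGC"  # 13-mer suggested later in this thread
    print(adapter_prefix, "->", reverse_complement(adapter_prefix))
    # prints: AGATCGGAAGAGC -> GCTCTTCCGATCT
```

Both orientations can then be searched for in the reads to check which one actually occurs before deciding what to trim.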

  • #2
    It is always good to start by running QC on your data, and there are several tools for this.

    FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) are good ones to start with. The FASTX-Toolkit also includes utilities that remove adapters if they are present in your sequences. Another alternative is cutadapt (http://code.google.com/p/cutadapt/).
    Last edited by GenoMax; 01-27-2012, 05:45 AM.
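As a rough complement to FastQC's report, here is a minimal Python sketch for estimating what fraction of reads contain an adapter sequence exactly. It assumes an uncompressed FASTQ file with exactly four lines per record and ignores sequencing errors and partial adapters at the read ends; `read_fastq` and `adapter_fraction` are hypothetical helpers, not part of FastQC, the FASTX-Toolkit or cutadapt.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input; an incomplete trailing record is ignored
        header, seq, _plus, qual = record
        yield header, seq, qual


def adapter_fraction(lines: Iterable[str], adapter: str) -> float:
    """Fraction of reads whose sequence contains `adapter` as an exact substring."""
    total = hits = 0
    for _header, seq, _qual in read_fastq(lines):
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0
```

An open file handle is an iterable of lines, so with a FASTQ file opened as `fh`, `adapter_fraction(fh, "AGATCGGAAGAGC")` gives the fraction of reads containing the 13-mer anywhere.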



    • #3
      Hi, I'm aware of the FASTX-Toolkit and the other tools mentioned. My question is not how to remove adapters and primers, but rather whether I also need to reverse complement the adapters and primers and remove the reverse-complemented sequences.



      • #4
        We find that using the first 13 bp of the Illumina adapter ('AGATCGGAAGAGC') efficiently removes adapter contamination from both paired-end files. The adapters on both sides share this sequence before they fork, and any of the Illumina multiplex barcodes should lie further downstream of it.
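Because the same 13 bp start is shared by the adapters on both sides, one search string can serve for both files. A minimal Python sketch of the simplest version of this, assuming a complete, error-free copy of the 13-mer is present in the read; `clip_at_adapter` is a hypothetical helper, and real trimmers such as cutadapt also handle mismatches and adapters that are only partially present at the 3' end.

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # 13 bp shared start of the adapters (from this post)


def clip_at_adapter(seq: str, adapter: str = ADAPTER_PREFIX) -> str:
    """Remove the first full occurrence of `adapter` and everything after it."""
    pos = seq.find(adapter)
    # str.find returns -1 when the adapter is absent: leave the read unchanged.
    return seq if pos == -1 else seq[:pos]
```

For example, a read `ACGTACGT` followed by the 13-mer and further adapter bases is clipped back to `ACGTACGT`.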

        A typical Cutadapt command would be:

        ./cutadapt -f fastq -O $stringency -q 20 -a AGATCGGAAGAGC input_file.fastq > trimmed.fastq

        $stringency sets the minimum overlap with the adapter required before sequence is trimmed from the 3' end; I believe the default is 3. Together with -q 20, which trims low-quality bases from the 3' end, this command removes poor-quality sequence as well as adapters from your FastQ file.
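To illustrate what the minimum-overlap setting ($stringency above) controls, here is a simplified Python sketch of 3' adapter trimming, assuming exact matching only. Cutadapt additionally tolerates a fraction of mismatches (its -e error-rate option) and applies quality trimming, neither of which is modelled here; `trim_3prime_adapter` is a hypothetical name.

```python
def trim_3prime_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim a 3' adapter, allowing a partial adapter at the read end.

    Exact-match simplification:
    1. If the whole adapter occurs in the read, cut from its first occurrence.
    2. Otherwise, find the longest read suffix that equals an adapter prefix
       and cut it, but only if that overlap is at least `min_overlap` bases.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest possible partial overlap first, down to min_overlap
    # (never below 1, so seq[:-k] is always a real cut).
    start = min(len(adapter) - 1, len(seq))
    for k in range(start, max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq
```

With `min_overlap=3`, a read ending in `AGA` (the first three adapter bases) loses those three bases; with `min_overlap=4` it is left alone. Low values catch more genuine partial adapters but also trim reads whose last few bases match the adapter by chance.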

        Just be careful with the option that discards sequences that become too short: dropping a read from one file but not its mate disrupts the read-by-read order of the paired-end files, which many aligners require.
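One way to avoid this problem is to make the length filter a per-pair decision rather than a per-file one. A minimal Python sketch, assuming both mates' trimmed sequences are already held in two lists in the same order; `filter_pairs` is a hypothetical helper, and a real pipeline would stream the two FASTQ files record by record instead.

```python
from typing import List, Tuple


def filter_pairs(
    reads1: List[str], reads2: List[str], min_length: int
) -> Tuple[List[str], List[str]]:
    """Drop a pair whenever EITHER mate is shorter than min_length.

    Both mates are kept or discarded together, so record i of the first
    output list is still the mate of record i of the second.
    """
    if len(reads1) != len(reads2):
        raise ValueError("paired files must contain the same number of reads")
    kept1: List[str] = []
    kept2: List[str] = []
    for r1, r2 in zip(reads1, reads2):
        if len(r1) >= min_length and len(r2) >= min_length:
            kept1.append(r1)
            kept2.append(r2)
    return kept1, kept2
```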

        I hope this helps.



        • #5
          Thanks! I'm trying to work out how to QC my data before using it in Velvet and Trinity.



          • #6
            I've tried the 13-mer adapter sequence with the FASTX-Toolkit (Clip), and it didn't remove any reads. However, when I use the full primer sequences, reads are clipped and removed. I'm going to try clipping with the 13-mer in cutadapt, but I thought I would mention this in case anyone can explain how the two programs differ.



            • #7
              Hi,
              One could also use a simple grep to get a rough idea of how often the adapter sequence occurs:
              grep -c "^GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
              grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG$" *.fastq
              grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
              You can also try the 13-mer sequence.
              Thanks,
              Rahul
              Rahul Sharma,
              Ph.D
              Frankfurt am Main, Germany
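grep -c counts matching lines, so in principle a quality line could also be counted. A minimal Python sketch of the same three counts (adapter at the start, at the end, anywhere) that inspects only the sequence line of each record, assuming uncompressed FASTQ with exactly four lines per record; `count_adapter_hits` is a hypothetical helper.

```python
from typing import Dict, Iterable


def count_adapter_hits(lines: Iterable[str], adapter: str) -> Dict[str, int]:
    """Count reads with `adapter` at the start, at the end, or anywhere.

    Mirrors the three grep -c patterns (^ADAPTER, ADAPTER$, ADAPTER) but only
    looks at line 2 of each 4-line FASTQ record, so header and quality lines
    can never produce a match.
    """
    counts = {"start": 0, "end": 0, "anywhere": 0}
    for line_no, line in enumerate(lines):
        if line_no % 4 != 1:  # skip header, '+' and quality lines
            continue
        seq = line.rstrip("\r\n")
        if seq.startswith(adapter):
            counts["start"] += 1
        if seq.endswith(adapter):
            counts["end"] += 1
        if adapter in seq:
            counts["anywhere"] += 1
    return counts
```

Like grep -c, each read is counted at most once per category, even if the adapter occurs in it more than once.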
