  • Illumina adapter trimming

    Hi All,

    I am a total newbie in this field. I have to assemble RNA-seq data, and before that I need to trim the sequences. I have 100 bp Illumina paired-end reads in two files. I was also given the P5 and P7 adapter sequences:
    5-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC-(insert)-ACCTTAAGAGCCCACGGTTCCTTGAGGTCAGTGXXXXXXTAGAGCATACGGCAGAAGACGAAC-3

    But when I use, for example, grep -c 'AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC' file_name to count the adapters, I cannot find a single one. I am a complete beginner, so I would appreciate it if anyone could help me out in detail. I tried reading the different answers on the forums, but I am confused.

    regards

  • #2
    You're pretty unlikely to find the entire adapter sequence in any of the reads. You'll want to look into something like cutadapt or trim_galore to make your life easier.
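
A rough way to check this by hand is to search for only the first few bases of the adapter, and of its reverse complement, instead of the full 55-base sequence. The sketch below is a minimal illustration, not how cutadapt or trim_galore work internally; the function names and the 12-base seed length are assumptions made for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_reads_with(reads, adapter, seed_len=12):
    """Count reads that contain the first seed_len bases of the adapter
    or the first seed_len bases of its reverse complement (exact match)."""
    forward_seed = adapter[:seed_len]
    reverse_seed = reverse_complement(adapter)[:seed_len]
    return sum(1 for read in reads if forward_seed in read or reverse_seed in read)


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record in a plain-text file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()
```

With a hypothetical file name, this could be called as count_reads_with(read_fastq_sequences("reads_1.fastq"), adapter). A short seed can also match genuine transcript sequence by chance, so the count is only a rough indication.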



    • #3
      Originally posted by dpryan
      You're pretty unlikely to find the entire adapter sequence in any of the reads. You'll want to look into something like cutadapt or trim_galore to make your life easier.
      Hey, thanks dpryan. I tried trim_galore today, but the FastQC report still lists these overrepresented sequences:

      Sequence  Count  Percentage  Possible Source
      ATGACACTCAAACAGGCATGCTCCACGGAATACCATGGAGCGCAAGGTGC 1155666 2.5956349017221085 No Hit
      AATGACGCTCGAACAGGCATGCCCCTCGGAATACCAAGGGGCGCAATGTG 225179 0.5057538004361837 No Hit
      AAGACACTCAAACAGGCATGCCTCTCGGAATACCAAGAGGCGCAAGGTGC 218636 0.4910581711090531 No Hit
      GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA 119619 0.2686652123616139 Illumina RNA PCR Primer (100% over 50bp)
      GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA 111925 0.251384428005364 Illumina RNA PCR Primer (100% over 50bp)
      AAATGACGCTCAAACAGGCATGCCCTTTGGAATACCAAAGGGCGCAATGT 104210 0.2340564774843778 No Hit
      ACAAACCCTTGTGTCGAGGGCTGACTTTCAATAGATCGCAGCGAGGGAGC 71881 0.16144528987673504 No Hit
      GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAA 46463 0.10435626248303084 Illumina RNA PCR Primer (100% over 50bp)

      So, do I need to remove all of these from my sequences as well? I hope I am not bothering you too much.

      Regards
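
One thing that can be checked directly from the sequences quoted in this thread is whether the "Illumina RNA PCR Primer" hits are related to the 5' adapter given in the first post. The short Python sketch below compares the first flagged sequence with the reverse complement of that adapter; the variable names are made up for this example.

```python
# 5' adapter sequence as given in the first post of this thread
P5 = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC"

# First sequence that FastQC labelled "Illumina RNA PCR Primer" above
overrepresented = "GATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA"


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


p5_rc = reverse_complement(P5)

# Does the flagged sequence begin with the reverse-complemented adapter?
print(overrepresented.startswith(p5_rc))
```

If the check prints True, the flagged reads begin with the reverse complement of the 5' adapter, which would also explain why a grep for the forward sequence finds nothing. It says nothing about the "No Hit" sequences, which need to be investigated separately.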



      • #4
        Adapter Trimming

        Hello.

        I have the same question.

        FastQC can report which sequences are overrepresented. Does this mean we need to remove them?

        How do you trim the adapters? You can use Trimmomatic's ILLUMINACLIP step, but I don't know how to create the adapter.fa file.

        Advice?
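
On the adapter.fa question: the file is an ordinary FASTA file, one header line starting with ">" followed by the sequence. Below is a minimal Python sketch that writes one; the entry names and the choice of sequences (the 5' adapter from the first post and its reverse complement) are assumptions for illustration only, and the right sequences depend on the library prep kit actually used.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def format_fasta(adapters):
    """Turn a {name: sequence} dict into FASTA-formatted text."""
    return "".join(f">{name}\n{seq.upper()}\n" for name, seq in adapters.items())


def write_adapter_fasta(path, adapters):
    """Write the adapters to a FASTA file at the given path."""
    with open(path, "w") as handle:
        handle.write(format_fasta(adapters))


# Illustrative content only: names are arbitrary labels chosen for this sketch.
P5 = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATC"
example_adapters = {
    "five_prime_adapter": P5,
    "five_prime_adapter_rc": reverse_complement(P5),
}

if __name__ == "__main__":
    write_adapter_fasta("adapter.fa", example_adapters)
```

Trimmomatic's ILLUMINACLIP step takes the path to such a file as its first argument. The Trimmomatic manual describes naming conventions for the entries that affect paired-end (palindrome) clipping, so it is worth checking before relying on arbitrary names like the ones above.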

        But this forum thread says that if you align with TopHat you don't need to cut the adapters:

        Application of sequencing to RNA analysis (RNA-Seq, whole transcriptome, SAGE, expression analysis, novel organism mining, splice variants)



        "If you ignore the adapters, using the alignment in TopHat actually filters the adapters out because
        they are not in the transcriptome, so when you are aligning your sequence to a transcriptome, the adapters will not get aligned
        because they are not in the transcriptome"



        • #5
          I have a relatively dumb question. Doesn't the MiSeq have an integrated adaptor trimming option?



          • #6
            The MiSeq has adapter trimming built in if you include it on the sample sheet. We generally do.



            • #7
              Hello,

              With the HiSeq 2000, what is the default for adaptor trimming? Is it "on" or "off" when generating FASTQ files?

              Thanks



              • #8
                To my knowledge, no trimming is performed by the HiSeq 2000. The HiSeq 2000 only calls the bases. Trimming the adapter sequences, if present, is a downstream step.

                Our local sequencing centre, with many HiSeq 2000 machines, never trims the adapters at the level of the HiSeq 2000. They do the trimming later, if necessary, with Trimmomatic.



                • #9
                  Ok, thanks. I called Illumina and the HiSeq 2000 machine can do trimming - it is a CLI flag on the FASTQ generation.

                  It turns out the adaptors were not trimmed.

                  - Regards



                  • #10
                    Good to know that the built-in software can do the trimming. I'd still rather have the raw data, and set the trimming parameters myself though.



                    • #11
                      It's a feature that's been in CASAVA and BCL2FASTQ for a few years, but it's never worked really well.



                      • #12
                        Trimmomatic: Which supplied illumina adapter file do I use?

                        Trimmomatic includes Illumina-supplied adapter fasta files:
                        NexteraPE-PE.fa
                        TruSeq2-SE.fa
                        TruSeq3-PE.fa
                        TruSeq2-PE.fa
                        TruSeq3-PE-2.fa
                        TruSeq3-SE.fa

                        I don't know which one to use. My data is paired end. When I asked the Principal Investigator, she gave me this response:

                        I'm not sure which of the adapter fa files it is. The index sequences are from Epicentre: http://www.epibio.com/docs/default-s...s.pdf?sfvrsn=8 all are from set 1. As for the adapter sequences, they are from the "ScriptSeq kit".


                        I have been using TruSeq3-PE.fa, but only because I read this is common for recently sequenced data. I read in another forum TruSeq2-PE.fa is pretty generic, and should work. I am not sure what to do, and would appreciate some guidance. Thanks.



                        • #13
                          Hi. Okay, so you are using Trimmomatic.

                          You first need to know which prep kit was used to generate the data. For my experiment we used an Illumina prep kit listed on their website, and you can easily download the list of adapters for it because the covariate file gives the prep kit name. We used the TruSeq2 prep kit (if I remember correctly).

                          The key is to understand how trimming works.

                          There are 3' and 5' adapter sequences, one attached to each end of the fragment. The universal adapter attaches to the 5' end of read 1, and read 1 also has the indexed adapter on its 3' end.

                          When read 1 is sequenced, the machine recognizes the universal adapter (because a primer binds to it), so read 1 skips the universal adapter. The actual read 1 is everything on the fragment that comes after the universal adapter, i.e. <read 1 content><adapter region>.

                          Then, since this is paired-end data, read 2 is sequenced, and it ends up with the reverse complement of the universal adapter. So if you know the universal adapter used in the experiment, simply compute its reverse complement and add that to the TruSeq-2.fa file if it is not already there.


                          Now, how do you remove the universal adapter?
                          Read 2 is generated by reading in the opposite direction (5' --> 3'); this time the machine detects the indexed adapter and skips it. So read 2 contains the fragment content followed by the reverse complement of the universal adapter.

                          So all you need to do when using Trimmomatic is:
                          1) make sure that Trimmomatic removes all the content that FOLLOWS the match, and not only the exact match itself
                          2) find the sequence common to all the indexed adapters and enter it into the adapter.fa file
                          3) enter the reverse complement of the universal adapter into the adapter.fa file.
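
Point 1 above (cut the adapter and everything after it, not just the matching bases) can be sketched in a few lines of Python. This is a simplified illustration using exact matching only; it is not Trimmomatic's algorithm, which tolerates mismatches, and the function name and min_overlap default are assumptions made for this example.

```python
def trim_3prime_adapter(read, adapter, min_overlap=5):
    """Remove the adapter and everything after it from the 3' end of a read.

    Exact matching only. Also handles an adapter that is cut off by the end
    of the read, as long as at least min_overlap bases of it are present.
    """
    # Case 1: the whole adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: only the start of the adapter fits before the read ends.
    # Try the longest possible partial match first.
    max_len = min(len(adapter) - 1, len(read))
    for length in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:len(read) - length]

    # No adapter found: return the read unchanged.
    return read
```

For example, a read made of 10 insert bases followed by the adapter and a poly-A tail would be cut back to the 10 insert bases. In real data the quality string has to be trimmed to the same length, which dedicated tools handle automatically.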

                          Check the alignment files after trimming.
