  • Method for Identifying Adapters??

    I have been given Illumina paired-end sequences for a chicken genome. I do not have access to the experiment protocol or the adapter files, nor do I know which Illumina machine or sequencing adapters were used.

    I'd like to perform a de novo assembly but would like to trim adapters first, if they haven't been trimmed already. I know there are a few tools I can use to trim my paired-end sequence files, like fastq-mcf and Trimmomatic, but I need to have an adapter .fa file or know which adapters were used (TruSeq2, 3, etc.). Also, is there a way to get an updated adapter list for Illumina sequences? The Trimmomatic program that I just installed only has Nextera, TruSeq2 and TruSeq3 .fa files.

    How can I discover which adapter sequences were used?

    Thank you for your time.

    PS - I considered looking at overrepresented sequences reported by FastQC, but I can only look for k-mers of 5 to 10 nucleotides in length. Below is a sample record from my forward file.

    FastQC reported the following for Encoding:
    Encoding Sanger / Illumina 1.9

    @HWI-ST531R:257:H7R0WADXX:2:1101:1240:1994 1:N:0:GTGAAA
    CTCACTTTGCCATGTTTCTATTTGAACAGATGATAATTTTACCTTTTGGGTGAAAAATAAAATACGCCTCTCTTTGCACTCTGTTATTTGCCAAAGTAGAG
    +
    @CCFFFFFHHHHHIIJJJJIJJJJJJJJJJJIIIIHJIEHIIJJJJIJJJBFHIJJJDIIJJEGHJJJIIJJJHHHHHHFFFFDFCDEEEEEDDDD@CDDC
    Last edited by nemis00; 10-17-2015, 05:32 PM.

  • #2
    All Illumina adapter sequences (legacy and current) are described in the Illumina Customer Sequence Letter: https://support.illumina.com/content...nce-letter.pdf


    • #3
      Thank you. One of my searches led me to the link you provided. What I am looking for is a tool or method to detect adapters from my sequence information. Does such a tool exist?

      The link does not help.


      • #4
        There are a few software tools for trimming adapters. I hope a bioinformatician can provide a list and possibly recommend one for your particular application.


        • #5
          I mentioned that in my post. I have not been provided adapters. Instead of waiting an entire weekend, I am curious if there is a way to detect adapters. I'm sorry, you are not very helpful. But thank you for your time. I should have also added the tag "bioinformatics."


          • #6
            There are two common options for preparing whole genome sequencing libraries for Illumina platforms: Nextera or TruSeq kits. Some reads in the library will come from short inserts (insert length shorter than read length), so the 3’ end of those reads will contain adapter sequence. You can search a subset of your reads for the partial Nextera adapter sequence (AGATGTGTATAAGAGACAG) and the partial TruSeq adapter sequence (CTCTTCCGATCT) to see which kit was used. Check the reverse complements as well (CTGTCTCTTATACACATCT and AGATCGGAAGAG), since that is the orientation in which read-through adapter appears in the reads. If the library was prepared with another method, you will have to get that information from the person who prepared it.
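A minimal sketch of the search described above, using only `head` and `awk`. It counts how many of the first N reads contain a motif, and separately how many contain its reverse complement, because adapter read-through shows up in the reads as the reverse complement of the oligo sequence. The function name `count_motif_reads` and the file name `r1.fq` are illustrative, and the sketch assumes an uncompressed FASTQ file with four lines per record.

```shell
#!/bin/sh
# Count reads (among the first N) that contain a motif or its reverse complement.
# Prints two tab-separated numbers: <hits as given> <hits as reverse complement>.
# Assumption: plain, uncompressed FASTQ with exactly four lines per record.
count_motif_reads() {
    fastq=$1              # path to a FASTQ file (hypothetical, e.g. r1.fq)
    motif=$2              # partial adapter sequence, upper case A/C/G/T
    nreads=${3:-100000}   # number of reads to inspect
    head -n "$((nreads * 4))" "$fastq" |
        awk -v m="$motif" '
            function revcomp(s,   i, c, out) {
                out = ""
                for (i = length(s); i >= 1; i--) {
                    c = substr(s, i, 1)
                    out = out (c == "A" ? "T" : c == "T" ? "A" : c == "C" ? "G" : c == "G" ? "C" : "N")
                }
                return out
            }
            BEGIN { rc = revcomp(m) }
            NR % 4 == 2 {                 # sequence line of each record
                if (index($0, m))  fwd++
                if (index($0, rc)) rev++
            }
            END { printf "%d\t%d\n", fwd, rev }
        '
}

# Partial adapter sequences quoted in the post above.
NEXTERA=AGATGTGTATAAGAGACAG
TRUSEQ=CTCTTCCGATCT

# Example (uncomment and point at your own file):
# printf 'Nextera\t%s\n' "$(count_motif_reads r1.fq "$NEXTERA")"
# printf 'TruSeq\t%s\n'  "$(count_motif_reads r1.fq "$TRUSEQ")"
```

Whichever motif gives clearly more hits is the likelier kit; near-zero counts for both suggest either very few short inserts or a different library prep.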


            • #7
              There is only paired-end sequence information. It would be nice to have libraries with known fragment sizes to work with, but it is all shotgun sequence information, without restriction-enzyme-treated libraries.

              On my original question:

              I think I found a tool worth trying - skewer.

              I'll map to a reference using bowtie2, although I'm not too happy with the reference yet, and compare the percentage of reads mapped for the original sequences versus the adapter-removed sequences. I'll compare coverage as well. Thank you for your efforts.
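One way to make that before/after comparison concrete is to save the alignment summary that bowtie2 writes to stderr and pull out the "overall alignment rate" line from each run. The sketch below assumes the two summaries were redirected to log files; the function names and the log file names are illustrative, not part of bowtie2.

```shell
#!/bin/sh
# Extract the overall alignment rate (as a number, without "%") from a saved
# bowtie2 summary. Assumption: the summary was captured from stderr, e.g.
#   bowtie2 -x ref -1 r1.fq -2 r2.fq -S raw.sam 2> raw.log
alignment_rate() {
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

# Print the rates for the untrimmed and trimmed runs and their difference.
compare_rates() {
    before=$(alignment_rate "$1")   # log from the original reads
    after=$(alignment_rate "$2")    # log from the adapter-trimmed reads
    awk -v b="$before" -v a="$after" \
        'BEGIN { printf "before=%s after=%s change=%+.2f\n", b, a, a - b }'
}

# Example (hypothetical log names):
# compare_rates raw.log trimmed.log
```

A small gain after trimming is consistent with light adapter contamination; a large gain suggests many short inserts in the library.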


              • #8
                When we had this problem, we just tried different adapters with the first 100,000 reads of one sample, and picked the adapters that produced the greatest number of trimmed bases.
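A rough sketch of that comparison without running a full trimmer: for each candidate adapter, take the first 100,000 reads, find the first exact occurrence of the adapter in each read, and count every base from there to the end of the read as "trimmed". This is only a crude stand-in for a real trimming tool (exact matches only, no partial matches at the read end), and the function name `rank_adapters` and the file name are illustrative. Candidates should be given in the orientation seen in reads.

```shell
#!/bin/sh
# Rank candidate adapter sequences by how many bases an exact-match trim
# would remove from the first 100,000 reads of a FASTQ file.
# Assumption: plain, uncompressed FASTQ with four lines per record.
rank_adapters() {
    fastq=$1; shift                   # remaining arguments are candidate sequences
    for adapter in "$@"; do
        head -n 400000 "$fastq" |     # 100,000 reads x 4 lines each
            awk -v a="$adapter" '
                NR % 4 == 2 {
                    p = index($0, a)                   # first exact occurrence, 0 if none
                    if (p) t += length($0) - p + 1     # bases from adapter start to read end
                }
                END { printf "%d\t%s\n", t, a }
            '
    done | sort -k1,1nr               # most trimmed bases first
}

# Example with the read-orientation forms of the two common adapters:
# rank_adapters r1.fq AGATCGGAAGAGC CTGTCTCTTATACACATCT
```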


                • #9
                  Ideally you should have minimal adapter contamination if this is a good library.
                  BBMap includes a file with standard Illumina adapter sequences in the "resources" directory. You can scan with multiple adapter files using "bbduk.sh" (the trimming program in BBMap). While there, you could look at the "reformat.sh" section and see whether you can use the method for detecting unknown primers/k-mers, if you expect non-standard adapters.


                  • #10
                    If you have paired reads, and enough of the reads have inserts shorter than read length, you can identify adapter sequences with BBMerge, like this (they will be printed to adapters.fa):

                    bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa

                    But as GenoMax mentioned, normally, you can find the adapter sequence used in the adapters.fa file included with BBMap. In that case, you can do this:

                    bbduk.sh in1=r1.fq in2=r2.fq k=23 ref=adapters.fa stats=stats.txt

                    stats.txt will then list the names of adapter sequences found, and their frequency.
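To pull the most frequent adapter names out of stats.txt, something like the following may help. It assumes the file layout is header lines starting with `#` followed by tab-separated rows of name, read count and percentage; check your own stats.txt before relying on that. The function name `top_adapters` is illustrative.

```shell
#!/bin/sh
# List the N reference sequences with the most matching reads in a BBDuk
# stats file. Assumption: header lines start with "#", and data rows are
# tab-separated as <name> <reads> <percent>.
top_adapters() {
    stats=$1
    n=${2:-5}                         # how many rows to show
    grep -v '^#' "$stats" |
        sort -t "$(printf '\t')" -k2,2nr |   # sort by read count, descending
        head -n "$n"
}

# Example:
# top_adapters stats.txt 3
```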


                    • #11
                      Just two small additions to GenoMax's and Brian's answers:

                      You may already see significant adapter contamination in the FastQC report. For example, if you have significant 5' adapter contamination, you will see the adapter sequence in the "Per base sequence content" plot.

                      If using bbduk for adapter identification, keep in mind that you can also add non-standard adapters simply by adding them as FASTA records to the adapters.fa.gz file in the resources folder. By reducing the k-mer size (k parameter) you may also find partially trimmed adapters. It can sometimes also help to allow 1 error in the k-mer matching (hdist parameter). Last but not least: you will probably obtain the best result if you subset your data (e.g. use the first 500k reads via the reads parameter) and experiment with the settings to get the highest precision.
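A small sketch of that kind of parameter sweep. It only prints the bbduk.sh command lines (one per k value, each with hdist=1 and limited to the first 500,000 reads) so they can be reviewed before anything runs. The parameters `k`, `hdist`, `reads`, `ref` and `stats` are the ones mentioned in this thread; the function name `bbduk_sweep`, the chosen k values, and the output file names are illustrative.

```shell
#!/bin/sh
# Print one bbduk.sh command per k-mer length for a quick adapter scan.
# Nothing is executed here; pipe the output to sh once it looks right.
bbduk_sweep() {
    r1=$1     # forward reads
    r2=$2     # reverse reads
    ref=$3    # adapter FASTA (e.g. the adapters.fa shipped with BBMap)
    for k in 23 19 15; do             # example k values, shorter k finds partial adapters
        echo "bbduk.sh in1=$r1 in2=$r2 ref=$ref k=$k hdist=1 reads=500000 stats=stats_k$k.txt"
    done
}

# Example:
# bbduk_sweep r1.fq r2.fq adapters.fa        # review the commands
# bbduk_sweep r1.fq r2.fq adapters.fa | sh   # then run them
```

Comparing the resulting stats files side by side shows how sensitive the adapter calls are to k; shorter k-mers with hdist=1 will also produce more chance matches, so treat low-count hits with caution.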
