
  • fastq-mcf paired end adapter trimming

    Hi!

    I'm working on a transcriptome de novo assembly, and I'm having some difficulty removing adapter contaminants from my 100 bp PE reads. According to FastQC, no more than ~1% of my reads contain overrepresented adapter sequences. For example:

    90743 reads 0.4% of reads TruSeq Adapter, Index 6 (100% over 50bp)
    AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG
    42478 reads 0.10% of reads TruSeq Adapter, Index 6 (100% over 50bp)
    GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC

    I've been trying to remove these sequences using fastq-mcf, since it seems to work well for PE reads.

    However, I keep getting way more reads removed than FastQC tells me are present. I've been playing around with the parameters, but without much improvement. I'm realizing now that the program trims partial adapter sequences from the ends, possibly even if just a few base pairs match the adapter sequence? Is this generally what adapter trimming does? What if I'm only interested in trimming out the overrepresented sequences that FastQC reports (full 50-65 bp adapter contaminants)?

    There are so many parameters for this program, and I'm not sure how to set them to remove only what I need... and now I'm not sure what exactly I'm "supposed" to be removing (full or partial adapter sequence matches...)




    Cheers

  • #2
    Are the adapter sequences below OK for trimming Illumina 250 bp PE DNA reads?
    >NexteraUniversalAdapter
    CTGTCTCTTATACACATCT
    >TruSeq_Read1
    AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
    >TruSeq_Read2
    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
    >Nextera_mate_pair_Read1
    CTGTCTCTTATACACATCT
    >Nextera_mate_pair_Read2
    AGATGTGTATAAGAGACAG
    >PolyA
    AAAAAAAAAAAAAAAAAAAAAAAAAAA
    >sv1
    AATGATACGGCGACCACCGAGATCTACACGCCTCCCTCGCGCCATCAG
    >sv2
    CAAGCAGAAGACGGCATACGAGAT
    >sv3_barcode
    CGGTCTGCCTTGCCAGCCCGCTCAG



    • #3
      @mmmm
      It depends whether the library prep was done using a TruSeq kit or a Nextera kit.
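If the kit is not known, one rough check is to count how many reads contain the start of each candidate adapter. The following is a minimal sketch, assuming the reads are already loaded as plain strings; the function name `count_adapter_hits`, the 12-base seed length, and the toy reads are illustrative assumptions and not part of fastq-mcf or FastQC.

```python
def count_adapter_hits(reads, adapters, seed_len=12):
    """Count reads that contain the first seed_len bases of each adapter."""
    counts = {name: 0 for name in adapters}
    for read in reads:
        for name, seq in adapters.items():
            if seq[:seed_len] in read:
                counts[name] += 1
    return counts

# Two of the candidate adapters listed in post #2
adapters = {
    "NexteraUniversalAdapter": "CTGTCTCTTATACACATCT",
    "TruSeq_Read1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
}

# Toy reads, made up for illustration only
reads = [
    "ACGTACGTACGT" + "CTGTCTCTTATACACATCT",  # insert followed by Nextera adapter
    "TTTTGGGGCCCCAAAA",                      # no adapter
    "GGGG" + "AGATCGGAAGAGCACACG",           # insert followed by TruSeq adapter
]

hits = count_adapter_hits(reads, adapters)
print(hits)
```

On real data, an adapter family that never appears in a sample of reads probably does not belong to the kit that was used. Exact seed matching ignores sequencing errors, so the counts are only a rough guide.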



      • #4
        @gevieir
        The sequences that FastQC lists as over-represented are based on the first 50 bases (5' end) in the reads.

        Adapter sequences are usually found towards the ends of the reads (3' end), when the insert is shorter than the read length and so you read into the adapter sequence.

        So it's not surprising that you would get many more reads trimmed.
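The read-through effect described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions (exact matching, no sequencing errors), and it is not how fastq-mcf itself is implemented; the helper names `simulate_read` and `adapter_overlap_at_3prime` and the `min_overlap` parameter are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # TruSeq_Read1 from post #2

def simulate_read(insert, adapter, read_len):
    """The sequencer reads the insert and, if it is short, continues into the adapter."""
    return (insert + adapter)[:read_len]

def adapter_overlap_at_3prime(read, adapter, min_overlap=3):
    """Length of the longest read suffix that exactly equals an adapter prefix."""
    max_k = min(len(read), len(adapter))
    for k in range(max_k, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return k
    return 0

# Insert (20 bp) shorter than the read length (25): the last 5 bases are adapter
short_read = simulate_read("ACGT" * 5, ADAPTER, read_len=25)
short_overlap = adapter_overlap_at_3prime(short_read, ADAPTER)

# Insert (40 bp) longer than the read length: no adapter is read
long_read = simulate_read("ACGT" * 10, ADAPTER, read_len=25)
long_overlap = adapter_overlap_at_3prime(long_read, ADAPTER)

print(short_read, short_overlap)
print(long_read, long_overlap)
```

The short-insert read carries only a 5-base adapter tail at its 3' end. A trimmer that accepts partial matches would clip it, while a report based on full-length adapter matches would not count it.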



        • #5
          The library prep was done using Nextera XT, so is there any harm in also including the TruSeq adapters?

          Also, after using fastq-mcf (to trim adapters and remove bases with quality < 20) and then checking with FastQC, I can still see lower-quality bases (Q score ~10-20) at the 3' end?
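One possible explanation, offered as an assumption rather than a description of fastq-mcf's actual algorithm: a simple 3' quality trimmer stops at the first base from the end that meets the threshold, so lower-quality bases just upstream of that base are kept. A minimal sketch with a hypothetical `trim_3prime_quality` function and made-up quality scores:

```python
def trim_3prime_quality(quals, threshold=20):
    """Return how many bases are kept after removing 3' bases below threshold."""
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return end

# Made-up Phred scores for one read, 5' to 3'
quals = [35, 34, 30, 12, 15, 28, 22, 8, 5]

kept = trim_3prime_quality(quals)
remaining = quals[:kept]
print(kept, remaining)
```

Here the last two bases are removed, but the bases with Q12 and Q15 survive because a base with Q22 sits downstream of them. Because trimmed reads also have different lengths, a per-position quality plot can still show scores below 20 near the 3' end.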
