  • Trimming adapters with Cutadapt

    Hi everyone, I'm having trouble figuring out which adapter sequence I should give to cutadapt or Trimmomatic to trim adapters from my FASTQ files.

    I have a set of FASTQ files, each containing 51 bp reads that start with an N followed by the rest of the read sequence. For each file I also have the index sequence (after demultiplexing) and the sequences of the two primers used. For instance, here are the first reads of one of my FASTQ files:

    @700470R:449:HVHH7BCXX:2:1101:1406:1948 1:N:0:GTGAAA
    NGCAGCATTGTACAGGGCTATGAAGATCGGAAGAGCACACGTCTGAACTCC
    +
    #<DDDEHIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIHIIIII
    @700470R:449:HVHH7BCXX:2:1101:1814:1992 1:N:0:GTGAAA
    NCCGGGTGCCGTAGGCTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCA
    +
    #<DDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH<DFGHIIIII
    @700470R:449:HVHH7BCXX:2:1101:2184:1885 1:N:0:GTGAAA
    NGGGGAGGTGGAGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTG
    +
    #<DDD<<CGHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
    The index sequence, as shown in the header, is GTGAAA. I also have the sequence of the SR primer, which is:

    5'AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTGGGA3'

    and the Index primer, which is:

    5'CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3'

    Substituting the NNNNNN with the index sequence given in the header of the corresponding FASTQ file, I would obtain the barcoded adapter used for sequencing, if I'm not mistaken.
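
    A quick way to sanity-check this is to reverse-complement the Index primer and look for its start in one of the reads. The snippet below is only a rough Python sketch: the helper name reverse_complement is made up for illustration, and it simply tests whether the first read above contains the first 12 bases of the reverse-complemented primer.

```python
# Illustrative sketch: compare the reverse complement of the Index primer
# (as printed in the post) with the first read shown above.
INDEX_PRIMER = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
FIRST_READ = "NGCAGCATTGTACAGGGCTATGAAGATCGGAAGAGCACACGTCTGAACTCC"

# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(COMPLEMENT)[::-1]


adapter_as_read = reverse_complement(INDEX_PRIMER)
prefix = adapter_as_read[:12]

print(adapter_as_read)
print(prefix, "found at position", FIRST_READ.find(prefix))
```

    If the prefix is found, the part of the Index primer that shows up in the reads is its reverse complement, and the NNNNNN index only appears after the first 34 adapter bases.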

    So here is where I start getting lost. After running FastQC, the overrepresented sequences list contained several sequences identified as Illumina Multiplexing PCR Primer, as if there were different adapters within the same FASTQ file.

    So, here is my question:

    Which sequence should I give cutadapt to trim in this case? Should I give more than one? In my opinion I should use the Index primer sequence, substituting the NNNNNN with the index sequence (barcode) of each FASTQ file, but I'm not sure whether this is correct, or whether I should include more sequences.

    I'm also not sure which parameters to use with cutadapt. Should I use both -a and -g so the adapter is trimmed from both ends of the read, or is -a alone enough? What error tolerance (-e) is applied to adapter matching if I don't specify one? Should I use wildcards (NNNNNN) to make one universal adapter, or give each sample's FASTQ file its own adapter containing that sample's barcode? Would quality trimming be useful, even though the average base quality in each read is very high (over 30)? And should I use the --trim-n option to trim flanking Ns from my reads?

    As you all see... quite lost I am...

  • #2
    Hi!

    In my experience with Trimmomatic, you can use the information about your platform to remove the universal adapters from your reads; there is no need to know the exact index sequence.

    You can find these universal adapters as part of the Trimmomatic package, or you can download them from here. Note that the adapter file to specify in your trimming procedure depends on the combination of platform and type of sequencing reads (paired-end or single-end). You can find Trimmomatic usage info here. It's very clearly explained, but write back here if you still have issues.

    • #3
      Also consider running a quality analysis of your FASTQ files before doing any trimming or moving further down the pipeline. Use FastQC for the quality analysis, then use Trim Galore to trim adapters from the reads and quality-trim them at the same time.

      • #4
        I think this is what you need:

        cutadapt -a AGATCGGAAGAG -o YOUR_FILE.trim1.fq --minimum-length 15 YOUR_FILE.fastq.gz

        You don't need to put in the index sequence, because cutadapt removes the adapter and everything 3' of it unless you specify otherwise. The --minimum-length option discards any reads that are shorter than the specified value after trimming. I think the default allowed error rate is 0.1, which should be fine.
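
        To make the effect of that command concrete, here is a minimal Python sketch of 3' adapter trimming followed by a minimum-length filter, applied to the three reads from the first post. It is a simplification and not how cutadapt is implemented: it only finds exact, full-length matches of the adapter, whereas cutadapt tolerates mismatches (the -e error rate) and can also match a partial adapter at the end of a read. The function name trim_3prime is hypothetical.

```python
# Simplified illustration of "-a AGATCGGAAGAG --minimum-length 15".
# Exact matching only; cutadapt itself uses error-tolerant matching.
ADAPTER = "AGATCGGAAGAG"
MIN_LENGTH = 15

reads = [
    "NGCAGCATTGTACAGGGCTATGAAGATCGGAAGAGCACACGTCTGAACTCC",
    "NCCGGGTGCCGTAGGCTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    "NGGGGAGGTGGAGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTG",
]


def trim_3prime(seq, adapter=ADAPTER):
    """Cut the read at the first exact adapter match, dropping everything 3' of it."""
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


trimmed = [trim_3prime(read) for read in reads]
# Reads that end up shorter than MIN_LENGTH are discarded.
kept = [seq for seq in trimmed if len(seq) >= MIN_LENGTH]

for seq in trimmed:
    status = "kept" if len(seq) >= MIN_LENGTH else "discarded"
    print(len(seq), seq, status)
```

        With these three reads the inserts come out at 23, 18 and 14 bases (still including the leading N), so the third read falls below the 15-base minimum and is dropped.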

        It does look like you can use the --trim-n option to remove the first N.
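
        As a rough picture of what trimming flanking Ns means, the Python sketch below strips N bases from both ends of a read and cuts the quality string to match. The function name trim_flanking_n is hypothetical, and the example read is a shortened version of the first read in the original post.

```python
# Illustration only: remove N bases from both ends of a read and keep the
# quality string in sync with the trimmed sequence.
def trim_flanking_n(seq, qual):
    """Return (seq, qual) with leading and trailing N bases removed."""
    start = len(seq) - len(seq.lstrip("N"))  # number of leading Ns
    end = len(seq.rstrip("N"))               # index just past the last non-N base
    if start >= end:                         # read consists only of Ns
        return "", ""
    return seq[start:end], qual[start:end]


# Shortened version of the first read from the original post.
seq, qual = trim_flanking_n("NGCAGCATTGTACAGG", "#<DDDEHIIIIIIIII")
print(seq)
print(qual)
```

        Ns in the middle of a read are left alone; only the flanking ones are removed.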

        It probably isn't necessary to quality trim, although you may want to quality filter before the adapter trimming. There is also probably no need for the -g option, unless this was a particular kind of library where you expect to see adapter sequence at the 5' end of the read.
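
        If you do want a quality filter, the idea can be sketched in a few lines of Python. This is a hypothetical mean-quality filter, not a cutadapt feature: the function names and the threshold of 30 are assumptions made for illustration, and the quality strings are assumed to be Phred+33 encoded.

```python
# Hypothetical mean-quality filter, assuming Phred+33 encoded quality strings.
def phred_scores(qual, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - offset for char in qual]


def passes_mean_quality(qual, min_mean=30.0):
    """Return True if the mean Phred score of the read is at least min_mean."""
    scores = phred_scores(qual)
    return bool(scores) and sum(scores) / len(scores) >= min_mean


# First 16 quality characters of the first read in the original post.
example_qual = "#<DDDEHIIIIIIIII"
print(phred_scores(example_qual))
print(passes_mean_quality(example_qual))
```

        Reads like the ones posted, with one low-quality leading base followed by high scores, would pass a mean threshold of 30.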
