  • cutadapt and barcode sequences

    Hi all,

    I'm trying to use cutadapt on 125 bp paired-end Illumina reads to remove any read that contains adapter sequence, keeping only the pairs that have no adapter or barcode read-through.

    I've run cutadapt with -a AGATCGGAAGAGC -A AGATCGGAAGAGC -m 125, which works fine for read 1.

    Any adapter read-through in read 2 will contain one of 24 sample barcodes before the common adapter. Because these barcodes vary and the reads will be used for SNP analysis, I need to remove them as well.

    My first thought was to use -A NNNNNNNNAGATCGGAAGAGC, but that removes every read 2 in the data set. I then tried running -A with the reverse complement of each of the 24 barcodes placed before the common adapter (i.e., 24 separate -A options), but this removed 34% of the data, so I'm not sure that was correct. Finally, I tried also including the reverse complement of the restriction site, which is read before both the barcode and the common adapter; that removed around 16% of reads.

    Could anyone advise on the best approach, or the correct way to do this, as I don't feel I am there yet? The cutadapt manual talks about using NNNNNNNN (we have an 8-base barcode) when the barcode is embedded within the adapter, but ours is read before the adapter, and, as described above, -A NNNNNNNNAGATCGGAAGAGC removes all reads. I also tried using NNNNNNNN to represent the variable barcode between the restriction site and the adapter, but I am not sure this is correct.

    Thanks very much for any help.
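Why the N-prefixed adapter removes every read 2 can be seen with a simplified model of 3' adapter detection. This sketch is not cutadapt's real algorithm (which uses error-tolerant alignment); it assumes error-free matching, treats N in the adapter as matching any base (as cutadapt does by default), and flags a read when the adapter starts anywhere in it with at least `min_overlap` bases overlapping (3 mirrors cutadapt's default minimum overlap). Because -m 125 discards a 125 bp read that loses even one base, any flagged read is dropped. The function names and the test read are hypothetical.

```python
from typing import Optional


def base_matches(adapter_base: str, read_base: str) -> bool:
    """An N in the adapter matches any read base; other bases must be identical."""
    return adapter_base == "N" or adapter_base == read_base


def find_adapter(read: str, adapter: str, min_overlap: int = 3) -> Optional[int]:
    """Return the leftmost read position where the adapter (or its 5' prefix) starts.

    Simplified model: error-free matching only. A match is either the whole
    adapter inside the read or a partial overlap with the read's 3' end of at
    least `min_overlap` bases. Returns None if there is no such match.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # all remaining overlaps are too short to count
        window = read[start:start + overlap]
        if all(base_matches(a, r) for a, r in zip(adapter, window)):
            return start
    return None


if __name__ == "__main__":
    read = "ACGT" * 31 + "A"  # hypothetical 125 bp read with no adapter in it
    for adapter in ("AGATCGGAAGAGC", "NNNNNNNNAGATCGGAAGAGC"):
        pos = find_adapter(read, adapter, min_overlap=3)
        trimmed_len = len(read) if pos is None else pos
        verdict = "kept" if trimmed_len >= 125 else "discarded by -m 125"
        print(f"{adapter}: match at {pos}, {verdict}")
```

In this model the last three bases of any read always overlap the adapter's leading NNN, so the N-prefixed adapter flags every read as long as the minimum overlap is 8 or less; only overlaps longer than the run of N's carry any information.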

  • #2
    Take a look at Sabre (https://github.com/najoshi/sabre). Otherwise, you could split the samples first and do the adapter trimming afterwards.

    • #3
      Hi genomax, I appreciate the response.

      Could you possibly expand on what you mean by splitting them?

      Many thanks

      Jamie

      • #4
        @Jamie: You have inline barcodes at beginning of read 2, which I presume will be used to separate the multiplexed samples?

        I was saying that you could use that information to first bin your R1/R2 reads into sample pools and then trim each pool separately. At that point you could remove the barcode with fixed-length trimming (followed by barcode trimming, if necessary) as a two-pass operation.
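The split-then-trim idea can be sketched as follows, assuming, as presumed in the post above, that each sample's 8 bp barcode sits at the very start of read 2 and must match exactly. The barcode-to-sample table, the `demultiplex` function, and the omission of quality strings are hypothetical simplifications; a real pipeline would trim qualities alongside bases, or use a dedicated tool such as Sabre.

```python
from typing import Dict, Iterable, List, Tuple

ReadPair = Tuple[str, str]  # (read 1 sequence, read 2 sequence); qualities omitted
BARCODE_LEN = 8  # the thread mentions 8 bp inline barcodes


def demultiplex(pairs: Iterable[ReadPair], barcodes: Dict[str, str]) -> Dict[str, List[ReadPair]]:
    """Bin read pairs by the first 8 bases of read 2, then trim those bases off.

    Binning and the fixed-length barcode trim happen together here; adapter
    trimming would be a second pass run separately on each sample's bin.
    """
    bins: Dict[str, List[ReadPair]] = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for r1, r2 in pairs:
        sample = barcodes.get(r2[:BARCODE_LEN])
        if sample is None:
            bins["unassigned"].append((r1, r2))  # no exact barcode match: left untrimmed
        else:
            bins[sample].append((r1, r2[BARCODE_LEN:]))  # fixed-length barcode trim
    return bins


if __name__ == "__main__":
    hypothetical_barcodes = {"ACGTACGT": "sample_01", "TTGGCCAA": "sample_02"}
    pairs = [
        ("AAAACCCC", "ACGTACGTGGGGTTTT"),
        ("CCCCGGGG", "TTGGCCAATTTTAAAA"),
        ("GGGGTTTT", "NNNNNNNNAAAACCCC"),
    ]
    for sample, binned in demultiplex(pairs, hypothetical_barcodes).items():
        print(sample, binned)
```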

        • #5
          Hi genomax,
          The samples have already been demultiplexed into libraries of 24 individuals, and we have 30 libraries, each with separate read 1 and read 2 files. I'm running adapter removal on the 60 files individually to remove the barcode/adapter read-through at the end of read 2, where the order of components in the read is: wanted genomic sequence, restriction site, 8 bp barcode, adapter sequence.
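Given the read-2 layout described above (genomic sequence, restriction site, barcode, adapter), one way to avoid N wildcards is to spell out one full read-through adapter per barcode. This sketch assumes, as the original poster did, that the restriction-site remnant and the barcode appear reverse-complemented in read 2 relative to how they are listed; the site `TGCA` and the barcodes are hypothetical placeholders, not the actual enzyme or sample sheet. Saved to a file, the FASTA output could be passed to cutadapt as `-A file:read2_adapters.fasta` or to bbduk.sh with `ref=`.

```python
from typing import Dict

COMMON_ADAPTER = "AGATCGGAAGAGC"  # common adapter used in the thread
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def read2_adapters(barcodes: Dict[str, str], site: str, common: str = COMMON_ADAPTER) -> Dict[str, str]:
    """One read-2 adapter per sample: site, then barcode, then common adapter.

    Assumes site and barcodes are listed in the orientation that must be
    reverse-complemented to appear in read 2 (check this against your protocol).
    """
    return {
        sample: reverse_complement(site) + reverse_complement(barcode) + common
        for sample, barcode in barcodes.items()
    }


def to_fasta(records: Dict[str, str]) -> str:
    """Render name -> sequence pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


if __name__ == "__main__":
    hypothetical_barcodes = {"sample_01": "AACGTTGC", "sample_02": "GGTACCAT"}  # 2 of 24
    adapters = read2_adapters(hypothetical_barcodes, site="TGCA")  # hypothetical site
    print(to_fasta(adapters), end="")
```

Listing every barcode explicitly means each adapter's 5' end carries real sequence, so a chance match needs the read end to agree with actual bases rather than with wildcards.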

          • #6
            I see. So you don't want to separate the 24 individuals further as discriminated by that inline barcode?

            Any trimming should be done on both files of a pair together, so that if a read is dropped from read 2, its mate is also removed from read 1 and the order of reads in the R1/R2 files stays in sync.

            I am not a regular cutadapt user, but I can think of how you could do this with bbduk.sh. You could put all possible combinations of the restriction site and the 8 bp barcode (is that one per individual?) in a file (or even add them to the adapters.fa file in the "resources" directory) and then use that as the reference to scan against your data.

            Let me know if I am still missing something.
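The rule of filtering both files together can be sketched like this: walk R1 and R2 in lockstep and keep a pair only if both mates pass, so the two outputs stay in the same order. The `keep` predicate shown (rejecting any read that contains the full common adapter) and the tuple-based FASTQ records are hypothetical simplifications. As far as I know, cutadapt already behaves this way when both files are given in one paired-end run (`-o`/`-p`), where by default a pair is discarded if either read fails a filter such as `-m`.

```python
from itertools import zip_longest
from typing import Callable, Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def read_id(header: str) -> str:
    """Read name without any comment field or /1, /2 mate suffix."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def filter_pairs(
    r1: Iterable[FastqRecord],
    r2: Iterable[FastqRecord],
    keep: Callable[[FastqRecord], bool],
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield (R1, R2) pairs in input order, keeping a pair only if both mates pass."""
    for rec1, rec2 in zip_longest(r1, r2, fillvalue=None):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        if keep(rec1) and keep(rec2):  # dropping one mate drops the other too
            yield rec1, rec2


def no_common_adapter(rec: FastqRecord) -> bool:
    """Hypothetical filter: reject reads containing the full common adapter."""
    return "AGATCGGAAGAGC" not in rec[1]


if __name__ == "__main__":
    r1 = [("@a/1", "ACGTACGT", "IIIIIIII"), ("@b/1", "TTTTCCCC", "IIIIIIII")]
    r2 = [("@a/2", "GGGGAAAA", "IIIIIIII"), ("@b/2", "CCAGATCGGAAGAGC", "I" * 15)]
    for mate1, mate2 in filter_pairs(r1, r2, no_common_adapter):
        print(mate1[0], mate2[0])
```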

            • #7
              Your solution sounds right to me, GenoMax. I think with cutadapt you can specify separate files of forward and reverse primer/adapter sequences, but the idea is the same. I use cutadapt with two FASTA files to process one of our sample types, since the kit vendor does the same thing and I'm comparing data with them.

              bbduk is on my list of things to investigate this year, though.

              • #8
                Hi Geno and Jessica,

                Apologies for the late thanks for your input on this query; it is very much appreciated.

                It seems the issue I had is that cutadapt's default minimum overlap between the read and the adapter is 3 bases, so the leading N's match the end of every read. So the answer is much as you suggest, but with an increased minimum overlap to reduce random 3-mer matches.

                Best to you both

                I've experimented with supplying the barcodes as a file, and I think the answer is to forget about using N's, put the multiple barcode-specific adapters in a file, and increase the minimum overlap above its default.
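The effect of raising the minimum overlap (cutadapt's -O/--overlap, default 3) can be estimated roughly. Under a simplified model where read bases are random, matching is error-free, and an N in the adapter matches anything, an overlap of k bases containing m non-N bases matches by chance with probability (1/4)^m. Summing over allowed overlap lengths and distinct adapter prefixes gives a union-bound estimate of the fraction of reads trimmed by chance. It ignores sequencing errors and cutadapt's error tolerance, so treat it as an order-of-magnitude guide rather than a prediction of the percentages reported above. The function name and the barcode-specific adapters in the example are hypothetical.

```python
from typing import Iterable


def chance_match_estimate(adapters: Iterable[str], min_overlap: int) -> float:
    """Union-bound estimate of the fraction of random reads trimmed by chance.

    For each overlap length k >= min_overlap, a read is hit if its last k bases
    match the first k bases of any adapter. Identical prefixes are counted once;
    each non-N base matches a random read base with probability 1/4.
    """
    adapter_list = list(adapters)
    longest = max(len(a) for a in adapter_list)
    total = 0.0
    for k in range(min_overlap, longest + 1):
        prefixes = {a[:k] for a in adapter_list if len(a) >= k}
        for prefix in prefixes:
            informative = sum(1 for base in prefix if base != "N")
            total += 0.25 ** informative
    return min(1.0, total)


if __name__ == "__main__":
    n_prefixed = ["NNNNNNNNAGATCGGAAGAGC"]
    # hypothetical: same 4-base site, two different 8 bp barcodes, then the common adapter
    barcode_specific = ["TGCAGCAACGTTAGATCGGAAGAGC", "TGCAATGGTACCAGATCGGAAGAGC"]
    for overlap in (3, 5, 8, 10):
        print(
            f"-O {overlap}: N-prefixed {chance_match_estimate(n_prefixed, overlap):.4f}, "
            f"barcode-specific {chance_match_estimate(barcode_specific, overlap):.4f}"
        )
```

In this model the N-prefixed adapter hits every read until the minimum overlap exceeds 8, whereas for a single explicit adapter each extra base of required overlap cuts the chance-match rate roughly four-fold, which is consistent with dropping the N's and raising the overlap.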
