  • cutadapt and barcode sequences

    Hi all,

    I'm trying to trim 125 bp PE Illumina reads with cutadapt to remove any reads containing adapter sequence, and then keep only the PE reads which have no adapter or barcode readthrough.

    I've run cutadapt with -a AGATCGGAAGAGC -A AGATCGGAAGAGC -m 125, which works fine for read 1.

    Any adapter readthrough in read 2 will contain 1 of 24 sample barcodes prior to the common adapter being read and given that these are variable and the reads are to be used for SNP analysis, I need to remove these as well.

    My first thought was to use -A NNNNNNNNAGATCGGAAGAGC, but that removes all read 2's in the data set. I tried running -A with the reverse complement of each of the 24 barcodes prior to the common adapter (i.e. there were 24 -A's), but this removed 34% of the data, so I'm not sure if that was correct. I finally tried incorporating the reverse complement of the restriction enzyme site used, which would be read prior to both the barcode and the common adapter, and that removed around 16% of reads.

    Could anyone advise on the best approach for this kind of analysis, or the correct way to do it, as I don't feel I am there yet? The cutadapt manual talks about using NNNNNNNN (we have an 8-base barcode) if the barcode is embedded within the adapter, but ours is read before the adapter and thus, as tried above, -A NNNNNNNNAGATCGGAAGAGC removes all reads. I tried embedding NNNNNNNN to represent the variable barcode between the restriction site and the adapter, but am not sure if this is correct.

    Thanks very much for any help.

  • #2
    Take a look at Sabre: https://github.com/najoshi/sabre. Otherwise, you could split the samples first before doing the adapter trimming.


    • #3
      Hi genomax - appreciate the response.

      Could you possibly expand on what you mean by splitting them?

      Many thanks

      Jamie


      • #4
        @Jamie: You have inline barcodes at beginning of read 2, which I presume will be used to separate the multiplexed samples?

        I was saying that you could use that information to first bin your R1/R2 reads into sample pools and then trim them afterwards as separate pools. At that point you could trim the barcode using fixed-length trimming (followed by adapter trimming, if necessary) as a two-pass operation.


        • #5
          Hi genomax,
          The samples have already been demultiplexed into libraries of 24 individuals, and we have 30 libraries, each with separate read 1 and read 2 files. I'm trying to run adapter removal on the 60 files individually to remove the barcode/adapter readthrough at the end of read 2, where the order of components on the read is: wanted genomic sequence - restriction site - 8 bp barcode - adapter sequence.


          • #6
            I see. So you don't want to separate the 24 individuals further as discriminated by that inline barcode?

            Any trimming should be done with both pairs of files (so in case a read gets dropped from read 2 then the corresponding read would be removed from read 1 keeping the order of reads in R1/R2 files in sync).
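To illustrate the point about keeping R1/R2 in sync, here is a minimal Python sketch. It assumes the reads are already loaded as two equal-length lists of sequence strings in matching order; the function name `filter_pairs` and the in-memory lists are hypothetical and only for illustration, not how any particular trimmer works internally.

```python
def filter_pairs(r1_reads, r2_reads, min_len):
    """Keep a pair only if BOTH mates are at least min_len bases long.

    Dropping the pair as a unit keeps the two outputs in the same order,
    so record i in the first list still belongs with record i in the second.
    """
    if len(r1_reads) != len(r2_reads):
        raise ValueError("R1 and R2 must contain the same number of reads")
    kept_r1, kept_r2 = [], []
    for r1, r2 in zip(r1_reads, r2_reads):
        if len(r1) >= min_len and len(r2) >= min_len:
            kept_r1.append(r1)
            kept_r2.append(r2)
    return kept_r1, kept_r2


# Toy data: the second pair has a trimmed (100 bp) read 2, so the whole pair goes.
r1 = ["A" * 125, "C" * 125, "G" * 125]
r2 = ["T" * 125, "T" * 100, "T" * 125]
kept_r1, kept_r2 = filter_pairs(r1, r2, min_len=125)
```

Paired-aware trimmers apply the same idea when both files are supplied in one run, which is why R1 and R2 should not be trimmed in separate invocations.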

            I am not a regular cutadapt user, but I can think of how you could do this with bbduk.sh. You could add all possible combinations of the restriction site and the 8 bp barcode (is that one per individual?) to a file (or even to the adapters.fa file in the "resources" directory) and then use that as input to scan against your data.

            Let me know if I am still missing something.
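A rough Python sketch of building such a file is below. It follows the layout described earlier in the thread (reverse complement of the restriction site, then reverse complement of the barcode, then the common adapter). The barcodes, sample names, and restriction site are placeholders, not the real ones from this experiment, and the orientation should be checked against a few actual reads before relying on it.

```python
COMMON_ADAPTER = "AGATCGGAAGAGC"  # common adapter sequence quoted in the thread


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def build_adapter_fasta(barcodes, restriction_site, adapter=COMMON_ADAPTER):
    """Build FASTA text with one readthrough sequence per barcode.

    Each record is revcomp(site) + revcomp(barcode) + adapter, which is the
    order the original poster describes seeing at the 3' end of read 2.
    """
    records = []
    for name, barcode in barcodes.items():
        seq = (reverse_complement(restriction_site)
               + reverse_complement(barcode)
               + adapter)
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"


# Placeholder inputs (hypothetical, for illustration only).
example_barcodes = {"sample01": "AAAACCCC", "sample02": "TTGCAAGC"}
example_site = "CTGCAG"
fasta_text = build_adapter_fasta(example_barcodes, example_site)

# Writing it out is left to the caller, e.g.:
# with open("r2_readthrough.fasta", "w") as handle:
#     handle.write(fasta_text)
```

A file like this can be given to bbduk.sh as a reference, or to cutadapt with the `file:` prefix on the adapter option (for example `-A file:r2_readthrough.fasta`); the filename here is made up.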


            • #7
              Your solution sounds right to me, GenoMax. I think with cutadapt, you can specify different files for either a forward or reverse primer/adapter sequence, but the idea is the same. I use cutadapt with two fasta files for processing one of our sample types since the kit vendor does the same thing and I'm comparing data with them.

              bbduk is on my list of things to investigate this year, though.


              • #8
                Hi Geno and Jessica,

                Apologies for the late thanks for your input on the query - very much appreciated.

                It seems the issue I've had is that the default minimum overlap in cutadapt between the read and the adapter is 3 bases, and so obviously 3 N's match every sequence. So the answer is much as you suggest, but with an increased minimum overlap to reduce random 3-mer matches.

                Best to you both

                I've experimented with barcodes as a file, and I think the answer will be to forget about using N's, use the multiple barcodes in a file, and increase the minimum overlap above the default.
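The effect described above can be reproduced with a small Python sketch. This is a deliberately simplified model of 3' adapter matching (exact matches only, N in the adapter matches any base, partial matches allowed at the end of the read) and not cutadapt's actual alignment algorithm; the function name and the toy read are made up for illustration.

```python
def find_3prime_adapter(read, adapter, min_overlap=3):
    """Return the leftmost read position where the adapter starts, or None.

    Simplified model: no mismatches or indels; 'N' in the adapter is a
    wildcard; the adapter may run off the 3' end of the read as long as
    at least min_overlap bases overlap.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # later starts only overlap less
        window = read[start:start + overlap]
        if all(a == "N" or a == r for a, r in zip(adapter[:overlap], window)):
            return start
    return None


# A 20-base toy read with no adapter in it at all (it contains no 'A').
read = "CTGTCCTGTTCGTCCTTGCT"

plain_hit = find_3prime_adapter(read, "AGATCGGAAGAGC", min_overlap=3)
wildcard_hit = find_3prime_adapter(read, "NNNNNNNNAGATCGGAAGAGC", min_overlap=3)
strict_hit = find_3prime_adapter(read, "NNNNNNNNAGATCGGAAGAGC", min_overlap=10)
```

In this model the plain adapter finds nothing, but the N-prefixed adapter "matches" the last 8 bases of an adapter-free read, because the leading N's alone satisfy a 3-base overlap. Every read then gets trimmed and falls below -m 125, which is consistent with all read 2's being removed. Raising the minimum overlap (cutadapt's `-O` / `--overlap` option) past the length of the N run removes the spurious hit.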
