Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to remove first 15 bases of a sequence using unix or qiime

    So I'm pretty sure all of my sequences have a 15 base pair adaptor sequence at the beginning, and I'd like to remove it, so then my downstream analysis will play nicely within QIIME.

    I've tried changing my mapping file by adding 15bp to each barcode of (NNNNNNNNNNNNNNN), but QIIME says thats not a cool thing to do.

    I'm interested in other ways to do this, and again for the .qual file.

  • #2
    Hi,

    I am not entirely sure what you want to achieve. But is sounds like BBduk or BB reformat could easily do it (http://seqanswers.com/forums/showthread.php?t=58221 ) as could do some simple scripts.

    Comment


    • #3
      Can you post an example 2-3 sequences from the fasta and qual files? BBduk from BBMap can manage the fasta file trim (forcetrimleft=15) for sure but I am not certain about the qual file. You can try it yourself.

      We can always do an awk/perl solution if this does not work.
      Last edited by GenoMax; 05-13-2015, 02:37 PM. Reason: Changed parameter based on Brian's explanation

      Comment


      • #4
        You can do that with BBDuk:

        bbduk.sh in=reads.fq out=trimmed.fq ftl=15

        If you have fasta + qual files:

        bbduk.sh in=reads.fa qfin=reads.qual out=trimmed.fa qfout=trimmed.qual ftl=15

        If not all of the reads contain that, and you only want to trim the reads that have a specific sequence, then use the flags "ktrim=l literal=X k=15" instead, where X is the sequence you want to trim. If you have lots of sequences, you can separate them with commas.

        Edit - double-ninja'd!

        @GenoMax. Sorry the numbering is a little confusing for force-trim; these are the definitions:
        Code:
        	/** Trim left bases of the read to this position (exclusive, 0-based) */
        	private final int forceTrimLeft;
        	/** Trim right bases of the read after this position (exclusive, 0-based) */
        	private final int forceTrimRight;
        	/** Trim this many rightmost bases of the read */
        	private final int forceTrimRight2;
        So to make a 100bp read 85 bp long by trimming the right end, you would say "ftr=84" or "ftr2=15"; to do so by trimming the left side, it would be "ftl=15".
        Last edited by Brian Bushnell; 05-13-2015, 02:34 PM.

        Comment


        • #5
          Brian: Shouldn't that be ftl=14 (0-based correct)?

          Comment


          • #6
            Hi Geno-

            Here are some sample lines of the .fna file, its pretty standard.

            >SRR1502445.1
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCAACACAGGGGGATAGGNNNNNNNNNNNNNNNNNNNNNNN
            >SRR1502445.2
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
            >SRR1502445.3
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
            >SRR1502445.4
            GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNNNNNNNNNN

            Also- thanks for the BBDuk suggestions, however it'd be great to do this with just the base terminal commands rather than installing a new bioinformatics package.

            Comment


            • #7
              No installation is required for BBMap. Download, uncompress and start using (I assume you have Java installed).

              Comment


              • #8
                Originally posted by GenoMax View Post
                No installation is required for BBMap. Download, uncompress and start using (I assume you have Java installed).
                Those are conflicting statements, you can't accomplish not installing anything by installing something . I think I know what you mean though, no external dependencies. We don't really need anything complicated for this task, perl or awk should do fine as mentioned above.

                Here is an example using Perl:

                Code:
                perl -0076 -e 'while (<>) { chomp; my ($name, @parts) = split /\n/; my $seq = join "", @parts; next unless defined $name && defined $seq; substr($seq, 0, 15, ""); print join "\n", ">".$name, "$seq\n" }' seqs.fas
                You would repeat this for seqs.qual.
                Last edited by SES; 05-14-2015, 11:47 AM.

                Comment


                • #9
                  Originally posted by colinn View Post
                  So I'm pretty sure all of my sequences have a 15 base pair adaptor sequence at the beginning
                  Rather than clipping the first 15bp no matter what is in there I would explicitly look for the adapter sequence and trim that. There are tools for that, I use cutadapt, but I'm sure the BBtools can do it as well.

                  Comment


                  • #10
                    Originally posted by dariober View Post
                    Rather than clipping the first 15bp no matter what is in there I would explicitly look for the adapter sequence and trim that. There are tools for that, I use cutadapt, but I'm sure the BBtools can do it as well.
                    @dario: See this thread for background.

                    Comment


                    • #11
                      Originally posted by SES View Post

                      Here is an example using Perl:

                      Code:
                      perl -0076 -e 'while (<>) { chomp; my ($name, @parts) = split /\n/; my $seq = join "", @parts; next unless defined $name && defined $seq; substr($seq, 0, 15, ""); print join "\n", ">".$name, "$seq\n" }' seqs.fas
                      You would repeat this for seqs.qual.
                      Are we golfing?

                      perl -i'.bak' -pe 's/(>.*?\n).{15}/$1/' seqs.fas

                      perl -i'.bak' -pe 's/(>.*?\n)(\s*\d\s+){15}/$1/' seqs.qual

                      Warning: I did not test these.

                      --
                      Phillip

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM
                      • seqadmin
                        Strategies for Sequencing Challenging Samples
                        by seqadmin


                        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                        03-22-2024, 06:39 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      25 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      28 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 09:21 AM
                      0 responses
                      24 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-04-2024, 09:00 AM
                      0 responses
                      52 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X