  • #91
    Originally posted by arcolombo698 View Post
    Hello. Thank you in advance.

    do I just use the universal adapter as the input for cutadapt???
    See this guide: http://onetipperday.blogspot.com/201...torprimer.html

    Comment


    • #92
      Cutadapt

      Hello.

      Yes this website was helpful.

      Does cutadapt take a FASTA adapter file that specifies which adapters to cut out? It does not appear that -b, -g, or -a can handle the 28 different sequences I need trimmed.

      Thank you again in advance

      Comment


      • #93
        Use "trim galore" which is a wrapper for cutadapt to simplify things.

        Comment


        • #94
          There is a section in the README about this:


          In short, just run:

          Code:
          cutadapt -a AGATCGGAAGAGC -o trimmed.1.fastq.gz reads.1.fastq.gz
          cutadapt -a AGATCGGAAGAGC -o trimmed.2.fastq.gz reads.2.fastq.gz
          See the other sections in the README if you need to do more specialized things.

          Comment


          • #95
            Hi,
            I am using cutadapt to remove adapter sequences. I have two adapter sequences.

            RNA 5' Adapter (RA5)
            5' GUUCAGAGUUCUACAGUCCGACGAUC
            RNA 3' Adapter (RA3)
            5' TGGAATTCTCGGGTGCCAAGG

            The 1st one is 5' adapter and 2nd is 3' adapter.

            I am using the following command line to remove the adapter seq.

            cutadapt -a TGGAATTCTCGGGTGCCAAGG -g GUUCAGAGUUCUACAGUCCGACGAUC input.fastq > output.fastq

            Length Distribution I get
            Mean sequence length: 32.49 ± 10.53 bp
            Minimum length: 16 bp
            Maximum length: 51 bp
            Length range: 36 bp
            Mode length: 51 bp with 2,852,626 sequences


            And I found that the 5' adapter has U instead of T. Will that be fine?

            I tried replacing U with T GUUCAGAGUUCUACAGUCCGACGAUC > GTTCAGAGTTCTACAGTCCGACGATC and tried removing adapter sequence.

            cutadapt -a TGGAATTCTCGGGTGCCAAGG -g GTTCAGAGTTCTACAGTCCGACGATC input.fastq > output.fastq

            Length Distribution I get
            Mean sequence length: 31.26 ± 11.29 bp
            Minimum length: 1 bp
            Maximum length: 51 bp
            Length range: 51 bp
            Mode length: 51 bp with 2,805,271 sequences

            I get different length distributions in the two cases. Which one should I choose?
            And first of all, is the command that I am using correct?

            Kindly let me know.

            Thanks in advance.

            Regards
            Vishwesh

            Comment


            • #96
              Originally posted by vishwesh View Post
              Hi,
              cutadapt -a TGGAATTCTCGGGTGCCAAGG -g GUUCAGAGUUCUACAGUCCGACGAUC input.fastq > output.fastq
              Cutadapt removes only one adapter per read, so you need to run it twice, once with each adapter, or specify the option --times=2. Also, you should specify the 5' adapter anchored with a "^", like so: -g ^GTTCAGAG...

              And I found that the 5' adapter has U instead of T. Will that be fine?
              No, not in cutadapt versions up to 1.4.2. But since it's a very good idea to support this, I just added this feature to cutadapt: Starting with cutadapt 1.5, all Us will be automatically replaced with Ts in the adapter sequence.
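
The two suggestions above can be sketched in Python. This is only an illustration, not part of cutadapt: the helper names `to_dna` and `build_cutadapt_command` are hypothetical, and the file names are the placeholders used in the question. The sketch converts the RNA adapter to DNA letters by hand (needed for cutadapt versions before 1.5) and assembles a command line with an anchored 5' adapter and --times=2.

```python
def to_dna(adapter):
    """Replace every U (or u) with T so that older cutadapt versions can match it."""
    return adapter.replace("U", "T").replace("u", "t")


def build_cutadapt_command(adapter_3p, adapter_5p, infile, outfile):
    """Assemble a cutadapt argument list (hypothetical helper, not a cutadapt API).

    The 5' adapter is anchored with '^', and --times=2 allows both
    adapters to be removed from the same read.
    """
    return [
        "cutadapt",
        "-a", to_dna(adapter_3p),
        "-g", "^" + to_dna(adapter_5p),
        "--times=2",
        "-o", outfile,
        infile,
    ]


cmd = build_cutadapt_command(
    "TGGAATTCTCGGGTGCCAAGG",
    "GUUCAGAGUUCUACAGUCCGACGAUC",
    "input.fastq",
    "output.fastq",
)
print(" ".join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` if cutadapt is installed.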

              Comment


              • #97
                Hi guys

                I am using cutadapt 1.3 (I will update to the new version soon), but there is an issue. After trimming the adapter it leaves some empty lines.

                Is there a way not to leave empty lines? I don't want to write a script that parses again the file and fixes it.

                Thank you in advance
                Last edited by foivos; 04-23-2014, 04:43 AM.

                Comment


                • #98
                  Originally posted by foivos View Post
                  Hi guys

                  I am using cutadapt 1.3 (I will update to the new version soon), but there is an issue. After trimming the adapter it leaves some empty lines.

                  Is there a way not to leave empty lines? I don't want to write a script that parses again the file and fixes it.

                  Thank you in advance
                  The following only addresses the issue of removing empty lines (I assume the results file is otherwise OK). It may be safer to write to a temp file instead of overwriting the original: http://stackoverflow.com/questions/1...om-a-unix-file
                  Last edited by GenoMax; 04-23-2014, 05:04 AM.

                  Comment


                  • #99
                    Originally posted by foivos View Post
                    I am using cutadapt 1.3 (I will update to the new version soon), but there is an issue. After trimming the adapter it leaves some empty lines.

                    Is there a way not to leave empty lines?
                    Are you talking about reads that have a length of zero? These will appear as empty lines in the output file. Use cutadapt's --minimum-length option and set it to 1 or some higher value to avoid getting empty reads.

                    Do not do what is described in the stackoverflow link because it will break your FASTQ file.
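
To illustrate why deleting blank lines breaks the file: every FASTQ record is four lines long, and a zero-length read still occupies its sequence and quality lines. If an already-trimmed file has to be repaired afterwards, whole records must be dropped. The following is a minimal sketch of that idea, assuming strictly four-line records (no wrapped sequences); `drop_short_records` is a hypothetical helper and the read names are made up. Passing --minimum-length to cutadapt remains the simpler route.

```python
def drop_short_records(lines, minimum_length=1):
    """Yield the lines of 4-line FASTQ records whose sequence is long enough."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, sequence, separator, quality = record
            # Keep or discard the record as a whole, never single lines.
            if len(sequence) >= minimum_length:
                yield from record
            record = []


# Made-up example: read2 was trimmed down to length zero.
fastq = [
    "@read1", "AGTACCCCATGGAC", "+", "?1?DD?BDA:C;22",
    "@read2", "", "+", "",
    "@read3", "CATACAGG", "+", "==>A+2@<",
]
kept = list(drop_short_records(fastq))
print("\n".join(kept))
```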

                    Comment


                    • No I will not remove the lines as described in stackoverflow.

                      I will isolate the problem, as it is part of a pipeline and I will make a new post soon.

                      Comment


                      • Here is what I get
                        @BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4054:2147 1:N:0:GCCAAT
                        TTAGGAAGAGGATAACAATTNGAAACAGTTGCTAAAACTCTATATGC
                        +
                        CCCFFFFFGHHHHJJJJJJJ#4AHGGIJIJJIJIJJJJJJJJJJJJJ
                        @BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4107:2164 1:N:0:GCCAAT
                        AGTACCCCATGGAC
                        +
                        ?1?DD?BDA:C;22
                        @BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4138:2178 1:N:0:GCCAAT
                        ATCGACACTTCGAACGCACTTGCGGCCCCGGGTTCCTCCCGGGGCTACGCC
                        +
                        CCCFFFFFHHHHHJJJJJJJJJIJJGGJJ:FG-5@D>EEH<?A@/'5<;;B
                        @BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4219:2179 1:N:0:GCCAAT

                        +

                        @BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4242:2199 1:Y:0:GCCAAT
                        CATACAGGACTCTTTCGAGGCCCTC
                        +
                        ==>A+2@<+?+?22<A+23)@C+1=
                        It keeps the identifier and the "+" and removes the adapter and the sequence.

                        I want it to remove everything and not leave any gaps...

                        Comment


                        • You can do that in post-processing. Just put everything on one line using sed:

                          sed 'N;N;N;s/\n/\t/g'

                          Then remove the lines containing \t+\t, and afterwards change all \t back to \n.

                          Marcel, is version 1.5 up yet? I can only find 1.4.2 as the latest version. If not, when do you anticipate 1.5 to be out?
                          Thanks!
                          Last edited by sp144; 07-31-2014, 03:37 PM.

                          Comment


                          • Originally posted by sp144 View Post
                            Marcel, is version 1.5 up yet? I can only find 1.4.2 as the latest version. If not, when do you anticipate 1.5 to be out?
                            Thanks!
                            Taking your question as a motivation: I've just released cutadapt 1.5! As always, see https://code.google.com/p/cutadapt/ for the changelog and download it from PyPI. Or, even better, just use "pip install cutadapt". Here is a copy of the changelog:
                            • Adapter sequences can now be read from a FASTA file. For example, write -a file:adapters.fasta to read 3' adapters from adapters.fasta. This works also for -b and -g. This fixes the long-standing issue #33. Note that cutadapt isn't really optimized for trimming dozens or even hundreds of adapters!
                            • There is now an option --mask-adapter, which can be used to not remove adapters, but to instead mask them with N characters. Thanks to Vittorio Zamboni for contributing this feature!
                            • U characters in the adapter sequence are automatically converted to T.
                            • Add the option -u/--cut, which can be used to unconditionally remove a number of bases from the beginning or end of each read.
                            • When the new option --quiet is used, no report is printed after all reads have been processed.
                            • When processing paired-end reads, cutadapt now checks whether the reads are properly paired.
                            • To handle paired-end reads, an option --untrimmed-paired-output was added.
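
As a rough illustration of the first changelog item, the sketch below writes a small adapter FASTA file that could then be passed as -a file:adapters.fasta. The helper `write_adapter_fasta` and the record names are hypothetical; the two sequences are 3' adapters mentioned earlier in this thread.

```python
import os
import tempfile


def write_adapter_fasta(adapters, path):
    """Write a {name: sequence} mapping as FASTA, one record per adapter."""
    with open(path, "w") as handle:
        for name, sequence in adapters.items():
            handle.write(">" + name + "\n" + sequence + "\n")


# Record names are placeholders chosen for this example.
adapters = {
    "RA3": "TGGAATTCTCGGGTGCCAAGG",
    "adapter2": "AGATCGGAAGAGC",
}

fasta_path = os.path.join(tempfile.mkdtemp(), "adapters.fasta")
write_adapter_fasta(adapters, fasta_path)

with open(fasta_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
# The file would then be used as: cutadapt -a file:adapters.fasta ...
```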

                            Comment


                            • Hi mmartin,

                              I'm using the latest version (1.5) and I noticed the format of the info file doesn't seem to match exactly with the documentation on github (https://github.com/marcelm/cutadapt/...ster/README.md). According to it there's supposed to be 8 columns but I only get 7. Column 5 (Sequence of the read before the adapter match) seems to have been removed, yes?

                              It's not a big deal I don't think as I can recreate the full read by concatenating columns 5 and 6, like the page says ("The concatenation of the fields 5-6 yields the full read sequence."). Or am I missing something?

                              thanks!

                              Comment


                              • Originally posted by captainentropy View Post
                                According to it there's supposed to be 8 columns but I only get 7.
                                There should still be eight fields, but perhaps one of the columns is empty? In that case, you'd have two consecutive tabs within a single line and it'd appear as if you only have seven fields.

                                Column 5 (Sequence of the read before the adapter match) seems to have been removed, yes?
                                The format hasn't changed, but I realize that the wording in the README is confusing: The "Sequence of the read before the adapter match" is actually the "sequence of the read to the left of the adapter match".

                                I've tried to clarify all this in the README now. I've also fixed a mistake in the description of how to get the original read sequence: You need to concatenate columns 5-7, not columns 5-6. Hope that helps!
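
The point about empty columns can be checked with a few lines of Python. This is a sketch under assumptions: the info-file line below is invented for illustration, and the only facts taken from this thread are that there are eight tab-separated fields and that columns 5-7 concatenate to the full read. Splitting on whitespace hides the empty column, while splitting on tabs keeps it.

```python
# Invented example line: column 5 (sequence left of the adapter match) is empty,
# so the line contains two consecutive tabs.
line = "read1\t0\t0\t5\t\tACGTA\tGGGTTT\tadapter1\n"

# Splitting on any whitespace collapses the two tabs and loses the empty field.
whitespace_fields = line.split()

# Splitting on the tab character keeps all eight fields.
fields = line.rstrip("\n").split("\t")


def full_read(info_fields):
    """Rebuild the read by concatenating columns 5-7 (1-based)."""
    return "".join(info_fields[4:7])


print(len(whitespace_fields), len(fields))
print(full_read(fields))
```

Tools that split on runs of whitespace by default, such as awk without -F'\t', show the same seven-column effect.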

                                Comment
