  • Martin, thanks for your efforts!

    Code:
    >$ cd /; python -c 'import cutadapt; print cutadapt.__file__'
    /usr/local/lib/python2.7/dist-packages/cutadapt/__init__.pyc
    pip version:

    Code:
    >$ pip --version
    pip 1.0 from /usr/lib/python2.7/dist-packages (python 2.7)
    Linux version:

    Code:
    >$ uname -a
    Linux dcdell 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
    Distribution version:

    Code:
    >$ cat /etc/issue
    Ubuntu 12.04.5 LTS \n \l
    Last edited by sidsv; 02-22-2015, 06:44 AM.


    • In a virtual machine with Ubuntu 12.04.5, I could reproduce the pip problem (where it reports 'permission denied'). It has been fixed, but not in the pip version that ships with Ubuntu 12.04, and there seems to be no workaround.

      I cannot reproduce the other problem of you getting 1.2.1 while you have 1.7.1 installed, but I can think of a few more things to try. The first is not to install the program at all: just unpack the tar.gz and run cutadapt-1.7.1/bin/cutadapt (you cannot move the program elsewhere if you do this). The second is to install the program using Python 3: 'python3.3 setup.py install --user'. The third option, of course, is to ask your system admin to upgrade the globally installed cutadapt ('pip install --upgrade cutadapt' as root should work).
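Since the symptom is an older copy (1.2.1) being picked up instead of the newer one, it can help to ask the import system which copy it would load, without importing it. A minimal sketch, assuming Python 3.4 or later (where `importlib.util.find_spec` exists; on Python 2.7 the one-liner at the top of this thread does the same job); the helper name `locate_module` is illustrative:

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a plain `import name` would load, or None if not found.

    find_spec() walks sys.path in order without executing the module, so the
    result is the copy that wins when several versions are installed.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for entry in sys.path:
        print("search path:", entry)
    print("cutadapt would be loaded from:", locate_module("cutadapt"))
```

If the reported path is the system-wide dist-packages directory rather than your own installation, the older global copy is shadowing the new one on the search path.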


      • Martin, thank you very much for help!

        I just ran cutadapt without installing it, as you suggested, and it worked fine! Maybe I'll ask our system admin to upgrade the 'global' cutadapt installation.

        Thanks again!


        • Great to hear you got it to work! Let me know if there are any further problems.


          • cutadapt 1.8

            Hi, cutadapt 1.8 has been released. It (finally) comes with proper paired-end support (no need to run cutadapt twice), quality-trimming of 5' ends and filtering of reads that have too many N bases.
            Here is a copy of the full changelog:
            • Support single-pass paired-end trimming with the new -A/-G/-B/-U parameters. These work just like their -a/-g/-b/-u counterparts, but they specify sequences that are removed from the second read in a pair.

              Also, if you start using one of those options, the read modification options such as -q (quality trimming) are applied to *both* reads. For backwards compatibility, read modifications are applied to the first read only if neither of -A/-G/-B/-U is used. See the documentation for details.

              This feature has not been extensively tested, so please give feedback if something does not work.
            • The report output has been reworked in order to accommodate the new paired-end trimming mode. This also changes how the report looks in single-end mode. It is hopefully now more accessible.
            • Chris Mitchell contributed a patch adding two new options: --trim-n removes any N bases from the read ends, and the --max-n option can be used to filter out reads with too many Ns.
            • Support notation for repeated bases in the adapter sequence: Write A{10} instead of AAAAAAAAAA. Useful for poly-A trimming: Use -a A{100} to get the longest possible tail.
            • Quality trimming at the 5' end of reads is now supported. Use -q 15,10 to trim the 5' end with a cutoff of 15 and the 3' end with a cutoff of 10.
            • Fix incorrectly reported statistics (> 100% trimmed bases) when --times is set to a value greater than one.
            • Support .xz-compressed files (if running in Python 3.3 or later).
            • Started to use the GitHub issue tracker instead of Google Code. All old issues have been moved.
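Several of the new features above are small, well-defined string operations. As a rough Python illustration (not cutadapt's code; all function names are hypothetical, and the exact option semantics should be checked against the cutadapt documentation), here is what the A{n} notation, N trimming/filtering and two-ended quality trimming amount to. The quality-trimming function follows the BWA-style running-sum approach that the cutadapt documentation names for its quality trimming, applied from the 5' end with the first cutoff and from the 3' end with the second, as in `-q 15,10`; treating the `--max-n` value as a plain count is an assumption:

```python
import re

# One base letter followed by a repeat count, e.g. "A{10}".
_REPEAT = re.compile(r"([A-Za-z])\{(\d+)\}")


def expand_repeats(adapter):
    """Expand the X{n} shorthand, e.g. "A{10}" -> "AAAAAAAAAA"."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), adapter)


def trim_n_ends(seq):
    """Remove N bases from both ends of a read (the idea behind --trim-n)."""
    return seq.strip("Nn")


def too_many_n(seq, max_n):
    """True if the read has more than max_n N bases (assumed --max-n semantics)."""
    return seq.upper().count("N") > max_n


def quality_trim_index(qualities, cutoff_front, cutoff_back, base=33):
    """Return (start, stop) such that qualities[start:stop] survives trimming.

    From each end, sum (cutoff - quality) while walking inward, stop as soon
    as the sum turns negative, and cut where the sum was largest.
    """
    start, stop = 0, len(qualities)
    # 5' end, using the first cutoff
    total = best = 0
    for i in range(len(qualities)):
        total += cutoff_front - (ord(qualities[i]) - base)
        if total < 0:
            break
        if total > best:
            best, start = total, i + 1
    # 3' end, using the second cutoff
    total = best = 0
    for i in reversed(range(len(qualities))):
        total += cutoff_back - (ord(qualities[i]) - base)
        if total < 0:
            break
        if total > best:
            best, stop = total, i
    if start >= stop:  # nothing of acceptable quality remains
        start, stop = 0, 0
    return start, stop
```

For example, with Phred+33 qualities "+III&" (10, 40, 40, 40, 5) and cutoffs 15 and 10, `quality_trim_index` returns (1, 4): the first and last bases are removed.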


            • Hi Martin,

              Great software!

              I want to use cutadapt for PacBio CCS data. However, CCS reads can contain the adapter/primer in the middle of the long sequence, as in the following example (the primer sequences here are GTACTCTGCGTTGATACCAC and TCAACGCAGAGTACATGGG):
              Code:
              [omit 2kb]ACTCCCATGTACTCTGCGTTGATACCACTGCTTATCTCTCTCCTCCGTAGAGGGTGAGAGAGATAAGCAGTGGTATCAACGCAGAGTACATGGGAGTCCTCACT[omit 2kb]
              I am looking for software that splits the sequence at the primer sequence pairs. Do you plan to add this feature in a future version?

              Best,
              Pengcheng


              • Hi Pengcheng, I’ve thought about this briefly but wasn’t sure how much interest there would be in it and how useful it would be. Could you perhaps post this to the issue tracker? If you don’t want to create an account, I’m happy to add the issue for you.


                • Originally posted by mmartin
                  Hi Pengcheng, I’ve thought about this briefly but wasn’t sure how much interest would be for it and how useful that would be. Could you perhaps post this to the issue tracker? If you don’t want to create an account, I’m happy to add the issue for you.
                  OK, I will create an issue there. In the meantime, I have written a quick-and-dirty Perl script that does the job. It follows this logic:
                  First, use BLAT to locate the primer positions. Second, remove primer sequences near either end, using a criterion "maxDist" that gives the maximum allowed distance to the end. Third, if a primer lies in the middle of the sequence and its distance to both ends is greater than a minimum distance, split the sequence into two parts and split the qualities correspondingly.
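The three steps above can be sketched in Python. This is a hypothetical reimplementation of the described logic, not the Perl script itself, and it swaps BLAT for exact string matching (`str.find`), so no mismatches are tolerated and only the given primer orientation is searched. The names `find_hits`, `split_at_primers`, `max_dist` and `min_dist` are illustrative; distances on the 5' side are measured from the start of the current piece, and a primer that is neither close to an end nor far enough from both ends is simply left in place:

```python
def find_hits(seq, primer):
    """Return (start, end) of every exact, non-overlapping occurrence of primer."""
    hits = []
    i = seq.find(primer)
    while i != -1:
        hits.append((i, i + len(primer)))
        i = seq.find(primer, i + len(primer))
    return hits


def split_at_primers(seq, qual, primers, max_dist, min_dist):
    """Trim primers near the read ends and split the read at internal primers.

    Returns a list of (sequence, qualities) pieces, cut at the same positions.
    """
    hits = sorted(hit for primer in primers for hit in find_hits(seq, primer))
    segments = []
    left = 0  # start of the piece that has not been emitted yet
    for start, end in hits:
        if start < left:
            continue  # overlaps a primer that was already handled
        if start - left <= max_dist:
            left = end  # primer close to the 5' end: drop it and its short flank
        elif len(seq) - end <= max_dist:
            segments.append((left, start))  # primer close to the 3' end: drop the tail
            left = len(seq)
            break
        elif start - left >= min_dist and len(seq) - end >= min_dist:
            segments.append((left, start))  # internal primer: split the read here
            left = end
        # otherwise: too far from an end to trim, too close to split; keep it
    if left < len(seq):
        segments.append((left, len(seq)))
    return [(seq[a:b], qual[a:b]) for a, b in segments]
```

For instance, a read laid out as 5 bases + primer + 60 bases + primer + 60 bases + primer + 5 bases, with `max_dist=10` and `min_dist=50`, comes back as the two 60-base pieces with their matching quality strings.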


                  • Hi Martin,

                    Another question about using cutadapt: I want to trim poly-A/T stretches from PacBio sequences. Here is an example:
                    Code:
                    @c42683/f1p3/2901 isoform=c42683;full_length_coverage=1;non_full_length_coverage=3;isoform_length=2901
                    cttttcagacagaggtagtgatcgggtgtagttagtgcggttgagtttgtgcgTGTTCTAGGGTTTGAGTAAATTTGTGTGCACCATGAGCTGGCAGGATTATGTAGACAAGCAGTTGCTGGCTTCTAAATGTGTAACTAAAGCAGCAATTGCTGGACATGATGGAAATGTTTGGGCGAAATCGGATGGATTTGAAGTATCGAAAGAAGAAATTGCAAAGTTGGTGCAAGGTTTTGAAAAACAGGATATCTTGACGTCTTCTGGTGTCACGTTGGCAGGAAACAGATATATTTACCTGTCTGGAACTGATAGAGTTATTAGGGCAAAACTTGGAAAGGTTGGTGTGCATTGCATGAAGACACAGCAAGCTGTTGTAGTTTCACTATATGAGGATCCCATCCAACCACAACAGGCAGCCTCTGTTGTGGAAAAATTAGGGGATTACTTAGTGTCCTGTGGATACTAGAGGTATAATAGACTGTTCTCCTGTGGTGATATGAAGCAGCAGCAGCAACAACAGAGATGGTCGTTTTTTTCTATACAGCGAACTGTAATGTGCATCATGGTTCTGGCTAATAATTCAGTTTGAGTTTAAACTCATATGGTGAAAATCTTGAACTGCATTTTTCTTTTCTCAGTACCTTGTGTCTGAATGTTTGTGGCAAGTATTGCTTCAGTTTGATAATGAGGCACTTGAGCATATCATGGGCTCTTGGAAGTGGACAAATTGTTGGCAACGTTCTCCAAAGTACACTTTGGCTACTAATCCAAGGTGTACCTCGAAGTTCGCCTGGGATGAAAATTGAAGAATTGTAGCTGATAATAGTATGACCATTGCATTTGATACTGAGTCATTAGGATTTTTATAATCTCTAGTCCTGCTCTTTCTACCCCATATCTTTCCCTTTGCCGAGTGAAATTTTGTTAAAAATTAATTTATTGGCGAACTCTTCAATGCTTTAGAACCCAGTGTACTCATTCCTTGTCTATATGTATCACAACCAATTGTCGGAAGCTTGAATGACAAATATTGGTGGAGCTCAGAAGGAGGGCTTGCGCAGTGTGCGTAGTGTCTCCTGCCTGCACTGTGATTGTGGTGTAATTTTAAGATGGTTTGCAGACTATAAATATAATGGATAAAAGCTCGTTTTAGTGCTATGTTGCTGAAAGGATTTGTGGTGGCTTTGATTTTTACTACGGTCTAATTTAAACAATAATAGCTTTGTAAAAGAATTGAATATGGAATTTGAGAAATTTACCAAAATCAAATTTGTATAAAAGTATGGCATAGTCGGTTGAATGACTTCTACTGCAACATATTGTTGAAGTACTAGTTATTCAGCTAATGTGAAAACTGGGGGAATATGAATTTTACAGTAATCTTTTTTATGTAAAGCGTTAGTGtagcaaagttatatatcgttttttttttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttatttttttttttttttttttttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttttgtttttatttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttatgtttatttttttttttttttttttttttttttttttttttattttttttttttattttttttttttttttttgttttttttttattattttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttctttttttttttttaattttattttattttttttttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttttttt
ttttttttttgtgattttttttttttttttttttttttatttatttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttattttttttttttttttttttttattttttttttttttttttttttttttttttttttttgttatttttttttttttatttatttttttttttttttttttttttttttttttttttgtttttttttttttttttatttttatttttttttttttattttttttttttttttttttatttttttgtttttttttttttttttttttattttttttttatttttttttttttttttttattttttttttaattttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttattattttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttgtttttttttttattttttttgttttttttttttttatttttttttttttttttttttttttttttttttttttttttttttttttttttgtttcctataatttaaagagatattttgtcatgtaaaatggaattatgcttggccccggcctttttgtgctctacttcaggggcgagagcaattttgtgtgccaattgtcattaccaacttgaggttagtctcctgtggatttattagcattttgtagctgattttatgaccactgtattcagtattttcgttcattcttatgagattgtactgttcccctcagggcccccaataaaattaagc
                    As you can see, there is a long run of "t" toward the 3' end, whereas we expected it at the 5' end. So cutadapt gives this result:
                    Code:
                    @c42683/f1p3/2901 isoform=c42683;full_length_coverage=1;non_full_length_coverage=3;isoform_length=2901
                    ttttttttttttttttttttttatttttttttttttttttttttttttttattattttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttgtttttttttttattt
                    tttttgttttttttttttttatttttttttttttttttttttttttttttttttttttttttttttttttttttgtttcctataatttaaagagatattttgtcatgtaaaatggaattatgcttggccccggc
                    ctttttgtgctctacttcaggggcgagagcaattttgtgtgccaattgtcattaccaacttgaggttagtctcctgtggatttattagcattttgtagctgattttatgaccactgtattcagtattttcgttc
                    attcttatgagattgtactgttcccctcagggcccccaataaaattaagc
                    You may suspect that the sequence at the 5' end is wrong, but searching it against the NR database gives a hit to a functional protein. So I want to trim poly-A/T at both ends, with a parameter that decides whether to trim, such as a maximum distance to the end.
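One way to express "trim poly-A/T only when it is close to the end" is a small scoring scan. This Python sketch is not a cutadapt feature; the scoring (+1 per matching base, -2 per other base), `max_dist`, `min_score` and the function names are all hypothetical choices. The scan only considers runs that begin within `max_dist` bases of the end, so a t-run far inside the read, like the one in the example above, does not cause the 5' part to be thrown away:

```python
def best_run_end(seq, start, base, match=1, mismatch=-2):
    """Return (end, score) of the best-scoring stretch of `base` starting at `start`.

    Each `base` adds `match` and anything else adds `mismatch`, so a few stray
    bases inside the run (like the 'a' and 'g' in the t-run above) are tolerated.
    """
    best_end, best_score, score = start, 0, 0
    for i in range(start, len(seq)):
        score += match if seq[i] == base else mismatch
        if score > best_score:
            best_score, best_end = score, i + 1
    return best_end, best_score


def trim_5prime_poly(seq, base="T", max_dist=20, min_score=10):
    """Remove a poly-`base` run (and anything before it) if it starts within
    max_dist bases of the 5' end and scores at least min_score."""
    upper = seq.upper()  # the example reads mix upper and lower case
    best_end, best_score = 0, 0
    for start in range(min(max_dist, len(seq)) + 1):
        end, score = best_run_end(upper, start, base.upper())
        if score > best_score:
            best_end, best_score = end, score
    return seq[best_end:] if best_score >= min_score else seq


def trim_3prime_poly(seq, base="A", max_dist=20, min_score=10):
    """Apply the same rule at the 3' end by working on the reversed sequence."""
    return trim_5prime_poly(seq[::-1], base, max_dist, min_score)[::-1]
```

The rescanning from every start position costs O(max_dist × read length), which is acceptable for a sketch but could be tightened by stopping each scan once the score can no longer recover.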


                    • Originally posted by pengchy
                      Because I have write a perl script to do the work quick and dirt. Just follow the logic that:
                      First using blat locate the primer position...
                      Maybe you can use cutadapt’s --info-file option: in columns 5, 6 and 7, it gives you (for each read) the sequence before the adapter, the sequence that matched the adapter, and the sequence after the adapter. The read name is in column 1.

                      The info file currently does not contain qualities, but adding them is also easy: edit cutadapt/scripts/cutadapt.py and change it as shown here: https://gist.github.com/marcelm/8406e8a48995b766051c .

                      Example command-line:
                      Code:
                      cutadapt --info-file info.txt -o /dev/null -a ADAPTER input.fastq
                      Of course you’d still need to do some scripting and it’s just a suggestion. Feel free to use BLAT if that feels easier!
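The "some scripting" part could start from something like this minimal Python sketch. It assumes the tab-separated layout described above (name in column 1, before/match/after in columns 5 to 7) and that column 2 is -1 for reads without an adapter match, as in the patch posted later in this thread; `matched_reads` is a hypothetical helper, and the two example lines are made up rather than real cutadapt output:

```python
import io


def matched_reads(info_lines):
    """Yield (name, before, matched, after) for reads in which an adapter was found.

    Column numbers in the post are 1-based; Python indices are 0-based.
    """
    for line in info_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[1] == "-1":
            continue  # no adapter in this read; such lines have fewer columns
        yield fields[0], fields[4], fields[5], fields[6]


if __name__ == "__main__":
    example = io.StringIO(
        "read1\t0\t3\t6\tAAA\tCCC\tGGG\tadapter1\n"
        "read2\t-1\tTTTT\n"
    )
    for name, before, matched, after in matched_reads(example):
        print(name, before, after, sep="\t")
```

Splitting on tabs keeps empty fields, so a match at the very start of a read yields an empty "before" string rather than shifting the columns.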

                      Regarding your second question, I’m not sure what you want exactly. Could you give a toy example that describes what should go into cutadapt and what should come out?


                      • Originally posted by mmartin
                        Maybe you can use cutadapt’s --info-file option: In column 5, 6 and 7, it gives you (for each read), the sequence before the adapter, the sequence that matched the adapter and the sequence after the adapter...
                        Regarding your second question, I’m not sure what you want exactly. Could you give a toy example that describes what should go into cutadapt and what should come out?
                        Hi Martin,

                        Thanks for your reply.

                        For the first question, I have opened an issue on GitHub: https://github.com/marcelm/cutadapt/issues/120

                        The second question is similar to the first. Poly-A is not always at the 3' end; it is sometimes found at the 5' end instead, and the same holds for poly-T. It seems this could be solved with the "-b" parameter.

                        I have tested the "-b" parameter, but it does not always report all the positions that "-a" and "-g" report when used together. One example:
                        Code:
                        -g AAGCAGTGGTATCAACGCAGAGTACATGGGG -a GTACTCTGCGTTGATACCACTGCTT
                        
                        c9492/f3p2579/4102      1       2041    2073
                        c21590/f1p174/3554      2       1826    1857
                        c12682/f10p5/1801       0       0       22
                        c19086/f1p10/1705       0       1678    1703
                        
                        
                        -b AAGCAGTGGTATCAACGCAGAGTACATGGGG
                        c9492/f3p2579/4102      1       2041    2073
                        c21590/f1p174/3554      2       1826    1857
                        c12682/f10p5/1801       0       0       22
                        These are two runs with identical parameters, except that one uses "-a -g" and the other uses "-b".
                        The last hit of the first run (c19086/f1p10/1705) is not detected by "-b".

                        BTW: the webpage https://gist.github.com/marcelm/8406e8a48995b766051c is not visible. Could you paste the key steps here? Thank you.
                        Last edited by pengchy; 04-16-2015, 06:37 PM. Reason: add a message


                        • Here is the "gist":
                          Code:
                          diff --git i/cutadapt/scripts/cutadapt.py w/cutadapt/scripts/cutadapt.py
                          index 855721d..2eaf435 100755
                          --- i/cutadapt/scripts/cutadapt.py
                          +++ w/cutadapt/scripts/cutadapt.py
                          @@ -155,6 +155,7 @@ class AdapterCutter(object):
                                       # TODO write only one line, even for multiple matches
                                       for match in matches:
                                           seq = match.read.sequence
                          +                qualities = match.read.qualities
                                           if match is None:
                                               print(match.read.name, -1, seq, sep='\t', file=self.info_file)
                                           else:
                          @@ -167,6 +168,9 @@ class AdapterCutter(object):
                                                   seq[match.rstart:match.rstop],
                                                   seq[match.rstop:],
                                                   match.adapter.name,
                          +                        qualities[0:match.rstart],
                          +                        qualities[match.rstart:match.rstop],
                          +                        qualities[match.rstop:],
                                                   sep='\t', file=self.info_file
                                               )
                          Regarding the -g/-a vs -b thing: It should give the same results, but only if all the adapter sequences are identical. You used a different adapter sequence for -a, so then the results will not be the same.
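The two sequences in that comparison are related in a way that explains the difference. A quick Python check (the helper name `reverse_complement` is illustrative) shows that the -a sequence is the reverse complement of the 5' part of the -g sequence, i.e. the same primer seen from the opposite strand. A -b run given only the -g sequence is therefore searching for a different string than the -a run did; presumably, passing both sequences to -b (cutadapt accepts adapter options more than once) would be the fairer comparison:

```python
# Complement table for upper-case bases (N maps to itself).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


G_ADAPTER = "AAGCAGTGGTATCAACGCAGAGTACATGGGG"  # sequence given to -g above
A_ADAPTER = "GTACTCTGCGTTGATACCACTGCTT"        # sequence given to -a above

rc = reverse_complement(G_ADAPTER)
print(rc)                       # CCCCATGTACTCTGCGTTGATACCACTGCTT
print(rc.endswith(A_ADAPTER))   # True
```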

                          I will need some time to work on the issue 120 you filed because I need to understand in detail what is going on.


                          • cutadapt 1.10 has just been released with support for "linked adapters" (5'/3' adapter pairs) and NextSeq-specific trimming, see the changelog at http://cutadapt.readthedocs.io/en/stable/changes.html .
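NextSeq-specific trimming (`--nextseq-trim`) targets two-color chemistry, where a cycle with no signal is called as a high-quality G, so reads that run past the end of the fragment end in spurious G bases. A Python sketch of the idea, assuming it reuses the BWA-style 3' quality-trimming rule and simply counts every G as having quality cutoff - 1 (the function name and arguments are illustrative; the cutadapt documentation has the exact behavior):

```python
def nextseq_trim_index(sequence, qualities, cutoff, base=33):
    """Return the 3' cut position for NextSeq-style quality trimming.

    Walk in from the 3' end summing (cutoff - quality), stop when the sum
    turns negative, and cut where it peaked. Every G counts as quality
    cutoff - 1, so a trailing run of "high-quality" G bases is trimmed too.
    """
    total = best = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        q = ord(qualities[i]) - base
        if sequence[i] == "G":
            q = cutoff - 1
        total += cutoff - q
        if total < 0:
            break
        if total > best:
            best, cut = total, i
    return cut
```

With all qualities at 40 and a cutoff of 20, the read ACGTGGGG is cut at position 4 (keeping ACGT), while ACGTACGT is left intact.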
