  • Martin, thanks for your efforts!

    Code:
    >$ cd /; python -c 'import cutadapt; print cutadapt.__file__'
    /usr/local/lib/python2.7/dist-packages/cutadapt/__init__.pyc
    pip version:

    Code:
    >$ pip --version
    pip 1.0 from /usr/lib/python2.7/dist-packages (python 2.7)
    Linux version:

    Code:
    >$ uname -a
    Linux dcdell 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
    Distribution version:

    Code:
    >$ cat /etc/issue
    Ubuntu 12.04.5 LTS \n \l
    Last edited by sidsv; 02-22-2015, 06:44 AM.

    Comment


    • In a virtual machine with Ubuntu 12.04.5, I could reproduce the pip problem (where it says 'permission denied'). It has been fixed, but not in the version that comes with Ubuntu 12.04 and there seems to be no workaround.

      I cannot reproduce the other problem of you getting 1.2.1 while you have 1.7.1 installed, but I can think of a few more things to try. The first is to not install the program at all: just unpack the tar.gz and run cutadapt-1.7.1/bin/cutadapt (you cannot move the program anywhere if you do this). The second is to install the program using Python 3: 'python3.3 setup.py install --user'. The third option, of course, is to ask your system admin to upgrade the 'globally' installed cutadapt ('pip install --upgrade cutadapt' as root should work).
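When an old version seems to shadow a newer one, it can help to list every directory on the import path that provides a cutadapt package. The following is only a minimal sketch: `find_package_dirs` is a hypothetical helper name, and it only recognises plain package directories (not eggs or zipped installs).

```python
import os
import sys


def find_package_dirs(name, search_path=None):
    """Return every directory on search_path that holds a package `name`.

    The first hit is the one a plain `import name` would normally pick up.
    Only ordinary package directories are detected (assumption: no eggs/zips).
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(entry or ".", name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            if candidate not in hits:
                hits.append(candidate)
    return hits


for path in find_package_dirs("cutadapt"):
    print(path)
```

If more than one path is printed, the first one wins, which may explain seeing an older version than the one just installed.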

      Comment


      • Martin, thank you very much for help!

        I just ran cutadapt without installation as you said and it worked fine! Maybe I'll ask our system admin to upgrade the 'global' cutadapt installation.

        Thanks again!

        Comment


        • Great to hear you got it to work! Let me know if there are any further problems.

          Comment


          • cutadapt 1.8

            Hi, cutadapt 1.8 has been released. It (finally) comes with proper paired-end support (no need to run cutadapt twice), quality-trimming of 5' ends and filtering of reads that have too many N bases.
            Here is a copy of the full changelog:
            • Support single-pass paired-end trimming with the new -A/-G/-B/-U parameters. These work just like their -a/-g/-b/-u counterparts, but they specify sequences that are removed from the second read in a pair.

              Also, if you start using one of those options, the read modification options such as -q (quality trimming) are applied to *both* reads. For backwards compatibility, read modifications are applied to the first read only if neither of -A/-G/-B/-U is used. See the documentation for details.

              This feature has not been extensively tested, so please give feedback if something does not work.
            • The report output has been re-worked in order to accommodate the new paired-end trimming mode. This also changes how the report looks in single-end mode. It is hopefully now more accessible.
            • Chris Mitchell contributed a patch adding two new options: --trim-n removes any N bases from the read ends, and the --max-n option can be used to filter out reads with too many Ns.
            • Support notation for repeated bases in the adapter sequence: Write A{10} instead of AAAAAAAAAA. Useful for poly-A trimming: Use -a A{100} to get the longest possible tail.
            • Quality trimming at the 5' end of reads is now supported. Use -q 15,10 to trim the 5' end with a cutoff of 15 and the 3' end with a cutoff of 10.
            • Fix incorrectly reported statistics (> 100% trimmed bases) when --times is set to a value greater than one.
            • Support .xz-compressed files (if running in Python 3.3 or later).
            • Started to use the GitHub issue tracker instead of Google Code. All old issues have been moved.
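To make three of the changelog items more concrete, here is a rough Python sketch of the ideas behind the A{10} repeat notation, --trim-n / --max-n, and two-sided quality trimming (-q 15,10). The function names are hypothetical, `max_n` is treated as an absolute count, and the quality trimming uses a BWA-style partial-sum approach; whether cutadapt 1.8 does exactly this internally is an assumption.

```python
import re


def expand_repeats(adapter):
    """Expand 'X{n}' into X repeated n times, e.g. 'A{10}' -> 'AAAAAAAAAA'."""
    return re.sub(r"(.)\{(\d+)\}", lambda m: m.group(1) * int(m.group(2)), adapter)


def trim_n(sequence):
    """Remove N bases from both ends of a read (the idea behind --trim-n)."""
    return sequence.strip("Nn")


def too_many_n(sequence, max_n):
    """True if the read has more than max_n N bases (the idea behind --max-n)."""
    return sequence.upper().count("N") > max_n


def quality_trim_index(qualities, cutoff_front, cutoff_back):
    """Return (start, stop) of the part of the read to keep.

    qualities is a list of integer Phred scores. From each end, partial sums
    of (cutoff - quality) are accumulated; the read is cut where the sum is
    largest, and scanning stops once the sum becomes negative.
    """
    start, stop = 0, len(qualities)
    s = best = 0
    for i, q in enumerate(qualities):          # scan from the 5' end
        s += cutoff_front - q
        if s < 0:
            break
        if s > best:
            best, start = s, i + 1
    s = best = 0
    for i in reversed(range(len(qualities))):  # scan from the 3' end
        s += cutoff_back - qualities[i]
        if s < 0:
            break
        if s > best:
            best, stop = s, i
    if start >= stop:                          # nothing of the read survives
        return 0, 0
    return start, stop


print(expand_repeats("A{10}"))
print(quality_trim_index([2, 5, 30, 30, 30, 30, 12, 3], 15, 10))
```

Note that a single mediocre base next to a good stretch (the 12 above) is kept, because the partial sum already turns negative before it is reached.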

            Comment


            • Hi Martin,

              Great software!

              I want to use cutadapt for PacBio CCS data. However, the CCS data have the adapter/primer in the middle of the long sequence, as in the following example (the two primer sites, highlighted in the original post, are GTACTCTGCGTTGATACCAC and TCAACGCAGAGTACATGGG):
              Code:
              [omit 2kb]ACTCCCATGTACTCTGCGTTGATACCACTGCTTATCTCTCTCCTCCGTAGAGGGTGAGAGAGATAAGCAGTGGTATCAACGCAGAGTACATGGGAGTCCTCACT[omit 2kb]
              I am looking for software that splits the sequence at the primer sequence pairs. Do you plan to integrate this feature in a future version?

              Best,
              Pengcheng

              Comment


              • Hi Pengcheng, I’ve thought about this briefly but wasn’t sure how much interest there would be in it and how useful it would be. Could you perhaps post this to the issue tracker? If you don’t want to create an account, I’m happy to add the issue for you.

                Comment


                • OK, I will create an issue there. In the meantime I have written a quick-and-dirty Perl script to do the work. It follows this logic:
                  First, use BLAT to locate the primer positions. Second, delete primer sequences at the two ends, using a criterion "maxDist" that denotes the maximum allowed distance to the end. Third, if a primer is located in the middle of the sequence and its distance to both ends is greater than a minimum distance, split the sequence into two parts and split the qualities correspondingly.
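The three steps described above could be sketched roughly as follows. This is not the Perl script from the post: it swaps BLAT for exact string matching to stay self-contained, and the names `split_on_primers`, `max_dist` and `min_dist` as well as their default values are made up for illustration.

```python
def find_all(seq, primer):
    """Yield (start, end) for every exact occurrence of primer in seq."""
    start = seq.find(primer)
    while start != -1:
        yield start, start + len(primer)
        start = seq.find(primer, start + 1)


def split_on_primers(seq, qual, primers, max_dist=50, min_dist=200):
    """Remove primers from a read and return a list of (sequence, quality) pieces.

    - A primer starting within max_dist of the 5' end, or ending within
      max_dist of the 3' end, is trimmed off together with what lies outside it.
    - A primer at least min_dist away from both ends splits the read.
    - Primers in between those two zones are left untouched.
    """
    upper = seq.upper()
    hits = sorted(h for p in primers for h in find_all(upper, p.upper()))
    n = len(seq)
    left, right = 0, n      # bounds of the read after trimming the ends
    internal = []
    for start, end in hits:
        if start <= max_dist:
            left = max(left, end)
        elif n - end <= max_dist:
            right = min(right, start)
        elif start >= min_dist and n - end >= min_dist:
            internal.append((start, end))
    pieces = []
    pos = left
    for start, end in internal:
        if start < pos or end > right:   # skip overlapping or trimmed-away hits
            continue
        pieces.append((seq[pos:start], qual[pos:start]))
        pos = end
    pieces.append((seq[pos:right], qual[pos:right]))
    # Drop empty pieces, e.g. when two primers are directly adjacent.
    return [(s, q) for s, q in pieces if s]


read = "A" * 300 + "GTAC" + "C" * 300
parts = split_on_primers(read, "I" * len(read), ["GTAC"])
print([len(s) for s, _ in parts])
```

For real PacBio data, an error-tolerant search (BLAT as in the post, or cutadapt itself) would be needed instead of exact matching.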

                  Comment


                  • Hi Martin,

                    Another question about using cutadapt: I want to cut the poly-A/T from PacBio sequences. Here is one example:
                    Code:
                    @c42683/f1p3/2901 isoform=c42683;full_length_coverage=1;non_full_length_coverage=3;isoform_length=2901
                    cttttcagacagaggtagtgatcgggtgtagttagtgcggttgagtttgtgcgTGTTCTAGGGTTTGAGTAAATTTGTGTGCACCATGAGCTGGCAGGATTATGTAGACAAGCAGTTGCTGGCTTCTAAATGTGTAACTAAAGCAGCAATTGCTGGACATGATGGAAATGTTTGGGCGAAATCGGATGGATTTGAAGTATCGAAAGAAGAAATTGCAAAGTTGGTGCAAGGTTTTGAAAAACAGGATATCTTGACGTCTTCTGGTGTCACGTTGGCAGGAAACAGATATATTTACCTGTCTGGAACTGATAGAGTTATTAGGGCAAAACTTGGAAAGGTTGGTGTGCATTGCATGAAGACACAGCAAGCTGTTGTAGTTTCACTATATGAGGATCCCATCCAACCACAACAGGCAGCCTCTGTTGTGGAAAAATTAGGGGATTACTTAGTGTCCTGTGGATACTAGAGGTATAATAGACTGTTCTCCTGTGGTGATATGAAGCAGCAGCAGCAACAACAGAGATGGTCGTTTTTTTCTATACAGCGAACTGTAATGTGCATCATGGTTCTGGCTAATAATTCAGTTTGAGTTTAAACTCATATGGTGAAAATCTTGAACTGCATTTTTCTTTTCTCAGTACCTTGTGTCTGAATGTTTGTGGCAAGTATTGCTTCAGTTTGATAATGAGGCACTTGAGCATATCATGGGCTCTTGGAAGTGGACAAATTGTTGGCAACGTTCTCCAAAGTACACTTTGGCTACTAATCCAAGGTGTACCTCGAAGTTCGCCTGGGATGAAAATTGAAGAATTGTAGCTGATAATAGTATGACCATTGCATTTGATACTGAGTCATTAGGATTTTTATAATCTCTAGTCCTGCTCTTTCTACCCCATATCTTTCCCTTTGCCGAGTGAAATTTTGTTAAAAATTAATTTATTGGCGAACTCTTCAATGCTTTAGAACCCAGTGTACTCATTCCTTGTCTATATGTATCACAACCAATTGTCGGAAGCTTGAATGACAAATATTGGTGGAGCTCAGAAGGAGGGCTTGCGCAGTGTGCGTAGTGTCTCCTGCCTGCACTGTGATTGTGGTGTAATTTTAAGATGGTTTGCAGACTATAAATATAATGGATAAAAGCTCGTTTTAGTGCTATGTTGCTGAAAGGATTTGTGGTGGCTTTGATTTTTACTACGGTCTAATTTAAACAATAATAGCTTTGTAAAAGAATTGAATATGGAATTTGAGAAATTTACCAAAATCAAATTTGTATAAAAGTATGGCATAGTCGGTTGAATGACTTCTACTGCAACATATTGTTGAAGTACTAGTTATTCAGCTAATGTGAAAACTGGGGGAATATGAATTTTACAGTAATCTTTTTTATGTAAAGCGTTAGTGtagcaaagttatatatcgttttttttttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttatttttttttttttttttttttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttttgtttttatttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttatgtttatttttttttttttttttttttttttttttttttttattttttttttttattttttttttttttttttgttttttttttattattttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttctttttttttttttaattttattttattttttttttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttttttt
ttttttttttgtgattttttttttttttttttttttttatttatttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttattttttttttttttttttttttattttttttttttttttttttttttttttttttttttgttatttttttttttttatttatttttttttttttttttttttttttttttttttttgtttttttttttttttttatttttatttttttttttttattttttttttttttttttttatttttttgtttttttttttttttttttttattttttttttatttttttttttttttttttattttttttttaattttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttattattttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttgtttttttttttattttttttgttttttttttttttatttttttttttttttttttttttttttttttttttttttttttttttttttttgtttcctataatttaaagagatattttgtcatgtaaaatggaattatgcttggccccggcctttttgtgctctacttcaggggcgagagcaattttgtgtgccaattgtcattaccaacttgaggttagtctcctgtggatttattagcattttgtagctgattttatgaccactgtattcagtattttcgttcattcttatgagattgtactgttcccctcagggcccccaataaaattaagc
                    As you can see, there is a long stretch of "t" near the 3' end, whereas we expected it at the 5' end. As a result, cutadapt gives:
                    Code:
                    @c42683/f1p3/2901 isoform=c42683;full_length_coverage=1;non_full_length_coverage=3;isoform_length=2901
                    ttttttttttttttttttttttatttttttttttttttttttttttttttattattttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttgtttttttttttattt
                    tttttgttttttttttttttatttttttttttttttttttttttttttttttttttttttttttttttttttttgtttcctataatttaaagagatattttgtcatgtaaaatggaattatgcttggccccggc
                    ctttttgtgctctacttcaggggcgagagcaattttgtgtgccaattgtcattaccaacttgaggttagtctcctgtggatttattagcattttgtagctgattttatgaccactgtattcagtattttcgttc
                    attcttatgagattgtactgttcccctcagggcccccaataaaattaagc
                    You may say that the sequence at the 5' end may be wrong. However, I searched this sequence against the NR database and it matches a functional protein. So I would like to cut the poly-A/T at both ends and use a parameter, such as a maximum distance to the end, to decide whether to cut.

                    Comment


                    • Originally posted by pengchy
                      I have written a quick-and-dirty Perl script to do the work. It follows this logic:
                      First, use BLAT to locate the primer positions...
                      Maybe you can use cutadapt’s --info-file option: in columns 5, 6 and 7, it gives you (for each read) the sequence before the adapter, the sequence that matched the adapter and the sequence after the adapter. The read name is in column 1.

                      The info file currently does not contain qualities, but that is also easy: Edit cutadapt/scripts/cutadapt.py and change it in this way: https://gist.github.com/marcelm/8406e8a48995b766051c .

                      Example command-line:
                      Code:
                      cutadapt --info-file info.txt -o /dev/null -a ADAPTER input.fastq
                      Of course you’d still need to do some scripting and it’s just a suggestion. Feel free to use BLAT if that feels easier!
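The scripting part could start from something like the sketch below, which reads the tab-separated info file and yields the three sequence parts for each adapter match. It relies on the column layout described above (read name in column 1; sequence before, within and after the match in columns 5, 6 and 7) and assumes that reads without a match have -1 in column 2. The function name and the toy data are made up.

```python
import io


def read_info_file(handle):
    """Yield (read_name, before, match, after) for every adapter match."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Rows for reads without an adapter match are shorter and have -1
        # in the second column; skip them.
        if len(fields) < 7 or fields[1] == "-1":
            continue
        yield fields[0], fields[4], fields[5], fields[6]


# Toy data for illustration only (not real cutadapt output).
toy = io.StringIO(
    "read1\t0\t4\t8\tACGT\tTTTT\tGGCC\tadapter1\n"
    "read2\t-1\tACGTACGT\n"
)
for name, before, match, after in read_info_file(toy):
    print(name, before, match, after)
```

With a real file, `open("info.txt")` would take the place of the `io.StringIO` object.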

                      Regarding your second question, I’m not sure what you want exactly. Could you give a toy example that describes what should go into cutadapt and what should come out?

                      Comment


                      • Hi Martin,

                        Thanks for your reply.

                        For the first question, I have opened an issue on GitHub: https://github.com/marcelm/cutadapt/issues/120

                        The second question is much like the first. The problem is that the poly-A is not always at the 3' end; it can also be found at the 5' end in some cases. The same holds for poly-T. It seems this can be solved with the "-b" parameter.

                        I have tested the "-b" parameter, but it does not always report all the positions that are reported when "-a" and "-g" are used simultaneously. One example:
                        Code:
                        -g AAGCAGTGGTATCAACGCAGAGTACATGGGG -a GTACTCTGCGTTGATACCACTGCTT

                        c9492/f3p2579/4102      1       2041    2073
                        c21590/f1p174/3554      2       1826    1857
                        c12682/f10p5/1801       0       0       22
                        c19086/f1p10/1705       0       1678    1703


                        -b AAGCAGTGGTATCAACGCAGAGTACATGGGG
                        c9492/f3p2579/4102      1       2041    2073
                        c21590/f1p174/3554      2       1826    1857
                        c12682/f10p5/1801       0       0       22
                        The two runs use the same parameters except that one uses "-a" and "-g" and the other uses "-b".
                        The hit for c19086/f1p10/1705 (highlighted in red in the original post) is not detected with "-b".

                        BTW: the page https://gist.github.com/marcelm/8406e8a48995b766051c is not visible to me. Could you paste the key steps here? Thank you.
                        Last edited by pengchy; 04-16-2015, 06:37 PM. Reason: add a message

                        Comment


                        • Here is the "gist" (sorry, but the indentation is messed up):
                          Code:
                          diff --git i/cutadapt/scripts/cutadapt.py w/cutadapt/scripts/cutadapt.py
                          index 855721d..2eaf435 100755
                          --- i/cutadapt/scripts/cutadapt.py
                          +++ w/cutadapt/scripts/cutadapt.py
                          @@ -155,6 +155,7 @@ class AdapterCutter(object):
                          # TODO write only one line, even for multiple matches
                          for match in matches:
                          seq = match.read.sequence
                          + qualities = match.read.qualities
                          if match is None:
                          print(match.read.name, -1, seq, sep='\t', file=self.info_file)
                          else:
                          @@ -167,6 +168,9 @@ class AdapterCutter(object):
                          seq[match.rstart:match.rstop],
                          seq[match.rstop:],
                          match.adapter.name,
                          + qualities[0:match.rstart],
                          + qualities[match.rstart:match.rstop],
                          + qualities[match.rstop:],
                          sep='\t', file=self.info_file
                          )
                          Regarding the -g/-a vs -b thing: It should give the same results, but only if all the adapter sequences are identical. You used a different adapter sequence for -a, so then the results will not be the same.
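A quick check of the two sequences from the example illustrates the point: the -a adapter is not the same sequence as the -g adapter, but the reverse complement of its first 25 bases. The snippet below is only an illustration, and `reverse_complement` is a helper written for this purpose.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


adapter_g = "AAGCAGTGGTATCAACGCAGAGTACATGGGG"  # given with -g (and with -b)
adapter_a = "GTACTCTGCGTTGATACCACTGCTT"        # given with -a

print(adapter_a == adapter_g)                           # False
print(reverse_complement(adapter_g[:25]) == adapter_a)  # True
```

So a run with a single -b option containing only the -g sequence has no reason to report matches of the -a sequence. Giving both sequences as two separate -b options may be a fairer comparison; that is a suggestion that has not been tested here.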

                          I will need some time to work on the issue 120 you filed because I need to understand in detail what is going on.

                          Comment


                          • cutadapt 1.10 has just been released with support for "linked adapters" (5'/3' adapter pairs) and NextSeq-specific trimming, see the changelog at http://cutadapt.readthedocs.io/en/stable/changes.html .

                            Comment
