SEQanswers

Old 02-22-2015, 06:42 AM   #121
sidsv
Junior Member
 
Location: Russia, St. Petersburg

Join Date: Feb 2015
Posts: 6
Default

Martin, thanks for your efforts!

Code:
>$ cd /; python -c 'import cutadapt; print cutadapt.__file__'
/usr/local/lib/python2.7/dist-packages/cutadapt/__init__.pyc
pip version:

Code:
>$ pip --version
pip 1.0 from /usr/lib/python2.7/dist-packages (python 2.7)
Linux version:

Code:
>$ uname -a
Linux dcdell 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Distribution version:

Code:
>$ cat /etc/issue
Ubuntu 12.04.5 LTS \n \l

Last edited by sidsv; 02-22-2015 at 06:44 AM.
Old 02-23-2015, 05:07 AM   #122
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

In a virtual machine with Ubuntu 12.04.5, I could reproduce the pip problem (where it says 'permission denied'). It has been fixed, but not in the version that comes with Ubuntu 12.04 and there seems to be no workaround.

I cannot reproduce the other problem of you getting 1.2.1 while you have 1.7.1 installed, but I can think of three more things to try. The first is to not install the program at all: just unpack the tar.gz and run cutadapt-1.7.1/bin/cutadapt (you cannot move the program anywhere else if you do this). The second is to try to install the program using Python 3: 'python3.3 setup.py install --user'. The third option, of course, is to ask your system admin to upgrade the 'globally' installed cutadapt ('pip install --upgrade cutadapt' as root should work).
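To see which copy of cutadapt a given Python interpreter actually imports, and which version that copy reports, a small check along these lines may help. It is only a sketch: the helper name `describe_module` is made up for this example, and it assumes the package exposes a `__version__` attribute, which is a common convention but not a guarantee.

```python
import importlib


def describe_module(name):
    """Return (path, version) of an importable module, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # __version__ is only a convention, so fall back to None when it is absent.
    return getattr(module, "__file__", None), getattr(module, "__version__", None)


if __name__ == "__main__":
    # Run this once with each interpreter on the machine (python2.7, python3.3, ...).
    print(describe_module("cutadapt"))
```

If the path points into a system-wide directory and the version is older than expected, the interpreter is still picking up the old global installation.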
Old 02-24-2015, 03:04 AM   #123
sidsv
Junior Member
 
Location: Russia, St. Petersburg

Join Date: Feb 2015
Posts: 6
Default

Martin, thank you very much for help!

I just ran cutadapt without installation as you said and it worked pretty fine! Maybe I'll ask our system admin to upgrade 'global' cutadapt installation.

Thanks again!
Old 02-24-2015, 04:13 AM   #124
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Great to hear you got it to work! Let me know if there are any further problems.
Old 03-14-2015, 05:01 AM   #125
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default cutadapt 1.8

Hi, cutadapt 1.8 has been released. It (finally) comes with proper paired-end support (no need to run cutadapt twice), quality-trimming of 5' ends and filtering of reads that have too many N bases.
Here is a copy of the full changelog:
  • Support single-pass paired-end trimming with the new -A/-G/-B/-U parameters. These work just like their -a/-g/-b/-u counterparts, but they specify sequences that are removed from the second read in a pair.

    Also, if you start using one of those options, the read modification options such as -q (quality trimming) are applied to *both* reads. For backwards compatibility, read modifications are applied to the first read only if neither of -A/-G/-B/-U is used. See the documentation for details.

    This feature has not been extensively tested, so please give feedback if something does not work.
  • The report output has been re-worked in order to accommodate the new paired-end trimming mode. This also changes the way the report looks in single-end mode. It is hopefully now more accessible.
  • Chris Mitchell contributed a patch adding two new options: --trim-n removes any N bases from the read ends, and the --max-n option can be used to filter out reads with too many Ns.
  • Support notation for repeated bases in the adapter sequence: Write A{10} instead of AAAAAAAAAA. Useful for poly-A trimming: Use -a A{100} to get the longest possible tail.
  • Quality trimming at the 5' end of reads is now supported. Use -q 15,10 to trim the 5' end with a cutoff of 15 and the 3' end with a cutoff of 10.
  • Fix incorrectly reported statistics (> 100% trimmed bases) when --times is set to a value greater than one.
  • Support .xz-compressed files (if running in Python 3.3 or later).
  • Started to use the GitHub issue tracker instead of Google Code. All old issues have been moved.
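
Two of the items above can be illustrated with a short Python sketch. It is not cutadapt's own code: the function names are chosen for this example, and the quality trimming assumes the BWA-style running-sum rule described in the cutadapt documentation, applied once from each end with its own cutoff (as in -q 15,10). Qualities are taken as a list of integer Phred scores.

```python
import re


def expand_repeats(adapter):
    """Expand A{10}-style repeats, e.g. 'GA{3}C' -> 'GAAAC'."""
    return re.sub(r"(.)\{(\d+)\}", lambda m: m.group(1) * int(m.group(2)), adapter)


def quality_trim_index(qualities, cutoff_front, cutoff_back):
    """Return (start, stop) so that qualities[start:stop] is the part to keep."""
    # 5' end: walk forward, summing how far each score falls below the cutoff.
    start, total, best = 0, 0, 0
    for i, q in enumerate(qualities):
        total += cutoff_front - q
        if total < 0:
            break
        if total > best:
            best, start = total, i + 1
    # 3' end: the same rule, walking backwards from the last base.
    stop, total, best = len(qualities), 0, 0
    for i in range(len(qualities) - 1, -1, -1):
        total += cutoff_back - qualities[i]
        if total < 0:
            break
        if total > best:
            best, stop = total, i
    if start >= stop:
        return 0, 0  # nothing of the read survives
    return start, stop


print(expand_repeats("A{10}"))
print(quality_trim_index([5, 8, 30, 35, 40, 9, 3], 15, 10))
```

With these inputs, the first call yields ten A bases and the second keeps only the three high-quality bases in the middle of the read.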
Old 04-14-2015, 06:16 PM   #126
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116
Default

Hi Martin,

Great software!

I want to use cutadapt for PacBio CCS data. However, the CCS data have the adaptor/primer in the middle of the long sequence, just like the following example:
Code:
[omit 2kb]ACTCCCATGTACTCTGCGTTGATACCACTGCTTATCTCTCTCCTCCGTAGAGGGTGAGAGAGATAAGCAGTGGTATCAACGCAGAGTACATGGGAGTCCTCACT[omit 2kb]
I am looking for software that splits the sequence at such primer sequence pairs. Do you plan to integrate this feature in a future version?

Best,
Pengcheng
Old 04-15-2015, 01:39 AM   #127
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Hi Pengcheng, I've thought about this briefly but wasn't sure how much interest there would be for it and how useful that would be. Could you perhaps post this to the issue tracker? If you don't want to create an account, I'm happy to add the issue for you.
Old 04-16-2015, 07:23 AM   #128
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116
Default

Quote:
Originally Posted by mmartin
Hi Pengcheng, I've thought about this briefly but wasn't sure how much interest there would be for it and how useful that would be. Could you perhaps post this to the issue tracker? If you don't want to create an account, I'm happy to add the issue for you.
OK, I will create an issue there. I have already written a quick-and-dirty Perl script to do the work. It follows this logic:
First, use BLAT to locate the primer positions. Second, delete the primer sequence at either end of the read if it lies within "maxDist", the maximum allowed distance to the end. Third, if the primer is located in the middle of the sequence and its distance to both ends is greater than a minimum distance, split the sequence into two parts and split the qualities correspondingly.
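
The three steps can be sketched in Python roughly as follows. This is not the Perl script itself, which was not posted: the function name `trim_or_split` and the parameter names are hypothetical, the primer hit is assumed to be already known as the half-open interval `seq[hit_start:hit_stop]` (for example from BLAT), and only a single hit per read is handled.

```python
def trim_or_split(seq, qual, hit_start, hit_stop, max_dist, min_dist):
    """Apply the trim-or-split rule to one primer hit seq[hit_start:hit_stop].

    Returns a list of (sequence, qualities) fragments.
    """
    dist_5p = hit_start              # bases in front of the primer
    dist_3p = len(seq) - hit_stop    # bases behind the primer
    if dist_5p <= max_dist:
        # Primer sits at the 5' end: drop it and everything before it.
        return [(seq[hit_stop:], qual[hit_stop:])]
    if dist_3p <= max_dist:
        # Primer sits at the 3' end: drop it and everything after it.
        return [(seq[:hit_start], qual[:hit_start])]
    if dist_5p > min_dist and dist_3p > min_dist:
        # Primer is internal: split sequence and qualities around it.
        return [(seq[:hit_start], qual[:hit_start]),
                (seq[hit_stop:], qual[hit_stop:])]
    # Neither rule applies: leave the read untouched.
    return [(seq, qual)]
```

Returning a list keeps the caller simple: each input read yields one or two output records, and the qualities always stay in step with the sequence.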
Old 04-16-2015, 07:29 AM   #129
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116
Default

Hi Martin,

Another question came up while using cutadapt. I want to trim the poly-A/T from PacBio sequences. Here is one example:
Code:
@c42683/f1p3/2901 isoform=c42683;full_length_coverage=1;non_full_length_coverage=3;isoform_length=2901
cttttcagacagaggtagtgatcgggtgtagttagtgcggttgagtttgtgcgTGTTCTAGGGTTTGAGTAAATTTGTGTGCACCATGAGCTGGCAGGATTATGTAGACAAGCAGTTGCTGGCTTCTAAATGTGTAACTAAAGCAGCAATTGCTGGACATGATGGAAATGTTTGGGCGAAATCGGATGGATTTGAAGTATCGAAAGAAGAAATTGCAAAGTTGGTGCAAGGTTTTGAAAAACAGGATATCTTGACGTCTTCTGGTGTCACGTTGGCAGGAAACAGATATATTTACCTGTCTGGAACTGATAGAGTTATTAGGGCAAAACTTGGAAAGGTTGGTGTGCATTGCATGAAGACACAGCAAGCTGTTGTAGTTTCACTATATGAGGATCCCATCCAACCACAACAGGCAGCCTCTGTTGTGGAAAAATTAGGGGATTACTTAGTGTCCTGTGGATACTAGAGGTATAATAGACTGTTCTCCTGTGGTGATATGAAGCAGCAGCAGCAACAACAGAGATGGTCGTTTTTTTCTATACAGCGAACTGTAATGTGCATCATGGTTCTGGCTAATAATTCAGTTTGAGTTTAAACTCATATGGTGAAAATCTTGAACTGCATTTTTCTTTTCTCAGTACCTTGTGTCTGAATGTTTGTGGCAAGTATTGCTTCAGTTTGATAATGAGGCACTTGAGCATATCATGGGCTCTTGGAAGTGGACAAATTGTTGGCAACGTTCTCCAAAGTACACTTTGGCTACTAATCCAAGGTGTACCTCGAAGTTCGCCTGGGATGAAAATTGAAGAATTGTAGCTGATAATAGTATGACCATTGCATTTGATACTGAGTCATTAGGATTTTTATAATCTCTAGTCCTGCTCTTTCTACCCCATATCTTTCCCTTTGCCGAGTGAAATTTTGTTAAAAATTAATTTATTGGCGAACTCTTCAATGCTTTAGAACCCAGTGTACTCATTCCTTGTCTATATGTATCACAACCAATTGTCGGAAGCTTGAATGACAAATATTGGTGGAGCTCAGAAGGAGGGCTTGCGCAGTGTGCGTAGTGTCTCCTGCCTGCACTGTGATTGTGGTGTAATTTTAAGATGGTTTGCAGACTATAAATATAATGGATAAAAGCTCGTTTTAGTGCTATGTTGCTGAAAGGATTTGTGGTGGCTTTGATTTTTACTACGGTCTAATTTAAACAATAATAGCTTTGTAAAAGAATTGAATATGGAATTTGAGAAATTTACCAAAATCAAATTTGTATAAAAGTATGGCATAGTCGGTTGAATGACTTCTACTGCAACATATTGTTGAAGTACTAGTTATTCAGCTAATGTGAAAACTGGGGGAATATGAATTTTACAGTAATCTTTTTTATGTAAAGCGTTAGTGtagcaaagttatatatcgttttttttttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttatttttttttttttttttttttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttttgtttttatttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttatgtttatttttttttttttttttttttttttttttttttttattttttttttttattttttttttttttttttgttttttttttattattttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttctttttttttttttaattttattttattttttttttttttttttttttttttttttttttttttttttttttttttttttttttattttttttttttttttttttttttttttgtgatttttt
ttttttttttttttttttatttatttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttattttttttttttttttttttttattttttttttttttttttttttttttttttttttttgttatttttttttttttatttatttttttttttttttttttttttttttttttttttgtttttttttttttttttatttttatttttttttttttattttttttttttttttttttatttttttgtttttttttttttttttttttattttttttttatttttttttttttttttttattttttttttaattttttttttttttttttttattttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttattattttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttgtttttttttttattttttttgttttttttttttttatttttttttttttttttttttttttttttttttttttttttttttttttttttgtttcctataatttaaagagatattttgtcatgtaaaatggaattatgcttggccccggcctttttgtgctctacttcaggggcgagagcaattttgtgtgccaattgtcattaccaacttgaggttagtctcctgtggatttattagcattttgtagctgattttatgaccactgtattcagtattttcgttcattcttatgagattgtactgttcccctcagggcccccaataaaattaagc
As you can see, there is a long run of "t" towards the 3' end, whereas we expected it at the 5' end. So cutadapt gives this result:
Code:
@c42683/f1p3/2901 isoform=c42683;full_length_coverage=1;non_full_length_coverage=3;isoform_length=2901
ttttttttttttttttttttttatttttttttttttttttttttttttttattattttttttttttttttttttttttttttttttttttatttttttttttttttttttttttttttgtttttttttttattt
tttttgttttttttttttttatttttttttttttttttttttttttttttttttttttttttttttttttttttgtttcctataatttaaagagatattttgtcatgtaaaatggaattatgcttggccccggc
ctttttgtgctctacttcaggggcgagagcaattttgtgtgccaattgtcattaccaacttgaggttagtctcctgtggatttattagcattttgtagctgattttatgaccactgtattcagtattttcgttc
attcttatgagattgtactgttcccctcagggcccccaataaaattaagc
You may say that the sequence at the 5' end may be wrong. However, I have searched this sequence against the NR database and it gives a functional protein hit. So I want to cut the poly-A/T at both ends and use a parameter, such as a maximum distance to the end, to decide whether to cut.
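
A minimal Python sketch of such a rule is shown below. It is not a cutadapt feature: the function name `trim_poly_tails` and the parameters `min_run` and `max_dist` are invented for this example. It only finds exact, uninterrupted runs of A or T, so a tail broken up by single "a" or "g" bases, as in the read above, would need a more tolerant matcher.

```python
import re


def trim_poly_tails(seq, min_run=20, max_dist=50):
    """Cut poly-A/T runs that lie within max_dist bases of either read end.

    Returns (start, stop) such that seq[start:stop] is the part to keep.
    """
    start, stop = 0, len(seq)
    # Runs of at least min_run identical A or T bases, upper or lower case.
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_run, min_run), re.IGNORECASE)
    for m in pattern.finditer(seq):
        if m.start() <= max_dist:
            # Run is close to the 5' end: keep only what follows it.
            start = max(start, m.end())
        elif len(seq) - m.end() <= max_dist:
            # Run is close to the 3' end: keep only what precedes it.
            stop = min(stop, m.start())
        # Runs far from both ends are left alone.
    if start >= stop:
        return 0, 0
    return start, stop
```

The same (start, stop) pair can be used to slice the quality string, so sequence and qualities stay aligned.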
Old 04-16-2015, 08:14 AM   #130
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Quote:
Originally Posted by pengchy
I have already written a quick-and-dirty Perl script to do the work. It follows this logic:
First, use BLAT to locate the primer positions...
Maybe you can use cutadapt's --info-file option: in columns 5, 6 and 7, it gives you (for each read) the sequence before the adapter, the sequence that matched the adapter and the sequence after the adapter. The read name is in column 1.

The info file currently does not contain qualities, but that is also easy: Edit cutadapt/scripts/cutadapt.py and change it in this way: https://gist.github.com/marcelm/8406e8a48995b766051c .

Example command-line:
Code:
cutadapt --info-file info.txt -o /dev/null -a ADAPTER input.fastq
Of course you'd still need to do some scripting and it's just a suggestion. Feel free to use BLAT if that feels easier!
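
As a starting point for that scripting, the info file could be read in Python along these lines. The column positions are taken from the description above (read name in column 1; sequence before, matching and after the adapter in columns 5 to 7). The function name `read_info_file` is made up, and treating lines with fewer than seven columns as reads without a match is an assumption.

```python
def read_info_file(lines):
    """Yield (name, before, match, after) for every info-file line with a hit."""
    for line in lines:
        # Plain tab splitting is used on purpose: quality strings may contain
        # quote characters that would confuse a CSV parser.
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            # Too few columns, presumably a read without an adapter match.
            continue
        # Columns are 1-based in the description, 0-based here.
        yield fields[0], fields[4], fields[5], fields[6]
```

Any iterable of lines works as input, so an open file handle for info.txt can be passed directly, and the "before" and "after" parts are then the two fragments of a split read.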

Regarding your second question, I'm not sure what you want exactly. Could you give a toy example that describes what should go into cutadapt and what should come out?
Old 04-16-2015, 07:35 PM   #131
pengchy
Senior Member
 
Location: China

Join Date: Feb 2009
Posts: 116
Default

Quote:
Originally Posted by mmartin
Maybe you can use cutadapt’s --info-file option: In column 5, 6 and 7, it gives you (for each read), the sequence before the adapter, the sequence that matched the adapter and the sequence after the adapter. The read name is in column 1.

The info file currently does not contain qualities, but that is also easy: Edit cutadapt/scripts/cutadapt.py and change it in this way: https://gist.github.com/marcelm/8406e8a48995b766051c .

Example command-line:
Code:
cutadapt --info-file info.txt -o /dev/null -a ADAPTER input.fastq
Of course you’d still need to do some scripting and it’s just a suggestion. Feel free to use BLAT if that feels easier!

Regarding your second question, I’m not sure what you want exactly. Could you give a toy example that describes what should go into cutadapt and what should come out?
Hi Martin,

Thanks for your reply.

For the first question, I have opened an issue on GitHub: https://github.com/marcelm/cutadapt/issues/120

The second question is much like the first one. The problem is that the poly-A is not always at the 3' end; it is also detected at the 5' end in several cases. The same goes for poly-T. It seems this can be solved with the "-b" parameter.

I have tested the "-b" parameter, but it does not always report all the positions that are found when "-a" and "-g" are used together. One example:
Code:
-g AAGCAGTGGTATCAACGCAGAGTACATGGGG -a GTACTCTGCGTTGATACCACTGCTT

c9492/f3p2579/4102      1       2041    2073
c21590/f1p174/3554      2       1826    1857
c12682/f10p5/1801       0       0       22
c19086/f1p10/1705       0       1678    1703


-b AAGCAGTGGTATCAACGCAGAGTACATGGGG
c9492/f3p2579/4102      1       2041    2073
c21590/f1p174/3554      2       1826    1857
c12682/f10p5/1801       0       0       22
These are two runs; all other parameters are the same, except that one uses "-a -g" and the other uses "-b".
The last hit of the first run (c19086/f1p10/1705) is not detected by "-b".

BTW: the web page https://gist.github.com/marcelm/8406e8a48995b766051c is not visible to me. Could you paste the key steps here? Thank you.

Last edited by pengchy; 04-16-2015 at 07:37 PM. Reason: add a message
Old 04-22-2015, 06:52 AM   #132
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

Here is the "gist" (sorry, but the indentation is messed up):
Code:
diff --git i/cutadapt/scripts/cutadapt.py w/cutadapt/scripts/cutadapt.py
index 855721d..2eaf435 100755
--- i/cutadapt/scripts/cutadapt.py
+++ w/cutadapt/scripts/cutadapt.py
@@ -155,6 +155,7 @@ class AdapterCutter(object):
# TODO write only one line, even for multiple matches
for match in matches:
seq = match.read.sequence
+ qualities = match.read.qualities
if match is None:
print(match.read.name, -1, seq, sep='\t', file=self.info_file)
else:
@@ -167,6 +168,9 @@ class AdapterCutter(object):
seq[match.rstart:match.rstop],
seq[match.rstop:],
match.adapter.name,
+ qualities[0:match.rstart],
+ qualities[match.rstart:match.rstop],
+ qualities[match.rstop:],
sep='\t', file=self.info_file
)
Regarding the -g/-a vs -b thing: It should give the same results, but only if all the adapter sequences are identical. You used a different adapter sequence for -a, so then the results will not be the same.
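
How the two sequences from the earlier post relate can be checked with a few lines of Python. The sketch below only compares the sequences; whether this alone explains the missing hit in that data set is an assumption. The helper `reverse_complement` is written out here and is not part of cutadapt.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


G_ADAPTER = "AAGCAGTGGTATCAACGCAGAGTACATGGGG"  # passed to -g and to -b in the post
A_ADAPTER = "GTACTCTGCGTTGATACCACTGCTT"        # passed to -a in the post

# Compare the -a sequence with the reverse complement of the start of the -g sequence.
print(reverse_complement(G_ADAPTER[:len(A_ADAPTER)]) == A_ADAPTER)  # prints True
```

So the -a sequence is the reverse complement of the first 25 bases of the -g sequence, which means a -b run given only the -g sequence is searching for a different string than the -a part of the first run.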

I will need some time to work on the issue 120 you filed because I need to understand in detail what is going on.
Old 05-19-2016, 08:09 AM   #133
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75
Default

cutadapt 1.10 has just been released with support for "linked adapters" (5'/3' adapter pairs) and NextSeq-specific trimming, see the changelog at http://cutadapt.readthedocs.io/en/stable/changes.html .

Tags
adapter trimming, color space, microrna, mirna, solid




All times are GMT -8.

