SEQanswers

Old 04-14-2016, 06:57 AM   #101
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 614
Default

Hi Nathan,
It is difficult to guess what is going on (or rather wrong) because all seems fine and there are no error messages at all. Generally, Trim Galore first generates two _trimmed.fq files, then validates these (length constraints etc.) and produces two val.fq files. Once that has finished, the trimmed.fq files should be deleted again.

Trim Galore doesn't use a lot of memory, so that should not be the problem. Have you checked that you are not running out of disk space? You should be able to gzip the input files, and/or specify --gzip to keep file sizes smaller. And have you checked (maybe run 'top' in another terminal) whether Trim Galore is still running or has been killed? Maybe it is still running but just very, very slowly, e.g. because of a network connection, or because a tmp drive is getting full or the like... If it helps, I could set up an FTP site for you and try running it on your files over here to see if there is something unusual. Best, Felix
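
One quick way to rule out the disk-space explanation is to check the free space on the drive holding the output files and on the system's temporary directory. A minimal sketch in Python, standard library only; the `free_gib` helper name is made up for illustration, and whether Trim Galore or Cutadapt write to the temporary directory at all is an assumption:

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


# "." stands in for the output directory (hypothetical; use your own path).
for label, path in (("output dir", "."), ("temp dir", tempfile.gettempdir())):
    print(f"{label}: {path} has {free_gib(path):.1f} GiB free")
```

If either number is close to zero, a full drive becomes a plausible cause of the stall.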
Old 04-14-2016, 08:53 AM   #102
bowen
Junior Member
 
Location: SouthEast

Join Date: Sep 2009
Posts: 9
Default

Looks like it does stop using the CPU: it's still running in Activity Monitor (Python) but with 0% CPU usage. The tmp drive getting full sounds probable. Would the tmp dir be on the drive where the script is housed or on the drive where the outputs are being generated? Thanks for the help. It's all local, so I don't think the network would be an issue. Also, thanks for the offer of FTP, but hopefully I'll get this figured out soon.
Old 04-14-2016, 09:04 AM   #103
bowen

here are the open files for the Python process (parent process Perl)
/Volumes/CHR1_BIOINF_WORKING/RNASeq_working
/System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
/System/Library/Frameworks/Python.framework/Versions/2.7/Python
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_locale.so
/Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_align.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_collections.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/operator.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/itertools.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_heapq.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/time.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/strop.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_functools.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_struct.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cStringIO.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/zlib.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/select.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/fcntl.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/binascii.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/bz2.so
/Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_seqio.so
/Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_qualtrim.so
/usr/lib/dyld
/private/var/db/dyld/dyld_shared_cache_x86_64
->0x99a5dbc1f67c4a4f
->0x99a5dbc1f67c51cf
/Volumes/CHR1_BIOINF_WORKING/RNASeq_working/index11_GGCTAC_L001-L002_R1_001.fastq
Old 04-14-2016, 09:10 AM   #104
bowen

and here's a sample of the process now while it's at CPU 0%.

Sampling process 19136 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling Python (pid 19136) every 1 millisecond
Process: Python [19136]
Path: /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Load Address: 0x10ab3c000
Identifier: Python
Version: ???
Code Type: X86-64
Parent Process: perl5.18 [19132]

Date/Time: 2016-04-14 13:04:45.752 -0400
Launch Time: 2016-04-14 10:18:54.276 -0400
OS Version: Mac OS X 10.11.4 (15E65)
Report Version: 7
Analysis Tool: /usr/bin/sample
----

Call graph:
2909 Thread_1674560 DispatchQueue_1: com.apple.main-thread (serial)
2909 start (in libdyld.dylib) + 1 [0x7fff90e945ad]
2909 Py_Main (in Python) + 3137 [0x10abf7011]
2909 PyRun_SimpleFileExFlags (in Python) + 698 [0x10abe5634]
2909 PyRun_FileExFlags (in Python) + 133 [0x10abe5ae5]
2909 ??? (in Python) load address 0x10ab43000 + 0xa2a42 [0x10abe5a42]
2909 PyEval_EvalCode (in Python) + 54 [0x10abc5d8c]
2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
2909 PyEval_EvalFrameEx (in Python) + 11609 [0x10abc930c]
2909 ??? (in Python) load address 0x10ab43000 + 0x894ae [0x10abcc4ae]
2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
2909 PyEval_EvalFrameEx (in Python) + 11609 [0x10abc930c]
2909 ??? (in Python) load address 0x10ab43000 + 0x894ae [0x10abcc4ae]
2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
2909 PyEval_EvalFrameEx (in Python) + 13400 [0x10abc9a0b]
2909 ??? (in Python) load address 0x10ab43000 + 0x810af [0x10abc40af]
2909 PyFile_WriteObject (in Python) + 338 [0x10ab63fb3]
2909 ??? (in Python) load address 0x10ab43000 + 0x39b09 [0x10ab7cb09]
2909 ??? (in Python) load address 0x10ab43000 + 0x42a31 [0x10ab85a31]
2909 fwrite (in libsystem_c.dylib) + 153 [0x7fff8984f34a]
2909 __sfvwrite (in libsystem_c.dylib) + 194 [0x7fff8984edcb]
2909 _swrite (in libsystem_c.dylib) + 87 [0x7fff89854202]
2909 __write_nocancel (in libsystem_kernel.dylib) + 10 [0x7fff810d5612]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
__write_nocancel (in libsystem_kernel.dylib) 2909

Binary Images:
0x10ab3c000 - 0x10ab3cfff org.python.python (2.7.10 - 2.7.10) <307E6E15-ECF7-3BB2-AF06-3E8D23DFDECA> /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
0x10ab43000 - 0x10ac34ff7 org.python.python (2.7.10 - 2.7.10) <83AFAAA7-BDFA-354D-8A7A-8F40A30ACB91> /System/Library/Frameworks/Python.framework/Versions/2.7/Python
0x10affb000 - 0x10affcfff _locale.so (94) <4394AC91-22AE-3D7D-85C4-792A4F35F3F2> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_locale.so
0x10b081000 - 0x10b093ffb +_align.so (0) <85EBC770-BB23-375D-99F8-85B587E4DC9C> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_align.so
0x10b0a3000 - 0x10b0a5fff _collections.so (94) <5FEB3871-0B8F-3233-876C-0E81CF581963> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_collections.so
0x10b0ac000 - 0x10b0affff operator.so (94) <D60F7C86-DED4-34F8-BA1B-106E044B6F83> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/operator.so
0x10b0b6000 - 0x10b0bafff itertools.so (94) <889782F7-5414-3881-BAAB-83CACDFDF0C5> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/itertools.so
0x10b0c4000 - 0x10b0c5fff _heapq.so (94) <9200023E-75BA-3F20-843C-398C3709CA88> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_heapq.so
0x10b0cb000 - 0x10b0ccff7 time.so (94) <94E8BF2A-7841-32AD-8722-6B2526999CA1> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/time.so
0x10b113000 - 0x10b116ff7 strop.so (94) <44D8B4D6-D536-31EE-94EA-4F3C0FC773FA> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/strop.so
0x10b11c000 - 0x10b11dfff _functools.so (94) <49B479ED-A07D-322D-9A29-AFF4CA084219> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_functools.so
0x10b162000 - 0x10b165fff _struct.so (94) <0DCC6B47-A763-3AA6-82C5-B6A58073286B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_struct.so
0x10b16c000 - 0x10b16dfff cStringIO.so (94) <EC2054BE-E4CD-38B3-BBFB-4FEFB76CF1EF> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cStringIO.so
0x10b2b3000 - 0x10b2b5fff zlib.so (94) <72EB0E79-95F2-316C-B49C-A259FEA56658> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/zlib.so
0x10b2bb000 - 0x10b2cafff _io.so (94) <39FEF2EC-8D20-33A6-B91F-EF7B2FAE9009> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
0x10b2db000 - 0x10b2ddfff select.so (94) <22170D1C-40EF-303A-8BB7-A48E783F9350> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/select.so
0x10b2e4000 - 0x10b2e5fff fcntl.so (94) <419069D5-A61F-3925-B320-EA7B9E38F44B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/fcntl.so
0x10b2ea000 - 0x10b2ecfff binascii.so (94) <9044E1C3-221F-3B79-847A-C9C3D8FEA9FD> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/binascii.so
0x10b2f1000 - 0x10b2f4fff bz2.so (94) <435D683B-3940-3669-8CF8-AF280F0B5B9C> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/bz2.so
0x10b2fb000 - 0x10b308fff +_seqio.so (0) <026B8553-7FE9-3560-B184-D7D2B49AF1DC> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_seqio.so
0x10b356000 - 0x10b358fff +_qualtrim.so (0) <16A07A2B-280F-3822-AA42-7B44F426CEF4> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_qualtrim.so
0x7fff657a7000 - 0x7fff657de0d7 dyld (0.0 - ???) <D9B236BC-4AC1-325F-B3EF-3F06DBDA7119> /usr/lib/dyld
0x7fff810be000 - 0x7fff810dcff7 libsystem_kernel.dylib (3248.40.184) <88C17B7F-1CD8-3979-A1A9-F7BDB4FCE789> /usr/lib/system/libsystem_kernel.dylib
0x7fff81390000 - 0x7fff81395ff7 libmacho.dylib (875.1) <318264FA-58F1-39D8-8285-1F6254EE410E> /usr/lib/system/libmacho.dylib
0x7fff81396000 - 0x7fff8139efff libsystem_networkextension.dylib (385.40.36) <66095DC7-6539-38F2-95EE-458F15F6D014> /usr/lib/system/libsystem_networkextension.dylib
0x7fff8139f000 - 0x7fff813a7fff libcopyfile.dylib (127) <A48637BC-F3F2-34F2-BB68-4C65FD012832> /usr/lib/system/libcopyfile.dylib
0x7fff818da000 - 0x7fff818dbfff libsystem_secinit.dylib (20) <32B1A8C6-DC84-3F4F-B8CE-9A52B47C3E6B> /usr/lib/system/libsystem_secinit.dylib
0x7fff82082000 - 0x7fff82082ff7 libunc.dylib (29) <DDB1E947-C775-33B8-B461-63E5EB698F0E> /usr/lib/system/libunc.dylib
0x7fff83006000 - 0x7fff83011ff7 libcommonCrypto.dylib (60075.40.2) <B9D08EB8-FB35-3F7B-8A1C-6FCE3F07B7E7> /usr/lib/system/libcommonCrypto.dylib
0x7fff83248000 - 0x7fff8325fff7 libsystem_asl.dylib (323.40.3) <007F9094-317A-33EA-AF62-BAEAAB48C0F7> /usr/lib/system/libsystem_asl.dylib
0x7fff8326c000 - 0x7fff83275ff3 libsystem_notify.dylib (150.40.1) <D48BDE34-0F7E-34CA-A0FF-C578E39987CC> /usr/lib/system/libsystem_notify.dylib
0x7fff83a9a000 - 0x7fff83ab6ff7 libsystem_malloc.dylib (67.40.1) <5748E8B2-F81C-34C6-8B13-456213127678> /usr/lib/system/libsystem_malloc.dylib
0x7fff8459c000 - 0x7fff84602ff7 libsystem_network.dylib (583.40.20) <269E5ADD-6922-31E2-8D55-7B777263AC0D> /usr/lib/system/libsystem_network.dylib
0x7fff8461f000 - 0x7fff84623fff libcache.dylib (75) <9548AAE9-2AB7-3525-9ECE-A2A7C4688447> /usr/lib/system/libcache.dylib
0x7fff858c9000 - 0x7fff858cafff libsystem_blocks.dylib (65) <1244D9D5-F6AA-35BB-B307-86851C24B8E5> /usr/lib/system/libsystem_blocks.dylib
0x7fff85f29000 - 0x7fff85f31fef libsystem_platform.dylib (74.40.2) <29A905EF-6777-3C33-82B0-6C3A88C4BA15> /usr/lib/system/libsystem_platform.dylib
0x7fff85fe4000 - 0x7fff85fe6ff7 libquarantine.dylib (80) <0F4169F0-0C84-3A25-B3AE-E47B3586D908> /usr/lib/system/libquarantine.dylib
0x7fff86398000 - 0x7fff863c7ffb libsystem_m.dylib (3105) <08E1A4B2-6448-3DFE-A58C-ACC7335BE7E4> /usr/lib/system/libsystem_m.dylib
0x7fff87175000 - 0x7fff87382fff libicucore.A.dylib (551.51) <35315A29-E21C-3CC5-8BD6-E07A3AE8FC0D> /usr/lib/libicucore.A.dylib
0x7fff89766000 - 0x7fff89768fff libsystem_coreservices.dylib (19.2) <1B3F5AFC-FFCD-3ECB-8B9A-5538366FB20D> /usr/lib/system/libsystem_coreservices.dylib
0x7fff89810000 - 0x7fff8989dfff libsystem_c.dylib (1082.20.4) <CDEBF2BB-A578-30F5-846F-96274951C3C5> /usr/lib/system/libsystem_c.dylib
0x7fff89d4a000 - 0x7fff89d61ff7 libsystem_coretls.dylib (83.40.5) <C90DAE38-4082-381C-A185-2A6A8B677628> /usr/lib/system/libsystem_coretls.dylib
0x7fff8a196000 - 0x7fff8a1a7ff7 libsystem_trace.dylib (201.10.3) <25104542-5251-3E8D-B14A-9E37207218BC> /usr/lib/system/libsystem_trace.dylib
0x7fff8af3c000 - 0x7fff8af4dff7 libz.1.dylib (61.20.1) <B3EBB42F-48E3-3287-9F0D-308E04D407AC> /usr/lib/libz.1.dylib
0x7fff8b64f000 - 0x7fff8b657ffb libsystem_dnssd.dylib (625.40.20) <86A05653-DCA0-3345-B29F-F320029AA05E> /usr/lib/system/libsystem_dnssd.dylib
0x7fff8bd80000 - 0x7fff8bda9fff libc++abi.dylib (125) <DCCC8177-3D09-35BC-9784-2A04FEC4C71B> /usr/lib/libc++abi.dylib
0x7fff8c2e0000 - 0x7fff8c2e7ff7 libcompiler_rt.dylib (62) <A13ECF69-F59F-38AE-8609-7B731450FBCD> /usr/lib/system/libcompiler_rt.dylib
0x7fff8cce5000 - 0x7fff8cce6ffb libremovefile.dylib (41) <552EF39E-14D7-363E-9059-4565AC2F894E> /usr/lib/system/libremovefile.dylib
0x7fff8d19f000 - 0x7fff8d216feb libcorecrypto.dylib (335.40.8) <9D300121-CAF8-3894-8774-DF38FA65F238> /usr/lib/system/libcorecrypto.dylib
0x7fff8d3c9000 - 0x7fff8d3cafff libDiagnosticMessagesClient.dylib (100) <4243B6B4-21E9-355B-9C5A-95A216233B96> /usr/lib/libDiagnosticMessagesClient.dylib
0x7fff8d70d000 - 0x7fff8d716ff7 libsystem_pthread.dylib (138.10.4) <3DD1EF4C-1D1B-3ABF-8CC6-B3B1CEEE9559> /usr/lib/system/libsystem_pthread.dylib
0x7fff8d83a000 - 0x7fff8d83aff7 liblaunch.dylib (765.40.36) <1CD7619D-AF2E-34D1-8EC6-8021CF473D9B> /usr/lib/system/liblaunch.dylib
0x7fff90b54000 - 0x7fff90b55ffb libSystem.B.dylib (1226.10.1) <CD307E99-FC5C-3575-BCCE-0C861AA63124> /usr/lib/libSystem.B.dylib
0x7fff90c9a000 - 0x7fff90cc7fff libdispatch.dylib (501.40.12) <C7499857-61A5-3D7D-A5EA-65DCC8C3DF92> /usr/lib/system/libdispatch.dylib
0x7fff90ceb000 - 0x7fff90cf0ff3 libunwind.dylib (35.3) <F6EB48E5-4D12-359A-AB54-C937FBBE9043> /usr/lib/system/libunwind.dylib
0x7fff90e91000 - 0x7fff90e94ffb libdyld.dylib (360.21) <8390E026-F7DE-3C32-9486-3DFF6BD131B0> /usr/lib/system/libdyld.dylib
0x7fff91bb5000 - 0x7fff9202bfff com.apple.CoreFoundation (6.9 - 1258.1) <943A1383-DA6A-3DC0-ABCD-D9AEB3D0D34D> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
0x7fff92038000 - 0x7fff9203aff7 libsystem_configuration.dylib (802.40.13) <3DEB7DF9-6804-37E1-BC83-0166882FF0FF> /usr/lib/system/libsystem_configuration.dylib
0x7fff920d9000 - 0x7fff9212cff7 libc++.1.dylib (120.1) <8FC3D139-8055-3498-9AC5-6467CB7F4D14> /usr/lib/libc++.1.dylib
0x7fff9212e000 - 0x7fff92499657 libobjc.A.dylib (680) <D55D5807-1FBE-32A5-9105-44D7AFE68C27> /usr/lib/libobjc.A.dylib
0x7fff9249a000 - 0x7fff924e0ff7 libauto.dylib (186) <999E610F-41FC-32A3-ADCA-5EC049B65DFB> /usr/lib/libauto.dylib
0x7fff9283a000 - 0x7fff92863ff7 libxpc.dylib (765.40.36) <2CC7CF36-66D4-301B-A6D8-EBAE7405B008> /usr/lib/system/libxpc.dylib
0x7fff92e6d000 - 0x7fff92e7bff7 libbz2.1.0.dylib (38) <28E54258-C0FE-38D4-AB76-1734CACCB344> /usr/lib/libbz2.1.0.dylib
0x7fff9335a000 - 0x7fff93383fff libsystem_info.dylib (477.40.5) <6B01C09E-A3E5-3C71-B370-D0CABD11A436> /usr/lib/system/libsystem_info.dylib
0x7fff9352b000 - 0x7fff9352bff7 libkeymgr.dylib (28) <8371CE54-5FDD-3CE9-B3DF-E98C761B6FE0> /usr/lib/system/libkeymgr.dylib
0x7fff943bc000 - 0x7fff943bffff libsystem_sandbox.dylib (460.40.33) <30671DCC-265F-325A-B33D-11CD336B3DA3> /usr/lib/system/libsystem_sandbox.dylib
Sample analysis of process 19136 written to file /dev/stdout
Old 04-15-2016, 02:35 AM   #105
fkrueger

I have to admit that I don't exactly know what is going on, but so far I can't see any indication that (or why) Trim Galore would be failing. Just in case I am attaching the latest development version which you might want to give a whirl.

Alternatively, there is a chance that Python or Cutadapt is somehow stalling, so not finishing but also not using a noticeable chunk of the CPU anymore. Could you try to run Cutadapt on its own on the file, with the same command that Trim Galore is invoking, to see if that runs to completion?

Code:
cutadapt -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC -o out.fq index21_GTTTCG_L001-L002_R1_001.fastq
Attached Files
File Type: zip trim_galore.zip (15.1 KB, 1 views)
Old 04-15-2016, 05:03 AM   #106
bowen

Thanks,
Will try the cutadapt command you graciously suggested. The Python process invoked by trim_galore is still running, it has just gone to 0% CPU. It may be something quirky about the way the last instance of Python was installed on my machine. I may look into that.
Old 04-15-2016, 05:33 AM   #107
bowen

cutadapt worked fine, wrote a single out.fq file that appears to be right size, etc.
thanks,
Nathan
Old 04-15-2016, 05:36 AM   #108
fkrueger

That's weird, then I can't even blame Python for it... Have you tried the version of Trim Galore I attached?
Old 04-15-2016, 05:45 AM   #109
bowen

am running it now. here's the .txt report of one that is hanging. then followed by .txt report of the one that ran correctly:

hanging:
SUMMARISING RUN PARAMETERS
==========================
Input filename: index21_GTTTCG_L001-L002_R1_001.fastq
Trimming mode: paired-end
Trim Galore version: 0.4.1
Cutadapt version: 1.9.1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp

completed: two different .txt files, as it was a paired set

SUMMARISING RUN PARAMETERS
==========================
Input filename: index23_GAGTGG_L001-L002_R1_001.fastq
Trimming mode: paired-end
Trim Galore version: 0.4.1
Cutadapt version: 1.9.1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp


This is cutadapt 1.9.1 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC index23_GAGTGG_L001-L002_R1_001.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 850.42 s (30 us/read; 2.01 M reads/minute).

=== Summary ===

Total reads processed: 28,485,339
Reads with adapters: 16,948,174 (59.5%)
Reads written (passing filters): 28,485,339 (100.0%)

Total basepairs processed: 3,589,152,714 bp
Quality-trimmed: 6,608,523 bp (0.2%)
Total written (filtered): 3,299,244,643 bp (91.9%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16948174 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
A: 17.7%
C: 32.2%
G: 32.3%
T: 17.5%
none/other: 0.3%

Overview of removed sequences
length count expect max.err error counts
1 3822853 7121334.8 0 3822853
2 1342103 1780333.7 0 1342103
3 562083 445083.4 0 562083
4 338904 111270.9 0 338904
5 305930 27817.7 0 305930
6 292726 6954.4 0 292726
7 293029 1738.6 0 293029
8 295266 434.7 0 295266
9 314323 108.7 0 313753 570
10 314996 27.2 1 308528 6468
11 294907 6.8 1 288423 6484
12 301352 1.7 1 294428 6924
13 306345 0.4 1 298669 7676
14 308686 0.4 1 301195 7491
15 292770 0.4 1 285216 7554
16 297400 0.4 1 289598 7802
17 293038 0.4 1 285169 7869
18 289457 0.4 1 281581 7876
19 299679 0.4 1 291706 7973
20 297202 0.4 1 289126 8076
21 300660 0.4 1 292170 8490
22 286125 0.4 1 278406 7719
23 269566 0.4 1 261467 8099
24 274398 0.4 1 266280 8118
25 264378 0.4 1 257631 6747
26 255768 0.4 1 248915 6853
27 256549 0.4 1 249833 6716
28 257268 0.4 1 250252 7016
29 244909 0.4 1 238288 6621
30 234995 0.4 1 229910 5085
31 229964 0.4 1 224161 5803
32 224294 0.4 1 219567 4727
33 205954 0.4 1 201559 4395
34 201649 0.4 1 197346 4303
35 197177 0.4 1 192772 4405
36 166376 0.4 1 162610 3766
37 164498 0.4 1 160888 3610
38 154155 0.4 1 150759 3396
39 149720 0.4 1 146403 3317
40 147538 0.4 1 144226 3312
41 139000 0.4 1 135824 3176
42 117928 0.4 1 115068 2860
43 150887 0.4 1 147697 3190
44 74900 0.4 1 73163 1737
45 85284 0.4 1 83407 1877
46 84122 0.4 1 82252 1870
47 79113 0.4 1 77412 1701
48 71166 0.4 1 69632 1534
49 71476 0.4 1 69884 1592
50 64563 0.4 1 63123 1440
51 62861 0.4 1 61509 1352
52 53459 0.4 1 52311 1148
53 50043 0.4 1 48999 1044
54 46537 0.4 1 45491 1046
55 41425 0.4 1 40523 902
56 33841 0.4 1 33125 716
57 30992 0.4 1 30383 609
58 27455 0.4 1 26913 542
59 27536 0.4 1 26969 567
60 23792 0.4 1 23350 442
61 21538 0.4 1 21073 465
62 18972 0.4 1 18504 468
63 18545 0.4 1 18096 449
64 15370 0.4 1 14978 392
65 14415 0.4 1 14016 399
66 12971 0.4 1 12621 350
67 11121 0.4 1 10788 333
68 10333 0.4 1 10010 323
69 9483 0.4 1 9121 362
70 8785 0.4 1 8313 472
71 8295 0.4 1 7621 674
72 7952 0.4 1 6994 958
73 8569 0.4 1 6772 1797
74 11545 0.4 1 6819 4726
75 40013 0.4 1 7295 32718
76 23307 0.4 1 21496 1811
77 4013 0.4 1 3591 422
78 1490 0.4 1 1251 239
79 792 0.4 1 628 164
80 599 0.4 1 448 151
81 481 0.4 1 342 139
82 463 0.4 1 314 149
83 445 0.4 1 281 164
84 410 0.4 1 248 162
85 358 0.4 1 212 146
86 346 0.4 1 180 166
87 300 0.4 1 138 162
88 283 0.4 1 137 146
89 249 0.4 1 108 141
90 224 0.4 1 100 124
91 221 0.4 1 89 132
92 203 0.4 1 64 139
93 180 0.4 1 61 119
94 143 0.4 1 36 107
95 157 0.4 1 34 123
96 160 0.4 1 30 130
97 124 0.4 1 25 99
98 133 0.4 1 24 109
99 108 0.4 1 17 91
100 119 0.4 1 6 113
101 109 0.4 1 15 94
102 100 0.4 1 4 96
103 95 0.4 1 8 87
104 92 0.4 1 6 86
105 113 0.4 1 1 112
106 113 0.4 1 5 108
107 118 0.4 1 5 113
108 123 0.4 1 2 121
109 121 0.4 1 2 119
110 134 0.4 1 2 132
111 119 0.4 1 5 114
112 127 0.4 1 11 116
113 116 0.4 1 14 102
114 139 0.4 1 13 126
115 126 0.4 1 5 121
116 123 0.4 1 7 116
117 157 0.4 1 5 152
118 161 0.4 1 2 159
119 167 0.4 1 2 165
120 205 0.4 1 6 199
121 273 0.4 1 8 265
122 250 0.4 1 7 243
123 428 0.4 1 10 418
124 774 0.4 1 2 772
125 1930 0.4 1 6 1924
126 3376 0.4 1 4 3372


RUN STATISTICS FOR INPUT FILE: index23_GAGTGG_L001-L002_R1_001.fastq
=============================================
28485339 sequences processed in total

completed 2:

SUMMARISING RUN PARAMETERS
==========================
Input filename: index23_GAGTGG_L001-L002_R2_001.fastq
Trimming mode: paired-end
Trim Galore version: 0.4.1
Cutadapt version: 1.9.1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp


This is cutadapt 1.9.1 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC index23_GAGTGG_L001-L002_R2_001.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 894.72 s (31 us/read; 1.91 M reads/minute).

=== Summary ===

Total reads processed: 28,485,339
Reads with adapters: 18,157,389 (63.7%)
Reads written (passing filters): 28,485,339 (100.0%)

Total basepairs processed: 3,589,152,714 bp
Quality-trimmed: 11,062,843 bp (0.3%)
Total written (filtered): 3,294,805,118 bp (91.8%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18157389 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
A: 18.4%
C: 28.5%
G: 39.5%
T: 13.3%
none/other: 0.3%

Overview of removed sequences
length count expect max.err error counts
1 4668833 7121334.8 0 4668833
2 1551414 1780333.7 0 1551414
3 711239 445083.4 0 711239
4 344741 111270.9 0 344741
5 310825 27817.7 0 310825
6 297044 6954.4 0 297044
7 303310 1738.6 0 303310
8 287034 434.7 0 287034
9 333133 108.7 0 332495 638
10 312592 27.2 1 307059 5533
11 286908 6.8 1 281440 5468
12 324224 1.7 1 317722 6502
13 281828 0.4 1 275947 5881
14 364803 0.4 1 357092 7711
15 248030 0.4 1 242074 5956
16 294938 0.4 1 288213 6725
17 374860 0.4 1 366200 8660
18 212737 0.4 1 207831 4906
19 319230 0.4 1 312760 6470
20 271034 0.4 1 264706 6328
21 280398 0.4 1 273829 6569
22 283482 0.4 1 276986 6496
23 273291 0.4 1 266428 6863
24 316548 0.4 1 308790 7758
25 217447 0.4 1 211611 5836
26 259869 0.4 1 253384 6485
27 271923 0.4 1 265176 6747
28 264485 0.4 1 258666 5819
29 221191 0.4 1 215980 5211
30 285155 0.4 1 279264 5891
31 181353 0.4 1 177458 3895
32 214689 0.4 1 210267 4422
33 214503 0.4 1 210240 4263
34 212749 0.4 1 208272 4477
35 177297 0.4 1 173613 3684
36 173905 0.4 1 170248 3657
37 164687 0.4 1 161249 3438
38 162012 0.4 1 158619 3393
39 139466 0.4 1 136617 2849
40 139275 0.4 1 136201 3074
41 133976 0.4 1 131087 2889
42 143008 0.4 1 139705 3303
43 98394 0.4 1 96109 2285
44 101487 0.4 1 99195 2292
45 108791 0.4 1 106222 2569
46 80329 0.4 1 78401 1928
47 76402 0.4 1 74674 1728
48 74376 0.4 1 72843 1533
49 64568 0.4 1 63238 1330
50 67016 0.4 1 65615 1401
51 72066 0.4 1 70715 1351
52 43802 0.4 1 42900 902
53 51200 0.4 1 50281 919
54 39798 0.4 1 38959 839
55 41369 0.4 1 40582 787
56 34005 0.4 1 33317 688
57 29637 0.4 1 29067 570
58 28815 0.4 1 28227 588
59 25798 0.4 1 25286 512
60 24621 0.4 1 24062 559
61 22089 0.4 1 21540 549
62 20351 0.4 1 19643 708
63 18922 0.4 1 18103 819
64 17531 0.4 1 16360 1171
65 17085 0.4 1 14933 2152
66 18828 0.4 1 14404 4424
67 50417 0.4 1 15488 34929
68 68997 0.4 1 64574 4423
69 10693 0.4 1 9940 753
70 3339 0.4 1 3028 311
71 1922 0.4 1 1694 228
72 1216 0.4 1 1031 185
73 910 0.4 1 748 162
74 824 0.4 1 635 189
75 688 0.4 1 524 164
76 631 0.4 1 479 152
77 556 0.4 1 371 185
78 518 0.4 1 351 167
79 403 0.4 1 264 139
80 412 0.4 1 227 185
81 359 0.4 1 215 144
82 325 0.4 1 188 137
83 279 0.4 1 170 109
84 259 0.4 1 138 121
85 198 0.4 1 104 94
86 180 0.4 1 90 90
87 174 0.4 1 81 93
88 176 0.4 1 74 102
89 152 0.4 1 58 94
90 132 0.4 1 41 91
91 136 0.4 1 32 104
92 120 0.4 1 33 87
93 98 0.4 1 26 72
94 111 0.4 1 16 95
95 112 0.4 1 19 93
96 84 0.4 1 12 72
97 100 0.4 1 12 88
98 78 0.4 1 13 65
99 75 0.4 1 11 64
100 101 0.4 1 5 96
101 79 0.4 1 8 71
102 59 0.4 1 5 54
103 81 0.4 1 1 80
104 78 0.4 1 2 76
105 78 0.4 1 2 76
106 103 0.4 1 2 101
107 74 0.4 1 5 69
108 76 0.4 1 3 73
109 104 0.4 1 0 104
110 64 0.4 1 1 63
111 98 0.4 1 7 91
112 81 0.4 1 4 77
113 89 0.4 1 8 81
114 100 0.4 1 11 89
115 73 0.4 1 2 71
116 115 0.4 1 5 110
117 112 0.4 1 1 111
118 125 0.4 1 3 122
119 131 0.4 1 2 129
120 124 0.4 1 3 121
121 137 0.4 1 4 133
122 166 0.4 1 3 163
123 230 0.4 1 1 229
124 422 0.4 1 0 422
125 1138 0.4 1 2 1136
126 1931 0.4 1 2 1929


RUN STATISTICS FOR INPUT FILE: index23_GAGTGG_L001-L002_R2_001.fastq
=============================================
28485339 sequences processed in total

Total number of sequences analysed for the sequence pair length validation: 28485339

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 72318 (0.25%)
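
A note on reading the "expect" column in the reports above: the values are consistent with the number of reads that would end in the first k adapter bases purely by chance, i.e. total reads × 0.25^k, with k capped at the adapter length (13). A short Python check against the printed numbers; the function name is hypothetical, and the formula is inferred from the table rather than taken from the Cutadapt source:

```python
def expected_by_chance(total_reads, match_length, adapter_length=13):
    """Reads expected to match `match_length` adapter bases by chance alone,
    assuming all four bases are equally likely at every position."""
    k = min(match_length, adapter_length)
    return total_reads * 0.25 ** k


TOTAL_READS = 28_485_339  # "Total reads processed" in the reports above

for length in (1, 2, 3, 10, 13, 14):
    print(length, round(expected_by_chance(TOTAL_READS, length), 1))
```

At length 1 this gives about 7.1 million reads, so the 3.8 million single-base removals in the R1 report are no more than chance would predict. From about 5 bp onwards the observed counts exceed the expectation by a factor of ten or more, which points to genuine adapter read-through.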
Old 04-15-2016, 07:57 AM   #110
bowen

So now my question is: are the processes that are sitting at 0% CPU still going, and should I just be patient?
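
One way to tell a stalled run from a slow one is to check whether the output file is still growing. A minimal Python sketch, standard library only; the function name and the file name in the note below are placeholders:

```python
import os
import time


def is_growing(path, interval=5.0):
    """Return True if the file at `path` gets larger within `interval` seconds."""
    size_before = os.path.getsize(path)
    time.sleep(interval)
    return os.path.getsize(path) > size_before
```

For example, `is_growing("index21_GTTTCG_L001-L002_R1_001_trimmed.fq", interval=30)` returning False several times in a row would suggest the process has stalled rather than slowed down. The file name here is a guess at the intermediate output's name; substitute whatever file is actually being written.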
Old 04-15-2016, 09:13 AM   #111
fkrueger

To be perfectly honest I really don't know why your Python threads are slowing down to 0%, all individual pieces of software seem to run fine (and to completion apart from this sample). Maybe someone else can chip in here?

If it just doesn't finish, why don't you modify the Cutadapt command you tried above to run as a paired-end sample (this might require you to specify an adapter 2, but you can use the same sequence for that)? Sorry I can't be of more help, I have never seen such behaviour before...
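
A sketch of what such a paired-end call might look like, written as a small Python helper that only assembles the argument list. `-A` (adapter for read 2) and `-p` (second output file) are Cutadapt's paired-end options; the helper name and the output file names are placeholders, and the R2 input name is assumed by analogy with the R1 file. Unlike Trim Galore, this does not reproduce the 20 bp pair-length validation step.

```python
import shlex

ADAPTER = "AGATCGGAAGAGC"  # adapter auto-detected in the Trim Galore reports above


def paired_cutadapt_command(read1, read2, out1, out2, adapter=ADAPTER):
    """Assemble a paired-end Cutadapt call mirroring the single-end settings."""
    return [
        "cutadapt", "-f", "fastq",
        "-e", "0.1", "-q", "20", "-O", "1",
        "-a", adapter,   # adapter on read 1
        "-A", adapter,   # same sequence used for read 2
        "-o", out1,      # trimmed read 1
        "-p", out2,      # trimmed read 2
        read1, read2,
    ]


cmd = paired_cutadapt_command(
    "index21_GTTTCG_L001-L002_R1_001.fastq",
    "index21_GTTTCG_L001-L002_R2_001.fastq",  # assumed name of the mate file
    "out_R1.fq", "out_R2.fq",                 # placeholder output names
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once Cutadapt is on the PATH.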
Old 04-19-2016, 08:21 AM   #112
bowen

Hi Felix,
Although this may not be advisable to others, I decided to comment out (#) the installed Python in my .bash_profile and install Python with brew. Things are working fine now. Sorry for the trouble. Maybe it was an IDLE issue, not sure. Again, I appreciate your time, and all the best to you.
Old 04-19-2016, 11:51 AM   #113
fkrueger

Glad that it seems to be working now though! Best, Felix
Old 05-23-2016, 09:15 AM   #114
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Most of my samples have an unmethylated lambda DNA spike-in. I mapped them to the lambda genome and calculated the conversion efficiency. I do not have a spike-in for one sample, so I have to check the conversion efficiency using the unmethylated C introduced at the 3' end when end-repair was done. In my Bismark pipeline, trim_galore will remove this unmethylated C if there is adaptor contamination. I read it here: http://www.bioinformatics.babraham.a...RRBS_Guide.pdf

How do I calculate it? Do I need to take the fastqs and trim off the adaptors, but not the last 2 bases at the 3' end, map with Bismark, and check how many Ts are at the end of each read? Is there any script to do so?

Thanks, Ming
Old 05-23-2016, 11:37 AM   #115
fkrueger

Hi Ming,

I used to have a script that would do this, I can send it over tomorrow if I manage to find it. Best, Felix
Old 05-23-2016, 12:19 PM   #116
crazyhottommy

That would be very helpful!
Thanks!
Old 05-24-2016, 04:20 AM   #117
fkrueger

Here it is. It is looking for an overlap with the adapter from the end, as it stands 5bp (you can change this in this line: my $required_adapter_overlap = 5; ), and should then give you some useful output about the conversion efficiency at the end. You may want to run it with a few different lengths to see if that makes a difference. Let me know if there are any questions. Cheers, Felix
Attached Files
File Type: pl find_RRBS_non_conversion.pl (6.0 KB, 1 views)
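
For readers who cannot open the attachment, the idea described in this post (look for the start of the adapter at the 3' end of a read, requiring a minimum overlap) can be sketched roughly as below. This is not the attached Perl script. It is a minimal Python illustration, and the adapter string, the function name and the default of 5 for required_adapter_overlap are assumptions chosen to mirror the description.

```python
# Hypothetical sketch, NOT the attached find_RRBS_non_conversion.pl.
ADAPTER = "AGATCGGAAGAGC"  # assumed: start of the standard Illumina adapter


def split_at_adapter(sequence, adapter=ADAPTER, required_adapter_overlap=5):
    """Split a read into (insert, adapter_part).

    Returns None if the 3' end of the read does not match the start of the
    adapter over at least required_adapter_overlap bases.
    """
    # Scan left to right so that the longest adapter match is found first.
    for start in range(len(sequence) - required_adapter_overlap + 1):
        tail = sequence[start:]
        n = min(len(tail), len(adapter))
        if tail[:n] == adapter[:n]:
            return sequence[:start], tail
    return None


read = "TGGATGTTGGTTGTGGTTAGTATTCGAGATCGGAAG"
result = split_at_adapter(read)
if result is not None:
    insert, adapter_part = result
    # The last three bases of the insert are the ones of interest.
    print(insert[-3:], adapter_part)
```

Running it with a few different values of required_adapter_overlap, as suggested above, shows how sensitive the counts are to short chance matches with the adapter.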
Old 05-24-2016, 06:57 AM   #118
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default check 4 cases?

Quote:
Originally Posted by fkrueger View Post
Here it is. It is looking for an overlap with the adapter from the end, as it stands 5bp (you can change this in this line: my $required_adapter_overlap = 5; ), and should then give you some useful output about the conversion efficiency at the end. You may want to run it with a few different lengths to see if that makes a difference. Let me know if there are any questions. Cheers, Felix
Thanks for the script.
what I thought:

I will need to check CCG + adaptor or TCG + adaptor for unconverted filled-in Cs,
and CTG + adaptor or TTG + adaptor for converted filled-in Cs.

e.g. a full-length read:

TGGATGTTGGTTGTGGTTAGTATTCGAGATCGGAAG

It starts with TGG, so that C is not methylated in the genome. But look at the TCG directly before the adaptor (AGATCGGAAG): the filled-in C is still a C, i.e. the unmethylated filled-in C was not converted successfully by bisulfite.

In your script I only saw checks for TTG + adaptor and TCG + adaptor.
Do I need to check CTG and CCG as well?

Please let me know whether I am correct; RRBS is new to me.
Old 05-24-2016, 07:25 AM   #119
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 614
Default

I would say in theory yes, but since we were working in mammalian genomes where you would expect a non-CG methylation of <1% we just assumed that the C before the CG is always converted.

While looking at the script I noticed that there is another place you need to change, because when we did this back in 2011 our reads were 40 bp long.

So you need to locate the lines (should be two times) that say
Code:
my $poi = 40 - length($rest)-3;
and change the 40 to your read length, or even better change it to

Code:
length($sequence)
so that this works for any read length.
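
As a worked check of this change, the same arithmetic can be written out in Python on the 36 bp example read from post #118. The full script is not shown in this thread, so the meaning of the variables is an assumption: rest is taken to be the adapter-derived tail of the read, and poi the 0-based start of the three bases immediately before it.

```python
# Assumed variable meanings; the attached Perl script is not reproduced here.
sequence = "TGGATGTTGGTTGTGGTTAGTATTCGAGATCGGAAG"  # 36 bp example read
rest = "AGATCGGAAG"                                # assumed: adapter part at the 3' end


def position_of_interest(sequence, rest):
    # Read-length-independent form of: 40 - length($rest) - 3
    return len(sequence) - len(rest) - 3


poi = position_of_interest(sequence, rest)
print(poi, sequence[poi:poi + 3])      # 23 TCG

# With the hard-coded 40 the window lands inside the adapter instead.
wrong_poi = 40 - len(rest) - 3
print(wrong_poi, sequence[wrong_poi:wrong_poi + 3])  # 27 GAT
```

With a fixed 40 the three bases are taken from the wrong place for any read that is not exactly 40 bp long, which is why using the actual sequence length is the safer choice.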
Old 05-24-2016, 07:45 AM   #120
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Quote:
Originally Posted by fkrueger View Post
I would say in theory yes, but since we were working in mammalian genomes where you would expect a non-CG methylation of <1% we just assumed that the C before the CG is always converted.

While looking at the script I noticed that there is another place you need to change, because when we did this back in 2011 our reads were 40 bp long.

So you need to locate the lines (should be two times) that say
Code:
my $poi = 40 - length($rest)-3;
and change the 40 to your read length, or even better change it to

Code:
length($sequence)
so that this works for any read length.
Thanks, I noticed that as well and changed the length accordingly.

We are checking the bisulfite conversion rate.
Although non-CpG cytosines are unmethylated in the human genome, they will remain as Cs rather than being converted to Ts if bisulfite conversion fails.
So we will miss a lot of sequences with CCG and probably a few with CTG.
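
One way to count all four cases is sketched below. This is a hedged Python illustration, not a modification of the attached script: the function name and the input format (one string per read holding the three bases directly before the adaptor) are assumptions, and the middle base is treated as the filled-in cytosine, as in the posts above.

```python
from collections import Counter

# First base: genomic C of the MspI site (C or T after bisulfite treatment).
# Middle base: filled-in, unmethylated C (T if converted, C if not).
CASES = {"TTG", "CTG", "TCG", "CCG"}


def tally_end_repair_conversion(triplets):
    """Count the four cases and return (counts, non_conversion_rate).

    The rate is None when no read falls into any of the four cases.
    """
    counts = Counter(t for t in triplets if t in CASES)
    converted = counts["TTG"] + counts["CTG"]    # filled-in C read as T
    unconverted = counts["TCG"] + counts["CCG"]  # filled-in C still read as C
    total = converted + unconverted
    rate = unconverted / total if total else None
    return counts, rate


# Made-up input for illustration only, not real data.
example = ["TTG"] * 97 + ["TCG"] * 2 + ["CCG"] + ["AAA"]
counts, rate = tally_end_repair_conversion(example)
print(dict(counts), rate)
```

Comparing the rate with and without the CCG and CTG classes shows how much is missed by assuming that the C before the CG is always converted.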