
  • Thanks,
    Will try the cutadapt script you graciously suggested. The Python process invoked by trim_galore is still running, it has just gone to 0% CPU. It may be something quirky about the way the last instance of Python was installed on my machine. I may look into that.

    Comment


    • cutadapt worked fine, wrote a single out.fq file that appears to be right size, etc.
      thanks,
      Nathan

      Comment


      • That's weird, then I can't even blame Python for it... Have you tried the version of Trim Galore I attached?

        Comment


        • am running it now. here's the .txt report of one that is hanging. then followed by .txt report of the one that ran correctly:

          hanging:
          SUMMARISING RUN PARAMETERS
          ==========================
          Input filename: index21_GTTTCG_L001-L002_R1_001.fastq
          Trimming mode: paired-end
          Trim Galore version: 0.4.1
          Cutadapt version: 1.9.1
          Quality Phred score cutoff: 20
          Quality encoding type selected: ASCII+33
          Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
          Maximum trimming error rate: 0.1 (default)
          Minimum required adapter overlap (stringency): 1 bp
          Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp

          completed: two different .txt files, as it was a paired set

          SUMMARISING RUN PARAMETERS
          ==========================
          Input filename: index23_GAGTGG_L001-L002_R1_001.fastq
          Trimming mode: paired-end
          Trim Galore version: 0.4.1
          Cutadapt version: 1.9.1
          Quality Phred score cutoff: 20
          Quality encoding type selected: ASCII+33
          Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
          Maximum trimming error rate: 0.1 (default)
          Minimum required adapter overlap (stringency): 1 bp
          Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp


          This is cutadapt 1.9.1 with Python 2.7.10
          Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC index23_GAGTGG_L001-L002_R1_001.fastq
          Trimming 1 adapter with at most 10.0% errors in single-end mode ...
          Finished in 850.42 s (30 us/read; 2.01 M reads/minute).

          === Summary ===

          Total reads processed: 28,485,339
          Reads with adapters: 16,948,174 (59.5%)
          Reads written (passing filters): 28,485,339 (100.0%)

          Total basepairs processed: 3,589,152,714 bp
          Quality-trimmed: 6,608,523 bp (0.2%)
          Total written (filtered): 3,299,244,643 bp (91.9%)

          === Adapter 1 ===

          Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16948174 times.

          No. of allowed errors:
          0-9 bp: 0; 10-13 bp: 1

          Bases preceding removed adapters:
          A: 17.7%
          C: 32.2%
          G: 32.3%
          T: 17.5%
          none/other: 0.3%

          Overview of removed sequences
          length count expect max.err error counts
          1 3822853 7121334.8 0 3822853
          2 1342103 1780333.7 0 1342103
          3 562083 445083.4 0 562083
          4 338904 111270.9 0 338904
          5 305930 27817.7 0 305930
          6 292726 6954.4 0 292726
          7 293029 1738.6 0 293029
          8 295266 434.7 0 295266
          9 314323 108.7 0 313753 570
          10 314996 27.2 1 308528 6468
          11 294907 6.8 1 288423 6484
          12 301352 1.7 1 294428 6924
          13 306345 0.4 1 298669 7676
          14 308686 0.4 1 301195 7491
          15 292770 0.4 1 285216 7554
          16 297400 0.4 1 289598 7802
          17 293038 0.4 1 285169 7869
          18 289457 0.4 1 281581 7876
          19 299679 0.4 1 291706 7973
          20 297202 0.4 1 289126 8076
          21 300660 0.4 1 292170 8490
          22 286125 0.4 1 278406 7719
          23 269566 0.4 1 261467 8099
          24 274398 0.4 1 266280 8118
          25 264378 0.4 1 257631 6747
          26 255768 0.4 1 248915 6853
          27 256549 0.4 1 249833 6716
          28 257268 0.4 1 250252 7016
          29 244909 0.4 1 238288 6621
          30 234995 0.4 1 229910 5085
          31 229964 0.4 1 224161 5803
          32 224294 0.4 1 219567 4727
          33 205954 0.4 1 201559 4395
          34 201649 0.4 1 197346 4303
          35 197177 0.4 1 192772 4405
          36 166376 0.4 1 162610 3766
          37 164498 0.4 1 160888 3610
          38 154155 0.4 1 150759 3396
          39 149720 0.4 1 146403 3317
          40 147538 0.4 1 144226 3312
          41 139000 0.4 1 135824 3176
          42 117928 0.4 1 115068 2860
          43 150887 0.4 1 147697 3190
          44 74900 0.4 1 73163 1737
          45 85284 0.4 1 83407 1877
          46 84122 0.4 1 82252 1870
          47 79113 0.4 1 77412 1701
          48 71166 0.4 1 69632 1534
          49 71476 0.4 1 69884 1592
          50 64563 0.4 1 63123 1440
          51 62861 0.4 1 61509 1352
          52 53459 0.4 1 52311 1148
          53 50043 0.4 1 48999 1044
          54 46537 0.4 1 45491 1046
          55 41425 0.4 1 40523 902
          56 33841 0.4 1 33125 716
          57 30992 0.4 1 30383 609
          58 27455 0.4 1 26913 542
          59 27536 0.4 1 26969 567
          60 23792 0.4 1 23350 442
          61 21538 0.4 1 21073 465
          62 18972 0.4 1 18504 468
          63 18545 0.4 1 18096 449
          64 15370 0.4 1 14978 392
          65 14415 0.4 1 14016 399
          66 12971 0.4 1 12621 350
          67 11121 0.4 1 10788 333
          68 10333 0.4 1 10010 323
          69 9483 0.4 1 9121 362
          70 8785 0.4 1 8313 472
          71 8295 0.4 1 7621 674
          72 7952 0.4 1 6994 958
          73 8569 0.4 1 6772 1797
          74 11545 0.4 1 6819 4726
          75 40013 0.4 1 7295 32718
          76 23307 0.4 1 21496 1811
          77 4013 0.4 1 3591 422
          78 1490 0.4 1 1251 239
          79 792 0.4 1 628 164
          80 599 0.4 1 448 151
          81 481 0.4 1 342 139
          82 463 0.4 1 314 149
          83 445 0.4 1 281 164
          84 410 0.4 1 248 162
          85 358 0.4 1 212 146
          86 346 0.4 1 180 166
          87 300 0.4 1 138 162
          88 283 0.4 1 137 146
          89 249 0.4 1 108 141
          90 224 0.4 1 100 124
          91 221 0.4 1 89 132
          92 203 0.4 1 64 139
          93 180 0.4 1 61 119
          94 143 0.4 1 36 107
          95 157 0.4 1 34 123
          96 160 0.4 1 30 130
          97 124 0.4 1 25 99
          98 133 0.4 1 24 109
          99 108 0.4 1 17 91
          100 119 0.4 1 6 113
          101 109 0.4 1 15 94
          102 100 0.4 1 4 96
          103 95 0.4 1 8 87
          104 92 0.4 1 6 86
          105 113 0.4 1 1 112
          106 113 0.4 1 5 108
          107 118 0.4 1 5 113
          108 123 0.4 1 2 121
          109 121 0.4 1 2 119
          110 134 0.4 1 2 132
          111 119 0.4 1 5 114
          112 127 0.4 1 11 116
          113 116 0.4 1 14 102
          114 139 0.4 1 13 126
          115 126 0.4 1 5 121
          116 123 0.4 1 7 116
          117 157 0.4 1 5 152
          118 161 0.4 1 2 159
          119 167 0.4 1 2 165
          120 205 0.4 1 6 199
          121 273 0.4 1 8 265
          122 250 0.4 1 7 243
          123 428 0.4 1 10 418
          124 774 0.4 1 2 772
          125 1930 0.4 1 6 1924
          126 3376 0.4 1 4 3372


          RUN STATISTICS FOR INPUT FILE: index23_GAGTGG_L001-L002_R1_001.fastq
          =============================================
          28485339 sequences processed in total

          completed 2:

          SUMMARISING RUN PARAMETERS
          ==========================
          Input filename: index23_GAGTGG_L001-L002_R2_001.fastq
          Trimming mode: paired-end
          Trim Galore version: 0.4.1
          Cutadapt version: 1.9.1
          Quality Phred score cutoff: 20
          Quality encoding type selected: ASCII+33
          Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
          Maximum trimming error rate: 0.1 (default)
          Minimum required adapter overlap (stringency): 1 bp
          Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp


          This is cutadapt 1.9.1 with Python 2.7.10
          Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC index23_GAGTGG_L001-L002_R2_001.fastq
          Trimming 1 adapter with at most 10.0% errors in single-end mode ...
          Finished in 894.72 s (31 us/read; 1.91 M reads/minute).

          === Summary ===

          Total reads processed: 28,485,339
          Reads with adapters: 18,157,389 (63.7%)
          Reads written (passing filters): 28,485,339 (100.0%)

          Total basepairs processed: 3,589,152,714 bp
          Quality-trimmed: 11,062,843 bp (0.3%)
          Total written (filtered): 3,294,805,118 bp (91.8%)

          === Adapter 1 ===

          Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18157389 times.

          No. of allowed errors:
          0-9 bp: 0; 10-13 bp: 1

          Bases preceding removed adapters:
          A: 18.4%
          C: 28.5%
          G: 39.5%
          T: 13.3%
          none/other: 0.3%

          Overview of removed sequences
          length count expect max.err error counts
          1 4668833 7121334.8 0 4668833
          2 1551414 1780333.7 0 1551414
          3 711239 445083.4 0 711239
          4 344741 111270.9 0 344741
          5 310825 27817.7 0 310825
          6 297044 6954.4 0 297044
          7 303310 1738.6 0 303310
          8 287034 434.7 0 287034
          9 333133 108.7 0 332495 638
          10 312592 27.2 1 307059 5533
          11 286908 6.8 1 281440 5468
          12 324224 1.7 1 317722 6502
          13 281828 0.4 1 275947 5881
          14 364803 0.4 1 357092 7711
          15 248030 0.4 1 242074 5956
          16 294938 0.4 1 288213 6725
          17 374860 0.4 1 366200 8660
          18 212737 0.4 1 207831 4906
          19 319230 0.4 1 312760 6470
          20 271034 0.4 1 264706 6328
          21 280398 0.4 1 273829 6569
          22 283482 0.4 1 276986 6496
          23 273291 0.4 1 266428 6863
          24 316548 0.4 1 308790 7758
          25 217447 0.4 1 211611 5836
          26 259869 0.4 1 253384 6485
          27 271923 0.4 1 265176 6747
          28 264485 0.4 1 258666 5819
          29 221191 0.4 1 215980 5211
          30 285155 0.4 1 279264 5891
          31 181353 0.4 1 177458 3895
          32 214689 0.4 1 210267 4422
          33 214503 0.4 1 210240 4263
          34 212749 0.4 1 208272 4477
          35 177297 0.4 1 173613 3684
          36 173905 0.4 1 170248 3657
          37 164687 0.4 1 161249 3438
          38 162012 0.4 1 158619 3393
          39 139466 0.4 1 136617 2849
          40 139275 0.4 1 136201 3074
          41 133976 0.4 1 131087 2889
          42 143008 0.4 1 139705 3303
          43 98394 0.4 1 96109 2285
          44 101487 0.4 1 99195 2292
          45 108791 0.4 1 106222 2569
          46 80329 0.4 1 78401 1928
          47 76402 0.4 1 74674 1728
          48 74376 0.4 1 72843 1533
          49 64568 0.4 1 63238 1330
          50 67016 0.4 1 65615 1401
          51 72066 0.4 1 70715 1351
          52 43802 0.4 1 42900 902
          53 51200 0.4 1 50281 919
          54 39798 0.4 1 38959 839
          55 41369 0.4 1 40582 787
          56 34005 0.4 1 33317 688
          57 29637 0.4 1 29067 570
          58 28815 0.4 1 28227 588
          59 25798 0.4 1 25286 512
          60 24621 0.4 1 24062 559
          61 22089 0.4 1 21540 549
          62 20351 0.4 1 19643 708
          63 18922 0.4 1 18103 819
          64 17531 0.4 1 16360 1171
          65 17085 0.4 1 14933 2152
          66 18828 0.4 1 14404 4424
          67 50417 0.4 1 15488 34929
          68 68997 0.4 1 64574 4423
          69 10693 0.4 1 9940 753
          70 3339 0.4 1 3028 311
          71 1922 0.4 1 1694 228
          72 1216 0.4 1 1031 185
          73 910 0.4 1 748 162
          74 824 0.4 1 635 189
          75 688 0.4 1 524 164
          76 631 0.4 1 479 152
          77 556 0.4 1 371 185
          78 518 0.4 1 351 167
          79 403 0.4 1 264 139
          80 412 0.4 1 227 185
          81 359 0.4 1 215 144
          82 325 0.4 1 188 137
          83 279 0.4 1 170 109
          84 259 0.4 1 138 121
          85 198 0.4 1 104 94
          86 180 0.4 1 90 90
          87 174 0.4 1 81 93
          88 176 0.4 1 74 102
          89 152 0.4 1 58 94
          90 132 0.4 1 41 91
          91 136 0.4 1 32 104
          92 120 0.4 1 33 87
          93 98 0.4 1 26 72
          94 111 0.4 1 16 95
          95 112 0.4 1 19 93
          96 84 0.4 1 12 72
          97 100 0.4 1 12 88
          98 78 0.4 1 13 65
          99 75 0.4 1 11 64
          100 101 0.4 1 5 96
          101 79 0.4 1 8 71
          102 59 0.4 1 5 54
          103 81 0.4 1 1 80
          104 78 0.4 1 2 76
          105 78 0.4 1 2 76
          106 103 0.4 1 2 101
          107 74 0.4 1 5 69
          108 76 0.4 1 3 73
          109 104 0.4 1 0 104
          110 64 0.4 1 1 63
          111 98 0.4 1 7 91
          112 81 0.4 1 4 77
          113 89 0.4 1 8 81
          114 100 0.4 1 11 89
          115 73 0.4 1 2 71
          116 115 0.4 1 5 110
          117 112 0.4 1 1 111
          118 125 0.4 1 3 122
          119 131 0.4 1 2 129
          120 124 0.4 1 3 121
          121 137 0.4 1 4 133
          122 166 0.4 1 3 163
          123 230 0.4 1 1 229
          124 422 0.4 1 0 422
          125 1138 0.4 1 2 1136
          126 1931 0.4 1 2 1929


          RUN STATISTICS FOR INPUT FILE: index23_GAGTGG_L001-L002_R2_001.fastq
          =============================================
          28485339 sequences processed in total

          Total number of sequences analysed for the sequence pair length validation: 28485339

          Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 72318 (0.25%)

          Comment


          • so now my question is are the processes that are running with 0% still going, and I should just be patient?

            Comment


            • To be perfectly honest I really don't know why your Python threads are slowing down to 0%, all individual pieces of software seem to run fine (and to completion apart from this sample). Maybe someone else can chip in here?

              If it just doesn't finish, why don't you modify the Cutadapt command you tried above to run as a paired-end sample (this might require you to specify an adapter 2, but you can use the same sequence for that)? Sorry I can't be of more help, I have never seen such behaviour before...
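A minimal sketch of what such a paired-end Cutadapt call could look like, written as a small Python helper that only builds the command line. The quality, overlap, error-rate and length values are copied from the Trim Galore reports above. The R2 input name and both output names are assumptions for illustration, and option behaviour can differ between Cutadapt versions, so check `cutadapt --help` before running it.

```python
import shlex

# Adapter that Trim Galore auto-detected in the reports above.
ADAPTER = "AGATCGGAAGAGC"


def build_paired_cutadapt_cmd(r1, r2, out1, out2, adapter=ADAPTER,
                              quality=20, overlap=1, error_rate=0.1,
                              min_length=20):
    """Return the argument list for one paired-end Cutadapt call."""
    return [
        "cutadapt",
        "-e", str(error_rate),   # maximum error rate
        "-q", str(quality),      # Phred quality cutoff
        "-O", str(overlap),      # minimum adapter overlap
        "-m", str(min_length),   # minimum read length after trimming
        "-a", adapter,           # 3' adapter for read 1
        "-A", adapter,           # "adapter 2": same sequence for read 2
        "-o", out1,              # trimmed read 1 output
        "-p", out2,              # trimmed read 2 output
        r1, r2,
    ]


cmd = build_paired_cutadapt_cmd(
    "index21_GTTTCG_L001-L002_R1_001.fastq",
    "index21_GTTTCG_L001-L002_R2_001.fastq",  # assumed name of the mate file
    "index21_R1_trimmed.fq",                  # hypothetical output name
    "index21_R2_trimmed.fq",                  # hypothetical output name
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the file names are right. Building it as a list rather than a single string avoids shell quoting problems with unusual file names.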

              Comment


              • Hi Felix,
                Although this may not be advisable for others, I decided to comment out (#) the installed Python in my .bash_profile and install Python with brew. Things are working fine now. Sorry for the trouble. Maybe it was an IDLE issue, not sure. Again, I appreciate your time, and all the best to you.

                Comment


                • Glad that it seems to be working now though! Best, Felix

                  Comment


                  • Most of my samples have an unmethylated lambda DNA spike-in. I mapped them to the lambda genome and calculated the conversion efficiency. For one sample I do not have a spike-in, so I have to check the conversion efficiency using the red unmethylated C (see below) that is introduced at the 3’ end during end-repair. In my Bismark pipeline, trim_galore will remove this unmethylated C if there is adaptor contamination. I read about it here: http://www.bioinformatics.babraham.a...RRBS_Guide.pdf



                    How do I calculate it? Do I need to take the FASTQ files, trim off the adaptors but not the last 2 bases at the 3' end, map with Bismark, and check how many Ts are at the end of each read? Is there a script to do this?

                    Thanks, Ming

                    Comment


                    • Hi Ming,

                      I used to have a script that would do this, I can send it over tomorrow if I manage to find it. Best, Felix

                      Comment


                      • Originally posted by fkrueger
                        Hi Ming,

                        I used to have a script that would do this, I can send it over tomorrow if I manage to find it. Best, Felix

                        That would be very helpful!
                        Thanks!

                        Comment


                        • Here it is. It is looking for an overlap with the adapter from the end, as it stands 5bp (you can change this in this line: my $required_adapter_overlap = 5; ), and should then give you some useful output about the conversion efficiency at the end. You may want to run it with a few different lengths to see if that makes a difference. Let me know if there are any questions. Cheers, Felix
                          Attached Files

                          Comment


                          • check 4 cases?

                            Originally posted by fkrueger
                            Here it is. It is looking for an overlap with the adapter from the end, as it stands 5bp (you can change this in this line: my $required_adapter_overlap = 5; ), and should then give you some useful output about the conversion efficiency at the end. You may want to run it with a few different lengths to see if that makes a difference. Let me know if there are any questions. Cheers, Felix

                            Thanks for the script.
                            what I thought:

                            I will need to check CCG + adaptor or TCG + adaptor for filled-in Cs that were not converted,
                            and CTG + adaptor or TTG + adaptor for filled-in Cs that were converted.

                            e.g. a full-length read:

                            TGGATGTTGGTTGTGGTTAGTATTCGAGATCGGAAG

                            It starts with TGG, so it is not methylated in the genome. But look at the bold part: it starts with TCG, so the filled-in C, although unmethylated, was not converted successfully by bisulfite.

                            In your script I only saw checks for TTG + adaptor and TCG + adaptor.
                            Do I need to check CTG and CCG as well?

                            Please let me know if I am correct or not; RRBS is new for me.
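To make the four cases concrete, here is a minimal Python sketch that finds where the adapter starts at the 3' end of a read and tallies the three bases in front of it. It is not the attached script: it assumes exact adapter matches only, the function names are made up for illustration, and the 5 bp minimum overlap simply mirrors the `$required_adapter_overlap` value mentioned above. The middle base of the triplet is taken to be the filled-in cytosine (T if converted, C if not).

```python
from collections import Counter

ADAPTER = "AGATCGGAAGAGC"
CLASSES = ("TTG", "TCG", "CTG", "CCG")


def adapter_start(read, adapter=ADAPTER, min_overlap=5):
    """Return the 0-based index where the adapter begins, or None.

    Exact matching only: the read from that index onward must equal the
    start of the adapter for at least min_overlap bases.
    """
    for i in range(len(read)):
        n = min(len(read) - i, len(adapter))
        if n < min_overlap:
            break  # the remaining tail is too short to count as a match
        if read[i:i + n] == adapter[:n]:
            return i
    return None


def classify(read, adapter=ADAPTER, min_overlap=5):
    """Return the triplet before the adapter, 'other', or None if no adapter."""
    start = adapter_start(read, adapter, min_overlap)
    if start is None or start < 3:
        return None
    context = read[start - 3:start]
    return context if context in CLASSES else "other"


def count_contexts(reads, **kwargs):
    """Tally the classification of every read that contains the adapter."""
    counts = Counter()
    for read in reads:
        label = classify(read, **kwargs)
        if label is not None:
            counts[label] += 1
    return counts


example = "TGGATGTTGGTTGTGGTTAGTATTCGAGATCGGAAG"  # the read quoted above
print(classify(example))  # → TCG
```

Counting all four triplets, plus an "other" bin, keeps the decision about which classes enter the efficiency calculation separate from the read parsing.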

                            Comment


                            • I would say in theory yes, but since we were working in mammalian genomes where you would expect a non-CG methylation of <1% we just assumed that the C before the CG is always converted.

                              While looking at the script I noticed that there is another place you need to change, because when we did this back in 2011 our reads were 40 bp long.

                              So you need to locate the lines (should be two times) that say
                              Code:
                              my $poi = 40 - length($rest)-3;
                              and change the 40 to your read length, or even better change it to

                              Code:
                              length($sequence)
                              so that this works for any read length.
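The effect of the hard-coded 40 can be checked with a small Python sketch of the same arithmetic. This assumes that `$rest` holds the adapter portion found at the 3' end of the read, which is an interpretation of the variable name and not something confirmed by the script itself. The read is the 36 bp example quoted earlier in the thread.

```python
def position_of_interest(sequence, rest):
    """0-based start of the three bases directly in front of `rest`."""
    return len(sequence) - len(rest) - 3


sequence = "TGGATGTTGGTTGTGGTTAGTATTCGAGATCGGAAG"  # 36 bp read from the thread
rest = "AGATCGGAAG"  # assumed: adapter bases at the end of the read

# Using the actual read length lands on the triplet before the adapter.
poi = position_of_interest(sequence, rest)
print(poi, sequence[poi:poi + 3])  # → 23 TCG

# Using the hard-coded 40 on a 36 bp read lands inside the adapter instead.
hard_coded = 40 - len(rest) - 3
print(hard_coded, sequence[hard_coded:hard_coded + 3])  # → 27 GAT
```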

                              Comment


                              • Originally posted by fkrueger
                                I would say in theory yes, but since we were working in mammalian genomes where you would expect a non-CG methylation of <1% we just assumed that the C before the CG is always converted.

                                While looking at the script I noticed that there is another place you need to change, because when we did this back in 2011 our reads were 40 bp long.

                                So you need to locate the lines (should be two times) that say
                                Code:
                                my $poi = 40 - length($rest)-3;
                                and change the 40 to your read length, or even better change it to

                                Code:
                                length($sequence)
                                so that this works for any read length.

                                Thanks, I noticed that as well and changed the length accordingly.

                                We are checking the bisulfite conversion rate.
                                Although non-CpG cytosines are unmethylated in the human genome, if bisulfite conversion fails they will remain as Cs and not be converted to Ts.
                                We would then miss a lot of sequences with CCG, and probably a few with CTG.
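A short Python sketch of why the choice of classes matters for the estimate. The counts are made-up numbers purely for illustration, not results from any sample in this thread, and the helper name is hypothetical. The middle base of each triplet is treated as the filled-in cytosine, so TTG and CTG count as converted, TCG and CCG as unconverted.

```python
def conversion_efficiency(counts, converted, unconverted):
    """Fraction of informative reads whose filled-in C was converted."""
    conv = sum(counts.get(k, 0) for k in converted)
    unconv = sum(counts.get(k, 0) for k in unconverted)
    total = conv + unconv
    if total == 0:
        raise ValueError("no informative reads")
    return conv / total


# Made-up counts of the triplet found directly in front of the adapter.
counts = {"TTG": 980, "TCG": 5, "CTG": 5, "CCG": 10}

# Only TTG and TCG, as in the two-class check discussed above.
two_class = conversion_efficiency(counts, ["TTG"], ["TCG"])

# All four triplets, so reads where the preceding C also stayed a C are included.
four_class = conversion_efficiency(counts, ["TTG", "CTG"], ["TCG", "CCG"])

print(f"{two_class:.4f} {four_class:.4f}")  # → 0.9949 0.9850
```

With these invented counts the two-class estimate is higher, because reads in which both cytosines failed to convert (CCG) never enter the denominator.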

                                Comment
