SEQanswers




02-10-2017, 05:13 PM   #1
foolishbrat
Member
 
Location: South East Asia

Join Date: Nov 2008
Posts: 44
Why is the mapping rate low after adapter trimming of paired-end data?

My pipeline uses the following trimming and mapping parameters:



Code:
 trim_galore --stringency 5 --dont_gzip --trim1 --length 30 -q 0 -a CTGTCTCTTATACACATCT  -o $TRIMGALORE_DIR $READ1 $READ2
    
 bowtie $Bowtie_Index_File -1 $TRIMGALORE_R1 -2 $TRIMGALORE_R2 --threads 5 -m 1 -v 2 -S -I 0 -X 2000
Mapping the raw paired-end reads (without adapter trimming) gives this result:

Code:
**********************************************
    Stats for BAM file(s):
    **********************************************
    
    Total reads:       169555726
    Mapped reads:      97870984    (57.722%)
    Forward strand:    120620234    (71.139%)
    Reverse strand:    48935492    (28.861%)
    Failed QC:         0    (0%)
    Duplicates:        0    (0%)
    Paired-end reads:  169555726    (100%)
    'Proper-pairs':    97870984    (57.722%)
    Both pairs mapped: 97870984    (57.722%)
    Read 1:            84777863
    Read 2:            84777863
    Singletons:        0    (0%)
    Average insert size (absolute value): 125.862
    Median insert size (absolute value): 99
But after trimming with Trim Galore, I get this:

Code:
  **********************************************
    Stats for BAM file(s):
    **********************************************
    
    Total reads:       168972000
    Mapped reads:      314228    (0.185965%)
    Forward strand:    168814886    (99.907%)
    Reverse strand:    157114    (0.0929823%)
    Failed QC:         0    (0%)
    Duplicates:        0    (0%)
    Paired-end reads:  168972000    (100%)
    'Proper-pairs':    314228    (0.185965%)
    Both pairs mapped: 314228    (0.185965%)
    Read 1:            84486000
    Read 2:            84486000
    Singletons:        0    (0%)
    Average insert size (absolute value): 571.141
    Median insert size (absolute value): 297
So the mapping rate drops drastically, from 57.7% to 0.19%.

I find this strange because Trim Galore trims away very little sequence: adapters were found in only about 0.5% of reads, and 99.9% of the bases were written out.
What is causing this, and what are the right parameters to use for Bowtie and Cutadapt/Trim Galore?
Code:
     SUMMARISING RUN PARAMETERS
    ==========================
    Input filename: /home/ubuntu/R1_001.fastq
    Trimming mode: single-end
    Trim Galore version: 0.4.1
    Cutadapt version: 1.12
    Quality Phred score cutoff: 0
    Quality encoding type selected: ASCII+33
    Adapter sequence: 'CTGTCTCTTATACACATCT' ()
    Maximum trimming error rate: 0.1 (default)
    Minimum required adapter overlap (stringency): 5 bp
    Minimum required sequence length before a sequence gets removed: 30 bp
    All sequences will be trimmed by 1 bp on their 3' end to avoid problems with invalid paired-end alignments with Bowtie 1
    Writing final adapter and quality trimmed output to BB_89_S1_R1_001_trimmed.fq
      >>> Now performing quality (cutoff 0) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATACACATCT' from file /home/ubuntu/R1_001.fastq <<< 
    10000000 sequences processed
    20000000 sequences processed
    30000000 sequences processed
    40000000 sequences processed
    50000000 sequences processed
    60000000 sequences processed
    70000000 sequences processed
    80000000 sequences processed
    This is cutadapt 1.12 with Python 2.7.13
    Command line parameters: -f fastq -e 0.1 -q 0 -O 5 -a CTGTCTCTTATACACATCT /home/ubuntu/R1_001.fastq
    Trimming 1 adapter with at most 10.0% errors in single-end mode ...
    Finished in 804.75 s (9 us/read; 6.32 M reads/minute).
    === Summary ===
    Total reads processed:              84,777,863
    Reads with adapters:                   424,374 (0.5%)
    Reads written (passing filters):    84,777,863 (100.0%)
    Total basepairs processed: 3,052,003,068 bp
    Quality-trimmed:                       0 bp (0.0%)
    Total written (filtered):  3,048,935,811 bp (99.9%)
    === Adapter 1 ===
    Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 424374 times.
    No. of allowed errors:
    0-9 bp: 0; 10-19 bp: 1
    Bases preceding removed adapters:
      A: 13.7%
      C: 36.7%
      G: 25.3%
      T: 24.2%
      none/other: 0.0%
    Overview of removed sequences
    length  count   expect  max.err error counts
    5   132511  82790.9 0   132511
    6   72231   20697.7 0   72231
    7   50748   5174.4  0   50748
    8   43106   1293.6  0   43106
    9   53661   323.4   0   50630 3031
    10  50660   80.9    1   44674 5986
    11  8616    20.2    1   6320 2296
    12  2894    5.1 1   1601 1293
    13  2928    1.3 1   2706 222
    14  995 0.3 1   906 89
    15  2681    0.1 1   2570 111
    16  620 0.0 1   598 22
    17  1622    0.0 1   1578 44
    18  263 0.0 1   255 8
    19  460 0.0 1   446 14
    20  86  0.0 1   84 2
    21  107 0.0 1   103 4
    22  49  0.0 1   42 7
    23  28  0.0 1   23 5
    24  11  0.0 1   10 1
    25  8   0.0 1   7 1
    26  7   0.0 1   6 1
    27  23  0.0 1   22 1
    28  8   0.0 1   8
    29  1   0.0 1   1
    33  2   0.0 1   1 1
    35  1   0.0 1   0 1
    36  47  0.0 1   45 2
    RUN STATISTICS FOR INPUT FILE: /home/ubuntu/R1_001.fastq
    =============================================
    84777863 sequences processed in total
    Sequences removed because they became shorter than the length cutoff of 30 bp:  291863 (0.3%)
    Writing report to 'trim_galore_dir/R2_001.fastq_trimming_report.txt'
    SUMMARISING RUN PARAMETERS
    ==========================
    Input filename: /home/ubuntu/R2_001.fastq
    Trimming mode: single-end
    Trim Galore version: 0.4.1
    Cutadapt version: 1.12
    Quality Phred score cutoff: 0
    Quality encoding type selected: ASCII+33
    Adapter sequence: 'CTGTCTCTTATACACATCT' ()
    Maximum trimming error rate: 0.1 (default)
    Minimum required adapter overlap (stringency): 5 bp
    Minimum required sequence length before a sequence gets removed: 30 bp
    All sequences will be trimmed by 1 bp on their 3' end to avoid problems with invalid paired-end alignments with Bowtie 1
    Writing final adapter and quality trimmed output to BB_89_S1_R2_001_trimmed.fq
      >>> Now performing quality (cutoff 0) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATACACATCT' from file /home/ubuntu/R2_001.fastq <<< 
    10000000 sequences processed
    20000000 sequences processed
    30000000 sequences processed
    40000000 sequences processed
    50000000 sequences processed
    60000000 sequences processed
    70000000 sequences processed
    80000000 sequences processed
    This is cutadapt 1.12 with Python 2.7.13
    Command line parameters: -f fastq -e 0.1 -q 0 -O 5 -a CTGTCTCTTATACACATCT /home/ubuntu/R2_001.fastq
    Trimming 1 adapter with at most 10.0% errors in single-end mode ...
    Finished in 814.43 s (10 us/read; 6.25 M reads/minute).
    === Summary ===
    Total reads processed:              84,777,863
    Reads with adapters:                   423,690 (0.5%)
    Reads written (passing filters):    84,777,863 (100.0%)
    Total basepairs processed: 3,052,003,068 bp
    Quality-trimmed:                       0 bp (0.0%)
    Total written (filtered):  3,048,945,359 bp (99.9%)
    === Adapter 1 ===
    Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 423690 times.
    No. of allowed errors:
    0-9 bp: 0; 10-19 bp: 1
    Bases preceding removed adapters:
      A: 13.7%
      C: 36.1%
      G: 25.4%
      T: 24.8%
      none/other: 0.0%
    Overview of removed sequences
    length  count   expect  max.err error counts
    5   132783  82790.9 0   132783
    6   72585   20697.7 0   72585
    7   50354   5174.4  0   50354
    8   43191   1293.6  0   43191
    9   53096   323.4   0   49979 3117
    10  50491   80.9    1   44467 6024
    11  8598    20.2    1   6158 2440
    12  2915    5.1 1   1566 1349
    13  2880    1.3 1   2625 255
    14  959 0.3 1   865 94
    15  2597    0.1 1   2493 104
    16  601 0.0 1   570 31
    17  1573    0.0 1   1520 53
    18  256 0.0 1   240 16
    19  447 0.0 1   430 17
    20  83  0.0 1   77 6
    21  101 0.0 1   87 14
    22  46  0.0 1   40 6
    23  28  0.0 1   26 2
    24  12  0.0 1   10 2
    25  8   0.0 1   7 1
    26  7   0.0 1   6 1
    27  23  0.0 1   20 3
    28  8   0.0 1   7 1
    29  1   0.0 1   1
    33  2   0.0 1   1 1
    36  45  0.0 1   44 1
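The numbers in these reports can be cross-checked against the post-trimming BAM stats. A small shell sketch of that arithmetic, using only values copied from this post: the R1 input count minus the R1 reads dropped by the 30 bp length cutoff is exactly the per-mate count in the stats after trimming. The R2 report is cut off before its removal count, so the thread does not show how many (or which) R2 reads were dropped.

```shell
# Values copied from the Trim Galore R1 report and the post-trimming BAM stats above.
r1_input=84777863     # "Total reads processed" for R1
r1_too_short=291863   # "removed because they became shorter than the length cutoff of 30 bp"
bam_read1=84486000    # "Read 1:" in the BAM stats after trimming

r1_after=$(( r1_input - r1_too_short ))
echo "R1 reads left after length filtering: $r1_after"
if [ "$r1_after" -eq "$bam_read1" ]; then
  echo "matches the Read 1 count in the BAM stats"
else
  echo "does not match the Read 1 count in the BAM stats"
fi
```

In single-end mode each file is length-filtered on its own, so a pair can lose one mate and keep the other, while Bowtie pairs reads purely by their position in the `-1` and `-2` files. Reads dropped independently from R1 and R2 would therefore leave the mates misaligned; that is a plausible explanation for the collapse in proper pairs, not one confirmed in this thread.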
02-10-2017, 08:11 PM   #2
jdk787
josh kinman
 
Location: Austin

Join Date: Apr 2014
Posts: 61

It looks like you are trimming paired-end files in single-end mode.
Try switching to paired-end mode when trimming.

From the Cutadapt user guide:
Code:
cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
__________________
Josh Kinman
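Trim Galore also has its own paired-end switch, `--paired`, which processes both files together and discards a pair whenever either mate falls below the `--length` cutoff, so R1 and R2 stay in step. A sketch of the original poster's command with that switch added; the `trim_pairs` function name and the `TRIM_GALORE` override (useful for a dry run with `echo`) are hypothetical, not part of Trim Galore. Note that the original poster reports in the next post that switching to paired mode did not fix the mapping rate on its own.

```shell
# Hypothetical wrapper: the original settings plus --paired.
# TRIM_GALORE defaults to the real binary; set TRIM_GALORE=echo for a dry run.
trim_pairs() {
  local read1=$1 read2=$2 outdir=$3
  "${TRIM_GALORE:-trim_galore}" \
    --paired \
    --stringency 5 --dont_gzip --trim1 --length 30 -q 0 \
    -a CTGTCTCTTATACACATCT \
    -o "$outdir" "$read1" "$read2"
}
```

For example, `trim_pairs "$READ1" "$READ2" "$TRIMGALORE_DIR"`. With `--paired`, Trim Galore names its outputs `*_val_1.fq` and `*_val_2.fq` (uncompressed here because of `--dont_gzip`), and those are the files to pass to Bowtie's `-1` and `-2`.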
02-11-2017, 04:35 AM   #3
foolishbrat
Member
 
Location: South East Asia

Join Date: Nov 2008
Posts: 44

Quote:
Originally Posted by jdk787
It looks like you are trimming paired-end files in single-end mode.
Try switching to paired-end mode when trimming.

From the Cutadapt user guide:
Code:
cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
Hi Josh,

I tried that, but the mapping rate is still low.

FB.
02-11-2017, 06:04 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Are you sure you haven't made some kind of syntax error when running Bowtie after the trimming, or that something isn't wrong with the format of the trimmed files?

As you mentioned, cutadapt removed adapters from only a small number of reads, so you would expect at most a small difference in the number of aligned reads, and usually an improvement.
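The second of these suggestions can be checked directly. Bowtie takes the n-th record of the `-1` file and the n-th record of the `-2` file as one pair, so both trimmed files should list the same read IDs in the same order. A minimal bash sketch, assuming uncompressed FASTQ with exactly four lines per record and headers that identify the mate either by a trailing `/1` or `/2` or by a comment after the first space (both are stripped before comparing); the `read_ids` and `check_pairs` names are hypothetical.

```shell
# Hypothetical helper: print each record's read ID, i.e. the header up to the
# first space or tab, with any trailing /1 or /2 removed. Assumes 4-line records.
read_ids() {
  awk 'NR % 4 == 1 { sub(/[ \t].*/, ""); sub(/\/[12]$/, ""); print }' "$1"
}

# Hypothetical check: report the first record where the R1 and R2 IDs disagree
# (exit status 1), or the number of pairs if every ID matches (exit status 0).
check_pairs() {
  paste <(read_ids "$1") <(read_ids "$2") |
  awk -F '\t' '
    $1 != $2 { printf "out of sync at record %d: %s vs %s\n", NR, $1, $2; bad = 1; exit }
    END      { if (!bad) printf "%d pairs, all IDs match\n", NR; exit bad + 0 }'
}
```

Run it on the two trimmed files before aligning (and, as a control, on the raw R1/R2 files). An `out of sync at record ...` message means the files no longer pair up, which no Bowtie setting can fix. Files of roughly 85 million reads take a while to scan, but the check only needs to run once.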
02-11-2017, 10:26 AM   #5
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Cross-posted and answered here:

https://www.biostars.org/p/236127/

Tags
adapter, bowtie, cutadapt, mapping, trimgalore

All times are GMT -8.