  • Why is the mapping rate low after adapter trimming of paired-end data?

    I have a pipeline with the following trimming and mapping parameters:



    Code:
     trim_galore --stringency 5 --dont_gzip --trim1 --length 30 -q 0 -a CTGTCTCTTATACACATCT  -o $TRIMGALORE_DIR $READ1 $READ2
        
     bowtie $Bowtie_Index_File -1 $TRIMGALORE_R1 -2 $TRIMGALORE_R2 --threads 5 -m 1 -v 2 -S -I 0 -X 2000
    Raw paired-end mapping (without adapter trimming) gives this result:

    Code:
    **********************************************
        Stats for BAM file(s):
        **********************************************
        
        Total reads:       169555726
        Mapped reads:      97870984    (57.722%)
        Forward strand:    120620234    (71.139%)
        Reverse strand:    48935492    (28.861%)
        Failed QC:         0    (0%)
        Duplicates:        0    (0%)
        Paired-end reads:  169555726    (100%)
        'Proper-pairs':    97870984    (57.722%)
        Both pairs mapped: 97870984    (57.722%)
        Read 1:            84777863
        Read 2:            84777863
        Singletons:        0    (0%)
        Average insert size (absolute value): 125.862
        Median insert size (absolute value): 99
    But after running it with Trim Galore, I get this:

    Code:
      **********************************************
        Stats for BAM file(s):
        **********************************************
        
        Total reads:       168972000
        Mapped reads:      314228    (0.185965%)
        Forward strand:    168814886    (99.907%)
        Reverse strand:    157114    (0.0929823%)
        Failed QC:         0    (0%)
        Duplicates:        0    (0%)
        Paired-end reads:  168972000    (100%)
        'Proper-pairs':    314228    (0.185965%)
        Both pairs mapped: 314228    (0.185965%)
        Read 1:            84486000
        Read 2:            84486000
        Singletons:        0    (0%)
        Average insert size (absolute value): 571.141
        Median insert size (absolute value): 297
    So the mapping rates drop radically from 57% to 0.18%.
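The two BAM summaries can be cross-checked with a few lines of arithmetic. This Python sketch uses only figures printed above; the variable names are illustrative. Besides the two mapping rates, it shows that the trimmed BAM holds 84,486,000 pairs, which is exactly the 84,777,863 R1 reads minus the 291,863 R1 reads Trim Galore dropped for being shorter than 30 bp. That suggests R1 and R2 were length-filtered independently of each other.

```python
# Figures copied from the BAM summaries and the R1 trimming report in this post.
RAW_TOTAL, RAW_MAPPED = 169_555_726, 97_870_984
TRIM_TOTAL, TRIM_MAPPED = 168_972_000, 314_228
R1_INPUT, R1_TOO_SHORT = 84_777_863, 291_863


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


raw_rate = pct(RAW_MAPPED, RAW_TOTAL)
trim_rate = pct(TRIM_MAPPED, TRIM_TOTAL)

# Each BAM total counts both mates, so halve it to get the number of pairs.
raw_pairs = RAW_TOTAL // 2
trim_pairs = TRIM_TOTAL // 2

# R1 reads that survived the 30 bp length cutoff.
r1_kept = R1_INPUT - R1_TOO_SHORT

print(f"mapping rate, raw:     {raw_rate:.3f}%")
print(f"mapping rate, trimmed: {trim_rate:.3f}%")
print(f"pairs before/after trimming: {raw_pairs} / {trim_pairs}")
print(f"R1 reads kept by trimming:   {r1_kept}")
print("trimmed pair count equals kept R1 count:", trim_pairs == r1_kept)
```

It prints mapping rates of 57.722% and 0.186%, and `True` for the final check.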

    I find this strange because Trim Galore trims away very little sequence.
    Why does this happen, and what parameters should I use for Bowtie and Cutadapt/Trim Galore? Here is the Trim Galore report:
    Code:
         SUMMARISING RUN PARAMETERS
        ==========================
        Input filename: /home/ubuntu/R1_001.fastq
        Trimming mode: single-end
        Trim Galore version: 0.4.1
        Cutadapt version: 1.12
        Quality Phred score cutoff: 0
        Quality encoding type selected: ASCII+33
        Adapter sequence: 'CTGTCTCTTATACACATCT' ()
        Maximum trimming error rate: 0.1 (default)
        Minimum required adapter overlap (stringency): 5 bp
        Minimum required sequence length before a sequence gets removed: 30 bp
        All sequences will be trimmed by 1 bp on their 3' end to avoid problems with invalid paired-end alignments with Bowtie 1
        Writing final adapter and quality trimmed output to BB_89_S1_R1_001_trimmed.fq
          >>> Now performing quality (cutoff 0) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATACACATCT' from file /home/ubuntu/R1_001.fastq <<< 
        10000000 sequences processed
        20000000 sequences processed
        30000000 sequences processed
        40000000 sequences processed
        50000000 sequences processed
        60000000 sequences processed
        70000000 sequences processed
        80000000 sequences processed
        This is cutadapt 1.12 with Python 2.7.13
        Command line parameters: -f fastq -e 0.1 -q 0 -O 5 -a CTGTCTCTTATACACATCT /home/ubuntu/R1_001.fastq
        Trimming 1 adapter with at most 10.0% errors in single-end mode ...
        Finished in 804.75 s (9 us/read; 6.32 M reads/minute).
        === Summary ===
        Total reads processed:              84,777,863
        Reads with adapters:                   424,374 (0.5%)
        Reads written (passing filters):    84,777,863 (100.0%)
        Total basepairs processed: 3,052,003,068 bp
        Quality-trimmed:                       0 bp (0.0%)
        Total written (filtered):  3,048,935,811 bp (99.9%)
        === Adapter 1 ===
        Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 424374 times.
        No. of allowed errors:
        0-9 bp: 0; 10-19 bp: 1
        Bases preceding removed adapters:
          A: 13.7%
          C: 36.7%
          G: 25.3%
          T: 24.2%
          none/other: 0.0%
        Overview of removed sequences
        length  count   expect  max.err error counts
        5   132511  82790.9 0   132511
        6   72231   20697.7 0   72231
        7   50748   5174.4  0   50748
        8   43106   1293.6  0   43106
        9   53661   323.4   0   50630 3031
        10  50660   80.9    1   44674 5986
        11  8616    20.2    1   6320 2296
        12  2894    5.1 1   1601 1293
        13  2928    1.3 1   2706 222
        14  995 0.3 1   906 89
        15  2681    0.1 1   2570 111
        16  620 0.0 1   598 22
        17  1622    0.0 1   1578 44
        18  263 0.0 1   255 8
        19  460 0.0 1   446 14
        20  86  0.0 1   84 2
        21  107 0.0 1   103 4
        22  49  0.0 1   42 7
        23  28  0.0 1   23 5
        24  11  0.0 1   10 1
        25  8   0.0 1   7 1
        26  7   0.0 1   6 1
        27  23  0.0 1   22 1
        28  8   0.0 1   8
        29  1   0.0 1   1
        33  2   0.0 1   1 1
        35  1   0.0 1   0 1
        36  47  0.0 1   45 2
        RUN STATISTICS FOR INPUT FILE: /home/ubuntu/R1_001.fastq
        =============================================
        84777863 sequences processed in total
        Sequences removed because they became shorter than the length cutoff of 30 bp:  291863 (0.3%)
        Writing report to 'trim_galore_dir/R2_001.fastq_trimming_report.txt'
        SUMMARISING RUN PARAMETERS
        ==========================
        Input filename: /home/ubuntu/R2_001.fastq
        Trimming mode: single-end
        Trim Galore version: 0.4.1
        Cutadapt version: 1.12
        Quality Phred score cutoff: 0
        Quality encoding type selected: ASCII+33
        Adapter sequence: 'CTGTCTCTTATACACATCT' ()
        Maximum trimming error rate: 0.1 (default)
        Minimum required adapter overlap (stringency): 5 bp
        Minimum required sequence length before a sequence gets removed: 30 bp
        All sequences will be trimmed by 1 bp on their 3' end to avoid problems with invalid paired-end alignments with Bowtie 1
        Writing final adapter and quality trimmed output to BB_89_S1_R2_001_trimmed.fq
          >>> Now performing quality (cutoff 0) and adapter trimming in a single pass for the adapter sequence: 'CTGTCTCTTATACACATCT' from file /home/ubuntu/R2_001.fastq <<< 
        10000000 sequences processed
        20000000 sequences processed
        30000000 sequences processed
        40000000 sequences processed
        50000000 sequences processed
        60000000 sequences processed
        70000000 sequences processed
        80000000 sequences processed
        This is cutadapt 1.12 with Python 2.7.13
        Command line parameters: -f fastq -e 0.1 -q 0 -O 5 -a CTGTCTCTTATACACATCT /home/ubuntu/R2_001.fastq
        Trimming 1 adapter with at most 10.0% errors in single-end mode ...
        Finished in 814.43 s (10 us/read; 6.25 M reads/minute).
        === Summary ===
        Total reads processed:              84,777,863
        Reads with adapters:                   423,690 (0.5%)
        Reads written (passing filters):    84,777,863 (100.0%)
        Total basepairs processed: 3,052,003,068 bp
        Quality-trimmed:                       0 bp (0.0%)
        Total written (filtered):  3,048,945,359 bp (99.9%)
        === Adapter 1 ===
        Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 423690 times.
        No. of allowed errors:
        0-9 bp: 0; 10-19 bp: 1
        Bases preceding removed adapters:
          A: 13.7%
          C: 36.1%
          G: 25.4%
          T: 24.8%
          none/other: 0.0%
        Overview of removed sequences
        length  count   expect  max.err error counts
        5   132783  82790.9 0   132783
        6   72585   20697.7 0   72585
        7   50354   5174.4  0   50354
        8   43191   1293.6  0   43191
        9   53096   323.4   0   49979 3117
        10  50491   80.9    1   44467 6024
        11  8598    20.2    1   6158 2440
        12  2915    5.1 1   1566 1349
        13  2880    1.3 1   2625 255
        14  959 0.3 1   865 94
        15  2597    0.1 1   2493 104
        16  601 0.0 1   570 31
        17  1573    0.0 1   1520 53
        18  256 0.0 1   240 16
        19  447 0.0 1   430 17
        20  83  0.0 1   77 6
        21  101 0.0 1   87 14
        22  46  0.0 1   40 6
        23  28  0.0 1   26 2
        24  12  0.0 1   10 2
        25  8   0.0 1   7 1
        26  7   0.0 1   6 1
        27  23  0.0 1   20 3
        28  8   0.0 1   7 1
        29  1   0.0 1   1
        33  2   0.0 1   1 1
        36  45  0.0 1   44 1
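A quick way to test whether two trimmed FASTQ files are still paired is to walk them in lockstep and compare read IDs record by record. The Python sketch below assumes plain, uncompressed 4-line FASTQ records (which `--dont_gzip` produces) and read IDs that differ between mates only by an optional `/1` or `/2` suffix or by text after the first whitespace. The function names are illustrative and not part of any tool.

```python
import itertools
import sys
from typing import Iterable, Iterator, Optional, Tuple


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield a normalised read ID for every 4-line FASTQ record.

    The ID is the header text after '@' up to the first whitespace, with a
    trailing /1 or /2 mate suffix removed so that R1 and R2 IDs compare equal.
    """
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # sequence, '+' separator, or quality line
        if not line.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
        fields = line[1:].split()
        rid = fields[0] if fields else ""
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]
        yield rid


def first_desync(r1: Iterable[str], r2: Iterable[str]) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, R1 ID, R2 ID) at the first mismatch, or None if in sync.

    A missing ID (None) means one file ran out of records before the other.
    """
    pairs = itertools.zip_longest(read_ids(r1), read_ids(r2))
    for idx, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return idx, id1, id2
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fq1, open(sys.argv[2]) as fq2:
        mismatch = first_desync(fq1, fq2)
    if mismatch is None:
        print("R1 and R2 are in sync")
    else:
        idx, id1, id2 = mismatch
        print(f"out of sync at record {idx}: R1={id1} R2={id2}")
```

For example, `python check_pairs.py BB_89_S1_R1_001_trimmed.fq BB_89_S1_R2_001_trimmed.fq` (the script name is hypothetical) reports the first record at which the two files stop describing the same fragment.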

  • #2
    It looks like you are trimming paired-end files in single-end mode.
    Try switching to paired-end mode when trimming.

    From the Cutadapt user guide:
    Code:
    cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
    Josh Kinman
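Why paired-end mode matters can be shown with a toy model. When R1 and R2 are trimmed as two separate single-end runs, each file drops its own short reads, so the mates of later pairs no longer line up; a paired-aware filter instead drops the whole pair when either mate is too short, which is what Trim Galore's `--paired` option is meant to do. The Python below is a simplified illustration with made-up reads, not the implementation used by cutadapt or Trim Galore.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (read ID, sequence) after adapter trimming


def filter_single_end(reads: List[Read], min_len: int) -> List[Read]:
    """Length-filter one file on its own, as two separate single-end runs would."""
    return [r for r in reads if len(r[1]) >= min_len]


def filter_paired(r1: List[Read], r2: List[Read], min_len: int) -> Tuple[List[Read], List[Read]]:
    """Keep a pair only if both mates pass, so the two outputs stay in step."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    kept = [(a, b) for a, b in zip(r1, r2)
            if len(a[1]) >= min_len and len(b[1]) >= min_len]
    return [a for a, _ in kept], [b for _, b in kept]


# Toy data: pair p2 has a short R1, pair p3 has a short R2.
R1 = [("p1", "A" * 35), ("p2", "A" * 12), ("p3", "A" * 35), ("p4", "A" * 35)]
R2 = [("p1", "C" * 35), ("p2", "C" * 35), ("p3", "C" * 9), ("p4", "C" * 35)]

se1, se2 = filter_single_end(R1, 30), filter_single_end(R2, 30)
print("single-end runs:", [(a[0], b[0]) for a, b in zip(se1, se2)])

pe1, pe2 = filter_paired(R1, R2, 30)
print("paired filter:  ", [(a[0], b[0]) for a, b in zip(pe1, pe2)])
```

The single-end route pairs p3's R1 with p2's R2, while the paired filter keeps only p1 and p4, with matching IDs on both sides. With hundreds of thousands of reads dropped at different positions in each file, most pairs end up mismatched, and an aligner would see them as improper pairs.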



    • #3
      Originally posted by jdk787
      It looks like you are trimming paired-end files in single-end mode.
      Try switching to paired-end mode when trimming.

      From the Cutadapt user guide:
      Code:
      cutadapt -a ADAPTER_FWD -A ADAPTER_REV -o out.1.fastq -p out.2.fastq reads.1.fastq reads.2.fastq
      Hi Josh,

      I tried that, but I still get a low mapping rate.

      FB.



      • #4
        Are you sure you haven't made some kind of syntax error when running Bowtie after the trimming, or maybe there is something wrong with the format of the files after trimming?

        Because, as you mentioned, cutadapt only removed adapters from a small number of reads, so you would expect at most a small difference in the number of reads aligned, and usually you would expect an improvement.
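The trimming reports in the first post bear this out. This short Python calculation uses only figures from those reports (the names are illustrative) and shows that about 0.5% of reads in each file contained adapter sequence and that about 0.1% of bases were removed by cutadapt, which on its own is far too little to explain a drop in mapping rate from 57.7% to 0.19%.

```python
# Figures from the R1 and R2 cutadapt summaries in the first post.
READS_PER_FILE = 84_777_863
WITH_ADAPTER = {"R1": 424_374, "R2": 423_690}
BP_PROCESSED = 3_052_003_068
BP_WRITTEN = {"R1": 3_048_935_811, "R2": 3_048_945_359}


def summarise(mate: str) -> tuple:
    """Return (% reads with adapter, bp removed, % bp removed) for one mate."""
    adapter_pct = 100.0 * WITH_ADAPTER[mate] / READS_PER_FILE
    removed_bp = BP_PROCESSED - BP_WRITTEN[mate]
    removed_pct = 100.0 * removed_bp / BP_PROCESSED
    return adapter_pct, removed_bp, removed_pct


for mate in ("R1", "R2"):
    adapter_pct, removed_bp, removed_pct = summarise(mate)
    print(f"{mate}: {adapter_pct:.2f}% of reads had adapter, "
          f"{removed_bp:,} bp ({removed_pct:.2f}%) removed")
```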



        • #5
          Cross-posted and answered here:
