  • priya
    Member
    • Apr 2013
    • 57

    Issues with ChIP-seq mapping

    Hi,
    I have ChIP-seq data and mapped the reads with the bowtie aligner, using mm10 as the reference genome.
    Of the 16.6M reads processed, only about 3.9M mapped uniquely; 10.8M failed to align and 1.9M were not uniquely mapped.
    It looks strange that most of the reads fail to align. I also collected the unaligned sequences and ran BLAST against several species on a random 5,000 of them; the highest percentage of hits came from mouse.

    I can't find an explanation: if there is not much cross-species contamination in the ChIP-seq samples, why are the reads not aligning to the mouse genome?

    Also, about the choice of aligner: would it make any difference if I used BWA instead of bowtie?
    What else should I consider for this problem?


    Here is the bowtie log:

    # reads processed: 16561198
    # reads with at least one reported alignment: 3860809 (23.31%)
    # reads that failed to align: 10781729 (65.10%)
    # reads with alignments suppressed due to -m: 1918660 (11.59%)
    Reported 3860809 alignments to 1 output stream(s)
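
A quick way to double-check the log above is to recompute the percentages from the raw counts. The following is only a minimal sketch: the log text is copied from the post, while the function name `parse_bowtie_log` and the regular expression are illustrative assumptions, not part of bowtie.

```python
import re

# Log lines copied from the post above.
BOWTIE_LOG = """\
# reads processed: 16561198
# reads with at least one reported alignment: 3860809 (23.31%)
# reads that failed to align: 10781729 (65.10%)
# reads with alignments suppressed due to -m: 1918660 (11.59%)
"""

def parse_bowtie_log(text):
    """Map each '# label: count' line to {label: count} (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

counts = parse_bowtie_log(BOWTIE_LOG)
total = counts["reads processed"]
for label, n in counts.items():
    if label != "reads processed":
        # Recompute each category as a share of all processed reads.
        print(f"{label}: {n} ({100 * n / total:.2f}%)")
```

The three categories add up to the number of reads processed, so every read is either reported, unaligned, or suppressed by -m.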
  • Chipper
    Senior Member
    • Mar 2008
    • 323

    #2
    My guess is you didn't trim your reads and are reading into the adaptor. Try bwa mem if you have 100 or 125 bp reads.


    • priya
      Member
      • Apr 2013
      • 57

      #3
      Originally posted by Chipper
      My guess is you didn't trim your reads and are reading into the adaptor. Try bwa mem if you have 100 or 125 bp reads.
      I checked with FastQC and nothing shows up under "Overrepresented sequences".
      Are there other ways to check for adapter contamination?

      The read length is 43 bp.


      • Chipper
        Senior Member
        • Mar 2008
        • 323

        #4
        Adapter dimers should be found by FastQC, so the only other things I can think of are Ns in the reads or very low-quality ends.


        • priya
          Member
          • Apr 2013
          • 57

          #5
          Originally posted by Chipper
          Adapter dimers should be found by FASTQC so the only things I can think of then is if you have N:s in the reads or very low quality ends.
          Thank you for the suggestions. I looked into the unaligned reads. To do that, I added the extra parameter --un when running bowtie to save them:

          bowtie -t -p 1 -m 1 $index/mm10 -q -S sample.fastq --un unaligned.fq > sample.sam

          The FASTQ file of unaligned reads does not contain any Ns.

          For example, here are only the sequence lines (the second line of each record) from the unaligned FASTQ file, omitting the other three lines. None of them contains an N:

          ATAAAACTGTATTTTTTTGTGAAGAATCAACAACAAGTGGGAC
          CCGGGCTTAGACAGCTCACATGAAAGGAAGGCCGTGCCACCTT
          CACTCGTTGTGAATCTATCCACCAAGTCAGATTTGAAAAATGC
          CAGTACAGACATAACTTATTAAAGCCTCTAGCAGGACAGCAAA
          CACTAAACAGGAACTGCAAACACAAATATGTTTGGCACACAAA
          TGTTTTAATTTTTTTTTTACAAATGTATCCATTATTATCGTTG
          GGGTAAGTTTGGCGCCGTGAGTGAAGGGGGCTTTGTTGCGGAA
          AATCTGTCTGTCCGTCTGTTCGTCTATCTGTCTGTCCGTCTGT
          GTGTGTGTGATGGGTCAGGTGTGTGTGTGTGTGTGTGTGATGC
          AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
          CCCCATTAGTTCCTGTCAAGGCAGAAGCTACTCTTCCTGGGGT
          ATTATGTCTTTGAGCAAGTTTATGTTTGAGTTAGTGAATTCAT
          TTTCTAAATTTTCCACCTTTTTCAGTTTTCCTCGCCATATTTC
          CATAATGTGATTTTGCCGTTGTTCTGTCTCTTTATTACATATC
          GACCAATGTTTAGTTTCTTAGAAATCAGATGCATGAAATAACC
          GCCGAACGACTCCTCTACCTCCTGCACCACTAACGCCCCCAAA
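
Rather than inspecting sequences by eye, both of the suggested checks (Ns in the reads, low-quality ends) can be run over the whole unaligned file. The sketch below is only an illustration under stated assumptions: four-line FASTQ records, Phred+33 quality encoding, and hypothetical helper names `read_fastq` and `summarize`. The toy records are made up and are not data from this thread.

```python
from io import StringIO

def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.strip()[1:], seq, qual

def summarize(handle, tail=10, offset=33):
    """Return (reads, reads containing N, mean Phred of the last `tail` bases).

    Assumes Phred+33 encoding; pass offset=64 for old Illumina files.
    """
    n_reads = reads_with_n = 0
    tail_sum = tail_bases = 0
    for _name, seq, qual in read_fastq(handle):
        n_reads += 1
        if "N" in seq.upper():
            reads_with_n += 1
        for ch in qual[-tail:]:
            tail_sum += ord(ch) - offset
            tail_bases += 1
    mean_tail = tail_sum / tail_bases if tail_bases else float("nan")
    return n_reads, reads_with_n, mean_tail

# Toy input (made up): one read with an N, one read with a low-quality end.
toy = "@r1\nACGTN\n+\nIIIII\n@r2\nACGTA\n+\nII###\n"
print(summarize(StringIO(toy), tail=3))
```

For the real data, the same function could be called on `open("unaligned.fq")`, the file written by --un in the command above.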


          • Richard Finney
            Senior Member
            • Feb 2009
            • 701

            #6
            Sanity check:
            reverse complement of

            AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
            is
            ACGAATGTGTTTTTCAGTGTAACTCACTCATCTAATATGTTCT

            matches perfectly to mm10:chr6:103,649,175-103,649,217

            via http://genome.ucsc.edu/cgi-bin/hgBlat

            BLAT Search Results

            ACTIONS          QUERY  SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START      END        SPAN
            ------------------------------------------------------------------------------------------------------
            browser details  x10    43     1      43   43     100.0%    6     -       103649175  103649217  43


            Other reads match mm10 also.

            Verify that your reference sequence is good.
            Check distribution of chromosomes in your output sam; are any missing?
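
The reverse-complement step of the sanity check above is easy to reproduce before pasting reads into BLAT. This is a minimal sketch; the helper name `reverse_complement` is an illustrative choice, and the read is the one quoted in this post.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

read = "AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT"
print(reverse_complement(read))
```

A match on the minus strand, as in the BLAT result, is as valid as a plus-strand match; aligners search both strands by default.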
            Last edited by Richard Finney; 04-24-2015, 08:41 AM.


            • Chipper
              Senior Member
              • Mar 2008
              • 323

              #7
              Reading the manual is also a good start:

              -m <int>
              Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).

              Why would you use -m 1 with 43 bp reads?
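
To make the quoted -m behaviour concrete, here is a toy model of how a read ends up in each category of the bowtie log. It is only a sketch of the rule as stated in the manual excerpt, not bowtie's implementation; `classify_read` and the example counts are hypothetical.

```python
from collections import Counter

def classify_read(n_reportable, m=None):
    """Classify a read by its number of reportable alignments.

    m=None models bowtie's default (no limit); m=1 models '-m 1'.
    """
    if n_reportable == 0:
        return "failed to align"
    if m is not None and n_reportable > m:
        return "suppressed due to -m"
    return "reported"

# Made-up numbers of reportable alignments for six reads.
reportable_per_read = [0, 1, 1, 2, 5, 0]
print(Counter(classify_read(n, m=1) for n in reportable_per_read))
print(Counter(classify_read(n) for n in reportable_per_read))
```

With -m 1, a read that has a perfect hit plus one other reportable hit is suppressed entirely, so short reads in repetitive regions are lost.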


              • priya
                Member
                • Apr 2013
                • 57

                #8
                Hi Richard,
                Thank you! I see that the reverse complements of most of the reads match mm10, which looks a bit strange at this point.

                I also checked my SAM header, and all the chromosomes appear to be included:
                @SQ SN:chr1 LN:197195432
                @SQ SN:chr2 LN:181748087
                @SQ SN:chr3 LN:159599783
                @SQ SN:chr4 LN:155630120
                @SQ SN:chr5 LN:152537259
                @SQ SN:chr6 LN:149517037
                @SQ SN:chr7 LN:152524553
                @SQ SN:chr8 LN:131738871
                @SQ SN:chr9 LN:124076172
                @SQ SN:chr10 LN:129993255
                @SQ SN:chr11 LN:121843856
                @SQ SN:chr12 LN:121257530
                @SQ SN:chr13 LN:120284312
                @SQ SN:chr14 LN:125194864
                @SQ SN:chr15 LN:103494974
                @SQ SN:chr16 LN:98319150
                @SQ SN:chr17 LN:95272651
                @SQ SN:chr18 LN:90772031
                @SQ SN:chr19 LN:61342430
                @SQ SN:chrX LN:166650296
                @SQ SN:chrY LN:15902555
                @SQ SN:chrM LN:16299
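
The header only shows which sequences are in the index, not where reads actually landed. A sketch like the following counts mapped records per chromosome and lists declared chromosomes with no alignments. It assumes a plain-text, tab-separated SAM file; the helper name `chromosome_distribution` and the toy records are made up for illustration.

```python
from collections import Counter

def chromosome_distribution(sam_lines):
    """Return (names declared in @SQ lines, Counter of mapped records per RNAME)."""
    declared = []
    mapped = Counter()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                for tag in fields[1:]:
                    if tag.startswith("SN:"):
                        declared.append(tag[3:])
            continue
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname = int(fields[1]), fields[2]
        # Flag bit 0x4 marks an unmapped read.
        if (flag & 4) == 0 and rname != "*":
            mapped[rname] += 1
    return declared, mapped

# Toy SAM content (made up): nothing maps to chr2.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:1000",
    "@SQ\tSN:chrM\tLN:100",
    "r1\t0\tchr1\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tchr1\t50\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t0\tchrM\t7\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]
declared, mapped = chromosome_distribution(toy_sam)
missing = [name for name in declared if mapped[name] == 0]
print(dict(mapped), "no alignments on:", missing)
```

For real output, the function could be given `open("sample.sam")` instead of the toy list.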

                To cross-check, instead of the mm10 index I built myself, I tried the mm9 bowtie index available from the Bowtie page, ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/

                But it makes no difference to the alignment percentage.

                mm9 bowtie log:
                # reads processed: 16561198
                # reads with at least one reported alignment: 3864950 (23.34%)
                # reads that failed to align: 10834676 (65.42%)
                # reads with alignments suppressed due to -m: 1861572 (11.24%)
                Last edited by priya; 05-06-2015, 05:56 AM.


                • priya
                  Member
                  • Apr 2013
                  • 57

                  #9
                  Originally posted by Chipper
                  Reading the manual is also a good start:

                  -m <int>
                  Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).

                  Why would you use -m 1 with 43 bp reads?
                  To obtain uniquely mapped reads and discard the multiple hits.
                  I am not sure whether I can allow more mismatches (the default of 2 was used), as I have short reads of 40-43 bp.
                  Last edited by priya; 04-27-2015, 01:42 AM.


                  • Chipper
                    Senior Member
                    • Mar 2008
                    • 323

                    #10
                    Originally posted by priya
                    In order to obtain uniquely mapped reads , discarding the multiple hits.
                    I am not sure whether I can introduce more mismatches (default chosen was :2 )as i have short read sequences of 40-43 bp
                    Just use BWA and filter on mapping quality (e.g. -q 20 with samtools view) to get uniquely mapped reads if you don't understand the bowtie options.
                    -m 1 will suppress even perfect matches if there are any other reportable alignments within the tolerated error rate.
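
The mapping-quality filter suggested above can be sketched in a few lines. This is an illustration only: it assumes plain-text SAM input from an aligner that assigns informative MAPQ values (as BWA does), and `filter_by_mapq` and the toy records are hypothetical. In practice a MAPQ threshold is usually applied with samtools rather than hand-written code.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield header lines unchanged and only mapped records with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        # Drop unmapped reads (flag bit 0x4) and low-confidence placements.
        if (flag & 4) == 0 and mapq >= min_mapq:
            yield line

# Toy records (made up): MAPQ is the fifth column.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "unique\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "repeat\t0\tchr1\t90\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "borderline\t16\tchr1\t50\t20\t5M\t*\t0\t0\tACGTA\tIIIII",
    "unmapped\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]
kept = [line.split("\t")[0] for line in filter_by_mapq(toy_sam)
        if not line.startswith("@")]
print(kept)
```

Unlike -m 1, this keeps a read whose best hit is clearly better than its alternatives, because the aligner encodes that confidence in MAPQ.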


                    • SylvainL
                      Senior Member
                      • Feb 2012
                      • 180

                      #11
                      Can you try mapping the unaligned reads to human? I once got up to 92% human contamination in a ChIP-seq library... After checking, it turned out the technician hadn't used gloves when she took an aliquot to send for sequencing...


                      • priya
                        Member
                        • Apr 2013
                        • 57

                        #12
                        Originally posted by SylvainL
                        Can you try mapping the unaligned reads on human? Once I got up to 92% human contamination in a ChIPseq library... After checking, the technician didn't use gloves when she took an aliquot to sent to sequencing...
                        I cross-checked against the human genome, but only 0.2% of the unaligned reads align to it. I don't think it's a contamination issue. I did a sanity check and could not find substantial contamination from other species in the sample.


                        • SylvainL
                          Senior Member
                          • Feb 2012
                          • 180

                          #13
                          OK, can you re-run bowtie with --max overM_reads.fastq and --un Unaligned_reads.fastq, just to be sure that the reads which do not map are really not mapping to mouse? If I remember correctly, if you only use --un, all the reads excluded from the mapping end up in that file (which may explain why it contains reads that map perfectly to mouse, if they map more than once...).


                          • priya
                            Member
                            • Apr 2013
                            • 57

                            #14
                            I found the answer to my question. Changing the bowtie or BWA parameters improved my alignment rate by only 2-5%, still leaving a lot of unaligned reads.
                            This paper explains it:


                            I tried mapping my unaligned reads with the SHRiMP2 aligner with default settings, and a high percentage of them mapped.


                            • Richard Finney
                              Senior Member
                              • Feb 2009
                              • 701

                              #15
                              I'm not quite getting the paper ( http://www.nature.com/srep/2015/1503...srep08635.html )

                              When/where in the process are the reads getting modified so that they can't map?
