Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issues with ChipSeq mapping

    Hi,
    I have ChipSeq data, and I mapped the reads with bowtie aligner using mm10 as reference genome.
    Out of 16M reads processed, only 3M reads are uniquely mapped and 11M reads failed to align and 2M reads are not uniquely mapped.
    It looks soo strange that most of the reads are failed to align, I also tried to collect the unaligned sequences and run blast for several species on random 5000 sequences, the highest percentage of unaligned sequences were coming from Mouse.

    I couldnt find explaination, if there is not much species contamination in the chipseq samples, then why the reads are not aligning to the mouse genome.

    Also, choice of aligner instead of bowtie, if I use BWA will it make any difference..
    What could be the considerations for this problem??


    Here is the bowtie log

    # reads processed: 16561198
    # reads with at least one reported alignment: 3860809 (23.31%)
    # reads that failed to align: 10781729 (65.10%)
    # reads with alignments suppressed due to -m: 1918660 (11.59%)
    Reported 3860809 alignments to 1 output stream(s)

  • #2
    My guess is you didn't trim your reads and are reading into the adaptor. Try bwa mem if you have 100 or 125 bp reads.

    Comment


    • #3
      Originally posted by Chipper View Post
      My guess is you didn't trim your reads and are reading into the adaptor. Try bwa mem if you have 100 or 125 bp reads.
      I checked through FASTQC and there is nothing poping in "overrepresented sequences"
      Is there any other ways to check if there is any problems with adapter contamination..

      The read length is 43 bp..

      Comment


      • #4
        Adapter dimers should be found by FASTQC so the only things I can think of then is if you have N:s in the reads or very low quality ends.

        Comment


        • #5
          Originally posted by Chipper View Post
          Adapter dimers should be found by FASTQC so the only things I can think of then is if you have N:s in the reads or very low quality ends.
          Thank you for suggestions. I tried to look into those unaligned reads. For that, while running bowtie I add an extra parameter --un to save the unaligned reads.

          bowtie -t -p 1 -m 1 $index/mm10 -q -S sample.fastq --un unaligned.fq > sample.sam

          If I go through those unaligned reads fastq file, it doesnt contain N:s..

          For Example: If i just paste here only the second line (read sequence) from unaligned fastq file omitting the other 3 lines, they looks like below without N:s

          ATAAAACTGTATTTTTTTGTGAAGAATCAACAACAAGTGGGAC
          CCGGGCTTAGACAGCTCACATGAAAGGAAGGCCGTGCCACCTT
          CACTCGTTGTGAATCTATCCACCAAGTCAGATTTGAAAAATGC
          CAGTACAGACATAACTTATTAAAGCCTCTAGCAGGACAGCAAA
          CACTAAACAGGAACTGCAAACACAAATATGTTTGGCACACAAA
          TGTTTTAATTTTTTTTTTACAAATGTATCCATTATTATCGTTG
          GGGTAAGTTTGGCGCCGTGAGTGAAGGGGGCTTTGTTGCGGAA
          AATCTGTCTGTCCGTCTGTTCGTCTATCTGTCTGTCCGTCTGT
          GTGTGTGTGATGGGTCAGGTGTGTGTGTGTGTGTGTGTGATGC
          AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
          CCCCATTAGTTCCTGTCAAGGCAGAAGCTACTCTTCCTGGGGT
          ATTATGTCTTTGAGCAAGTTTATGTTTGAGTTAGTGAATTCAT
          TTTCTAAATTTTCCACCTTTTTCAGTTTTCCTCGCCATATTTC
          CATAATGTGATTTTGCCGTTGTTCTGTCTCTTTATTACATATC
          GACCAATGTTTAGTTTCTTAGAAATCAGATGCATGAAATAACC
          GCCGAACGACTCCTCTACCTCCTGCACCACTAACGCCCCCAAA

          Comment


          • #6
            Sanity check:
            reverse complement of

            AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
            is
            ACGAATGTGTTTTTCAGTGTAACTCACTCATCTAATATGTTCT

            matches perfectly to mm10:chr6:103,649,175-103,649,217

            via http://genome.ucsc.edu/cgi-bin/hgBlat

            BLAT Search Results

            ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
            ---------------------------------------------------------------------------------------------------
            browser details x10 43 1 43 43 100.0% 6 - 103649175 103649217 43


            Other reads match mm10 also.

            Verify that your reference sequence is good.
            Check distribution of chromosomes in your output sam; are any missing?
            Last edited by Richard Finney; 04-24-2015, 08:41 AM.

            Comment


            • #7
              Reading the manual is also a good start:

              -m <int>
              Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).

              Why would you use -m 1 with 43 bp reads?

              Comment


              • #8
                Originally posted by Richard Finney View Post
                Sanity check:
                reverse complement of

                AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
                is
                ACGAATGTGTTTTTCAGTGTAACTCACTCATCTAATATGTTCT

                matches perfectly to mm10:chr6:103,649,175-103,649,217

                via http://genome.ucsc.edu/cgi-bin/hgBlat

                BLAT Search Results

                ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
                ---------------------------------------------------------------------------------------------------
                browser details x10 43 1 43 43 100.0% 6 - 103649175 103649217 43


                Other reads match mm10 also.

                Verify that your reference sequence is good.
                Check distribution of chromosomes in your output sam; are any missing?
                Hi Richard,
                Thank you ! I see that the most of reverse complement of reads matches to mm10. It looks little bit strange at this point?

                Also I checked my SAM header, it looks like all the chromosomes are included
                @SQ SN:chr1 LN:197195432
                @SQ SN:chr2 LN:181748087
                @SQ SN:chr3 LN:159599783
                @SQ SN:chr4 LN:155630120
                @SQ SN:chr5 LN:152537259
                @SQ SN:chr6 LN:149517037
                @SQ SN:chr7 LN:152524553
                @SQ SN:chr8 LN:131738871
                @SQ SN:chr9 LN:124076172
                @SQ SN:chr10 LN:129993255
                @SQ SN:chr11 LN:121843856
                @SQ SN:chr12 LN:121257530
                @SQ SN:chr13 LN:120284312
                @SQ SN:chr14 LN:125194864
                @SQ SN:chr15 LN:103494974
                @SQ SN:chr16 LN:98319150
                @SQ SN:chr17 LN:95272651
                @SQ SN:chr18 LN:90772031
                @SQ SN:chr19 LN:61342430
                @SQ SN:chrX LN:166650296
                @SQ SN:chrY LN:15902555
                @SQ SN:chrM LN:16299

                To cross-check: instead of using mm10 index built by me, I tried mm9 bowtie index available on Bowtie page , ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/

                But it doesnt make any change in the alignment percentage

                mm9- bowtie log
                # reads processed: 16561198
                # reads with at least one reported alignment: 3864950 (23.34%)
                # reads that failed to align: 10834676 (65.42%)
                # reads with alignments suppressed due to -m: 1861572 (11.24%)
                Last edited by priya; 05-06-2015, 05:56 AM.

                Comment


                • #9
                  Originally posted by Chipper View Post
                  Reading the manual is also a good start:

                  -m <int>
                  Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).

                  Why would you use -m 1 with 43 bp reads?
                  In order to obtain uniquely mapped reads , discarding the multiple hits.
                  I am not sure whether I can introduce more mismatches (default chosen was :2 )as i have short read sequences of 40-43 bp
                  Last edited by priya; 04-27-2015, 01:42 AM.

                  Comment


                  • #10
                    Originally posted by priya View Post
                    In order to obtain uniquely mapped reads , discarding the multiple hits.
                    I am not sure whether I can introduce more mismatches (default chosen was :2 )as i have short read sequences of 40-43 bp
                    Just use BWA and filter on mapping quality (e.g -q 20) to get uniquely mapped reads if you don't understand the bowtie options.
                    -m 1 will suppress even perfect matches if there are any other reported alignments within the tolerated error rate.

                    Comment


                    • #11
                      Can you try mapping the unaligned reads on human? Once I got up to 92% human contamination in a ChIPseq library... After checking, the technician didn't use gloves when she took an aliquot to sent to sequencing...

                      Comment


                      • #12
                        Originally posted by SylvainL View Post
                        Can you try mapping the unaligned reads on human? Once I got up to 92% human contamination in a ChIPseq library... After checking, the technician didn't use gloves when she took an aliquot to sent to sequencing...
                        I cross-checked with human genome, but only 0.2 % of unaligned reads are aligned to human genome. I dont think its the contamination issue. I have done sanity check..i could not find huge contamination of other species in the sample..

                        Comment


                        • #13
                          Ok, can you re-run bowtie with --max overM_reads.fastq and --un Unaligned_reads.fastq just to be sure that the reads which do not map are really not mapping on Mouse? If I remeber well, if you only use --un, all the reads which are excluded from the mapping will be in this file (which may explain why you have reads perfectly mapping on mouse in this file, if they map more than once...)

                          Comment


                          • #14
                            I found answer to my question.. changing the bowtie or bwa parameters improved my aligning percentages to 2-5% but still left with lot of unaligned reads..
                            This paper explains it,


                            I tried mapping my unaligned reads with Shrimp2 aligner with default settings, high percentage of unaligned reads got mapped.

                            Comment


                            • #15
                              I'm not quite getting the paper ( http://www.nature.com/srep/2015/1503...srep08635.html )

                              When/where in the process are the reads getting modified so that they can't map?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM
                              • seqadmin
                                The Impact of AI in Genomic Medicine
                                by seqadmin



                                Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                02-26-2024, 02:07 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-14-2024, 06:13 AM
                              0 responses
                              32 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-08-2024, 08:03 AM
                              0 responses
                              71 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-07-2024, 08:13 AM
                              0 responses
                              80 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-06-2024, 09:51 AM
                              0 responses
                              68 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X