SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
repeating chipseq or chipseq with another antibody hawainpanda Sample Prep / Library Generation 2 03-12-2015 03:56 PM
Bowtie2 transcriptome mapping issues Kates106 RNA Sequencing 14 12-02-2014 04:57 PM
ChIPseq - how to qc the results ? memento Illumina/Solexa 0 03-06-2014 09:46 AM
Indexing ChIPseq libraries using Illumina's TruSeq and ChIPseq kits Alex Clop Epigenetics 6 11-08-2012 11:07 AM
ChipSeq library construction from tissue and DNA purity issues Theorbe27 Sample Prep / Library Generation 2 06-10-2011 05:59 AM

Reply
 
Thread Tools
Old 04-24-2015, 05:51 AM   #1
priya
Member
 
Location: sweden

Join Date: Apr 2013
Posts: 57
Default Issues with ChipSeq mapping

Hi,
I have ChipSeq data, and I mapped the reads with bowtie aligner using mm10 as reference genome.
Out of 16M reads processed, only 3M reads are uniquely mapped and 11M reads failed to align and 2M reads are not uniquely mapped.
It looks soo strange that most of the reads are failed to align, I also tried to collect the unaligned sequences and run blast for several species on random 5000 sequences, the highest percentage of unaligned sequences were coming from Mouse.

I couldnt find explaination, if there is not much species contamination in the chipseq samples, then why the reads are not aligning to the mouse genome.

Also, choice of aligner instead of bowtie, if I use BWA will it make any difference..
What could be the considerations for this problem??


Here is the bowtie log

# reads processed: 16561198
# reads with at least one reported alignment: 3860809 (23.31%)
# reads that failed to align: 10781729 (65.10%)
# reads with alignments suppressed due to -m: 1918660 (11.59%)
Reported 3860809 alignments to 1 output stream(s)
priya is offline   Reply With Quote
Old 04-24-2015, 06:22 AM   #2
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

My guess is you didn't trim your reads and are reading into the adaptor. Try bwa mem if you have 100 or 125 bp reads.
Chipper is offline   Reply With Quote
Old 04-24-2015, 06:31 AM   #3
priya
Member
 
Location: sweden

Join Date: Apr 2013
Posts: 57
Default

Quote:
Originally Posted by Chipper View Post
My guess is you didn't trim your reads and are reading into the adaptor. Try bwa mem if you have 100 or 125 bp reads.
I checked through FASTQC and there is nothing poping in "overrepresented sequences"
Is there any other ways to check if there is any problems with adapter contamination..

The read length is 43 bp..
priya is offline   Reply With Quote
Old 04-24-2015, 06:41 AM   #4
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

Adapter dimers should be found by FASTQC so the only things I can think of then is if you have N:s in the reads or very low quality ends.
Chipper is offline   Reply With Quote
Old 04-24-2015, 07:21 AM   #5
priya
Member
 
Location: sweden

Join Date: Apr 2013
Posts: 57
Default

Quote:
Originally Posted by Chipper View Post
Adapter dimers should be found by FASTQC so the only things I can think of then is if you have N:s in the reads or very low quality ends.
Thank you for suggestions. I tried to look into those unaligned reads. For that, while running bowtie I add an extra parameter --un to save the unaligned reads.

bowtie -t -p 1 -m 1 $index/mm10 -q -S sample.fastq --un unaligned.fq > sample.sam

If I go through those unaligned reads fastq file, it doesnt contain N:s..

For Example: If i just paste here only the second line (read sequence) from unaligned fastq file omitting the other 3 lines, they looks like below without N:s

ATAAAACTGTATTTTTTTGTGAAGAATCAACAACAAGTGGGAC
CCGGGCTTAGACAGCTCACATGAAAGGAAGGCCGTGCCACCTT
CACTCGTTGTGAATCTATCCACCAAGTCAGATTTGAAAAATGC
CAGTACAGACATAACTTATTAAAGCCTCTAGCAGGACAGCAAA
CACTAAACAGGAACTGCAAACACAAATATGTTTGGCACACAAA
TGTTTTAATTTTTTTTTTACAAATGTATCCATTATTATCGTTG
GGGTAAGTTTGGCGCCGTGAGTGAAGGGGGCTTTGTTGCGGAA
AATCTGTCTGTCCGTCTGTTCGTCTATCTGTCTGTCCGTCTGT
GTGTGTGTGATGGGTCAGGTGTGTGTGTGTGTGTGTGTGATGC
AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
CCCCATTAGTTCCTGTCAAGGCAGAAGCTACTCTTCCTGGGGT
ATTATGTCTTTGAGCAAGTTTATGTTTGAGTTAGTGAATTCAT
TTTCTAAATTTTCCACCTTTTTCAGTTTTCCTCGCCATATTTC
CATAATGTGATTTTGCCGTTGTTCTGTCTCTTTATTACATATC
GACCAATGTTTAGTTTCTTAGAAATCAGATGCATGAAATAACC
GCCGAACGACTCCTCTACCTCCTGCACCACTAACGCCCCCAAA
priya is offline   Reply With Quote
Old 04-24-2015, 08:36 AM   #6
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 701
Default

Sanity check:
reverse complement of

AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
is
ACGAATGTGTTTTTCAGTGTAACTCACTCATCTAATATGTTCT

matches perfectly to mm10:chr6:103,649,175-103,649,217

via http://genome.ucsc.edu/cgi-bin/hgBlat

BLAT Search Results

ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
---------------------------------------------------------------------------------------------------
browser details x10 43 1 43 43 100.0% 6 - 103649175 103649217 43


Other reads match mm10 also.

Verify that your reference sequence is good.
Check distribution of chromosomes in your output sam; are any missing?

Last edited by Richard Finney; 04-24-2015 at 08:41 AM.
Richard Finney is offline   Reply With Quote
Old 04-24-2015, 11:20 AM   #7
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

Reading the manual is also a good start:

-m <int>
Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).

Why would you use -m 1 with 43 bp reads?
Chipper is offline   Reply With Quote
Old 04-27-2015, 01:25 AM   #8
priya
Member
 
Location: sweden

Join Date: Apr 2013
Posts: 57
Default

Quote:
Originally Posted by Richard Finney View Post
Sanity check:
reverse complement of

AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT
is
ACGAATGTGTTTTTCAGTGTAACTCACTCATCTAATATGTTCT

matches perfectly to mm10:chr6:103,649,175-103,649,217

via http://genome.ucsc.edu/cgi-bin/hgBlat

BLAT Search Results

ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN
---------------------------------------------------------------------------------------------------
browser details x10 43 1 43 43 100.0% 6 - 103649175 103649217 43


Other reads match mm10 also.

Verify that your reference sequence is good.
Check distribution of chromosomes in your output sam; are any missing?
Hi Richard,
Thank you ! I see that the most of reverse complement of reads matches to mm10. It looks little bit strange at this point?

Also I checked my SAM header, it looks like all the chromosomes are included
@SQ SN:chr1 LN:197195432
@SQ SN:chr2 LN:181748087
@SQ SN:chr3 LN:159599783
@SQ SN:chr4 LN:155630120
@SQ SN:chr5 LN:152537259
@SQ SN:chr6 LN:149517037
@SQ SN:chr7 LN:152524553
@SQ SN:chr8 LN:131738871
@SQ SN:chr9 LN:124076172
@SQ SN:chr10 LN:129993255
@SQ SN:chr11 LN:121843856
@SQ SN:chr12 LN:121257530
@SQ SN:chr13 LN:120284312
@SQ SN:chr14 LN:125194864
@SQ SN:chr15 LN:103494974
@SQ SN:chr16 LN:98319150
@SQ SN:chr17 LN:95272651
@SQ SN:chr18 LN:90772031
@SQ SN:chr19 LN:61342430
@SQ SN:chrX LN:166650296
@SQ SN:chrY LN:15902555
@SQ SN:chrM LN:16299

To cross-check: instead of using mm10 index built by me, I tried mm9 bowtie index available on Bowtie page , ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/

But it doesnt make any change in the alignment percentage

mm9- bowtie log
# reads processed: 16561198
# reads with at least one reported alignment: 3864950 (23.34%)
# reads that failed to align: 10834676 (65.42%)
# reads with alignments suppressed due to -m: 1861572 (11.24%)

Last edited by priya; 05-06-2015 at 05:56 AM.
priya is offline   Reply With Quote
Old 04-27-2015, 01:30 AM   #9
priya
Member
 
Location: sweden

Join Date: Apr 2013
Posts: 57
Default

Quote:
Originally Posted by Chipper View Post
Reading the manual is also a good start:

-m <int>
Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).

Why would you use -m 1 with 43 bp reads?
In order to obtain uniquely mapped reads , discarding the multiple hits.
I am not sure whether I can introduce more mismatches (default chosen was :2 )as i have short read sequences of 40-43 bp

Last edited by priya; 04-27-2015 at 01:42 AM.
priya is offline   Reply With Quote
Old 04-27-2015, 02:04 AM   #10
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

Quote:
Originally Posted by priya View Post
In order to obtain uniquely mapped reads , discarding the multiple hits.
I am not sure whether I can introduce more mismatches (default chosen was :2 )as i have short read sequences of 40-43 bp
Just use BWA and filter on mapping quality (e.g -q 20) to get uniquely mapped reads if you don't understand the bowtie options.
-m 1 will suppress even perfect matches if there are any other reported alignments within the tolerated error rate.
Chipper is offline   Reply With Quote
Old 04-27-2015, 02:45 AM   #11
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 175
Default

Can you try mapping the unaligned reads on human? Once I got up to 92% human contamination in a ChIPseq library... After checking, the technician didn't use gloves when she took an aliquot to sent to sequencing...
SylvainL is offline   Reply With Quote
Old 04-27-2015, 11:47 PM   #12
priya
Member
 
Location: sweden

Join Date: Apr 2013
Posts: 57
Default

Quote:
Originally Posted by SylvainL View Post
Can you try mapping the unaligned reads on human? Once I got up to 92% human contamination in a ChIPseq library... After checking, the technician didn't use gloves when she took an aliquot to sent to sequencing...
I cross-checked with human genome, but only 0.2 % of unaligned reads are aligned to human genome. I dont think its the contamination issue. I have done sanity check..i could not find huge contamination of other species in the sample..
priya is offline   Reply With Quote
Old 04-28-2015, 02:08 AM   #13
SylvainL
Senior Member
 
Location: Geneva

Join Date: Feb 2012
Posts: 175
Default

Ok, can you re-run bowtie with --max overM_reads.fastq and --un Unaligned_reads.fastq just to be sure that the reads which do not map are really not mapping on Mouse? If I remeber well, if you only use --un, all the reads which are excluded from the mapping will be in this file (which may explain why you have reads perfectly mapping on mouse in this file, if they map more than once...)
SylvainL is offline   Reply With Quote
Old 05-06-2015, 05:53 AM   #14
priya
Member
 
Location: sweden

Join Date: Apr 2013
Posts: 57
Default

I found answer to my question.. changing the bowtie or bwa parameters improved my aligning percentages to 2-5% but still left with lot of unaligned reads..
This paper explains it,
http://www.nature.com/srep/2015/1503...srep08635.html

I tried mapping my unaligned reads with Shrimp2 aligner with default settings, high percentage of unaligned reads got mapped.
priya is offline   Reply With Quote
Old 05-06-2015, 11:13 AM   #15
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 701
Default

I'm not quite getting the paper ( http://www.nature.com/srep/2015/1503...srep08635.html )

When/where in the process are the reads getting modified so that they can't map?
Richard Finney is offline   Reply With Quote
Old 05-07-2015, 01:41 PM   #16
Chipper
Senior Member
 
Location: Sweden

Join Date: Mar 2008
Posts: 324
Default

Quote:
Originally Posted by Richard Finney View Post
I'm not quite getting the paper ( http://www.nature.com/srep/2015/1503...srep08635.html )

When/where in the process are the reads getting modified so that they can't map?
They modified the settings to make sure lots of valid alignments are not reported, just like (bowtie -m 1 -v 3). Then they use a better aligner and find more usable reads. Some interesting things about contaminants and quite ambitious to reanalyze all datasets but why not use a better aligner in the first place.
Chipper is offline   Reply With Quote
Old 10-14-2015, 10:42 AM   #17
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Well, it's very arguable that Shrimp2 is "better" than BWA/Bowtie2 (whatever that means).

Thank you for the paper though, definitely interesting take on the problem.
apredeus is offline   Reply With Quote
Reply

Tags
bowtie, chipseq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:49 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO