![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
repeating chipseq or chipseq with another antibody | hawainpanda | Sample Prep / Library Generation | 2 | 03-12-2015 04:56 PM |
Bowtie2 transcriptome mapping issues | Kates106 | RNA Sequencing | 14 | 12-02-2014 05:57 PM |
ChIPseq - how to qc the results ? | memento | Illumina/Solexa | 0 | 03-06-2014 10:46 AM |
Indexing ChIPseq libraries using Illumina's TruSeq and ChIPseq kits | Alex Clop | Epigenetics | 6 | 11-08-2012 12:07 PM |
ChipSeq library construction from tissue and DNA purity issues | Theorbe27 | Sample Prep / Library Generation | 2 | 06-10-2011 06:59 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: sweden Join Date: Apr 2013
Posts: 57
|
![]()
Hi,
I have ChipSeq data, and I mapped the reads with bowtie aligner using mm10 as reference genome. Out of 16M reads processed, only 3M reads are uniquely mapped and 11M reads failed to align and 2M reads are not uniquely mapped. It looks soo strange that most of the reads are failed to align, I also tried to collect the unaligned sequences and run blast for several species on random 5000 sequences, the highest percentage of unaligned sequences were coming from Mouse. I couldnt find explaination, if there is not much species contamination in the chipseq samples, then why the reads are not aligning to the mouse genome. Also, choice of aligner instead of bowtie, if I use BWA will it make any difference.. What could be the considerations for this problem?? ![]() ![]() Here is the bowtie log # reads processed: 16561198 # reads with at least one reported alignment: 3860809 (23.31%) # reads that failed to align: 10781729 (65.10%) # reads with alignments suppressed due to -m: 1918660 (11.59%) Reported 3860809 alignments to 1 output stream(s) |
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
|
![]()
My guess is you didn't trim your reads and are reading into the adaptor. Try bwa mem if you have 100 or 125 bp reads.
|
![]() |
![]() |
![]() |
#3 | |
Member
Location: sweden Join Date: Apr 2013
Posts: 57
|
![]() Quote:
Is there any other ways to check if there is any problems with adapter contamination.. The read length is 43 bp.. |
|
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
|
![]()
Adapter dimers should be found by FASTQC so the only things I can think of then is if you have N:s in the reads or very low quality ends.
|
![]() |
![]() |
![]() |
#5 | |
Member
Location: sweden Join Date: Apr 2013
Posts: 57
|
![]() Quote:
bowtie -t -p 1 -m 1 $index/mm10 -q -S sample.fastq --un unaligned.fq > sample.sam If I go through those unaligned reads fastq file, it doesnt contain N:s.. For Example: If i just paste here only the second line (read sequence) from unaligned fastq file omitting the other 3 lines, they looks like below without N:s ATAAAACTGTATTTTTTTGTGAAGAATCAACAACAAGTGGGAC CCGGGCTTAGACAGCTCACATGAAAGGAAGGCCGTGCCACCTT CACTCGTTGTGAATCTATCCACCAAGTCAGATTTGAAAAATGC CAGTACAGACATAACTTATTAAAGCCTCTAGCAGGACAGCAAA CACTAAACAGGAACTGCAAACACAAATATGTTTGGCACACAAA TGTTTTAATTTTTTTTTTACAAATGTATCCATTATTATCGTTG GGGTAAGTTTGGCGCCGTGAGTGAAGGGGGCTTTGTTGCGGAA AATCTGTCTGTCCGTCTGTTCGTCTATCTGTCTGTCCGTCTGT GTGTGTGTGATGGGTCAGGTGTGTGTGTGTGTGTGTGTGATGC AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT CCCCATTAGTTCCTGTCAAGGCAGAAGCTACTCTTCCTGGGGT ATTATGTCTTTGAGCAAGTTTATGTTTGAGTTAGTGAATTCAT TTTCTAAATTTTCCACCTTTTTCAGTTTTCCTCGCCATATTTC CATAATGTGATTTTGCCGTTGTTCTGTCTCTTTATTACATATC GACCAATGTTTAGTTTCTTAGAAATCAGATGCATGAAATAACC GCCGAACGACTCCTCTACCTCCTGCACCACTAACGCCCCCAAA |
|
![]() |
![]() |
![]() |
#6 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
Sanity check:
reverse complement of AGAACATATTAGATGAGTGAGTTACACTGAAAAACACATTCGT is ACGAATGTGTTTTTCAGTGTAACTCACTCATCTAATATGTTCT matches perfectly to mm10:chr6:103,649,175-103,649,217 via http://genome.ucsc.edu/cgi-bin/hgBlat BLAT Search Results ACTIONS QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN --------------------------------------------------------------------------------------------------- browser details x10 43 1 43 43 100.0% 6 - 103649175 103649217 43 Other reads match mm10 also. Verify that your reference sequence is good. Check distribution of chromosomes in your output sam; are any missing? Last edited by Richard Finney; 04-24-2015 at 09:41 AM. |
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
|
![]()
Reading the manual is also a good start:
-m <int> Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details). Why would you use -m 1 with 43 bp reads? |
![]() |
![]() |
![]() |
#8 | |
Member
Location: sweden Join Date: Apr 2013
Posts: 57
|
![]() Quote:
Thank you ! I see that the most of reverse complement of reads matches to mm10. It looks little bit strange at this point? Also I checked my SAM header, it looks like all the chromosomes are included @SQ SN:chr1 LN:197195432 @SQ SN:chr2 LN:181748087 @SQ SN:chr3 LN:159599783 @SQ SN:chr4 LN:155630120 @SQ SN:chr5 LN:152537259 @SQ SN:chr6 LN:149517037 @SQ SN:chr7 LN:152524553 @SQ SN:chr8 LN:131738871 @SQ SN:chr9 LN:124076172 @SQ SN:chr10 LN:129993255 @SQ SN:chr11 LN:121843856 @SQ SN:chr12 LN:121257530 @SQ SN:chr13 LN:120284312 @SQ SN:chr14 LN:125194864 @SQ SN:chr15 LN:103494974 @SQ SN:chr16 LN:98319150 @SQ SN:chr17 LN:95272651 @SQ SN:chr18 LN:90772031 @SQ SN:chr19 LN:61342430 @SQ SN:chrX LN:166650296 @SQ SN:chrY LN:15902555 @SQ SN:chrM LN:16299 To cross-check: instead of using mm10 index built by me, I tried mm9 bowtie index available on Bowtie page , ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/ But it doesnt make any change in the alignment percentage mm9- bowtie log # reads processed: 16561198 # reads with at least one reported alignment: 3864950 (23.34%) # reads that failed to align: 10834676 (65.42%) # reads with alignments suppressed due to -m: 1861572 (11.24%) Last edited by priya; 05-06-2015 at 06:56 AM. |
|
![]() |
![]() |
![]() |
#9 | |
Member
Location: sweden Join Date: Apr 2013
Posts: 57
|
![]() Quote:
I am not sure whether I can introduce more mismatches (default chosen was :2 )as i have short read sequences of 40-43 bp Last edited by priya; 04-27-2015 at 02:42 AM. |
|
![]() |
![]() |
![]() |
#10 | |
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
|
![]() Quote:
-m 1 will suppress even perfect matches if there are any other reported alignments within the tolerated error rate. |
|
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: Geneva Join Date: Feb 2012
Posts: 179
|
![]()
Can you try mapping the unaligned reads on human? Once I got up to 92% human contamination in a ChIPseq library... After checking, the technician didn't use gloves when she took an aliquot to sent to sequencing...
|
![]() |
![]() |
![]() |
#12 |
Member
Location: sweden Join Date: Apr 2013
Posts: 57
|
![]()
I cross-checked with human genome, but only 0.2 % of unaligned reads are aligned to human genome. I dont think its the contamination issue. I have done sanity check..i could not find huge contamination of other species in the sample..
|
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: Geneva Join Date: Feb 2012
Posts: 179
|
![]()
Ok, can you re-run bowtie with --max overM_reads.fastq and --un Unaligned_reads.fastq just to be sure that the reads which do not map are really not mapping on Mouse? If I remeber well, if you only use --un, all the reads which are excluded from the mapping will be in this file (which may explain why you have reads perfectly mapping on mouse in this file, if they map more than once...)
|
![]() |
![]() |
![]() |
#14 |
Member
Location: sweden Join Date: Apr 2013
Posts: 57
|
![]()
I found answer to my question.. changing the bowtie or bwa parameters improved my aligning percentages to 2-5% but still left with lot of unaligned reads..
This paper explains it, http://www.nature.com/srep/2015/1503...srep08635.html I tried mapping my unaligned reads with Shrimp2 aligner with default settings, high percentage of unaligned reads got mapped. |
![]() |
![]() |
![]() |
#15 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
|
![]()
I'm not quite getting the paper ( http://www.nature.com/srep/2015/1503...srep08635.html )
When/where in the process are the reads getting modified so that they can't map? |
![]() |
![]() |
![]() |
#16 | |
Senior Member
Location: Sweden Join Date: Mar 2008
Posts: 324
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#17 |
Senior Member
Location: Bioinformatics Institute, SPb Join Date: Jul 2012
Posts: 151
|
![]()
Well, it's very arguable that Shrimp2 is "better" than BWA/Bowtie2 (whatever that means).
Thank you for the paper though, definitely interesting take on the problem. |
![]() |
![]() |
![]() |
Tags |
bowtie, chipseq |
Thread Tools | |
|
|