#1 |
Junior Member
Location: Madrid Join Date: Feb 2014
Posts: 2
|
Hi all,
I'm having some trouble analysing my ChIP-seq data. From a ChIP-seq experiment on mouse pancreas, a reasonable number of reads map to the mouse genome (good!), some map to the human genome (contamination), and around 35% don't map anywhere.

To start with, the FastQC analysis doesn't reveal any overrepresented sequences (I thought adaptors might be contaminating the library, but that doesn't seem to be the case). I've checked whether the orphan reads match various microorganism genomes: no hits. I've also checked a database of adaptor sequences: no hits.

When I BLAST a read against mouse/human, I get a perfect match for half of the sequence but no match for the rest of the read. If I BLAST the non-matching part against everything, I get a list of matches to different microorganisms. They are always the same ones, though, so my guess is that these are conserved regions, not specific to a single microorganism.

I would appreciate any advice on how to find out what these reads are.

Many thanks,
Francesc

PS. I'm attaching some of the orphan reads in question, in case you want to check anything:

CCACTGAAGGTGAATTTGTCTTTTACGAAGGTCCACCAAC
CGACCACGGGAGCATCGTTCGCGTCCAGCGCGAAACGGCG
CCAATTCCTTCCGCGCCTTGGCTGCGCTAATATCTCCCGT
CAATAATTCTTGGCAATGGTTCAATCGTACTGGTCGAGCT
TGATAAGAAATAATTGTAAGTAGCTAACAATATTCCAAGT
GCATTCTCTCGCCGCGACTGTCCTCGATAGACACCAACTC
GATGCTGGTCCACTCGCCGACGAGGATCTGATCGTGAGCG
GTGTTATTTATTTACTCACATCGATAACAGTGATAAACTC
CTCATCGACGGCGTGCGCGCGCTGCGGGCCCGGCAGATGG
GGTACTCTCTCAGCAAGGAGAGATGAAGGAGGAAGAAGTT
CCATCTTCATTTTCGATGAATGAGTATGCTTGGATTTCAA
CTTTGCAAGGCGTCTGCCAATTGTTGGTTCGCCTCTTCGA
CCAGGATTGAAAAGTTTGTCAAAAAGGCGGTTATTCAGGA
ATTATTTAGTGGTTTTAACTAACGATTTCGTCTAGAAATG
ATCTATATCGTCTTCACGCAGAAGGTGACCGATTGGCGCA
CGCCGCTTCTATCGAAAGGAGCTCTAAGATGGTCAAATTG
AGAAAAATGAAATGCGTTGCGTGGCTAAAAGCATATAACG
|
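The "half the read matches, the other half doesn't" observation can be checked in bulk rather than one read at a time. The following is a minimal sketch, not anything used in this thread: it cuts each read at its midpoint and writes both halves as separate FASTA records, which could then be aligned or BLASTed independently. The function names (`split_read`, `halves_to_fasta`) and the `min_part` cutoff are made up for illustration, and a midpoint cut is only an approximation, since a real chimeric junction can sit anywhere in the read.

```python
def split_read(seq, min_part=15):
    """Cut a read at its midpoint; return None if a half would be shorter than min_part."""
    seq = seq.strip().upper()
    mid = len(seq) // 2
    if mid < min_part:
        return None
    return seq[:mid], seq[mid:]


def halves_to_fasta(reads):
    """Build FASTA text with two records per read (left and right half)."""
    lines = []
    for i, read in enumerate(reads, start=1):
        parts = split_read(read)
        if parts is None:
            continue  # read too short to split usefully
        left, right = parts
        lines += [f">read{i}_left", left, f">read{i}_right", right]
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Two of the orphan reads posted above.
    orphan_reads = [
        "CCACTGAAGGTGAATTTGTCTTTTACGAAGGTCCACCAAC",
        "CGACCACGGGAGCATCGTTCGCGTCCAGCGCGAAACGGCG",
    ]
    print(halves_to_fasta(orphan_reads), end="")
```

If many reads have one half that maps to mouse and another half that maps elsewhere (or nowhere), that would be consistent with chimeric fragments rather than a single contaminating organism.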
#2 |
Senior Member
Location: Marburg, Germany Join Date: Oct 2009
Posts: 110
|
You could try assembling the unmapped reads and BLASTing the resulting contigs.
Possibly it's fish DNA from the bead blocking. |
#3 |
Junior Member
Location: Chicago Join Date: Jan 2014
Posts: 1
|
I am having a similar issue for ChIP-seq mouse data (HiSeq, SE, 50 bp). In particular, alignment statistics appear to be very antibody specific: for one protein I get ~20%, for a second ~40%, for a third ~65%, and for non-IP input ~90% (these approx %'s are borne out in two replicates for each sample).
Contamination does not seem to be an issue: FastQC did not show any adaptors left on the reads, and not much in the way of over-represented sequences. I used fastq_screen to check against human, rat, mouse, fly, yeast, C. elegans, E. coli, staph, and phiX, and the best matches were still to mouse, by far. BLAST showed a similar mix of things, as fmadriles noted.

At any rate, if it were contamination I would expect to see similar issues in all the samples, rather than a rate that depends so strongly on the antibody/protein of interest.

Short of attempting an assembly of the unmapped reads, which I may try, does anyone have any other suggestions about what the issue could be, or other things to try? Has anyone else seen this kind of thing in ChIP-seq data before? |
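Before attempting an assembly, the unmapped reads have to be pulled out of the alignment. The sketch below is one way to do that from an uncompressed SAM file in plain Python, assuming single-end data as described above. It relies only on the SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function names `split_sam` and `to_fastq` are illustrative and do not come from any tool mentioned in the thread.

```python
import io

UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def split_sam(handle):
    """Return (mapped_count, unmapped_records), counting each read once.

    unmapped_records is a list of (name, sequence, quality) tuples.
    """
    mapped = 0
    unmapped = []
    for line in handle:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of an already counted read
        if flag & UNMAPPED:
            unmapped.append((fields[0], fields[9], fields[10]))
        else:
            mapped += 1
    return mapped, unmapped


def to_fastq(records):
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


if __name__ == "__main__":
    # Tiny made-up SAM snippet: one mapped read, one unmapped read.
    demo = (
        "@HD\tVN:1.6\n"
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n"
    )
    n_mapped, orphans = split_sam(io.StringIO(demo))
    total = n_mapped + len(orphans)
    print(f"mapped: {n_mapped}/{total}")
    print(to_fastq(orphans), end="")
```

Running this per sample would also give the mapped fraction directly, which makes the antibody-to-antibody comparison easy to tabulate.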
#4 |
Member
Location: Columbia, Missouri Join Date: Apr 2008
Posts: 57
|
Possibly chimeric sequences from amplification.
|
#5 |
Junior Member
Location: Madrid Join Date: Feb 2014
Posts: 2
|
For mmaiensc especially.

In the end, an expert helped me, ran some analyses, and concluded the following:

I did a species screen with the original data and found (as you did) that most of the sequence comes from human and mouse (more mouse than human). Most of the reads map uniquely and there is a bit of overlap between mouse and rat (as you’d expect). There are however around 35% of reads which don’t map to any of the genomes or contaminants we screen for.

I’ve extracted these to a new dataset and did an assembly with Velvet. It’s not a great assembly since the reads are short, but it gave some extra information. I’ve included the set of contigs of at least 100 bp and have sorted these both by coverage and length.

All of the high-coverage contigs appear to be human alpha satellite DNA or general AT-rich repeats. For the long contigs, a bunch of these turned out to be rRNA from both mouse and human, so a chunk of your extra sequence comes from these. In addition there is also a large set of sequences which come from a bacterial source. However, you don’t appear to have the whole genome present; more specifically, you have a region of the genome around an integrase gene. This strongly suggests that either you have a high copy number transgene in your mouse, or this sequence has contaminated one of the reagents in your library prep process.

I think this is as far as I can justify taking this analysis. I’ve included the sequences and contigs I generated if you really want to pursue this, but I suspect the satellite sequence, the rRNA and the bacterial DNA should account for a significant chunk of the previously unknown sequence, and there really isn’t anything else I consistently found in there. If anything, the contamination with really high levels of human sequence should probably be more of a concern in your case, since this is certainly something we shouldn’t expect to see.

I hope this is as useful for other people as it has been for me!! |
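The contig triage described above (keep contigs of at least 100 bp, then rank by coverage and by length) can be sketched in a few lines. This is an illustration under stated assumptions, not the expert's actual procedure: it assumes contig headers carry a `_cov_<number>` field, as in Velvet's `NODE_<id>_length_<n>_cov_<x>` naming, and it measures length from the sequence itself rather than from the header. The names `read_fasta`, `summarize_contigs` and `min_len` are hypothetical.

```python
import re

# Assumption: coverage is embedded in the header as "_cov_<number>".
COV_RE = re.compile(r"_cov_([0-9]+(?:\.[0-9]+)?)")


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_contigs(lines, min_len=100):
    """Keep contigs of at least min_len bases; return them sorted two ways.

    Returns (by_coverage, by_length), both in descending order.
    Contigs without a parsable coverage field get coverage 0.0.
    """
    contigs = []
    for name, seq in read_fasta(lines):
        if len(seq) < min_len:
            continue
        match = COV_RE.search(name)
        coverage = float(match.group(1)) if match else 0.0
        contigs.append({"name": name, "length": len(seq), "coverage": coverage})
    by_coverage = sorted(contigs, key=lambda c: c["coverage"], reverse=True)
    by_length = sorted(contigs, key=lambda c: c["length"], reverse=True)
    return by_coverage, by_length


if __name__ == "__main__":
    # Made-up contigs purely for demonstration.
    demo = [
        ">NODE_1_length_120_cov_5.0", "A" * 120,
        ">NODE_2_length_150_cov_2.5", "C" * 150,
        ">NODE_3_length_50_cov_99.0", "G" * 50,
    ]
    by_cov, by_len = summarize_contigs(demo)
    print([c["name"] for c in by_cov])
    print([c["name"] for c in by_len])
```

The top entries of each list are the natural candidates to BLAST first: high coverage points to repeats such as satellite DNA, while the longest contigs are more likely to give an unambiguous hit.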