Hi all,
We are in the process of optimizing our Me-Dip-seq protocol using TruSeq multiplexing on the Illumina HiSeq2000 and we have been puzzled by (or cursed with) a variety of problems.
I am on the bioinformatics side but to generalize we have tried two methods for library prep:
1 - shear the DNA (ourselves and outsourced), run a gel, cut out the area with fragments of appropriate size/elute, immuno-precipitation, TruSeq protocol, sequencing
2 - shear the DNA (ourselves and outsourced), TruSeq protocol, immuno-precipitation, sequencing
Method 1's problems led to Method 2, but Method 2 has rotten immunoprecipitation. We are planning on 100bp PE runs with probably 4 samples multiplexed in each lane, but in some of our trials we've sequenced 36bp SE on the GA IIx because we had a spare lane on a flow cell or something and we were just trying things out.
Method 1
After library prep there are always two distinct bands on the gels/Agilent etc. The top band is about 20% as dense as the bottom band. We've sequenced the bands separately and together; both contain genomic DNA, but the bottom band definitely seems to be the one we want. The top band sequences with low quality but it still seems to be primarily genomic DNA. Presently I am still trying to figure out why this is happening and what the differences between the two bands are. So Question 1 - has anyone else run across this multiple-band issue when you do the TruSeq library prep last?
On the second, better, lower band the FastQC report almost always comes out strange in some way or another, and there is mediocre (40-60%) alignment to the genome with the 100bp PE runs and better alignment (80-90%) with the 36bp SE runs. The biggest stand-out on the FastQC report is the %GC distribution across the reads for the 100bp PE runs. Instead of one peak like the theoretical distribution, we end up with two peaks: one mostly coinciding with the theoretical (top of the peak around 38-40 bases in) and a second, equally large peak around 54-70 bases. Question 2 - What does this even mean?
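In case it helps anyone reproduce the two-peak %GC picture outside of FastQC, here is a minimal Python sketch of how the per-read GC distribution could be tabulated. It assumes an uncompressed FASTQ with exactly four lines per record; the function names and the file name in the usage note are made up for illustration, and this is not FastQC's own code.

```python
from collections import Counter


def gc_percent(seq):
    """Return the GC percentage of one read, ignoring N calls.

    Returns None when the read has no A/C/G/T bases at all.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(fastq_lines):
    """Count reads per rounded GC percentage.

    Assumes a plain four-line-per-record FASTQ, so the sequence is
    always the second line of each record.
    """
    hist = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line of the record
            gc = gc_percent(line.strip())
            if gc is not None:
                hist[round(gc)] += 1
    return hist
```

Something like `with open("reads.fastq") as fh: hist = gc_histogram(fh)` (file name hypothetical) gives a table that can be plotted. Splitting the reads by which GC peak they fall in and aligning each group separately might show whether the second peak is the part that fails to align.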
Question 3 - with the same alignment stringency, would you expect to have such a difference in the number of matches in 100bp PE reads in comparison to 36bp SE? This is something I can pursue elsewhere; it's just a question I recently started pondering, so I thought I would throw it in.
With method 1 we do see GC enrichment (according to the program MEDIPS) that corresponds roughly to what the authors of the program saw with their Me-DIP-seq data.
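For readers who want a quick sanity check of enrichment independent of any package, below is a rough Python sketch of a generic observed/expected CpG ratio, compared between the reads and a reference sequence. This is only an illustration under my own assumptions: the function names are hypothetical, and I am not claiming this is the exact calculation MEDIPS performs.

```python
def cpg_obs_exp(seqs):
    """Observed/expected CpG ratio pooled over a collection of sequences.

    Computed as (CpG count * total bases) / (C count * G count).
    Returns 0.0 when there are no C or no G bases.
    """
    n_c = n_g = n_cpg = n_bases = 0
    for seq in seqs:
        seq = seq.upper()
        n_c += seq.count("C")
        n_g += seq.count("G")
        n_cpg += seq.count("CG")  # CpG dinucleotides within each sequence
        n_bases += len(seq)
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cpg * n_bases / (n_c * n_g)


def cpg_enrichment(read_seqs, reference_seqs):
    """Ratio of the reads' observed/expected CpG to the reference's.

    Values well above 1 would suggest the reads are CpG-enriched
    relative to the reference; values near 1 suggest no enrichment.
    """
    ref = cpg_obs_exp(reference_seqs)
    if ref == 0.0:
        return float("nan")
    return cpg_obs_exp(read_seqs) / ref
```

Running this on a sample of aligned reads against the matching reference chromosomes would give one number per library, which makes it easy to compare Method 1 and Method 2 side by side.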
Method 2
In this case (adapters before antibody) we get beautiful sequence data with beautiful alignments, quality, etc. Great coverage, no double bands to pick from, everything looks fantastic and the process is easier in the lab, BUT there is no GC enrichment. It's just a lot of pretty evenly distributed genomic DNA. Question 4 - has anyone else seen this? What might explain it? My working hypothesis is that the sequencing adapters have enough GC that the antibody is pulling everything down pretty equally. But others seem to have had success with this method.
The lack of enrichment has occurred every time with both the longer PE reads and the shorter SE reads. Explanations? Suggestions? HELP?!?!
Question 5 - BATMAN? MEDIPS? Are there any other Me-DIP-seq analysis methods someone might recommend for use once we get our library prep handled?
Thanks for ANY input or commentary you may be able to provide about any of these questions. I am still somewhat new to NGS data but learning.