Hi all,
I am fairly new to bioinformatics, so I hope you can help me!
I recently performed laser microdissection to cut out the long arm of chromosome Z from a cell line, then amplified the DNA and sequenced it on an Illumina HiSeq. The files contain 9 Gb; the reads are 100 bp paired-end at 30X coverage.
I need to identify the genes present on this chromosome arm, but this cell line was only recently sequenced, so all I have as a reference is the 24,000 predicted genes.
I used CLC Genomics Workbench to align the reads against the predicted genes. The result is that some reference genes have a higher number of mapped reads than other reference genes. The only thing I know about my sample is that it contains a lot of repeats, so I think that is the problem.
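To make the problem concrete, this is roughly the check I have in mind, written as a minimal Python sketch. It assumes I can export a per-gene table of mapped read counts and gene lengths from the mapping; the gene names, the counts, and the cutoff of 5 times the median are all made up for illustration and are not from my data.

```python
from statistics import median


def reads_per_kb(read_count, length_bp):
    """Length-normalised read density: mapped reads per kilobase of gene."""
    return read_count / (length_bp / 1000.0)


def flag_high_depth(genes, factor=5.0):
    """Return the names of genes whose read density exceeds factor * median.

    genes: dict mapping gene name -> (mapped_read_count, gene_length_bp).
    factor: hypothetical cutoff; genes far above the median density are
            candidates for containing repeats that attract extra reads.
    """
    density = {name: reads_per_kb(c, l) for name, (c, l) in genes.items()}
    med = median(density.values())
    return sorted(name for name, d in density.items() if d > factor * med)


# Made-up example values, only to show the shape of the input.
example = {
    "gene_A": (1200, 2000),
    "gene_B": (900, 1500),
    "gene_C": (50000, 1800),  # suspiciously many reads for its length
    "gene_D": (700, 1000),
}

print(flag_high_depth(example))
```

Normalising by gene length matters here because a long gene collects more reads than a short one at the same depth, so raw counts alone would not separate repeat-driven pileups from long genes.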
Any suggestions about masking the reference or the reads?
I hope somebody can help me!
Thanks a lot,
IRE