Dear all,
I'm new to CLIP analysis, so I want to work through the CLIP data processing pipeline to learn how the data are processed, and perhaps improve part of the pipeline in the future.
I got the data from GEO:GSE41288. It's a HITS-CLIP dataset in which the authors aim to reveal miR-155-dependent AGO protein binding sites. But when I tried to align the reads to the mm9 genome, I found that only about 10% of the reads map back to the genome with either bowtie or tophat. The commands I used are as follows.
tophat -p 8 --read-mismatches 5 --read-edit-dist 5 -o /output/MapResult/${name} /data/mm9/mm9 /data/miR155/FASTQ/${i}
bowtie -n 3 -e 150 -l 20 -p 8 /data/mm9/mm9 /data/miR155/FASTQ/${i} --un /output/BowtieResult_new/${name}/${name}.not_hit.fastq > /output/BowtieResult_new/${name}/${name}.hit.sam
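For context, ${i} and ${name} in the commands above are loop variables: ${i} is a FASTQ file name and ${name} is the sample name used for the output directory. Below is a minimal sketch of how they could be set, not the exact script. The helper sample_name, the FASTQ_DIR override, and the extension handling are illustrative assumptions, and the loop only prints what would be aligned.

```shell
#!/bin/sh
# Hypothetical helper: derive a sample name from a FASTQ file name
# by dropping the directory and a trailing .gz / .fastq / .fq.
sample_name() {
    f=$(basename "$1")
    f="${f%.gz}"
    f="${f%.fastq}"
    f="${f%.fq}"
    printf '%s\n' "$f"
}

# Assumed input directory (taken from the commands above); override with FASTQ_DIR.
fastq_dir="${FASTQ_DIR:-/data/miR155/FASTQ}"

for path in "$fastq_dir"/*; do
    # Skip anything that is not a regular file (also covers an empty or missing directory).
    [ -f "$path" ] || continue
    i=$(basename "$path")
    name=$(sample_name "$i")
    # The tophat / bowtie commands would run here with ${i} and ${name}.
    echo "would align ${i} into /output/MapResult/${name}"
done
```

With a file named SRR001.fastq in the input directory, ${i} would be SRR001.fastq and ${name} would be SRR001.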
I think I have already set the mismatch thresholds quite high. Could someone give me some suggestions?
Thanks
Yue