Hi,
I’m carrying out an analysis of paired-end human cancer data (50 bp reads, 200-500 bp gap) generated on a GA II sequencer, with the aim of finding fusion genes.
For this dataset I do the following:
· Align the reads as single reads using bowtie (-m1 --best --strata) to the hg19 reference, keeping only the best (unique) mapping for each read
· Filter out poly-T/A runs longer than 20 bases
· Match pairs of reads based on their ID
· Remove duplicates
· Keep only pairs whose two reads map to different chromosomes
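To make the last three steps concrete, here is a minimal Python sketch of how they could be implemented. It is not my actual pipeline code. It assumes each uniquely mapped read has already been parsed into a (read_id, chrom, pos, strand) tuple, that mates are named with a trailing /1 and /2, and that a "duplicate" means a pair whose two mates share the same chromosome, position and strand as a pair already seen. All function names are made up for illustration.

```python
from collections import defaultdict


def mate_key(read_id):
    """Split 'name/1' into ('name', '1'); assumes /1 and /2 mate suffixes."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2], read_id[-1]
    return read_id, None


def pair_reads(alignments):
    """Group uniquely mapped reads by fragment name and keep complete pairs."""
    by_fragment = defaultdict(dict)
    for read_id, chrom, pos, strand in alignments:
        name, mate = mate_key(read_id)
        by_fragment[name][mate] = (chrom, pos, strand)
    # A pair is kept only if both mates survived the alignment filter.
    return [(m["1"], m["2"]) for m in by_fragment.values()
            if "1" in m and "2" in m]


def remove_duplicates(pairs):
    """Drop pairs whose two mates have identical coordinates to an earlier pair."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


def interchromosomal(pairs):
    """Keep pairs whose mates map to different chromosomes."""
    return [(m1, m2) for m1, m2 in pairs if m1[0] != m2[0]]


if __name__ == "__main__":
    # Toy input, invented purely for illustration.
    alignments = [
        ("frag1/1", "chr1", 100, "+"), ("frag1/2", "chr9", 500, "-"),
        ("frag2/1", "chr1", 100, "+"), ("frag2/2", "chr9", 500, "-"),
        ("frag3/1", "chr2", 300, "+"), ("frag3/2", "chr2", 650, "-"),
        ("frag4/1", "chr5", 42, "+"),  # mate was not uniquely mapped
    ]
    candidates = interchromosomal(remove_duplicates(pair_reads(alignments)))
    print(candidates)
```

In this toy input, frag2 is a duplicate of frag1, frag3 maps within a single chromosome, and frag4 has lost its mate, so only one candidate pair remains.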
I get the attached contingency table, which reports the chromosome each read of a pair belongs to.
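For reference, a table of this kind can be built roughly as in the sketch below. It assumes each retained pair has been reduced to its two chromosome names and that the order of the mates does not matter, so (chr9, chr1) is counted together with (chr1, chr9). The function name and the toy data are illustrative only.

```python
from collections import Counter


def chromosome_contingency(chrom_pairs):
    """Count read pairs per unordered chromosome combination."""
    counts = Counter()
    for chrom_a, chrom_b in chrom_pairs:
        # Sort so that (chr9, chr1) and (chr1, chr9) land in the same cell.
        counts[tuple(sorted((chrom_a, chrom_b)))] += 1
    return counts


if __name__ == "__main__":
    # Toy data, invented for illustration.
    toy_pairs = [("chr1", "chr9"), ("chr9", "chr1"), ("chr2", "chr17")]
    table = chromosome_contingency(toy_pairs)
    for (chrom_a, chrom_b), n in sorted(table.items()):
        print(chrom_a, chrom_b, n)
```

Cells with counts far above the background of the rest of the table are the ones worth inspecting first.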
The table shows many more inter-chromosomal pairs than the number of chromosomal translocations one would expect, so further filtering is needed to get rid of artifacts. However, I don't understand what causes these artifacts.
Can you help me understand why there are so many of them?
Thanks in advance.
Regards,
Ramzi