Hi All,
I have a transgenic mouse whose genome I've sequenced, and I'm interested in answering a few questions concerning insertions, deletions, rearrangements, and transgene copy numbers, all of which I verified (except the copy numbers, of course) via PCR before sequencing the genome.
Since I'm interested in structural information, I created a 550bp insert paired end read library and sequenced the transgenic mouse genome on the Illumina Hi-Seq.
I took those reads and mapped them back to a reference using bowtie2 (and BWA for practice!). In summary: I downloaded the chromosome files from UCSC for the latest mouse genome build, concatenated them, and then performed the alignment. Great results all around. (yay)
Next, since there are at least two copies of the transgene in the genome, I went back and inserted two copies of the transgene sequence into the appropriate chr#.fa file, concatenated it with the rest of the chr#.fa files, and performed the alignment using bowtie2 again. Everything seemed normal.
HOWEVER (and this is the part I need help with), when I went to visualize the alignment in a genome browser (after converting to .bam, creating indexes for everything, etc.), I ran into a problem: my chromosome files are formatted so that each line is exactly 50 characters long.
I understand this is normal, so I wrote a program to reformat all of the lines in the chr#.fa file. That program can be found here:
https://github.com/pkstarstorm05/Ref...Formatter50.py
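For reference, here is a minimal sketch of the kind of rewrapping described above. It is not the linked script. The function names (rewrap_fasta_lines, rewrap_fasta_file) are made up for illustration. It assumes the whole sequence of one record fits in memory, and that the goal is for every sequence line to be exactly `width` bases long except the last line of each record.

```python
def rewrap_fasta_lines(lines, width=50):
    """Yield FASTA lines with every sequence line rewrapped to `width` bases."""
    chunks = []  # sequence pieces of the record currently being read

    def wrapped():
        # Join the pieces of the current record and cut them into even lines.
        seq = "".join(chunks)
        chunks.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for raw in lines:
        line = raw.strip()  # drops "\n", "\r\n" and stray spaces
        if not line:
            continue  # skip blank lines left over from manual editing
        if line.startswith(">"):
            yield from wrapped()  # finish the previous record first
            yield line
        else:
            chunks.append(line)
    yield from wrapped()  # finish the last record


def rewrap_fasta_file(in_path, out_path, width=50):
    # newline="\n" stops Python on Windows from writing "\r\n" line endings.
    with open(in_path, "r", newline="") as src, \
         open(out_path, "w", newline="\n") as dst:
        for line in rewrap_fasta_lines(src, width):
            dst.write(line + "\n")
```

The `newline="\n"` argument on the output file matters on Windows: without it, text-mode writes turn every "\n" into "\r\n", which puts the carriage returns back even if the input was cleaned.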
Long story short: in spite of my attempts, I keep getting errors from the genome browsers saying that this specific chromosome in my genome does not have evenly sized lines. I've tried fixing it, but it still doesn't seem to work.
Is there anything special we have to do to properly prepare a chromosome file for use as a reference? I'm working on a Windows PC, and I know there are differences between carriage returns and line feeds, etc. Would that cause a problem here? I've tried getting rid of the carriage returns...
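One way to narrow this down is to scan the edited file in binary mode and report exactly where a carriage return or an uneven line appears. The sketch below is only an illustration with a made-up function name (find_fasta_line_problems). It assumes the rule that FASTA indexers such as samtools faidx generally enforce: within one record, every sequence line has the same length, and only the last line may be shorter.

```python
def find_fasta_line_problems(lines):
    """Return (line_number, message) tuples for CRs and uneven sequence lines.

    `lines` is an iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    problems = []
    expected = None     # length of the first sequence line in this record
    short_seen = False  # True if the previous sequence line was too short
    for number, raw in enumerate(lines, start=1):
        if b"\r" in raw:
            problems.append((number, "carriage return found"))
        line = raw.rstrip(b"\r\n")
        if line.startswith(b">"):
            # New record: a short line just before a header is allowed.
            expected, short_seen = None, False
            continue
        if short_seen:
            # The previous line was short but the record kept going.
            problems.append((number - 1, "short line in the middle of a record"))
        short_seen = False
        if expected is None:
            expected = len(line)
        elif len(line) > expected:
            problems.append((number, "line longer than the first line of the record"))
        elif len(line) < expected:
            short_seen = True
    return problems
```

Used as `find_fasta_line_problems(open("chr#.fa", "rb"))`, an empty list means neither problem was found. Opening the file in "rb" mode is deliberate, since text mode on Windows can silently hide "\r\n" line endings.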
Thanks in advance for any help.
Paul
(P.S. I'm new here, so I apologize if this has been posted already. I wasn't able to find it when I searched.)