
**shawpa** · 01-19-2012, 04:14 AM

I don't know the answer to your question, but I am interested in how you renamed your contigs. Right now my reference uses chr1, chr2, etc., and the dbSNP file uses 1, 2, etc. I am also a novice, so help would be appreciated.

**HGENETIC** · 01-19-2012, 04:17 AM

I just used the UNIX command:

`sed "s/chr//g" file_to.change > new.file`

Try that; it worked for me.
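Note that `s/chr//g` deletes every occurrence of "chr" on every line, so in a VCF it would also alter header text, for example a Description containing the word "chromosome". A minimal sketch of a narrower version, assuming POSIX `sed`; the function names `strip_chr_fasta` and `strip_chr_vcf` are hypothetical. It only touches FASTA header lines, the CHROM column of VCF data lines, and `##contig` IDs. Be aware that "chrM" becomes "M", not the "MT" used in b37-style files.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that strip a leading "chr" from contig
# names instead of deleting "chr" everywhere. Both read stdin, write stdout.

# FASTA: rewrite header lines only, e.g. ">chr1" -> ">1".
strip_chr_fasta() {
  sed 's/^>chr/>/'
}

# VCF: rewrite the CHROM column of data lines (the first field) and the ID
# of ##contig header lines. Other header text such as "chromosome" inside
# an INFO Description is left alone, because those lines start with "#".
strip_chr_vcf() {
  sed -e 's/^chr//' -e 's/^##contig=<ID=chr/##contig=<ID=/'
}
```

For example, `strip_chr_fasta < ref.fa > ref.nochr.fa`. After renaming a reference, the `.fai` index and `.dict` sequence dictionary that GATK reads have to be regenerated too (for example with `samtools faidx` and Picard `CreateSequenceDictionary`).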

**Heisman** · 01-19-2012, 05:45 AM

Why don't you just change the name back to having the "chr"?

**HGENETIC** · 01-19-2012, 06:02 AM

I'm relatively new to UNIX, so for me the easiest way to get around the problem was to make every chromosome name an integer. How would you convert the integer chromosome names in a .vcf or .fasta file back into the chr1 format?

**shawpa** · 01-19-2012, 06:03 AM

Well, I can't get it to work either way. I'm working on the CountCovariates step. I did what HGENETIC suggested (thanks, by the way), and that fixed the mismatch between my known-sites file and the reference. Now it gives an error because my BAM input still has chr1, chr2, etc. I tried the "fix" from above and it didn't seem to work on the BAM file.

**Heisman** · 01-19-2012, 06:07 AM

Wait, I'm being dumb. Why did you remove the "chr" tags in the first place?

Anyway, if you want to go back: your FASTA headers look like ">1" and ">2", and no other line contains ">", correct? Then you can run `sed 's/>/>chr/' input_file > output_file`.
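A sketch of the reverse direction under the same assumptions (POSIX `sed`; `add_chr_fasta` and `add_chr_vcf` are hypothetical names). Anchoring on `^>` keeps the FASTA rename to header lines. For a VCF, `s/>/>chr/` would not be safe, because the `##INFO` and `##FORMAT` header lines also contain ">", so the VCF version only prefixes data lines and `##contig` IDs. Note that "MT" would become "chrMT", which is not the hg19 name "chrM".

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that add a "chr" prefix back.
# Both read stdin and write stdout.

# FASTA: ">1" -> ">chr1"; sequence lines never start with ">".
add_chr_fasta() {
  sed 's/^>/>chr/'
}

# VCF: prefix every non-header line (CHROM is the first field), and
# rename ##contig=<ID=1,...> to ##contig=<ID=chr1,...>.
add_chr_vcf() {
  sed -e '/^#/!s/^/chr/' -e 's/^##contig=<ID=/##contig=<ID=chr/'
}
```

For example, `add_chr_vcf < dbsnp.vcf > dbsnp.chr.vcf`.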

**Heisman** · 01-19-2012, 06:08 AM

> Originally posted by shawpa:
>
> Well, I can't get it to work either way. I'm working on the CountCovariates step. I did what HGENETIC suggested (thanks, by the way), and that fixed the mismatch between my known-sites file and the reference. Now it gives an error because my BAM input still has chr1, chr2, etc. I tried the "fix" from above and it didn't seem to work on the BAM file.

Yeah, BAM files are compressed, so that wouldn't work.

There is no need to rename your reference sequence for this purpose.
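Keeping "chr" everywhere is the simpler route. If renaming the BAM contigs is still wanted, one option, sketched here assuming `samtools` is installed, is to edit only the header: each BAM record refers to its contig by index into the `@SQ` header lines, so changing the `SN:` names there renames the contigs. `strip_chr_sam_header` is a hypothetical helper that works on plain-text SAM header lines.

```shell
#!/bin/sh
# Sketch only: hypothetical helper that edits SAM header text on stdin.
# On @SQ lines, "SN:chr1" -> "SN:1"; all other header lines pass through.
strip_chr_sam_header() {
  sed '/^@SQ/s/SN:chr/SN:/'
}
```

For example: `samtools view -H in.bam | strip_chr_sam_header > header.sam`, then `samtools reheader header.sam in.bam > renamed.bam` and `samtools index renamed.bam`. As with the FASTA, chrM ends up as "M".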

**shawpa** · 01-19-2012, 06:24 AM

I removed the "chr" because GATK gave me an error saying "known site and reference have incompatible contigs: No overlapping contigs found", so I took the "chr" out of my reference file to match the known-sites file. Now when I run it, it says "Input files reads and reference have incompatible contigs: No overlapping contigs found." I think it is talking about my BAM file: I aligned it against a reference that still had "chr" in the names, so its contigs no longer match. At least, I think that is what the error means.
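To see which contig names two inputs actually share (an empty overlap is roughly what the "No overlapping contigs found" message reports), the names can be listed and intersected. A minimal sketch using POSIX `awk` and `cut`; all four function names are hypothetical, and the reads are assumed to be readable with `samtools view -H`.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for comparing contig names.

# Reference index (ref.fa.fai) on stdin: the contig name is column 1.
fai_contigs() {
  cut -f1
}

# VCF on stdin: unique CHROM values of data lines (header lines start with #).
vcf_contigs() {
  awk -F'\t' '!/^#/ && !seen[$1]++ { print $1 }'
}

# SAM header text on stdin: the SN: value of each @SQ line.
sam_header_contigs() {
  awk -F'\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Two files with one name per line: print each name in $2 that also occurs in $1.
shared_contigs() {
  awk 'NR == FNR { seen[$1] = 1; next } ($1 in seen) && !printed[$1]++' "$1" "$2"
}
```

For example, `fai_contigs < ref.fa.fai > ref.txt`, `vcf_contigs < dbsnp.vcf > known.txt` and `samtools view -H reads.bam | sam_header_contigs > reads.txt`; then `shared_contigs ref.txt known.txt` and `shared_contigs ref.txt reads.txt` should each list the chromosomes rather than print nothing.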

**Heisman** · 01-19-2012, 06:30 AM

That makes sense. I guess my question is, why don't you have "chr" in every file?

**HGENETIC** · 01-19-2012, 06:31 AM

> Originally posted by Heisman:
>
> Wait, I'm being dumb. Why did you remove the "chr" tags in the first place?
>
> Anyway, if you want to go back: your FASTA headers look like ">1" and ">2", and no other line contains ">", correct? Then you can run `sed 's/>/>chr/' input_file > output_file`.

Thanks, I think that would work nicely. The reason I removed the "chr" tags was that I was trying to use the dbSNP 135 known-variant file, which only had integers, whereas my FASTA file had the "chr" tags (I think). To make things easier, I'm just going to download and use the data from the GATK bundle, as that should all be compatible.

**Heisman** · 01-19-2012, 06:34 AM

Yes, just use the stuff in their data bundle. There are a lot of errors in dbSNP 135 anyway. I emailed NCBI about this a while ago, and to my knowledge they are still working on it.

**HGENETIC** · 01-19-2012, 06:37 AM

Out of curiosity, do you know the difference between the b37 and hg19 data in the GATK bundle? Apart from that, all the file names are the same.

**shawpa** · 01-19-2012, 07:21 AM

I am curious about this too. If I did the alignment using hg19 but now switch to b37 for the CountCovariates step, will everything be screwed up?

**Heisman** · 01-19-2012, 09:35 AM

This is one of those things I tend to ignore although I shouldn't. I think the vast majority of it is the same, but I could be completely wrong.
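For what it's worth, as far as I know both are builds of GRCh37 and the primary chromosome sequences are the same; the visible difference is the naming (b37: 1 to 22, X, Y, MT; hg19: chr1 to chr22, chrX, chrY, chrM). Two parts really differ: the mitochondrial sequence (hg19's chrM is the older NC_001807 sequence, 16,571 bp, while b37's MT is the revised Cambridge Reference Sequence, 16,569 bp) and the names of the unplaced contigs. So reads aligned to hg19 cannot be combined with b37 known-sites files as-is; GATK should complain about incompatible contigs, as above. A minimal sketch of the name mapping for the primary chromosomes only; `b37_to_hg19_name` is a hypothetical helper, and MT and the unplaced contigs are deliberately left unmapped.

```shell
#!/bin/sh
# Sketch only: map a b37 primary-chromosome name to its hg19 name.
# MT and unplaced contigs (GL...) return non-zero: MT is a different
# sequence from hg19's chrM, so renaming alone would not be correct.
b37_to_hg19_name() {
  case "$1" in
    [1-9]|1[0-9]|2[0-2]|X|Y) printf 'chr%s\n' "$1" ;;
    *) return 1 ;;
  esac
}
```

For example, `b37_to_hg19_name 7` prints `chr7`, while `b37_to_hg19_name MT` prints nothing and fails.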
