SEQanswers

clarissaboschi 01-26-2017 09:56 AM

CrossMap analysis
 
I am trying to convert the genome coordinates of SNPs in a VCF file using the CrossMap program (http://crossmap.sourceforge.net/).

I am having difficulties with the conversion. My VCF file does not contain every chromosome present in the chain file (for example, the "random" contigs), so the conversion is not performed.

Do I need to remove from the chain file all chromosomes that are not present in my VCF file (and perhaps from the FASTA file as well)?

My error message was:
KeyError: "sequence 'chr10_NT_461738v1_random' not present"

This sequence is present only in the chain file, but I am not sure whether I should edit the chain file.

Also, does the chromosome naming need to be the same in all three files (input, chain, and FASTA): Chr1, chr1, or 1?

Thanks
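Before deciding whether to edit the chain file, it can help to list the sequence names it actually refers to. Below is a minimal Python sketch, assuming the UCSC chain format, in which each alignment block starts with a header line "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id"; in liftOver-style chains the t fields describe the source assembly and the q fields the target assembly. The function names and the example lines are made up for illustration.

```python
import gzip
from typing import Iterable, Set, Tuple


def chain_contigs(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Collect source (tName) and target (qName) names from chain header lines."""
    source: Set[str] = set()
    target: Set[str] = set()
    for line in lines:
        if not line.startswith("chain"):
            continue  # alignment-block lines contain only numbers
        fields = line.split()
        # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
        source.add(fields[2])
        target.add(fields[7])
    return source, target


def read_chain(path: str) -> Tuple[Set[str], Set[str]]:
    """Open a plain or gzip-compressed chain file and return its sequence names."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return chain_contigs(handle)


if __name__ == "__main__":
    # Made-up chain lines, for illustration only.
    example = [
        "chain 1000 chr10 135534747 + 0 500 chr10 133797422 + 0 500 1\n",
        "500\n",
        "\n",
        "chain 50 chr10 135534747 + 600 700 chr10_NT_461738v1_random 20000 + 0 100 2\n",
        "100\n",
    ]
    src, tgt = chain_contigs(example)
    print(sorted(src))  # ['chr10']
    print(sorted(tgt))  # ['chr10', 'chr10_NT_461738v1_random']
```

On a real file, read_chain("your.chain.gz") (hypothetical path) returns the same two sets; comparing them against the names used in the VCF and FASTA also shows whether the files share a naming style such as "chr1" versus "1".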

liguow 01-30-2017 06:16 AM

The error message "sequence 'chr10_NT_461738v1_random' not present" was not issued by CrossMap itself; it could have come from one of its dependencies, such as pysam.

My guess is that "chr10_NT_461738v1_random" is present in your VCF file but absent from your reference FASTA file.
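One way to test this guess is to compare the chromosome names used in the VCF with the names in the FASTA index. Below is a minimal Python sketch, assuming an uncompressed VCF and a FASTA that has been indexed (for example with samtools faidx), which produces a .fai file whose first tab-separated column is the sequence name. The function names are hypothetical.

```python
from typing import Iterable, Set


def vcf_contigs(lines: Iterable[str]) -> Set[str]:
    """Collect the CHROM column of VCF data lines; header lines start with '#'."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fai_contigs(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from a .fai index (first tab-separated column)."""
    return {line.split("\t", 1)[0] for line in lines if line.strip()}


def missing_from_fasta(vcf_lines: Iterable[str], fai_lines: Iterable[str]) -> Set[str]:
    """Names that appear in the VCF but have no entry in the FASTA index."""
    return vcf_contigs(vcf_lines) - fai_contigs(fai_lines)


if __name__ == "__main__":
    # Made-up example lines; real use would read the files instead.
    vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "chr10\t100\t.\tA\tG\n",
        "chr10_NT_461738v1_random\t5\t.\tC\tT\n",
    ]
    fai = ["chr10\t133797422\t7\t60\t61\n"]
    print(sorted(missing_from_fasta(vcf, fai)))  # ['chr10_NT_461738v1_random']
```

With real data, open the VCF and the .fai file in text mode and pass the two file handles to missing_from_fasta; any name it returns would trigger the same lookup failure.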

clarissaboschi 01-30-2017 07:01 AM

OK, thanks, I will check it. I tried the BED file format and it worked very well.

liguow 01-30-2017 10:54 AM

This further confirms my hypothesis. Pysam tries to retrieve the reference allele from the FASTA file, and it reported this error message when it failed to find 'chr10_NT_461738v1_random'.
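The difference between the VCF and BED runs can be modelled in a few lines. This is a toy sketch, not CrossMap's or pysam's code: a Python dict stands in for the indexed FASTA, the coordinates are assumed to have already been lifted to the new assembly, and all function names are hypothetical. Only the VCF path needs the reference allele, so only it can fail on a sequence the FASTA lacks; the KeyError text copies the message quoted in this thread.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the FASTA file: sequence name -> sequence.
Reference = Dict[str, str]


def fetch(reference: Reference, contig: str, start: int, end: int) -> str:
    """Return a 0-based, half-open slice; raise KeyError if the sequence is absent."""
    if contig not in reference:
        raise KeyError("sequence '%s' not present" % contig)
    return reference[contig][start:end]


def convert_vcf_record(reference: Reference, contig: str, pos: int,
                       ref_len: int) -> Tuple[str, int, str]:
    """A VCF record needs the reference allele at its (already lifted) position."""
    allele = fetch(reference, contig, pos - 1, pos - 1 + ref_len)  # VCF POS is 1-based
    return contig, pos, allele


def convert_bed_record(contig: str, start: int, end: int) -> Tuple[str, int, int]:
    """A BED record carries only coordinates, so no sequence lookup is needed."""
    return contig, start, end
```

With reference = {"chr10": "ACGTACGT"}, convert_vcf_record(reference, "chr10", 2, 1) returns ("chr10", 2, "C"), while a record on "chr10_NT_461738v1_random" raises KeyError in the VCF path but passes through convert_bed_record unchanged.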


All times are GMT -8.