Hi all
I'm reporting a strange behavior that I see on both OS X 10.6.8 and CentOS 5.7, with the latest software installed (via MacPorts and yum, respectively).
If I download the compressed versions of the human reference genome from the Broad FTP (bundle 1.2) and run the command:
Code:
zgrep ">" human_b36_both.fasta.gz
everything looks fine until the end, and then suddenly pages of binary garbage appear.
I thought this was due to a corrupted archive, so I did the following:
expanded the archive back to a multi-FASTA file
reformatted the FASTA content using BioPerl (FASTA in, FASTA out)
recompressed it with razip (samtools 0.1.18)
repeated the zgrep command
I get the garbage again!!
Is this normal, and is it due to some peculiarity of razip?
## Below is a series of commands and results
Code:
# sorry for the length
$> razip -c human_b36_both.fa > human_b36_both.fa.gz
$> zgrep ">" human_b36_both.fa.gz
>1
>2
…
>NT_113899
>NT_113965
>NT_113898
>NC_007605
?f???,ixAp_?Òoё\G?C?Nm????R??[D?;?œ?)?pj5X??UL?`??mM?l%??ZºŐP?BI??W??d???HCoo??DS?ѷivfq(??X??U???w???
# (… many, many such pages of trailing binary garbage)
# while the tail of the FASTA file is clean
$> tail human_b36_both.fa
ATGGGGGGCCGCGCATTCCTGGAAAAAGTGGAGGGGGCGTGGCCTTCCCCCGCGGCCCCC
CAGCCCCCCCGCACAGAGCGGCGCTACGGCGGGCGGGCGGCGGGGGGTCGGGGTCCGCGG
GCTCCGGGGGCTGCGGGCGGTGGATGGCGGCGGACGTTCCGGGGATCGGGGGGGTCGGGG
GGCGCCGCGCGGGCGCAGCCATGCGTGACCGTGATGAGGGGGCAGGGTCGCAGGGGGTGT
GTCTGGTGGGGGCGGGAGCGGGGGGCGGCGCGGGAGCCTGCACGCCGTTGGAGGGTAGAA
TGACAGGGGGCGGGGACAGAGAGGCGGTCGCGCCCCCGGCCGCGCCAGCCAAGCCCCCAA
GGGGGGCGGGGAGCGGGCAATGGAGCGTGACGAAGGGCCCCAGGGCTGACCCCGGCAAAC
GTGACCCGGGGCTCCGGGGTGACCCAGCCAAGCGTGACCAAGGGGCCCGTGGGTGACACA
GGCAACCCTGACAAAGGCCCCCCAGGAAAGACCCCCGGGGGGCATCGGGGGGGGTGTTGG
CGGGGGCATGGGGGGGTCGGATTTCGCCCTTATTGCCCTGTTT
# indexing the archive works fine
$> samtools faidx human_b36_both.fa.gz
$> cat human_b36_both.fa.gz.fai
1 247249719 3 60 61
2 242951149 251370554 60 61
...
NT_113898 1305230 3143346495 60 61
NC_007605 171823 3144673490 60 61
# extracting the last record also works and ends just like the tail above
$> samtools faidx human_b36_both.fa.gz NC_007605
>NC_007605
AGAATTCGTCTTGCTCTATTCACCCTTACTTTTCTTCTTGCCCGTTCTCTTTCTTAGTAT
GAATCCAGTATGCCTGCCTGTAATTGTTGCGCCCTACCTCTTTTGGCTGGCGGCTATTGC
CGCCTCGTGTTTCACGGCCTCAGTTAGTACCGTTGTGACCGCCACCGGCTTGGCCCTCTC
ACTTCTACTCTTGGCAGCAGTGGCCAGCTCATATGCCGCTGCACAAAGGAAACTGCTGAC
…
GGGGGGCGGGGAGCGGGCAATGGAGCGTGACGAAGGGCCCCAGGGCTGACCCCGGCAAAC
GTGACCCGGGGCTCCGGGGTGACCCAGCCAAGCGTGACCAAGGGGCCCGTGGGTGACACA
GGCAACCCTGACAAAGGCCCCCCAGGAAAGACCCCCGGGGGGCATCGGGGGGGGTGTTGG
CGGGGGCATGGGGGGGTCGGATTTCGCCCTTATTGCCCTGTTT
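For context on why faidx copes fine: each `.fai` line holds the sequence name, its length in bases, the byte offset of its first base, the bases per line, and the bytes per line (newline included). The offsets are positions in the uncompressed FASTA, so a lookup is plain arithmetic. Below is a small Python sketch of that arithmetic, checked against the first two index lines shown above. The names `FaiEntry`, `parse_fai_line` and `base_offset` are made up for illustration and are not part of samtools; the sketch assumes every full line has the same width, which the 60/61 columns indicate.

```python
from dataclasses import dataclass


@dataclass
class FaiEntry:
    """One line of a samtools .fai index for a FASTA file (5 columns)."""
    name: str
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the uncompressed FASTA
    line_bases: int  # bases per full sequence line
    line_bytes: int  # bytes per full line, newline included


def parse_fai_line(line: str) -> FaiEntry:
    name, length, offset, line_bases, line_bytes = line.split()
    return FaiEntry(name, int(length), int(offset), int(line_bases), int(line_bytes))


def base_offset(entry: FaiEntry, pos: int) -> int:
    """Byte offset of 1-based position `pos` in the uncompressed FASTA."""
    if not 1 <= pos <= entry.length:
        raise ValueError(f"position {pos} outside 1..{entry.length}")
    i = pos - 1
    # Skip whole lines (bases + newline), then step within the last line.
    return entry.offset + (i // entry.line_bases) * entry.line_bytes + i % entry.line_bases


chr1 = parse_fai_line("1 247249719 3 60 61")
chr2 = parse_fai_line("2 242951149 251370554 60 61")

last_base_chr1 = base_offset(chr1, chr1.length)
# Byte just past chr1's last base, then one newline, then the header ">2\n".
next_record = last_base_chr1 + 1 + len("\n") + len(">2\n")
print(next_record == chr2.offset)  # True
```

The numbers line up with the index, which is consistent with faidx reading the file through uncompressed offsets rather than scanning it the way zgrep does.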
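One way to test whether the garbage comes from extra bytes after the compressed data is to walk the file gzip member by gzip member with Python's `zlib` and count whatever is left over. This is a diagnostic sketch only; `scan_gzip` is a made-up name, not a samtools or gzip tool. The idea that razip appends its own data (for example a block index) after the gzip stream is an assumption about the RAZF format, and this check would confirm or rule it out. A gzip member starts with the magic bytes `1f 8b`, so anything after the last member that does not start with them is counted as trailing data.

```python
import gzip
import io
import zlib
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"


def scan_gzip(stream: BinaryIO, chunk_size: int = 1 << 16) -> tuple[int, int]:
    """Return (members, trailing) for a possibly multi-member gzip stream.

    members:  number of complete gzip members decoded
    trailing: bytes left after the last member that do not start a new one
    Decompressed output is discarded; only the structure is checked.
    """
    members = 0
    pending = b""  # compressed bytes already read but not yet decoded
    while True:
        if len(pending) < 2:
            pending += stream.read(chunk_size)
        if not pending.startswith(GZIP_MAGIC):
            break
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip wrapper
        d.decompress(pending)
        while not d.eof:
            chunk = stream.read(chunk_size)
            if not chunk:
                raise ValueError("input ends inside a gzip member")
            d.decompress(chunk)
        members += 1
        pending = d.unused_data  # bytes after this member's CRC/size trailer
    trailing = len(pending)
    while chunk := stream.read(chunk_size):
        trailing += len(chunk)
    return members, trailing


# Self-check on in-memory data: two members, then 13 bytes of non-gzip data.
one = gzip.compress(b">chr1\nACGT\n")
two = gzip.compress(b">chr2\nTTGA\n")
print(scan_gzip(io.BytesIO(one + two)))                     # (2, 0)
print(scan_gzip(io.BytesIO(one + two + b"not gzip data")))  # (2, 13)
```

On the real file this would be `with open("human_b36_both.fa.gz", "rb") as fh: print(scan_gzip(fh))`. A non-zero trailing count would point at those extra bytes as the likely source of what zgrep prints after the last header.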