  • HTSeq Script from DEXSeq Reports Assertion Fail in SAM file

    I am evaluating DEXSeq, which uses two Python scripts from HTSeq to prepare data for analysis. The second script, dexseq_count.py, reports the following error and aborts:


    Code:
    Traceback (most recent call last):
      File "dexseq_count.py", line 132, in <module>
        for af, ar in HTSeq.pair_SAM_alignments( HTSeq.SAM_Reader( sam_file ) ):
      File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.5.3p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 604, in pair_SAM_alignments
        for almnt in alignments:
      File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.5.3p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 543, in __iter__
        algnt = SAM_Alignment.from_SAM_line( line )
      File "_HTSeq.pyx", line 1228, in HTSeq._HTSeq.SAM_Alignment.from_SAM_line (src/_HTSeq.c:20889)
    ValueError: ("Sequence in SAM file contains '.', which is not supported.", 'line 24 of file /media/myLab/DEXSeq/30D_alignments.sam')
    The SAM file was generated by MapSplice without errors. After reviewing the SAM format specification, I believe the character '.' is a valid character in the SEQ field.
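
For reference, the SAM specification defines the SEQ field by the regular expression `\*|[A-Za-z=.]+`, so '.' is allowed there even though this HTSeq version rejects it. A minimal Python sketch of that check follows; the helper name `seq_is_valid_sam` is made up for illustration and is not part of HTSeq or MapSplice.

```python
import re

# SEQ pattern as given in the SAM specification: either a lone '*',
# or one or more letters, '=' or '.' characters.
SAM_SEQ_PATTERN = re.compile(r"\*|[A-Za-z=.]+")


def seq_is_valid_sam(seq):
    """Return True if seq is a legal SAM SEQ value (hypothetical helper)."""
    return SAM_SEQ_PATTERN.fullmatch(seq) is not None


def seq_has_dot(seq):
    """Return True if seq contains the '.' character that HTSeq complains about."""
    return "." in seq


if __name__ == "__main__":
    example = "CCTCGATTCATCAATGTCCCTCCGTAA..GGCACACATC"
    print(seq_is_valid_sam(example), seq_has_dot(example))
```

A sequence can therefore be valid SAM and still be refused by a stricter parser, which seems to be the situation here.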

    The GFF file was created successfully from the current Ensembl annotation file, Mus_musculus.NCBIM37.64.gtf, using the first DEXSeq Python script, dexseq_prepare_annotation.py (which calls the respective HTSeq script).

    Below is a sample of the first 100 lines of the SAM file.

    Is there a different character that should replace the '.'?
    Can I use MapSplice generated SAMs with DEXSeq/HTSeq?
    If so, what do I need to do differently?

    Additional Info:

    -The SAM file generated by MapSplice was derived from a set of 3 samples from the same experiment with paired-end reads.

    -The actual command line was:

    Code:
    python dexseq_count.py --paired=yes /media/myLab/DEXSeq/mm9.gff /media/myLab/DEXSeq/30D_alignments.sam /media/myLab/DEXSeq/30D_counts.txt
    -The FASTQ files that were used by MapSplice to generate the alignment SAM were Illumina 1.5 converted to Sanger scoring.

    -R, Bioconductor, and DEXSeq are all fully up to date.

    -Excerpt (First 100 lines) from MapSplice generated alignment file (.SAM):

    Code:
    @SQ	SN:chr1	LN:197195432
    @SQ	SN:chr10	LN:129993255
    @SQ	SN:chr11	LN:121843856
    @SQ	SN:chr12	LN:121257530
    @SQ	SN:chr13	LN:120284312
    @SQ	SN:chr14	LN:125194864
    @SQ	SN:chr15	LN:103494974
    @SQ	SN:chr16	LN:98319150
    @SQ	SN:chr17	LN:95272651
    @SQ	SN:chr18	LN:90772031
    @SQ	SN:chr19	LN:61342430
    @SQ	SN:chr2	LN:181748087
    @SQ	SN:chr3	LN:159599783
    @SQ	SN:chr4	LN:155630120
    @SQ	SN:chr5	LN:152537259
    @SQ	SN:chr6	LN:149517037
    @SQ	SN:chr7	LN:152524553
    @SQ	SN:chr8	LN:131738871
    @SQ	SN:chr9	LN:124076172
    @SQ	SN:chrM	LN:16299
    @SQ	SN:chrX	LN:166650296
    @SQ	SN:chrY	LN:15902555
    HWUSI-EAS1737:4:1:18952:1481#CGATGT/1	73	chr7	112705184	0	9M239N67M	*	0	0	GGAGACAAAGTGCATATAATTGGCCACNNNCCTCCAGGACATTGTCTTAAGAGCTGGAGCTGGAATTATTACAAAA7722224<10.48;147444DDD@G;2###42719DDBDFD0B@B7;92<8>?>9>55A#################	NM:i:3	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:4882:1493#CGATGT/1	73	chr19	5799800	0	75M	*	0	0	CTACGCTTTCGAGGGACCGGCAGAGGA..CAACCTTCCTTAGCTGCCCGCCTCAAAAAAAGAAAAAGGAAAAAAG	IIIIFFIIIIIIHIIIIIIBEIHBB95##812442DGGGBAEECDDEBDD>5?(=-583346=>>>4:@@:@<>1	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:5723:1491#CGATGT/1	73	chr10	52997944	0	75M	*	0	0	CCTCGATTCATCAATGTCCCTCCGTAA..GGCACACATCTGAGAGCTTTTTAGTGAGCGTTTCTGGGCTGTGGTC	IIIIHHIHHIIIGIHIIIIIIIIHI=:##<;;579HHHIIIHIGIHGIHHHEHGGHGDIEGHEHGDEDGBBDEB@	NM:i:3	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:5894:1491#CGATGT/1	73	chr3	138219226	0	75M	*	0	0	AAAAATGTAACACCATAATTACATTCC..ACTAGAATTAGTATGTCTGCCTTTGTATCTCTATGCTGTACTTTAA	IIIIDIIIIIIIIIEHIIIIIIIII>@##@8=<98HIIIIHIIIGIGIHGHHGHEHHFHGHBHEGGGEGFIGHGE	NM:i:2	IH:i:2	HI:i:1
    HWUSI-EAS1737:4:1:5894:1491#CGATGT/1	73	chrX	59039965	0	75M	*	0	0	AAAAATGTAACACCATAATTACATTCC..ACTAGAATTAGTATGTCTGCCTTTGTATCTCTATGCTGTACTTTAA	IIIIDIIIIIIIIIEHIIIIIIIII>@##@8=<98HIIIIHIIIGIGIHGHHGHEHHFHGHBHEGGGEGFIGHGE	NM:i:2	IH:i:2	HI:i:2
    HWUSI-EAS1737:4:1:9511:1489#CGATGT/1	73	chr1	84367214	0	75M	*	0	0	CTCGATAGTCTACTGAAAACTCTTGAG..GCTAACATTTTACATCTCTTAAGCTTTTTAATTTTCTTAAAAATAT	IIIIIIIGIIIHHIIIHHIIIIIIG7:##<;6>5;GIIIIIIIIHIIIIE@IFIIIIIFBHIIHHHIHD<FFIFF	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:12372:1486#CGATGT/1	73	chr14	103603743	0	15M636N61M	*	0	0	AACTTCTACAGCAACCTTGGGAATAGCNNCGGCTACTGGTCCTATGTAGGCTATAATAAGGGGAAGCAGCATGGAAIBIIIGIIIIIIHIIIIIIIIIIIG>@##;>;639EDEGDBDB<DEC>ED;@=A?B<BBBBBD;=4646349;=4>	NM:i:2	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:17262:1493#CGATGT/1	73	chr9	40778574	0	75M	*	0	0	CTCAAAATTGGCCAACGTATGCTTAATC.ACAATGATTAGCACCACTCTGAGGTAAGAGTGCTTCCCTCTGTATT	IIIIIIIIHGHIIIIIIFIHIIIIIEEB#E@B@BBHIGIIIIIHHHGHGHGGGDDGFGEDGEEEEEDEEDDDDDD	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:1237:1502#CGATGT/1	73	chr6	94554548	0	75M	*	0	0	TAAAGACACTTTATGCCATTTGTTAGAC.CTTCAATATTTTACATGTTTTCAATGTACACTGTACCAAAATTTCT	HGDHHHDHBHDHH?HEGGGGG@GGD=;@#5::;;8>GGBGD@GGGHDGHHFFHHHHGGEDGEBGGGDGEFBHGHH	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:3022:1499#CGATGT/1	73	chr19	60848919	74	24M467N52M	*	0	0	CTGATGCTGTTCTTCTTTCTCTTGCAGANTGTGAGCAGGTCTGATGACTTCAAGTGCTTTTGCAAGCACTGAGGACIIIIIIIIIIIIIIIIIIIIIIIIIDEB#DFBFEEIIIIIIIIGIIIGHIIGHIGFIIIIIIGGIGHIHHGEGGEG	NM:i:3	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:3121:1501#CGATGT/1	73	chr4	131480076	0	75M	*	0	0	GGGACATCAGAATTCTGGTCCAATTTCC.GACAGAGGCGGCAGCAGAGGGACAGAGTTCAGGTACCCCAGGTCTG	IIGDIHIIEIIGIIIIIIIIIEHIIFFF#EBDBB@IIIIIHFF885;;;;BB>B>A>A??=?-<7@@@#######	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:3258:1499#CGATGT/1	73	chr11	75578367	0	75M	*	0	0	AAACAAACAAACAGTTTTCAGAAGTTCT.AAGGCAAGAGTGAATTTCTGTGGATTTTACTGGTCCCAGCTTTAGG	EBD@D;GGGDDAGGGDHHHFHGBHHAAA#?55185EEEG@GBBGBHHHHHHBEHFEG@EEB3BBBB@EEDEEAGA	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:3659:1504#CGATGT/1	73	chr2	91780405	0	75M	*	0	0	CTGGTTGCCCCCACTCTTGGGGTTCACA.ACACCAGCAGGGGTTTCATGAGGGGGGATGGGGTGGGTCTGATGAT	BGGGGGDDBGEGGE>ECEEEGE@GD;77#:@>>5;GDGGGDAB8D<=B??A<C>A>'3?;CC#############	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:4236:1500#CGATGT/1	73	chr12	32026863	0	75M	*	0	0	AGCAATCAGTCCATGAAGTCAACACACA.ACAGACTCTTAGACATTTCTGACTGACAAAGGCCACCAAGCTAACA	GGEBGGGDGEHGHDHHHEHHHGHBHAA?#>>?>??BHHHHHGHFHHDHHGHHGHHGGGGGGEEED@BBBBBCEEE	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:5859:1503#CGATGT/1	73	chr3	126548748	0	75M	*	0	0	TAGAATACCTTTTTTGTATATTAAATGA.CGTTATTGGGCATGCTGGTATATGGCTATGATCTCAGCAAGCTAGT	IIIIHHIIIIIIIIIIFIIFGGGGGBA>#??<?;@IIFHHIIH@HIIHIIHHGEEGGEHECBHE8FCBBBBCADA	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:15471:1504#CGATGT/1	73	chr2	29852376	0	75M	*	0	0	CAGTACTTTGCCGATGCCAATGAGGCTG.GTCCTGGATGCGGGAGAAGGAGCCCATTGTGGGCAGTACGGACTAT	IHIIIIIIIIIIIIIIIIGGIIHIIBBB#B?AAABIDIIHIIHBGEEEEDD>EE>BBB<BBBB@D@AABB>>@:@	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:15606:1502#CGATGT/1	73	chr4	155541534	0	75M	*	0	0	CTTTCCAATCCACAGCACCAGCCCCTGC.TGGCCTCAGTGCGTAAGCTCAGCTCAAAGTGGTTGCTCTGCCGCGC	IIIIIIIIIIIIIFIIIIIIHIIII??B#=;;<;7B=A:ABB@?@B@BC>AB?B7=;699@:@B?##########	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:16004:1499#CGATGT/1	73	chr3	89900904	77	75M	*	0	0	ATGAGTAGCTGTAGCCTATGAATACTAG.GTGTCTTAGATTTTCTTAAGATTGAAATTCATAGTACTGGACATCA	HIIIIIIIIIIIIIIIIIHIIIIIIDDB#B=B?AAGHGHFIIIIIIDIIIIIIIHHIHIIIIIGIHIHIEGGHIG	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:1415:1509#CGATGT/1	73	chr5	131915720	0	75M	*	0	0	ATGAGGCTCCCTGTCCCGGAACGATCGG.CGGCTTCATACAACCGGGGTGTGGAGAGCCGGCGCAGGGGGTCATT	-;5E?D@EGDGD4BGGGD=GE?DB=@;?#B===;>E8DGE@BBE-DDB??DGG8GDBGD@?##############	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:3922:1507#CGATGT/1	73	chr13	9667471	76	75M	*	0	0	ATTTAGATGTCACTCAGAACCTACTCTT.ACTTTTCACCATATCAACTTCACACCCAAGGAGGGCGAAGGCTGCA	IIIIIIIIIIIIIIIIIIIIIIGIIEFF#EEFEFEIIHIIIDIIGDIIIHHIGIHIEEFFEEDFHGBEDEFDEED	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:5557:1511#CGATGT/1	73	chr14	61582893	0	75M	*	0	0	CTTTAGAGTAGAAAATCCTGCGATTCAA.ATATATTCCATATATTCAAAATCGAAAACACATGACTTGACCTGAC	HFHDHGGGDGG>GGGGGDGBGG>GE466#747417GGEDG<FAEF?<?B@DEEGEFGEFAA=9=<AA9A9BDBBB	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:6314:1506#CGATGT/1	73	chr8	82207817	0	75M	*	0	0	TTTGTATCCAACAAATGAGTATGCACTG.AGCAATGGACACTTTGTATTGCTATTTCCCTGCACTCACTGGAGAA	EDEGGGBGE<GGGG@FGGGEG@GGG@;9#;;>>;;EGBEDEEGEEDEEFCG@DDDHHDFEEDEFBEBBE@A:A58	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:6950:1517#CGATGT/2	163	chr13	91832651	0	14M959N62M	=	91833672	1021	ACAGTATTTCCAAGAATTCTCCCAATAATATGAGCCTGAGTANNCAGCNGGGCACTCCAAGGGATGACGGTGAAAT	GGGDGEGDGGIIIIIGIIIGIIIIIIIIIIEGIIIIEIIH8@##=66<#7=====HIGGDEAE@DCBEB@=@@>@#	NM:i:3	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:1091:1527#CGATGT/1	99	chr7	25765099	0	61M410N15M	=	25765101	2	CGATACTCACAAAGAAGGCTGTGTGGCATGTGAACTCTACCACCTTCCTCTGCTCATAAGTCCACTGCTGCCCATA	GGG?GGGGGGEG@BEDDGE3EBD?EEDEBEG>GGGGGG>GGBGG>>EC?CDDB@G?E?>?AAA3A050.5BDBB@6	NM:i:0	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:1556:1525#CGATGT/1	99	chr5	34090921	0	30M440N46M	=	34090941	20	CTCGGCCACTTTCACCTGGGCCTCCTTTGCCACAATTTCTGGAAGGGTCTGCAAGGTGGACTTCAGCTGGTCAGCA	HHHGHHHHHHHHHHHD>GGEGG>GG<CCEGGFG<BAGG>GD@BDDCB@>CGDD8DB*AA2AA?A?HEEEH>BBDB#	NM:i:0	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:1790:1525#CGATGT/1	99	chr10	122562923	0	61M1750N15M	=	122564745	1822	CTCACAGCAACGTAGGTGTCCATCTGTTTCCTCAGTTTCAGTAGACATTTTGACATACCGGCACATTCTCAAGAGT	HHHHHHHHHFHHHHHHHHHHHBHFHGHHHHHHHGHHHHHBHGHGDHDHHDGEHFHHFHHGGG@GEGEHGH<BBBB>	NM:i:0	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:2483:1519#CGATGT/1	73	chr8	73985747	0	75M	*	0	0	CTTGACCTCCACGAATGTGTAGCCGTCACTGGACTCGACACTCTCCAGTGGCCGCACGGCGGTAGGGGGGGGCCC	HHGHGHHHHHDGBGGGGGGFF@FFE:B=<*:<<<:GE@E<C@C>C?#############################	NM:i:3	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:3433:1524#CGATGT/2	163	chr5	92889805	0	75M	=	92889840	35	GAGACGTCAGGTTTGAAAATATGGACGAGGGACAAGATCTCAT.TTTGTAGCCCCAGAGCAGTTCGTGCACGGTG	IIIGIIIIGIIIIIIHHIIIIIIIIIIHIIIHDIIDHHII@?@#?9:76:EIIHGCBEE@GEDDEB?CB;C@@:@	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:3948:1527#CGATGT/1	99	chr13	105738802	83	75M	=	105738833	31	GCAAGGAGCAGAGGGAGGGAGCAGAACTTTATCTGGTGGACCCACACCTAAGGAAGTCCTTTGGCAGGTTAGAGG	IIIIIIIIIIIIIIIIIGFHIIIIHIHIIHIIHIGIBIGEHIHHHGHGGEDGGEDEDDDDBEAAD?DB>B=?5??	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:4513:1518#CGATGT/2	163	chr10	79441613	71	75M	=	79441652	39	CTTTGTTCTCCTCGTTCTCTGCCAGACCCTTTACACTGGACA..AGGATGTCATCATAGCCCAGGAACTCATCCA	IIIIIIIIIIIIIIIIHIIIIIIIHIIIIIIIGIGGHIHI;9##:;:9;:DGDGGHGIGIGHHFBBFFEEEEEGG	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:4776:1523#CGATGT/1	99	chr7	52636708	87	75M	=	52636703	-5	TCTGTGTGTTTTTATAGGGGAGGTCAGTGTGAGGAAAGAGACATCAGAACAGAAATGAACAGGAGTGGAAAGATC	IIIIIIIIIIIIIFIIIIIIIIIHIIIHIHIHIIIIIIIIHIHIIHIGHHHIGHHIHIGGGHGEHDHGEGBDEED	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:5212:1521#CGATGT/1	99	chr8	86519569	0	75M	=	86519615	46	CTTAGCTCCAGCCATACCCAATCTTGCTGGTGTATCCAGGGGCAGGGTACGGAAAGAGGGCCCCAAATTCAGCCT	IIGIIIIIIHIIIIIIIIIIIIIIIIEIIIHIHIIIIHIICHHIIBE0ECBD?@B@B@BCCBBBBB77B>?9;A=	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:5599:1524#CGATGT/2	163	chr5	116858802	0	75M	=	116858799	-3	GGGTCATGCTGCAAAAAAAAGGCACTGTGTTGTGTGGTGAAGG.TAAATCCCATATGGGGATATAAGACCTGAGA	IIIGIIIIIIIIIIIIIIIIIIEIIIEDHIGEEGBGFCEE@BB#?AA?==IHGIEHGDHFCBEFFE<EBBEDDGE	NM:i:4	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:5976:1529#CGATGT/2	163	chr16	94587154	91	54M4N22M	=	94587179	25	CACACTTTAACAATCTAGAGTATTATTAAAGTAGCTGACGTATGCATGTAACAGTGCTATAAGGGGCCAATGAACA	IIIIIIIIGIIIIIIIIIGIIIIIIIIIIIIIIIIIIIHIIHIIIHIIIHHIIIGIIIIIIHIIHHHIIIIHIEGH	NM:i:1	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:7726:1524#CGATGT/2	163	chr19	60940389	0	75M	=	60940440	51	CGCCACCTTTAGAGAGCCTGAACCTGTCTTAAAATTACAAAGC.TGTGGAGCAGTACTTATAAACACAAACATAA	GGGDGIIGID1;5:6CBCCE<CAECIHFHHIIHEGFEHIB77:#::67:3>A?A8@?>BADBBDBE?DDCDCB<B	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:9029:1527#CGATGT/1	73	chr1	75254611	0	75M	*	0	0	CCCATTCTCTGGCAACCTTGGTGCCCGGGCTCTGCCAGGCACTGGCAACTCTTGGGCCATGTAGAGCAGCCCTGA	HHHHHHHHHHHHHHHHHGHEGDGGGECCAEEF<@FEBB@EEEBEHEBBB<BB:B@B>>9BBAABA=:@9@#####	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:10540:1520#CGATGT/1	99	chr10	56108908	88	75M	=	56108983	75	CAGGAATCCAAGTCCTCAACATGGCATTTCCTTTATGAAAAGACAGGTTGTCCTACATCCCCGCTAAAAAACATT	IIIIIIIIIIIIIIIIIHIIGIIIIHIIIIIIIIGIIHIIHIIIIIIHHIGIIIGIIIIIIHHEFGCECGAFHCD	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:10881:1528#CGATGT/2	169	chr8	73891263	0	75M	*	0	0	GAGAAGGTGAGGGATTCCCACGCCGCCGCAAAGGCCTGCGAACCAACTCCACAGGCCACTGTCTCCGCTCCTTCT	IIIHHIIDIIIIIIIIIIIIIIIHHHIDIIIHHIIHGGIHCHEIGGGEGHGEEDGGEDD@?@BBB>@B<B?>??>	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:12453:1525#CGATGT/2	163	chr4	119299082	0	75M	=	119299128	46	GAAAACAAAGCTATGTTACCAGTGCTATCTAAATTCCTTTCGA.GACAATCAGAGATTCTAGCTGGGTAACTTAC	IIIHIHIIHIIHGIIIIHIIIIIHIIHIIIIGGIIIIIIIECE#C@BB??IHIHHGGIIHGGIIIGGDGGGGIGG	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:13553:1517#CGATGT/1	99	chr10	72137558	90	75M	=	72137591	33	AGAATGTGAATCCATGCCTTTGTGACTGTGACCTCACTCTCTTCAGTATTGTGTAAAAGTTTTAGGAACTCTCCA	IIIIIIIIIIIIIIIIIIIIIFIIGIIIIFIIIGIIIIIIIIIIEIEGIIGHHIIIDHIEIIIHIBIIIIHHGHC	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:13922:1521#CGATGT/2	163	chr3	107820326	0	19M313N57M	=	107820342	16	TGCGGATCGGGTGTGTCAGTCCGCGGACGTTCCAGTATCCCAGNATCATAGGCATGGTGCTGGTGCTGTGGTCTTC	8@=;:=BCB>DBD@B=@2A7EDFFD=*?;6;;;,=:==?B8*:#375862DGG8D3?9?#################	NM:i:1	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:14935:1519#CGATGT/1	73	chr11	68905385	0	75M	*	0	0	ATTACCCCCATCACTGCCCAGCTCCTCTGACTGCCCCCCTGTATTCAGGGTGGGGGTACTAGTCACTGCCCAGAT	HHHHHHHHHHHHHHGCGGGDDGDGGEGEECHEGHHHEEC6=69===.???@BAAA####################	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:15627:1524#CGATGT/2	163	chrX	35955216	0	43M494N33M	=	35955767	551	ATCAAATCATGCCTAAGAAAGACCCTGTGAAAATTGTCCGATGNCATGAACACATAGAAATCCTTACAGTAAATGG	==<;=6>1:?BB?BBBDBD74(471@@7;1484:4;48;;367#9*13'2ECAA<<A=AA>DDEEHEDHB8><@@<	NM:i:1	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:2156:1541#CGATGT/2	163	chr12	81439545	0	75M	=	81439576	31	TTCCAATTCAATTCGGTGCCTTTTGAGGCCGCTTTGGCCATGAACAACCTTGCCGCTGTCCAGGCCTGAAGGGTC	GGGGDBGEGEIIIIIGIHIIIHIIIBIGIHGIFHIIGIIEGEFEIBGIIFFHH8HD@BBDF@BEF@?########	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:7610:1540#CGATGT/1	99	chr2	174167183	0	26M102N50M	=	174167353	170	AAAACAACCTGAAGGAGGCCATTGAAACCATTGTGGCCGCCATGAGCAACCTGGTGCCCCCTGTGGAGCTGGCCAA	EIGIHIIIIIHIIHIHGIIIIIIHIHIIIIIIIGHIIIIIIIFIBEGEEEFED@BEFIEHD;B=@B8BB@BBBB##	NM:i:0	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:8054:1530#CGATGT/2	163	chr8	32936775	0	50M4852N24M6441N2M	=	32941681	4906	GATACAGATGCCAGTAATTGTCAGTACCCTCTTCTGGTAGAGCTCCTCCGCCATAAATTCAATCCCAAGATGCTTG	DIHEHGDGBDDD=D=FDFFEGDEGGGGEGEGBEBGGBGD@B?EBB?2?A>EGGDGBGGBEHIHHEAD<AD8BBDBE	NM:i:0	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:8441:1537#CGATGT/2	163	chr9	108849028	0	71M282N5M	=	108849029	1	GCAGTGTCTCCCGAGTGTATGAAGAAGATGCCGTGCCTGGTCTCACTCCATGTCGCTTCACTGGCAGTGAGATCCG	IIIIIIIIIIIIIDIEIIIIIIIIIIGIIIHIIHHIIIIIGIHIHIGHIGGIGHIHEGGGGGEGGDG@BB@=@@B@	NM:i:0	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:10034:1535#CGATGT/2	163	chrM	7121	92	75M	=	7139	18	TTAGTCCTCTATATCATCTCGCTAATATTAACAACAAAACTAACACATACAAGCACAATAGATGCACAAGAAGTT	IIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIDIIIIHIIIIIIIIIIIIIIIIIIHIHHIIIHICGHGHGG	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:11500:1534#CGATGT/2	163	chr5	115553404	0	75M	=	115553461	57	GCTTGGCAGGGTAGCCACACTTGCCACAAGTCGACTTCTGAAGGTGGTAGGCCTTGGAGCCACAGCGGCGGGCCA	GIIIIIIBHH8AFEFGGGDGGGEEI?ECCEBG>DDG>B8DECGEBDGB@DE>BEEDDBBDEDEEEBCBE@#####	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:12574:1533#CGATGT/2	163	chr6	71339164	0	75M	=	71339175	11	CACGCTGCAGTCATGAGTGAAGCTCGGCCAACAGCGGTCGCTTGCTAGCTTCAAGGGTTATGATGAATTCCCACA	IIIIIIIHIIIIGDGGEEGGIHIIIIBIIIIIHDIIEBHHGCEEEDC@EBDDBB@ED=BBEFFEEB<CC@B@;@B	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:12609:1538#CGATGT/1	99	chr11	69146242	0	75M	=	69146249	7	AGGGGATAAAGTTGGGGAAGACAAATGTTGACAAAGGCAGAAAGAAATGTACTGGCTGTGTCGCAGAAATGGCAA	:AACEECD?EBEEEDGGEGE?G@GDDECBEGGE>GDDEBEG@GDE?:@E>GGBGGGGEDD.9;53?98<?4:847	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:12669:1533#CGATGT/2	163	chr2	92794032	81	75M	=	92794033	1	GGCACTCAGAAGACTGTATCATGCTCATCTGTGCTATGGAAAAAGATGTGTGCCAGGTACCCTCAGTCGGCAGCC	IIIIIHIIHIIIHIGIIIIBIIIIIIGIIHGIIIIGHIIIIIIHHFHIEHEIHIIIGDHGGGGHGGBGGDGEEED	NM:i:2	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:13574:1531#CGATGT/2	163	chr3	103125478	0	75M	=	103125517	39	TTAGAAGAGGCTTTTCAAAATCAGAAAGGTGCAATTGAAAATCTGTTGGCTAAGCTTCTTGAAAAGAAGAATTAT	GIEIBIIIFHIGIIIIII@EIIIFBEDGGAIIDIIIHIIIIIIIHHIGIGIIBIIFHIIIIBFHFHFCHBEHFDI	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:17345:1538#CGATGT/1	99	chr3	146359342	70	75M	=	146359449	107	GCTAAAGAAAAGGGGAAATTTGAAGATATGGCAAAGGCTGACAAGGCTCGTTATGAAAGAGAAATGAAAACCTAC	HGGHHHHHHFDGGGGHHHGGGGEEGBEGGGHHGHHHHHGHHHHFHHGGGHDHDCGEDGGBGDEEEBBEBEBA7EB	NM:i:4	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:18333:1532#CGATGT/1	99	chr12	80278009	84	75M	=	80278025	16	CGAGCCACAGAGAGGTGTTCTCAGAATCCTCCTCTACACTGCCATAACTCTGCAGGACAGGCATCCAAACTCTGT	IIIIIIIIHIIHGIIHIHIIIIIIIIIIIHIIGIIBIHIEGIIIIHHIHIFHIHHECHHE>EDDEGGCCCBDCBB	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:18384:1537#CGATGT/2	163	chr5	96695216	0	75M	=	96695240	24	GGAAAATGAGTAACACTTCATCGCTTTATTGTGGCAGTCCCTCAATTGCTAGCCATTTAAATTGTTTCAGGATCG	GEGGGEGEEGIIGGIHIIFHIGB@IGG@GDAGGDGEGEDGD2DAB@A3<C;?3A8???8?AAD@BBD<ECB@B<8	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:19209:1531#CGATGT/2	163	chr9	96233694	0	75M	=	96233731	37	CGCATGTTTGGGGTCGGGAGCTCACTGGTCCCTGTCTGGCGACCGTACATGGGTTCAAGGGCAGCAGAGGCAGCA	IHIHHIIGIIIIIHHIIIBHFIFFEIIDEIIFHHEFBFDFEEEBE>BB@BGEEDDE=BB=BB;BBAA7>B@>@/7	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:19561:1532#CGATGT/2	163	chr17	57220088	0	75M	=	57220162	74	CAGGGACAGAGTGATGGCAGGAGACTTAAGCCACCTCCTCTTCAGCCTCCTCTTCGAACTCGCCCTCTTCAGCCG	IIIIIIIIIGIFIIIIIIHIG<IEFHHDHHIHDHGBFE@EBBB?@EDDGE@E@BBB@AABB@ABB6<1?######	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:19582:1532#CGATGT/1	73	chr16	84963084	0	30M2801N46M	*	0	0	GTTCGAACCCACATCTTCAGCAAAGAACACCAGTTTTTGATGGCGGACTTCAAATCCTGAATCATGTCCGAATTCTG>=GGGG?EG@A>CA=;?B=?ABA;??<2:;56:5774;;AA3>C?8?;<::067GG@BAD@AD############	NM:i:0	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:2746:1547#CGATGT/1	99	chr1	179379007	83	69M122946N7M	=	179379008	1	CACAAGTAGAGAGCTGTCTTTCTTTTGAGGGTTAGGCCAAGTTAGCCAATAAAACCTTCTCAACAAATGGTTCTGT	IHIIIIIIIFIEIIIIIHIIIHIIIIFIIIIHIHIIIHIIIIIHIIHHHIHCHHCIGGIGICGECB?ABBDDEGGE	NM:i:2	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:3889:1551#CGATGT/1	99	chr3	96594388	0	75M	=	96594478	90	CCTGGCCCCATTAGCAAGGTAAGACTAATTCAGGGGTCTGAGACTCTAACGCTGTTCTGTCTTGATGATCAGAGA	IIIIIIIIIIIIIIIHIIIGIIIIIIIIIIIIIIIIFIIIEIHHHHHBHHIHGGGGGHHEGHGCBGGEED@EBE@	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:4589:1542#CGATGT/2	163	chr1	155756395	92	75M	=	155756402	7	CAAGTTTAGATTTTAATCAGATTTGTAGGGTTTCTAACTTTACAGAATTGCCTGTTTGTTTCAATGTCTCCCTCC	IIGIIIIIIHIIIIIGIIIIIIIIIIIIIIIIIIIIIGIIIIHIIHIIIIIIIIIIIIIIIHGDIIIIIHIHHHH	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:6113:1542#CGATGT/2	163	chr2	125083095	0	75M	=	125083106	11	CCTGGATGACACCGAGAGAGGCTCAGGAGGCTTCGGCTCCACCGGGAAGAATTAGAACTTTGCTGGAAGTATCTC	IIIIIIIHIIIIIDIIIIIIIII>FIBDFIIHDFHFF@FF@BDBD=-:<:BBB@B9B@BBDDD@BB<BA>AA@AA	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:6640:1541#CGATGT/1	99	chr7	53682929	84	75M	=	53682934	5	ATGCGTGTCGTCTTCTGCCCCAACAAGGTGGAATTCATCAAGAACTCCCTCAATATCATTGACTTTGTGGCCATT	HHHHHHHHHHHHHHHHGHHHHHHHHHHHBHHHHHHHHHHHHDHHHHHHHGHGGGGGHHGEDEGFEGBDAEDEEEE	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:7275:1547#CGATGT/2	163	chr1	24620256	0	71M56614N5M	=	24620258	2	TGATGTTTCAAAGTATTCTGAAGCTTGGAGGATGGTGAAGTAAAGTCCTAGTATAATGGTAATTAGTAGGGCTTGA	IIIFIIIHFIIIIHIBGGGGIIIIIIGGEIGBDGG<EACCEEGEEEEHHHBEEHEDEFE@@BDDBE>CEEDFDE>3	NM:i:3	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:7426:1541#CGATGT/2	163	chr18	73797062	0	75M	=	73797118	56	AAAAATTTTTAAAAATCTGAAGAGTGACAATTAGCCAGTTACAAGCAAGGCCAGGCGCAAACACATTTGGCCACC	D=@EGGG@@G79::1===;52717/2+225C>A8C=;4@7>??################################	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:7784:1548#CGATGT/1	99	chr11	30038481	80	21M90N55M	=	30038625	144	TCTGGTTCTAGGCTCTCAAATCTGTGCTGGATGACTTCCAGGTCCTCCAGCTTCTCTGGGATCTGCATGTTGTTGA	GGGGGGHGGHHHFHHHDHHFHHEHDE<8AA=>???BBGDBAAACA8?==3A=9AA?####################	NM:i:0	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:7882:1541#CGATGT/1	99	chr9	122986012	0	75M	=	122986046	34	GCAGAACGTCCCACCGACTTCTGAAAAACCACAGGGCTACATTAACTTTCTTCCTACCCACAACTAGAATGAAAA	?GGGG@GGDGGD>EA=EEDE>C>>>;B;:?5=758DDBGG>BDD>??=?<GDADD<>8>>C0AAC;41889;598	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:9317:1548#CGATGT/1	99	chr1	55128844	0	75M	=	55128876	32	GTATCATGTACATTTGCAGGCTCCGACCACCTTTGTAATAACGGATGTCATCACTGTTGCTAGGATACCACATTC	IIIIGIIGIIIHIIIIIIIIIHIIHHIEBHIDIFHGGGHHFGGGDEEAEEGEEFEDDGBDBBDD@ABBBBEADD?	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:9672:1547#CGATGT/2	163	chr11	119145845	0	69M405N7M	=	119145845	0	ACAGCTCCTACCCAGGTCCTTTCCAACGGCATCCCTGTCTCCAATTTCACCTACAGCCCTGACAACAAGAGCCTGG	GGGDGIGIEEGGG@GGBGGDGGDG<EC@C?EEEGED>>DAAA2??DD>8BB8<B@@ABD#################	NM:i:0	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:9890:1548#CGATGT/1	105	chr1	134462093	0	75M	*	0	0	AATACGGTATGCTGAAGGCCAGGCGTGTGATGAATCAAGAGACACCATCCCGGGGCCCAGTCCAAGTTTCACAGT	IIIIIIIHIIIIIIIGIIIIHIIHIEIEIEEGGFEIHIEHEHGHGHGGIGHDECE<BB=@?BBB>@=@B@;;<<6	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:10665:1551#CGATGT/2	163	chr2	131822168	0	22M1791N54M	=	131824015	1847	CATTAGCCGGAGCACAGTGTACTTGCGCATCAGCTTCTCCACCTCCCGGTCTTCCTCCTCCTGGAGTTTCTGAATG	IIIIIIIIIHIIIIIIIIIIIIIIIIIHIIIIIIIIIHIIHIIHIIIEIDHFHFHGGHDHED3D>B@BB@B>>@B?	NM:i:0	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:11618:1551#CGATGT/1	99	chr7	107780316	0	52M2791N24M	=	107780330	14	GTGATGTCATCATCATGCTAGTAGGAAATAAAACAGATCTTGCTGATAAGAGGCAAGTGTCAATTGAGGAGGGAGA	@;>7BBCAABGGEGBGDG<DFCAFDCE<GGDDDDDDGEFGGEEGGEFDGEEF@ED8A9A?BB3DBA9A?;AA####	NM:i:0	IH:i:1	HI:i:1	XS:A:+
    HWUSI-EAS1737:4:1:11717:1547#CGATGT/1	99	chr15	81473288	0	6M44420N70M	=	81517743	44455	GAGGACGGAGCCACCTCCGCCCAAAATAAATTGAGGGTCATGTACCTCCTGGGGGCACCCTGCACCGCCTGATGTC	HHHHHHHHHHHHHHHHHHGHHHDGHHHHHHHFFFHGFDHHHHEHHGGGGEEDDGED>EEB4B@@BBABB3@#####	NM:i:1	IH:i:1	HI:i:1	XS:A:-
    HWUSI-EAS1737:4:1:12765:1541#CGATGT/1	99	chr12	103632507	89	62M1I12M	=	103632513	6	GGGGAATTTGCTCTTGAAAGCTAATGGACAACAAAACAAACCAAAACAAACAAACAAAAAAAACTCCTAAGAAGC	GGGGFHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHHHHHHHHHHHHHHHHHHHHHFHHHHHGGEHGGHCGGG	NM:i:0	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:13990:1544#CGATGT/1	99	chr7	52958731	85	75M	=	52958727	-4	CCGAAGCCTTACATTCAGTCTCCATACGACAACAGACATAACTAAGTCCCTGTGCATTATCTGGGACCCCCAGAT	IIIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIIIIIIIIHHHIIIHIIHIHHGHIHIIIHHGDGGEGEGDHE@GD	NM:i:3	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:15635:1545#CGATGT/1	73	chr10	80106841	0	75M	*	0	0	AAAAAAGCCTGATGGGACACGTGTGGGACCAGGGTGCATCGGAGGTCCAGCTGTGGCTGGCTTCGTATGGTATGC	HHEHHHHHHHHHHHDHHHEHEEGFGGG@GGEGDEBDFGDBDBBBB=DBBBB@B@BA@##################	NM:i:1	IH:i:1	HI:i:1
    HWUSI-EAS1737:4:1:16207:1542#CGATGT/1	99	chr1	37425138	0	75M	=	37425217	79	CTGCCGCATGTACCGCTTCCCAACTACTGATGGTAACCACCTACGGATCCTGGAGCAGATGGCAGAGAGCGTCCT	IIIIIIIFIIIGIIIIIIIIIGGHGGIIIIIHIGIHIIIHHGEHIGDEGIGEEDEEDGECDDDBD@B>@AB@AA@	NM:i:0	IH:i:1	HI:i:1
    Last edited by FuzzyCoder; 09-15-2011, 09:24 PM. Reason: Clarify Title
    Best Regards,

    Paul Bergmann

  • #2
    strange sam file:

    There are two problems with your sam file:

    1) In reads like this one, the sequence is not separated from its quality string:

    Code:
    GGAGACAAAGTGCATATAATTGGCCACNNNCCTCCAGGACATTGTCTTAAGAGCTGGAGCTGGAATTATTACAAAA7722224<10.48;147444DDD@G;2###42719DDBDFD0B@B7;92<8>?>9>55A#################
    2) Some reads have dots ('.') in the middle of the sequence:

    Code:
    CCTCGATTCATCAATGTCCCTCCGTAA..GGCACACATCTGAGAGCTTTTTAGTGAGCGTTTCTGGGCTGTGGTC
    I removed these reads and the script ran fine. I tried to find out what '.' means in MapSplice output, but did not succeed. You are right, though: I think the SAM format allows dots in the sequence field. I guess you can substitute them with N and the script will work, since what really matters are the coordinates of the alignment.

    One more detail: for paired-end reads, dexseq_count.py only works correctly if the two reads of each pair are on consecutive lines, that is:

    Code:
    readfoopair1 ...
    readfoopair2 ...
    readfoo2pair1 ...
    readfoo2pair2 ...

    You can do this with:

    Code:
    sort -k1,1 -k2,2n mysamfile.sam > mysamfilesorted.sam
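
Depending on the locale, a plain `sort` over the whole file may also reorder or displace the `@` header lines, so it is safer to set them aside first. The following Python sketch shows one way to do the clean-up described above, under stated assumptions: header lines start with `@`, fields are tab-separated, SEQ is the 10th column, and the file fits in memory. All function names are illustrative only. Lines with fewer than the 11 mandatory fields (such as reads fused with their quality string) are dropped, which may leave their mates unpaired.

```python
def split_header(lines):
    """Separate '@' header lines from non-empty alignment lines."""
    header = [ln for ln in lines if ln.startswith("@")]
    body = [ln for ln in lines if ln.strip() and not ln.startswith("@")]
    return header, body


def is_well_formed(line):
    """A SAM alignment line needs at least the 11 mandatory fields."""
    return len(line.rstrip("\n").split("\t")) >= 11


def replace_dots_in_seq(line):
    """Replace '.' with 'N' in the SEQ column (10th field) only."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 11:
        fields[9] = fields[9].replace(".", "N")
    return "\t".join(fields)


def sort_key(line):
    """Read name first, then FLAG numerically (mirrors sort -k1,1 -k2,2n)."""
    fields = line.split("\t")
    return fields[0], int(fields[1])


def clean_and_sort(lines):
    """Return header lines followed by cleaned, name-sorted alignment lines."""
    header, body = split_header(lines)
    kept = [replace_dots_in_seq(ln) for ln in body if is_well_formed(ln)]
    kept.sort(key=sort_key)
    return [h.rstrip("\n") for h in header] + kept
```

Reading the input with `open(path).readlines()` and writing the result joined by newlines would give a file with headers at the top and mates on adjacent lines.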

    Cheers,

    Alejandro
    Last edited by areyes; 09-16-2011, 04:52 AM. Reason: More complete answer.

    • #3
      Thank you for the guidance, Alejandro. I apologize for the slow response, but I am new to awk, sed and the Linux shell, and needed time to figure out how to use these tools.

      I followed your advice, and after many iterations arrived at the following script that finally ensured all the sam fields were properly delimited and the sam files were properly sorted:

      Code:
      #!/bin/bash
      # 
      # prepmapsplicesamfordexseq.sh
      # PEB 2011.09.16
      # splits MapSplice SAM file into two temp files: 1) headers and 2) alignments,
      # sorts the alignments file; concatenates headers and sorted alignments to create 
      # a joined sorted file with headers at the top; then replaces '.' with 'N' in
      # sequences and spaces with tabs as delimiters.  
      #
      # Parameters: infile, outfile
      #
      if test $# -ne 2 
        then
          echo "Please specify input SAM and output SAM filenames"
          exit 1
        fi
      
      : ${TEMPDIR:=/tmp}
      
      starttime="$(date +%s)"
      echo "$(date): Extracting alignments from $1 (Step 1 of 7)"
      awk '{if(index($1,"@")!=1) print}' $1 > $TEMPDIR/$$.aligned
      
      echo "$(date): Extracting headers from $1 (Step 2 of 7)"
      awk '{if(index($1,"@")==1) print; else exit}' $1 > $TEMPDIR/$$.headers
      
      echo "$(date): Sorting extracted alignments (Step 3 of 7)"
      sort -k1,1 -k2,2n $TEMPDIR/$$.aligned > $TEMPDIR/$$.sorted
      
      echo "$(date): Merging headers and sorted alignments (Step 4 of 7)"
      cat $TEMPDIR/$$.headers $TEMPDIR/$$.sorted > $TEMPDIR/$$.joined
      
      echo "$(date): Replacing . with N in sequences (Step 5 of 7)"
      awk '{if (index($1,"@")==0) gsub(/\./,"N",$10); print}' $TEMPDIR/$$.joined > $TEMPDIR/$$.cleaned
      
      echo "$(date): Replacing spaces with tabs (Step 6 of 7)"
      sed 's/ /\t/g' $TEMPDIR/$$.cleaned > $2
      
      echo "$(date): Cleaning up (Step 7 of 7)"
      rm -f $TEMPDIR/$$.headers $TEMPDIR/$$.aligned $TEMPDIR/$$.sorted $TEMPDIR/$$.joined $TEMPDIR/$$.cleaned
      
      stoptime="$(date +%s)"
      elapsed_seconds="$(expr $stoptime - $starttime)"
      
      echo "Elapsed Time: $elapsed_seconds Seconds"
      However, I now receive the following error when attempting to run the count script:

      Code:
      .../python_scripts$ python dexseq_count.py --paired=yes /media/myLab/DEXSeq/mm9.gff /media/myLab/DEXSeq/30D_alignments_dexseq.sam /media/myLab/DEXSeq/30D_dexseq_counts.txt
      Traceback (most recent call last):
        File "dexseq_count.py", line 132, in <module>
          for af, ar in HTSeq.pair_SAM_alignments( HTSeq.SAM_Reader( sam_file ) ):
        File "/usr/local/lib/python2.7/dist-packages/HTSeq-0.5.3p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 606, in pair_SAM_alignments
          raise ValueError, "'pair_alignments' needs a sequence of paired-end alignments"
      ValueError: 'pair_alignments' needs a sequence of paired-end alignments
      I have included the first 100 lines from the 'cleaned and sorted' sam file for your review (please note that all delimiters are indeed present and are tabs only):

      Code:
      @SQ	SN:chr1	LN:197195432
      @SQ	SN:chr10	LN:129993255
      @SQ	SN:chr11	LN:121843856
      @SQ	SN:chr12	LN:121257530
      @SQ	SN:chr13	LN:120284312
      @SQ	SN:chr14	LN:125194864
      @SQ	SN:chr15	LN:103494974
      @SQ	SN:chr16	LN:98319150
      @SQ	SN:chr17	LN:95272651
      @SQ	SN:chr18	LN:90772031
      @SQ	SN:chr19	LN:61342430
      @SQ	SN:chr2	LN:181748087
      @SQ	SN:chr3	LN:159599783
      @SQ	SN:chr4	LN:155630120
      @SQ	SN:chr5	LN:152537259
      @SQ	SN:chr6	LN:149517037
      @SQ	SN:chr7	LN:152524553
      @SQ	SN:chr8	LN:131738871
      @SQ	SN:chr9	LN:124076172
      @SQ	SN:chrM	LN:16299
      @SQ	SN:chrX	LN:166650296
      @SQ	SN:chrY	LN:15902555
      HWUSI-EAS1737:4:100:10000:10128#TGACCA/1	4	*	0	0	*	*	0	0	GTTTTATTCCCATAATGCCTCTTGGCCTTTGCTGTGTCTATGCGAGATCGGAAGAGCACACGTCTGAACTCCAGTC	IIIIIIIIIHIIIIIIIIIIIIIGIHHIIIIGIGHIHHIIIHIIGHHGIIEGGGFGHIGHHGEIGEDEDEHEEEBI	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10000:10128#TGACCA/2	4	*	0	0	*	*	0	0	CGCATAGACACAGCAAAGGCCAAGAGGCATTATGGGAATAAAACAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG	HIIIIHIIIIIIIHIIIIGIIIIDIIHGGIIIIIHIIIIIIIIIGFIIIHIIIIHIIIGIIEIGGFIHEGEE@HBH	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10000:15973#TGACCA/1	99	chr7	110975052	78	47M2D29M	=	110975079	27	GAACATAAATGCTTTTTATTTGTCAGAAGACAGATTTTCAAATGTCTATCATTTTGCCAACAACTGACAGATGCTC	HHHHHHHHHHHHHHHHHBHHHAFGDGGDGBHHG<HHGGHBHHHHHHHHHHDHHHHHHHHFHHHHHHHFHHFEHHHH	NM:i:3	IH:i:1	HI:i:1	XS:A:+
      HWUSI-EAS1737:4:100:10000:15973#TGACCA/2	147	chr7	110975079	92	20M2D56M	=	110975052	-27	AGACAGATTTTCAAATGTCTATCATTTTGCCAACAACTGACAGATGCTCTCTTGGGAACAATTAACCATTGTTCAC	IHIHIDIIIII<IIIIIIIIIGIIIIHIIIFHIIHIIIHIHIIIGHIIFIGIHEHIIIIIGIGFHIGGGGEDGGGG	NM:i:0	IH:i:1	HI:i:1	XS:A:+
      HWUSI-EAS1737:4:100:10000:1940#ACAGTG/1	83	chr17	27885684	87	75M	=	27885665	-19	GTCTTGTCATCACTGCAGGGATTCACAAGGCTTCACCAAGCATCTTTGACAATCTTGTTTCTCTCTTCTTCTCTG	FDDGFGEGGGGDIEIIGHIIIIHFHIIHIIIIIIGHIIHIIHHIIIIIIHHIIFIIIIIIIIIIIIIIIIIIIII	NM:i:1	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10000:1940#ACAGTG/2	163	chr17	27885665	0	75M	=	27885684	19	CTTAGCACAATGTTATTCTGTCTTGTCATCACTGCAGGGATTCACAAGGCTTCACCAAGCATCTTTGACAATCTT	IIIIIIIIIIIIIIIIIIIIIIIIHIIHIIHIIIIIIIIHIIIIIIHIGIHIIIHIHIIIHIIHHHGGG@GGGHG	NM:i:1	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10000:4658#CGATGT/1	4	*	0	0	*	*	0	0	ATTAAACTTACTATGGGCTTGAGAAGGCCCGGCGGGGGGGGGGCGGCGCCGGCCCGCGCCCGGAGAGGACACGCCC	IIIEGHIIEHGIGGIIFIIIIHIIHHIHIIGGGGG#########################################	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10000:4658#CGATGT/2	4	*	0	0	*	*	0	0	GAGGCCTCGCCGGGCCTCACCCGCCGGGCCTTCTCAAGCCCATAGTAAGTTTAATAGATCGGAAGAGCGTCGTGTA	IIHHGIDG@IGGGGGFGGIBGI<BDDGE2G8ECACAACCCG8DBD<AAA6EEEEEE@DB@<BBDBCC>B<?@;?##	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10000:5930#ACAGTG/1	4	*	0	0	*	*	0	0	GTTTGATCCAAATTCATCTTCTGAGGTGGAAGCATAATCAGTAGGAAAGCGAGATCGGAAGAGCACACGTCTGAAC	IIIIIIHIIIIIIIGIIIIIIIIIGIIIDIIIIIHIIIIIIIIIIIIHIIGHIIHIGIIIIFHGDGGGHGGHHGID	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10000:5930#ACAGTG/2	4	*	0	0	*	*	0	0	CGCTTTCCTACTGATTATGCTTCCACCTCAGAAGATGAATTTGGATCAAACAGATCGGAAGAGCGTCGTGTAGGGA	IIIIIIIIIIIIIIIIHIIIIIIIIIIHIIIIIIHIIIIIIIIIHIIIIHIGIIIIIIIIIIIHDGHIIIGHIFHG	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10000:7692#CGATGT/1	99	chr10	79198492	88	75M	=	79198514	22	GTGCGATCACCTCGGTGCCTCAGCCCCAACCTGGGACAGGGACAGGGCGGCCCTGGCCGAGGACCTGGCTGTGCC	IIIIIHIIIIIIIIIIIHIEIIIHIIIIDIIHIIIBIIIHIIIHHHIIHFHHIBGIIIGCECEEHEECE######	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10000:7692#CGATGT/2	147	chr10	79198514	0	75M	=	79198492	-22	GCCCCACCCGGGGACAGGGACAGGGCGGCCCTGGCCGAGGACCTGGCTGTGCCCCGCATGTGCGGTGGCCTCCGA	#######################?<8BBBB<8A??<DAGD@DD<GD<DA<DDGGBIDDFIDHBIIIIIIIIIIIH	NM:i:2	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:1000:10240#TGACCA/1	4	*	0	0	*	*	0	0	NGAGTGCCTGAGTGCCTACGAGAGGGATCTGTACAGAGATGTGATGCTAGAGATCGGAAGAGCACACGTCTGAACT	#--,,53666@@@@@@@@@@8@7@@@7@@@:::::C@@@C@@C@@C@@C@@@@@C@@@@@C@C@CCC@@@@@@@@@	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:1000:10240#TGACCA/2	4	*	0	0	*	*	0	0	CTAGCATCACATCTCTGTACAGATCCCTCTCGTAGGCACTCAGGCACTCCAGATCGGAAGAGCGTCGTGTAGGGAA	IEIIIIIIIIIIIIIIIIIIIIIGIIIHIIIIHIIBIIIFIHIGGIGFHIGEBEEGGBGGECBD@BC@C@@CCE:=	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:1000:10725#TGACCA/1	4	*	0	0	*	*	0	0	NCCACACAGAAGATGTGGCTTAAGCCCCCTGTCCCACTGGCCCCCAGATCGGAAGAGCACACGTCTGCACTCCAGT	#*,*)54548@@@@@@C@C@@@@@@@@@@@##############################################	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:1000:10725#TGACCA/2	4	*	0	0	*	*	0	0	GGGGGCCAGTGGGACAGGGGGCTTAAGCCACATCTTCTGTGTGGTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGT	HIIIIIGHIIIIIGI@IIIIHIHGIIHBIHHFBIIIDHGHIEIIBBGEFEIIEBIBD2BBBDBB@@BDB7==?7=0	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:14012#TGACCA/1	4	*	0	0	*	*	0	0	GTCTGTGGAGACACTGACAGAGATGCTGCAGAGCTACATCTCAGAAATCGGGAGAGATCGGAAGAGCACACGTCTG	IIIIIIGIIIHIIIIHIIGGIIIBIIGIHIHIHIIIIGIIIIIIIIIIIIIHGHGIHHIIHEHHGGHIGHCGDGFH	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:14012#TGACCA/2	4	*	0	0	*	*	0	0	CTCCCGATTTCTGAGATGTAGCTCTGCAGCATCTCTGTCAGTGTCTCCACAGACAGATCGGAAGAGCGTCGTGTAG	IIIIIIHIIIIIIIIIHIIHIIIIIIHGDGIIGIIIBIIIIFIIIIHHHIIIGIBHGGIIIGIIGGGGEGGDCEDI	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:15442#ACAGTG/1	4	*	0	0	*	*	0	0	AAGATCTTCATCACACATCTCCATGGCGACCACTTCTTTTGCCTTCCAGGGCCCCCCTGTCCCATCAACCTCCCCA	DBED?EE:EE?EEEEGGDG@DDB@?###################################################	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:15442#ACAGTG/2	4	*	0	0	*	*	0	0	GGTTCTCCAGATAAAGCCCCGAAGCCCGACGGGGCCGTAGATCTCAATAGGCTGCCCGGCCACCAACGGGCCAATC	F3EBEEEFCDE3DDBF-FFFEA<BB###################################################	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:15664#CGATGT/1	99	chr11	94280235	0	50M6793N26M	=	94280272	37	GAGGAGCCGGAAGCGCGAGTAGTCAGAGTAGTAGGGCTTGCACTGGGCTTCGGCCATATGCTTCTCCTTACTCCTT	HHHHHHHHHHHHHHHGGDGBGGEGED>DDGG@BGGHB@HHGGEEBD>B>DBBDBD8=):;=AC>A:;;?4BB3BBB	NM:i:2	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:10001:15664#CGATGT/2	147	chr11	94280272	0	13M6793N63M	=	94280235	-37	TTGCACTGGGCTTCGGCCATCTGCTTCTCCTTACTCCTTCTCTTTTTCTCCAGCCTCTTTAGTCGCTTCTCCTCCC	BC@B?BBEBB=EB>EE>B>BBFEEEDCACAAE>BCGIIFAFAAAEABD3GGGFGGBIIHIBIIGIDIIIHGGIGGI	NM:i:0	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:1000:11573#ACAGTG/1	4	*	0	0	*	*	0	0	NGGGACACGACAAGAAGATGCCGGGACCTCAAGGCCAAGGGCATCTTGTTTGTGGGGAGCGGAGTCAGTAGATCGG	#,,++65556C@@@@@@C@C@@@@C@CC@@C@@@@@;@@@<::<<@C@@@22@@@@####################	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:1000:11573#ACAGTG/2	4	*	0	0	*	*	0	0	ACTGACTCCGCTCCCCACAAACAAGATGCCCTTGGCCTTGAGGTCCCGGCATCTTCTTGTCGTGTCCCGAGATCGG	IIIIIIIIIIIIIIIIIFIIIIIIAE@BEEICCDEGIGIEBEEBFDD@BB<?@?@B=@'B@>::@6>@########	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:1698#CGATGT/1	99	chr2	37411808	83	19M4168N57M	=	37416053	4245	CGGAAAGGACCTGGATAATGCAGAGGAAAAGGCAGATGCATTGAATAAGGAGCTGCTGATGACCAAGCAGAAGCTG	IIIIIIIIIIIIIIIIIIIIIIIIIHIIIHIIIIIHIIIIIIIIIIIHIIIIIIIIIIGHIGEGGEIGEHEEHGDF	NM:i:3	IH:i:1	HI:i:1	XS:A:+
      HWUSI-EAS1737:4:100:10001:1698#CGATGT/2	147	chr2	37416053	0	47M789N29M	=	37411808	-4245	TTGATGCAGAAGACGAGAAGAGGAGGTTAGAAGAGGAGTCTGCACAGTTAAAGGAAATGTGCCGCCGGGAACTTGA	8=@BB@>ECEEEDBDGGHGGGHGDGCIDFGHFHGHHHEFGHHHGIIHIFIIGIGIIIHHFHIIIIIIIIIIIIIII	NM:i:0	IH:i:1	HI:i:1	XS:A:+
      HWUSI-EAS1737:4:100:10001:19407#CGATGT/1	83	chr9	70551133	0	75M	=	70551102	-31	ATAAAGGATTATCTTACAATGTGGATTCATTACACCAAAAACACCAGCGTGCCAAACGAGCAGTCTCACATGAGG	IE8II@HIHIIHIIHGIIIIHIIIIIDIIIIIIIHIIHIIIEEGFIIIIIIIIIIIIIGIIIIIIIHIIIIIIII	NM:i:1	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10001:19407#CGATGT/2	163	chr9	70551102	92	75M	=	70551133	31	GGAAATCCTTTAAATAAATATATTAGACATTATGAAGGATTATCTTACAATGTGGATTCATTACACCAAAAACAC	IIIGIIIIIIIIIIIIHIIIIHIIIHIIIIIII+IGGGEGIIIIIIIIIHIIHIIIIIIIIIIIIIGHIHIEHII	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10001:20806#CGATGT/1	4	*	0	0	*	*	0	0	TTCCATTAGGAAATCTTCATCGCTGCCAGAATCTTTCTCCTGGAATGGCGCCTCATCATCTTCTTCCGGCAGATCG	IIIIIIIGIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIEHGGGIHIHHEGHGHGGHHGHEHGEB;BBABBBBD	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:20806#CGATGT/2	4	*	0	0	*	*	0	0	GCCGGAAGAAGATGATGAGGCGCCATTCCAGGAGAAAGATTCTGGCAGCGATGAAGATTTCCTAATGGAAAGATCG	IIIIIIIIIIIIIIIIIGIIHIDIIIIGIIIIIIIIIIHGIIIIHIHIHFFGIHGBCDDDEGIEEGEDCEEDDDDC	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:21114#TGACCA/1	73	chr9	53470554	0	75M	*	0	0	TTTCATTTCGACTAAGTCATTCTATACAGGCACTTTTGGCAATTATAAAGTTAAGCAATGCAAGTCTTACAATCA	HHHHHHHGHEGHHHHHHHHGGGBBDGGDG@GGHHHFHHH>DDGGDHGHFHHHEGHGHHHHGHHHGFHDHEHGDHF	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10001:21114#TGACCA/2	4	*	0	0	*	*	0	0	TTTCATTTCAACTGAGCCATCATGCAATACTCTGTAGCATCTTGAATTTTGATTGTAAGACTTGCATTGCTTAACT	IHIIIIIIIHFIGIBIIIIGIIIIIHHI>IIIIDHG@GGGIIIBD<IGII8AEG>IEIIFHG@IGHIIFGHIHHII	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:1000:12902#TGACCA/1	99	chr9	64757341	0	75M	=	64757370	29	NAGTATTCAGATCCCTGCTCACAGATCAAAAACTGTTGTGTCAAAATGCCCGCTATTCCCAATGGCCAGAAGCAT	#.*0*55566@@@@C@@C@@@@@@@@@C@@@@@@@@@@@@@@@C@@C@@@@@@C@CCC@@:<<:<C@@@@@@C@C	NM:i:1	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:1000:12902#TGACCA/2	147	chr9	64757370	0	75M	=	64757341	-29	AAACTGTTGTGTCAAAATGCCCGCTATTCCCAATGGCCAGAAGCATCAGCACATGTGGCCCCTTGGATAAGGACG	BB5@BBB8CBB@@HEGDDEEDGGEGGDEFFGIIIIIDDHIIHIIBIGIHIGIIGIHIIHIIIIIDIIIIIGIIII	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10001:3547#TGACCA/1	4	*	0	0	*	*	0	0	TTTTTAGGTGGTTTTGGCTCTTAATGGATGGTGGAACCCAGCCTGGTCAGTTCTACCCTTGGGTTCCAGATCGGAA	IIIIIIIIDI>DGGGIIIIIIIIHIIIIIIG7DGA?<??<?AA??<9=<<<@>@@@@B>B################	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:3547#TGACCA/2	4	*	0	0	*	*	0	0	GGAACCCAAGGGTAGAACTGACCAGGCTGGGTTCCACCATCCATTAAGAGCCAAAACCACCTAAAAAAGATCGGAA	IIHIIIIIIIIIFEIGIIIFIIIIIIGIIIFFIIIIIIFIIIIHIHHIIHFGFIDIHIIIHHHIIGHHCHCGGIDI	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:3586#ACAGTG/1	4	*	0	0	*	*	0	0	GTCAGGGTTGTCCTTCTCCGACCCTGAGTGCTCCCGGTCCTTGGCTCTGCCCTTCTCACTGGACGAGATCGGAAGA	HIIIIIIIIIBIIIIIIIHHIIIIIIHIHIGIGIIIIIGIIHHIIGIIIGIHHIFIIEHHIDIGHGGGGGIEDDGB	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:3586#ACAGTG/2	4	*	0	0	*	*	0	0	CGTCCAGTGAGAAGGGCAGAGCCAAGGACCGGGAGCACTCAGGGTCGGAGAAGGACAACCCTGACAGATCGGAAGA	IIIGIIIIIIIIIIHIGHHIIHIIIHDHIIIHIIIGIIIEIIGIEIIHEIFHEIHIDHHHHGHGIHGDEGEE8EB2	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:4432#TGACCA/1	4	*	0	0	*	*	0	0	GTCCAGGAGCACCCACGGGAACAACCGTGTCTCTCACCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGA	IIIHIIIIIIBIIGIIHIHIIHIIHIIEIHHIIHHHIHIHHFGFHHGEHFG@GEGDGDGEGDGEEDDBDDDEEEDD	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:4432#TGACCA/2	4	*	0	0	*	*	0	0	GGGTGAGAGACACGGTTGTTCCCGTGGGTGCTCCTGGACAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGAT	IIIHIIIIIIIIIIIIIIIIIIIIIIIIGIIIGIIIIHIIIHHIIIBGIGFFHDFHDIEBEEE8CEEBD@E@DDDE	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:5213#TGACCA/1	99	chr10	77863600	57	74M14104N2M	=	77863602	2	GCCCTAGCCCCGGGCTTCCTCCTCTTCCCAGGAGGACCCCCTTGGCCAGCCTTCTCATCCGTTTTCCTGTTCCTCT	HHHHHHHHHHHHHHHHHHHFHHEGHHGHHFHHGHHDHHHHGCEBBDDBDDHBHFDDBDBDEBCEGAAA>AD<ABB<	NM:i:1	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:10001:5213#TGACCA/2	147	chr10	77863602	76	72M14104N4M	=	77863600	-2	CCTAGCCCCGGGCTGCCTCCTCTTCCCAGGAGGACCCCCTTGGCCAGCCTTCTCATCCGTTTTCCTGTTCCTCTTC	GEDDDEDEEDG=>=+?FEFHEHGDHIHIHIIIIGIHHIIIIIIIIIIIIIIIGIFIIIIIIIIIIIIIIIIIIIII	NM:i:2	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:1000:15949#CGATGT/1	99	chr3	17699106	94	75M	=	17699195	89	NTTTGAATAAGAAAATGATTTATGTTTGAGGAAGAAGCTGAAATTTTTGAGTCATATTAATTTATCCAAAATTTC	###########################################################################	NM:i:1	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:1000:15949#CGATGT/2	147	chr3	17699195	0	75M	=	17699106	-89	CCTAGCAAGTACTCAAATGGTAAAGCTACCAGGAAGTAGACGGTTATAAATAGAACCATCAGGGATGCAGAGACG	DE>EE;CC@<BEDEGGDDD@GGGGDGBG8EGGDGDGG@DIIIGIHHIHIFDGDEIEGG<BBGGGGBEGEGGGBDG	NM:i:1	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10001:7076#CGATGT/1	4	*	0	0	*	*	0	0	CAGTAGCCCAGCAGCCGGGACGCATGCTCCTTCTCCTGAAGGGAAGATCGGAAGAGCACACGTCTGAACTCCAGTC	HHHHHGHHHHHHHHHHGHHEHHDD@EGFGGHGHHHED>DEEHHBEEBDEEHBGGDEECCDBB<EEBBB>BBE=B@B	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:7076#CGATGT/2	4	*	0	0	*	*	0	0	TCCCTTCAGGAGAAGGAGCATGCGTCCCGGCTGCTGGGCTACTGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG	IIIHIIIDIHBIIIIIFIHIGIIIIIIIIHHFHIGIIIHIIBDIGFGHIFF>HFEHDHDGHDFDEDCE@BCBBE>C	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:1000:18684#CGATGT/1	99	chr5	130758142	0	75M	=	130758213	71	NTGGTCTGATGTTCCTGTGGCAGAGCTGTTGGGCGCCATGAATCCGCGTTTGATTCTCCACCGTGGCTGATGGCA	#0,,065366@CC@@C@@@@@@C@@@@@@@C@@@@@@@@C@C@CC@@@@C@@@@@@@@@@@@@@@8795768865	NM:i:2	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:1000:18684#CGATGT/2	147	chr5	130758213	89	75M	=	130758142	-71	GGCATTGTCTGTTATTCCTTGTCTGTTTACTGAGACGTTGATGGACACTGGGGTTGCTTGTGTGTTGGATGTTAT	EFBEIEHHBHFHGHIIHHIIIIHIIIHEEIHIIIFIIIIGIIIIHIHHIHIIHIIHFIIIIIIIHGIIIIIIIII	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:1000:18948#CGATGT/1	83	chr3	107879006	0	4M266N72M	=	107878937	-69	CTGCCTGCGCAGATGGTTCAACATAGCCATGTTAGCGAAGGTGTAGTACAGGTAGTAGGCATAGGGAGGGTTGTCN	@@@@@@@@@@@@C@@@@@@@@@@@@C@@@@@@@@@CC@@CC@@@@C@@C@C@@@@@C@@@@@222C58585+2.-#	NM:i:1	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:1000:18948#CGATGT/2	163	chr3	107878937	0	73M266N3M	=	107879006	69	TGATACCAGGTGGTGGATGGGCCCGGCCTCCCCACAGTGCGGCCTCAGCACGAACGTGTGGAAACCTCTCTGCCTG	IIIIIIGIIIIIIGIIIHIIIIIIIIHIIIIIIIHDEBFEIIGIFIHGDDHC@HFC@E>CB8BDB@>BBB@8@?##	NM:i:0	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:1000:19242#ACAGTG/1	99	chr16	85025201	0	56M411N20M	=	85025242	41	NGCTGTCTCTCATTGGCTGCTTCCTGTTCCAGAGATTCCACTTTCTCCTGGAAATGCTGGATAACGGCCTTCTTGT	#**,)77888@@@@@@@@@@@@@@@###################################################	NM:i:1	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:1000:19242#ACAGTG/2	147	chr16	85025242	85	15M411N61M	=	85025201	-41	TTTCTCCTGGAAATGCTGGATAACGGCCTTCTTGTCAGCTTTGGGCAAGTTCTTGGCTTGACGCTCTGCCTCTTCC	BGFGAGGBGGGHFHHHBIHEHIHIIIHHIIDGIIBHBIHGIIIIIEHIIHIEIGIIGIIIFHIDGIIHIIIDIIHI	NM:i:0	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:10001:9507#CGATGT/1	4	*	0	0	*	*	0	0	GGGATAACCTGGGTGCTCTGGTCCGAGGCTTAAAGCGCGAAGATTTGAGGCGTGGCTTGGTCCTGGGGCAGCCAGG	HHHHHHHHHHEGGDGGGGGGFDHHG@EEGGFHGHHHHHHHHBHHHHFGHHGDD=@#####################	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10001:9507#CGATGT/2	153	chr7	133633161	0	57M88N19M	*	0	0	GGCTTGGTCATGGTCAAGCCAGGCTCCATCCAACCCCACCAGAAGGTGGAGGCCCAGGTTTATATCCTCAGCAAGG	381:=?<B@@C=888>-BDB@?<?AAA@=8A<AAABE-<EE@8GBG8GIDGI?II@GIIIHHIIIIIIHIHBGEGI	NM:i:0	IH:i:1	HI:i:1	XS:A:+
      HWUSI-EAS1737:4:100:1000:20235#ACAGTG/1	83	chr11	6369833	0	75M	=	6369822	-11	GGGAAATAAAATACTTCACCATTTCCACCTTACATCTTAAATTTCCAACAGGACAGGACAGGACCTTTTTTTTCN	####################@@@7@@@C@@CC@@@@@@@@:::::@@@@@C@@C@CC@@@CC@@2---22-++,#	NM:i:1	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:1000:20235#ACAGTG/2	163	chr11	6369822	91	75M	=	6369833	11	ATTAAAAGCAGGGGAAATAAAATACTTCACCATTTCCACCTTACATCTTAAATTTCCAACAGGACAGGACAGGAC	IIIIHIIIIIIIIIHIIIIIIIIHIIIIIIIHIIIIIHIIIIIIIIIIIIGDIIIIHHHIIHGHIHEGGGEGDGD	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10002:12073#CGATGT/1	83	chr2	32816607	90	75M	=	32815080	-1527	TTGGGTGGACCTCAAGGTAGGCCAGTCACTCCGAAGACTTGGCCACAGATACAAACCATTGGCCCCCAAAAGCTG	GEEGFIGGAHGGHIGIGIIIIHIGIIIGIIIIIIIIHHIIIIIIHGIIIHHHIIIIIIIIIIIIIIIIIIIIIII	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10002:12073#CGATGT/2	163	chr2	32815080	87	54M1430N22M	=	32816607	1527	GGTTTCCGCTTCCGAAAGAAGAGGGGCATCCTCCGGTGTGAGCTCTTCTGGACCCTGGCTCTGTGGATGGGGAGGT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIEIGIIIIIHHHIHHHEIIHFGGGEHHHDGGAGGEDB7AA#	NM:i:0	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:1000:21251#ACAGTG/1	83	chr6	112673126	0	70M4011N6M	=	112673122	-4	GCTTGACAGTGTTCTGTCTCTCAAGTTCCCGCAACTCATGCAGAGCGGTGCTCATGGTCTTCTCGATGTCTTCTGN	###############################@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@11001)0,,#	NM:i:1	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:1000:21251#ACAGTG/2	163	chr6	112673122	0	74M4011N2M	=	112673126	4	GCCTGCTTGACAGTGTTCTGTCTCTCAAGTTCCCGCAACTCATGCAGAGCGGTGCTCACGGTCTTCTCGATGTCTT	IIIHIIIIIIIIIHIHIHIIIHIFIIBDIHIIIHIHGEGGHEEGIEGDGIED?DB02&-'3.300BB:B@AB>?A@	NM:i:1	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:10002:14374#CGATGT/1	89	chr2	74429424	84	75M	*	0	0	GATCTCTCCACAAAGAATTTTTATTTGGGCTTGTCAGAACCACTGTGCTCACTAGGGTCTGTGGTCCAAATCCTG	EGGHEHDHHGHIIHHIHHIHIIIHHHHIIIIHHIIIHIHIIEIIIIIIHGIIIIIIIIFIIIIIIIIIIIIIIII	NM:i:3	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10002:14374#CGATGT/2	4	*	0	0	*	*	0	0	CTCCACAAAGAATTTTTATTTGGGCTTGTCAGAACCACTGTGCTCACTAGGGTCTGTGGTCCAAATCCTGAGACCG	IIIIIIIIIIIIIIIIIIIIIHIIHIIGIIIIIIIIIIHIIIHIIIIIIIIIGIIIHIHGIIIHIIHIIIHIH2IE	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:18090#TGACCA/1	4	*	0	0	*	*	0	0	CCCAGTGACGCAGGTCTGGGAACAATGCAAGGGACATGGGATAGTAAAACGCTTCCAAAGGTAACAGATCGGAAGA	GGGGGHIIIIGEGGDIIIIIHIGIIIIIIHIIGIIIIIIIIIHIIIIIIIIIIIIIHHHIIDIIHDIIEFIIIHIG	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:18090#TGACCA/2	4	*	0	0	*	*	0	0	GTTACCTTTGGAAGCGTTTTACTATCCCATGTCCCTTGCATTGTTCCCAGACCTGCGTCACTGGGAGATCGGAAGA	IIIIIIIIIHIIIHIIIIIIIIIGIIIGGIIIHGIIIHIHIIIIIIIIIIGIIIIIIGIIIIHIIGIHGHIHGIIG	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:18750#TGACCA/1	4	*	0	0	*	*	0	0	GTTTGCTGTTGCTTGGAGACTTGCCCTGTACATTGCTTAGTGCCCTCCCCTGCCGCTGAACCCCACCAAGATCGGA	IIIFHGIIIGBIGHDIIIGIIHFFIIIIHIIIIIIIIHIIIIHHIIIIIICFIIGH>ECFIGGGEFFCBEE@GEEE	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:18750#TGACCA/2	153	chr15	89388873	0	75M	*	0	0	CCGATCTGTTTGCTGTTGCTTGGAGACTTGCCCTGTACATTGCTTAGTGCCCTCCCCTGCCGCTGAACCCCACCA	CBB@EDECHIIFEHDHGFGHHIHEH@BGEEGI>EFGDBFIID@IIDDGADDD2GGIIIIFIIHHIIIIIIIFIII	NM:i:4	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10002:19830#TGACCA/1	83	chr5	21555591	86	40M1475N36M	=	21555511	-80	GAAGTTCAGGGTTGATGACCACAGAAACCAAAGACGGAACCTTATAGGAAGAATAGGTAAGAGTGTCCAGTGCAAT	DGEEG>ECEFDE<DHEBBD>DCFEIGEICIIFHFIIIIEIGIHIIIIIIIEIIGIIHGIIIIIIIIIIIIIIIIIH	NM:i:0	IH:i:1	HI:i:1	XS:A:-
      HWUSI-EAS1737:4:100:10002:19830#TGACCA/2	163	chr5	21555511	0	75M	=	21555591	80	GAAGAAGTCCACAGCCCAGACATTCCGATTATACCCTTGGTGGCTCTTTTGCCTGAGACAAAATTTGGTGGCAGG	IIHIIHIIIIIIGEIIIIGIIFIIIIIGIGIIFGIEHIIHEIEIDGFIFCCIGGE@FBEEDGCBGEEEDGDEGDD	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10002:20694#CGATGT/1	99	chr7	63420623	0	25M3548N51M	=	63420635	12	ACTTTCGGCAGGATTAATGAGCCAGGGCAGTCTGCAGTGTTTTGTGGCCGTTCTGGAAAGCAGCTGAAACGATGTC	HHHHHHHGHHHHDHHGGGEEHHHGEBHHFHGHHHGHHEHEHHHHCCFCEF@B7B@;.=::===04@3=@7;;5=7:	NM:i:0	IH:i:1	HI:i:1	XS:A:+
      HWUSI-EAS1737:4:100:10002:20694#CGATGT/2	147	chr7	63420635	82	13M3548N63M	=	63420623	-12	ATTAATGAGCCAGGGCAGTCTGCAGTGTTTTGTGGCCGTTCTGGAAAGCAGCTGAAACGATGTCACAGCAGTCAGC	BEEFBEC=BDBDGGEIBDBA>FFDFFAIHIFIDGIIIIHIIGIHIIIIIIFIIIIIHIIIIIIHIIIIIIIIIIII	NM:i:0	IH:i:1	HI:i:1	XS:A:+
      HWUSI-EAS1737:4:100:10002:3249#ACAGTG/1	4	*	0	0	*	*	0	0	CTGTGGACTTATCACTGTCGCTGTCTGAGCTTGAACTGGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAC	GIIIIIIIHIIIIGIIIIIDIIHIIHIIIIIIIIDIIIIIIFIHIIIIGIHIIIGIFIIIDHBIIGGIHHBHHHFH	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:3249#ACAGTG/2	4	*	0	0	*	*	0	0	CCCAGTTCAAGCTCAGACAGCGACAGTGATAAGTCCACAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGA	IIIGHIIIGIIIIIIIIIIHIIIIIIIIGHIIIGIIIIIIHHHHIIGIIHIIHIHGGEIDEIII>BBDBDBFBHEE	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:3514#CGATGT/1	89	chr5	117565527	0	75M	*	0	0	GTGGGAGGAGAGAAGCTGGGAGCGAGAAAGCCAGGGCGAGCATTGCTCCGACCCGAGTGCGGAAATGAGAGACGC	?.=AAEFBBDBBDBD@@GEGHIGHHIHEEHIEBHIGIIIIGFIGDFIIFGDIIHIIIIIGGGGG@IIIFIIIGII	NM:i:0	IH:i:1	HI:i:1
      HWUSI-EAS1737:4:100:10002:3514#CGATGT/2	4	*	0	0	*	*	0	0	TGGGGAAGCTCCCCAAGCGGAGCTAGGGGCATTAAAACAGTGGGAGGAGAGAAGCTGGGAGCGAGAAAGCCAGGGC	IIHIIIHIHIIIIIFGIIHHIIHFHIIHEGIIIIIDIBHIDGIFBGD@GDEEEDD3BBBBDHECFBE;AEGEDDDG	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:5277#ACAGTG/1	4	*	0	0	*	*	0	0	CGTCCTACTTGCTCTGGAGCCTGGGGCAGATGACTGCTGGAGCCCTGGAGTTCAAAGCTAGCCTGCGAGATCGGAA	IIIIEIIIHIIIIIIIIIIIIIIIIIIHIIGIIHIIIIHFHCIIHHIHEHDDEGGFGGGGGGEEGGGBGEBEEEBB	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:5277#ACAGTG/2	4	*	0	0	*	*	0	0	CGCAGGCTAGCTTTGAACTCCAGGGCTCCAGCAGTCATCTGCCCCAGGCTCCAGAGCAAGTAGGACGAGATCGGAA	IIIIIIIIIIIDIIIIIIIGIGHIHIHGIIIIIIIIHGHIIIIIIIIGGHIIIIFHHDCIDIGICGGEGGEGGHDG	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:9232#ACAGTG/1	4	*	0	0	*	*	0	0	CGCAGATACTCCTGTAAAAACTTACTGCTAATTCGTAGTGTAATAGGACCTTTCCTAGTGTTTTCAGCTTGTAACT	IIIIIIIIIIIIIIIIIIIIIIGIIIHIIIGIHIGIHIIIGHIIIHHFIIIIHIIHIIFHGHHHEGGGGIGEEGEG	IH:i:0	HI:i:0
      HWUSI-EAS1737:4:100:10002:9232#ACAGTG/2	137	chr14	52698101	92	75M	*	0	0	CTCTACTATATTTAACCAATTTGTTCCCCTTTCAGTGTCATTGATTTCTGTGGTTCAAGGAAGAGTTTGCTAGCT	IIIIIIIGIIHIIIIIIIIIIIHIIIIIIIIIIIHIIIIIIIIIIIIIIIHIHHIIIIDHGIIFIGIIHIHHIIG	NM:i:0	IH:i:1	HI:i:1
      I am unsure how to approach this new challenge. Any guidance will be much appreciated.

      NOTE: The sample of lines from my sam file in my first post was derived using 'head'; while it appeared as though there was no whitespace between the sequence and the quality scores on same lines, that is actually an artifact of the 'head' process. The delimiters do actually exist.

      NOTE: the sam file generated by MapSplice used both [space] and [tab] as field delimiters. My script included the sed command to replace [space] with [tab] so that DEXSeq would accept it.

      NOTE: The original data come from fastq files generated from Illumina 1.5 paired-end reads with quality scores converted to sanger via Galaxy's FASTQ Groomer.

      NOTE: While the script uses awk for the '.'->'N' replacement, standard awk accepts both [space] and [tab] as valid field delimiters and did not allow me to convert spaces to tabs, therefore I used sed for that step.
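For readers who would rather do both clean-up steps in one pass, here is a minimal Python sketch of the same idea: it turns any run of spaces or tabs into a single tab and replaces '.' with 'N' in the SEQ column only. The function name `clean_sam_line` is hypothetical, and the sketch assumes that no field in an alignment record legitimately contains a space. It is not the sed/awk script described above, only an illustration of the approach.

```python
def clean_sam_line(line):
    """Return one SAM line with tab delimiters and '.' in SEQ replaced by 'N'.

    Assumption: alignment fields never contain spaces, so splitting on any
    whitespace is safe. Header lines (starting with '@') pass through as-is.
    """
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split()  # splits on any run of spaces and/or tabs
    if len(fields) >= 11:
        # SEQ is the 10th mandatory SAM column (index 9); QUAL is left untouched
        fields[9] = fields[9].replace(".", "N")
    return "\t".join(fields)


if __name__ == "__main__":
    record = "read1 4 *\t0 0 * * 0 0 AC.GT II.II IH:i:0"
    print(clean_sam_line(record))
```

Only the SEQ column is edited, so a '.' that happens to occur in the quality string is preserved.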

      Pending a response, I will try to use Galaxy/Tophat bam output files converted to sam to see if they can be properly processed by the count script. However, I would prefer to use MapSplice output as it seems to be better geared for identifying differentially spliced junctions.

      NOTE: Please pardon any misuse of genetics terminology. I am a mathematician/statistician/programmer, not a geneticist.

      Best Regards,

      Paul Bergmann
      Last edited by FuzzyCoder; 09-17-2011, 11:41 PM. Reason: Added more Information



      • #4
        I see...

        The flag field (2nd column) in the SAM file is not correct if the reads you are analyzing are paired-end. According to the SAM format:

        "0x1 template having multiple segments in sequencing"

        In your case, some records have a flag of 4 (only the 0x4 "unmapped" bit is set, without the 0x1 bit), which would indicate single-end reads. Because you are telling dexseq_count.py that you have paired-end reads, the script complains. The MapSplice SAM output seems a bit confusing; to be honest, I have never used it.

        I have used TopHat, though, and it works well with the script.
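To make the flag values in the posted sample easier to read, here is a small Python sketch that decodes a SAM FLAG into its bits. The bit values come from the SAM specification; the helper name `describe_flag` and the short labels are made up for illustration. Whether the missing 0x1 bit is what actually trips dexseq_count.py is a guess, not something this sketch verifies.

```python
# SAM FLAG bits as defined in the SAM specification (labels are informal)
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
]


def describe_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # Flag values that occur in the sample lines of the first post
    for flag in (4, 73, 99, 147, 153):
        print(flag, describe_flag(flag))
```

In the sample, the unmapped records carry flag 4, which has no "paired" bit, while the mapped records (99, 147, 73, 153, ...) all have it set.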

        Pending a response, I will try to use Galaxy/Tophat bam output files converted to sam to see if they can be properly processed by the count script. However, I would prefer to use MapSplice output as it seems to be better geared for identifying differentially spliced junctions.
        Have you tried GSNAP? It also handles this kind of alignment nicely, and I have not had any problems with its SAM output:

        Last edited by areyes; 09-18-2011, 04:24 AM.



        • #5
          Alejandro-

          I have temporarily abandoned MapSplice and, after a brief failed attempt at using TopHat generated BAMs with DEXSeq, have spent the last week trying to arrive at the right set of parameters for running GSNAP. With my hardware, I am forced to use memory mapping rather than RAM allocation for the hash and genome.

          My concern is that it will take an inordinate amount of time to align and map the 6 sets of paired-end reads. Each of the 12 FASTQ files is 2.9 - 3.6 GB. My computer is a quad-core with 6 GB RAM. I used the option "-B 2", as anything higher kills the process or freezes the machine due to insufficient memory.

          Can you give me some idea of how long GSNAP will take to process two FASTQ files representing the paired-end reads from a single replicate (each file ~3 GB)? It seems to be quite a bit slower than MapSplice, which took ~30 hours to process all 12 FASTQs (3 replicates per condition) at once.

          Thank you.

          I know you are the DEXSeq guru, and have only used GSNAP. However, my searches on the web have not turned up any meaningful information regarding actual runtime benchmarks for GSNAP, and I cannot wait weeks for the results.
          Best Regards,

          Paul Bergmann



          • #6
            Please contact [email protected] for bug reports
