I have some files in .txt format.
The content looks like this:
@HWUSI-EAS762:2:1:14:1232#0/1
TCATTGGAAACAACTCCTACAGAGGAGGTAACAGAAACACCGGAACCAAGTGT.G.....................
+
gggggggggggggggggggggfgggggfeegegegggcggYT^Y`[b``BBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWUSI-EAS762:2:1:14:1130#0/1
A.CCGACTT.T.T.TGGAGTGTAACCTTTTGTGGCAACATC.TTCACCTAAGT.......................
+
I was told that I should filter the reads before using them for assembly.
The filtering criteria are:
1) there is no N among the first 20 bases;
2) the percentage of bases with quality below 10 should be less than 10%;
3) the percentage of bases with quality below 13 should be less than 20%;
4) the average quality of the whole read should be above 20.
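To make the four criteria concrete, here is a minimal sketch of such a filter in Python. It is not the program I actually ran. It assumes the quality string is Phred+64 encoded (which the 'g' and 'B' characters in the sample suggest, but I have not confirmed it) and that '.' marks a no-call equivalent to N. The names passes_filter and filter_fastq are made up for illustration.

```python
# Assumption: qualities are Phred+64 encoded (older Illumina pipelines);
# use 33 instead if the file is Sanger / Phred+33 encoded.
PHRED_OFFSET = 64
# Assumption: '.' marks a no-call in these reads, equivalent to 'N'.
NO_CALL = frozenset("N.")


def passes_filter(seq, qual, offset=PHRED_OFFSET):
    """Return True if a read satisfies the four criteria listed above."""
    # Reject empty or inconsistent records outright.
    if not seq or len(seq) != len(qual):
        return False
    # 1) no no-call among the first 20 bases
    if any(base in NO_CALL for base in seq[:20].upper()):
        return False
    scores = [ord(ch) - offset for ch in qual]
    n = len(scores)
    # 2) fewer than 10% of bases with quality below 10
    if sum(q < 10 for q in scores) / n >= 0.10:
        return False
    # 3) fewer than 20% of bases with quality below 13
    if sum(q < 13 for q in scores) / n >= 0.20:
        return False
    # 4) mean quality of the whole read above 20
    return sum(scores) / n > 20


def filter_fastq(lines):
    """Yield (header, seq, plus, qual) for each 4-line record that passes."""
    it = (line.rstrip("\n") for line in lines)
    # Pulling from the same iterator four times groups lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        if passes_filter(seq, qual):
            yield header, seq, plus, qual
```

If the Phred+64 assumption holds, 'B' decodes to quality 2, so a read with a long run of B at the end, like the first one shown above, would be rejected by criterion 2.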
So I wrote a small program myself to do this.
But after filtering, only 667M was left of the original 1.7G,
and the assembly result was very bad.
What should I do?
Thank you for your help!