Hi folks, I recently started looking at de novo sequence assembly. I have a file of Illumina reads in raw format, which I have converted to FASTQ. I'm afraid I have no idea whether this was paired-end or mate-pair sequencing; it's more of a training data set to get to grips with assembly. I have 17092779 reads of 85 bp each. Running Velvet on them creates a huge graph. I ran it with k=51 and got a really low N50 value. I think there is a huge level of redundancy in my file and would like to remove the redundant sequences. Does Velvet do this itself? I ran Velvet all night and it was eating up 95% of the server I am running it on. Can anyone point me towards some scripts for filtering the useless reads out of my file? I tried searching the forums but haven't turned up much.
Thank you
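For the filtering step asked about above, here is a minimal sketch in Python of one possible approach: dropping reads whose sequence is an exact duplicate of one already seen. It assumes a plain 4-line-per-record FASTQ file and that "redundant" means byte-identical sequences; it does not look at reverse complements or near-duplicates, and if the reads turn out to be paired it would break the pairing. The function names `read_fastq` and `dedupe_fastq` are hypothetical, made up for this illustration and not part of Velvet or any library.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples.

    Assumes every record is exactly four lines, with no wrapped sequences.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or an empty line) stops the iteration
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def dedupe_fastq(in_handle, out_handle):
    """Write the first occurrence of each distinct sequence to out_handle.

    Returns (total_reads, kept_reads).
    """
    seen = set()  # every distinct sequence seen so far, held in memory
    total = 0
    kept = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        total += 1
        if seq in seen:
            continue  # exact duplicate of an earlier read: skip it
        seen.add(seq)
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
        kept += 1
    return total, kept


# Example use (file names are placeholders):
# with open("reads.fastq") as src, open("reads.dedup.fastq", "w") as dst:
#     total, kept = dedupe_fastq(src, dst)
#     print(f"kept {kept} of {total} reads")
```

Keeping every distinct sequence in a set is simple but memory-hungry; with roughly 17 million 85 bp reads it could need several gigabytes, so storing a hash of each sequence instead of the sequence itself may be worth considering.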