10-24-2012, 11:04 PM   #2
Senior Member
Location: USA

Join Date: Apr 2010
Posts: 102

Originally Posted by newkid
Posted this in Bioinformatics too:

Hi everyone,

I was looking for a tutor who would be willing to sit down for a few one-hour sessions, once or twice a week. These sessions would explain and walk me through a few important concepts in genome assembly (de novo and reference-based). We would be working with publicly available data sets in a Linux environment over GTalk/Skype.

It would be ideal to have someone who has a considerable amount of industry experience in bioinformatics and a broad understanding of computational biology.

I would be willing to pay a reasonable rate via PayPal or any other agreed medium! I'm on Pacific Standard Time and am available weekends or late evenings.

A few sample questions (if you can answer these off the top of your head, you're in great shape):

1. What is a k-mer value?
2. How big are the files generated by an Illumina HiSeq machine at 30x coverage, and what formats are they in? (for E. coli...)
3. How are data sets trimmed, and what are the popular programs for doing so?


I am from CA as well, and though I am not from industry, I have a fair bit of RNA-seq experience. I have done lots and lots of de novo and reference-based assemblies, and after going through lots of pains and thrills I am very comfortable dealing with any kind of de novo and reference-based work.

Regarding your questions:
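1. A k-mer is a seed/word of length k, i.e. any run of k consecutive bases in a sequence; the "k-mer value" is that length k. A read of length L contains L - k + 1 overlapping k-mers.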

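2. Assuming you are sequencing your library single-end (SE) at 50 bases, you only need about 3 million reads to get 30X coverage: the E. coli genome is about 4.6 Mb, so 30 x 4.6 Mb is roughly 138 Mb of sequence, or about 2.8 million 50-base reads. That is roughly 1/50 of a HiSeq lane.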

3. I assume you are asking about ways to trim your reads. If so, there are many ways to do it, and one popular tool is the FASTX-Toolkit.
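To make answer 1 concrete, here is a minimal Python sketch that splits reads into overlapping k-mers and tallies them, the kind of k-mer table that de Bruijn graph assemblers build on. The function names `kmers` and `count_kmers` are hypothetical and not taken from any particular assembler.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return every overlapping substring of length k.

    A sequence of length L yields L - k + 1 k-mers (none if L < k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count how often each k-mer occurs across a collection of reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


if __name__ == "__main__":
    print(kmers("ACGTAC", 3))  # ['ACG', 'CGT', 'GTA', 'TAC']
    reads = ["ACGTAC", "CGTACG"]
    print(count_kmers(reads, 3)["CGT"])  # 2
```

Changing k changes which overlaps the assembler can "see": a larger k resolves more repeats but needs more coverage for every k-mer to show up.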
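The coverage arithmetic in answer 2 can be written as a small Python sketch. It assumes the E. coli K-12 MG1655 reference length (4,641,652 bp) as the genome size, and `READS_PER_LANE` is a hypothetical lane yield chosen only for illustration; real HiSeq yields vary by run and chemistry.

```python
import math

# Assumed inputs: read length and coverage come from the post,
# genome size is the E. coli K-12 MG1655 reference length,
# and READS_PER_LANE is a hypothetical yield for illustration only.
GENOME_SIZE_BP = 4_641_652
READ_LENGTH_BP = 50
TARGET_COVERAGE = 30
READS_PER_LANE = 150_000_000


def reads_for_coverage(genome_size_bp: int, read_length_bp: int, coverage: float) -> int:
    """Solve coverage = reads * read_length / genome_size for reads (rounded up)."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    n = reads_for_coverage(GENOME_SIZE_BP, READ_LENGTH_BP, TARGET_COVERAGE)
    print(f"{n:,} reads")  # 2,784,992 reads
    print(f"about 1/{READS_PER_LANE / n:.0f} of a lane")
```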
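For answer 3, FASTX-Toolkit's `fastq_quality_trimmer` cuts low-quality bases from the 3' end of each read and discards reads that end up too short. The Python sketch below illustrates that idea in simplified form; it is not the toolkit's code. It assumes uncompressed four-line FASTQ records and Phred+33 quality encoding (use `offset=64` for older Illumina data), and the default threshold and minimum length are illustrative values.

```python
def trim_3prime(seq: str, qual: str, threshold: int, offset: int = 33) -> tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below threshold."""
    end = len(seq)
    while end > 0 and (ord(qual[end - 1]) - offset) < threshold:
        end -= 1
    return seq[:end], qual[:end]


def trim_fastq(text: str, threshold: int = 20, min_length: int = 20, offset: int = 33) -> str:
    """Quality-trim every record in a FASTQ string, dropping reads shorter than min_length."""
    lines = text.splitlines()
    if len(lines) % 4:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        seq, qual = trim_3prime(seq, qual, threshold, offset)
        if len(seq) >= min_length:  # discard reads that became too short
            out.extend([header, seq, plus, qual])
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    # 'I' is Phred 40 and '#' is Phred 2 in Phred+33 encoding.
    fastq = "@r1\nACGTACGTAC\n+\nIIIIIIII##\n@r2\nACGT\n+\n####\n"
    print(trim_fastq(fastq, threshold=20, min_length=5), end="")
```

Here `r1` loses its two low-quality tail bases and `r2` is trimmed to nothing and dropped.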

upendra_35