SEQanswers

05-23-2011, 07:40 AM   #1
EHC
Junior Member
 
Location: here

Join Date: Jun 2010
Posts: 8
1000 genomes - a beginner question

Hello,

First, thank you for your time!

I work on yeast NGS. I would like to run an analysis based on the 1000 Genomes data.
What kind of computer/facility do I need?

I would like to be able to analyze reads that map to a specific 20 kb genomic region. I would like to find all read pairs in which at least one read of the pair maps to this 20 kb region, and to be able to reassemble them.

Also, if someone can point me to a practical guide to working with the 1000 Genomes data (what you need to have before you start), that would be great.

Thanks!
EHC
05-24-2011, 02:45 AM   #2
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151

The sort of compute facility you need very much depends on what you want to do.

Getting the variants which have been discovered in a particular 20 kb region is relatively easy and is doable in limited disc space (say 500 GB).
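
As a rough illustration of the variant lookup, here is a minimal Python sketch that only builds a `tabix` command line for one region of a bgzipped, indexed VCF. The URL and coordinates are hypothetical placeholders, not real 1000 Genomes paths, and the sketch assumes `tabix` is installed and that the `.tbi` index sits next to the VCF.

```python
def region_string(chrom, start, end):
    """Format a 1-based, inclusive region as 'chrom:start-end'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}:{start}-{end}"


def tabix_command(vcf_url, chrom, start, end):
    """Build (but do not run) a tabix call for one region.

    '-h' asks tabix to print the VCF header along with the records.
    """
    return ["tabix", "-h", vcf_url, region_string(chrom, start, end)]


if __name__ == "__main__":
    # Hypothetical URL and coordinates, for illustration only.
    cmd = tabix_command("ftp://example.org/path/to/variants.vcf.gz",
                        "1", 1000000, 1020000)
    print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` once the real file location is filled in.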

Getting the alignments from the BAM files for such a region is going to take a lot longer, as you have to use samtools to download your particular 20 kb region over FTP for more than 1000 individuals. This is going to take both time and disc; I would want to have at least 5 TB available for your analysis.
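
A minimal sketch of how the per-individual downloads might be scripted, assuming `samtools` is installed and each remote BAM has its index alongside it. The function only builds the command lines; the URL, the region, and the `slices` output directory are hypothetical placeholders.

```python
import posixpath


def slice_commands(bam_urls, region, out_dir="slices"):
    """Build one 'samtools view' command per remote BAM file.

    Each command would write only the alignments overlapping `region`
    into a small local BAM ('-b' = BAM output, '-o' = output file).
    """
    commands = []
    for url in bam_urls:
        name = posixpath.basename(url)
        if name.endswith(".bam"):
            name = name[:-len(".bam")]
        out_path = posixpath.join(out_dir, name + ".region.bam")
        commands.append(
            ["samtools", "view", "-b", "-o", out_path, url, region]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical URL and region, for illustration only.
    urls = ["ftp://example.org/data/NA00001.bam"]
    for cmd in slice_commands(urls, "1:1000000-1020000"):
        print(" ".join(cmd))
```

Running the commands one individual at a time keeps only the regional slices on disc rather than the full BAM files.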

If you also want all the unmapped reads for the 1094 low-coverage individuals, you will need 5 TB of disc for those files alone, before you even get to the subsections for your 20 kb region of the genome.

Then, depending on what sort of assembly tools you wish to use, you may need a combination of a lot of compute nodes and at least one machine with a lot of memory.

Can you give us details of what you are looking for in these assemblies?
05-25-2011, 12:26 PM   #3
EHC
Junior Member
 
Location: here

Join Date: Jun 2010
Posts: 8

Thank you, Laura. I am interested in reads that do not behave as expected (mainly broken reads). Is there an easy way to get these for a specific region?
05-26-2011, 12:51 AM   #4
laura
Senior Member
 
Location: Cambridge UK

Join Date: Sep 2008
Posts: 151

Our FAQ describes how to get subsections of files http://www.1000genomes.org/faq/how-d...ction-bam-file
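
Once a regional slice is in hand, reads that "do not behave as expected" could be picked out from the SAM FLAG field. The sketch below is one possible reading of "broken reads", not a definition from the 1000 Genomes project: it assumes a paired read counts as unexpected if it or its mate is unmapped, if the mate maps to a different reference sequence, or if the aligner did not mark the pair as proper. The example records are made up.

```python
# FLAG bits as defined in the SAM format specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def is_unexpected(sam_line):
    """Return True if a SAM record looks like part of a broken pair."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    if not flag & PAIRED:
        return False  # single-end reads are not considered here
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return True   # one end of the pair failed to map
    if rnext not in ("=", rname):
        return True   # mate maps to a different reference sequence
    return not flag & PROPER_PAIR


if __name__ == "__main__":
    # Made-up records: a proper pair and a read with an unmapped mate.
    proper = "r1\t99\t1\t100\t60\t50M\t=\t300\t250\t*\t*"
    orphan = "r2\t73\t1\t100\t60\t50M\t=\t100\t0\t*\t*"
    print(is_unexpected(proper), is_unexpected(orphan))
```

Text SAM lines for such a filter could come from `samtools view` on the regional BAM; header lines starting with `@` would need to be skipped first.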

All times are GMT -8.