
**nickloman** · 07-19-2010, 09:31 AM

One way might be to use Amazon EC2 to do this. You would create an Amazon EC2 instance, for example with an Ubuntu image, and then access the 1000 Genomes data, which is apparently available through S3.

See also this thread

1000 Genomes in the Amazon cloud? - SEQanswers

http://seqanswers.com/forums/showthread.php?t=4874


There might be other, easier ways of doing it, but this is one way to avoid downloading the data locally.

**krobison** · 07-19-2010, 09:47 AM

samtools can access the 1000 genomes BAM files on their websites; it will download the index file for each alignment you access but not the entire alignment.

There are various wrappers for samtools & I don't know if this will work in them. It definitely works at the command line & in the current version of pysam (Python binding) with a few small mods.
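A minimal sketch of the command-line route, assuming `samtools` is installed and on the PATH. The helper names and the URL are hypothetical placeholders, not real 1000 Genomes paths; the only samtools usage relied on is `samtools view <bam> <chrom:start-end>`.

```python
import shutil
import subprocess


def samtools_view_cmd(bam_url, chrom, start, end):
    """Build the argv for `samtools view` on a 1-based, inclusive region."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return ["samtools", "view", bam_url, f"{chrom}:{start}-{end}"]


def fetch_region(bam_url, chrom, start, end):
    """Run samtools and return the SAM records for the region as text lines.

    For a remote BAM, samtools fetches the index and only the blocks it needs.
    """
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    cmd = samtools_view_cmd(bam_url, chrom, start, end)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Hypothetical placeholder URL: substitute a real BAM path from the FTP site.
    url = "ftp://example.org/path/to/sample.bam"
    print(" ".join(samtools_view_cmd(url, "6", 1000, 3000)))
```

Keeping the command builder separate from the runner makes it easy to check the region string before anything touches the network.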

**culmen** · 07-19-2010, 01:21 PM

Thanks a lot nickloman and Robison for your help.

samtools can access the 1000 genomes BAM files on their websites; it will download the index file for each alignment you access but not the entire alignment.
--krobison

The alignment in the BAM file shows how the reads align to the reference sequence. Is there any way that I could get the consensus of that particular part (as shown in the Ensembl browser of 1000 Genomes data with NA19238 selected) for each genome in the 1000 Genomes data?

Are there any tools to blast each genome sequence of 1000genomes data (without downloading data) with a query sequence (primer)?

Thanks a lot,
Culmen

**culmen** · 07-20-2010, 06:08 AM

Basically I am looking for all the SNPs in the region of an STR (e.g. [TCTA]8, marker D6S502) with 1000 bp flanks both upstream and downstream, from all 1000 genomes.

So I thought it would be great if I could extract that particular region (1 kbp < STR > 1 kbp) from all the 1000 genomes.
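As a rough sketch, the flanked window can be turned into a `chrom:start-end` region string like this. The function name and the STR coordinates in the example are hypothetical (they are not the real position of D6S502); only the 1000 bp flank and the 32 bp length of [TCTA]8 come from the post.

```python
def flanked_region(chrom, str_start, str_end, flank=1000):
    """Return a 1-based inclusive region covering the STR plus both flanks."""
    if str_start < 1 or str_end < str_start:
        raise ValueError("expected 1 <= str_start <= str_end")
    lo = max(1, str_start - flank)  # clamp at the start of the chromosome
    hi = str_end + flank
    return f"{chrom}:{lo}-{hi}"


if __name__ == "__main__":
    # Hypothetical coordinates for a 32 bp repeat ([TCTA]8) on chromosome 6.
    print(flanked_region("6", 5000, 5031))
```

The resulting string is in the form that samtools and tabix take as a region argument, so the same window can be reused for every sample.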

Expecting this table as a result of my data extraction.

Appreciate any kind of help or suggestion,
Culmen

**laura** · 08-09-2010, 02:16 AM

If what you are after is variant calls, then you are better off looking at the results in the July data release

ftp://ftp.1000genomes.ebi.ac.uk/vol1...010_07_release

You can even download subsets of SNPs in VCF format using tabix

tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz 1:233411980-245804116

You can get tabix from the samtools website

SAM tools - Browse Files at SourceForge.net

https://sourceforge.net/projects/samtools/files/


and then vcftools is a set of Perl and C++ scripts/programs for handling VCF files

VCFtools

http://vcftools.sourceforge.net/
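Once tabix has returned the VCF lines for a region, pulling out the site information is straightforward. The following is only a sketch, assuming the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, ...); the function name and the sample records are made up for illustration.

```python
def parse_vcf_sites(lines):
    """Extract (chrom, pos, id, ref, [alt alleles]) from VCF text lines."""
    sites = []
    for line in lines:
        # Skip blank lines and the '##' meta / '#CHROM' header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, vid, ref, alt = fields[:5]
        sites.append((chrom, int(pos), vid, ref, alt.split(",")))
    return sites


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        "##fileformat=VCFv4.0",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t233412000\trs0000001\tA\tG\t.\tPASS\t.",
        "1\t233412500\t.\tC\tT,G\t.\tPASS\t.",
    ]
    for site in parse_vcf_sites(sample):
        print(site)
```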

**tumorim** · 08-09-2010, 12:52 PM

download the three 1000G files from http://www.openbioinformatics.org/an..._download.html.

Then just do

perl -ne 'm/(\d+)\t(\d+)/ and $1 eq "8" and $2>=125975261 and $2<=125977441 and print' < hg18_CEU.sites.2010_03.txt

You'll get all variants in CEU population. Do the same for YRI/ASN.
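For anyone who prefers Python, here is a rough equivalent of the Perl one-liner. It mirrors the same regex (the first "digits, tab, digits" pair on the line is taken as chromosome and position); the function names are hypothetical and the exact column layout of the downloaded file is an assumption.

```python
import re

# First occurrence of "<digits><TAB><digits>", as in the Perl one-liner.
PAIR = re.compile(r"(\d+)\t(\d+)")


def in_window(line, chrom="8", lo=125975261, hi=125977441):
    """True if the line's chromosome matches and its position lies in [lo, hi]."""
    m = PAIR.search(line)
    return bool(m) and m.group(1) == chrom and lo <= int(m.group(2)) <= hi


def filter_sites(lines, chrom="8", lo=125975261, hi=125977441):
    """Keep only the lines that fall inside the requested window."""
    return [line for line in lines if in_window(line, chrom, lo, hi)]


if __name__ == "__main__":
    with open("hg18_CEU.sites.2010_03.txt") as handle:
        for line in filter_sites(handle):
            print(line, end="")
```

Swap the file name for the YRI and ASN files to repeat the extraction for the other populations.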

**KevinLam** · 08-09-2010, 11:16 PM

Caveat: I haven't done this yet, so I might be way wrong.
But since you only have 'variant data' for a stretch of 2 kb,
why not upload your BAM/WIG file to UCSC instead?
2 kb sounds quite manageable.

**culmen** · 08-10-2010, 06:56 AM

Thanks a lot for your suggestions guys.

@laura: Thanks I am following similar steps.

@tumorim: ANNOVAR looks cool. Thanks for letting me know about it.

@KevinLam: That's a good idea. I would have tried UCSC, but I have more than 13 x (1000 files of 2 kbp).

**genesquared** · 11-05-2012, 11:57 PM

any update on this method in 2012?

since the recent 1000 genome Nature paper (Nov 1, 2012 ), is there any update on how to download a 2+kb segment?

thanks in advance!

**KevinLam** · 11-06-2012, 07:10 AM

Originally posted by genesquared

since the recent 1000 genome Nature paper (Nov 1, 2012 ), is there any update on how to download a 2+kb segment?

thanks in advance!

hmm is your problem related to the thread starter's?

Otherwise, you could check whether Galaxy already has the data; if not, upload it via the FTP link, then extract the portion you want via the UCSC link on the data.

this way you won't have to 'download' all the info .. but the 1kg info is on galaxy

**laura** · 11-07-2012, 11:23 AM

Originally posted by genesquared

since the recent 1000 genome Nature paper (Nov 1, 2012 ), is there any update on how to download a 2+kb segment?

thanks in advance!

Like I told the previous poster, the best way to do this is to use samtools or tabix.

There is much more info about this in our FAQ

http://www.1000genomes.org/faq/how-do-i-get-sub-section-vcf-file


small part of all 1000 sequences from 1000genomes data needed?
