08-26-2014, 04:06 PM   #1
lethalfang
Location: San Francisco, CA

Join Date: Aug 2011
Posts: 91
How to find TCGA data in cghub?

We have gotten access approval for some TCGA data, but how do I find them? I have GeneTorrent (gtdownload and cgquery) installed, but it seems mightily difficult to find anything I'm looking for.

For instance, I'm trying to download a couple of data sets from TCGA's lung adenocarcinoma studies:

"The dbGAP accession number for the data reported in this paper is phs000488.v1.p1."

The dbGAP page can be found here:

cgquery "study=phs000488" returned zero results, as is the case for pretty much all the accession numbers I've found in any paper.

I downloaded some supplementary files from the article's website, but couldn't identify any of the patient IDs in cghub's data manifest file, e.g.,

So... does anyone actually know how to find the data set you're looking for?

08-26-2014, 05:35 PM   #2
Richard Finney
Senior Member
Location: bethesda

Join Date: Feb 2009
Posts: 700


Have you seen this ...
I usually don't like these JavaScript GUI click click click things, but this is not so bad.

You want the analysis_ids.

My script for this (which you must customize to your location) is:

# Point to the cghub client executables and libraries wherever you put them
export LD_LIBRARY_PATH=/data/data04/CG/cghub/lib/:/data/data04/CG/cghub/lib/GeneTorrent/:/h1/finneyr/xerces-c-3.0.1/src/.libs/:/h1/finneyr/XQilla-2.2.3/.libs/:$LD_LIBRARY_PATH
export PATH=/data/data04/CG/cghub/bin/:$PATH

# Download one analysis_id using your cghub key file, then pause briefly
function f {
    gtdownload -vv -c /h1/finneyr/finneyr.key -d "$1"
    sleep 2
}

# just add "f analysis_id"

f 038d680d-4a29-4be1-9568-72d80a52c782
f 059e80af-c614-4424-8075-d42f072705b2
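
If you have many analysis_ids, the same gtdownload call can be driven from a file instead of one "f ..." line per ID. This is only a sketch: download_ids, KEY and DELAY are names made up for illustration (not GeneTorrent options), and the default key path is an assumption. The gtdownload flags are the ones used in the script above.

```shell
# Hypothetical wrapper: download every analysis_id listed in a file, one per line.
# KEY (path to your cghub key file) and DELAY (seconds between downloads) are
# illustrative shell variables, not GeneTorrent settings.
download_ids() {
    local id_file=$1
    local key=${KEY:-$HOME/cghub.key}
    local id
    while IFS= read -r id || [ -n "$id" ]; do
        # Skip blank lines and comment lines
        case $id in ''|'#'*) continue ;; esac
        # Detach stdin so the download cannot swallow the rest of the ID list
        gtdownload -vv -c "$key" -d "$id" < /dev/null || echo "failed: $id" >&2
        sleep "${DELAY:-2}"
    done < "$id_file"
}
```

Usage would look something like: KEY=/path/to/your.key download_ids my_ids.txt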

You can grab info for BAMs for a project like this ...
# Save the cgquery report for one disease abbreviation to cghub.<abbr>.txt
function f {
    echo "$1"
    cghub/bin/cgquery "disease_abbr=$1" > "cghub.$1.txt"
    sleep 2
}

f LIHC
f LUAD


This creates reports for LIHC (liver) and LUAD (lung adeno).
You can parse out the analysis_ids.
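
A rough sketch of that parsing step follows. It assumes each ID in the report sits on a line that contains the text "analysis_id" next to a UUID; check a report by eye first, since the exact layout is an assumption here. extract_analysis_ids is a made-up helper name.

```shell
# Hypothetical helper: list the unique analysis_ids found in a cgquery report.
# Assumes every ID appears on a line mentioning "analysis_id" alongside a UUID.
extract_analysis_ids() {
    grep -i 'analysis_id' "$1" \
        | grep -oiE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
        | sort -u
}
```

For example: extract_analysis_ids cghub.LUAD.txt > luad_ids.txt
Filtering on the field name first keeps other UUIDs in the report (participant, sample and so on) out of the list.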


(you need to log in)

For the sites , you can write a script to grab a listing of all files.

Last edited by Richard Finney; 08-26-2014 at 05:51 PM.
11-22-2014, 03:25 AM   #3
Registered Vendor
Location: San Francisco, CA

Join Date: Mar 2014
Posts: 18


Yep, like Richard has pointed out, you need to get a hold of analysis IDs.

We typically use to search for the samples that we're interested in, and then link the samples to the patient & sample metadata available for the various cohorts here:

For what it's worth, Station X has spent a lot of time organizing and curating the patient & sample metadata, and subsequently attaching it to the various genomics assays generated by The Cancer Genome Atlas. This data is all prepped and ready for analysis in GenePool.
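
On linking samples to patients yourself: if the sample identifiers you have are standard TCGA barcodes, the first three dash-separated fields (project, tissue source site, participant) identify the patient, so trimming a barcode to that prefix gives a key for matching against patient-level tables. A minimal sketch, where patient_id is a made-up helper name and the barcode in the usage line is only illustrative:

```shell
# Hypothetical helper: reduce a TCGA barcode to its patient-level prefix
# (the first three dash-separated fields).
patient_id() {
    printf '%s\n' "$1" | cut -d- -f1-3
}
```

For example: patient_id TCGA-05-4244-01A-01D-1105-08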

If you're interested, here are some related posts about it:

Good Luck!


Last edited by GenePool; 11-23-2014 at 09:25 PM.

cghub, dbgap, tcga
