SEQanswers

Old 08-26-2014, 03:06 PM   #1
lethalfang
Member
 
Location: San Francisco, CA

Join Date: Aug 2011
Posts: 91
How to find TCGA data in cghub?

We have gotten access approval for some TCGA data, but how do I find them? I have GeneTorrent (gtdownload and cgquery) installed, but it seems mightily difficult to find anything I'm looking for.

For instance, I'm trying to download a couple of data sets from TCGA's lung adenocarcinoma studies: http://www.sciencedirect.com/science...92867412010616

"The dbGAP accession number for the data reported in this paper is phs000488.v1.p1."

The dbGAP page can be found here: http://www.ncbi.nlm.nih.gov/projects...hs000488.v1.p1

cgquery "study=phs000488" returned zero results, as is the case for pretty much all the accession numbers I've found in any paper.

I downloaded some supplementary files from the article's website, but couldn't identify any of the Patient IDs in cghub's data manifest file, e.g., http://www.broadinstitute.org/pubs/l...UAD-5V8LT.html


So... does anyone actually know how to find the data set you're looking for?

Thanks.
Old 08-26-2014, 04:35 PM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

FOR BAM FILES ...

Have you seen this ...
https://browser.cghub.ucsc.edu/
I usually don't like these JavaScript GUI click click click things, but this is not so bad.

You want the analysis_ids.

My script for this (which you must customize to your own install locations) is ...

# Point to your executables and libraries for the cghub client, wherever you put them
export LD_LIBRARY_PATH=/data/data04/CG/cghub/lib/:/data/data04/CG/cghub/lib/GeneTorrent/:/h1/finneyr/xerces-c-3.0.1/src/.libs/:/h1/finneyr/XQilla-2.2.3/.libs/:$LD_LIBRARY_PATH
export PATH=/data/data04/CG/cghub/bin/:$PATH

# Download one analysis by its analysis_id, using your own key file
function f
{
gtdownload -vv -c /h1/finneyr/finneyr.key -d "$1"
sleep 2   # short pause between consecutive downloads
}

# just add "f analysis_id"

f 038d680d-4a29-4be1-9568-72d80a52c782
f 059e80af-c614-4424-8075-d42f072705b2



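ALTERNATIVELY
You can grab info for BAMs for a project like this ...
i=0   # call counter used for the extra pause below
function f
{
echo "$1"
# Save the cgquery report for one disease abbreviation
cghub/bin/cgquery "disease_abbr=$1" > "cghub.$1.txt"
# On every fifth call, pause one extra second
n=$((i%5))
if [ "$n" -eq 0 ]; then sleep 1; fi
((i=i+1))
sleep 2
}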

f LIHC
f LUAD

This creates reports for LIHC (liver) and LUAD (lung adeno).
You can parse out the analysis_ids.
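
For the parsing step, here is a minimal sketch, not anything official. It assumes each analysis_id in a saved report is a UUID sitting on a line that also contains the text "analysis_id" (check this against your own cghub.*.txt files, since the exact cgquery layout is not shown here). The name extract_analysis_ids is made up for illustration.

```shell
# Print the unique analysis_ids found in a saved cgquery report.
# Assumption: every ID is a UUID on a line that mentions "analysis_id".
extract_analysis_ids() {
    grep 'analysis_id' "$1" \
        | grep -oE '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}' \
        | sort -u
}
```

Something like `extract_analysis_ids cghub.LUAD.txt | while read -r id; do f "$id"; done` would then hand each ID to the gtdownload version of f from the first script.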



FOR OTHER STUFF ...

http://tcga-data.nci.nih.gov///tcgaf...ers/anonymous/

FOR PROTECTED OTHER STUFF ...
https://tcga-data-secure.nci.nih.gov...iles/tcga4yeo/
(you need to log in)

For the tcga-data.nci.nih.gov sites, you can write a script to grab a listing of all files.
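
A rough sketch of what such a listing script could look like, under the assumption that the site serves ordinary HTML directory index pages (look at what your browser shows before relying on this). The helper name list_links is hypothetical. It reads one index page on stdin and prints the link targets, skipping column-sort links and links that point back up the tree.

```shell
# Read an HTML directory index on stdin and print its link targets, one per line.
# Assumption: entries appear as href="name" attributes in a plain index page.
list_links() {
    grep -oE 'href="[^"?]+"' \
        | sed -e 's/^href="//' -e 's/"$//' \
        | grep -vE '^(/|\.\.)'
}
```

You would feed it a page with something like `curl -s "$URL" | list_links`, where URL is the directory you want listed. Entries ending in / are subdirectories, so repeat the call on each of those to walk the whole tree.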

Last edited by Richard Finney; 08-26-2014 at 04:51 PM.
Old 11-22-2014, 02:25 AM   #3
GenePool
Registered Vendor
 
Location: San Francisco, CA

Join Date: Mar 2014
Posts: 18

Hi,

Yep, like Richard has pointed out, you need to get a hold of analysis IDs.

We typically use https://browser.cghub.ucsc.edu/ to search for the samples that we're interested in, and then link the samples to the patient & sample metadata available for the various cohorts here: https://tcga-data.nci.nih.gov/tcgafi...onymous/tumor/

For what it's worth, Station X has spent a lot of time organizing and curating the patient & sample metadata, and then attaching it to the various genomics assays generated by The Cancer Genome Atlas. This data is all prepped and ready for analysis in GenePool.

If you're interested, here are some related posts about it:

http://seqanswers.com/forums/showthread.php?t=48485
http://seqanswers.com/forums/showthread.php?t=42471

Good Luck!

------------------------------
GenePool is making genomics data management, analysis, and sharing easier!
Products @ www.stationxinc.com

Last edited by GenePool; 11-23-2014 at 08:25 PM.
Tags
cghub, dbgap, tcga
