SEQanswers

Old 12-18-2014, 11:50 AM   #1
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10
Inexperienced Question

Hello,

I have a project that requires access to an "online cancer related DNA database resource."

Unfortunately, it doesn't look like I can get access to CGHub. Would someone mind helping me? (i.e. perhaps suggest one that I can access)

Thank you!
Old 12-18-2014, 12:07 PM   #2
AntonioRFranco
Member
 
Location: Cordoba, Spain

Join Date: Feb 2013
Posts: 21

www.cartagenia.com
You don't mention whether it needs to be free to access.
Old 12-18-2014, 01:06 PM   #3
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

AntonioRFranco - Thanks for the reply. Yes, free.
Old 12-18-2014, 01:43 PM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

http://www.cbioportal.org/public-portal/

There are at least a couple of companies that offer different views of the TCGA data via web. Search on SeqAnswers.
Old 12-18-2014, 01:49 PM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

http://seqanswers.com/forums/showthread.php?t=48485
http://seqanswers.com/forums/showthread.php?t=40610
Old 12-18-2014, 01:50 PM   #6
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

GenoMax - Thanks for the reply. I probably should have mentioned this before: select ten (10) DNA sequence strings of length at least 1Mb related to a cancer gene from ten different individuals. Make sure the sequence data is in FASTQ format and stored in one file “DNA.fas”.
Old 12-18-2014, 01:54 PM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

Now you say

You are not going to find DNA sequences in fastq format that are 1Mb in length unless you assemble them yourself preserving the Q-scores (assembly programs do not include Q-scores in final sequence).
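For anyone unfamiliar with what the Q-scores are: a FASTQ record is four lines (an @name line, the bases, a + separator line, and one quality character per base). Here is a minimal Python sketch of reading such records. It assumes the common Phred+33 quality encoding and unwrapped four-line records; the names `FastqRecord` and `read_fastq` are made up for illustration and are not from any particular tool.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    Assumes Phred+33 by default. Stops at end of input or at the
    first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes a Phred score as ASCII code minus offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\n!I5?\n")
    for record in read_fastq(demo):
        print(record.name, record.sequence, record.qualities)
```

The point of the sketch is that every base carries its own score, which is why a consensus sequence coming out of an assembler has no natural FASTQ form unless you compute and carry the scores yourself.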
Old 12-18-2014, 01:57 PM   #8
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

I have a few days to work on this project. I'm lost.
Old 12-18-2014, 02:01 PM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

Tell us more about what the entire project is about. DNA sequence is just a part of it?
Old 12-18-2014, 02:09 PM   #10
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

GenoMax - Thanks again for your suggestions and assistance. I'm just a little reluctant to say more about this project right now. I hope you understand.
Old 12-18-2014, 02:22 PM   #11
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

Though I don't feel I was doing anything unethical, others may disagree. Therefore, I deleted the complete requirements to avoid any conflict. Thank you.

Last edited by cambridge101; 12-19-2014 at 07:08 AM. Reason: Information not necessary.
Old 12-18-2014, 02:42 PM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

I think you should delete the description from the post above. This appears to be a class project/homework?

Perhaps you should ask whoever assigned the project if they are certain about the fastq requirement.
Old 12-19-2014, 04:06 AM   #13
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

Thanks for your suggestion. I don't think I'll delete it [did decide to delete]. I don't feel that I'm doing anything unethical. I hope that it's clear that I'm not asking anyone to complete the project for me. I'm just having a hard time finding the FASTQ data I need.

Last edited by cambridge101; 12-19-2014 at 07:10 AM. Reason: Inconsistent with edit to earlier post.
Old 12-19-2014, 04:16 AM   #14
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

One place to look for long sequences (in fastq format) will be here: http://www.pacificbiosciences.com/ne.../publications/ I see only a couple of cancer-related publications, and they are from 2012 (when the reads were not as long as they are today).

If you drop the cancer requirement then you will get some really long reads here: http://blog.pacificbiosciences.com/2...d-shotgun.html They are not going to be 1Mb so you would still need to do some assembly.
Old 12-19-2014, 04:34 AM   #15
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

Thanks! I'm not sure I can drop the cancer requirement. I checked http://www.pacificbiosciences.com/ne.../publications/ but couldn't locate the data I need.

I'm open to other suggestions if anyone has any.
Old 12-19-2014, 07:41 PM   #16
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 496

I only vaguely remember the details of the project from your deleted post, but if I were to guess what a reasonable assignment would be, it would be to select a cancer gene, then extract 1 Mb of sequence around the gene from ten different individual genomes, then analyze those 1 Mb regions for the various things asked for in the post.

You aren't going to find 1 Mb fastq reads, but you can find different individual genomes, or even different "cancer" genomes. You can definitely identify genes related to cancer. As others have said, I'd check back with the assigner of this project for clarification.

edit: I teach an upper level course in genomic methods and analysis, so am definitely curious what this assignment is about!
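A rough Python sketch of the "extract 1 Mb of sequence around the gene" step, in case it helps make the idea concrete. The gene coordinates and chromosome length used in the example are placeholders, not real annotation, and the helper names `window_around` and `extract_region` are hypothetical. Coordinates are treated as 1-based and inclusive, and the chromosome sequence is assumed to already be loaded as a plain string.

```python
from typing import Tuple


def window_around(gene_start: int, gene_end: int, chrom_length: int,
                  size: int = 1_000_000) -> Tuple[int, int]:
    """Return a 1-based inclusive window of `size` bp centred on a gene.

    The window is shifted (not shrunk) when it would run off either
    end of the chromosome. A gene longer than `size` will not fit.
    """
    if not 1 <= gene_start <= gene_end <= chrom_length:
        raise ValueError("gene coordinates must lie on the chromosome")
    size = min(size, chrom_length)
    midpoint = (gene_start + gene_end) // 2
    start = midpoint - size // 2 + 1
    end = start + size - 1
    if start < 1:
        start, end = 1, size
    elif end > chrom_length:
        start, end = chrom_length - size + 1, chrom_length
    return start, end


def extract_region(sequence: str, start: int, end: int) -> str:
    """Slice a 1-based inclusive region out of a chromosome string."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Placeholder gene position and chromosome length, for illustration only.
    start, end = window_around(15_400_000, 15_450_000, 135_000_000)
    print(start, end, end - start + 1)
```

Running the same window against ten individual genomes aligned to one reference would give ten comparable 1 Mb regions, which is one plausible reading of the assignment.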
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
Old 12-20-2014, 04:42 AM   #17
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

I like SNPsaurus' interpretation. 1 Mb (total amount of data or length of region covered) worth of fastq reads in and/or around a cancer gene makes sense.

@SNPsaurus: Main input for the assignment is still in post #6. The rest of the assignment was informatics goals.

This will entail a significant amount of work (data collection part) and I hope the assignment has an appropriate amount of credit (unless it is a PhD qualifier exam).
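The two readings of "1 Mb" mentioned above (total amount of data versus length of region covered) give different numbers for the same set of reads. A small Python sketch of the difference, assuming each aligned read is reduced to a hypothetical (start, end) pair in 1-based inclusive reference coordinates:

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def total_bases(reads: Iterable[Interval]) -> int:
    """Sum of read lengths: the 'total amount of data' reading."""
    return sum(end - start + 1 for start, end in reads)


def covered_length(reads: Iterable[Interval]) -> int:
    """Reference bases touched by at least one read: the 'region covered' reading."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(reads):
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping read: extend the current block if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered


if __name__ == "__main__":
    reads = [(1, 100), (51, 150), (301, 400)]
    print(total_bases(reads), covered_length(reads))  # prints "300 250"
```

With real sequencing depth the total base count exceeds the covered length many times over, so which reading the assignment intends changes the amount of data to collect considerably.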
Old 12-20-2014, 11:05 AM   #18
cambridge101
Member
 
Location: Boston

Join Date: Dec 2014
Posts: 10

Alright... Let's say my oncogene of interest is in the region of 11:15000000..16000000. Therefore, all I need is that region from 10 different people.
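For reference, the region written above can be handled programmatically along these lines. This is only a sketch in Python: the `Region` class and both helper names are made up for illustration, and coordinates are assumed to be 1-based and inclusive. The `chrom:start-end` form it produces is the region syntax that tools such as samtools accept.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Accepts "11:15000000..16000000" as well as "11:15000000-16000000".
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)(?:\.\.|-)(?P<end>[\d,]+)$")


def parse_region(text: str) -> Region:
    """Parse a browser-style region string into its parts."""
    match = _REGION.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse region {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Region(match["chrom"], start, end)


def as_dash_region(region: Region) -> str:
    """Format as chrom:start-end."""
    return f"{region.chrom}:{region.start}-{region.end}"


if __name__ == "__main__":
    region = parse_region("11:15000000..16000000")
    print(as_dash_region(region), region.length)
```

Note that with inclusive coordinates this region is 1,000,001 bases long, so it satisfies an "at least 1 Mb" requirement.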

Problem:
Where do I find that data???

Any assistance is appreciated.
Old 12-20-2014, 12:59 PM   #19
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,975

Look for studies that have > 10 samples (since you need 10) or take 10 samples from different cancer types.

http://sra.dnanexus.com/?result_type...q=tumor+exome+

http://sra.dnanexus.com/?result_type...q=cancer+exome

Last edited by GenoMax; 12-20-2014 at 01:07 PM.
Old 12-20-2014, 05:40 PM   #20
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 701

The comments here have links to sequences for PUBLIC human cancers ...

http://www.homolog.us/blogs/blog/201...able-from-bgi/

BGI liver cancer
Seoul Genomic Medicine Institute lung cancer
Changhai Hospital prostate cancer
MD Anderson Asian Gastric cancer

I think the data is in NCBI's SRA.

You'll need a lot of disk space and, if you're relatively new, a lot of patience.

Sadly, a "bam slicer" that cuts out the reads for a region isn't available; though they say some folks are working on it.
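Once the alignments are on local disk, the slicing logic itself is just an interval-overlap test. A minimal Python sketch, assuming each alignment has already been reduced to a hypothetical `AlignedRead` tuple with 1-based inclusive coordinates (this illustrates the filter only; it does not read BAM files and is not the remote service mentioned above):

```python
from typing import Iterable, Iterator, NamedTuple


class AlignedRead(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def slice_reads(reads: Iterable[AlignedRead], chrom: str,
                start: int, end: int) -> Iterator[AlignedRead]:
    """Yield reads on `chrom` that overlap [start, end] by at least one base."""
    for read in reads:
        if read.chrom == chrom and read.start <= end and read.end >= start:
            yield read


if __name__ == "__main__":
    reads = [
        AlignedRead("r1", "11", 14_999_950, 15_000_049),  # straddles the left edge
        AlignedRead("r2", "11", 16_000_001, 16_000_100),  # just past the right edge
        AlignedRead("r3", "12", 15_500_000, 15_500_099),  # wrong chromosome
        AlignedRead("r4", "11", 15_500_000, 15_500_099),  # fully inside
    ]
    kept = slice_reads(reads, "11", 15_000_000, 16_000_000)
    print([read.name for read in kept])  # prints "['r1', 'r4']"
```

A linear scan like this is fine for a sketch; for whole-genome data an indexed lookup is what makes region extraction practical.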
All times are GMT -8.

