Using the SRAdb R package, I am looking to query individuals who have both WGS and RNA-Seq data in the Sequence Read Archive. So far I believe I have been able to find specific samples with both WGS and RNA-Seq experiments. However, a single sample, as identified by its sample_accession, is not exactly what I'm looking for.
For example: I'm interested in finding human data in which one human was used to derive WGS data, and several tissues from that same human were used to derive RNA-Seq data.
Is there any easy way to accomplish this using SQL and/or R?
So far, this is my approach to identify samples that have both WGS and RNA-Seq data, using the local database, SRAmetadb.sqlite:
Code:
library(DBI)
library(RSQLite)

# Connect to the local SRA metadata database
con <- dbConnect(SQLite(), 'SRAmetadb.sqlite')

# Samples that have at least one WGS and at least one RNA-Seq experiment.
# Note: because of the GROUP BY, experiment_accession and library_strategy
# show only one arbitrary row per sample.
query <- dbGetQuery(con, paste(
  "SELECT sample.sample_accession, sample.scientific_name,",
  "       experiment.experiment_accession, experiment.library_strategy",
  "FROM sample",
  "JOIN experiment ON sample.sample_accession = experiment.sample_accession",
  "WHERE experiment.library_strategy IN ('WGS','RNA-Seq')",
  "GROUP BY sample.sample_accession",
  "HAVING COUNT(DISTINCT experiment.library_strategy) = 2"
))
With this approach, I can see that query contains 181 human samples that have at least one WGS experiment and one RNA-Seq experiment. That's great, but I suspect I am missing data that came from different samples of the same individual.
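To make the limitation concrete, here is a minimal sketch of the same GROUP BY / HAVING logic on a toy in-memory table. The accessions (S1, X1, and so on) are made up, and the idea that S2 and S3 come from one person is purely an assumption for illustration; none of this is real SRA data.

```r
library(DBI)
library(RSQLite)

# Toy in-memory database; all accessions below are hypothetical.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "experiment", data.frame(
  experiment_accession = c("X1", "X2", "X3", "X4"),
  sample_accession     = c("S1", "S1", "S2", "S3"),
  library_strategy     = c("WGS", "RNA-Seq", "WGS", "RNA-Seq"),
  stringsAsFactors     = FALSE
))

# S1 carries both strategies on a single sample.
# S2 (WGS) and S3 (RNA-Seq) are imagined to be two tissues from one
# person, but nothing in this table links them to each other.
hits <- dbGetQuery(con, paste(
  "SELECT sample_accession",
  "FROM experiment",
  "WHERE library_strategy IN ('WGS','RNA-Seq')",
  "GROUP BY sample_accession",
  "HAVING COUNT(DISTINCT library_strategy) = 2"
))
print(hits$sample_accession)

dbDisconnect(con)
```

Only S1 comes back. S2 and S3 are dropped because the grouping key is the sample, which is exactly the kind of case I am worried about missing.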
Of course, I am not limited to the SRA; if other databases are more appropriate for this task, I would be glad to hear about them.