
Querying GEO DataSets and GEO Profiles

Quick examples

The GEO DataSets database stores original submitter-supplied study descriptions, as well as curated gene expression DataSets. DataSets form the basis of GEO's advanced data display and analysis tools, including gene expression profile charts and clusters.

Search Examples:

Search by... | Search text
Free text | smoking cancer
Keywords and species | (smok* OR diet) AND (mammals[organism] NOT human[organism])
Studies in the NIH Roadmap Epigenomics project | "roadmap epigenomics"[Project]
Study type | "expression profiling by high throughput sequencing"[DataSet Type]
Studies with between 100 and 500 samples | 100:500[Number of Samples]
Studies with CEL files | "cel"[Supplementary Files]
DataSets that have 'age' as an experimental variable | "age"[Subset Variable Type]
Author | smith a[Author]
Published between January and June 2007 | 2007/01:2007/06[Publication Date]
Platform accession | GPL570
Studies with PubMed identifiers | "gds pubmed"[Filter]

The GEO Profiles database stores individual gene expression profiles from curated DataSets. Search for profiles of interest based on gene annotation or pre-computed profile characteristics.

Search Examples:

Search by... | Search text
Free text | smoking P450
Gene symbol | CYP1A1[Gene Symbol]
Gene symbols in DataSets that contain specific keywords | (CYP1A1[Gene Symbol] OR ME1[Gene Symbol]) AND (smok* OR diet)
Partial gene name in a specific DataSet | kinase[Gene Description] AND GDS182
GenBank accession | NM_014033
Gene Ontology (GO) term in a specific DataSet | apoptosis[Gene Ontology] AND GDS182
Chromosome region and species | (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism]
Genes that show subset effects in DataSets that examine the effect of an agent | agent[Flag Information] AND "value subset effect"[Flag Type]
Platform accession | GPL570
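
The search texts above can also be submitted programmatically. The following is a minimal sketch, assuming NCBI's public E-utilities ESearch endpoint and the Entrez database names gds (GEO DataSets) and geoprofiles (GEO Profiles); neither is described on this page, and the helper name esearch_url is hypothetical. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint (this page documents only the web search boxes).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, query: str, retmax: int = 20) -> str:
    """Return an ESearch URL for `query` against Entrez database `db`."""
    # urlencode percent-encodes brackets, quotes and colons in the search text.
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# Assumed database names: "gds" for GEO DataSets, "geoprofiles" for GEO Profiles.
datasets_url = esearch_url("gds", '"cel"[Supplementary Files] AND 100:500[Number of Samples]')
profiles_url = esearch_url("geoprofiles", "CYP1A1[Gene Symbol] AND GDS182")

print(datasets_url)
print(profiles_url)
```

Keeping URL construction separate from the request makes it easy to inspect the encoded query before sending it with an HTTP client of your choice.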

How to construct queries

GEO DataSets and GEO Profiles are part of NCBI's network of Entrez databases. As with the other Entrez databases, data of interest may be located simply by entering keywords into the GEO DataSets or GEO Profiles search boxes. The Advanced Search and Limits pages, linked at the head of the GEO DataSets and GEO Profiles pages, assist greatly in the construction of complex queries.

To construct a complex query, specify the search terms, their fields, and the Boolean operations to perform on the terms using the following syntax:

term [field] OPERATOR term [field]

where term is a search term, field is the search field, and OPERATOR is a Boolean operator (AND, OR or NOT, which must be capitalized).
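
The term [field] OPERATOR term [field] syntax can be assembled with a couple of small string helpers. This is an illustrative sketch only: the function names fielded and combine are hypothetical, and wrapping every combination in parentheses is a design choice made here so that nested clauses are always processed as a unit.

```python
from typing import Optional


def fielded(term: str, field: Optional[str] = None, phrase: bool = False) -> str:
    """Format one search term, optionally quoted as a phrase and tagged with [field]."""
    text = f'"{term}"' if phrase else term
    return f"{text}[{field}]" if field else text


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with a capitalized Boolean operator and wrap them in parentheses."""
    op = operator.upper()  # AND, OR and NOT must be capitalized in the query
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {op} ".join(clauses) + ")"


keywords = combine("or", "smok*", "diet")
query = combine("and", fielded("human", "organism"), keywords)
print(query)  # (human[organism] AND (smok* OR diet))
```
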
Additional query construction notes and features are provided in the following table:

Notes and features | Example
Complete listings and descriptions of all supported fields are provided in the tables below. | A search example for each field is provided within the tables.
Fields may be specified either by their full name or an alias. Full names and aliases are listed in the tables below. | gds[Entry Type] and gds[ETYP] perform the same search.
Some fields have a fixed list of allowed search terms, others are free text. The tables below indicate which fields have fixed lists. Lists of allowed terms may be browsed on the Advanced Search page by selecting the relevant field from the drop-down menu and clicking 'Show Index'. | 'age' is a fixed term for the Subset Variable Type field: age[Subset Variable Type]
Use quotes to indicate a phrase. | salt stress retrieves studies that mention both salt and stress anywhere in the description, whereas "salt stress" retrieves studies where the words exist as a phrase.
Use parentheses to properly combine multiple search criteria. The terms inside the parentheses are processed as a unit and then incorporated into the overall search. | human[organism] AND (smok* OR diet) specifically retrieves human studies that mention either smoking or diet, whereas human[organism] AND smok* OR diet also returns all studies that mention diet, regardless of organism.
Use an asterisk to expand your search with a wildcard. Wildcards can be placed at the beginning or end of a text string, but not in the middle. | smok* will retrieve documents that contain words like smoke, smoking or smoker.
Use a colon to indicate a range. | 2007/01:2007/06[Publication Date] retrieves studies published between January and June 2007.
Use the 'History' section at the foot of the Advanced Search pages to combine previous queries or find the intersection of multiple queries. Each query you have performed recently is assigned a specific number which can be included within the search statement. | #1 NOT #2, or (#1 OR #2) AND human[organism]
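
The effect of parentheses can be illustrated with a toy model in which each term matches a set of studies. The study IDs below are made up for illustration, and the model assumes the ungrouped query is read as (human[organism] AND smok*) OR diet, which is consistent with the behavior described in the table above.

```python
# Hypothetical study IDs matched by each individual term (made-up data).
human = {1, 2, 3}   # studies indexed under human[organism]
smok = {2, 4}       # studies matching smok*
diet = {3, 5}       # studies matching diet

# human[organism] AND (smok* OR diet): the parenthesized unit is evaluated first.
grouped = human & (smok | diet)

# human[organism] AND smok* OR diet: assumed to be read as (human AND smok*) OR diet.
ungrouped = (human & smok) | diet

print(sorted(grouped))    # [2, 3]
print(sorted(ungrouped))  # [2, 3, 5]
```

Study 5 mentions diet but is not a human study, so it appears only in the ungrouped result.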

Query fields and examples

GEO DataSets fields:

Field full name | Field aliases | Description | Search term values and rules | Example
All Fields | ALL, * | All terms from all searchable fields. Default field. | free text, wildcard (*) supported | Find any record that contains the word 'cancer': cancer[All Fields]
Author | AUTH, AU, AUTHOR NAME | Contributors or authors associated with the study | free text, wildcard (*) supported, author initials are optional | Find records authored by A Smith: smith a[Author]
DataSet Type | GTYP, gdsType | DataSet or Series type | fixed list, check Advanced Search page for list of indexed terms | Find all studies that examine gene expression by high throughput sequencing: expression profiling by high throughput sequencing[DataSet Type]
Description | DESC, DSC, DESCR | Text provided in the DataSet, Series or Sample description, summary and other metadata fields | free text, wildcard (*) supported | Find studies that contain smoking-related terms in their descriptions: smok*[DESC]
Entry Type | ETYP, entryType | Record type | fixed list, use gds (DataSet), gse (Series) or gpl (Platform) | Find only DataSet records: gds[Entry Type]
Filter | FILT, FLTR, SUBSET, SB, FIL | Filters for records that have links to other NCBI databases | fixed list, check Advanced Search page for list of indexed terms | Find records that have PubMed links: gds pubmed[Filter]
GEO Accession | ACCN, accession | GEO accession number | valid DataSet (GDS), Platform (GPL), Sample (GSM) or Series (GSE) accession | Find all studies performed on Platform GPL570: GPL570[GEO Accession]
MeSH Terms | MESH, MH, SUBH, SH, Subheading | Medical Subject Headings (MeSH) terms | Medical Subject Headings (MeSH) terms, wildcard (*) supported | Find records that have MeSH term methylation: methylation[MeSH Terms]
Number of Platform Probes | NPRO, n_probes | Number of Platform probe IDs | integer, range function supported | Find Platforms that have over 1 million probes: 1000000:100000000[Number of Platform Probes]
Number of Samples | NSAM, n_samples | Number of Samples in the DataSet or Series | integer, range function supported | Find studies with between 100 and 500 samples: 100:500[Number of Samples]
Organism | ORGN, PORGN, primary organism | Name of the organism | NCBI taxonomy terms, wildcard (*) supported, all levels in the taxonomy lineage and common names are indexed | Find studies performed on mouse: Mus musculus[Organism]
Platform Technology Type | PTYP, ptechType | Platform type | fixed list, check Advanced Search page for list of indexed terms | Find all studies performed with next-generation sequencing technology: high throughput sequencing[Platform Technology Type]
Project | PROJ | Featured project data | fixed list, use roadmap epigenomics, encode, pilot encode, or modencode | Find studies in the NIH Roadmap Epigenomics project: roadmap epigenomics[Project]
Publication Date | PDAT, DP | Date on which record was released | format YYYY/MM, range function supported | Find studies published between January and June 2007: 2007/01:2007/06[Publication Date]
Related Platform | RGPL, relatedGPL | Retrieves the Platform(s) for a specified DataSet or Series | valid DataSet (GDS) or Series (GSE) accession | Find Platforms related to GSE22474: GSE22474[Related Platform]
Related Series | RGSE, relatedGSE | Retrieves the Series for a specified DataSet or Platform | valid DataSet (GDS) or Platform (GPL) accession | Find Series related to GPL570: GPL570[Related Series]
Reporter Identifier | GEID, seqacc, clone, orf, unigene, Gene Identifier | Name or identifier of Platform probe; pertains only to Platforms that have been subjected to re-annotation pipeline | free text, wildcard (*) supported | Find DataSets that include a probe corresponding to Arg1: Arg1[Reporter Identifier]
Sample Source | SRC, source | The source of the biological material of the Sample; warning: submitter-supplied field, not curated | free text, wildcard (*) supported | Find studies with samples from brain: brain[Sample Source]
Sample Type | STYP, sampType | Sample type or molecule | fixed list, check Advanced Search page for list of indexed terms | Find studies that use protein samples: protein[Sample Type]
Sample Value Type | VTYP, valType | Sample value type; pertains only to curated DataSets | fixed list, check Advanced Search page for list of indexed terms | Find DataSets with log ratio sample values: log ratio[Sample Value Type]
Submitter Institute | INST, institute | Institute or organization as given in submitter account | free text | Find data submitted by the Broad Institute: Broad Institute[Institute]
Subset Description | SSDE, SSDESC | DataSet subset descriptions | free text, wildcard (*) supported | Find DataSets that include the term 'male' in subset description: male[Subset Description]
Subset Variable Type | SSTP, SSTYPE | Name of DataSet experimental variable | fixed list, check Advanced Search page for list of indexed terms | Find DataSets that have 'age' as an experimental variable: age[Subset Variable Type]
Supplementary Files | SFIL, SFILE, suppFile | Supplementary file type names | free text, wildcard (*) supported | Find studies that have Affymetrix CEL files: cel[Supplementary Files]
Tag Length | TAGL, taglength | SAGE or MPSS tag length in base pairs | integer | Find 10 base pair SAGE data: 10[Tag Length]
Title | TITL, TITLE, TI | Text from titles of DataSets, Series, Platforms, and Samples | free text, wildcard (*) supported | Find records where 'Affymetrix' appears in a title: Affymetrix[Title]
Update Date | UDAT | Date on which record was last updated | format YYYY/MM, range function supported | Find records updated during June 2010: 2010/06[Update Date]
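
Because each field accepts a full name or an alias, a query generator may want to normalize whatever name it is given. The sketch below copies a handful of names and aliases from the GEO DataSets field table; the helper names are hypothetical, and matching names case-insensitively is an assumption (this page itself writes both [organism] and [Organism]).

```python
# A small subset of full names and aliases copied from the GEO DataSets field table.
ALIASES = {
    "Entry Type": ["ETYP", "entryType"],
    "Number of Samples": ["NSAM", "n_samples"],
    "Publication Date": ["PDAT", "DP"],
    "Organism": ["ORGN", "PORGN", "primary organism"],
}

# Map every full name and alias (lower-cased) to its full field name.
LOOKUP = {
    name.lower(): full
    for full, names in ALIASES.items()
    for name in [full, *names]
}


def canonical_field(name: str) -> str:
    """Return the full field name; raises KeyError for names outside the subset above."""
    return LOOKUP[name.strip().lower()]


def range_clause(low: int, high: int, field: str) -> str:
    """Build a low:high[Field] range clause using the colon range syntax."""
    return f"{low}:{high}[{canonical_field(field)}]"


print(canonical_field("etyp"))          # Entry Type
print(range_clause(100, 500, "NSAM"))   # 100:500[Number of Samples]
```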

GEO Profiles fields:

Field full name | Field aliases | Description | Search term values and rules | Example
All Fields | ALL, * | All terms from all searchable fields. Default field. | free text, wildcard (*) supported | Find P450 genes in DataSets that investigate smoking: smok* AND P450
Annotation Type | ATYP, annot_type | Source of annotation | fixed list, use gene, nucleotide, unigene or protein | Find profiles with Gene-based annotation: gene[Annotation Type]
Base Position | CPOS, CPOSITION, CHRPOS | Base pair position on chromosome | integer, range function supported, must be used in conjunction with Chromosome field | Find profiles that lie between base positions 10000 and 3000000 on chromosome 8 in mouse: (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism]
Chromosome | CHR, CHROMOSOME, CH, CHROM | Chromosome number or name | chromosome number or name | Find profiles that lie between base positions 10000 and 3000000 on chromosome 8 in mouse: (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism]
DataSet Type | GTYP, gdsType | DataSet type | fixed list, check Advanced Search page for list of indexed terms | Find MPSS profiles: expression profiling by mpss[DataSet Type]
Filter | FILT, FLTR, SUBSET, SB, FIL | Filters for records that have links to other NCBI databases | fixed list, check Advanced Search Preview/Index page for list of indexed terms | Find profiles that have links to NCBI's Gene database: geo gene[Filter]
Flag Information | FINF, FLAG_INFO, NOTE | Profiles of specific subset types and for which a subset effect is found. GEO DataSets are partitioned into subsets that reflect experimental design. Profiles are flagged as having subset effects if they display differential expression across experimental variables. CAUTION: The subset effect scoring method is ad hoc, taking into account group medians, means, deviation inside the groups, penalties and arbitrary cutoff thresholds. This flag is simply an attempt to give potentially differentially-regulated genes higher visibility, and is not intended to provide an absolute determination of significance. | fixed list, check Advanced Search page for list of indexed terms | Find profiles that exhibit subset effects with respect to age or development stage: age[Flag Information] OR development stage[Flag Information]
Flag Type | FTYP, FLAG_TYPE | Profiles that exhibit specific types of subset effects. See the description and CAUTION under Flag Information, which apply equally to this field. | fixed list, check Advanced Search page for list of indexed terms | Find profiles that exhibit rank subset effects: rank subset effect[Flag Type]
GDS Text | GDST, GDStxt | Text from DataSet title and summary | free text, wildcard (*) supported | Find profiles for DataSets that investigate muscular dystrophy: muscular dystrophy[GDS Text]
GEO Accession | ACCN, accession | GEO accession number | valid DataSet (GDS), Platform (GPL), Sample (GSM) or Series (GSE) accession | Find profiles for Platform GPL570: GPL570[GEO Accession]
GEO Description/Title Text | GEOT, TI, GEOtxt | Text provided in the DataSet or Series description, title and other metadata fields | free text, wildcard (*) supported | Find profiles from studies that examine aspirin: aspirin[GEO Description/Title Text]
GI | GI | Mapped GenBank Identifier | integer | Find profiles for GenBank Identifier 89145416: 89145416[GI]
Gene Description | GDSC, GEND, aliases, GENE, GeneDesc | Gene description and aliases from Gene, title from UniGene | free text, wildcard (*) supported | Find kinase genes in GDS182: kinase[Gene Description] AND GDS182
Gene Ontology | GO | Gene Ontology terms | Gene Ontology (GO) terms, wildcard (*) supported | Find apoptosis genes in GDS182: apoptosis[Gene Ontology] AND GDS182
Gene Symbol | SYMB, GeneSymbol | Gene Symbol from Gene or UniGene | free text, wildcard (*) supported | Find CYP1A1 gene: CYP1A1[Gene Symbol]
ID_REF | ID, ID_REF | ID from GEO Platform, SAGE tag, Affy ProbeSet ID | free text, wildcard (*) supported | Find profiles for Affymetrix probeset ID 218973_at: 218973_at[ID_REF]
Max Value Rank | RMAX, RNKMX | The maximum value percentile rank for any Sample within DataSet | integer, 0-100, range function supported | Find profiles where the maximum rank percentile is in the 1st percentile (i.e., genes with low expression): 1[Max Value Rank]
Min Value Rank | RMIN, RNKMN | The minimum value percentile rank for any Sample within DataSet | integer, 0-100, range function supported | Find profiles where the minimum rank percentile is in the 100th percentile (i.e., highly expressed genes): 100[Min Value Rank]
Number of Samples | NSAM, n_samples | Number of Samples in the DataSet | integer, range function supported | Find profiles with between 100 and 200 samples: 100:200[Number of Samples]
Organism | ORGN | Name of the organism | NCBI taxonomy terms, wildcard (*) supported, all levels in the taxonomy lineage and common names are indexed | Find mouse profiles: Mus musculus[Organism]
Platform Reporter Type | RTYP, rep_type | Platform reporter type used for annotation | fixed list, check Advanced Search page for list of indexed terms | Find profiles where a CLONE ID is the basis for annotation: clone[Platform Reporter Type]
Ranked Standard Deviation | RSTD, RNSTD | Percentile rank of profile standard deviation compared to all other profiles in a DataSet | integer, 0-100, range function supported | Find profiles with a high level of standard deviation: 100[Ranked Standard Deviation]
Reporter Identifier | NAME, identifier, Gene Identifier | Name or identifier of Platform probe | free text, wildcard (*) supported | Find profiles that include a probe corresponding to Arg1: D00636[Reporter Identifier]
Sample Source | SRC, source | The source of the biological material of the Sample; warning: submitter-supplied field, not curated | free text, wildcard (*) supported | Find profiles with samples from brain: brain[Sample Source]
Sample Value Type | VTYP, value_type | Sample value type | fixed list, check Advanced Search page for list of indexed terms | Find profiles with log ratio sample values: log ratio[Sample Value Type]
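
Two rules from the GEO Profiles field table lend themselves to simple validation before a query is sent: Base Position must be used together with Chromosome, and the rank fields take integers from 0 to 100. The following sketch shows one way to enforce them; the function names are hypothetical and the checks are this example's own, not part of GEO.

```python
def region_clause(chromosome: str, start: int, end: int) -> str:
    """Pair a Base Position range with its Chromosome, since the two must be used together."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"({chromosome}[Chromosome] AND {start}:{end}[Base Position])"


def rank_clause(field: str, low: int, high: int) -> str:
    """Build a percentile-rank range for a 0-100 field such as Ranked Standard Deviation."""
    if not 0 <= low <= high <= 100:
        raise ValueError("ranks must satisfy 0 <= low <= high <= 100")
    return f"{low}:{high}[{field}]"


query = " AND ".join([
    region_clause("8", 10000, 3000000),
    "mouse[organism]",
    rank_clause("Ranked Standard Deviation", 90, 100),
])
print(query)
# (8[Chromosome] AND 10000:3000000[Base Position]) AND mouse[organism] AND 90:100[Ranked Standard Deviation]
```
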
Last modified: July 16, 2024