Beacon - Discovery Services for Genomic Data

The Beacon protocol defines an open standard for genomics data discovery, developed by the Global Alliance for Genomics & Health (GA4GH) with technical implementation through the ELIXIR Beacon project. Since 2015 the Theoretical Cytogenetics and Oncogenomics Group at the University of Zurich has contributed to Beacon development, in part through the Beacon+ demonstrator, which shows current functionality and tests future Beacon protocol extensions. Beacon+, as well as the Progenetix and cancercelllines.org websites, runs on top of the open source bycon stack, which represents a full Beacon implementation.

Technical Documentation

An increasing amount of documentation relevant to the Progenetix API can be found in the following locations:

BeaconPlus Data / Query Model

The Progenetix / BeaconPlus query model uses the Beacon core data model for genomic as well as biomedical and procedural queries, and for data delivery. The model uses an object hierarchy consisting of:

  • variant (a.k.a. genomicVariation)
    • a single molecular observation, e.g. a genomic variant observed in the analysis of the DNA from a biosample
    • mostly corresponding to the "allele" concept, but with an alternate use similar to that in VCF (e.g. CNVs are not typical "allelic variants")
    • in Progenetix, identical variants from different samples are identified through a compact digest (variantInternalId), which can be used to retrieve those distinct variants (cf. "line in VCF")
  • analysis
    • the entirety of all variants, observed in a single experiment on a single sample
    • the result of an analysis represents a callset, comparable to a data column in a VCF variant annotation file
    • callset has an optional position in the object hierarchy, since the variants themselves describe biological observations in a biosample
  • biosample
    • a reference to a physical biological specimen on which analyses are performed
  • individual
    • typically a human subject from which the biosample(s) were extracted

The bycon framework, used for Progenetix and related collections such as cancercelllines.org, implements these core entities as data collections in a MongoDB database.
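As a rough illustration of how the entities above relate, the following sketch models variant observations as flat records that point to their analysis, biosample and individual. The class, the snake_case field names and all identifiers are made-up assumptions for this example and do not claim to mirror the actual MongoDB documents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantObservation:
    # Field names are illustrative assumptions, not the bycon schema.
    id: str
    variant_internal_id: str  # digest shared by identical variants
    analysis_id: str
    biosample_id: str
    individual_id: str


def group_by(observations, attribute):
    """Group observation ids by the value of one attribute."""
    groups = defaultdict(list)
    for obs in observations:
        groups[getattr(obs, attribute)].append(obs.id)
    return dict(groups)


# Placeholder records: two analyses, one digest observed in both.
observations = [
    VariantObservation("v1", "digest-A", "an1", "bs1", "ind1"),
    VariantObservation("v2", "digest-B", "an1", "bs1", "ind1"),
    VariantObservation("v3", "digest-A", "an2", "bs2", "ind2"),
]

# All variants of one analysis form its callset (a "column in VCF").
callsets = group_by(observations, "analysis_id")
# All observations sharing one digest correspond to a "line in VCF".
same_variant = group_by(observations, "variant_internal_id")
print(callsets)
print(same_variant)
```

Grouping by analysis_id yields the callset view, while grouping by the digest collects the distinct observations of the same variant across samples.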

BeaconPlus Extensions of the Beacon API

The Progenetix Beacon API implements the Beacon framework and Beacon v2 default model with some extended functionality - e.g.

  • limited support for Boolean filter use (i.e. ability to force an override of the general AND with a general &filterLogic=OR option)
  • experimental support for a /phenopackets entity type and a &requestedSchema=phenopacket output option
  • additional service endpoints, e.g. for biosamples or individuals
  • geoqueries using $geoNear parameters or city matches
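The Boolean override from the list above can be sketched as a small helper that composes a relative query path. The function name is hypothetical, and the comma-separated filters parameter follows the style of the path examples on this page; this is a sketch of how such a request could be put together, not a description of the server-side handling.

```python
from urllib.parse import urlencode


def filter_query(entity, filters, filter_logic=None):
    """Return a relative Beacon path with a comma-separated filters parameter."""
    params = {"filters": ",".join(filters)}
    if filter_logic is not None:
        # Overrides the default AND combination of the filters.
        params["filterLogic"] = filter_logic
    # Keep ':' and ',' readable instead of percent-encoding them.
    return f"/{entity}/?{urlencode(params, safe=':,')}"


and_query = filter_query("biosamples", ["NCIT:C7541", "NCIT:C9291"])
or_query = filter_query("biosamples", ["NCIT:C7541", "NCIT:C9291"], filter_logic="OR")
print(and_query)
print(or_query)
```

With the default logic both codes would have to match the same record; the OR variant asks for records matching either code.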

Filters / Filtering Terms

Besides variant parameters, the Beacon protocol defines filters as (self-)scoped query parameters, e.g. for phenotypes, diseases, biomedical performance or technical entities. Most of the filter options are based on ontology terms or identifiers in CURIE format (e.g. NCIT:C4033, cellosaurus:CVCL_0030 or PMID:16004614). Use case examples are given below; documentation of the available ontologies and of how to look up available terms can be found on the Classifications and Ontologies page. Please see Beacon's Filters documentation for more information, e.g. about the OntologyFilter, AlphanumericFilter and CustomFilter types.

The Progenetix query filter system adopts a hierarchical logic for filtering terms. However, the includeDescendantTerms pragma can be used to modify this behaviour. Examples for codes with hierarchical treatment within the filter space are:

  • NCIt
    • true, deep hierarchical ontology of cancer classifications
  • Cellosaurus
    • derived cell lines are also accessible through the code of their parental line
Example
"filters": [
    {"id": "NCIT:C4536", "includeDescendantTerms": false}
],
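The effect of the includeDescendantTerms pragma can be illustrated with a toy hierarchy. The codes and parent-child relations below are invented for the example (no real NCIt or Cellosaurus relations are reproduced), and the function is only a sketch of the matching logic under the assumption that descendants are included unless the pragma is set to false.

```python
# Toy hierarchy with made-up codes: TOY:1 has children TOY:2 and TOY:3,
# and TOY:2 has the child TOY:4.
CHILDREN = {
    "TOY:1": ["TOY:2", "TOY:3"],
    "TOY:2": ["TOY:4"],
}


def expand_filter(flt, children=CHILDREN):
    """Return the set of codes a single filter object would match."""
    matched = {flt["id"]}
    # Assumption: hierarchical matching is the default behaviour.
    if not flt.get("includeDescendantTerms", True):
        return matched
    stack = [flt["id"]]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in matched:
                matched.add(child)
                stack.append(child)
    return matched


hierarchical = expand_filter({"id": "TOY:1"})
exact_only = expand_filter({"id": "TOY:1", "includeDescendantTerms": False})
print(sorted(hierarchical))
print(sorted(exact_only))
```

The first call matches the term and everything below it, the second only the exact code.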

Beacon-style JSON responses

The Progenetix resource's API utilizes the bycon framework for implementation of the Beacon v2 API. The standard format for JSON responses corresponds to a generic Beacon v2 response. Depending on the endpoint, the main data will be a list of objects either inside response.results or (mostly) in response.resultSets[...].results. Additionally, most API responses provide access to data using handover objects.
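A client that should work with both response layouts could use a small helper like the one below. The payloads are made-up minimal examples containing only the keys named above (real responses carry many more fields), and the function name is hypothetical.

```python
def iter_results(payload):
    """Yield result records from either response layout described above."""
    response = payload.get("response", {})
    if "resultSets" in response:
        # Most endpoints: one or more result sets, each with its own results.
        for result_set in response["resultSets"]:
            yield from result_set.get("results") or []
    else:
        # Some endpoints: a flat list directly in response.results.
        yield from response.get("results") or []


# Made-up minimal payloads for illustration.
with_result_sets = {
    "response": {
        "resultSets": [
            {"id": "set-1", "results": [{"id": "rec-1"}, {"id": "rec-2"}]},
            {"id": "set-2", "results": [{"id": "rec-3"}]},
        ]
    }
}
flat = {"response": {"results": [{"id": "rec-4"}]}}

ids_nested = [record["id"] for record in iter_results(with_result_sets)]
ids_flat = [record["id"] for record in iter_results(flat)]
print(ids_nested)
print(ids_flat)
```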

Example responses can be generated through the path examples below. Please be aware that Beacon responses use camelCased parameter names.

Beacon v2: Path Examples in Progenetix

The Beacon v2 protocol uses a REST path structure for consistent data access. The bycon project implements an expanding set of those Beacon v2 paths for the cancercelllines.org resource.


Base /

The root path provides the standard BeaconInfoResponse.


Base /filtering_terms

/filtering_terms/
/filtering_terms/ + query

Base /biosamples

/biosamples/ + query
/biosamples/{id}/
/biosamples/{id}/g_variants/
  • /biosamples/pgxbs-kftva5c9/g_variants/
    • retrieval of all variants from a single biosample
    • currently, and especially since this is a resource containing mostly CNVs, "variants" means "variant instances" (or, as in the early v2 draft, variantsInSample)
/biosamples/{id}/analyses/
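The path patterns in this section follow one scheme: an entity, optionally an entry id, optionally a related entity. A minimal sketch of composing such relative paths is shown below; the helper name is hypothetical and the biosample id is the one from the example above.

```python
def beacon_path(entity, entry_id=None, related=None):
    """Compose /entity/, /entity/{id}/ or /entity/{id}/related/."""
    if related is not None and entry_id is None:
        raise ValueError("a related entity requires an entry id")
    parts = [entity]
    if entry_id is not None:
        parts.append(entry_id)
    if related is not None:
        parts.append(related)
    return "/" + "/".join(parts) + "/"


collection_path = beacon_path("biosamples")
variants_path = beacon_path("biosamples", "pgxbs-kftva5c9", "g_variants")
analyses_path = beacon_path("biosamples", "pgxbs-kftva5c9", "analyses")
print(variants_path)
```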

Base /individuals

/individuals/ + query
  • /individuals/?filters=NCIT:C7541
    • this example retrieves all individuals having an annotation associated with NCIT:C7541 (retinoblastoma)
    • in Progenetix, this particular code will be part of the annotation for the biosample(s) associated with the returned individual
  • /individuals/?filters=PATO:0020001,NCIT:C9291
    • this query returns information about individuals with an anal carcinoma (NCIT:C9291) and a known male genotypic sex (PATO:0020001)
    • in Progenetix, the information about an individual's sex is associated with the Individual object (stored in individuals), whereas the cancer type is a property of the Biosample. Cross-entity queries are nevertheless supported through full aggregation across the different entities.
/individuals/{id}/
/individuals/{id}/genomicVariations/
  • /individuals/pgxind-kftx25hb/genomicVariations/
    • retrieval of all variants from a single individual
    • currently, and especially since this is a resource containing mostly CNVs, "variants" means "variant instances" (or, as in the early v2 draft, variantsInSample)

Base /genomicVariations

There is currently (April 2021) still some discussion about the implementation and naming of the different types of genomic variant endpoints. Since the Progenetix collections follow a "variant observations" principle, all variant requests are directed against the local variants collection.

The path names g_variants and variants_in_sample are treated as aliases of genomicVariations.
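On the client side, such aliasing can be pictured as a simple lookup that rewrites path segments to one canonical name. The table and function below are an illustrative assumption built only from the two alternative names mentioned here; they are not taken from the bycon code.

```python
# Alias table based on the two alternative path names mentioned above.
ALIASES = {
    "g_variants": "genomicVariations",
    "variants_in_sample": "genomicVariations",
}


def canonical_path(path):
    """Replace aliased path segments with the canonical entity name."""
    segments = [ALIASES.get(segment, segment) for segment in path.split("/")]
    return "/".join(segments)


rewritten = canonical_path("/biosamples/pgxbs-kftva5c9/g_variants/")
unchanged = canonical_path("/biosamples/pgxbs-kftva5c9/analyses/")
print(rewritten)
```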

/genomicVariations/ + query
/genomicVariations/{id}/ or /g_variants/{id}/
/genomicVariations/{id}/biosamples/

Base /analyses

The Beacon v2 /analyses endpoint accesses the information about the genomic variants derived from a single analysis. In Progenetix the main use of these documents is the storage of e.g. CNV statistics or binned genome calls.

/analyses/ + query

bycon Beacon Server

The bycon project provides a combination of a Beacon-protocol based API with additional API services, used as backend and middleware for the Progenetix resource.

bycon has been developed to support Beacon protocol development, following earlier implementations of Beacon+ ("beaconPlus") with now deprecated Perl libraries. The work is tightly integrated with the ELIXIR Beacon project.

bycon has its own documentation at bycon.progenetix.org.