PredictionTuple

class PredictionTuple(source_prefix: str, source_id: str, source_name: str, relation: str, target_prefix: str, target_identifier: str, target_name: str, type: str, confidence: float, source: str)

Bases: NamedTuple

A named tuple class for predictions.

Create new instance of PredictionTuple(source_prefix, source_id, source_name, relation, target_prefix, target_identifier, target_name, type, confidence, source)
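Because PredictionTuple is a NamedTuple, each field can be read either by name or by position. The "Alias for field number N" entries below give those positions. The following is a minimal sketch of a hypothetical stand-in class that mirrors the documented field order. It is not the library's own source, and the field values are placeholders rather than real Biomappings data.

```python
from typing import NamedTuple


class PredictionTuple(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""

    source_prefix: str  # field 0
    source_id: str  # field 1
    source_name: str  # field 2
    relation: str  # field 3
    target_prefix: str  # field 4
    target_identifier: str  # field 5
    target_name: str  # field 6
    type: str  # field 7
    confidence: float  # field 8
    source: str  # field 9


# Placeholder values for illustration only; they are not real Biomappings data.
prediction = PredictionTuple(
    source_prefix="mesh",
    source_id="D000000",
    source_name="example source term",
    relation="skos:exactMatch",
    target_prefix="chebi",
    target_identifier="0",
    target_name="example target term",
    type="semapv:LexicalMatching",
    confidence=0.9,
    source="example_script.py",
)

# Named access and positional access read the same underlying field.
assert prediction.relation == prediction[3]
print(prediction.relation)  # skos:exactMatch
```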

Attributes Summary

confidence

An assessment of the confidence of the mapping, reported by the method used to generate it.

relation

Alias for field number 3

source

The script or process that generated this mapping.

source_curie

Concatenate the source prefix and ID to a CURIE.

source_id

Alias for field number 1

source_name

Alias for field number 2

source_prefix

Alias for field number 0

target_curie

Concatenate the target prefix and ID to a CURIE.

target_identifier

Alias for field number 5

target_name

Alias for field number 6

target_prefix

Alias for field number 4

type

A semapv term describing the mapping type.

Methods Summary

as_dict()

Get the prediction tuple as a dictionary.

from_dict(mapping)

Get the prediction tuple from a dictionary.

from_semra(mapping, confidence)

Instantiate from a SeMRA mapping.

Attributes Documentation

confidence: float

An assessment of the confidence of the mapping, reported by the method used to generate it.

This means that confidence values aren’t generally comparable across methods, though they should follow the rough convention that values closer to 1 indicate higher confidence and values closer to 0 indicate lower confidence.

Most of the lexical mappings already in Biomappings were generated with Gilda. Depending on the script, the score therefore refers to either:

  1. The Gilda match score, inspired by https://aclanthology.org/W15-3801/. Section 5.2 of the supplementary material for the Gilda paper describes this score in detail, where 1.0 is best and 0 is worst. https://github.com/biopragmatics/biomappings/blob/master/scripts/generate_agrovoc_mappings.py is an example that uses this variant.

  2. A high-level estimate of the precision of the mappings generated by the given script. For example, the CL-MeSH mappings were estimated to be 90% correct, so all the mappings generated by https://github.com/biopragmatics/biomappings/blob/master/scripts/generate_cl_mesh_mappings.py are assigned a confidence of 0.9.

However, other variants are possible. For example, if a knowledge graph embedding model were used to generate a mapping prediction, this confidence could reflect the model's loss function.
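Since scores from different scripts are not on a common scale, one reasonable way to use them is to compare confidence only among predictions that share the same source. The sketch below assumes a stand-in class mirroring the documented fields. The rank_within_source and placeholder helpers and the script names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class PredictionTuple(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""

    source_prefix: str
    source_id: str
    source_name: str
    relation: str
    target_prefix: str
    target_identifier: str
    target_name: str
    type: str
    confidence: float
    source: str


def rank_within_source(
    predictions: Iterable[PredictionTuple],
) -> Dict[str, List[PredictionTuple]]:
    """Group predictions by ``source`` and sort each group by descending confidence.

    Confidence is only compared between predictions from the same source,
    since different scripts report scores on different scales.
    """
    groups: Dict[str, List[PredictionTuple]] = defaultdict(list)
    for prediction in predictions:
        groups[prediction.source].append(prediction)
    return {
        source: sorted(group, key=lambda p: p.confidence, reverse=True)
        for source, group in groups.items()
    }


def placeholder(source: str, confidence: float) -> PredictionTuple:
    """Build a hypothetical prediction where only source and confidence matter."""
    return PredictionTuple(
        "ex", "1", "a", "skos:exactMatch", "ex", "2", "b",
        "semapv:LexicalMatching", confidence, source,
    )


ranked = rank_within_source([
    placeholder("gilda_script.py", 0.55),
    placeholder("cl_mesh_script.py", 0.9),
    placeholder("gilda_script.py", 0.78),
])
print([p.confidence for p in ranked["gilda_script.py"]])  # [0.78, 0.55]
```

Here the 0.9 from the second script is never ranked against the Gilda-style scores, which matches the caveat that the values aren't comparable across methods.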

relation: str

Alias for field number 3

source: str

The script or process that generated this mapping.

Most of these scripts are in https://github.com/biopragmatics/biomappings/tree/master/scripts, and new scripts can be based on them.

source_curie

Concatenate the source prefix and ID to a CURIE.

source_id: str

Alias for field number 1

source_name: str

Alias for field number 2

source_prefix: str

Alias for field number 0

target_curie

Concatenate the target prefix and ID to a CURIE.
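The documentation describes source_curie and target_curie only as concatenating a prefix and an identifier. The sketch below assumes the usual CURIE form prefix:identifier with a single colon separator. The stand-in class and its values are hypothetical, so check the library for any normalization it may apply.

```python
from typing import NamedTuple


class PredictionTuple(NamedTuple):
    """Hypothetical stand-in with the two CURIE properties."""

    source_prefix: str
    source_id: str
    source_name: str
    relation: str
    target_prefix: str
    target_identifier: str
    target_name: str
    type: str
    confidence: float
    source: str

    @property
    def source_curie(self) -> str:
        """Join the source prefix and ID, assuming the ``prefix:identifier`` form."""
        return f"{self.source_prefix}:{self.source_id}"

    @property
    def target_curie(self) -> str:
        """Join the target prefix and ID, assuming the ``prefix:identifier`` form."""
        return f"{self.target_prefix}:{self.target_identifier}"


# Placeholder values for illustration only.
prediction = PredictionTuple(
    "mesh", "D000000", "example source term", "skos:exactMatch",
    "chebi", "0", "example target term",
    "semapv:LexicalMatching", 0.9, "example_script.py",
)
print(prediction.source_curie)  # mesh:D000000
print(prediction.target_curie)  # chebi:0
```

Defining these as properties rather than fields keeps the tuple at ten positions, so the CURIEs are always derived from the stored prefix and identifier and cannot drift out of sync with them.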

target_identifier: str

Alias for field number 5

target_name: str

Alias for field number 6

target_prefix: str

Alias for field number 4

type: str

A semapv term describing the mapping type.

These are relatively high-level and can be any child of semapv:Matching, including:

  1. semapv:LexicalMatching

  2. semapv:LogicalReasoning
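Assuming the type field stores the semapv term as a CURIE string, as written in the list above, predictions can be filtered by mapping type with a plain equality check. The with_type and placeholder helpers are hypothetical, and the two constants cover only the examples listed here, not every child of semapv:Matching.

```python
from typing import Iterable, List, NamedTuple


class PredictionTuple(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order."""

    source_prefix: str
    source_id: str
    source_name: str
    relation: str
    target_prefix: str
    target_identifier: str
    target_name: str
    type: str
    confidence: float
    source: str


# The two semapv terms named above; other children of semapv:Matching are also valid.
LEXICAL_MATCHING = "semapv:LexicalMatching"
LOGICAL_REASONING = "semapv:LogicalReasoning"


def with_type(
    predictions: Iterable[PredictionTuple], mapping_type: str
) -> List[PredictionTuple]:
    """Keep only the predictions whose ``type`` equals the given semapv CURIE."""
    return [p for p in predictions if p.type == mapping_type]


def placeholder(mapping_type: str) -> PredictionTuple:
    """Build a hypothetical prediction where only the mapping type matters."""
    return PredictionTuple(
        "ex", "1", "a", "skos:exactMatch", "ex", "2", "b",
        mapping_type, 0.5, "example_script.py",
    )


predictions = [
    placeholder(LEXICAL_MATCHING),
    placeholder(LOGICAL_REASONING),
    placeholder(LEXICAL_MATCHING),
]
print(len(with_type(predictions, LEXICAL_MATCHING)))  # 2
```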

Methods Documentation

as_dict() → Mapping[str, Any]

Get the prediction tuple as a dictionary.

classmethod from_dict(mapping: Mapping[str, str]) → PredictionTuple

Get the prediction tuple from a dictionary.
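The dictionary methods above suggest a round trip through text formats such as TSV. The following sketch is an assumption about how they could work, not the library's implementation: as_dict delegates to NamedTuple._asdict(), and from_dict reads each field name from a string-valued mapping (matching the Mapping[str, str] annotation) and converts confidence back to float. The stand-in class and its values are hypothetical.

```python
import csv
import io
from typing import Any, Mapping, NamedTuple


class PredictionTuple(NamedTuple):
    """Hypothetical stand-in; the dict methods are sketches, not the library source."""

    source_prefix: str
    source_id: str
    source_name: str
    relation: str
    target_prefix: str
    target_identifier: str
    target_name: str
    type: str
    confidence: float
    source: str

    def as_dict(self) -> Mapping[str, Any]:
        """Return the fields as a dictionary keyed by field name."""
        return self._asdict()

    @classmethod
    def from_dict(cls, mapping: Mapping[str, str]) -> "PredictionTuple":
        """Build an instance from string values, e.g. one row of a TSV file."""
        values = {field: mapping[field] for field in cls._fields}
        # Values read from text are strings, so the float field is converted explicitly.
        values["confidence"] = float(values["confidence"])
        return cls(**values)


original = PredictionTuple(
    "mesh", "D000000", "example source term", "skos:exactMatch",
    "chebi", "0", "example target term",
    "semapv:LexicalMatching", 0.9, "example_script.py",
)

# Write one row as tab-separated text, then read it back.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=PredictionTuple._fields, delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerow(original.as_dict())

buffer.seek(0)
row = next(csv.DictReader(buffer, delimiter="\t"))
restored = PredictionTuple.from_dict(row)
print(restored == original)  # True
```

Converting confidence explicitly matters because csv.DictReader yields only strings. Without it, the restored tuple would hold "0.9" and no longer compare equal to the original.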

classmethod from_semra(mapping, confidence) → PredictionTuple

Instantiate from a SeMRA mapping.