PredictionTuple

class PredictionTuple(source_prefix: str, source_id: str, source_name: str, relation: str, target_prefix: str, target_identifier: str, target_name: str, type: str, confidence: float, source: str)[source]

Bases: NamedTuple

A named tuple class for predictions.

Create new instance of PredictionTuple(source_prefix, source_id, source_name, relation, target_prefix, target_identifier, target_name, type, confidence, source)
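As a rough illustration of the field order in the signature above, the following sketch defines a stand-in named tuple with the same ten fields. `PredictionSketch` and all of the values passed to it are hypothetical and exist only for this example; this is not the Biomappings implementation.

```python
from typing import NamedTuple


class PredictionSketch(NamedTuple):
    """Hypothetical stand-in with the same field order as PredictionTuple."""

    source_prefix: str
    source_id: str
    source_name: str
    relation: str
    target_prefix: str
    target_identifier: str
    target_name: str
    type: str
    confidence: float
    source: str


# Illustrative values only, not taken from Biomappings data.
example = PredictionSketch(
    source_prefix="mesh",
    source_id="D000001",
    source_name="example source term",
    relation="skos:exactMatch",
    target_prefix="chebi",
    target_identifier="12345",
    target_name="example target term",
    type="semapv:LexicalMatching",
    confidence=0.9,
    source="example_script.py",
)

# Fields are reachable by name or by position ("Alias for field number N").
first_field = example[0]        # same value as example.source_prefix
relation_by_index = example[3]  # same value as example.relation
```

Because the class is a named tuple, instances are immutable and unpack positionally in the order shown in the signature.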

Attributes Summary

`confidence`	An assessment of the confidence of the mapping, reported by the method used to generate it.
`relation`	Alias for field number 3
`source`	The script or process that generated this mapping.
`source_curie`	Concatenate the source prefix and ID to a CURIE.
`source_id`	Alias for field number 1
`source_name`	Alias for field number 2
`source_prefix`	Alias for field number 0
`target_curie`	Concatenate the target prefix and ID to a CURIE.
`target_identifier`	Alias for field number 5
`target_name`	Alias for field number 6
`target_prefix`	Alias for field number 4
`type`	A semapv term describing the mapping type.

Methods Summary

`as_dict`()	Get the prediction tuple as a dictionary.
`from_dict`(mapping)	Get the prediction tuple from a dictionary.
`from_semra`(mapping, confidence)	Instantiate from a SeMRA mapping.

Attributes Documentation

confidence: float

An assessment of the confidence of the mapping, reported by the method used to generate it.

Because each method reports its own score, confidence values aren’t generally comparable across methods, though they should follow the rough standard that closer to 1 is more confident and closer to 0 is less confident.

Most of the lexical mappings already in Biomappings were generated with Gilda. Depending on the script, the score therefore refers to either:

The Gilda match score, inspired by https://aclanthology.org/W15-3801/. Section 5.2 of the supplementary material for the Gilda paper describes this score in detail, where 1.0 is best and 0 is worst. https://github.com/biopragmatics/biomappings/blob/master/scripts/generate_agrovoc_mappings.py is an example that uses this variant.
A high-level estimate of the precision of the mappings generated by the given script. For example, the CL-MeSH mappings were estimated to be 90% correct, so all the mappings generated by https://github.com/biopragmatics/biomappings/blob/master/scripts/generate_cl_mesh_mappings.py are marked with 0.9 as their score.

However, other variants are possible. For example, this confidence could reflect the loss function if a knowledge graph embedding model was used to generate a mapping prediction.
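The two Gilda-based variants described for this attribute can be contrasted with a small sketch. The term names, the scores, and the `ESTIMATED_PRECISION` constant are made-up values for illustration; no Gilda or Biomappings API is called here.

```python
# Hypothetical per-match scores, as a lexical matcher might report them.
match_scores = {"term A": 0.78, "term B": 0.55, "term C": 1.0}

# Variant 1: keep each per-match score as that mapping's confidence.
per_match_confidence = dict(match_scores)

# Variant 2: assign one script-wide precision estimate to every mapping.
ESTIMATED_PRECISION = 0.9  # assumed value for this example
uniform_confidence = {term: ESTIMATED_PRECISION for term in match_scores}
```

In the first variant the values differ per mapping, while in the second every mapping from the same script carries the same number, which is one reason confidences from different scripts are hard to compare directly.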

relation: str: Alias for field number 3

source: str

The script or process that generated this mapping.

Most of these scripts are in https://github.com/biopragmatics/biomappings/tree/master/scripts, and new ones can be based on them.

source_curie: Concatenate the source prefix and ID to a CURIE.

source_id: str: Alias for field number 1

source_name: str: Alias for field number 2

source_prefix: str: Alias for field number 0

target_curie: Concatenate the target prefix and ID to a CURIE.

target_identifier: str: Alias for field number 5

target_name: str: Alias for field number 6

target_prefix: str: Alias for field number 4

type: str

A semapv term describing the mapping type.

These are relatively high level, and can be any child of semapv:Matching, including:

semapv:LexicalMatching
semapv:LogicalReasoning

Methods Documentation

as_dict() → Mapping[str, Any][source]: Get the prediction tuple as a dictionary.

classmethod from_dict(mapping: Mapping[str, str]) → PredictionTuple[source]: Get the prediction tuple from a dictionary.
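The round trip between `as_dict` and `from_dict` can be sketched on a reduced, hypothetical named tuple. `MiniPrediction` keeps only three of the ten fields for brevity, and the method bodies are assumptions about how such methods are commonly written for named tuples, not the library's actual code.

```python
from typing import Any, Mapping, NamedTuple


class MiniPrediction(NamedTuple):
    """Hypothetical three-field stand-in used to show a dictionary round trip."""

    source_prefix: str
    source_id: str
    confidence: float

    def as_dict(self) -> Mapping[str, Any]:
        """Return the fields as a dictionary keyed by field name."""
        return self._asdict()

    @classmethod
    def from_dict(cls, mapping: Mapping[str, Any]) -> "MiniPrediction":
        """Build an instance from a dictionary, ignoring unrelated keys."""
        return cls(**{field: mapping[field] for field in cls._fields})


original = MiniPrediction("mesh", "D000001", 0.9)
restored = MiniPrediction.from_dict(original.as_dict())
```

Note that the documented `from_dict` accepts a mapping of strings, so values read from a text file may need conversion before they are used as numbers; this sketch does not perform any conversion.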

classmethod from_semra(mapping, confidence) → PredictionTuple[source]: Instantiate from a SeMRA mapping.