🦕 Graph Search Algorithms
1. Centrality
1.1. Definition
Signature : centrality(nodes, *fielding, depth, soiling_weight, rel_decay)
The centrality
function analyzes a network of nodes by evaluating their relative importance, or 'centrality'. A node's centrality indicates its significance within the network, typically based on how well connected it is to other nodes.
The ultimate purpose is to use each node's centrality to refine the subset by removing less relevant nodes, and to sharpen the search for neighbors in the density_search
function, ensuring that the remaining nodes are meaningfully connected or correlated.
>>> nodes_centrality, neighbor_ratings = centrality(lst)
>>> nodes_centrality
{Node(es-n:Naturaleza(NA)): 0.37779086092115394,
Node(es-n:Arboleda(NA)): 1.0,
Node(es-n:Árbol(NA)): 0.7518212724623604,
Node(es-n:Augusto César(NA)): 0.3028169014084507,
Node(es-n:Coliseo(NA)): 0.1567522783761392}
>>> neighbor_ratings
{Node(es-v:Preferir(NA)): 0.011119907640420357,
Node(es-j:Chabacano(NA)): 0.08668492703973746,
Node(es-v:Pasar por alto(NA)): 0.02263786224131273,
Node(es-j:Aceptable(NA)): 0.02263786224131273,
...
(In the following calls, the contents of lst are assumed to have been modified between invocations, in line with the refinement purpose described above; this is why the same call yields different results.)
>>> nodes_centrality, neighbor_ratings = centrality(lst)
>>> nodes_centrality
{Node(es-n:Naturaleza(NA)): 0.16463022508038586,
Node(es-n:Arboleda(NA)): 0.5333333333333333,
Node(es-n:Augusto César(NA)): 1.0,
Node(es-n:Coliseo(NA)): 0.14431372549019608,
Node(es-n:Centurión(NA)): 0.5777777777777777}
>>> neighbor_ratings
{Node(es-v:Catapulta(NA)): 0.044992347867293765,
Node(es-j:Catástrofe(NA)): 0.044992347867293765,
Node(es-v:Fecha de muerte(NA)): 0.115984716399922335,
Node(es-j:Respetable(NA)): 0.04917500182722292,
...
>>> nodes_centrality, neighbor_ratings = centrality(lst)
>>> nodes_centrality
{Node(es-n:Naturaleza(NA)): 0.024008574490889605,
Node(es-n:Augusto César(NA)): 1.0,
Node(es-n:Coliseo(NA)): 0.1192156862745098,
Node(es-n:Centurión(NA)): 0.5777777777777777}
>>> neighbor_ratings
{Node(es-v:Decapitado(NA)): 0.059338375234523457,
Node(es-j:Trifulca(NA)): 0.08562785726285346,
Node(es-v:Vituperío(NA)): 0.01562067469764643,
Node(es-j:Ambrosía(NA)): 0.059338375234523457,
...
1.2. Parameters
nodes: entities or elements within the network for centrality analysis.
*fielding: sets the fields that will be inspected during neighbor collection.
depth (default=2): specifies the level of exploration in the network analysis; depth=1 means only immediate neighbors are considered.
soiling_weight: weighting factor adjusting the influence of revisitation frequency on node centrality, balancing connectivity-based and revisitation-based scores.
rel_decay (default=0.50): controls how quickly the importance of revisitation decays with increasing depth.
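For illustration only, a call that sets every parameter explicitly might look like the following; the field names 'synonym' and 'related' and the soiling_weight value are placeholders, not documented defaults.
nodes_centrality, neighbor_ratings = centrality(
    lst, 'synonym', 'related',  # placeholder field names for *fielding
    depth=2,                    # explore two levels of neighbors
    soiling_weight=1.0,         # illustrative weight for the revisitation score
    rel_decay=0.50,             # halve revisitation importance at each depth
)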
1.3. Algorithm
Calculation of nodes_centrality
The algorithm evaluates the importance of nodes in a network by considering two main factors: connectivity and revisitation frequency. For connectivity, it assigns scores to the neighbors of each targeted node based on how often they appear in the immediate vicinity of other nodes. The more intersections a neighbor has with different nodes, the higher its score. This score is then squared, amplifying the significance of neighbors at busier intersections.
neighbor_frequency = {key: value ** 2 for key, value in neighbor_frequency.items()}
# [...] Keep only neighbors shared by more than one node (a.k.a. intersections)
standard_scoring = {
    node: sum(neighbor_frequency.get(neighbor, 0) for neighbor in neighbors)
    for node, neighbors in node_neighbors.items()
}
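To make this step concrete, here is a self-contained toy version of the snippet above; the string nodes and neighbor sets are invented for illustration, and the filtering of non-shared neighbors is folded into the dict comprehension.
from collections import Counter

node_neighbors = {
    "Arboleda": {"Árbol", "Naturaleza", "Bosque"},
    "Árbol": {"Arboleda", "Bosque", "Hoja"},
    "Naturaleza": {"Bosque", "Hoja", "Río"},
}
# Count how often each neighbor appears across the targeted nodes' vicinities
neighbor_frequency = Counter(
    n for neighbors in node_neighbors.values() for n in neighbors
)
# Keep only neighbors shared by more than one node, then square to amplify
neighbor_frequency = {k: v ** 2 for k, v in neighbor_frequency.items() if v > 1}
standard_scoring = {
    node: sum(neighbor_frequency.get(n, 0) for n in neighbors)
    for node, neighbors in node_neighbors.items()
}
print(standard_scoring)  # {'Arboleda': 9, 'Árbol': 13, 'Naturaleza': 13}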
For revisitation, the algorithm looks at how often paths lead back to the original nodes beyond the first level of neighbors. It tracks this through multiple depths, adding to a node's soil score each time a path returns to it. The contribution of each return diminishes with depth, controlled by the rel_decay parameter.
for neighbors in neighbors_by_depth.values():
    standard_weight *= rel_decay
    for node in nodes:
        soil_scoring[node] += standard_weight * neighbors.count(node)
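As a sketch, the loop above can be run on toy data; the initial standard_weight of 1.0 and the contents of neighbors_by_depth are assumptions, since their construction is not shown.
rel_decay = 0.50
nodes = ["Arboleda", "Árbol"]
# Assumed structure: nodes reached at each depth, grouped by depth
neighbors_by_depth = {
    2: ["Bosque", "Árbol", "Hoja"],      # one path returns to "Árbol" at depth 2
    3: ["Arboleda", "Río", "Arboleda"],  # two paths return to "Arboleda" at depth 3
}
standard_weight = 1.0                     # assumed starting weight (not shown above)
soil_scoring = {node: 0.0 for node in nodes}
for neighbors in neighbors_by_depth.values():
    standard_weight *= rel_decay          # each additional depth counts for less
    for node in nodes:
        soil_scoring[node] += standard_weight * neighbors.count(node)
print(soil_scoring)  # {'Arboleda': 0.5, 'Árbol': 0.5}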
Finally, the algorithm combines these two scoring methods. The soil score is rescaled to be comparable with the standard connectivity-based score.
# rf: rescaling factor (computation not shown) that brings soil scores
# onto the same scale as the standard connectivity-based scores
rescaled_soil_scoring = {node: score * rf for node, score in soil_scoring.items()}
The final score of each node is then a weighted combination of these two scores, with the weight given to the soil score adjustable through the soiling_weight
parameter. This allows for tuning the balance between valuing connectivity and valuing revisitation frequency.
for node, neighbors in node_neighbors.items():
    score = standard_scoring[node] + soiling_weight * rescaled_soil_scoring[node]
    centralities[node] = score / len(neighbors) if neighbors else 0
The final scores are normalized to a 0-to-1 scale, where 1 represents the highest relative score within the network. A node's score therefore indicates its significance in terms of connectivity and revisitation; it is not an absolute measure of the node's characteristics or quality.
max_centrality = max(centralities.values())  # the top score becomes 1.0
for node in centralities:
    centralities[node] /= max_centrality
Calculation of neighbor_ratings
Up to this point we have focused on calculating node centralities, but the function also returns a second output: a dictionary that maps each neighbor to a scaled rating representing its connection with the given nodes. Essentially, this serves as an intersection calculation over the original nodes, based on their final scores.
neighbor_ratings = {}
for node, neighbors in node_neighbors.items():
    for neighbor in neighbors:
        neighbor_ratings[neighbor] = (
            neighbor_ratings.setdefault(neighbor, 0) + centralities[node]
        )
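Continuing with invented numbers, the accumulation works as follows; the centrality values and neighbor lists below are illustrative only.
centralities = {"Arboleda": 0.5, "Árbol": 1.0, "Naturaleza": 0.25}  # invented
node_neighbors = {
    "Arboleda": ["Bosque", "Hoja"],
    "Árbol": ["Bosque"],
    "Naturaleza": ["Bosque", "Río"],
}
neighbor_ratings = {}
for node, neighbors in node_neighbors.items():
    for neighbor in neighbors:
        # each neighbor accumulates the centrality of every original node it touches
        neighbor_ratings[neighbor] = (
            neighbor_ratings.setdefault(neighbor, 0) + centralities[node]
        )
print(neighbor_ratings)  # {'Bosque': 1.75, 'Hoja': 0.5, 'Río': 0.25}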
2. Density Search
2.1. Definition
Signature : density_search(interest_nodes, n_coincidences, *fielding)
The density search function is designed to identify nodes in a network that have a high degree of relevance, or 'density', in relation to a specified set of interest nodes. It works by analyzing the neighbors of these interest nodes and identifying those candidate nodes that have a significant number of coincidences (shared neighbors) with the interest nodes.
This function is particularly useful in networks where the goal is to find nodes that are not just connected, but densely connected, to a certain group of nodes. By setting a threshold on the number of coincidences, users can fine-tune the density required for a node to be considered relevant.
>>> lst = G.select(name=['Árbol','Arboleda','Naturaleza'])
>>> density_search(lst,3)
NodeSet(size=26)
>>> density_search(lst,2)
NodeSet(size=92)
>>> density_search(lst,1)
NodeSet(size=424)
2.2. Parameters
interest_nodes: entities or elements within the network whose shared neighborhoods drive the search.
n_coincidences: the minimum number of shared neighbors a node must have with the interest nodes to be considered in the result.
*fielding: sets the fields that will be inspected during neighbor collection.
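For intuition, here is a minimal, self-contained sketch of the coincidence-count idea behind density_search, using toy string nodes; the real function operates on graph Node objects and collects neighbors over the given fields, so every name below is illustrative.
from collections import Counter

def density_search_sketch(interest_neighbors, n_coincidences):
    """interest_neighbors maps each interest node to the set of its neighbors."""
    coincidences = Counter(
        candidate
        for neighbors in interest_neighbors.values()
        for candidate in neighbors
    )
    # Keep candidates that neighbor at least n_coincidences interest nodes
    return {c for c, count in coincidences.items() if count >= n_coincidences}

interest = {
    "Árbol": {"Bosque", "Hoja", "Raíz"},
    "Arboleda": {"Bosque", "Hoja", "Sendero"},
    "Naturaleza": {"Bosque", "Río"},
}
print(density_search_sketch(interest, 3))  # {'Bosque'}
print(density_search_sketch(interest, 2))  # {'Bosque', 'Hoja'} (set order may vary)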
3. Shortest Path
3.1. Definition
Signature : shortest_path(start, end, *fielding)
This function implements a classic graph traversal algorithm, breadth-first search, using a queue to manage the frontier of exploration and a set to track visited nodes. The queue ensures that the closest nodes (in terms of edge count) are explored first, which guarantees that the first path reaching the destination is the shortest.
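As a hedged illustration of this traversal, here is a generic, self-contained BFS sketch on a plain adjacency dict; the real function works on graph Node objects and returns a Path, so the names and data below are stand-ins.
from collections import deque

def shortest_path_sketch(graph, start, end):
    """graph maps each node to an iterable of its neighbors (stand-in structure)."""
    queue = deque([[start]])   # frontier: partial paths, fewest edges first
    visited = {start}          # nodes already reached
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path        # first arrival is the shortest path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return []                  # no path exists: empty result

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(shortest_path_sketch(g, "a", "e"))  # ['a', 'b', 'd', 'e']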
3.2. Parameters
Input:
start: the node where the search commences.
end: the destination node.
*fielding: types of connections permitted during the search.
Output:
A Path object representing the shortest sequence of nodes and edges from start to end. If no path exists, it returns an empty Path.
>>> path = shortest_path(start_node, end_node)
>>> path.nodes
...
>>> path.edges
...