🦕 Graph Search Algorithms
1. Centrality
1.1. Definition
Signature : centrality(nodes, *fielding, depth, soiling_weight, rel_decay)
The centrality
function analyzes a network of nodes by evaluating their relative importance, or 'centrality'. A node's centrality indicates its significance within the network, typically based on how well connected it is to other nodes.
The ultimate purpose is to use each node's centrality to refine the subset by removing less relevant nodes, and to sharpen the search for neighbors in the density_search
function, ensuring that the remaining nodes are meaningfully connected or correlated.
>>> nodes_centrality, neighbor_ratings = centrality(lst)
>>> nodes_centrality
{Node(es-n:Naturaleza(NA)): 0.37779086092115394,
Node(es-n:Arboleda(NA)): 1.0,
Node(es-n:Árbol(NA)): 0.7518212724623604,
Node(es-n:Augusto César(NA)): 0.3028169014084507,
Node(es-n:Coliseo(NA)): 0.1567522783761392}
>>> neighbor_ratings
{Node(es-v:Preferir(NA)): 0.011119907640420357,
Node(es-j:Chabacano(NA)): 0.08668492703973746,
Node(es-v:Pasar por alto(NA)): 0.02263786224131273,
Node(es-j:Aceptable(NA)): 0.02263786224131273,
...
(In the following calls, the contents of lst are assumed to have been modified between invocations, in line with the refinement purpose described above; this is why the same call yields different results.)
>>> nodes_centrality, neighbor_ratings = centrality(lst)
>>> nodes_centrality
{Node(es-n:Naturaleza(NA)): 0.16463022508038586,
Node(es-n:Arboleda(NA)): 0.5333333333333333,
Node(es-n:Augusto César(NA)): 1.0,
Node(es-n:Coliseo(NA)): 0.14431372549019608,
Node(es-n:Centurión(NA)): 0.5777777777777777}
>>> neighbor_ratings
{Node(es-v:Catapulta(NA)): 0.044992347867293765,
Node(es-j:Catástrofe(NA)): 0.044992347867293765,
Node(es-v:Fecha de muerte(NA)): 0.115984716399922335,
Node(es-j:Respetable(NA)): 0.04917500182722292,
...
>>> nodes_centrality, neighbor_ratings = centrality(lst)
>>> nodes_centrality
{Node(es-n:Naturaleza(NA)): 0.024008574490889605,
Node(es-n:Augusto César(NA)): 1.0,
Node(es-n:Coliseo(NA)): 0.1192156862745098,
Node(es-n:Centurión(NA)): 0.5777777777777777}
>>> neighbor_ratings
{Node(es-v:Decapitado(NA)): 0.059338375234523457,
Node(es-j:Trifulca(NA)): 0.08562785726285346,
Node(es-v:Vituperío(NA)): 0.01562067469764643,
Node(es-j:Ambrosía(NA)): 0.059338375234523457,
...
1.2. Parameters
nodes: entities or elements within the network for centrality analysis.
*fielding: sets the fields that will be inspected during neighbor collection.
depth (default=2): specifies the level of exploration in the network analysis; depth=1 means only immediate neighbors are considered.
soiling_weight: weighting factor adjusting the influence of revisitation frequency on node centrality, balancing connectivity-based and revisitation-based scores.
rel_decay (default=0.50): controls how quickly the importance of revisitation decays with increasing depth.
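For illustration only, a call that sets every parameter explicitly might look like the following; the field names 'synonym' and 'related' and the soiling_weight value are placeholders, not documented defaults.
nodes_centrality, neighbor_ratings = centrality(
    lst, 'synonym', 'related',  # placeholder field names for *fielding
    depth=2,                    # explore two levels of neighbors
    soiling_weight=1.0,         # illustrative weight for the revisitation score
    rel_decay=0.50,             # halve revisitation importance at each depth
)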
1.3. Algorithm
Calculation of nodes_centrality
The algorithm evaluates the importance of nodes in a network by considering two main factors: connectivity and revisitation frequency. For connectivity, it assigns scores to the neighbors of each targeted node based on how often they appear in the immediate vicinity of other nodes. The more intersections a neighbor has with different nodes, the higher its score. This score is then squared, amplifying the significance of neighbors at busier intersections.
neighbor_frequency = {key: value ** 2 for key, value in neighbor_frequency.items()}
# [...] Keep only neighbors shared by more than one node (a.k.a. intersections)
standard_scoring = {
    node: sum(neighbor_frequency.get(neighbor, 0) for neighbor in neighbors)
    for node, neighbors in node_neighbors.items()
}
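To make this step concrete, here is a self-contained toy version of the snippet above; the string nodes and neighbor sets are invented for illustration, and the filtering of non-shared neighbors is folded into the dict comprehension.
from collections import Counter

node_neighbors = {
    "Arboleda": {"Árbol", "Naturaleza", "Bosque"},
    "Árbol": {"Arboleda", "Bosque", "Hoja"},
    "Naturaleza": {"Bosque", "Hoja", "Río"},
}
# Count how often each neighbor appears across the targeted nodes' vicinities
neighbor_frequency = Counter(
    n for neighbors in node_neighbors.values() for n in neighbors
)
# Keep only neighbors shared by more than one node, then square to amplify
neighbor_frequency = {k: v ** 2 for k, v in neighbor_frequency.items() if v > 1}
standard_scoring = {
    node: sum(neighbor_frequency.get(n, 0) for n in neighbors)
    for node, neighbors in node_neighbors.items()
}
print(standard_scoring)  # {'Arboleda': 9, 'Árbol': 13, 'Naturaleza': 13}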
For revisitation, the algorithm looks at how often paths lead back to the original nodes beyond the first level of neighbors. It tracks this through multiple depths, adding to a node's soil score each time a path returns to it. The contribution of each return diminishes with depth, controlled by the rel_decay parameter.
for neighbors in neighbors_by_depth.values():
    standard_weight *= rel_decay
    for node in nodes:
        soil_scoring[node] += standard_weight * neighbors.count(node)
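As a sketch, the loop above can be run on toy data; the initial standard_weight of 1.0 and the contents of neighbors_by_depth are assumptions, since their construction is not shown.
rel_decay = 0.50
nodes = ["Arboleda", "Árbol"]
# Assumed structure: nodes reached at each depth, grouped by depth
neighbors_by_depth = {
    2: ["Bosque", "Árbol", "Hoja"],      # one path returns to "Árbol" at depth 2
    3: ["Arboleda", "Río", "Arboleda"],  # two paths return to "Arboleda" at depth 3
}
standard_weight = 1.0                     # assumed starting weight (not shown above)
soil_scoring = {node: 0.0 for node in nodes}
for neighbors in neighbors_by_depth.values():
    standard_weight *= rel_decay          # each additional depth counts for less
    for node in nodes:
        soil_scoring[node] += standard_weight * neighbors.count(node)
print(soil_scoring)  # {'Arboleda': 0.5, 'Árbol': 0.5}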
Finally, the algorithm combines these two scoring methods. The soil score is rescaled to be comparable with the standard connectivity-based score.
# rf: rescaling factor (computation not shown) that brings soil scores
# onto the same scale as the standard connectivity-based scores
rescaled_soil_scoring = {node: score * rf for node, score in soil_scoring.items()}
The final score of each node is then a weighted combination of these two scores, with the weight given to the soil score adjustable through the soiling_weight
parameter. This allows for tuning the balance between valuing connectivity and valuing revisitation frequency.
for node, neighbors in node_neighbors.items():
    score = standard_scoring[node] + soiling_weight * rescaled_soil_scoring[node]
    centralities[node] = score / len(neighbors) if neighbors else 0
The final scores are normalized to a 0-to-1 scale, where 1 represents the highest relative score within the network. A node's score therefore indicates its significance in terms of connectivity and revisitation; it is not an absolute measure of the node's characteristics or quality.
max_centrality = max(centralities.values())  # the top score becomes 1.0
for node in centralities:
    centralities[node] /= max_centrality
Calculation of neighbor_ratings
Up to this point we have focused on calculating node centralities, but the function also returns a second output: a dictionary that maps each neighbor to a scaled rating representing its connection with the given nodes. Essentially, this serves as an intersection calculation over the original nodes, based on their final scores.
neighbor_ratings = {}
for node, neighbors in node_neighbors.items():
    for neighbor in neighbors:
        neighbor_ratings[neighbor] = (
            neighbor_ratings.setdefault(neighbor, 0) + centralities[node]
        )
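Continuing with invented numbers, the accumulation works as follows; the centrality values and neighbor lists below are illustrative only.
centralities = {"Arboleda": 0.5, "Árbol": 1.0, "Naturaleza": 0.25}  # invented
node_neighbors = {
    "Arboleda": ["Bosque", "Hoja"],
    "Árbol": ["Bosque"],
    "Naturaleza": ["Bosque", "Río"],
}
neighbor_ratings = {}
for node, neighbors in node_neighbors.items():
    for neighbor in neighbors:
        # each neighbor accumulates the centrality of every original node it touches
        neighbor_ratings[neighbor] = (
            neighbor_ratings.setdefault(neighbor, 0) + centralities[node]
        )
print(neighbor_ratings)  # {'Bosque': 1.75, 'Hoja': 0.5, 'Río': 0.25}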
2. Density Search
2.1. Definition
Signature : density_search(interest_nodes, n_coincidences, *fielding)
The density search function is designed to identify nodes in a network that have a high degree of relevance, or 'density', in relation to a specified set of interest nodes. It works by analyzing the neighbors of these interest nodes and identifying those candidate nodes that have a significant number of coincidences (shared neighbors) with the interest nodes.
This function is particularly useful in networks where the goal is to find nodes that are not just connected, but densely connected, to a certain group of nodes. By setting a threshold on the number of coincidences, users can fine-tune the density required for a node to be considered relevant.
>>> lst = G.select(name=['Árbol','Arboleda','Naturaleza'])
>>> density_search(lst,3)
NodeSet(size=26)
>>> density_search(lst,2)
NodeSet(size=92)
>>> density_search(lst,1)
NodeSet(size=424)
2.2. Parameters
interest_nodes: entities or elements within the network whose shared neighborhoods drive the search.
n_coincidences: the minimum number of shared neighbors a node must have with the interest nodes to be considered in the result.
*fielding: sets the fields that will be inspected during neighbor collection.
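For intuition, here is a minimal, self-contained sketch of the coincidence-count idea behind density_search, using toy string nodes; the real function operates on graph Node objects and collects neighbors over the given fields, so every name below is illustrative.
from collections import Counter

def density_search_sketch(interest_neighbors, n_coincidences):
    """interest_neighbors maps each interest node to the set of its neighbors."""
    coincidences = Counter(
        candidate
        for neighbors in interest_neighbors.values()
        for candidate in neighbors
    )
    # Keep candidates that neighbor at least n_coincidences interest nodes
    return {c for c, count in coincidences.items() if count >= n_coincidences}

interest = {
    "Árbol": {"Bosque", "Hoja", "Raíz"},
    "Arboleda": {"Bosque", "Hoja", "Sendero"},
    "Naturaleza": {"Bosque", "Río"},
}
print(density_search_sketch(interest, 3))  # {'Bosque'}
print(density_search_sketch(interest, 2))  # {'Bosque', 'Hoja'} (set order may vary)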
3. Shortest Path
3.1. Definition
Signature : shortest_path(start, end, *fielding)
This function implements a classic graph traversal algorithm, breadth-first search, using a queue to manage the frontier of exploration and a set to track visited nodes. The queue ensures that the closest nodes (in terms of edge count) are explored first, which guarantees that the first path reaching the destination is the shortest.
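As a hedged illustration of this traversal, here is a generic, self-contained BFS sketch on a plain adjacency dict; the real function works on graph Node objects and returns a Path, so the names and data below are stand-ins.
from collections import deque

def shortest_path_sketch(graph, start, end):
    """graph maps each node to an iterable of its neighbors (stand-in structure)."""
    queue = deque([[start]])   # frontier: partial paths, fewest edges first
    visited = {start}          # nodes already reached
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path        # first arrival is the shortest path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return []                  # no path exists: empty result

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(shortest_path_sketch(g, "a", "e"))  # ['a', 'b', 'd', 'e']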
3.2. Parameters
Input:
start: the node where the search commences.
end: the destination node.
*fielding: types of connections permitted during the search.
Output:
A Path object representing the shortest sequence of nodes and edges from start to end. If no path exists, it returns an empty Path.
>>> path = shortest_path(start_node, end_node)
>>> path.nodes
...
>>> path.edges
...