🌿<NodeSet> Object Doc.

Workflow between Graph and NodeSet classes [deprecated].

<Graph> shares methods with <NodeSet>

We covered the Graph section first because a NodeSet acts as a container that shares certain Graph methods with the node sets generated throughout the system. As a result, these smaller data structures can be randomized, filtered, or modified in the same way as the Graph itself. For this reason, this section refers to the object either as a Graph (variable 'G' in code snippets) or as a set of nodes.

0. Constructor

The NodeSet class extends Python's list class and is tailored to managing collections of Node objects. It filters out duplicates automatically, both on instantiation and when new elements are appended, so every element in the set is unique.

<NodeSet> domain is merely architectural

Direct NodeSet usage (instantiating it and invoking its methods outside the context of a graph) is purely architectural: it exists to guarantee a consistent implementation of specific methods. Developers and end users are not expected, now or in the future, to invoke NodeSet directly.

Theoretical Application
>> ndst = NodeSet(nodes=None)
>> ndst
NodeSet(size=0)
>> ndst.append(n:=G.random())
>> ndst.append(n)
>> ndst
NodeSet(size=1) # only keeps one copy of the same node

>> ndst.look()
['es-j:Perenne(NA)', 'es-n:Mago(NA)']
>> ndst = NodeSet(nodes=G.random(10).look())
>> ndst
NodeSet(size=10)
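
The duplicate filtering described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the Node class here is a hypothetical stand-in, and the real NodeSet may implement uniqueness differently (for instance through hashing).

```python
class Node:
    """Hypothetical stand-in for the project's Node class."""
    def __init__(self, lang, type, name, lemma="NA"):
        self.lang, self.type, self.name, self.lemma = lang, type, name, lemma

    def __repr__(self):
        return f"Node({self.lang}-{self.type}:{self.name}({self.lemma}))"


class NodeSet(list):
    """Sketch of a list subclass that silently drops duplicate elements."""
    def __init__(self, nodes=None):
        super().__init__()
        for node in nodes or []:
            self.append(node)

    def append(self, node):
        # Only add the node if it is not already present.
        if node not in self:
            super().append(node)

    def __repr__(self):
        return f"NodeSet(size={len(self)})"


n = Node("es", "n", "Mago")
ndst = NodeSet()
ndst.append(n)
ndst.append(n)   # ignored: n is already in the set
print(ndst)      # NodeSet(size=1)
```

In this sketch only the constructor and append are guarded; other list operations such as extend or insert would bypass the check unless they were overridden too.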

1. Display

To display the nodes in the set, use the look method.

>> G.look()
[Node(es-v:Estar como una regadera(NA)), Node(es-n:Papel impreso(NA)),
Node(es-n:Marisma(NA)), Node(es-n:Fonendoscopio(NA)), Node(es-n:Terraza(NA))]

# Otherwise
>> G
NodeSet(size=28751)

get_set

The get_set method returns the distinct values that a given property takes across all nodes in the node set.

>> G.get_set('lang')
['es']

>> G.get_set('type')
['n', 'v', 'j', 'b']

>> G.get_set('name')[:10] # rarely used
['Reputado', 'Microbio', 'Deshuesar', 'Estar a buen recaudo',
'Puerta antibalas', 'Extraer', 'Circunvalar', 'Sustento',
'Colgar', 'Justo antes de que']

>> G.get_set('lemma')[:10] # rarely used
['NA', 'Emotion', 'Person']

>> G.get_set('favorite')[:10] # rarely used
[False, True]

Note. The "examples" argument does not work, because that attribute holds a list rather than a single value.
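
A minimal sketch of how such a method might collect distinct values is shown below. It is written as a free function over hypothetical stand-in nodes (SimpleNamespace objects), and it assumes the values are gathered through a hash-based structure, which would be one plausible reason why a list-valued attribute such as "examples" fails.

```python
from types import SimpleNamespace

def get_set(nodes, attr):
    """Return the distinct values of `attr` across `nodes`, in first-seen order."""
    # dict.fromkeys drops repeats and keeps order; every value must be hashable.
    return list(dict.fromkeys(getattr(node, attr) for node in nodes))

# Hypothetical stand-ins for Node objects.
nodes = [
    SimpleNamespace(lang="es", type="n", favorite=False, examples=["Un mago."]),
    SimpleNamespace(lang="es", type="v", favorite=True, examples=[]),
    SimpleNamespace(lang="es", type="n", favorite=False, examples=[]),
]

print(get_set(nodes, "lang"))      # ['es']
print(get_set(nodes, "type"))      # ['n', 'v']
print(get_set(nodes, "favorite"))  # [False, True]

try:
    get_set(nodes, "examples")
except TypeError as error:
    # Lists are unhashable, so a list-valued attribute cannot be collected this way.
    print("examples failed:", error)
```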

2.1. select(**kwargs)

Locates and retrieves a NodeSet of Node objects that match the specified criteria.

  • lang : the language that must be selected (str / [str1, str2, ...])

  • type : the type that must be selected (str / [str1, str2, ...])

  • lemma : the lemma to be selected (str / [str1, str2, ...]) / whether a lemma must exist or not (bool)

  • favorite : whether the node must be a favorite or not (bool)

# Single-args
>> G.select(name='Perro')
>> G.select(lang='es', type='n')
>> G.select(lemma=True); # return lemma!='NA' nodes
>> G.select(lemma=False); # return lemma='NA' nodes
>> G.select(lemma='Emotion');
>> G.select(favorite=True); G.select(favorite=False)
>> G.select() # returns the whole structure

# Multiple-option-args
>> G.select(name=['Perro', 'Gato'])
>> G.select(lang=['es', 'en'], type=['n', 'j', 'v'])
>> G.select(lemma=[True, False]) # useless, yet enabled
>> G.select(lemma=['Emotion', 'Person'])
>> G.select(favorite=[True, False]) # useless, yet enabled

# Any argument combination is permitted.
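
The matching rules illustrated above (single values, lists of options, and the boolean form of lemma) might be sketched like this. The code is an assumption-laden illustration, not the project's implementation: it uses a free function, hypothetical stand-in nodes, and assumes that criteria are combined with AND while the options inside one list are combined with OR.

```python
from types import SimpleNamespace

def matches(node, **criteria):
    """True if `node` satisfies every criterion (criteria are AND-ed)."""
    for attr, wanted in criteria.items():
        options = wanted if isinstance(wanted, list) else [wanted]
        value = getattr(node, attr)
        ok = False
        for option in options:  # options within one argument are OR-ed
            if attr == "lemma" and isinstance(option, bool):
                # lemma=True means "has a lemma"; lemma=False means lemma == 'NA'
                ok = ok or ((value != "NA") == option)
            else:
                ok = ok or (value == option)
        if not ok:
            return False
    return True

def select(nodes, **criteria):
    return [node for node in nodes if matches(node, **criteria)]

# Hypothetical stand-ins for Node objects.
nodes = [
    SimpleNamespace(lang="es", type="n", name="Perro", lemma="NA", favorite=False),
    SimpleNamespace(lang="es", type="n", name="Gato", lemma="NA", favorite=True),
    SimpleNamespace(lang="es", type="n", name="Alegría", lemma="Emotion", favorite=False),
    SimpleNamespace(lang="es", type="v", name="Correr", lemma="NA", favorite=False),
]

print([n.name for n in select(nodes, name=["Perro", "Gato"])])  # ['Perro', 'Gato']
print([n.name for n in select(nodes, lemma=True)])              # ['Alegría']
print([n.name for n in select(nodes, type="n", favorite=False)])  # ['Perro', 'Alegría']
print(len(select(nodes)))                                       # 4
```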

The "find" function

The Graph class includes a find method that simplifies node access during development.

When a search yields exactly one match, find returns that node directly, so there is no need to extract it from a NodeSet. This makes working with individual nodes in the graph quicker and more convenient.

def find(self, **kwargs):
    # Run a regular selection, then unwrap the result when it holds a single node.
    ndst = self.select(**kwargs)
    return ndst[0] if len(ndst) == 1 else ndst

>>> G.select(name='Tempestad')
NodeSet(size=1)

>>> G.find(name='Tempestad')
Node(es-n:Tempestad(NA))

>>> G.find(name='asdf') # empty case: the empty NodeSet is returned as-is
NodeSet(size=0)

The select function can essentially be viewed as a filtering mechanism, given its capability to yield multiple outcomes based on defined criteria. However, since the filtering process targets identity (hash) attributes, it is categorized as a search method rather than a filter method.

2.2. random(k=None, **kwargs)

Given a set of nodes (including one just returned by the select method), random picks k elements at random and returns them as a NodeSet. When k is omitted, it returns a single Node.

>> G.random() # intended for fast-retrieval during checks
Node(es-n:Alter ego(NA))

>> G.random(1)
NodeSet(size=1)
>> G.random(1500)
NodeSet(size=1500)

Optionally, the random method accepts selection **kwargs, so that a selection and a randomization can be performed in a single step.

# Instead of this...
>> G.select(favorite=True).random(1500)

# You can simply write this...
>> G.random(1500, favorite=True)
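
The combined select-and-randomize behaviour might look roughly like the sketch below. The function name pick_random, the stand-in nodes, and the choice of sampling without replacement are all assumptions made for illustration; the actual method may behave differently, for example when k exceeds the number of available nodes.

```python
import random
from types import SimpleNamespace

def pick_random(nodes, k=None, **criteria):
    """Filter `nodes` by exact attribute matches, then draw at random."""
    pool = [n for n in nodes
            if all(getattr(n, attr) == value for attr, value in criteria.items())]
    if k is None:
        return random.choice(pool)   # a single node
    return random.sample(pool, k)    # k distinct nodes, no repeats

# Hypothetical stand-ins for Node objects.
nodes = [SimpleNamespace(name=f"word{i}", favorite=(i % 2 == 0)) for i in range(10)]

single = pick_random(nodes)                   # one node
batch = pick_random(nodes, 3, favorite=True)  # three favorite nodes
print(single.name, [n.name for n in batch])
```

In this sketch random.sample raises ValueError when k is larger than the filtered pool, and random.choice raises IndexError on an empty pool.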

2.3. get_similars(name, k=1)

Given a word name, searches for the k closest matches across the structure. Here, a match is a word whose spelling is similar to the given name, as scored by the SequenceMatcher class from the difflib library.

Note that, unlike the others, this function does not return a NodeSet object but a list of (score, node) tuples.

>> G.get_similars('Interactivo', 5)
(1.0, Node(es-j:Interactivo(NA)))
(0.842, Node(es-j:Inactivo(NA)))
(0.818, Node(es-j:Instructivo(NA)))
(0.818, Node(es-j:Hiperactivo(NA)))
(0.818, Node(es-j:Reiterativo(NA)))
(0.8, Node(es-n:Atractivo(NA)))
(0.8, Node(es-j:Intergaláctico(NA)))
(0.778, Node(es-j:Intacto(NA)))
(0.762, Node(es-j:Imperativo(NA)))
(0.762, Node(es-j:Inefectivo(NA)))
(0.75, Node(es-j:Internacional(NA)))
(0.75, Node(es-j:Introspectivo(NA)))
(0.737, Node(es-n:Interior(NA)))
(0.737, Node(es-j:Negativo(NA)))
(0.737, Node(es-j:Interino(NA)))
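
The scoring step can be reproduced with difflib alone. The sketch below is a simplified illustration that works on plain name strings instead of Node objects; the function signature and the rounding to three decimals are assumptions based on the output shown above.

```python
from difflib import SequenceMatcher

def get_similars(names, name, k=1):
    """Return the k (score, name) pairs most similar to `name`."""
    scored = [
        (round(SequenceMatcher(None, name, other).ratio(), 3), other)
        for other in names
    ]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

names = ["Interactivo", "Inactivo", "Instructivo", "Mago", "Terraza"]
for score, other in get_similars(names, "Interactivo", 3):
    print(score, other)
```

SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so an identical name always scores 1.0.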

3. Filter Methods

<Chaining Pattern>

Since Filter Methods yield NodeSet objects, they can be chained in the terminal for hands-on manipulation. This characteristic is particularly valuable when the filter setup is designed on the go.
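
The chaining pattern relies on each filter returning the same container type it was called on. The toy sketch below illustrates only that idea: it stores plain strings instead of Node objects, and longer_than and with_words are hypothetical helper names, not methods of the real class.

```python
class NodeSet(list):
    """Toy container whose filters return a new NodeSet, so calls can be chained."""

    def longer_than(self, n_chars):
        # Hypothetical filter: keep names with more than n_chars characters.
        return NodeSet(name for name in self if len(name) > n_chars)

    def with_words(self, n_words):
        # Hypothetical filter: keep names made of exactly n_words words.
        return NodeSet(name for name in self if len(name.split()) == n_words)


names = NodeSet(["Mago", "Papel impreso", "Estar como una regadera", "Fonendoscopio"])

result = names.longer_than(10).with_words(1)
print(result)                  # ['Fonendoscopio']
print(type(result).__name__)   # NodeSet
```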

3.1. Syntax (for `name` and `lemma`)

The methods below filter nodes based on the textual form of their name or lemma (character and word counts), so that nodes matching specific length criteria can be selected and manipulated.

1. Characters

The char_count method filters nodes based on the number of characters in their name or lemma. It supports exact counts, as well as greater than or less than comparisons, allowing for flexible text length-based queries.

Parameters:

  • threshold (int) : does not need a preceding operator. If provided alone, filters nodes whose length precisely matches the given integer.

  • operator (str) [optional] : must be followed by a threshold (as in '>', 15). Common operators include '>', '<', '>=', '<=', '!=' and '='.

  • complement (bool) [optional] : If set to True, the method selects nodes that do not match the specified character count instead.

  • on_lemma (bool) [optional] : If set to True, the method filters on lemmas instead of names.

G.char_count(15)
G.char_count('>', 15)
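
One way the two call forms (threshold alone, or operator followed by threshold) could be handled is sketched below. It is a free function over hypothetical stand-in nodes, and the argument parsing is an assumption; only the parameter names come from the list above.

```python
import operator as op
from types import SimpleNamespace

# Maps the documented operator strings to comparison functions.
OPERATORS = {'>': op.gt, '<': op.lt, '>=': op.ge, '<=': op.le, '!=': op.ne, '=': op.eq}

def char_count(nodes, *args, complement=False, on_lemma=False):
    # Accept either (threshold) or (operator, threshold).
    if len(args) == 1:
        symbol, threshold = '=', args[0]
    else:
        symbol, threshold = args
    compare = OPERATORS[symbol]
    attr = 'lemma' if on_lemma else 'name'
    # `!= complement` inverts the test when complement is True.
    return [n for n in nodes if compare(len(getattr(n, attr)), threshold) != complement]

# Hypothetical stand-ins for Node objects.
nodes = [
    SimpleNamespace(name="Mago", lemma="NA"),
    SimpleNamespace(name="Papel impreso", lemma="NA"),
    SimpleNamespace(name="Estar como una regadera", lemma="Emotion"),
]

print([n.name for n in char_count(nodes, 4)])        # ['Mago']
print([n.name for n in char_count(nodes, '>', 15)])  # ['Estar como una regadera']
print([n.name for n in char_count(nodes, '>', 15, complement=True)])
# ['Mago', 'Papel impreso']
```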

2. Words

The word_count method filters nodes based on the number of words in their name or lemma. This method is beneficial for analyzing text complexity or focusing on specific text lengths within nodes. It supports exact counts and comparative queries, such as greater or fewer words than a specified number.

Parameters:

  • threshold (int) : does not need a preceding operator. If provided alone, filters nodes whose word count precisely matches the given integer.

  • operator (str) [optional] : must be followed by a threshold (as in '>', 15). Common operators include '>', '<', '>=', '<=', '!=' and '='.

  • complement (bool) [optional] : If set to True, the method selects nodes that do not match the specified word count instead.

  • on_lemma (bool) [optional] : If set to True, the method filters on lemmas instead of names.

G.word_count(2)
G.word_count('>', 2)

3.2. Edge fields (by size)

These methods evaluate the sum of connections across levels 0, 1, and 2, focusing specifically on synonyms within synsets or semantic relations within semsets. By comparing that sum against a threshold value with a comparison operator, optionally restricted to specific levels, they allow precise filtering of nodes based on their cumulative connectivity.

The edge_count method is designed to filter nodes within a NodeSet based on the number of connections they have in their relationships.

It allows users to specify criteria to select nodes with a certain number or range of synset connections. This method is useful for analyzing and working with nodes based on the richness of their synonymic relationships.

Parameters:

  • operator (str): A string representing the comparison operator used in the filtering condition. Common operators include '>', '<', '>=', '<=', '!=' and '='.

  • threshold (int) : An integer against which the number of synset connections is compared. It serves as the threshold for the filter condition.

  • *fielding (str or list) [optional] : Specifies which fields (kinds of edge) to consider when filtering. If not provided, all edges are used for the filter.

  • complement (bool) [optional] : When True, returns nodes that do not meet the specified criteria instead of those that do.

G.edge_count(int)       # selects nodes with 'int' edges
G.edge_count('>', int)  # selects with more than 'int' edges

# Using `*fielding` arg
G.edge_count('>', int, *)
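
The cumulative count described above might be computed along these lines. Everything about the data layout is hypothetical: the field names 'synset0', 'synset1' and 'semset0', the dict-based nodes, and the requirement to always pass an operator are assumptions chosen for illustration only.

```python
import operator as op

OPERATORS = {'>': op.gt, '<': op.lt, '>=': op.ge, '<=': op.le, '!=': op.ne, '=': op.eq}

def edge_count(nodes, symbol, threshold, *fields, complement=False):
    """Keep nodes whose summed edge count over `fields` passes the comparison."""
    compare = OPERATORS[symbol]
    kept = []
    for node in nodes:
        used = fields or node['edges'].keys()   # no fields given: use every field
        total = sum(len(node['edges'][field]) for field in used)
        if compare(total, threshold) != complement:
            kept.append(node)
    return kept

# Hypothetical nodes: each edge field holds a list of connected names.
nodes = [
    {'name': 'Mago',
     'edges': {'synset0': ['Hechicero', 'Brujo'], 'synset1': ['Ilusionista'], 'semset0': []}},
    {'name': 'Marisma',
     'edges': {'synset0': [], 'synset1': [], 'semset0': ['Pantano']}},
]

print([n['name'] for n in edge_count(nodes, '>', 2)])              # ['Mago']
print([n['name'] for n in edge_count(nodes, '>=', 1, 'semset0')])  # ['Marisma']
print([n['name'] for n in edge_count(nodes, '=', 0, 'synset0', 'synset1')])  # ['Marisma']
```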

4. Batch Editing

The edit method allows node attributes to be modified in batch. It accepts keyword arguments corresponding to the node's attributes. This is especially useful after having located a subset of nodes based on certain criteria.

n.edit(lang='new_lang')
n.edit(type='new_type')
n.edit(name='new_name')
n.edit(lemma='new_lemma')
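
A batch edit of this kind could be sketched as a loop that applies the same keyword arguments to every node. The free function, the stand-in nodes, and the validation against a fixed set of attribute names are assumptions for illustration; the real method may validate or propagate changes differently.

```python
from types import SimpleNamespace

# Attributes shown as editable in the examples above.
ALLOWED = {'lang', 'type', 'name', 'lemma'}

def edit(nodes, **changes):
    """Apply the same attribute changes to every node in `nodes`."""
    unknown = set(changes) - ALLOWED
    if unknown:
        raise ValueError(f"Unknown attribute(s): {sorted(unknown)}")
    for node in nodes:
        for attr, value in changes.items():
            setattr(node, attr, value)
    return nodes

# Hypothetical stand-ins for Node objects.
subset = [
    SimpleNamespace(lang="es", type="n", name="Alegría", lemma="NA"),
    SimpleNamespace(lang="es", type="n", name="Tristeza", lemma="NA"),
]

edit(subset, lemma="Emotion")
print([n.lemma for n in subset])  # ['Emotion', 'Emotion']
```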
