🌿<NodeSet> Object Doc.

0. Constructor
The NodeSet class extends Python's list class and is tailored for managing collections of Node objects. Whenever it is instantiated or appended to, duplicates are filtered out automatically, so every element in the set is unique.
>> ndst = NodeSet(nodes=None)
>> ndst
NodeSet(size=0)
>> ndst.append(n:=G.random())
>> ndst.append(n)
>> ndst
NodeSet(size=1) # only keeps one copy of the same node
>> ndst.look()
['es-j:Perenne(NA)']
>> ndst = NodeSet(nodes=G.random(10).look())
>> ndst
NodeSet(size=10)
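The duplicate filtering described above can be illustrated with a minimal sketch. It assumes a hypothetical UniqueList class (not the actual NodeSet implementation) that subclasses list and overrides only the constructor and append.

```python
class UniqueList(list):
    """Hypothetical stand-in for NodeSet: a list that refuses duplicates."""

    def __init__(self, items=None):
        super().__init__()
        # Route every initial element through append so it gets filtered too.
        for item in items or []:
            self.append(item)

    def append(self, item):
        # Store the item only if an equal one is not already present.
        if item not in self:
            super().append(item)

    def __repr__(self):
        return f"UniqueList(size={len(self)})"


ul = UniqueList(["a", "b", "a"])
ul.append("b")
print(ul)
```

In this sketch, other list methods such as extend or insert are not overridden and would still let duplicates through; whether NodeSet guards them is not stated here.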
1. Display
To display the nodes that a set contains, use the look method.
>> G.look()
[Node(es-v:Estar como una regadera(NA)), Node(es-n:Papel impreso(NA)),
Node(es-n:Marisma(NA)), Node(es-n:Fonendoscopio(NA)), Node(es-n:Terraza(NA))]
# Without look(), only the size is shown
>> G
NodeSet(size=28751)
2. Search
2.1. select(**kwargs)
Locates and returns a NodeSet of the Node objects that match the specified criteria.
name: the name(s) to select (str / [str1, str2, ...])
lang: the language(s) to select (str / [str1, str2, ...])
type: the type(s) to select (str / [str1, str2, ...])
lemma: the lemma(s) to select (str / [str1, str2, ...]), or whether a lemma must exist or not (bool)
favorite: whether the node must be a favorite or not (bool)
# Single-args
>> G.select(name='Perro')
>> G.select(lang='es', type='n')
>> G.select(lemma=True) # returns the nodes whose lemma is not 'NA'
>> G.select(lemma=False) # returns the nodes whose lemma is 'NA'
>> G.select(lemma='Emotion')
>> G.select(favorite=True); G.select(favorite=False)
>> G.select() # returns the whole structure
# Multiple-option-args
>> G.select(name=['Perro', 'Gato'])
>> G.select(lang=['es', 'en'], type=['n', 'j', 'v'])
>> G.select(lemma=[True, False]) # redundant, yet allowed
>> G.select(lemma=['Emotion', 'Person'])
>> G.select(favorite=[True, False]) # redundant, yet allowed
# Any argument combination is permitted.
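How such criteria might combine can be sketched as follows. This is an illustration under stated assumptions, not the library's code: Item is a hypothetical stand-in for Node, a list of options is treated as "any of these", and different keyword arguments are treated as "all of these".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for Node, with only the fields used by select."""
    lang: str
    type: str
    name: str
    lemma: str = "NA"      # 'NA' marks a missing lemma, as in the examples above
    favorite: bool = False


def matches(item, **criteria):
    """Return True if the item satisfies every criterion."""
    for field, wanted in criteria.items():
        # A single value is treated as a one-element list of options.
        options = wanted if isinstance(wanted, list) else [wanted]
        value = getattr(item, field)
        satisfied = False
        for option in options:
            if field == "lemma" and isinstance(option, bool):
                # A boolean lemma asks whether a lemma exists at all.
                satisfied = satisfied or ((value != "NA") == option)
            else:
                satisfied = satisfied or (value == option)
        if not satisfied:
            return False
    return True


def select(items, **criteria):
    """Keep the items that match all criteria; no criteria keeps everything."""
    return [item for item in items if matches(item, **criteria)]


items = [
    Item("es", "n", "Perro"),
    Item("es", "n", "Gato", lemma="Animal"),
    Item("en", "v", "Run", favorite=True),
]
print([i.name for i in select(items, lang="es", type="n")])
print([i.name for i in select(items, lemma=True)])
print([i.name for i in select(items, name=["Perro", "Run"])])
```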
2.2. random(k=None, **kwargs)
Draws k random elements from a set of nodes (including a set just returned by the select method) and returns them as a NodeSet. When called without k, it returns a single Node instead.
>> G.random() # intended for fast-retrieval during checks
Node(es-n:Alter ego(NA))
>> G.random(1)
NodeSet(size=1)
>> G.random(1500)
NodeSet(size=1500)
The random method also accepts the selection **kwargs of select, so that a selection and a randomization can be carried out in a single step.
# Instead of this...
>> G.select(favorite=True).random(1500)
# ...you can write this
>> G.random(1500, favorite=True)
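The select-then-sample behaviour can be sketched with the standard random module. The helper pick and its predicate parameter are hypothetical names used only for this illustration, and the handling of a k larger than the pool is an assumption.

```python
import random


def pick(items, k=None, predicate=None):
    """Hypothetical helper: filter with `predicate`, then draw at random."""
    pool = [item for item in items if predicate is None or predicate(item)]
    if k is None:
        return random.choice(pool)  # a single element, not a list
    # Assumption: asking for more elements than exist returns the whole pool.
    return random.sample(pool, min(k, len(pool)))


numbers = list(range(100))
evens = pick(numbers, 5, predicate=lambda n: n % 2 == 0)
print(evens)
```

Sampling is done without replacement, so the result never contains the same element twice.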
2.3. get_similars(name, k=1)
Given a word name, searches the structure for the k closest matches. Here a match is a word with a similar spelling, scored with the SequenceMatcher class from the difflib library.
Note that, unlike the other methods, this one does not return a NodeSet object but a list of (score, name) tuples.
>> G.get_similars('Interactivo', 5)
(1.0, Node(es-j:Interactivo(NA)))
(0.842, Node(es-j:Inactivo(NA)))
(0.818, Node(es-j:Instructivo(NA)))
(0.818, Node(es-j:Hiperactivo(NA)))
(0.818, Node(es-j:Reiterativo(NA)))
(0.8, Node(es-n:Atractivo(NA)))
(0.8, Node(es-j:Intergaláctico(NA)))
(0.778, Node(es-j:Intacto(NA)))
(0.762, Node(es-j:Imperativo(NA)))
(0.762, Node(es-j:Inefectivo(NA)))
(0.75, Node(es-j:Internacional(NA)))
(0.75, Node(es-j:Introspectivo(NA)))
(0.737, Node(es-n:Interior(NA)))
(0.737, Node(es-j:Negativo(NA)))
(0.737, Node(es-j:Interino(NA)))
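The scoring can be reproduced with difflib alone. The sketch below assumes a plain list of candidate names rather than the real node structure; the helper name similar_names is hypothetical, and rounding to three decimals is inferred from the output above.

```python
from difflib import SequenceMatcher


def similar_names(name, candidates, k=1):
    """Return the k best (score, candidate) pairs, highest score first."""
    scored = [
        (round(SequenceMatcher(None, name, candidate).ratio(), 3), candidate)
        for candidate in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


words = ["Inactivo", "Perro", "Interactivo"]
for pair in similar_names("Interactivo", words, 2):
    print(pair)
```

SequenceMatcher.ratio() returns 2*M/T, where M is the number of matching characters and T the combined length of both strings, which is why 'Inactivo' scores 16/19 ≈ 0.842 against 'Interactivo'.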
3. Filter Methods
3.1. Syntax (for `name` and `lemma`)
The methods below filter nodes by the text of their name or lemma. They make it possible to select and manipulate the nodes whose text fits a given pattern, such as a number of characters or words.
1. Characters
The char_count method filters nodes based on the number of characters in their name or lemma. It supports exact counts as well as greater-than and less-than comparisons, allowing for flexible queries based on text length.
Parameters:
threshold (int): needs no preceding operator. When given alone, keeps the nodes whose length exactly matches the integer.
operator (str) [optional]: must be followed by a threshold (as in '>', 15). Common operators include '>', '<', '>=', '<=', '!=' and '='.
complement (bool) [optional]: if True, the method selects the nodes that do not match the specified character count instead.
on_lemma (bool) [optional]: if True, the filter is applied to lemmas instead of names.
G.char_count(15) # nodes whose name has exactly 15 characters
G.char_count('>', 15) # nodes whose name has more than 15 characters
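The two calling conventions, a bare threshold or an operator followed by a threshold, can be handled as in the sketch below. It assumes plain strings instead of nodes; parse_condition and char_count_filter are hypothetical names, not part of the documented API.

```python
import operator

# Map the operator strings listed above to the standard comparison functions.
OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "!=": operator.ne, "=": operator.eq,
}


def parse_condition(*args):
    """Accept (threshold) or (operator, threshold); return a test on integers."""
    if len(args) == 1:
        op, threshold = "=", args[0]   # a bare threshold means an exact match
    else:
        op, threshold = args
    compare = OPS[op]
    return lambda n: compare(n, threshold)


def char_count_filter(names, *args):
    """Keep the names whose character count satisfies the condition."""
    condition = parse_condition(*args)
    return [name for name in names if condition(len(name))]


names = ["Perro", "Fonendoscopio", "Estar como una regadera"]
print(char_count_filter(names, 5))
print(char_count_filter(names, ">", 10))
```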
2. Words
The word_count method filters nodes based on the number of words in their name or lemma. It is useful for analyzing text complexity or for focusing on nodes of a specific text length. It supports exact counts and comparative queries, such as more or fewer words than a specified number.
Parameters:
threshold (int): needs no preceding operator. When given alone, keeps the nodes whose word count exactly matches the integer.
operator (str) [optional]: must be followed by a threshold (as in '>', 15). Common operators include '>', '<', '>=', '<=', '!=' and '='.
complement (bool) [optional]: if True, the method selects the nodes that do not match the specified word count instead.
on_lemma (bool) [optional]: if True, the filter is applied to lemmas instead of names.
G.word_count(2) # nodes whose name has exactly 2 words
G.word_count('>', 2) # nodes whose name has more than 2 words
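The effect of the complement and on_lemma flags can be sketched as follows. The example assumes nodes reduced to (name, lemma) pairs and words separated by whitespace; word_count_filter is a hypothetical helper that covers only the exact-count case.

```python
def word_count_filter(items, threshold, complement=False, on_lemma=False):
    """Keep the (name, lemma) pairs whose text has exactly `threshold` words."""
    kept = []
    for name, lemma in items:
        text = lemma if on_lemma else name      # choose which text to measure
        hit = len(text.split()) == threshold    # words are whitespace-separated
        if hit != complement:                   # complement inverts the outcome
            kept.append((name, lemma))
    return kept


items = [
    ("Alter ego", "Person"),
    ("Papel impreso", "Printed paper"),
    ("Marisma", "NA"),
]
print(word_count_filter(items, 2))
print(word_count_filter(items, 2, complement=True))
print(word_count_filter(items, 2, on_lemma=True))
```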
3.2. Edge fields (by size)
These methods evaluate the sum of a node's connections across levels 0, 1, and 2, considering either the synonyms within its synsets or the semantic relations within its semsets. By comparing that sum against a threshold value with a comparison operator, and optionally restricting the levels considered, they allow precise filtering of nodes by cumulative connectivity.
The edge_count method filters the nodes of a NodeSet by the number of connections they have in their relationships. It lets users select nodes with a given number or range of synset connections, which is useful for analyzing and working with nodes according to the richness of their synonymic relationships.
Parameters:
operator (str): the comparison operator used in the filtering condition. Common operators include '>', '<', '>=', '<=', '!=' and '='.
threshold (int): the integer against which the number of synset connections is compared. It serves as the threshold for the filter condition.
*fielding (str or list): the fields (kinds of edge) to consider when filtering. If not provided, all edges are used for the filter.
complement (bool): when True, returns the nodes that do not meet the specified criteria instead of those that do.
G.edge_count(int) # selects nodes with 'int' edges
G.edge_count('>', int) # selects with more than 'int' edges
# Using `*fielding` arg
G.edge_count('>', int, *)
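Summing connections over the chosen fields can be sketched as below. The data layout (a dict of edge lists per node) and the field names 'synset0' and 'semset1' are assumptions made for the illustration; the actual field names are not given in this section.

```python
import operator

OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "!=": operator.ne, "=": operator.eq,
}


def edge_count_filter(nodes, op, threshold, *fields, complement=False):
    """nodes maps a name to {field: [neighbours]}; return the matching names."""
    compare = OPS[op]
    result = []
    for name, edges in nodes.items():
        # With no fields given, every kind of edge is counted.
        chosen = fields or tuple(edges)
        total = sum(len(edges.get(field, [])) for field in chosen)
        if compare(total, threshold) != complement:
            result.append(name)
    return result


# Hypothetical field names, used only for this example.
nodes = {
    "Perro": {"synset0": ["Can", "Chucho"], "semset1": ["Gato"]},
    "Mago": {"synset0": [], "semset1": ["Truco"]},
}
print(edge_count_filter(nodes, ">", 1))
print(edge_count_filter(nodes, "=", 1, "semset1"))
print(edge_count_filter(nodes, ">", 1, complement=True))
```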
4. Batch Editing
The edit method modifies node attributes in batch. It accepts keyword arguments that correspond to the node's attributes, which is especially useful after locating a subset of nodes based on certain criteria.
n.edit(lang='new_lang')
n.edit(type='new_type')
n.edit(name='new_name')
n.edit(lemma='new_lemma')
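A batch edit of this kind amounts to setting the same attributes on every element of the set. The sketch below assumes a hypothetical Item class in place of Node and a standalone edit_all function; rejecting unknown attribute names is a design choice of the sketch, not documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for Node with the four editable attributes."""
    lang: str
    type: str
    name: str
    lemma: str = "NA"


def edit_all(items, **changes):
    """Apply the same attribute changes to every item in the collection."""
    allowed = {"lang", "type", "name", "lemma"}
    unknown = set(changes) - allowed
    if unknown:
        # Fail before touching anything, so a typo cannot cause a partial edit.
        raise AttributeError(f"unknown attributes: {sorted(unknown)}")
    for item in items:
        for field, value in changes.items():
            setattr(item, field, value)


subset = [Item("es", "n", "Perro"), Item("es", "n", "Gato")]
edit_all(subset, lemma="Animal")
print(subset)
```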