find_similar_str

pyhelpers.text.find_similar_str(input_str, lookup_list, n=1, ignore_punctuation=True, engine='difflib', **kwargs)

Finds the n strings most similar to input_str in a sequence of candidates.

Parameters:

input_str (str) – The string to find similar matches for.
lookup_list (Iterable) – A sequence of strings to search for matches.
n (int | None) – Number of similar strings to return; defaults to 1; when n=None, the function returns the entire lookup_list sorted by similarity in descending order.
ignore_punctuation (bool) – Whether to ignore punctuation in the comparison; defaults to True.
engine (str | Callable) –
Method for finding similarities; options include:
- 'difflib' (default), which uses difflib.get_close_matches().
- 'rapidfuzz' (or 'fuzz'), which uses rapidfuzz.fuzz.QRatio().
kwargs – [Optional] Additional parameters for the chosen engine; for instance, cutoff for 'difflib' and score_cutoff for 'rapidfuzz'.
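
For intuition about how n and cutoff interact under the default engine, here is a minimal sketch built directly on difflib.get_close_matches(). It is not the pyhelpers implementation: the helper name closest_matches and the lower-casing step are assumptions made purely for illustration.

```python
import difflib


def closest_matches(query, candidates, n=1, cutoff=0.6):
    """Hypothetical helper (not part of pyhelpers): case-insensitive close matches."""
    # Assumption: compare lower-cased copies, then map back to the original spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[h] for h in hits]


regions = ['Anglia', 'East Coast', 'Wales', 'Wessex', 'Western']

# At most n results, best first; anything scoring below cutoff is dropped.
print(closest_matches('angle', regions, n=2))
# Lowering cutoff admits weaker matches that the default of 0.6 rejects.
print(closest_matches('x', regions, cutoff=0.25))
```

In this sketch a lower cutoff widens the pool of candidates, while n only caps how many of the surviving candidates are returned.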

Returns:

A string or a list of strings similar to input_str, depending on n and the engine used, or None if no candidate is similar enough.

Return type:

str | list | None
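
Because the result may be a single string, a list, or None, calling code that always expects a list may want to normalise it first. The following is a small sketch of one way to do that; as_list is a hypothetical helper, not part of pyhelpers.

```python
def as_list(result):
    """Hypothetical helper: coerce a str | list | None result into a list."""
    if result is None:
        return []          # no sufficiently similar candidate
    if isinstance(result, str):
        return [result]    # a single match returned as a bare string
    return list(result)    # already a sequence of matches


print(as_list(None))
print(as_list('Wessex'))
print(as_list(['Wessex', 'Western']))
```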

Note

By default, the function uses the built-in difflib module.
When engine='rapidfuzz' (or, equivalently, engine='fuzz'), the function relies on RapidFuzz, which is not a dependency of pyhelpers and must be installed separately, for example with pip or conda.
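
Since RapidFuzz is optional, one way to keep calling code portable is to check whether it is importable and otherwise fall back to the default engine. This is a sketch only; pick_engine is a hypothetical helper, not part of pyhelpers.

```python
import importlib.util


def pick_engine():
    """Hypothetical helper: choose 'rapidfuzz' only if the package is installed."""
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec('rapidfuzz') is not None:
        return 'rapidfuzz'
    return 'difflib'


engine = pick_engine()
print(engine)
# The chosen name could then be passed on, for example:
# find_similar_str('angle', lookup_list, engine=engine)
```

Note that the two engines score similarity differently, so the same input can yield different matches depending on which one is selected.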

Examples:

>>> from pyhelpers.text import find_similar_str
>>> lookup_list = ['Anglia',
...                'East Coast',
...                'East Midlands',
...                'North and East',
...                'London North Western',
...                'Scotland',
...                'South East',
...                'Wales',
...                'Wessex',
...                'Western']
>>> find_similar_str('angle', lookup_list)
'Anglia'
>>> find_similar_str('angle', lookup_list, n=2)
['Anglia', 'Wales']
>>> find_similar_str('angle', lookup_list, engine='fuzz')
'Anglia'
>>> find_similar_str('angle', lookup_list, n=2, engine='fuzz')
['Anglia', 'Wales']
>>> find_similar_str('x', lookup_list) is None
True
>>> find_similar_str('x', lookup_list, cutoff=0.25)
'Wessex'
>>> find_similar_str('x', lookup_list, n=2, cutoff=0.25)
'Wessex'
>>> find_similar_str('x', lookup_list, engine='fuzz')
'Wessex'
>>> find_similar_str('x', lookup_list, n=2, engine='fuzz')
['Wessex', 'Western']