G2Engine Reference

Prepare environment

Initialize Senzing configuration

Use G2ConfigMgr to install a Senzing engine configuration in the database.

Initialize python environment

In [1]:
import os
import sys
import json

# For RenderJSON

import uuid
from IPython.display import display_javascript, display_html, display

Helper class for JSON rendering

A class for pretty-printing JSON. Not required by Senzing, but helps visualize JSON.

In [2]:
class RenderJSON(object):
    def __init__(self, json_data):
        if isinstance(json_data, dict):
            self.json_str = json.dumps(json_data)
        elif isinstance(json_data, bytearray):
            self.json_str = json_data.decode()
        else:
            self.json_str = json_data
        self.uuid = str(uuid.uuid4())

    def _ipython_display_(self):
        display_html('<div id="{}" style="height:100%; width:100%; background-color: LightCyan"></div>'.format(self.uuid), raw=True)
        display_javascript("""
        require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
        document.getElementById('%s').appendChild(renderjson(%s))
        });
        """ % (self.uuid, self.json_str), raw=True)

System path

Update system path.

In [3]:
python_path = "{0}/python".format(
    os.environ.get("SENZING_G2_DIR", "/opt/senzing/g2"))
sys.path.append(python_path)

Initialize variables

Create variables used for G2Engine.

In [4]:
%run senzing-init-config.ipynb
Stored 'senzing_config_json' (str)
Default config already set
Stored 'config_id_bytearray' (bytearray)
In [5]:
%store -r senzing_config_json
%store -r config_id_bytearray
config_id=config_id_bytearray.decode()
In [6]:
RenderJSON(senzing_config_json)
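
RenderJSON relies on the notebook front end being able to load renderjson.js. Where that is not possible, a plain-text fallback can serve the same purpose. The following is a minimal sketch; pretty_json is a hypothetical helper name and is not part of Senzing.

```python
import json


def pretty_json(json_data, indent=2):
    """Return an indented JSON string for a dict, str, bytes, or bytearray."""
    if isinstance(json_data, (bytes, bytearray)):
        json_data = json_data.decode()
    if isinstance(json_data, str):
        json_data = json.loads(json_data)
    # sort_keys gives a stable layout that is easier to compare by eye.
    return json.dumps(json_data, indent=indent, sort_keys=True)


print(pretty_json(bytearray(b'{"b": 1, "a": [1, 2]}')))
```

It accepts the same three input types that RenderJSON handles, so print(pretty_json(response_bytearray)) could stand in for RenderJSON(response_bytearray).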

G2Engine

In [7]:
import G2Exception
from G2Engine import G2Engine

G2Engine initialization

To start using Senzing G2Engine, create and initialize an instance. This should be done once per process. The initV2() method accepts the following parameters:

  • module_name: A short name given to this instance of the G2Engine object.
  • senzing_config_json: A JSON string containing configuration parameters.
  • verbose_logging: A boolean which enables diagnostic logging.
  • config_id: (optional) The identifier value for the engine configuration can be returned here.

Calling this function will return "0" upon success.

In [8]:
g2_engine = G2Engine()
try:
    return_code = g2_engine.initV2(
        module_name,
        senzing_config_json,
        verbose_logging)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

destroy()

Destroy the engine so that it can be initialized differently.

In [9]:
g2_engine.destroy()

initWithConfigIDV2

Alternatively, initWithConfigIDV2() can be used to initialize the engine with a specific configuration ID.

In [10]:
try:
    return_code = g2_engine.initWithConfigIDV2(
        module_name,
        senzing_config_json,
        config_id,
        verbose_logging)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

reinitV2

The reinitV2() function may be used to reinitialize the engine using a specified initConfigID.

In [11]:
try:
    return_code = g2_engine.reinitV2(
        config_id)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

primeEngine

The primeEngine() method may optionally be called to pre-initialize some of the heavier weight internal resources of the G2 engine.

In [12]:
try:
    g2_engine.primeEngine()

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

getActiveConfigID

Call getActiveConfigID() to return an identifier for the loaded Senzing engine configuration. The call writes the identifier into a user-supplied buffer (a bytearray in the cell that follows); the function itself returns "0" upon success. The getActiveConfigID() method accepts one parameter:

  • configuration_id: The identifier value for the engine configuration. The result of the function call is returned here.
In [13]:
configuration_id = bytearray()
try:
    g2_engine.getActiveConfigID(configuration_id)
    print("Configuration id: {0}".format(configuration_id.decode()))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
Configuration id: 1777871070

exportConfig

Call exportConfig() to retrieve your Senzing engine's configuration. The call writes a JSON document containing all relevant configuration information to a user-designated buffer; the function itself returns "0" upon success. The exportConfig() function accepts the following parameters:

  • response_bytearray: The memory buffer that receives the JSON configuration document.
  • config_id_bytearray: The memory buffer that receives the identifier value for the engine configuration.
In [14]:
response_bytearray = bytearray()
config_id_bytearray = bytearray()

try:
    g2_engine.exportConfig(response_bytearray, config_id_bytearray)
    print("Configuration ID: {0}".format(config_id_bytearray.decode()))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
Configuration ID: 1777871070

stats

Call stats() to retrieve workload statistics for the current process. These statistics automatically reset after retrieval. The stats() method accepts one parameter:

  • response_bytearray: A memory buffer for returning the response document. If an error occurred, an error response is stored here.
In [15]:
response_bytearray = bytearray()

try:
    g2_engine.stats(response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
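
The output-buffer pattern used by stats() recurs throughout this notebook: create a bytearray, pass it to the call, then decode it. A small wrapper can remove the repetition. This is only a sketch: call_with_buffer and fake_stats are hypothetical names, and fake_stats merely stands in for a real engine method so that the example runs without Senzing installed.

```python
import json


def call_with_buffer(func, *args):
    """Call func(*args, buffer) and return the buffer parsed as JSON.

    Assumes the API call takes its output bytearray as the LAST argument,
    as stats() and getRecordV2() do in this notebook. Methods with a
    different argument order (addRecordWithInfo, for example) need
    their own handling.
    """
    buffer = bytearray()
    func(*args, buffer)
    return json.loads(buffer) if buffer else None


def fake_stats(response_bytearray):
    # Hypothetical stand-in for g2_engine.stats(); the payload is invented.
    response_bytearray.extend(b'{"example": true}')


print(call_with_buffer(fake_stats))
```

With a real engine, the call would presumably read call_with_buffer(g2_engine.stats), wrapped in the same try/except block used elsewhere in the notebook.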

getRepositoryLastModifiedTime

Call getRepositoryLastModifiedTime() to obtain the last modified time of the Senzing repository, measured in milliseconds since January 1, 1970 12:00am GMT (epoch time). The call writes the value to a user-designated buffer; the function itself returns "0" upon success. The getRepositoryLastModifiedTime() method accepts one parameter:

  • last_modified_timestamp: The buffer that receives the last modified time.
In [16]:
last_modified_timestamp = bytearray()

try:
    g2_engine.getRepositoryLastModifiedTime(last_modified_timestamp)

    # Human readable output. The engine reports milliseconds, so divide by 1000.

    from datetime import datetime
    last_modified_unixtime = int(last_modified_timestamp.decode()) // 1000
    last_modified_datetime = datetime.fromtimestamp(last_modified_unixtime)

    print("Last modified timestamp: {0}\nLast modified time: {1}"
          .format(last_modified_timestamp.decode(), last_modified_datetime))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
Last modified timestamp: 1594251375067
Last modified time: 2020-07-08 23:36:15
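
datetime.fromtimestamp() without a time zone returns local time, so the printed value depends on the machine running the notebook. The sketch below converts the same millisecond value to a time-zone-aware UTC datetime and keeps the millisecond part. epoch_millis_to_utc is a hypothetical helper name.

```python
from datetime import datetime, timezone


def epoch_millis_to_utc(millis):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    seconds, remainder = divmod(int(millis), 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Re-attach the milliseconds as microseconds without float rounding.
    return moment.replace(microsecond=remainder * 1000)


# Value taken from the sample output of the cell above.
raw_value = bytearray(b"1594251375067").decode()
print(epoch_millis_to_utc(raw_value).isoformat())
```

Splitting the value with divmod() avoids the floating-point rounding that dividing by 1000.0 could introduce.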

Insert

Insert parameters

The following variables are used as parameters to the Senzing API.

In [17]:
datasource_code_1 = "TEST"
record_id_1 = "1"
datasource_code_2 = "TEST"
record_id_2 = "2"
datasource_code_3 = "TEST"
record_id_3 = "3"
datasource_code_4 = "TEST"
record_id_4 = "4"
datasource_code_5 = "TEST"
record_id_5 = "5"
datasource_code_6 = "TEST"
record_id_6 = "6"
datasource_code_7 = "TEST"
record_id_7 = "7"

load_id = None
g2_engine_flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS

Initial data.

In [18]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Smith",
        "NAME_FIRST": "John",
        "NAME_MIDDLE": "M"
    }],
    "PASSPORT_NUMBER": "PP11111",
    "PASSPORT_COUNTRY": "US",
    "DRIVERS_LICENSE_NUMBER": "DL11111",
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

addRecord

Once the Senzing engine is initialized, use addRecord() to load a record into the Senzing repository. addRecord() can be called as many times as desired and from multiple threads at the same time. The addRecord() function returns "0" upon success, and accepts four parameters as input:

  • datasource_code: The name of the data source the record is associated with. This value is configurable to the system.
  • record_id: The record ID, used to identify distinct records.
  • data_as_json: A JSON document with the attribute data for the record.
  • load_id: The observation load ID for the record; the value can be None, in which case it defaults to the data source.
In [19]:
try:
    g2_engine.addRecord(
        datasource_code_1,
        record_id_1,
        data_as_json,
        load_id)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
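
Because addRecord() may be called from multiple threads at the same time, a batch of records can be submitted from a thread pool. The sketch below shows the shape of such a loader. FakeEngine is a hypothetical stand-in so the example runs without Senzing; it only mimics the four-argument addRecord() call shown above, and load_records is likewise an illustrative name.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeEngine:
    """Hypothetical stand-in for an initialized G2Engine (not Senzing code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.records = {}

    def addRecord(self, datasource_code, record_id, data_as_json, load_id=None):
        with self._lock:
            self.records[(datasource_code, record_id)] = json.loads(data_as_json)


def load_records(engine, datasource_code, rows, workers=4):
    """Submit one addRecord() call per (record_id, data) pair from a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(engine.addRecord, datasource_code, record_id,
                        json.dumps(data), None)
            for record_id, data in rows
        ]
        for future in futures:
            future.result()  # Re-raises any exception from the worker thread.
    return len(futures)


engine = FakeEngine()
rows = [(str(n), {"SSN_NUMBER": "111-11-1111"}) for n in range(1, 6)]
print(load_records(engine, "TEST", rows))
```

Calling future.result() matters here: without it, an exception raised inside a worker thread would pass unnoticed.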

addRecordWithReturnedRecordID

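Alternatively, addRecordWithReturnedRecordID() can be used to add a record without supplying a record ID. The record ID assigned to the record is written to the response buffer, as the sample output of the cell shows.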

In [20]:
response=bytearray()
try:
    g2_engine.addRecordWithReturnedRecordID(
        datasource_code_1,
        response,
        data_as_json,
        load_id)
except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
print(response.decode())
49AD3C65923BFAC7275ACB06019131E353B0373C

addRecordWithInfo

Use addRecordWithInfo() if you would like to know which resolved entities were modified when adding the new record. It behaves identically to addRecord(), but returns a JSON document containing the IDs of the affected entities. It accepts the following parameters:

  • datasource_code: The name of the data source the record is associated with. This value is configurable to the system.
  • record_id: The record ID, used to identify distinct records.
  • data_as_json: A JSON document with the attribute data for the record.
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
  • load_id: The observation load ID for the record; the value can be None, in which case it defaults to the data source.
  • g2_engine_flags: Control flags for specifying what data about the entity to retrieve.
In [21]:
response_bytearray = bytearray()

try:
    g2_engine.addRecordWithInfo(
        datasource_code_1,
        record_id_1,
        data_as_json,
        response_bytearray,
        load_id,
        g2_engine_flags)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

getRecordV2

Use getRecordV2() to retrieve a single record from the data repository; the record is assigned in JSON form to a user-designated buffer, and the function itself returns "0" upon success. Once the Senzing engine is initialized, getRecordV2() can be called as many times as desired and from multiple threads at the same time. The getRecordV2() function accepts the following parameters as input:

  • datasource_code: The name of the data source the record is associated with. This value is configurable to the system.
  • record_id: The record ID, used to identify the record for retrieval
  • g2_engine_flags: Control flags for specifying what data about the record to retrieve.
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
In [22]:
response_bytearray = bytearray()

try:
    g2_engine.getRecordV2(
        datasource_code_1,
        record_id_1,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

The function getRecordV2() is an improved version of getRecord() that also allows you to use control flags. The getRecord() function has been deprecated.

getEntityByRecordIDV2

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers methods for entity searching, all of which can be called as many times as desired and from multiple threads at the same time (and all of which return "0" upon success).

Use getEntityByRecordIDV2() to retrieve entity data based on the data source code and record ID of a record within a resolved entity. This function accepts the following parameters as input:

  • datasource_code: The name of the data source the record is associated with. This value is configurable to the system.
  • record_id: The record ID of a record that belongs to the resolved entity
  • g2_engine_flags: Control flags for specifying what data about the entity to retrieve.
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
In [23]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordIDV2(
        datasource_code_1,
        record_id_1,
        g2_engine_flags,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
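
The cell above reads the entity ID directly from the response. If the call raises, the buffer stays empty and the lookup never runs, so later cells that use entity_id_1 would fail. A slightly more defensive variant is sketched below; resolved_entity_id is a hypothetical helper name, and the sample response is trimmed to the two keys the notebook reads.

```python
import json


def resolved_entity_id(response_bytearray):
    """Return RESOLVED_ENTITY.ENTITY_ID from a response buffer, or None."""
    if not response_bytearray:
        return None  # Buffer left empty, e.g. because the call raised.
    document = json.loads(response_bytearray)
    return document.get("RESOLVED_ENTITY", {}).get("ENTITY_ID")


# Hypothetical response, reduced to the keys used in this notebook.
sample = bytearray(b'{"RESOLVED_ENTITY": {"ENTITY_ID": 1}}')
print(resolved_entity_id(sample))
```

Returning None rather than raising lets the caller decide whether a missing entity is an error.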

getEntityByEntityIDV2

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers methods for entity searching, all of which can be called as many times as desired and from multiple threads at the same time (and all of which return "0" upon success).

Use getEntityByEntityIDV2() to retrieve entity data based on the ID of a resolved entity. This function accepts the following parameters as input:

  • entity_id: The numeric ID of a resolved entity
  • g2_engine_flags: Control flags for specifying what data about the entity to retrieve.
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
In [24]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByEntityIDV2(
        entity_id_1,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

Search By Attributes

searchByAttributes

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers a method for entity searching by attributes, which can be called as many times as desired and from multiple threads at the same time, and which returns "0" upon success.

Use searchByAttributes() to retrieve entity data based on a user-specified set of entity attributes. This function accepts the following parameters as input:

  • data_as_json: A JSON document with the attribute data to search for.
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
In [25]:
response_bytearray = bytearray()
try:
    g2_engine.searchByAttributes(data_as_json, response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

searchByAttributesV2

searchByAttributesV2() is similar to searchByAttributes() but preferable: it has improved functionality and a more standardized output structure.

Use searchByAttributesV2() to retrieve entity data based on a user-specified set of entity attributes. This function accepts the following parameters as input:

  • data_as_json: A JSON document with the attribute data to search for.
  • g2_engine_flags: Operational flags
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
In [26]:
response_bytearray = bytearray()

try:
    g2_engine.searchByAttributesV2(
        data_as_json,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

Finding Paths

The findPathByEntityID() and findPathByRecordID() functions can be used to find single relationship paths between two entities. Paths are found using known relationships with other entities.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen.

These functions have the following parameters:

  • entity_id_2: The entity ID for the starting entity of the search path
  • entity_id_3: The entity ID for the ending entity of the search path
  • datasource_code_2: The data source for the starting entity of the search path
  • datasource_code_3: The data source for the ending entity of the search path
  • record_id_2: The record ID for the starting entity of the search path
  • record_id_3: The record ID for the ending entity of the search path
  • max_degree: The number of relationship degrees to search

The functions return a JSON document that identifies the path between the entities, and the information on the entities in question. The document contains a section called "ENTITY_PATHS" which gives the path from one entity to the other. Example:

{
  "START_ENTITY_ID": 10,
  "END_ENTITY_ID": 13,
  "ENTITIES": [10, 11, 12, 13]
}

If no path was found, then the value of ENTITIES will be an empty list.

The response document also contains a separate ENTITIES section, with the full information about the resolved entities along that path.
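
The path information can be pulled out of the response with a few lines of Python. The sketch below assumes that "ENTITY_PATHS" is a list of objects shaped like the example above; the function names and the sample response are illustrative only.

```python
import json


def entity_paths(response):
    """Return {(start_id, end_id): [entity IDs along the path]}."""
    document = json.loads(response)
    return {
        (path["START_ENTITY_ID"], path["END_ENTITY_ID"]): path["ENTITIES"]
        for path in document.get("ENTITY_PATHS", [])
    }


def path_found(response):
    """True if any ENTITY_PATHS entry has a non-empty ENTITIES list."""
    return any(entity_paths(response).values())


# Hypothetical response; the ENTITIES detail section is left empty here.
sample = json.dumps({
    "ENTITY_PATHS": [{
        "START_ENTITY_ID": 10,
        "END_ENTITY_ID": 13,
        "ENTITIES": [10, 11, 12, 13],
    }],
    "ENTITIES": [],
})
print(entity_paths(sample))
print(path_found(sample))
```

An empty ENTITIES list inside a path entry is treated as "no path found", matching the description above.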

First, create some records to compare. Can you see what this record has in common with the previous one?

In [27]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Max",
        "NAME_MIDDLE": "W"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

try:
    g2_engine.replaceRecord(
        datasource_code_2,
        record_id_2,
        data_as_json,
        None)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

Replace values for Record #3

In [28]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Mildred"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

try:
    g2_engine.replaceRecord(
        datasource_code_3,
        record_id_3,
        data_as_json,
        None)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

Locate "entity identifier" for Record #1

In [29]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordID(
        datasource_code_1,
        record_id_1,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

    print("Entity ID: {0}".format(entity_id_1))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
Entity ID: 1

Locate "entity identifier" for Record #2

In [30]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordID(
        datasource_code_2,
        record_id_2,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_2 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

    print("Entity ID: {0}".format(entity_id_2))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
Entity ID: 2

Locate "entity identifier" for Record #3

In [31]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordID(
        datasource_code_3,
        record_id_3,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_3 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

    print("Entity ID: {0}".format(entity_id_3))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
Entity ID: 3

findPathByEntityID

In [32]:
# Define search variables.

max_degree = 3

# Find the path by entity ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathByEntityID(
        entity_id_2,
        entity_id_3,
        max_degree,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

# Print the results.

RenderJSON(response_bytearray)

findPathByEntityIDV2

The function findPathByEntityIDV2() is an improved version of findPathByEntityID() that also allows you to use control flags.

In [33]:
# Define search variables.

max_degree = 3

# Find the path by entity ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathByEntityIDV2(
        entity_id_2,
        entity_id_3,
        max_degree,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

findPathByRecordID

In [34]:
# Define search variables.

max_degree = 3

# Find the path by record ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathByRecordID(
        datasource_code_2, record_id_2,
        datasource_code_3, record_id_3,
        max_degree,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

findPathByRecordIDV2

The function findPathByRecordIDV2() is an improved version of findPathByRecordID() that also allows you to use control flags.

In [35]:
# Define search variables.

max_degree = 3

# Find the path by record ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathByRecordIDV2(
        datasource_code_2, record_id_2,
        datasource_code_3, record_id_3,
        max_degree,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

Finding Paths with Exclusions

The findPathExcludingByEntityID() and findPathExcludingByRecordID() functions can be used to find single relationship paths between two entities. Paths are found using known relationships with other entities. In addition, these functions can exclude certain entities from the path.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. Additionally, entities to be excluded can also be specified by either Entity ID or by Record ID.

When excluding entities, the user may choose to either (a) strictly exclude the entities, or (b) prefer to exclude the entities, but still include them if no other path is found. By default, entities will be strictly excluded. A "preferred exclude" may be done by specifying the G2_FIND_PATH_PREFER_EXCLUDE control flag.

These functions have the following parameters:

  • entity_id_2: The entity ID for the starting entity of the search path
  • entity_id_3: The entity ID for the ending entity of the search path
  • datasource_code_2: The data source for the starting entity of the search path
  • datasource_code_3: The data source for the ending entity of the search path
  • record_id_2: The record ID for the starting entity of the search path
  • record_id_3: The record ID for the ending entity of the search path
  • max_degree: The number of relationship degrees to search
  • excluded_entities_as_json: Entities that should be avoided on the path (JSON document)
  • g2_engine_flags: Operational flags
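
Control flags are typically combined with a bitwise OR, so a "preferred exclude" search adds G2_FIND_PATH_PREFER_EXCLUDE to whatever flags are already in use. The sketch below illustrates the mechanics only. It assumes the flags are integer bit masks, and the bit positions are placeholders, not the real Senzing values.

```python
from enum import IntFlag


class PlaceholderFlags(IntFlag):
    # Placeholder bit positions, NOT the real Senzing values. In real code,
    # read the constants from the G2Engine class instead.
    EXPORT_DEFAULT = 1 << 0
    FIND_PATH_PREFER_EXCLUDE = 1 << 1


# Strict exclusion: the prefer-exclude bit is not set.
strict = PlaceholderFlags.EXPORT_DEFAULT

# Preferred exclusion: OR the extra bit into the existing flags.
preferred = strict | PlaceholderFlags.FIND_PATH_PREFER_EXCLUDE

print(bool(strict & PlaceholderFlags.FIND_PATH_PREFER_EXCLUDE))
print(bool(preferred & PlaceholderFlags.FIND_PATH_PREFER_EXCLUDE))
```

With the real module, the equivalent expression would presumably be G2Engine.G2_EXPORT_DEFAULT_FLAGS | G2Engine.G2_FIND_PATH_PREFER_EXCLUDE, assuming the constant is exposed on the G2Engine class in the same way as G2_EXPORT_DEFAULT_FLAGS.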

findPathExcludingByEntityID

In [36]:
# Define search variables.

max_degree = 4
excluded_entities = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }]}
excluded_entities_as_json = json.dumps(excluded_entities)

# Find the path by entity ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathExcludingByEntityID(
        entity_id_2,
        entity_id_3,
        max_degree,
        excluded_entities_as_json,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

findPathExcludingByRecordID

In [37]:
# Define search variables.

excluded_records = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }]}
excluded_records_as_json = json.dumps(excluded_records)

# Find the path by record ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathExcludingByRecordID(
        datasource_code_2, record_id_2,
        datasource_code_3, record_id_3,
        max_degree,
        excluded_records_as_json,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

Finding Paths with Required Sources

The findPathIncludingSourceByEntityID() and findPathIncludingSourceByRecordID() functions can be used to find single relationship paths between two entities. In addition, one of the entities along the path must include a specified data source.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. The required data source or sources are specified as a list in a JSON document.

Specific entities may also be excluded, using the same methodology as the findPathExcludingByEntityID() and findPathExcludingByRecordID() functions use.

These functions have the following parameters:

  • entity_id_2: The entity ID for the starting entity of the search path
  • entity_id_3: The entity ID for the ending entity of the search path
  • datasource_code_2: The data source for the starting entity of the search path
  • datasource_code_3: The data source for the ending entity of the search path
  • record_id_2: The record ID for the starting entity of the search path
  • record_id_3: The record ID for the ending entity of the search path
  • max_degree: The number of relationship degrees to search
  • excluded_entities_as_json: Entities that should be avoided on the path (JSON document)
  • required_dsrcs_as_json: Data sources, one of which an entity on the path must include (JSON document)
  • g2_engine_flags: Operational flags
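
The required data sources are passed as a JSON document with a "DATA_SOURCES" list, as the cells that follow show. A small builder keeps the document separate from the exclusion document, which has a similar shape and is easy to mix up. required_data_sources_json is a hypothetical helper name.

```python
import json


def required_data_sources_json(datasource_codes):
    """Build the {"DATA_SOURCES": [...]} document from data source codes."""
    unique_codes = list(dict.fromkeys(datasource_codes))  # Drop repeats, keep order.
    return json.dumps({"DATA_SOURCES": unique_codes})


print(required_data_sources_json(["TEST", "TEST"]))
```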

findPathIncludingSourceByEntityID

In [38]:
# Define search variables.

max_degree = 4
excluded_entities = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }]}
excluded_entities_as_json = json.dumps(excluded_entities)
required_dsrcs = {
    "DATA_SOURCES": [
        datasource_code_1
    ]}
required_dsrcs_as_json = json.dumps(required_dsrcs)

# Find the path by entity ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathIncludingSourceByEntityID(
        entity_id_2,
        entity_id_3,
        max_degree,
        excluded_entities_as_json,
        required_dsrcs_as_json,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

findPathIncludingSourceByRecordID

In [39]:
# Define search variables.

excluded_records = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }]}
excluded_records_as_json = json.dumps(excluded_records)

# Find the path by record ID.

response_bytearray = bytearray([])

try:
    g2_engine.findPathIncludingSourceByRecordID(
        datasource_code_2, record_id_2,
        datasource_code_3, record_id_3,
        max_degree,
        excluded_records_as_json,
        required_dsrcs_as_json,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

Finding Networks

The findNetworkByEntityID() and findNetworkByRecordID() functions can be used to find all entities surrounding a requested set of entities. This includes the requested entities, paths between them, and relations to other nearby entities.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen.

These functions have the following parameters:

  • entity_list_as_json: A list of entities, specified by Entity ID (JSON document)
  • record_list_as_json: A list of entities, specified by Record ID (JSON document)
  • max_degree: The maximum number of degrees in paths between search entities
  • buildout_degree: The number of degrees of relationships to show around each search entity
  • max_entities: The maximum number of entities to return in the discovered network

They also have various arguments used to return response documents.

The functions return a JSON document that identifies the path between each pair of search entities (if the path exists), and the information on the entities in question (search entities, path entities, and build-out entities).
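
The entity and record lists follow the same JSON shapes used in the cells that follow. When the IDs are already held in Python lists, the documents can be generated rather than written out by hand. A minimal sketch, with hypothetical helper names:

```python
import json


def entity_list_json(entity_ids):
    """Build the {"ENTITIES": [{"ENTITY_ID": ...}, ...]} document."""
    return json.dumps(
        {"ENTITIES": [{"ENTITY_ID": entity_id} for entity_id in entity_ids]})


def record_list_json(record_keys):
    """Build the {"RECORDS": [...]} document from (data source, record ID) pairs."""
    return json.dumps({"RECORDS": [
        {"RECORD_ID": record_id, "DATA_SOURCE": datasource_code}
        for datasource_code, record_id in record_keys
    ]})


print(entity_list_json([1, 2, 3]))
print(record_list_json([("TEST", "1"), ("TEST", "2")]))
```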

findNetworkByEntityID

In [40]:
# Define search variables.

entity_list = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }, {
        "ENTITY_ID": entity_id_2
    }, {
        "ENTITY_ID": entity_id_3
    }]}
entity_list_as_json = json.dumps(entity_list)
max_degree = 2
buildout_degree = 1
max_entities = 12

# Find the network by entity ID.

response_bytearray = bytearray()

try:
    g2_engine.findNetworkByEntityID(
        entity_list_as_json,
        max_degree,
        buildout_degree,
        max_entities,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

findNetworkByEntityIDV2

The function findNetworkByEntityIDV2() is an improved version of findNetworkByEntityID() that also allows you to use control flags.

In [41]:
# Define search variables.

entity_list = {
    "ENTITIES": [{
        "ENTITY_ID": entity_id_1
    }, {
        "ENTITY_ID": entity_id_2
    }, {
        "ENTITY_ID": entity_id_3
    }]}
entity_list_as_json = json.dumps(entity_list)
max_degree = 2
buildout_degree = 1
max_entities = 12

# Find the network by entity ID.

response_bytearray = bytearray()

try:
    g2_engine.findNetworkByEntityIDV2(
        entity_list_as_json,
        max_degree,
        buildout_degree,
        max_entities,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

findNetworkByRecordID

In [42]:
# Define search variables.

record_list = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }, {
        "RECORD_ID": record_id_2,
        "DATA_SOURCE": datasource_code_2
    }, {
        "RECORD_ID": record_id_3,
        "DATA_SOURCE": datasource_code_3
    }]}
record_list_as_json = json.dumps(record_list)


# Find the network by record ID.

response_bytearray = bytearray()

try:
    g2_engine.findNetworkByRecordID(
        record_list_as_json,
        max_degree,
        buildout_degree,
        max_entities,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)

findNetworkByRecordIDV2

The function findNetworkByRecordIDV2() is an improved version of findNetworkByRecordID() that also allows you to use control flags.

In [43]:
# Define search variables.

record_list = {
    "RECORDS": [{
        "RECORD_ID": record_id_1,
        "DATA_SOURCE": datasource_code_1
    }, {
        "RECORD_ID": record_id_2,
        "DATA_SOURCE": datasource_code_2
    }, {
        "RECORD_ID": record_id_3,
        "DATA_SOURCE": datasource_code_3
    }]}
record_list_as_json = json.dumps(record_list)

# Find the network by record ID.

response_bytearray = bytearray()

try:
    g2_engine.findNetworkByRecordIDV2(
        record_list_as_json,
        max_degree,
        buildout_degree,
        max_entities,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Print the results.

RenderJSON(response_bytearray)
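
The RECORDS document passed to both record-based network calls can be assembled from (data source, record ID) pairs instead of being written out by hand. This is a minimal sketch: build_record_list() is a hypothetical helper, not part of G2Engine, and the sample values are made up.

```python
import json


def build_record_list(pairs):
    """Return the JSON string used above as record_list_as_json.

    pairs is an iterable of (datasource_code, record_id) tuples.
    """
    records = [
        {"RECORD_ID": record_id, "DATA_SOURCE": datasource_code}
        for datasource_code, record_id in pairs
    ]
    return json.dumps({"RECORDS": records})


# Hypothetical sample values, standing in for datasource_code_1, record_id_1, ...
record_list_as_json = build_record_list([("TEST", "1"), ("TEST", "2")])
print(record_list_as_json)
```

The resulting string can be passed as the first argument in the cells above.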

Connection Details

The whyEntityByEntityID() and whyEntityByRecordID() functions can be used to determine why records belong to their resolved entities. These functions will compare the record data within an entity against the rest of the entity data, and show why they are connected. This is calculated based on the features that record data represents.

Records can be chosen by either Record ID or by Entity ID, depending on which function is chosen. If a single record ID is used, then comparison results for that single record will be generated, as part of its entity. If an Entity ID is used, then comparison results will be generated for every record within that entity.

These functions have the following parameters:

  • entity_id: The entity ID for the entity to be analyzed
  • datasource_code: The data source for the record to be analyzed
  • record_id: The record ID for the record to be analyzed
  • g2_engine_flags: Control flags for outputting entities

They also have various arguments used to return response documents.

The functions return a JSON document that gives the results of the record analysis. The document contains a section called "WHY_RESULTS", which shows how specific records relate to the rest of the entity. It has a "WHY_KEY", which is similar to a match key, in defining the relevant connected data. It shows candidate keys for features that initially cause the records to be analyzed for a relationship, plus a series of feature scores that show how similar the feature data was.

The response document also contains a separate ENTITIES section, with the full information about the resolved entity. (Note: When working with this entity data, Senzing recommends using the flags G2_ENTITY_OPTION_INCLUDE_INTERNAL_FEATURES and G2_ENTITY_OPTION_INCLUDE_FEATURE_STATS. This will provide detailed feature data that is not included by default, but is useful for understanding the WHY_RESULTS data.)

The functions whyEntityByEntityIDV2() and whyEntityByRecordIDV2() are enhanced versions of whyEntityByEntityID() and whyEntityByRecordID() that also allow you to use control flags. The whyEntityByEntityID() and whyEntityByRecordID() functions work in the same way, but use the default flag value G2_WHY_ENTITY_DEFAULT_FLAGS.
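
Once a response has been returned, the WHY_KEY values can be pulled out of the WHY_RESULTS section with ordinary JSON handling. The sketch below is illustrative only: find_values() is a hypothetical helper, and the sample document is a made-up, heavily trimmed stand-in whose nesting is an assumption. Because the helper searches recursively, it does not depend on exactly where WHY_KEY sits inside each result.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical, trimmed response; a real document has many more fields.
sample_response = bytearray(json.dumps({
    "WHY_RESULTS": [
        {"MATCH_INFO": {"WHY_KEY": "+NAME+SSN"}},
        {"MATCH_INFO": {"WHY_KEY": "+SSN"}},
    ],
    "ENTITIES": [],
}).encode())

why_document = json.loads(sample_response)
why_keys = list(find_values(why_document["WHY_RESULTS"], "WHY_KEY"))
print(why_keys)
```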

whyEntityByRecordID

In [44]:
response_bytearray = bytearray()

try:
    g2_engine.whyEntityByRecordID(
        datasource_code_1,
        record_id_1,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

whyEntityByRecordIDV2

In [45]:
response_bytearray = bytearray()

try:
    g2_engine.whyEntityByRecordIDV2(
        datasource_code_1,
        record_id_1,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

whyEntityByEntityID

In [46]:
response_bytearray = bytearray()

try:
    g2_engine.whyEntityByEntityID(
        entity_id_1,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

whyEntityByEntityIDV2

In [47]:
response_bytearray = bytearray()

try:
    g2_engine.whyEntityByEntityIDV2(
        entity_id_1,
        g2_engine_flags,
        response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

Replace

replaceRecord

Use the replaceRecord() function to update or replace a record in the data repository. If the record doesn't exist, a new record is added to the data repository. Like the above functions, replaceRecord() returns "0" upon success, and it can be called as many times as desired and from multiple threads at the same time. The replaceRecord() function accepts four parameters as input:

  • datasource_code: The name of the data source the record is associated with. This value is configurable to the system
  • record_id: The record ID, used to identify distinct records
  • data_as_json: A JSON document with the attribute data for the record
  • load_id: The observation load ID for the record; value can be null and will default to datasource_code
In [48]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "John",
        "NAME_MIDDLE": "M"
    }],
    "PASSPORT_NUMBER": "PP11111",
    "PASSPORT_COUNTRY": "US",
    "DRIVERS_LICENSE_NUMBER": "DL11111",
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

try:
    g2_engine.replaceRecord(
        datasource_code_1,
        record_id_1,
        data_as_json,
        load_id)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

replaceRecordWithInfo

replaceRecordWithInfo() is available if you would like to know which resolved entities were modified when replacing a record. It behaves identically to replaceRecord(), but also returns a JSON document containing the IDs of the affected entities. It accepts the following parameters:

  • datasource_code: The name of the data source the record is associated with. This value is configurable to the system.
  • record_id: The record ID, used to identify distinct records
  • data_as_json: A JSON document with the attribute data for the record
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
  • load_id: The observation load ID for the record; value can be null and will default to datasource_code
In [49]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Jones",
        "NAME_FIRST": "John",
        "NAME_MIDDLE": "M"
    }],
    "PASSPORT_NUMBER": "PP11111",
    "PASSPORT_COUNTRY": "US",
    "DRIVERS_LICENSE_NUMBER": "DL11111",
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)
response_bytearray = bytearray()

try:
    g2_engine.replaceRecordWithInfo(
        datasource_code_1,
        record_id_1,
        data_as_json,
        response_bytearray,
        load_id)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
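
The returned document can be reduced to a plain list of entity IDs. The sketch below is hedged: the key names AFFECTED_ENTITIES and ENTITY_ID and the sample document are assumptions for illustration, so check them against the document rendered above before relying on them. affected_entity_ids() is a hypothetical helper.

```python
import json

# Hypothetical info document; the key names are assumptions.
info_bytearray = bytearray(
    b'{"DATA_SOURCE": "TEST", "RECORD_ID": "1", '
    b'"AFFECTED_ENTITIES": [{"ENTITY_ID": 1}, {"ENTITY_ID": 5}]}'
)


def affected_entity_ids(info_document):
    """Return the entity IDs listed in a parsed WithInfo document."""
    return [
        entity["ENTITY_ID"]
        for entity in info_document.get("AFFECTED_ENTITIES", [])
    ]


if info_bytearray:  # Skip parsing when nothing was returned.
    print(affected_entity_ids(json.loads(info_bytearray)))
```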

Re-evaluate

reevaluateRecord

In [50]:
try:
    g2_engine.reevaluateRecord(
        datasource_code_1,
        record_id_1,
        g2_engine_flags)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

reevaluateRecordWithInfo

In [51]:
response_bytearray = bytearray()

try:
    g2_engine.reevaluateRecordWithInfo(
        datasource_code_1,
        record_id_1,
        response_bytearray,
        g2_engine_flags)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

reevaluateEntity

Find an entity.

In [52]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordIDV2(
        datasource_code_1,
        record_id_1,
        g2_engine_flags,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

Re-evaluate the entity.

In [53]:
try:
    g2_engine.reevaluateEntity(entity_id_1, g2_engine_flags)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

reevaluateEntityWithInfo

In [54]:
response_bytearray = bytearray()

try:
    g2_engine.reevaluateEntityWithInfo(
        entity_id_1,
        response_bytearray,
        g2_engine_flags)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

Reporting

Exporting entity data from resolved entities is one of the core purposes of Senzing software. In just a few short steps, the Senzing engine allows users to export entity data in either JSON or CSV format.

exportJSONEntityReport

There are three steps to exporting resolved entity data from the G2Engine object in JSON format. First, use the exportJSONEntityReport() method to generate a long integer, referred to here as an export_handle. The exportJSONEntityReport() method accepts one parameter as input:

  • g2_engine_flags: An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section for further details.
In [55]:
try:
    export_handle = g2_engine.exportJSONEntityReport(g2_engine_flags)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

fetchNext

Second, use the fetchNext() method to read from the export_handle and export a row of JSON output containing the entity data for a single entity. Note that successive calls of fetchNext() will export successive rows of entity data. The fetchNext() method accepts the following parameters as input:

  • export_handle: A long integer from which resolved entity data may be read and exported.
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here.
In [56]:
try:
    while True:
        response_bytearray = bytearray()
        g2_engine.fetchNext(export_handle, response_bytearray)
        if not response_bytearray:
            break
        response_dictionary = json.loads(response_bytearray)
        response = json.dumps(response_dictionary, sort_keys=True, indent=4)
        print(response)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
{
    "RELATED_ENTITIES": [
        {
            "ENTITY_ID": 5,
            "ERRULE_CODE": "SF1E",
            "IS_AMBIGUOUS": 0,
            "IS_DISCLOSED": 0,
            "LENS_CODE": "DEFAULT",
            "MATCH_KEY": "+SSN+DRLIC+PASSPORT+PNAME",
            "MATCH_LEVEL": 2,
            "MATCH_LEVEL_CODE": "POSSIBLY_SAME",
            "MATCH_SCORE": 20,
            "REF_SCORE": 6
        }
    ],
    "RESOLVED_ENTITY": {
        "ENTITY_ID": 1,
        "ENTITY_NAME": "JOHN M SMITH",
        "FEATURES": {
            "DRLIC": [
                {
                    "FEAT_DESC": "DL11111",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "DL11111",
                            "LIB_FEAT_ID": 3
                        }
                    ],
                    "LIB_FEAT_ID": 3
                }
            ],
            "NAME": [
                {
                    "FEAT_DESC": "JOHN M SMITH",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "JOHN M SMITH",
                            "LIB_FEAT_ID": 1
                        }
                    ],
                    "LIB_FEAT_ID": 1,
                    "UTYPE_CODE": "PRIMARY"
                }
            ],
            "PASSPORT": [
                {
                    "FEAT_DESC": "PP11111 US",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "PP11111 US",
                            "LIB_FEAT_ID": 4
                        }
                    ],
                    "LIB_FEAT_ID": 4
                }
            ],
            "SSN": [
                {
                    "FEAT_DESC": "111-11-1111",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "111-11-1111",
                            "LIB_FEAT_ID": 2
                        }
                    ],
                    "LIB_FEAT_ID": 2
                }
            ]
        },
        "LENS_CODE": "DEFAULT",
        "RECORDS": [
            {
                "DATA_SOURCE": "TEST",
                "ENTITY_DESC": "John M Smith",
                "ENTITY_KEY": "DC22202DD6CC5FFF711D866BCF1394489FFA9180",
                "ENTITY_TYPE": "TEST",
                "ERRULE_CODE": "",
                "INTERNAL_ID": 1,
                "LAST_SEEN_DT": "2020-07-08 23:44:42.405",
                "MATCH_KEY": "",
                "MATCH_LEVEL": 0,
                "MATCH_LEVEL_CODE": "",
                "MATCH_SCORE": 0,
                "RECORD_ID": "49AD3C65923BFAC7275ACB06019131E353B0373C",
                "REF_SCORE": 0
            }
        ]
    }
}
{
    "RELATED_ENTITIES": [],
    "RESOLVED_ENTITY": {
        "ENTITY_ID": 2,
        "ENTITY_NAME": "MAX W MILLER",
        "FEATURES": {
            "NAME": [
                {
                    "FEAT_DESC": "MAX W MILLER",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "MAX W MILLER",
                            "LIB_FEAT_ID": 10
                        }
                    ],
                    "LIB_FEAT_ID": 10,
                    "UTYPE_CODE": "PRIMARY"
                }
            ],
            "SSN": [
                {
                    "FEAT_DESC": "111-11-1111",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "111-11-1111",
                            "LIB_FEAT_ID": 2
                        }
                    ],
                    "LIB_FEAT_ID": 2
                }
            ]
        },
        "LENS_CODE": "DEFAULT",
        "RECORDS": [
            {
                "DATA_SOURCE": "TEST",
                "ENTITY_DESC": "Max W Miller",
                "ENTITY_KEY": "814730F425BFCD47D56537D7408EBD4213DCB3E9",
                "ENTITY_TYPE": "TEST",
                "ERRULE_CODE": "",
                "INTERNAL_ID": 2,
                "LAST_SEEN_DT": "2020-07-08 23:44:42.468",
                "MATCH_KEY": "",
                "MATCH_LEVEL": 0,
                "MATCH_LEVEL_CODE": "",
                "MATCH_SCORE": 0,
                "RECORD_ID": "2",
                "REF_SCORE": 0
            }
        ]
    }
}
{
    "RELATED_ENTITIES": [],
    "RESOLVED_ENTITY": {
        "ENTITY_ID": 3,
        "ENTITY_NAME": "MILDRED MILLER",
        "FEATURES": {
            "NAME": [
                {
                    "FEAT_DESC": "MILDRED MILLER",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "MILDRED MILLER",
                            "LIB_FEAT_ID": 13
                        }
                    ],
                    "LIB_FEAT_ID": 13,
                    "UTYPE_CODE": "PRIMARY"
                }
            ],
            "SSN": [
                {
                    "FEAT_DESC": "111-11-1111",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "111-11-1111",
                            "LIB_FEAT_ID": 2
                        }
                    ],
                    "LIB_FEAT_ID": 2
                }
            ]
        },
        "LENS_CODE": "DEFAULT",
        "RECORDS": [
            {
                "DATA_SOURCE": "TEST",
                "ENTITY_DESC": "Mildred Miller",
                "ENTITY_KEY": "B4C5F2D4B888C71C115DE44C8E6389FC57B6B2C5",
                "ENTITY_TYPE": "TEST",
                "ERRULE_CODE": "",
                "INTERNAL_ID": 3,
                "LAST_SEEN_DT": "2020-07-08 23:44:42.477",
                "MATCH_KEY": "",
                "MATCH_LEVEL": 0,
                "MATCH_LEVEL_CODE": "",
                "MATCH_SCORE": 0,
                "RECORD_ID": "3",
                "REF_SCORE": 0
            }
        ]
    }
}
{
    "RELATED_ENTITIES": [
        {
            "ENTITY_ID": 1,
            "ERRULE_CODE": "SF1E",
            "IS_AMBIGUOUS": 0,
            "IS_DISCLOSED": 0,
            "LENS_CODE": "DEFAULT",
            "MATCH_KEY": "+SSN+DRLIC+PASSPORT+PNAME",
            "MATCH_LEVEL": 2,
            "MATCH_LEVEL_CODE": "POSSIBLY_SAME",
            "MATCH_SCORE": 20,
            "REF_SCORE": 6
        }
    ],
    "RESOLVED_ENTITY": {
        "ENTITY_ID": 5,
        "ENTITY_NAME": "JOHN M JONES",
        "FEATURES": {
            "DRLIC": [
                {
                    "FEAT_DESC": "DL11111",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "DL11111",
                            "LIB_FEAT_ID": 3
                        }
                    ],
                    "LIB_FEAT_ID": 3
                }
            ],
            "NAME": [
                {
                    "FEAT_DESC": "JOHN M JONES",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "JOHN M JONES",
                            "LIB_FEAT_ID": 21
                        }
                    ],
                    "LIB_FEAT_ID": 21,
                    "UTYPE_CODE": "PRIMARY"
                }
            ],
            "PASSPORT": [
                {
                    "FEAT_DESC": "PP11111 US",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "PP11111 US",
                            "LIB_FEAT_ID": 4
                        }
                    ],
                    "LIB_FEAT_ID": 4
                }
            ],
            "SSN": [
                {
                    "FEAT_DESC": "111-11-1111",
                    "FEAT_DESC_VALUES": [
                        {
                            "FEAT_DESC": "111-11-1111",
                            "LIB_FEAT_ID": 2
                        }
                    ],
                    "LIB_FEAT_ID": 2
                }
            ]
        },
        "LENS_CODE": "DEFAULT",
        "RECORDS": [
            {
                "DATA_SOURCE": "TEST",
                "ENTITY_DESC": "John M Jones",
                "ENTITY_KEY": "B862AA44434BCFC5695546A2C0E87E24A42D4C8D",
                "ENTITY_TYPE": "TEST",
                "ERRULE_CODE": "",
                "INTERNAL_ID": 5,
                "LAST_SEEN_DT": "2020-07-08 23:44:42.649",
                "MATCH_KEY": "",
                "MATCH_LEVEL": 0,
                "MATCH_LEVEL_CODE": "",
                "MATCH_SCORE": 0,
                "RECORD_ID": "1",
                "REF_SCORE": 0
            }
        ]
    }
}
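
Each exported row is a self-contained JSON document, so it can be summarized as it is fetched rather than printed in full. A minimal sketch using only keys visible in the output above; summarize_export_row() is a hypothetical helper and the sample row is a trimmed copy of the first row.

```python
import json


def summarize_export_row(row):
    """Return (entity_id, entity_name, related_ids) for one exported row."""
    document = json.loads(row)
    resolved = document["RESOLVED_ENTITY"]
    related_ids = [
        related["ENTITY_ID"] for related in document["RELATED_ENTITIES"]
    ]
    return resolved["ENTITY_ID"], resolved["ENTITY_NAME"], related_ids


# Trimmed copy of the first row printed above.
sample_row = bytearray(json.dumps({
    "RELATED_ENTITIES": [{"ENTITY_ID": 5, "MATCH_LEVEL_CODE": "POSSIBLY_SAME"}],
    "RESOLVED_ENTITY": {"ENTITY_ID": 1, "ENTITY_NAME": "JOHN M SMITH"},
}).encode())

print(summarize_export_row(sample_row))
```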

closeExport

In [57]:
try:
    g2_engine.closeExport(export_handle)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
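
The fetch and close steps can be wrapped in a generator so that the handle is closed even if the consumer stops early or raises. This is a sketch, not part of the Senzing API: iter_export_rows() is a hypothetical helper, and FakeEngine is a stand-in that mimics only the fetchNext() and closeExport() calls as they are used in this notebook, so the example runs without Senzing.

```python
def iter_export_rows(engine, export_handle):
    """Yield decoded export rows, then close the export handle."""
    try:
        while True:
            response_bytearray = bytearray()
            engine.fetchNext(export_handle, response_bytearray)
            if not response_bytearray:
                break  # An empty buffer marks the end of the export.
            yield response_bytearray.decode()
    finally:
        engine.closeExport(export_handle)


class FakeEngine:
    """Stand-in for G2Engine, for illustration only."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.closed = False

    def fetchNext(self, export_handle, response_bytearray):
        if self.rows:
            response_bytearray.extend(self.rows.pop(0).encode())

    def closeExport(self, export_handle):
        self.closed = True


fake_engine = FakeEngine(['{"RESOLVED_ENTITY": {"ENTITY_ID": 1}}'])
rows = list(iter_export_rows(fake_engine, export_handle=0))
print(rows, fake_engine.closed)
```

With a real engine, the same generator could be driven by the export_handle obtained from exportJSONEntityReport(); the finally clause runs when the generator is exhausted or closed.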

exportCSVEntityReportV2

There are three steps to exporting resolved entity data from the G2Engine object in CSV format. First, use the exportCSVEntityReportV2() method to generate a long integer, referred to here as an 'export_handle'.

The exportCSVEntityReportV2() method accepts these parameters as input:

  • csv_column_list: A comma-separated list of column names for the CSV export. (These are listed a little further down.)
  • g2_engine_flags: An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section in the link below for further details.

Second, use the fetchNext() method to read from the export_handle and export a row of CSV output containing the entity data for a single entity. Note that the first call of fetchNext() will yield a header row, and that successive calls of fetchNext() will export successive rows of entity data. The fetchNext() method accepts the following parameters as input:

  • export_handle: A long integer from which resolved entity data may be read and exported
  • response_bytearray: A memory buffer for returning the response document; if an error occurred, an error response is stored here
In [58]:
try:
    headers = 'RESOLVED_ENTITY_ID,RESOLVED_ENTITY_NAME,RELATED_ENTITY_ID,MATCH_LEVEL,MATCH_KEY,IS_DISCLOSED,IS_AMBIGUOUS,DATA_SOURCE,RECORD_ID,JSON_DATA,LAST_SEEN_DT,NAME_DATA,ATTRIBUTE_DATA,IDENTIFIER_DATA,ADDRESS_DATA,PHONE_DATA,RELATIONSHIP_DATA,ENTITY_DATA,OTHER_DATA'
    export_handle = g2_engine.exportCSVEntityReportV2(headers, g2_engine_flags)

    while True:
        response_bytearray = bytearray()
        g2_engine.fetchNext(export_handle, response_bytearray)
        if not response_bytearray:
            break
        print(response_bytearray.decode())

    g2_engine.closeExport(export_handle)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RESOLVED_ENTITY_ID,RESOLVED_ENTITY_NAME,RELATED_ENTITY_ID,MATCH_LEVEL,MATCH_KEY,IS_DISCLOSED,IS_AMBIGUOUS,DATA_SOURCE,RECORD_ID,JSON_DATA,LAST_SEEN_DT,NAME_DATA,ATTRIBUTE_DATA,IDENTIFIER_DATA,ADDRESS_DATA,PHONE_DATA,RELATIONSHIP_DATA,ENTITY_DATA,OTHER_DATA

1,"JOHN M SMITH",0,0,"",0,0,"TEST","49AD3C65923BFAC7275ACB06019131E353B0373C","{""NAMES"":[{""NAME_TYPE"":""PRIMARY"",""NAME_LAST"":""Smith"",""NAME_FIRST"":""John"",""NAME_MIDDLE"":""M""}],""PASSPORT_NUMBER"":""PP11111"",""PASSPORT_COUNTRY"":""US"",""DRIVERS_LICENSE_NUMBER"":""DL11111"",""SSN_NUMBER"":""111-11-1111"",""DATA_SOURCE"":""TEST"",""ENTITY_TYPE"":""TEST"",""DSRC_ACTION"":""A"",""LENS_CODE"":""DEFAULT""}","2020-07-08 23:44:42.405","PRIMARY: Smith John M","","DRLIC: DL11111,PASSPORT: PP11111 US,SSN: 111-11-1111","","","","",""

1,"JOHN M SMITH",5,2,"+SSN+DRLIC+PASSPORT+PNAME",0,0,"TEST","1","{""NAMES"":[{""NAME_TYPE"":""PRIMARY"",""NAME_LAST"":""Jones"",""NAME_FIRST"":""John"",""NAME_MIDDLE"":""M""}],""PASSPORT_NUMBER"":""PP11111"",""PASSPORT_COUNTRY"":""US"",""DRIVERS_LICENSE_NUMBER"":""DL11111"",""SSN_NUMBER"":""111-11-1111"",""DATA_SOURCE"":""TEST"",""ENTITY_TYPE"":""TEST"",""DSRC_ACTION"":""A"",""LENS_CODE"":""DEFAULT"",""RECORD_ID"":""1""}","2020-07-08 23:44:42.649","PRIMARY: Jones John M","","DRLIC: DL11111,PASSPORT: PP11111 US,SSN: 111-11-1111","","","","",""

2,"MAX W MILLER",0,0,"",0,0,"TEST","2","{""NAMES"":[{""NAME_TYPE"":""PRIMARY"",""NAME_LAST"":""Miller"",""NAME_FIRST"":""Max"",""NAME_MIDDLE"":""W""}],""SSN_NUMBER"":""111-11-1111"",""DATA_SOURCE"":""TEST"",""ENTITY_TYPE"":""TEST"",""DSRC_ACTION"":""A"",""LENS_CODE"":""DEFAULT"",""RECORD_ID"":""2""}","2020-07-08 23:44:42.468","PRIMARY: Miller Max W","","SSN: 111-11-1111","","","","",""

3,"MILDRED MILLER",0,0,"",0,0,"TEST","3","{""NAMES"":[{""NAME_TYPE"":""PRIMARY"",""NAME_LAST"":""Miller"",""NAME_FIRST"":""Mildred""}],""SSN_NUMBER"":""111-11-1111"",""DATA_SOURCE"":""TEST"",""ENTITY_TYPE"":""TEST"",""DSRC_ACTION"":""A"",""LENS_CODE"":""DEFAULT"",""RECORD_ID"":""3""}","2020-07-08 23:44:42.477","PRIMARY: Miller Mildred","","SSN: 111-11-1111","","","","",""

5,"JOHN M JONES",0,0,"",0,0,"TEST","1","{""NAMES"":[{""NAME_TYPE"":""PRIMARY"",""NAME_LAST"":""Jones"",""NAME_FIRST"":""John"",""NAME_MIDDLE"":""M""}],""PASSPORT_NUMBER"":""PP11111"",""PASSPORT_COUNTRY"":""US"",""DRIVERS_LICENSE_NUMBER"":""DL11111"",""SSN_NUMBER"":""111-11-1111"",""DATA_SOURCE"":""TEST"",""ENTITY_TYPE"":""TEST"",""DSRC_ACTION"":""A"",""LENS_CODE"":""DEFAULT"",""RECORD_ID"":""1""}","2020-07-08 23:44:42.649","PRIMARY: Jones John M","","DRLIC: DL11111,PASSPORT: PP11111 US,SSN: 111-11-1111","","","","",""

5,"JOHN M JONES",1,2,"+SSN+DRLIC+PASSPORT+PNAME",0,0,"TEST","49AD3C65923BFAC7275ACB06019131E353B0373C","{""NAMES"":[{""NAME_TYPE"":""PRIMARY"",""NAME_LAST"":""Smith"",""NAME_FIRST"":""John"",""NAME_MIDDLE"":""M""}],""PASSPORT_NUMBER"":""PP11111"",""PASSPORT_COUNTRY"":""US"",""DRIVERS_LICENSE_NUMBER"":""DL11111"",""SSN_NUMBER"":""111-11-1111"",""DATA_SOURCE"":""TEST"",""ENTITY_TYPE"":""TEST"",""DSRC_ACTION"":""A"",""LENS_CODE"":""DEFAULT""}","2020-07-08 23:44:42.405","PRIMARY: Smith John M","","DRLIC: DL11111,PASSPORT: PP11111 US,SSN: 111-11-1111","","","","",""
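
The rows above use standard CSV quoting, with the embedded JSON_DATA column escaped by doubled quotes, so Python's csv module can split them and json can then decode that column. A small sketch, assuming the fetched rows have been joined into one string; the sample text keeps only four of the columns from the output above.

```python
import csv
import io
import json

# Header and one data row adapted (columns trimmed) from the output above.
csv_text = (
    'RESOLVED_ENTITY_ID,RESOLVED_ENTITY_NAME,RECORD_ID,JSON_DATA\n'
    '3,"MILDRED MILLER","3","{""NAMES"":[{""NAME_LAST"":""Miller""}],""SSN_NUMBER"":""111-11-1111""}"\n'
)

# DictReader uses the header row for the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
first = rows[0]

# The JSON_DATA column holds the original record as a JSON string.
json_data = json.loads(first["JSON_DATA"])
print(first["RESOLVED_ENTITY_ID"], first["RESOLVED_ENTITY_NAME"])
print(json_data["SSN_NUMBER"])
```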

Redo Processing

Senzing automatically creates redo records when it detects conditions that may require more processing. Some examples:

  • A value becomes generic and previous decisions may need to be revisited
  • Clean up after some record deletes
  • Detected related entities were being changed at the same time
  • A table inconsistency exists, potentially after a non-graceful shutdown

First we need additional records in the repository, so let's add 4 more.

Create Record and Entity #6

In [59]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Owens",
        "NAME_FIRST": "Lily"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

try:
    g2_engine.replaceRecord(
        datasource_code_4,
        record_id_4,
        data_as_json,
        None)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
In [60]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordID(
        datasource_code_4,
        record_id_4,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_6 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
    print("Entity ID: {0}".format(entity_id_6))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
Entity ID: 6

Create Record and Entity #7

In [61]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Bauler",
        "NAME_FIRST": "August",
        "NAME_MIDDLE": "E"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

try:
    g2_engine.replaceRecord(
        datasource_code_5,
        record_id_5,
        data_as_json,
        None)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
In [62]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordID(
        datasource_code_5,
        record_id_5,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_7 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
    print("Entity ID: {0}".format(entity_id_7))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
Entity ID: 7

Create Record and Entity #8

In [63]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Barcy",
        "NAME_FIRST": "Brian",
        "NAME_MIDDLE": "H"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

try:
    g2_engine.replaceRecord(
        datasource_code_6,
        record_id_6,
        data_as_json,
        None)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
In [64]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordID(
        datasource_code_6,
        record_id_6,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_8 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
    print("Entity ID: {0}".format(entity_id_8))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
Entity ID: 8

Create Record and Entity #9

In [65]:
data = {
    "NAMES": [{
        "NAME_TYPE": "PRIMARY",
        "NAME_LAST": "Miller",
        "NAME_FIRST": "Jack",
        "NAME_MIDDLE": "H"
    }],
    "SSN_NUMBER": "111-11-1111"
}
data_as_json = json.dumps(data)

try:
    g2_engine.replaceRecord(
        datasource_code_7,
        record_id_7,
        data_as_json,
        None)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
In [66]:
response_bytearray = bytearray()

try:
    g2_engine.getEntityByRecordID(
        datasource_code_7,
        record_id_7,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    entity_id_9 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
    print("Entity ID: {0}".format(entity_id_9))

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)
Entity ID: 9

countRedoRecords

Once the Senzing engine is initialized, use countRedoRecords() to return the number of internally queued maintenance records remaining in the Senzing repository. countRedoRecords() takes no arguments and returns a value less than 0 on error.

In [67]:
return_code = g2_engine.countRedoRecords()

getRedoRecord

Once the Senzing engine is initialized, use getRedoRecord() to retrieve the next internally queued maintenance record from the Senzing repository. getRedoRecord() can be called as many times as desired and from multiple threads at the same time, but all threads must be in the same process; it should not be called from multiple processes. Unlike processRedoRecord(), getRedoRecord() does not actually process the record. To process the record, use the G2Engine process() function. The getRedoRecord() function returns "0" upon success and an empty response if there is nothing to do.

  • response_bytearray: A memory buffer for returning the maintenance document (may be XML or JSON). The format is internal to Senzing. If empty it means there are no maintenance records to return.
In [68]:
response_bytearray = bytearray()

try:
    g2_engine.getRedoRecord(response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

processWithInfo

In [69]:
if (response_bytearray):
    try:
        process_response_bytearray = bytearray()

        g2_engine.processWithInfo(
            response_bytearray.decode(),
            process_response_bytearray)

    except G2Exception.G2ModuleGenericException as err:
        print(g2_engine.getLastException())

    RenderJSON(process_response_bytearray)

process

In [70]:
if (response_bytearray):
    try:
        g2_engine.process(response_bytearray.decode())

    except G2Exception.G2ModuleGenericException as err:
        print(g2_engine.getLastException())

processRedoRecord

The processRedoRecord() function processes the next redo record and returns it. If processRedoRecord() returns 0 and response_bytearray is empty, there are no more redo records to process, and calling countRedoRecords() again will return 0. Processing a redo record can create more redo records in certain situations.

  • response_bytearray: A buffer that returns the redo record that was processed (XML in the example output below). An empty buffer means there was nothing to process.
In [71]:
response_bytearray = bytearray()
try:
    g2_engine.processRedoRecord(response_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
# Pretty-print XML.

xml_string = response_bytearray.decode()
if len(xml_string) > 0:
    import xml.dom.minidom
    # Use a separate name so the imported xml package is not shadowed.
    xml_document = xml.dom.minidom.parseString(xml_string)
    xml_pretty_string = xml_document.toprettyxml()
    print(xml_pretty_string)
<?xml version="1.0" ?>
<UMF_DOC>
	<OBS>
		<OBS_ENT>
			<ENT_SRC_KEY>C1F081BB9C9EDA71A73873FAF39774DF802A3B9F</ENT_SRC_KEY>
			<ENT_SRC_DESC>Lily Owens</ENT_SRC_DESC>
			<RECORD_ID>4</RECORD_ID>
			<ETYPE_CODE>TEST</ETYPE_CODE>
		</OBS_ENT>
		<LENS_LIST>
			<LENS_CODE>DEFAULT</LENS_CODE>
		</LENS_LIST>
		<DSRC_CODE>TEST</DSRC_CODE>
		<DSRC_ACTION>X</DSRC_ACTION>
		<OBS_SRC_KEY>REDO_QUEUE</OBS_SRC_KEY>
	</OBS>
	<REDO_EVALUATION_FOR_GENERIC>
		<FEATURE_ID>2</FEATURE_ID>
		<LENS_ID>1</LENS_ID>
		<ECLASS_ID>1</ECLASS_ID>
	</REDO_EVALUATION_FOR_GENERIC>
</UMF_DOC>
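
To work through the whole queue, processRedoRecord() can be called in a loop until it returns an empty buffer. The sketch below is a hedged illustration: drain_redo_queue() and its max_records safety limit are hypothetical additions, and FakeRedoEngine is a stand-in that mimics only the two calls used here so the example runs without Senzing.

```python
def drain_redo_queue(engine, max_records=1000):
    """Call processRedoRecord() until it returns an empty buffer.

    Returns the number of redo records processed. max_records is a
    safety limit added for this sketch, because processing a redo
    record can create further redo records.
    """
    processed = 0
    while processed < max_records:
        response_bytearray = bytearray()
        engine.processRedoRecord(response_bytearray)
        if not response_bytearray:
            break  # Nothing left to process.
        processed += 1
    return processed


class FakeRedoEngine:
    """Stand-in for G2Engine, for illustration only."""

    def __init__(self, pending):
        self.pending = pending

    def countRedoRecords(self):
        return self.pending

    def processRedoRecord(self, response_bytearray):
        if self.pending > 0:
            self.pending -= 1
            response_bytearray.extend(b"<UMF_DOC/>")


fake_engine = FakeRedoEngine(pending=3)
print(drain_redo_queue(fake_engine), fake_engine.countRedoRecords())
```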

processRedoRecordWithInfo

processRedoRecordWithInfo() is available if you would like to know which resolved entities were modified when processing a redo record. It behaves identically to processRedoRecord(), but also returns a JSON document containing the IDs of the affected entities. It accepts the following parameters:

  • response_bytearray: A buffer that returns the redo record that was processed.
  • info_bytearray: A buffer that returns a JSON object that summarizes the changes caused by processing the redo record. Also contains the record ID.
In [72]:
response_bytearray = bytearray()
info_bytearray = bytearray()

try:
    g2_engine.processRedoRecordWithInfo(
        response_bytearray,
        info_bytearray)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

# Pretty-print XML.

xml_string = response_bytearray.decode()
if len(xml_string) > 0:
    import xml.dom.minidom
    # Use a distinct name so the imported xml package is not shadowed.
    xml_document = xml.dom.minidom.parseString(xml_string)
    xml_pretty_string = xml_document.toprettyxml()
    print(xml_pretty_string)

# Pretty-print JSON

RenderJSON(info_bytearray)
<?xml version="1.0" ?>
<UMF_DOC>
	<OBS>
		<OBS_ENT>
			<ENT_SRC_KEY>B862AA44434BCFC5695546A2C0E87E24A42D4C8D</ENT_SRC_KEY>
			<ENT_SRC_DESC>John M Jones</ENT_SRC_DESC>
			<RECORD_ID>1</RECORD_ID>
			<ETYPE_CODE>TEST</ETYPE_CODE>
		</OBS_ENT>
		<LENS_LIST>
			<LENS_CODE>DEFAULT</LENS_CODE>
		</LENS_LIST>
		<DSRC_CODE>TEST</DSRC_CODE>
		<DSRC_ACTION>X</DSRC_ACTION>
		<OBS_SRC_KEY>REDO_QUEUE</OBS_SRC_KEY>
	</OBS>
	<REDO_EVALUATION_FOR_GENERIC>
		<FEATURE_ID>2</FEATURE_ID>
		<LENS_ID>1</LENS_ID>
		<ECLASS_ID>1</ECLASS_ID>
	</REDO_EVALUATION_FOR_GENERIC>
</UMF_DOC>
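
Both redo cells repeat the same XML pretty-printing step, so it could be factored into a small helper. The sketch below assumes only the standard library; pretty_xml() is a hypothetical name, and the sample buffer is a shortened, made-up document rather than real engine output.

```python
import xml.dom.minidom


def pretty_xml(buffer):
    """Return an indented rendering of an XML buffer, or "" if it is empty."""
    if isinstance(buffer, (bytes, bytearray)):
        xml_string = bytes(buffer).decode()
    else:
        xml_string = buffer
    if not xml_string.strip():
        return ""  # An empty buffer means there was no redo record.
    return xml.dom.minidom.parseString(xml_string).toprettyxml()


sample_buffer = bytearray(b"<UMF_DOC><OBS><RECORD_ID>4</RECORD_ID></OBS></UMF_DOC>")
pretty_text = pretty_xml(sample_buffer)
print(pretty_text)
```

In the cells above, print(pretty_xml(response_bytearray)) would then replace the inline parsing block.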

Delete

deleteRecord

Use deleteRecord() to remove a record from the data repository. It returns 0 on success, can be called as many times as desired, and can be called from multiple threads at the same time. It accepts the following parameters:

  • datasource_code: The name of the data source the record is associated with. Data source names are defined in the system configuration.
  • record_id: The record ID, used to identify distinct records.
  • load_id: The observation load ID for the record. The value can be null (None in Python), in which case it defaults to datasource_code.
In [73]:
try:
    g2_engine.deleteRecord(datasource_code_1, record_id_1, load_id)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
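
Because deleteRecord() can be called from several threads at once, a batch of deletions could be spread over a thread pool. The following is a sketch under stated assumptions: delete_records() and StubDeleteEngine are hypothetical names, the stub keeps records in an in-memory set instead of a Senzing repository, and a generic Exception is caught where real code would catch G2Exception.G2ModuleGenericException.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubDeleteEngine:
    """Hypothetical stand-in for g2_engine that stores record keys in a set."""

    def __init__(self, records):
        self._records = set(records)
        self._lock = threading.Lock()

    def deleteRecord(self, datasource_code, record_id, load_id=None):
        with self._lock:
            self._records.discard((datasource_code, record_id))
        return 0

    def remaining(self):
        with self._lock:
            return set(self._records)


def delete_records(engine, keys, load_id=None, workers=4):
    """Delete (datasource_code, record_id) pairs concurrently.

    Returns a list of (key, error) tuples; error is None on success.
    """
    def delete_one(key):
        datasource_code, record_id = key
        try:
            engine.deleteRecord(datasource_code, record_id, load_id)
            return key, None
        except Exception as err:  # Assumption: real code would catch G2Exception types.
            return key, str(err)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(delete_one, keys))


delete_stub = StubDeleteEngine([("TEST", "1"), ("TEST", "2"), ("TEST", "3"), ("TEST", "4")])
delete_results = delete_records(delete_stub, [("TEST", "1"), ("TEST", "2"), ("TEST", "3")])
print(delete_results)
print(delete_stub.remaining())
```

Collecting errors per key, rather than stopping at the first failure, keeps one bad record from blocking the rest of the batch.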

deleteRecordWithInfo

deleteRecordWithInfo() behaves the same as deleteRecord() but also returns a JSON document containing the IDs of the affected entities. It accepts the following parameters:

  • datasource_code: The name of the data source the record is associated with. Data source names are defined in the system configuration.
  • record_id: The record ID, used to identify distinct records.
  • response_bytearray: A buffer that receives a JSON object summarizing the changes caused by deleting the record. It also contains the record ID.
  • load_id: The observation load ID for the record. The value can be null (None in Python), in which case it defaults to datasource_code.
  • g2_engine_flags: The engine flags passed with the call, as shown in the example below.
In [74]:
response_bytearray = bytearray()

try:
    g2_engine.deleteRecordWithInfo(
        datasource_code_2,
        record_id_2,
        response_bytearray,
        load_id,
        g2_engine_flags)

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
RenderJSON(response_bytearray)

Attempt to get the deleted record again. The call should fail with an error similar to "Unknown record".

In [75]:
try:
    response_bytearray = bytearray()

    return_code = g2_engine.getRecord(
        datasource_code_1,
        record_id_1,
        response_bytearray)

    response_dictionary = json.loads(response_bytearray)
    response = json.dumps(response_dictionary, sort_keys=True, indent=4)
    print("Return Code: {0}\n{1}".format(return_code, response))
except G2Exception.G2ModuleGenericException as err:
    print("Exception: {0}".format(err))
Exception: 0033E|Unknown record: dsrc[TEST], record[1]
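
The exception text above appears to have the form "code|message". Assuming that format holds (it is inferred from this single output line, not from a documented contract), the code can be separated from the message so callers can tell an unknown record apart from other failures. parse_engine_error() is a hypothetical helper, not a Senzing function.

```python
def parse_engine_error(message):
    """Split an error string of the assumed form "code|message".

    Returns (code, detail); code is None when no "|" separator is present.
    """
    code, separator, detail = message.partition("|")
    if not separator:
        return None, message.strip()
    return code.strip(), detail.strip()


error_code, error_detail = parse_engine_error(
    "0033E|Unknown record: dsrc[TEST], record[1]")
print(error_code)
print(error_detail)
```

In the cell above, parse_engine_error(str(err)) could be called inside the except block to decide whether a missing record is expected, as it is here after the delete.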

Cleanup

To purge the G2 repository, use the aptly named purgeRepository() method. This will remove every record in your current repository.

purgeRepository

In [76]:
try:
    g2_engine.purgeRepository()

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())

destroy

Once all searching is done in a given process, call destroy() to uninitialize Senzing and clean up resources. You should always do this once at the end of each process.

In [77]:
try:
    g2_engine.destroy()

except G2Exception.G2ModuleGenericException as err:
    print(g2_engine.getLastException())
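
Since destroy() should always run once at the end of a process, one way to guarantee it is a context manager that calls it even when earlier work raises. This is a sketch only: engine_session() and StubLifecycleEngine are hypothetical names, and the stub merely records which methods were called so the example runs without Senzing.

```python
from contextlib import contextmanager


class StubLifecycleEngine:
    """Hypothetical stand-in for g2_engine that logs method calls."""

    def __init__(self):
        self.calls = []

    def purgeRepository(self):
        self.calls.append("purgeRepository")

    def destroy(self):
        self.calls.append("destroy")


@contextmanager
def engine_session(engine):
    """Yield the engine and call destroy() on exit, even after an error."""
    try:
        yield engine
    finally:
        engine.destroy()


lifecycle_engine = StubLifecycleEngine()
try:
    with engine_session(lifecycle_engine) as session_engine:
        session_engine.purgeRepository()
        raise RuntimeError("simulated failure after purge")
except RuntimeError:
    pass  # destroy() has already run by the time the error reaches here.

print(lifecycle_engine.calls)
```

With a real engine, the with block would wrap all engine work in the process, so an initialized g2_engine is destroyed exactly once on exit.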