Confluence Retirement

Due to the feedback from stakeholders and our commitment to not adversely impact USGS science activities that Confluence supports, we are extending the migration deadline to January 2023.

In an effort to consolidate USGS hosted Wikis, myUSGS’ Confluence service is targeted for retirement. The official USGS Wiki and collaboration space is now SharePoint. Please migrate existing spaces and content to the SharePoint platform and remove it from Confluence at your earliest convenience. If you need any additional information or have any concerns about this change, please contact myusgs@usgs.gov. Thank you for your prompt attention to this matter.
Skip to end of metadata
Go to start of metadata

The ArcGIS Join Field tool is handy but extremely slow. This is a Python implementation that joins one field very fast using a Python dictionary.

Broad procedure here is to:

  • make a new empty column in the target table (tbl) on line 34
  • pull the new information to goes into that column from the remote (aka "join") table (jtbl) into a Python dictionary, on lines 41-49. Line 49 is where each row's new info actually gets put into the Python dictionary.
  • iterate over the rows in the target table on lines 53-59, pushing new information into the table with the rows object that is associated with the arcpy.da.UpdateCursor(). For each of the rows in this loop, we get the row identifier (key) on line 56 and use that as a look up in the Python dictionary (jdict) on line 57 to get the new column's value for that row. Then this updated version of the "row" (row) is pushed back into the target table on line 59.
def JoinOneField(tbl, key1, jtbl, key2, jfield, ftype="LONG", fprec="", flen="",
                 fname=""):
    """Joins one field from another table (like Join Field, but fast,
    and field values are not modified for no-matches)!
    tbl - source table
    key1 - source table key field
    jtbl - join table
    key2 - join field (must be unique in jtbl)
    jfield - field to copy over
    ftype - field type
    fprec - field precision
    flen - field length (for ftype="TEXT")
    fname - field name (if different from jfield)
    The function returns a two-integer tuple of number of matches and no-matches.
    ** The field data values are not changed for no-matches **
    JoinOneField(srctbl, key1, jtbl, key2, NHDPlusID", "DOUBLE")"""
    #
    # Curtis Price, cprice@usgs.gov
    #
 
    # by default the join field and destination field are the same
    if not fname:
        fname = jfield
    # for shapefile output make sure we have enough precision for double
    if ftype.upper() == "DOUBLE" and not fprec:
        fprec = 15
    if not arcpy.ListFields(tbl, key1):
        raise Exception("Key Field {} not found in {}".format(key1, tbl))
    if not arcpy.ListFields(jtbl, key2):
        raise Exception("Key Field {} not found in {}".format(jfield, jtbl))
    if not arcpy.ListFields(jtbl, jfield):
        raise Exception("Join Field {} not found in {}".format(jfield, jtbl))
    try:
        arcpy.AddField_management(tbl, fname, ftype, fprec, "", flen)
    except:
        pass
    # load data into dictionary
    jdict = {}
    k = 0
    first_dupe = True
    with arcpy.da.SearchCursor(jtbl, [key2, jfield]) as rows:
        for row in rows:
            key, val = row
            if first_dupe and jdict.has_key(key):
                # display a single warning message for one to many joins
                print("Key field {} not unique in {}".format(key2, jtbl))
                first_dupe = False
            else:
                jdict[key] = val
    # copy data values to output table
    num_match = 0
    num_nomatch = 0
    with arcpy.da.UpdateCursor(tbl, [key1, fname]) as rows:
        for row in rows:
            try:
                key = row[0]
                row[1] = jdict[key]
                num_match += 1
                rows.updateRow(row)
            except KeyError:
                # no match in dictionary (join no-match)
                # do nothing -- do not change existing value
                num_nomatch += 1
                ## print ("{}:{}".format(key, "no match"))
            except RuntimeError as msg:
                if str(msg).find("The value type is incompatible") != -1:
                    raise Exception(
                        "Invalid value for field {}: {!r}".format(
                        jfield, jdict[key]))
                else:
                    # some other error
                    raise
    # return number of rows matched and not matched
##    GPMsg("JoinOneField: {} rows, {} matches, {} no-matches".format(
##        num_match + num_nomatch, num_match, num_nomatch))
    return num_match, num_nomatch

 

 

  • No labels

1 Comment

  1. Unknown User (cprice@usgs.gov)

    Your comments look good Roland, thanks for adding the description.