Summary
Python’s built‑in hash() function returns the integer value for most Python int objects.
The only built‑in integer that does not return its own value is -1, which returns -2.
This design choice is an escape hatch to prevent a forbidden sentinel value that would break the data‑structure implementation of dictionaries and sets.
Root Cause
- The C API for CPython uses the value -1 as a special marker:
PyDict_Next()returnsNULLon failure or a key when it finds one; an error in the iterator is indicated by a returned-1.- In hash tables,
-1is used to indicate a “never used” or “deleted” slot, depending on the table state.
- If an integer’s hash were
-1, the hash table would treat it as that sentinel, silently corrupting the dictionary or set. - To keep the hashing scheme simple (hash(val) = val for all ints) while preserving this sentinel, CPython maps
hash(-1)to -2, a value that is not used as a special marker.
Why This Happens in Real Systems
- Hash tables rely on a set of sentinel values to differentiate between empty, occupied, and deleted slots.
- Using
-1for both a hash value and a sentinel would make it impossible to distinguish a genuine key with hash-1from an empty bucket. - Many other integer‑hashing systems also use a special marker; Python chose
-2because it is the smallest integer that:- Is still a valid hash for signed 64‑bit integers.
- Is not ambigious with the sentinel
-1.
Real-World Impact
- Dictionary/set correctness: Without the remapping, inserting
-1as a key could silently overwrite or corrupt entries. - Performance: The remapping imposes no measurable overhead;
hash()is a fast inline operation. - Tooling: Developers must remember that
hash(-1)is not equal to-1; this is rarely a concern in application code but can trip up tests that comparekey == hash(key).
Example or Code (if necessary and relevant)
# Typical hash behavior
print(hash(42)) # 42
print(hash(-42)) # -42
# The special case
print(hash(-1)) # -2
/* CPython snippet from Objects/longobject.c */
if (self->ob_digit[0] == 0 && Py_SIZE(self) == 1)
return -2; /* hash(-1) --> -2 */
How Senior Engineers Fix It
- Understand sentinel semantics: Recognize that special values in low‑level APIs cannot overlap with legitimate hash results.
- Use a safe hash function: For custom integer‑like types, implement
__hash__to return the value except for-1, in which case return a reserved sentinel (commonly-2or*other value). - Validate data structures: Ensure that any mutable mapping that relies on hash values treats special markers correctly.
- Document the exception explicitly in the public API or documentation comments to avoid future regressions.
Why Juniors Miss It
- They assume identity hashes are trivial and ignore corner cases in system design.
- They are unaware that internal APIs use sentinel values that must not collide with user data.
- They rarely encounter a scenario where a key with hash
-1is inserted into a dictionary, so the consequence seems abstract. - They may overlook historical design trade‑offs made in early CPython releases, such as this escape hatch added in Python 2.6 to support negative start values for random integer generation.