PEP 685 – Comparison of extra names for optional distribution dependencies
PEP: 685
Title: Comparison of extra names for optional distribution dependencies
Author: Brett Cannon <brett at python.org>
PEP-Delegate: Paul Moore <p.f.moore at gmail.com>
Discussions-To: https://discuss.python.org/t/14141
Status: Draft
Type: Standards Track
Created: 08-Mar-2022
Post-History: 08-Mar-2022
Abstract
This PEP specifies how to normalize distribution extra names when performing comparisons. This prevents tools from either failing to find an extra name, or accidentally matching against an unexpected name.
Motivation
The Provides-Extra core metadata specification states that an extra’s name “must be a valid Python identifier”. PEP 508 specifies that the value of an extra marker may contain a letter, digit, or any one of ., -, or _ after the initial character. Beyond these two statements, no PyPA specification outlines how extra names should be written or how they should be normalized for comparison.
Due to the amount of packaging-related code in existence,
it is important to evaluate current practices by the community and
standardize on one that doesn’t break most code, while being
something tool authors can agree to follow.
The issue of there being no standard was brought forward by an initial discussion noting that the extra adhoc-ssl was not considered equal to the name adhoc_ssl by pip 22.
Rationale
PEP 503 specifies how to normalize distribution names:
re.sub(r"[-_.]+", "-", name).lower()
This collapses any run of the substitution character down to a single character, e.g. --- gets collapsed down to -.
This does not produce a valid Python identifier as specified by the
core metadata 2.2 specification for extra names.
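As a minimal sketch of the behaviour described above, the following wraps the PEP 503 expression in a function and checks its output. The function name pep503_normalize and the sample inputs are illustrative assumptions, not part of any specification.

```python
import re


def pep503_normalize(name: str) -> str:
    """Apply the PEP 503 normalization expression quoted above.

    The function name is hypothetical; only the expression comes from PEP 503.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


# A run of mixed separators collapses to a single hyphen and case is folded.
collapsed = pep503_normalize("Adhoc_.-SSL")

# A run consisting only of hyphens collapses to one hyphen.
single = pep503_normalize("---")

# The hyphen in the result means it is not a valid Python identifier.
is_identifier = collapsed.isidentifier()
```

Here collapsed is "adhoc-ssl", single is "-", and is_identifier is False, which illustrates why the normalized form cannot satisfy the "valid Python identifier" wording of the core metadata 2.2 specification.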
Setuptools 60 does normalization via:
re.sub('[^A-Za-z0-9.-]+', '_', name).lower()
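The use of an underscore/_ differs from PEP 503’s use of a hyphen/-. The expression also leaves . and - untouched, since both are inside the negated character class, so runs of them, unlike under PEP 503, do not get collapsed, e.g. ..-- stays the same.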
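To make the difference concrete, here is a small sketch that applies the expression quoted above next to the PEP 503 one. It only exercises the two regular expressions as written in this document and makes no claim about how Setuptools structures its own code; both function names are hypothetical.

```python
import re


def setuptools60_style(name: str) -> str:
    """Apply the Setuptools 60 expression quoted above (hypothetical wrapper)."""
    return re.sub('[^A-Za-z0-9.-]+', '_', name).lower()


def pep503_normalize(name: str) -> str:
    """Apply the PEP 503 expression for comparison (hypothetical wrapper)."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Characters outside letters, digits, "." and "-" become an underscore.
spaced = setuptools60_style("Adhoc SSL")

# Hyphens are inside the character class, so this expression leaves them alone.
hyphenated = setuptools60_style("adhoc-ssl")
underscored = setuptools60_style("adhoc_ssl")

# Under this expression the two spellings stay distinct; under PEP 503 they match.
distinct_here = hyphenated != underscored
equal_under_pep503 = pep503_normalize("adhoc-ssl") == pep503_normalize("adhoc_ssl")
```

With these inputs, spaced is "adhoc_ssl", hyphenated stays "adhoc-ssl", and the two spellings only compare equal under the PEP 503 expression.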
For pip 22, its “extra normalisation behaviour is quite convoluted and erratic” [pip-erratic], and so its use is not considered.
[pip-erratic] https://discuss.python.org/t/what-extras-names-are-treated-as-equal-and-why/7614/10?
Specification
When comparing extra names, tools MUST normalize the names being compared using the semantics outlined in PEP 503 for names:
re.sub(r"[-_.]+", "-", name).lower()
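A comparison that follows this rule could be sketched as below. The helper names normalize_extra and extras_equal are assumptions made for illustration; this PEP does not define a function-level API.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name with the expression given in this section."""
    return re.sub(r"[-_.]+", "-", name).lower()


def extras_equal(left: str, right: str) -> bool:
    """Compare two extra names after normalizing both sides (hypothetical helper)."""
    return normalize_extra(left) == normalize_extra(right)


# The pair from the Motivation section compares equal once both are normalized.
same = extras_equal("adhoc-ssl", "adhoc_ssl")

# Separators are collapsed, not removed, so this pair still differs.
different = extras_equal("adhoc-ssl", "adhocssl")
```

Normalizing both operands, instead of only the requested name, keeps the comparison symmetric regardless of which side was written in a non-normalized form.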
The core metadata specification will be updated such that the allowed names for Provides-Extra match what PEP 508 specifies for names. As this is a superset of what is currently allowed by the core metadata 2.2 specification, it allows for a loosening of the naming requirements. It will also bring extra naming in line with that of the Name field.
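A validity check along these lines might look like the sketch below. It assumes the intended rule is the case-insensitive pattern that PEP 508 gives for names (a letter or digit at each end, with letters, digits, ., -, or _ in between); the function name is_valid_extra_name is hypothetical.

```python
import re

# Pattern taken from the names rule in PEP 508, applied case-insensitively.
# Using it for extras is an assumption based on the paragraph above.
_NAME_PATTERN = re.compile(
    r"[A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9]",
    re.IGNORECASE,
)


def is_valid_extra_name(name: str) -> bool:
    """Return True if the whole string matches the PEP 508 names pattern."""
    return _NAME_PATTERN.fullmatch(name) is not None


valid_hyphen = is_valid_extra_name("adhoc-ssl")
valid_dot = is_valid_extra_name("dev.test")
invalid_leading = is_valid_extra_name("-dev")
invalid_space = is_valid_extra_name("dev test")
```

The first two checks return True and the last two return False. re.fullmatch is used so that a trailing newline or other stray character cannot slip past the check.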
Tools writing core metadata MUST write out extra names in their normalized form. This applies to the Provides-Extra field and the extra marker when used in the Requires-Dist field.
Tools generating metadata MUST raise an error if a user specifies two or more extra names which would normalize to the same name. Tools SHOULD warn users when an invalid extra name is read.
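One way a metadata-generating tool could enforce the clash rule is sketched below. The exception type ExtraNameClashError and the function normalized_extras are hypothetical names chosen for this example, and the error message format is likewise an assumption.

```python
import re
from collections import defaultdict


class ExtraNameClashError(ValueError):
    """Hypothetical error raised when distinct extras normalize identically."""


def normalize_extra(name: str) -> str:
    """Normalize an extra name with the expression from this specification."""
    return re.sub(r"[-_.]+", "-", name).lower()


def normalized_extras(names):
    """Return the sorted normalized names, raising if distinct inputs collide."""
    groups = defaultdict(set)
    for name in names:
        # Group the user's spellings under their shared normalized form.
        groups[normalize_extra(name)].add(name)
    clashes = {
        normalized: sorted(spellings)
        for normalized, spellings in groups.items()
        if len(spellings) > 1
    }
    if clashes:
        raise ExtraNameClashError(
            f"extra names normalize to the same name: {clashes}"
        )
    return sorted(groups)


# Distinct extras are written out in normalized form.
written = normalized_extras(["Dev_Test", "docs"])
```

In this sketch written is ["dev-test", "docs"], while a call such as normalized_extras(["dev_test", "dev.test"]) raises ExtraNameClashError. An exact duplicate of the same spelling is not treated as a clash here, which is a design choice of the sketch and not something this PEP specifies.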
Backwards Compatibility
Moving to PEP 503 normalization and PEP 508 name acceptance allows all preexisting, valid names to continue to be valid.
Based on research looking at a collection of wheels on PyPI [pypi-results], the risk of extra name clashes is limited: there are 73 clashes when invalid names are included, and only 3 when only valid names are considered:
- dev-test: dev_test, dev-test, dev.test
- dev-lint: dev-lint, dev.lint, dev_lint
- apache-beam: apache-beam, apache.beam
By requiring tools writing core metadata to only record the normalized name, the issue of preexisting, invalid extra names should be diminished over time.
[pypi-results] https://discuss.python.org/t/pep-685-comparison-of-extra-names-for-optional-distribution-dependencies/14141/17?u=brettcannon
Security Implications
It is possible that, for a distribution with conflicting extra names, a tool ends up installing distributions that somehow weaken the security of the system. This is only hypothetical, and if it were to occur, it would probably be more of a security concern for the distributions specifying such extra names than for the distribution that pulled them in together.
How to Teach This
This should be transparent to users on a day-to-day basis. It will be up to tools to educate/stop users when they select extra names which conflict.
Reference Implementation
No reference implementation is provided aside from the code above, but the expectation is that the packaging project will provide a function in its packaging.utils module that implements extra name normalization. It will also implement extra name comparisons appropriately. Finally, if the project ever gains the ability to write out metadata, it will also implement this PEP.
Rejected Ideas
Normalize names according to PEP 503
For backwards-compatibility concerns, it was decided not to strictly follow how PEP 503 normalizes distribution names.
Open Issues
N/A
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/pep-0685.rst
Last modified: 2022-03-10 22:22:21 GMT