Change function calls to use vectorcall (#5948)
* Make argument_vector re-usable for other types.
* Attempt to collect args into array for vectorcall
* Revert "Attempt to collect args into array for vectorcall"
This reverts commit 418a034195c5c8b517e7404686555c097efe7a4b.
* Implement vectorcall args collector
* pre-commit fixes
* Checkpoint in moving to METH_FASTCALL
* pre-commit fixes
* Use the names tuple directly, cleaner code and less reference counting
* Fix unit test, the code now holds more references
It cannot re-use the incoming tuple as before, because it is no longer a tuple at all. So a new tuple must be created, which then holds references for each member.
* Make clangtidy happy
* Oops, _v is C++14
* style: pre-commit fixes
* Minor code cleanup
* Fix signed conversions
* Fix args expansion
This would be easier with `if constexpr`
* style: pre-commit fixes
* Code cleanup
* fix(tests): Install multiple-interpreter test modules into wheel
The `mod_per_interpreter_gil`, `mod_shared_interpreter_gil`, and
`mod_per_interpreter_gil_with_singleton` modules were being built
but not installed into the wheel when using scikit-build-core
(SKBUILD=true). This caused iOS (and potentially Android) CIBW
tests to fail with ModuleNotFoundError.
Root cause analysis:
- The main test targets have install() commands (line 531)
- The PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES were missing
equivalent install() commands
- For regular CMake builds, this wasn't a problem because
LIBRARY_OUTPUT_DIRECTORY places the modules next to pybind11_tests
- For wheel builds, only targets with explicit install() commands
are included in the wheel
This issue was latent until commit fee2527d changed the test imports
from `pytest.importorskip()` (graceful skip) to direct `import`
statements (hard failure), which exposed the missing modules.
Failing tests:
- test_multiple_interpreters.py::test_independent_subinterpreters
- test_multiple_interpreters.py::test_dependent_subinterpreters
Error: ModuleNotFoundError: No module named 'mod_per_interpreter_gil'
* tests: Pin numpy 2.4.0 for Python 3.14 CI tests
Add numpy==2.4.0 requirement for Python 3.14 (both default and
free-threaded builds). NumPy 2.4.0 is the first version to provide
official PyPI wheels for Python 3.14:
- numpy-2.4.0-cp314-cp314-manylinux_2_27_x86_64...whl (default)
- numpy-2.4.0-cp314-cp314t-manylinux_2_27_x86_64...whl (free-threaded)
Previously, CI was skipping all numpy-dependent tests for Python 3.14
because PIP_ONLY_BINARY was set and no wheels were available:
SKIPPED [...] test_numpy_array.py:8: could not import 'numpy':
No module named 'numpy'
With this change, the full numpy test suite will run on Python 3.14,
providing better test coverage for the newest Python version.
Note: Using exact pin (==2.4.0) rather than compatible release (~=2.4.0)
to ensure reproducible CI results with the first known-working version.
* tests: Add verbose flag to CIBW pytest command
Add `-v` to the pytest command in tests/pyproject.toml to help
diagnose hanging tests in CIBW jobs (particularly iOS).
This will show each test name as it runs, making it easier to
identify which specific test is hanging.
* tests: Skip subinterpreter tests on iOS, add pytest timeout
- Add `IOS` platform constant to `tests/env.py` for consistency with
existing `ANDROID`, `LINUX`, `MACOS`, `WIN`, `FREEBSD` constants.
- Skip `test_multiple_interpreters.py` module on iOS. Subinterpreters
are not supported in the iOS simulator environment. These tests were
previously skipped implicitly because the modules weren't installed
in the wheel; now that they are (commit 6ed6d5a8), we need an
explicit skip.
- Change pytest timeout from 0 (disabled) to 120 seconds. This provides
a safety net to catch hanging tests before the CI job times out after
hours. Normal test runs complete in 33-55 seconds total (~1100 tests),
so 120 seconds per test is very generous.
- Add `-v` flag for verbose output to help diagnose any future issues.
* More cleanups in argument vector, per comments.
* Per Cursor, move all versions to Vectorcall since it has been supported since 3.8.
This means getting rid of simple_collector, we can do the same with a constexpr if in the unpacking_collector.
* Switch to a bool vec for the used_kwargs flag...
This makes more sense and saves a sort, and the small_vector implementation means it will actually take less space than a vector of size_t elements.
The most common case is that all kwargs are used.
* Fix signedness for clang
* Another signedness issue
* tests: Disable pytest-timeout for Pyodide (no signal.setitimer)
Pyodide runs in a WebAssembly sandbox without POSIX signals, so
`signal.setitimer` is not available. This causes pytest-timeout to
crash with `AttributeError: module 'signal' has no attribute 'setitimer'`
when timeout > 0.
Override the test-command for Pyodide to keep timeout=0 (disabled).
* Combine temp storage and args into one vector
It's a good bit faster at the cost of this one scary reinterpret_cast.
* Phrasing
* Delete incorrect comment
At 6, the struct is 144 bytes (not 128 bytes as the comment said).
* Fix push_back
* Update push_back in argument_vector.h
Co-authored-by: Aaron Gokaslan <[email protected]>
* style: pre-commit fixes
* Use real types for these instead of object
They can be null if you "steal" a null handle.
* refactor: Replace small_vector<object> with ref_small_vector for explicit ownership
Introduce `ref_small_vector` to manage PyObject* references in `unpacking_collector`,
replacing the previous `small_vector<object>` approach.
Primary goals:
1. **Maintainability**: The previous implementation relied on
`sizeof(object) == sizeof(PyObject*)` and used a reinterpret_cast to
pass the object array to vectorcall. This coupling to py::object's
internal layout could break if someone adds a debug field or other
member to py::handle/py::object in the future.
2. **Readability**: The new `push_back_steal()` vs `push_back_borrow()`
API makes reference counting intent explicit at each call site,
rather than relying on implicit py::object semantics.
3. **Intuitive code**: Storing `PyObject*` directly and passing it to
`_PyObject_Vectorcall` without casts is straightforward and matches
what the C API expects. No "scary" reinterpret_cast needed.
Additional benefits:
- `PyObject*` is trivially copyable, simplifying vector operations
- Batch decref in destructor (tight loop vs N individual object destructors)
- Self-documenting ownership semantics
Design consideration: We considered folding the ref-counting functionality
directly into `small_vector` via template specialization for `PyObject*`.
We decided against this because:
- It would give `small_vector<PyObject*, N>` a different interface than the
generic `small_vector<T, N>` (steal/borrow vs push_back)
- Someone might want a non-ref-counting `small_vector<PyObject*, N>`
- The specialization behavior could surprise users expecting uniform semantics
A separate `ref_small_vector` type makes the ref-counting behavior explicit
and self-documenting, while keeping `small_vector` generic and predictable.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]>
Co-authored-by: Aaron Gokaslan <[email protected]>
4 files changed