
PyPerformance benchmarks broken on Windows #333

Closed
12 tasks done
gvanrossum opened this issue Mar 23, 2022 · 27 comments

@gvanrossum
Collaborator

gvanrossum commented Mar 23, 2022

With the latest PyPerformance (from its main branch at python/pyperformance@098ffc9), on Windows, with the latest CPython main, these benchmarks are failing:

  • chameleon
  • crypto_pyaes
  • django_template
  • dulwich_log
  • genshi
  • html5lib
  • mako
  • regex_compile
  • sqlalchemy_declarative
  • sqlalchemy_imperative
  • sympy
  • tornado_http

I'll try to go into more detail below.

@gvanrossum
Collaborator Author

Genshi installs properly but gives this error when running:

Traceback (most recent call last):
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python310\lib\site-packages\pyperformance\data-files\benchmarks\bm_genshi\run_benchmark.py", line 7, in <module>
    from genshi.template import MarkupTemplate, NewTextTemplate
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'genshi'

@gvanrossum
Collaborator Author

When I run the offending command at the REPL I get a perhaps more informative (certainly longer) traceback:

>>> from genshi.template import MarkupTemplate, NewTextTemplate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\gvanrossum\bench\venv\cpython3.11-5d61ebee4deb-compat-f6a835d45d46\Lib\site-packages\genshi\template\__init__.py", line 20, in <module>
    from genshi.template.markup import MarkupTemplate
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\bench\venv\cpython3.11-5d61ebee4deb-compat-f6a835d45d46\Lib\site-packages\genshi\template\markup.py", line 25, in <module>
    from genshi.template.interpolation import interpolate
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\bench\venv\cpython3.11-5d61ebee4deb-compat-f6a835d45d46\Lib\site-packages\genshi\template\interpolation.py", line 33, in <module>
    token_re = re.compile('%s|%s(?s)' % (
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\cpython\Lib\re.py", line 232, in compile
    return _compile(pattern, flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\cpython\Lib\re.py", line 284, in _compile
    p = sre_compile.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\cpython\Lib\sre_compile.py", line 780, in compile
    p = sre_parse.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\cpython\Lib\sre_parse.py", line 963, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\cpython\Lib\sre_parse.py", line 447, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\cpython\Lib\sre_parse.py", line 822, in _parse
    raise source.error('global flags not at the start '
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
re.error: global flags not at the start of the expression at position 994

I don't know the first thing about genshi. But it looks like either it is hitting a bug in re or it is messing with it in a weird way.

According to the debugger, the pattern being compiled is this (this is the repr()):

'[uU]?[rR]?("""|\\\'\\\'\\\')((?<!\\\\)\\\\\\1|.)*?\\1|[ \\f\\t]*((\\\\\\r?\\n|\\Z|#[^\\r\\n]*|((|Br|F|fr|rb|b|Fr|bR|R|Rf|B|BR|FR|RF|Rb|rF|fR|br|u|r|rB|RB|f|U|rf)\'\'\'|(|Br|F|fr|rb|b|Fr|bR|R|Rf|B|BR|FR|RF|Rb|rF|fR|br|u|r|rB|RB|f|U|rf)"""))|(([0-9](?:_?[0-9])*[jJ]|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)[jJ])|(([0-9](?:_?[0-9])*\\.(?:[0-9](?:_?[0-9])*)?|\\.[0-9](?:_?[0-9])*)([eE][-+]?[0-9](?:_?[0-9])*)?|[0-9](?:_?[0-9])*[eE][-+]?[0-9](?:_?[0-9])*)|(0[xX](?:_?[0-9a-fA-F])+|0[bB](?:_?[01])+|0[oO](?:_?[0-7])+|(?:0(?:_?0)*|[1-9](?:_?[0-9])*)))|(\\r?\\n|(\\~|\\}|\\|=|\\||\\{|\\^=|\\^|\\]|\\[|@=|@|>>=|>>|>=|>|==|=|<=|<<=|<<|<|;|:=|:|/=|//=|//|/|\\.\\.\\.|\\.|\\->|\\-=|\\-|,|\\+=|\\+|\\*=|\\*\\*=|\\*\\*|\\*|\\)|\\(|\\&=|\\&|%=|%|!=))|((|Br|F|fr|rb|b|Fr|bR|R|Rf|B|BR|FR|RF|Rb|rF|fR|br|u|r|rB|RB|f|U|rf)\'[^\\n\'\\\\]*(?:\\\\.[^\\n\'\\\\]*)*(\'|\\\\\\r?\\n)|(|Br|F|fr|rb|b|Fr|bR|R|Rf|B|BR|FR|RF|Rb|rF|fR|br|u|r|rB|RB|f|U|rf)"[^\\n"\\\\]*(?:\\\\.[^\\n"\\\\]*)*("|\\\\\\r?\\n))|\\w+)(?s)'

@gvanrossum
Collaborator Author

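Okay, this seems to be caused by a re feature that was deprecated and has now been removed: a global inline flag such as (?s) is no longer allowed anywhere except at the start of the pattern. It seems to have to do with bpo-47066, GH-31994. We need to fix this on the Genshi side.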
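For anyone hitting the same error: since Python 3.11 a global inline flag like (?s) must come first in the pattern, whereas 3.6 through 3.10 only emitted a DeprecationWarning. A minimal sketch of the failure and two portable fixes follows. LEFT and RIGHT are hypothetical stand-ins for the much longer sub-patterns Genshi formats in, and this is not necessarily how Genshi itself ended up fixing it.

```python
import re
import sys

# Hypothetical stand-ins for the two sub-patterns Genshi interpolates.
LEFT = r"\$\{"
RIGHT = r"[^}]*\}"


def try_compile(pattern, flags=0):
    """Compile pattern; return the re.error message instead of raising."""
    try:
        return re.compile(pattern, flags)
    except re.error as exc:
        return str(exc)


# Old style: the global DOTALL flag sits at the END of the pattern.
# Python 3.11+ rejects this; 3.6-3.10 only warn.
old_style = "%s|%s(?s)" % (LEFT, RIGHT)

# Fix 1: move the inline flag to the very start of the pattern.
fixed_inline = re.compile("(?s)%s|%s" % (LEFT, RIGHT))

# Fix 2: drop the inline flag and pass re.DOTALL explicitly.
fixed_flag = re.compile("%s|%s" % (LEFT, RIGHT), re.DOTALL)

if __name__ == "__main__":
    print(sys.version_info[:2], try_compile(old_style))
```

Passing re.DOTALL is arguably the clearer of the two fixes, since it keeps the flag out of the string that gets assembled with % formatting.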

@gvanrossum
Collaborator Author

Filed edgewall/genshi#66 -- I have no idea how to fix this.

@gvanrossum
Collaborator Author

The regex_compile failure is odd. I see this traceback:

Traceback (most recent call last):
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python310\lib\site-packages\pyperformance\data-files\benchmarks\bm_regex_compile\run_benchmark.py", line 68, in <module>
    regexes = capture_regexes()
              ^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python310\lib\site-packages\pyperformance\data-files\benchmarks\bm_regex_compile\run_benchmark.py", line 39, in capture_regexes
    import bm_regex_effbot
    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python310\lib\site-packages\pyperformance\data-files\benchmarks\bm_regex_compile\bm_regex_effbot.py", line 1
    ../bm_regex_effbot/run_benchmark.py
    ^
SyntaxError: invalid syntax

I suspect that file is a symbolic link: on Windows, Git may check out a symlink as a plain text file whose content is the link target. @ericsnowcurrently Since symlinks don't work reliably on Windows, can we just replace that file with the contents of the target file?
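When Git runs with core.symlinks=false (common on Windows), each symlink is checked out as a small plain file containing only the link text, which matches what the traceback shows Python trying to execute. A rough sketch for spotting such placeholder files; the helper name and the size/single-line heuristic are hypothetical, not part of pyperformance or Git:

```python
from pathlib import Path
from typing import Optional


def placeholder_symlink_target(path: Path, max_size: int = 1024) -> Optional[Path]:
    """Return the resolved target if `path` looks like a Git symlink placeholder.

    Heuristic (an assumption, not an official Git API): the file is small,
    holds a single line with no trailing newline, and that line, read as a
    path relative to the file's directory, names an existing file.
    """
    if path.is_symlink() or not path.is_file():
        return None
    if path.stat().st_size > max_size:
        return None
    text = path.read_text(encoding="utf-8", errors="replace")
    if "\n" in text or not text.strip():
        return None
    target = (path.parent / text.strip()).resolve()
    return target if target.is_file() else None
```

Running this over bm_regex_compile\bm_regex_effbot.py in a Windows checkout should point at ../bm_regex_effbot/run_benchmark.py if the file is indeed a placeholder.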

@gvanrossum gvanrossum self-assigned this Mar 23, 2022
@gvanrossum
Collaborator Author

For chameleon I get a simple traceback:

Traceback (most recent call last):
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python310\lib\site-packages\pyperformance\data-files\benchmarks\bm_chameleon\run_benchmark.py", line 5, in <module>
    from chameleon import PageTemplate
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'chameleon'

However, when I manually start the Python executable from the (shared) venv and type that same line (from chameleon import PageTemplate), it works fine.

@ericsnowcurrently Can you look into this? Maybe something's wrong with the venv, and possibly whatever is wrong is also wrong on Linux? Otherwise, are there things you'd like me to try in order to debug this? I'm stumped.

@gvanrossum
Collaborator Author

FWIW I cannot repro the chameleon failure on macOS, so I suppose it's Windows specific. But I am quite stuck.

gvanrossum added a commit to python/pyperformance that referenced this issue Mar 24, 2022

The regex_compile benchmark uses two symlinks, but those don't work reliably on Windows. Replace them with real files.

See faster-cpython/ideas#333
@gvanrossum
Collaborator Author

Fixed the regex_compile issue in python/pyperformance@aa2360d.

@gvanrossum
Collaborator Author

The mako test fails the same way as chameleon: import mako fails with ModuleNotFoundError: No module named 'mako'. There were no errors during installation, and manually running the Python from the venv and importing mako succeeds.

@gvanrossum
Collaborator Author

Similar for django_template and crypto_pyaes. I feel there's something fundamentally broken on Windows whenever a benchmark depends on an installed package.

@ericsnowcurrently
Collaborator

Are these failures related to my recent changes in pyperformance, or were you not able to get that far before?

@gvanrossum
Collaborator Author

I went back to the last commit before your recent series (it was from Feb 11) and fixed the base_executable issue again, and that produced the same problem.

With head, it seems the command that's being run is something like this:

C:\Users\gvanrossum\cpython\PCbuild\amd64\python.exe -u 'C:\Users\gvanrossum\AppData\Local\Programs\Python\Python310\lib\site-packages\pyperformance\data-files\benchmarks\bm_mako\run_benchmark.py' --verbose

Something seems off there, it doesn't use the python.exe inside the venv. I'd say this piece of code is suspect:

    is_venv = (sys.prefix != sys.base_prefix)
    base_executable = getattr(sys, '_base_executable', None)
    if is_venv:
        # XXX There is probably a bug related to venv, since
        # sys._base_executable should be different.
        if base_executable == sys.executable:
            # Indicate that we don't know.
            base_executable = None
    elif not base_executable:
        base_executable = sys.executable

(That XXX comment feels like it might be on the money?)
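To see which values that check is actually working with, a small diagnostic can be run under each interpreter involved (the base python.exe and the one inside the venv). This is a sketch; sys._base_executable is a private, undocumented attribute, so it is read defensively:

```python
import sys


def describe_interpreter() -> dict:
    """Collect the values venv-detection logic typically relies on."""
    return {
        "executable": sys.executable,
        # Private attribute: may be missing or equal to sys.executable.
        "base_executable": getattr(sys, "_base_executable", None),
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # The documented way to tell whether we are inside a venv.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:16} {value}")
```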

Hm, I just found something else. It looks like the venv is based on Python 3.10 somehow?! That would be the Python I used to run pyperformance. Here's the command I run:

py -3.10 -m pyperformance run --python C:\Users\gvanrossum\cpython\PCbuild\amd64\python.exe -o temp.json -b chameleon
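The command pyperformance ended up running uses the base python.exe rather than the venv's own interpreter. For reference, a sketch of where a standard venv puts its interpreter (Scripts\python.exe on Windows, bin/python elsewhere) and how a child command could be built from it; both helper names are hypothetical and this is not pyperformance's code:

```python
import os
from pathlib import Path


def venv_python(venv_dir, windows=None):
    """Return the path of the interpreter inside a standard venv layout."""
    if windows is None:
        windows = os.name == "nt"
    venv_dir = Path(venv_dir)
    if windows:
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def benchmark_command(venv_dir, script, *args, windows=None):
    """Build a child command that runs `script` with the venv's interpreter."""
    return [str(venv_python(venv_dir, windows)), "-u", str(script), *args]
```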

Let me know how I can help you debug this.

@gvanrossum
Collaborator Author

PS. It's kind of distracting that whenever the benchmark dies, pyperformance itself prints a traceback, like this:

ERROR: Benchmark mako failed: Benchmark died
Traceback (most recent call last):
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyperformance\run.py", line 144, in run_benchmarks
    result = bench.run(
             ^^^^^^^^^^
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyperformance\_benchmark.py", line 185, in run
    bench = _run_perf_script(
            ^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311\Lib\contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyperformance\_utils.py", line 38, in temporary_file
    yield tmp_filename
    ^^^^^^^^^^^^^^^^^^
  File "C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311\Lib\site-packages\pyperformance\_benchmark.py", line 232, in _run_perf_script
    raise RuntimeError("Benchmark died")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Benchmark died
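One way to make this less noisy would be to print a one-line error for an expected "Benchmark died" failure and keep the full traceback for verbose runs. The sketch below is hypothetical (the run_all helper, the verbose flag, and treating RuntimeError as the expected failure are all assumptions), not pyperformance's actual code:

```python
import traceback


def run_all(benchmarks, verbose=False):
    """Run each (name, callable) pair, collecting names of failed benchmarks.

    Assumption: an expected failure surfaces as RuntimeError, as with
    "Benchmark died" above; anything else propagates unchanged.
    """
    failed = []
    for name, run in benchmarks:
        try:
            run()
        except RuntimeError as exc:
            print(f"ERROR: Benchmark {name} failed: {exc}")
            if verbose:
                # Only show the harness-side traceback when asked.
                traceback.print_exc()
            failed.append(name)
    return failed
```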

@gvanrossum
Collaborator Author

Hah! It's the --python flag that's broken. If I use the target Python to run pyperformance it works. And now the WARNING: unable to increase process priority error is back too...

At least now I can run more benchmarks and see how they fare.

@gvanrossum
Collaborator Author

So what's left is dulwich, genshi, and the two sqlalchemy benchmarks (the latter both depending on greenlet). All these seem to have compilation errors.

@gvanrossum
Collaborator Author

gvanrossum commented Mar 24, 2022

The genshi issue has been fixed in genshi but there's no release yet.
I can prove this by replacing the genshi==0.7.6 line with

git+https://github.com/edgewall/genshi@605de3b#egg=genshi

in bm_genshi/requirements.txt, but that seems a bit brittle. I flagged this in the genshi project at edgewall/genshi#66 -- how long should we wait?
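Swapping a pin like this by hand across benchmark requirements.txt files (here for genshi, and below for greenlet) is easy to get wrong. A small sketch of a helper that replaces one pinned requirement line with another; the function name is hypothetical and it only handles simple name==version lines:

```python
import re


def replace_requirement(text: str, name: str, new_line: str) -> str:
    """Replace the line pinning `name` (e.g. 'genshi==0.7.6') with new_line.

    Only simple 'name==version' lines are matched; comments, extras and
    environment markers are deliberately not handled in this sketch.
    Raises ValueError if no matching line is found.
    """
    pattern = re.compile(rf"^{re.escape(name)}==\S+$", re.MULTILINE | re.IGNORECASE)
    # Use a function as the replacement so backslashes in new_line stay literal.
    new_text, count = pattern.subn(lambda _m: new_line, text)
    if count == 0:
        raise ValueError(f"no pinned requirement for {name!r}")
    return new_text
```

For example, replace_requirement(text, "genshi", "git+https://github.com/edgewall/genshi@605de3b#egg=genshi") performs the edit described above.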

@gvanrossum
Collaborator Author

News about the sqlalchemy-{declarative,imperative} benchmarks, which depend on greenlet. The requirements.txt files have greenlet==2.0.0.a1, but a new release came out two days ago, 2.0.0a2. So I edited the requirements.txt files to use that, and now I have, um, interesting behavior.

  • Initially, running these benchmarks causes the same problem where the pip install fails with error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/. (I get the error several times due to retries by pip and in different venvs.)
  • Installing this version of greenlet manually with <python-binary> -m pip install greenlet==2.0.0a2 works.
  • After this, even if I remove all venvs used by PyPerformance, running the sqlalchemy* benchmarks works too!

The latter gives the message

Collecting greenlet==2.0.0a2
  Using cached greenlet-2.0.0a2-cp311-cp311-win_amd64.whl

I don't know where this cached file lives, but it sure didn't come from PyPI (there's no 3.11 greenlet wheel there), so I'm guessing it's some kind of cache used by pip shared between venvs (I have no understanding of how pip caching works on Windows or else I'd look there to confirm this -- I have no doubt I would find it).

So now the remaining issue is that initial error about the MSVC compiler. I'm guessing this is due to how the venvs are created by PyPerformance. A more useful error showed up when I tried to pip install greenlet==2.0.0a2 by directly invoking the python binary in the venv (...\python.exe -m pip install -U greenlet==2.0.0a2):

      "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\gvanrossum\bench\venv\cpython3.11-5d61ebee4deb-compat-f6a835d45d46-bm-sqlalchemy_declarative\include -IC:\Users\gvanrossum\cpython\include -IC:\Users\gvanrossum\cpython\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" /EHsc /Tpsrc/greenlet/greenlet.cpp /Fobuild\temp.win-amd64-3.11\Release\src/greenlet/greenlet.obj /EHsc /GT     
      greenlet.cpp
      C:\Users\gvanrossum\cpython\include\Python.h(12): fatal error C1083: Cannot open include file: 'pyconfig.h': No such file or directory     
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Enterprise\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2

@notatallshaw

notatallshaw commented Mar 27, 2022

FWIW my understanding is that pip uses cachecontrol to cache its HTTP(S) downloads and then has some abstraction layers built on top of that for things like name normalization. Edit: Although the commands below still show the wheel cache, after checking the pip code I realize I was mistaken, and the wheel cache is built separately from the HTTP cache: https://github.com/pypa/pip/blob/22.0.4/src/pip/_internal/resolution/resolvelib/factory.py#L547

I'm fairly sure you can use pip cache dir and pip cache list to fully see where and what is being cached: https://pip.pypa.io/en/stable/cli/pip_cache/
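Since pip's cache lives in a per-user location rather than inside each venv, it helps to ask pip from the exact interpreter in question. A sketch that runs the pip cache subcommands with a given interpreter; pip_cache_command and pip_cache are hypothetical helper names:

```python
import subprocess
import sys


def pip_cache_command(python, *args):
    """Build `<python> -m pip cache <args...>` as an argument list."""
    return [str(python), "-m", "pip", "cache", *args]


def pip_cache(python, *args):
    """Run `pip cache` with the given interpreter and return its stdout."""
    result = subprocess.run(
        pip_cache_command(python, *args),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Where the cache lives, and which locally built wheels it holds.
    print(pip_cache(sys.executable, "dir"))
    print(pip_cache(sys.executable, "list"))
```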

@gvanrossum
Collaborator Author

gvanrossum commented Mar 27, 2022

Yeah, moments after hitting Send I looked at pip -h and found pip cache and then pip cache purge which cleared out the cache.

It's still not 100% clear what sequence of steps I needed to get the install in the venv to work, but it seems to be mostly

~/cpython/python.bat -m pip install greenlet==2.0.0a2

which built and cached the necessary wheel. Very mysterious.

UPDATE: The precise set of commands is listed here: python/pyperformance#163 (comment)

@gvanrossum
Collaborator Author

Using @brandtbucher's one-line command (... pip install --no-cache-dir --force-reinstall greenlet==2.0.0a2) I have found that the problem is due to the way the venv is set up. Using the python.exe built in the cpython repo (.\python.bat -m pip ...) it works. Using the python.exe in the venv (<venv>\Scripts\python.exe -m pip ...) it fails.

Now, it also seems to be failing when I create a brand new venv using ...\python.bat -m venv lalala and then use lalala\Scripts\python.exe -m pip ... to install greenlet. I even tried updating the pip and setuptools versions in the venv, and that still fails.
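The comparison can be scripted so that the identical cache-bypassing install is attempted with each interpreter. A sketch, assuming the caller supplies the interpreter paths (for example C:\Users\gvanrossum\cpython\PCbuild\amd64\python.exe and C:\Users\gvanrossum\bench\lalala\Scripts\python.exe); install_command and compare_builds are hypothetical helper names:

```python
import subprocess


def install_command(python, spec="greenlet==2.0.0a2"):
    """pip command that skips pip's cache and forces a fresh install."""
    return [str(python), "-m", "pip", "install",
            "--no-cache-dir", "--force-reinstall", spec]


def compare_builds(pythons, spec="greenlet==2.0.0a2"):
    """Try the same install with each interpreter; map path -> exit code."""
    results = {}
    for python in pythons:
        proc = subprocess.run(install_command(python, spec))
        results[str(python)] = proc.returncode
    return results
```

A nonzero exit code for the venv interpreter alongside zero for the repo interpreter would reproduce the split described here.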

So I'm beginning to suspect that there's something broken in venv itself, when used with the uninstalled cpython binary to build packages from source. I can build greenlet (2.0.0a2) from source just fine in a venv created using py -3.11 -m venv hohoho. That is using 3.11a6 installed using the official python.org installer.

The problem is that the build cannot find pyconfig.h.
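A quick first check is to ask each interpreter (repo python.exe and venv python.exe) where the stdlib sysconfig module thinks pyconfig.h lives, and whether it is really there. This is only a sketch: distutils and setuptools compute their own include paths, so a clean report here does not rule out the build failing.

```python
import os
import sysconfig


def pyconfig_report() -> dict:
    """Report where this interpreter thinks pyconfig.h is, and whether it exists."""
    config_h = sysconfig.get_config_h_filename()
    paths = sysconfig.get_paths()
    include_dirs = {paths["include"], paths["platinclude"]}
    return {
        "is_python_build": bool(sysconfig.is_python_build()),
        "config_h": config_h,
        "config_h_exists": os.path.isfile(config_h),
        # Which of the advertised include dirs actually contain pyconfig.h?
        "include_dirs": {d: os.path.isfile(os.path.join(d, "pyconfig.h"))
                         for d in sorted(include_dirs)},
    }


if __name__ == "__main__":
    for key, value in pyconfig_report().items():
        print(f"{key:16} {value}")
```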

I tried diffing the two venvs (lalala == using python.exe from repo, hohoho == using python.exe installed from python.org) and found this:

diff -u -r lalala/pyvenv.cfg hohoho/pyvenv.cfg
--- lalala/pyvenv.cfg   2022-03-28 13:09:59.593390900 -0700
+++ hohoho/pyvenv.cfg   2022-03-28 13:13:30.015499500 -0700
@@ -1,5 +1,5 @@
-home = C:\Users\gvanrossum\cpython\PCbuild\amd64
+home = C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311
 include-system-site-packages = false
 version = 3.11.0
-executable = C:\Users\gvanrossum\cpython\PCbuild\amd64\python.exe
-command = C:\Users\gvanrossum\cpython\PCbuild\amd64\python.exe -m venv C:\Users\gvanrossum\bench\lalala
+executable = C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311\python.exe
+command = C:\Users\gvanrossum\AppData\Local\Programs\Python\Python311\python.exe -m venv C:\Users\gvanrossum\bench\hohoho
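The same comparison can be scripted. pyvenv.cfg is a simple file of key = value lines, and its home key records the directory of the base interpreter the venv was created from, which is where the two venvs differ. A sketch with hypothetical helper names:

```python
from pathlib import Path


def read_pyvenv_cfg(venv_dir):
    """Parse <venv>/pyvenv.cfg into a dict of its 'key = value' lines."""
    config = {}
    cfg = Path(venv_dir) / "pyvenv.cfg"
    for line in cfg.read_text(encoding="utf-8").splitlines():
        # Split on the first '=' only; values such as 'command' may contain more.
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def diff_pyvenv_cfg(venv_a, venv_b):
    """Return {key: (value_a, value_b)} for keys whose values differ."""
    a, b = read_pyvenv_cfg(venv_a), read_pyvenv_cfg(venv_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```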

Maybe someone with more understanding of venv implementation details can tell if something's wrong there?

Anyway, this means that pyperformance is not responsible for this particular problem -- it's due to venv.

This problem also doesn't appear to be specific to 3.11 -- using a 3.10 python.exe I built ages ago in a separate worktree I can repro the same problem: greenlet builds from source (--no-binary greenlet) using the python.exe in the repo, but not using the python.exe in a venv created using the former.

Next up I'll try to browse bpo for venv trouble on Windows with the cpython-repo-built python.exe.

@gvanrossum
Collaborator Author

Possibly relevant:

@zooba Any idea?

@ericsnowcurrently
Collaborator

@gvanrossum The issues with --python and psutil should be all sorted now.

@ericsnowcurrently
Collaborator

So I'm beginning to suspect that there's something broken in venv itself, when used with the uninstalled cpython binary to build packages from source.

FYI, from https://pyperformance.readthedocs.io/usage.html#compile-python-to-run-benchmarks:

# WARNING: Running Python from the build directory introduces subtle changes
# compared to running an installed Python. Moreover, creating a virtual
# environment using a Python run from the build directory fails in many cases,
# especially on Python older than 3.4. Only disable installation if you
# really understand what you are doing!

@gvanrossum
Collaborator Author

gvanrossum commented Mar 28, 2022

Nice lawyering, but I would never have thought to look in the docs for "compile" when I'm using the much simpler "run" command. :-)

Anyway, @zooba talked me through some of this. Apparently it's a bug in distutils (present both in the stdlib version and in the version vendored by setuptools): a venv created from an uninstalled Python binary is supposed to look in the source tree's PC directory for include files (since that's where pyconfig.h lives on Windows), but the check that detects a source tree misfires in this case.
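The intended lookup can be illustrated like this: starting from the home directory recorded in pyvenv.cfg (for example C:\Users\gvanrossum\cpython\PCbuild\amd64), look for PC\pyconfig.h in a parent directory. This is a hypothetical illustration of the idea, not the actual distutils or setuptools check, which this thread does not show:

```python
from pathlib import Path
from typing import Optional


def find_source_pyconfig(home) -> Optional[Path]:
    """Walk up from `home` looking for a CPython source tree's PC/pyconfig.h.

    Hypothetical heuristic: a build directory like <src>/PCbuild/amd64 has
    the Windows pyconfig.h at <src>/PC/pyconfig.h.
    """
    home = Path(home).resolve()
    for candidate in (home, *home.parents):
        pyconfig = candidate / "PC" / "pyconfig.h"
        if pyconfig.is_file():
            return pyconfig
    return None
```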

Steve thought that using a "pseudo install" created using PC\layout might work around the issue, but this doesn't appear to be the case. So I'm still stuck tricking the pip cache.

UPDATE: After merging #172, I no longer need to trick the pip cache.

@gvanrossum
Collaborator Author

Looks like all benchmarks now compile and run on Windows, so I am closing this.

@Fidget-Spinner
Collaborator

Thanks Eric and Guido for fixing pyperformance on Windows!

@gvanrossum
Collaborator Author

Now we still need to update some of the docs (there are a few outdated sections, e.g. the --venv flag).
