Profiling python C extensions
I have developed a python C-extension that receives data from python and compute some cpu intensive calculations. It's possible to profile the C-extension?
The problem here is that writing a sample test in C to be profiled would be challenging because the code rely on particular inputs and data structures (generated by python control code).
Do you have any suggestions?
After the comment by pygabriel I decided to upload a package to pypi that implements a profiler for python extensions using the cpu-profiler from google-perftools: http://pypi.python.org/pypi/yep
I've found my way using google-perftools. The trick was to wrap the functions StartProfiler and StopProfiler in python (throught cython in my case).
To profile the C extension is sufficient to wrap the python code inside the StartProfiler and StopProfiler calls.
from google_perftools_wrapped import StartProfiler, StopProfiler import c_extension # extension to profile c_extension.so StartProfiler("output.prof") ... calling the interesting functions from the C extension module ... StopProfiler()
Then to analyze for example you can export in callgrind format and see the result in kcachegrind:
pprof --callgrind c_extension.so output.prof > output.callgrind kcachegrind output.callgrind
With gprof, you can profile any program that was properly compiled and linked (gcc -pg etc, in gprof's case). If you're using a Python version not built with gcc (e.g., the Windows precompiled version the PSF distributes), you'll need to research what equivalent tools exist for that platform and toolchain (in the Windows PSF case, maybe mingw can help). There may be "irrelevant" data there (internal C functions in the Python runtime), and, if so, the percentages shown by gprof may not be applicable -- but the absolute numbers (of calls, and durations thereof) are still valid, and you can post-process gprof's output (e.g., with a little Python script;-) to exclude the irrelevant data and compute the percentages you want.
One of my colleague told me ltrace(1). It helped me on the same situation quite a lot.
Assume the shared object name of your C extention is myext.so and you want to execute benchmark.py, then
ltrace -x @myext.so -c python benchmark.py
Its output is like
% time seconds usecs/call calls function ------ ----------- ----------- --------- -------------------- 24.88 30.202126 7550531 4 ldap_result 12.46 15.117625 7558812 2 l_ldap_result4 12.41 15.059652 5019884 3 ldap_chase_v3referrals 12.41 15.057678 3764419 4 ldap_new_connection 12.40 15.050310 3762577 4 ldap_int_open_connection 12.39 15.042360 3008472 5 ldap_send_server_request 12.38 15.029055 3757263 4 ldap_connect_to_host 0.05 0.057890 28945 2 ldap_get_option 0.04 0.052182 26091 2 ldap_sasl_bind 0.03 0.030760 30760 1 l_ldap_get_option 0.03 0.030635 30635 1 LDAP_get_option 0.02 0.029960 14980 2 ldap_initialize 0.02 0.027988 27988 1 ldap_int_initialize 0.02 0.026722 26722 1 l_ldap_simple_bind 0.02 0.026386 13193 2 ldap_send_initial_request 0.02 0.025810 12905 2 ldap_int_select ....
Special care is needed if your shared object has - or + in its file name. These characters aren't treated as is (see man 1 ltrace for details).
The wildcard * can be a workaround such as -x @myext* in place of -x @myext-2.so.