Python, the language of the Blender API, is permissive about data types. However, when working with large amounts of data, implicit type conversion can noticeably slow down your code. For example, the simplest foreach_get() call, which copies data from a collection of elements into a flat array, can be sped up significantly just by choosing the right data type for the destination array.
User Mysteryem in the Blender chat benchmarked this by reading the location of each particle in a particle system of 10,000 elements.
In the experiment, arrays of different types were created, and the particle locations were copied into each of them with foreach_get().
Arrays were created of the following types:
- regular array of floating point numbers (float)
- regular array of double precision floating point numbers (double)
- numpy array of floating point numbers (float)
- numpy array of double precision floating point numbers (double)
- ctypes array of floating point numbers (float)
- ctypes array of double precision floating point numbers (double)
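As a quick aside (a standalone sketch, not part of Mysteryem's benchmark): the float/double distinction in these buffer types can be verified directly in Python by checking their element sizes, 4 bytes for single precision and 8 bytes for double precision:

```python
import array
import ctypes
import numpy as np

n = 3 * 10  # room for 10 "particles" of 3 components each

array_f = array.array('f', [0, 0, 0]) * 10  # single precision
array_d = array.array('d', [0, 0, 0]) * 10  # double precision
ndarray_f = np.empty(n, dtype=np.single)
ndarray_d = np.empty(n, dtype=np.double)
ctypes_f = (ctypes.c_float * n)()
ctypes_d = (ctypes.c_double * n)()

# Element sizes in bytes: single precision = 4, double precision = 8
print(array_f.itemsize, array_d.itemsize)      # 4 8
print(ndarray_f.itemsize, ndarray_d.itemsize)  # 4 8
print(ctypes.sizeof(ctypes.c_float), ctypes.sizeof(ctypes.c_double))  # 4 8
```

Only the buffers whose element type matches the C type of the property (here, float) let foreach_get() fall back to a raw memory copy.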
Mysteryem’s code:
```python
import bpy
import timeit
import ctypes
import numpy as np
import array

obj = bpy.context.object
depsgraph = bpy.context.evaluated_depsgraph_get()
obj_eval = obj.evaluated_get(depsgraph)
psys = obj_eval.particle_systems[0]
parlen = len(psys.particles)

par_loc = [0, 0, 0] * parlen
array_f = array.array('f', [0, 0, 0]) * parlen
array_d = array.array('d', [0, 0, 0]) * parlen
ndarray_f = np.empty(3 * parlen, dtype=np.single)
ndarray_d = np.empty(3 * parlen, dtype=np.double)
ctypes_f = (ctypes.c_float * (3 * parlen))()
ctypes_d = (ctypes.c_double * (3 * parlen))()

# My test case was a default particle system with 1000 particles.
# If your particle system has more than this, you may want to reduce the number of `trials` below.
# Note that being able to use memcpy in foreach_get scales much better than iterating
# as the number of particles increases.
print(f"parlen: {parlen}")

trials = 1000

# foreach_get can't memcpy into lists, but lists are pretty fast to iterate compared to most other types
# ~0.026ms for 1000 particles
t_par_loc = timeit.timeit("psys.particles.foreach_get('location', par_loc)", globals=globals(), number=trials) / trials
print("t_par_loc:")
print(f"\t{t_par_loc * 1000:f}ms")

# foreach_get could perform a memcpy because the buffer type matched the property's C type!
# ~0.003ms for 1000 particles
t_array_f = timeit.timeit("psys.particles.foreach_get('location', array_f)", globals=globals(), number=trials) / trials
print("t_array_f:")
print(f"\t{t_array_f * 1000:f}ms")

# ~0.099ms for 1000 particles
t_array_d = timeit.timeit("psys.particles.foreach_get('location', array_d)", globals=globals(), number=trials) / trials
print("t_array_d:")
print(f"\t{t_array_d * 1000:f}ms")

# foreach_get could perform a memcpy because the buffer type matched the property's C type!
# ~0.003ms for 1000 particles
t_ndarray_f = timeit.timeit("psys.particles.foreach_get('location', ndarray_f)", globals=globals(), number=trials) / trials
print("t_ndarray_f:")
print(f"\t{t_ndarray_f * 1000:f}ms")

# ~0.113ms for 1000 particles
t_ndarray_d = timeit.timeit("psys.particles.foreach_get('location', ndarray_d)", globals=globals(), number=trials) / trials
print("t_ndarray_d:")
print(f"\t{t_ndarray_d * 1000:f}ms")

# foreach_get should have been able to perform a memcpy, but its C code has issues
# that prevent ctypes arrays from being supported
# ~0.374ms for 1000 particles
t_ctypes_f = timeit.timeit("psys.particles.foreach_get('location', ctypes_f)", globals=globals(), number=trials) / trials
print("t_ctypes_f:")
print(f"\t{t_ctypes_f * 1000:f}ms")

# ~0.374ms for 1000 particles
t_ctypes_d = timeit.timeit("psys.particles.foreach_get('location', ctypes_d)", globals=globals(), number=trials) / trials
print("t_ctypes_d:")
print(f"\t{t_ctypes_d * 1000:f}ms")
```
The result of executing the code is as follows:
```
parlen: 10000
t_par_loc:
	0.446990ms
t_array_f:
	0.033495ms
t_array_d:
	2.511545ms
t_ndarray_f:
	0.031628ms
t_ndarray_d:
	1.293199ms
t_ctypes_f:
	4.844052ms
t_ctypes_d:
	4.707416ms
```
Execution results may depend on the computer and operating system you are using.
Based on the results of executing the code, the following conclusions can be drawn:
The numpy array is the best.
The maximum speed was demonstrated by foreach_get() with a numpy array of type float, which matches the C type in which Blender stores particle locations (a regular array.array of type float was almost as fast).
Any mismatch with the source data type forces an element-by-element conversion, which can significantly slow the call: the very same numpy array becomes roughly 40 times slower in these results when typed as double.
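The cost of that conversion can be illustrated outside Blender as well. The sketch below (a standalone illustration, not part of the original benchmark) copies a float32 source, standing in for 10,000 particle locations, into destinations of matching and mismatched dtype; the mismatched copy must convert every element instead of performing a straight memory copy:

```python
import timeit
import numpy as np

src = np.random.rand(3 * 10000).astype(np.float32)  # stand-in for 10,000 particle locations

dst_f = np.empty(src.shape, dtype=np.float32)  # matching type: plain memory copy
dst_d = np.empty(src.shape, dtype=np.float64)  # mismatched type: per-element conversion

t_f = timeit.timeit(lambda: np.copyto(dst_f, src), number=1000) / 1000
t_d = timeit.timeit(lambda: np.copyto(dst_d, src), number=1000) / 1000
print(f"float32 -> float32: {t_f * 1000:f}ms")
print(f"float32 -> float64: {t_d * 1000:f}ms")
```

The same principle applies to any foreach_get()/foreach_set() destination: match the buffer's element type to the property's C type and the copy stays cheap.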