Optimize province pixel processing in load_map_images#608
Optimize province pixel processing in load_map_images#608Spartan322 wants to merge 1 commit intomasterfrom
Conversation
|
This is not more cache friendly. In creating the new copy, you load all of the existing data into the cache, plus you end up with the additional new copy in the cache. So you end up with the same cost as the existing version, of loading the same data into the cache (if it wasn't there already) plus you consume cache space for the new copy, possibly evicting something useful in the process. |
|
Well the profiling I have actually done shows a consistent decrease of relative time spent in load_map_images with this being the only change, the data is processed into a linear allocation which would inherently speed up cache access. |
|
The version prior to your change also has the data in a linear allocation -- see the implementation of the colour_at function and the implementation of get_pixel_index_from_pos which is which is the exact same lookup pattern in terms of indexing into a contiguous linear allocation as the new version, but without making an additional copy. |
|
One difference between the two is that colour_t stores an internal 32bit integer, meaning that there are 4 bytes per pixel stored instead of 3 bytes per pixel in the source data, meaning that the copy wastes an additional byte of space per pixel, making it less cache friendly. However, maybe the pattern of spacing the data in 4 byte strides instead of 3 byte strides enabled the compiler to do some optimization with the copy it couldn't do with the original version. |
|
Regardless the improvement while notably consistent is still fairly marginal and we only perform this calculation once, it did shrink the function in the flame graph, but I'm still undecided if the improvement is worth it hence why it was left as a draft, probably better if it was processing it with SIMD operations. |
This should slightly improve the cache-friendlyness of the pixel search at the memory cost of a practical copy of the province pixel map. Did not benchmark performance but on Linux built from GCC the relative cost of load_map_images appears to shrink.