
Blocking for partial directions

Merged Markus Holzer requested to merge holzer/pystencils:CPU_Blocking into master
@@ -1258,7 +1258,8 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
Args:
ast_node: kernel function node before vectorization transformation has been applied
- block_size: sequence defining block size in x, y, (z) direction
+ block_size: sequence defining block size in x, y, (z) direction.
+ If chosen as zero the direction will not be used for blocking.
Returns:
number of dimensions blocked
@@ -1270,8 +1271,10 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
body = ast_node.body
coordinates = []
+ coordinates_taken_into_account = 0
loop_starts = {}
loop_stops = {}
for loop in loops:
coord = loop.coordinate_to_loop_over
if coord not in coordinates:
@@ -1285,6 +1288,9 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
# Create the outer loops that iterate over the blocks
outer_loop = None
for coord in reversed(coordinates):
+ if block_size[coord] == 0:
+ continue
+ coordinates_taken_into_account += 1
body = ast.Block([outer_loop]) if outer_loop else body
outer_loop = ast.LoopOverCoordinate(body,
coord,
@@ -1296,7 +1302,9 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
ast_node.body = ast.Block([outer_loop])
# modify the existing loops to only iterate within one block
- for inner_loop in loops:
+ for inner_loop, coord in zip(loops, coordinates):
+ if block_size[coord] == 0:
+ continue
coord = inner_loop.coordinate_to_loop_over
block_ctr = ast.LoopOverCoordinate.get_block_loop_counter_symbol(coord)
loop_range = inner_loop.stop - inner_loop.start
@@ -1307,7 +1315,7 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
stop = sp.Min(inner_loop.stop, block_ctr + block_size[coord])
inner_loop.start = block_ctr
inner_loop.stop = stop
- return len(coordinates)
+ return coordinates_taken_into_account
    • So this function returns a magic number that is consumed by OpenMP's collapse clause, and this works correctly whatever block size you specify here? Shouldn't OpenMP collapse all loops that enclose the blocking loops (rather than count how many loops do not use blocking) and avoid applying collapse to coordinates that use blocking? In that case, this version wouldn't be correct. Anyway, OpenMP should get this information in a less magical way (e.g. directly from the blocking dimension).

      • So I think collapse should be applied to the number of blocking loops we have produced, because they share the same iteration space as the original loops, right? I think this was the reasoning behind the function originally.

        I think it makes sense to return the number of blocked coordinates then, and I am not sure it is a good idea to extract that information directly from the blocking tuple. At the moment, coordinates_taken_into_account is only increased if a new outer_loop is actually produced. If we just extract the information from the tuple, it might be more error-prone, right?

      • Right, it returns the number of outer loops, not the number of inner loops.

      • So the code for (1,16,1) would produce two inner loops that have only one iteration and the goal of this PR is to drop those loops?

      • Yes, that's correct.

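To make the point of the thread concrete, here is a plain-Python model of the blocking scheme discussed above. It is only a sketch and not pystencils code: `blocked_ranges` and its parameters are hypothetical names, loops are modelled as Python ranges instead of AST nodes, and a block size of 0 is taken to mean "do not block this coordinate". The second return value plays the role of `coordinates_taken_into_account`, i.e. the number of outer block loops actually created.

```python
from itertools import product


def blocked_ranges(starts, stops, block_size):
    """Model of loop blocking where a block size of 0 disables blocking.

    Hypothetical helper for illustration only. Returns (points, n_outer):
    the visited index tuples and the number of outer block loops created.
    """
    dims = range(len(starts))
    # Only coordinates with a non-zero block size get an outer block loop.
    blocked = [d for d in dims if block_size[d] != 0]

    # Outer loops: one per blocked coordinate, stepping by the block size.
    outer_iters = [range(starts[d], stops[d], block_size[d]) for d in blocked]

    points = []
    for block_origin in product(*outer_iters):
        origin = dict(zip(blocked, block_origin))
        inner_iters = []
        for d in dims:
            if d in origin:
                # Inner loop covers one block, clipped at the original stop
                # (the counterpart of Min(stop, block_ctr + block_size)).
                upper = min(stops[d], origin[d] + block_size[d])
                inner_iters.append(range(origin[d], upper))
            else:
                # Unblocked coordinate keeps its full original range.
                inner_iters.append(range(starts[d], stops[d]))
        points.extend(product(*inner_iters))
    return points, len(blocked)


# Block only the second coordinate with block size 16.
points, n_outer = blocked_ranges((0, 0, 0), (4, 40, 3), (0, 16, 0))
```

In this model, a block size of `(0, 16, 0)` creates a single outer loop, and the blocked traversal still visits every point of the original iteration space exactly once. Counting the outer loops as they are created, rather than reading the count off the tuple afterwards, mirrors the argument made in the thread.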
def implement_interpolations(ast_node: ast.Node,