Blocking for partial directions
2 unresolved threads
Files changed: 2 (+11 −3)
@@ -1258,7 +1258,8 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
@@ -1270,8 +1271,10 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
@@ -1285,6 +1288,9 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
@@ -1296,7 +1302,9 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
@@ -1307,7 +1315,7 @@ def loop_blocking(ast_node: ast.KernelFunction, block_size) -> int:
So this function returns a magic number that will be consumed by OpenMP's `collapse`, and this works correctly for whatever block size is specified here? Shouldn't OpenMP collapse all loops that enclose the blocking loops (regardless of how many loops do not use blocking), and avoid applying collapse to coordinates that use blocking? In that case this version wouldn't be correct. In any case, shouldn't OpenMP get this information in a less magical way (e.g. directly from the blocking dimensions)? So I think collapse should be applied to the number of blocking loops we have produced, because they share the same iteration space as the original loops, right? I think that was the reasoning that originally went into this function.
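To make the counting question concrete, here is a minimal sketch (hypothetical names, not the pystencils implementation) of a blocking transform over a 3-D iteration space. Only coordinates whose block size is greater than one and smaller than the extent get an outer block loop, and the number of those outer loops is exactly the value that would feed OpenMP's `collapse(n)`:

```python
def block_loops(extents, block_sizes):
    """Toy blocking transform: return (outer_loops, inner_loops).

    A coordinate with 1 < block_size < extent is blocked: it gets an
    outer loop over block starts plus an inner loop within a block.
    All other coordinates keep a single plain loop."""
    outer, inner = [], []
    for dim, (n, b) in enumerate(zip(extents, block_sizes)):
        if 1 < b < n:
            outer.append((dim, range(0, n, b)))  # loop over block starts
            inner.append((dim, b))               # loop within one block
        else:
            inner.append((dim, n))               # unblocked coordinate
    return outer, inner

outer, inner = block_loops((64, 64, 64), (1, 16, 1))
collapse_depth = len(outer)  # number of outer blocking loops produced
```

For block size `(1, 16, 1)` only the middle coordinate is blocked, so `collapse_depth` is 1 while there are still three inner loops.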
I think it makes sense to return the number of blocked coordinates, then. I'm not sure it's a good idea to extract that information directly from the blocking tuple: at the moment `coordinates_taken_into_account` is only incremented when a new `outer_loop` is actually produced. Deriving the count from the tuple instead would be more error-prone, right?
Right, it returns the number of outer loops, not the number of inner loops.
So for block size (1,16,1) the code would produce two inner loops that have only one iteration, and the goal of this PR is to drop those loops?
Yes, that's correct.
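A small sketch of why dropping those single-iteration inner loops is safe (again hypothetical code, not the actual transformation): for block size (1, 16, 1) only the middle coordinate is blocked, the inner loops over the other two coordinates run exactly once, and the blocked nest visits the same points as the plain nest either way:

```python
from itertools import product

def blocked_points(extents, block_sizes):
    """Enumerate a 3-D nest blocked only in the middle coordinate,
    with the trivial (trip-count 1) inner loops already dropped."""
    pts = set()
    for b1 in range(0, extents[1], block_sizes[1]):        # outer block loop
        for i in range(extents[0]):                        # unblocked: plain loop
            for j in range(b1, min(b1 + block_sizes[1], extents[1])):
                for k in range(extents[2]):                # unblocked: plain loop
                    pts.add((i, j, k))
    return pts

def plain_points(extents):
    return set(product(*(range(n) for n in extents)))

# Same iteration space with and without blocking:
assert blocked_points((4, 32, 4), (1, 16, 1)) == plain_points((4, 32, 4))
```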