-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang][Lower] Always generate namelist group locally #109303
Conversation
Namelists are allowed inside of OpenMP reduction regions. Those regions map the symbol for the reduction variable to a thread-local reduction variable, but this mapping is escaped if the namelist is generated as a global variable because the global variable will refer the original symbol mapping. See llvm#101907 for a code example. We can't do this only when inside of an OpenMP region because the IO function might be inside of a subroutine/function which is called from inside the OpenMP region. There is also no restriction against namelist variables for the `reduce` clause of `do concurrent` in F23. We currently don't seem to implement initial values for reduction variables (F23 table 11.1), but when we do I expect this would see the same issue as OpenMP. A disadvantage of doing this is that the namelist has to be constructed in place every time. If for some reason this is not hoisted out of hot loops (e.g. it is inside of another subroutine called in the loop body) this might have some performance impact (although I suspect this won't be huge compared to the context switches required for I/O). Fixes llvm#101907
@llvm/pr-subscribers-flang-fir-hlfir Author: Tom Eccles (tblah) ChangesNamelists are allowed inside of OpenMP reduction regions. Those regions map the symbol for the reduction variable to a thread-local reduction variable, but this mapping is escaped if the namelist is generated as a global variable because the global variable will refer the original symbol mapping. See #101907 for a code example. We can't do this only when inside of an OpenMP region because the IO function might be inside of a subroutine/function which is called from inside the OpenMP region. There is also no restriction against namelist variables for the A disadvantage of doing this is that the namelist has to be constructed in place every time. If for some reason this is not hoisted out of hot loops (e.g. it is inside of another subroutine called in the loop body) this might have some performance impact (although I suspect this won't be huge compared to the context switches required for I/O). Fixes #101907 Patch is 25.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109303.diff 3 Files Affected:
diff --git a/flang/lib/Lower/IO.cpp b/flang/lib/Lower/IO.cpp
index 9e98b230b676f6..4306f71438419b 100644
--- a/flang/lib/Lower/IO.cpp
+++ b/flang/lib/Lower/IO.cpp
@@ -467,7 +467,6 @@ getNamelistGroup(Fortran::lower::AbstractConverter &converter,
const auto &details =
symbol.GetUltimate().get<Fortran::semantics::NamelistDetails>();
mlir::MLIRContext *context = builder.getContext();
- mlir::StringAttr linkOnce = builder.createLinkOnceLinkage();
mlir::Type idxTy = builder.getIndexType();
mlir::Type sizeTy =
fir::runtime::getModel<std::size_t>()(builder.getContext());
@@ -487,37 +486,10 @@ getNamelistGroup(Fortran::lower::AbstractConverter &converter,
// Define variable names, and static descriptors for global variables.
DefinedIoProcMap definedIoProcMap = getDefinedIoProcMap(converter);
- bool groupIsLocal = hasLocalDefinedIoProc(definedIoProcMap);
stringAddress(symbol);
- for (const Fortran::semantics::Symbol &s : details.objects()) {
- stringAddress(s);
- if (!Fortran::lower::symbolIsGlobal(s)) {
- groupIsLocal = true;
- continue;
- }
- // A global pointer or allocatable variable has a descriptor for typical
- // accesses. Variables in multiple namelist groups may already have one.
- // Create descriptors for other cases.
- if (!IsAllocatableOrObjectPointer(&s)) {
- std::string mangleName =
- Fortran::lower::mangle::globalNamelistDescriptorName(s);
- if (builder.getNamedGlobal(mangleName))
- continue;
- const auto expr = Fortran::evaluate::AsGenericExpr(s);
- fir::BoxType boxTy =
- fir::BoxType::get(fir::PointerType::get(converter.genType(s)));
- auto descFunc = [&](fir::FirOpBuilder &b) {
- auto box = Fortran::lower::genInitialDataTarget(
- converter, loc, boxTy, *expr, /*couldBeInEquivalence=*/true);
- b.create<fir::HasValueOp>(loc, box);
- };
- builder.createGlobalConstant(loc, boxTy, mangleName, descFunc, linkOnce);
- }
- }
// Define the list of Items.
- mlir::Value listAddr =
- groupIsLocal ? builder.create<fir::AllocaOp>(loc, listTy) : mlir::Value{};
+ mlir::Value listAddr = builder.create<fir::AllocaOp>(loc, listTy);
std::string listMangleName = groupMangleName + ".list";
auto listFunc = [&](fir::FirOpBuilder &builder) {
mlir::Value list = builder.create<fir::UndefOp>(loc, listTy);
@@ -575,21 +547,12 @@ getNamelistGroup(Fortran::lower::AbstractConverter &converter,
list = builder.create<fir::InsertValueOp>(loc, listTy, list, descAddr,
builder.getArrayAttr(idx));
}
- if (groupIsLocal)
- builder.create<fir::StoreOp>(loc, list, listAddr);
- else
- builder.create<fir::HasValueOp>(loc, list);
+ builder.create<fir::StoreOp>(loc, list, listAddr);
};
- if (groupIsLocal)
- listFunc(builder);
- else
- builder.createGlobalConstant(loc, listTy, listMangleName, listFunc,
- linkOnce);
+ listFunc(builder);
// Define the group.
- mlir::Value groupAddr = groupIsLocal
- ? builder.create<fir::AllocaOp>(loc, groupTy)
- : mlir::Value{};
+ mlir::Value groupAddr = builder.create<fir::AllocaOp>(loc, groupTy);
auto groupFunc = [&](fir::FirOpBuilder &builder) {
mlir::Value group = builder.create<fir::UndefOp>(loc, groupTy);
// group name [const char *groupName]
@@ -617,20 +580,9 @@ getNamelistGroup(Fortran::lower::AbstractConverter &converter,
loc, groupTy, group,
getNonTbpDefinedIoTableAddr(converter, definedIoProcMap),
builder.getArrayAttr(builder.getIntegerAttr(idxTy, 3)));
- if (groupIsLocal)
- builder.create<fir::StoreOp>(loc, group, groupAddr);
- else
- builder.create<fir::HasValueOp>(loc, group);
+ builder.create<fir::StoreOp>(loc, group, groupAddr);
};
- if (groupIsLocal) {
- groupFunc(builder);
- } else {
- fir::GlobalOp group = builder.createGlobal(
- loc, groupTy, groupMangleName,
- /*isConst=*/true, /*isTarget=*/false, groupFunc, linkOnce);
- groupAddr = builder.create<fir::AddrOfOp>(loc, group.resultType(),
- group.getSymbol());
- }
+ groupFunc(builder);
assert(groupAddr && "missing namelist group result");
return groupAddr;
}
diff --git a/flang/test/Lower/namelist-common-block.f90 b/flang/test/Lower/namelist-common-block.f90
index d16f886dadb5bc..70fe64a002de63 100644
--- a/flang/test/Lower/namelist-common-block.f90
+++ b/flang/test/Lower/namelist-common-block.f90
@@ -1,4 +1,4 @@
-! RUN: bbc -emit-fir -hlfir=false -o - %s | FileCheck %s
+! RUN: bbc -emit-hlfir -o - %s | FileCheck %s
! Test that allocatable or pointer item from namelist are retrieved correctly
! if they are part of a common block as well.
@@ -17,12 +17,56 @@ subroutine print_t()
end subroutine
end
-! CHECK-LABEL: fir.global linkonce @_QFNt.list constant : !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>> {
-! CHECK: %[[CB_ADDR:.*]] = fir.address_of(@c_) : !fir.ref<!fir.array<56xi8>>
-! CHECK: %[[CB_CAST:.*]] = fir.convert %[[CB_ADDR]] : (!fir.ref<!fir.array<56xi8>>) -> !fir.ref<!fir.array<?xi8>>
-! CHECK: %[[OFFSET:.*]] = arith.constant 8 : index
-! CHECK: %[[COORD:.*]] = fir.coordinate_of %[[CB_CAST]], %[[OFFSET]] : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
-! CHECK: %[[CAST_BOX:.*]] = fir.convert %[[COORD]] : (!fir.ref<i8>) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
-! CHECK: %[[CAST_BOX_NONE:.*]] = fir.convert %[[CAST_BOX]] : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) -> !fir.ref<!fir.box<none>>
-! CHECK: %[[RES:.*]] = fir.insert_value %{{.*}}, %[[CAST_BOX_NONE]], [1 : index, 1 : index] : (!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>, !fir.ref<!fir.box<none>>) -> !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
-! CHECK: fir.has_value %[[RES]] : !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+! CHECK-LABEL: func.func private @_QFPprint_t() attributes {fir.host_symbol = @_QQmain, llvm.linkage = #{{.*}}<internal>} {
+! CHECK: %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.ptr<i32>>
+! CHECK: %[[VAL_1:.*]] = fir.address_of(@c_) : !fir.ref<!fir.array<56xi8>>
+! CHECK: %[[VAL_2:.*]] = fir.convert %[[VAL_1]] : (!fir.ref<!fir.array<56xi8>>) -> !fir.ref<!fir.array<?xi8>>
+! CHECK: %[[VAL_3:.*]] = arith.constant 0 : index
+! CHECK: %[[VAL_4:.*]] = fir.coordinate_of %[[VAL_2]], %[[VAL_3]] : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+! CHECK: %[[VAL_5:.*]] = fir.convert %[[VAL_4]] : (!fir.ref<i8>) -> !fir.ref<i32>
+! CHECK: %[[VAL_6:.*]]:2 = hlfir.declare %[[VAL_5]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+! CHECK: %[[VAL_7:.*]] = fir.convert %[[VAL_1]] : (!fir.ref<!fir.array<56xi8>>) -> !fir.ref<!fir.array<?xi8>>
+! CHECK: %[[VAL_8:.*]] = arith.constant 8 : index
+! CHECK: %[[VAL_9:.*]] = fir.coordinate_of %[[VAL_7]], %[[VAL_8]] : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+! CHECK: %[[VAL_10:.*]] = fir.convert %[[VAL_9]] : (!fir.ref<i8>) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
+! CHECK: %[[VAL_11:.*]]:2 = hlfir.declare %[[VAL_10]] {fortran_attrs = #{{.*}}<pointer>, uniq_name = "_QFEp"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) -> (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>, !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>)
+! CHECK: %[[VAL_12:.*]] = arith.constant 6 : i32
+! CHECK: %[[VAL_13:.*]] = fir.address_of(@_QQclXcb818d0ca98c80cc21b98226ec6d98a0) : !fir.ref<!fir.char<1,71>>
+! CHECK: %[[VAL_14:.*]] = fir.convert %[[VAL_13]] : (!fir.ref<!fir.char<1,71>>) -> !fir.ref<i8>
+! CHECK: %[[VAL_15:.*]] = arith.constant 16 : i32
+! CHECK: %[[VAL_16:.*]] = fir.call @_FortranAioBeginExternalListOutput(%[[VAL_12]], %[[VAL_14]], %[[VAL_15]]) fastmath<contract> : (i32, !fir.ref<i8>, i32) -> !fir.ref<i8>
+! CHECK: %[[VAL_17:.*]] = fir.alloca !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+! CHECK: %[[VAL_18:.*]] = fir.undefined !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+! CHECK: %[[VAL_19:.*]] = fir.address_of(@_QQclX6900) : !fir.ref<!fir.char<1,2>>
+! CHECK: %[[VAL_20:.*]] = fir.convert %[[VAL_19]] : (!fir.ref<!fir.char<1,2>>) -> !fir.ref<i8>
+! CHECK: %[[VAL_21:.*]] = fir.insert_value %[[VAL_18]], %[[VAL_20]], [0 : index, 0 : index] : (!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>, !fir.ref<i8>) -> !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+! CHECK: %[[VAL_22:.*]] = fir.embox %[[VAL_6]]#1 : (!fir.ref<i32>) -> !fir.box<!fir.ptr<i32>>
+! CHECK: fir.store %[[VAL_22]] to %[[VAL_0]] : !fir.ref<!fir.box<!fir.ptr<i32>>>
+! CHECK: %[[VAL_23:.*]] = fir.convert %[[VAL_0]] : (!fir.ref<!fir.box<!fir.ptr<i32>>>) -> !fir.ref<!fir.box<none>>
+! CHECK: %[[VAL_24:.*]] = fir.insert_value %[[VAL_21]], %[[VAL_23]], [0 : index, 1 : index] : (!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>, !fir.ref<!fir.box<none>>) -> !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+! CHECK: %[[VAL_25:.*]] = fir.address_of(@_QQclX7000) : !fir.ref<!fir.char<1,2>>
+! CHECK: %[[VAL_26:.*]] = fir.convert %[[VAL_25]] : (!fir.ref<!fir.char<1,2>>) -> !fir.ref<i8>
+! CHECK: %[[VAL_27:.*]] = fir.insert_value %[[VAL_24]], %[[VAL_26]], [1 : index, 0 : index] : (!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>, !fir.ref<i8>) -> !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+! CHECK: %[[VAL_28:.*]] = fir.address_of(@c_) : !fir.ref<!fir.array<56xi8>>
+! CHECK: %[[VAL_29:.*]] = fir.convert %[[VAL_28]] : (!fir.ref<!fir.array<56xi8>>) -> !fir.ref<!fir.array<?xi8>>
+! CHECK: %[[VAL_30:.*]] = arith.constant 8 : index
+! CHECK: %[[VAL_31:.*]] = fir.coordinate_of %[[VAL_29]], %[[VAL_30]] : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
+! CHECK: %[[VAL_32:.*]] = fir.convert %[[VAL_31]] : (!fir.ref<i8>) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
+! CHECK: %[[VAL_33:.*]] = fir.convert %[[VAL_32]] : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) -> !fir.ref<!fir.box<none>>
+! CHECK: %[[VAL_34:.*]] = fir.insert_value %[[VAL_27]], %[[VAL_33]], [1 : index, 1 : index] : (!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>, !fir.ref<!fir.box<none>>) -> !fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+! CHECK: fir.store %[[VAL_34]] to %[[VAL_17]] : !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>
+! CHECK: %[[VAL_35:.*]] = fir.alloca tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+! CHECK: %[[VAL_36:.*]] = fir.undefined tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+! CHECK: %[[VAL_37:.*]] = fir.address_of(@_QQclX7400) : !fir.ref<!fir.char<1,2>>
+! CHECK: %[[VAL_38:.*]] = fir.convert %[[VAL_37]] : (!fir.ref<!fir.char<1,2>>) -> !fir.ref<i8>
+! CHECK: %[[VAL_39:.*]] = fir.insert_value %[[VAL_36]], %[[VAL_38]], [0 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, !fir.ref<i8>) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+! CHECK: %[[VAL_40:.*]] = arith.constant 2 : i64
+! CHECK: %[[VAL_41:.*]] = fir.insert_value %[[VAL_39]], %[[VAL_40]], [1 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, i64) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+! CHECK: %[[VAL_42:.*]] = fir.insert_value %[[VAL_41]], %[[VAL_17]], [2 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+! CHECK: %[[VAL_43:.*]] = fir.address_of(@default.nonTbpDefinedIoTable) : !fir.ref<tuple<i64, !fir.ref<!fir.array<0xtuple<!fir.ref<none>, !fir.ref<none>, i32, i1>>>, i1>>
+! CHECK: %[[VAL_44:.*]] = fir.convert %[[VAL_43]] : (!fir.ref<tuple<i64, !fir.ref<!fir.array<0xtuple<!fir.ref<none>, !fir.ref<none>, i32, i1>>>, i1>>) -> !fir.ref<none>
+! CHECK: %[[VAL_45:.*]] = fir.insert_value %[[VAL_42]], %[[VAL_44]], [3 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, !fir.ref<none>) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+! CHECK: fir.store %[[VAL_45]] to %[[VAL_35]] : !fir.ref<tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>>
+! CHECK: %[[VAL_46:.*]] = fir.convert %[[VAL_35]] : (!fir.ref<tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<2xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>>) -> !fir.ref<tuple<>>
+! CHECK: %[[VAL_47:.*]] = fir.call @_FortranAioOutputNamelist(%[[VAL_16]], %[[VAL_46]]) fastmath<contract> : (!fir.ref<i8>, !fir.ref<tuple<>>) -> i1
+! CHECK: %[[VAL_48:.*]] = fir.call @_FortranAioEndIoStatement(%[[VAL_16]]) fastmath<contract> : (!fir.ref<i8>) -> i32
diff --git a/flang/test/Lower/namelist.f90 b/flang/test/Lower/namelist.f90
index a96bbbfad0cd6b..9a8a6f827a02b1 100644
--- a/flang/test/Lower/namelist.f90
+++ b/flang/test/Lower/namelist.f90
@@ -114,11 +114,35 @@ subroutine sss
! CHECK-LABEL: c.func @_QPglobal_pointer
subroutine global_pointer
real,pointer,save::ptrarray(:)
- ! CHECK: %[[V_4:[0-9]+]] = fir.call @_FortranAioBeginExternalListOutput
- ! CHECK: %[[V_5:[0-9]+]] = fir.address_of(@_QFglobal_pointerNmygroup) : !fir.ref<tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>>
- ! CHECK: %[[V_6:[0-9]+]] = fir.convert %[[V_5]] : (!fir.ref<tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>>) -> !fir.ref<tuple<>>
- ! CHECK: %[[V_7:[0-9]+]] = fir.call @_FortranAioOutputNamelist(%[[V_4]], %[[V_6]]) fastmath<contract> : (!fir.ref<i8>, !fir.ref<tuple<>>) -> i1
- ! CHECK: %[[V_8:[0-9]+]] = fir.call @_FortranAioEndIoStatement(%[[V_4]]) fastmath<contract> : (!fir.ref<i8>) -> i32
+ ! CHECK: %[[VAL_0:.*]] = arith.constant 1 : i64
+ ! CHECK: %[[VAL_2:.*]] = arith.constant 10 : i32
+ ! CHECK: %[[VAL_3:.*]] = fir.address_of(@_QFglobal_pointerEptrarray) : !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
+ ! CHECK: %[[VAL_4:.*]] = fir.declare %[[VAL_3]] {fortran_attrs = {{.*}}<pointer>, uniq_name = "_QFglobal_pointerEptrarray"} : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) -> !fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>
+ ! CHECK: %[[VAL_5:.*]] = fir.address_of(@_QQclX51b4dc7bdbe935748434745744d2a699) : !fir.ref<!fir.char<1,58>>
+ ! CHECK: %[[VAL_6:.*]] = fir.convert %[[VAL_5]] : (!fir.ref<!fir.char<1,58>>) -> !fir.ref<i8>
+ ! CHECK: %[[VAL_7:.*]] = fir.call @_FortranAioBeginExternalListOutput(%[[VAL_2]], %[[VAL_6]], %{{.*}}) fastmath<contract> : (i32, !fir.ref<i8>, i32) -> !fir.ref<i8>
+ ! CHECK: %[[VAL_8:.*]] = fir.alloca !fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+ ! CHECK: %[[VAL_9:.*]] = fir.undefined !fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+ ! CHECK: %[[VAL_10:.*]] = fir.address_of(@_QQclX707472617272617900) : !fir.ref<!fir.char<1,9>>
+ ! CHECK: %[[VAL_11:.*]] = fir.convert %[[VAL_10]] : (!fir.ref<!fir.char<1,9>>) -> !fir.ref<i8>
+ ! CHECK: %[[VAL_12:.*]] = fir.insert_value %[[VAL_9]], %[[VAL_11]], [0 : index, 0 : index] : (!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>, !fir.ref<i8>) -> !fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+ ! CHECK: %[[VAL_13:.*]] = fir.convert %[[VAL_3]] : (!fir.ref<!fir.box<!fir.ptr<!fir.array<?xf32>>>>) -> !fir.ref<!fir.box<none>>
+ ! CHECK: %[[VAL_14:.*]] = fir.insert_value %[[VAL_12]], %[[VAL_13]], [0 : index, 1 : index] : (!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>, !fir.ref<!fir.box<none>>) -> !fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>
+ ! CHECK: fir.store %[[VAL_14]] to %[[VAL_8]] : !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>
+ ! CHECK: %[[VAL_15:.*]] = fir.alloca tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+ ! CHECK: %[[VAL_16:.*]] = fir.undefined tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+ ! CHECK: %[[VAL_17:.*]] = fir.address_of(@_QQclX6D7967726F757000) : !fir.ref<!fir.char<1,8>>
+ ! CHECK: %[[VAL_18:.*]] = fir.convert %[[VAL_17]] : (!fir.ref<!fir.char<1,8>>) -> !fir.ref<i8>
+ ! CHECK: %[[VAL_19:.*]] = fir.insert_value %[[VAL_16]], %[[VAL_18]], [0 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, !fir.ref<i8>) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+ ! CHECK: %[[VAL_20:.*]] = fir.insert_value %[[VAL_19]], %[[VAL_0]], [1 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, i64) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+ ! CHECK: %[[VAL_21:.*]] = fir.insert_value %[[VAL_20]], %[[VAL_8]], [2 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+ ! CHECK: %[[VAL_22:.*]] = fir.address_of(@default.nonTbpDefinedIoTable) : !fir.ref<tuple<i64, !fir.ref<!fir.array<0xtuple<!fir.ref<none>, !fir.ref<none>, i32, i1>>>, i1>>
+ ! CHECK: %[[VAL_23:.*]] = fir.convert %[[VAL_22]] : (!fir.ref<tuple<i64, !fir.ref<!fir.array<0xtuple<!fir.ref<none>, !fir.ref<none>, i32, i1>>>, i1>>) -> !fir.ref<none>
+ ! CHECK: %[[VAL_24:.*]] = fir.insert_value %[[VAL_21]], %[[VAL_23]], [3 : index] : (tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>, !fir.ref<none>) -> tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>
+ ! CHECK: fir.store %[[VAL_24]] to %[[VAL_15]] : !fir.ref<tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>>
+ ! CHECK: %[[VAL_25:.*]] = fir.convert %[[VAL_15]] : (!fir.ref<tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>>>>>, !fir.ref<none>>>) -> !fir.ref<tuple<>>
+ ! CHECK: %[[VAL_26:.*]] = fir.call @_FortranAioOutputNamelist(%[[VAL_7]], %[[VAL_25]]) fastmath<contract> : (!fir.ref<i8>, !fir.ref<tuple<>>) -> i1
+ ! CHECK: %[[VAL_27:.*]] = fir.call @_FortranAioEndIoStatement(%[[VAL_7]]) fastmath<contract> : (!fir.ref<i8>) -> i32
namelist/mygroup/ptrarray
write(10, nml=mygroup)
end
@@ -132,11 +156,40 @@ module mmm
subroutine rename_sub
use mmm, bbb => aaa
rrr = 3.
- ! CHECK: %[[V_4:[0-9]+]] = fir.call @_FortranAioBeginExternalListOutput
- ! CHECK: %[[V_5:[0-9]+]] = fir.address_of(@_QMmmmNaaa) : !fir.ref<tuple<!fir.ref<i8>, i64, !fir.ref<!fir.array<1xtuple<!fir.ref<i8>, !fir.ref<!fir.box<none>...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we look at compilation options to inform this decision, either now or in the long run?
In any case, I'm inclined to think it would be best to retain this code, even if it is programmatically deactivated for the time being.
Is this really legal OpenMP? I found this restriction in OpenMP standard section 5.3:
Maybe this does not really apply to REDUCE, but i it could then be an omission from OpenMP given the spirit of 5.3 restriction. Regarding DO CONCURRENT, I have not found a very convincing point about what happens with variables in LOCAL_INIT, LOCAL, or REDUCE that appeared in a namelist. I think it would deserve a standard clarification, if namelist IO is possible inside it (which I do not think is forbidden). I share Val's concern that namelist IO can be used with huge variable lists in old programs in parts that make little sense to parallelize, and that spending time creating the namelist object again and again at each namelist reference could be a significant hit. |
Thanks for taking a look.
I was originally thinking about doing this only when
I agree this probably shouldn't be legal OpenMP, but so far as I can tell, restrictions on privatisation do not apply to reductions. The upstream ticket indicates that this is supported by both ifort and gfortran. But I would be very happy to instead add a semantic check preventing this if sufficient consensus could be reached to deviate from OpenMP and F23. I agree that creating a very large namelist could be a performance hit. Do those programs declare all variables inside of the namelist inside of a common block (i.e. is this an optimization that users expect)?. If some of the variables are local then this optimization could not be performed anyway. |
I lack expertise about old code to judge here. I went ahead and ran an instrumented compiler through some of our regression tests. Out of about 400 tests using namelists, 150 fell into cases where the namelist data structure could be made global (37%). I am not sure however that this is an optimization that user expect given gfortran and ifort do not seem to be doing it, but nvfortran and classic flang are doing it I think. |
This is allowed by the OpenMP and F23 standards. But variables in a namelist are not allowed in OpenMP privatisation. I suspect this was an oversight. If we allow this we run into problems masking the original symbol with the symbol for the reduction variable when the variable is accessed via a namelist initialised as a global variable. See llvm#101907. One solution for this would be to force the namelist to always be initilized inside of the block in which it is used (therefore using the correct mapping for the reduction variable), but this could make some production applications slow. I tentatively think it is probably better to disallow a (perhaps mistaken) edge case of the standards with (I think) little practical use, than to make real applications slow in order to make this work. If reviewers would rather keep to the letter of the standard, see llvm#109303 which implements the alternative solution. I'm open to either path forward. Fixes llvm#101907
This is allowed by the OpenMP and F23 standards. But variables in a namelist are not allowed in OpenMP privatisation. I suspect this was an oversight. If we allow this we run into problems masking the original symbol with the symbol for the reduction variable when the variable is accessed via a namelist initialised as a global variable. See #101907. One solution for this would be to force the namelist to always be initilized inside of the block in which it is used (therefore using the correct mapping for the reduction variable), but this could make some production applications slow. I tentatively think it is probably better to disallow a (perhaps mistaken) edge case of the standards with (I think) little practical use, than to make real applications slow in order to make this work. If reviewers would rather keep to the letter of the standard, see #109303 which implements the alternative solution. I'm open to either path forward. Fixes #101907
Closing in favor of #110671 |
…10671) This is allowed by the OpenMP and F23 standards. But variables in a namelist are not allowed in OpenMP privatisation. I suspect this was an oversight. If we allow this we run into problems masking the original symbol with the symbol for the reduction variable when the variable is accessed via a namelist initialised as a global variable. See llvm#101907. One solution for this would be to force the namelist to always be initilized inside of the block in which it is used (therefore using the correct mapping for the reduction variable), but this could make some production applications slow. I tentatively think it is probably better to disallow a (perhaps mistaken) edge case of the standards with (I think) little practical use, than to make real applications slow in order to make this work. If reviewers would rather keep to the letter of the standard, see llvm#109303 which implements the alternative solution. I'm open to either path forward. Fixes llvm#101907
…10671) This is allowed by the OpenMP and F23 standards. But variables in a namelist are not allowed in OpenMP privatisation. I suspect this was an oversight. If we allow this we run into problems masking the original symbol with the symbol for the reduction variable when the variable is accessed via a namelist initialised as a global variable. See llvm#101907. One solution for this would be to force the namelist to always be initilized inside of the block in which it is used (therefore using the correct mapping for the reduction variable), but this could make some production applications slow. I tentatively think it is probably better to disallow a (perhaps mistaken) edge case of the standards with (I think) little practical use, than to make real applications slow in order to make this work. If reviewers would rather keep to the letter of the standard, see llvm#109303 which implements the alternative solution. I'm open to either path forward. Fixes llvm#101907
…10671) This is allowed by the OpenMP and F23 standards. But variables in a namelist are not allowed in OpenMP privatisation. I suspect this was an oversight. If we allow this we run into problems masking the original symbol with the symbol for the reduction variable when the variable is accessed via a namelist initialised as a global variable. See llvm#101907. One solution for this would be to force the namelist to always be initilized inside of the block in which it is used (therefore using the correct mapping for the reduction variable), but this could make some production applications slow. I tentatively think it is probably better to disallow a (perhaps mistaken) edge case of the standards with (I think) little practical use, than to make real applications slow in order to make this work. If reviewers would rather keep to the letter of the standard, see llvm#109303 which implements the alternative solution. I'm open to either path forward. Fixes llvm#101907
Namelists are allowed inside of OpenMP reduction regions. Those regions map the symbol for the reduction variable to a thread-local reduction variable, but this mapping is escaped if the namelist is generated as a global variable because the global variable will refer the original symbol mapping. See #101907 for a code example.
We can't do this only when inside of an OpenMP region because the IO function might be inside of a subroutine/function which is called from inside the OpenMP region.
There is also no restriction against namelist variables for the
reduce
clause ofdo concurrent
in F23. We currently don't seem to implement initial values for reduction variables (F23 table 11.1), but when we do I expect this would see the same issue as OpenMP.A disadvantage of doing this is that the namelist has to be constructed in place every time. If for some reason this is not hoisted out of hot loops (e.g. it is inside of another subroutine called in the loop body) this might have some performance impact (although I suspect this won't be huge compared to the context switches required for I/O).
Fixes #101907