[lldb/DWARF] Respect member layout for types parsed through declarations #110648

labath · 2024-10-01T10:56:52Z

LLDB code for using the type layout data from DWARF was not kicking in for types which were initially parsed from a declaration. The problem was in these lines of code:

  if (type)
    layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;

which determine the types layout size by getting the size from the lldb_private::Type object. The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet). The reasons why this hasn't been noticed so far are:

this is a relatively new bug. I haven't checked but I suspect it was introduced in the "delay type definition search" patchset from this summer -- if we search for the definition eagerly, we will always have a cached size value.
it requires the presence of another bug/issue, as otherwise the automatically computed layout will match the real thing.
it reproduces (much) more easily with -flimit-debug-info (though it is possible to trigger it without that flag).

My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

LLDB code for using the type layout data from DWARF was not kicking in for types which were initially parsed from a declaration. The problem was in these lines of code: ``` if (type) layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8; ``` which determine the types layout size by getting the size from the lldb_private::Type object. The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet). The reasons why this hasn't been noticed so far are: - this is a relatively new bug. I haven't checked but I suspect it was introduced in the "delay type definition search" patchset from this summer -- if we search for the definition eagerly, we will always have a cached size value. - it requires the presence of another bug/issue, as otherwise the automatically computed layout will match the real thing. - it reproduces (much) more easily with -flimit-debug-info (though it is possible to trigger it without that flag). My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

labath · 2024-10-01T11:00:39Z

lldb/test/Shell/SymbolFile/DWARF/x86/DW_AT_declaration-with-children.s

@@ -251,6 +253,7 @@ c1:
 .LB1:
        .byte   6                               # Abbrev [6] DW_TAG_class_type
        .asciz  "B1"                            # DW_AT_name
+        .byte   8                               # DW_AT_byte_size


Adding this was necessary because lldb crashed without it (in llvm record layout code). Removal of the check was hiding this problem by recomputing the size without lldb's input. While this is something that we may to fix, I don't think we're likely to run into this situation (structure definition without a size attribute) outside of hyper-reduced test cases.

Michael137

Seems reasonable to me, given we end up here only for definition DIEs. (did have couple of questions though)

My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

Was it possibly protecting against cases where we generate structure definitions without a size attribute? (as you allude to in your test comment)? Reporting back a ExternalLayout::Size of 0 back to the RecordLayoutBuilder seems problematic. If Alignment == 0, the RecordLayoutBuilder knows to infer it. But we don't have that for Size, so it just takes it at face-value. I don't know if that's something we should account for. Probably not?

Side-note, should we also check DW_AT_bit_size here? The DWARFv5 spec says:

A structure type, union type or class type entry may have either a
DW_AT_byte_size or a DW_AT_bit_size attribute (see Section 2.21 on page 56),
whose value is the amount of storage needed to hold an instance of the structure,
union or class type, including any padding.

The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet)

That sounds worrying. Wonder if we do this anywhere else around the DWARF AST parser. And is there any safeguarding against these recursive calls?

labath · 2024-10-01T13:07:40Z

Seems reasonable to me, given we end up here only for definition DIEs. (did have couple of questions though)

My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

Was it possibly protecting against cases where we generate structure definitions without a size attribute? (as you allude to in your test comment)?

I can't disprove that, but I think it would be a very roundabout way of achieving that. To support the scenario of missing DW_AT_byte_size, I'd probably do something like set the size field to max(offsetof(Struct, field)+sizeof(Struct::field) for field in Struct), or change the record layout builder to infer the size as well. If i had to guess, I'd say this was an attempt to avoid parsing the attributes of the record DIE (again) in the completion step. In a world where where the size was already parsed in the parsing step, I can see how accessing the parsed value could be considered faster (though probably not measurably) than going to DWARF.

Reporting back a ExternalLayout::Size of 0 back to the RecordLayoutBuilder seems problematic. If Alignment == 0, the RecordLayoutBuilder knows to infer it. But we don't have that for Size, so it just takes it at face-value. I don't know if that's something we should account for. Probably not?

In principle I would say that we should, but I think this problem is in the same bucket as compiler putting a nonsensical (small) value in the DW_AT_byte_size field. In theory that should not crash lldb, but we're not likely to run into an actual compiler that does that.

Side-note, should we also check DW_AT_bit_size here? The DWARFv5 spec says:

A structure type, union type or class type entry may have either a
DW_AT_byte_size or a DW_AT_bit_size attribute (see Section 2.21 on page 56),
whose value is the amount of storage needed to hold an instance of the structure,
union or class type, including any padding.

We could do that, but then we'd also need to do that in ParseStructureLikeDIE. And we may need to figure out what does it mean to occupy a non-whole number of bytes, so I'd say that's orthogonal to this.

The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet)

That sounds worrying. Wonder if we do this anywhere else around the DWARF AST parser. And is there any safeguarding against these recursive calls?

We have a bunch of calls like that, but I don't think they're a problem, as they would just trigger a regular type completion, compute the dwarf-guided layout and then get the size from that. The problem is really specific to this place in the code, where we've already completed the clang type, but we haven't created the layout for it.

Michael137 · 2024-10-01T13:21:20Z

Yea I agree, I think this makes things easier to reason about. And if the "no-DW_AT_byte_size on definition" case comes up, that can be addressed differently.

We could do that, but then we'd also need to do that in ParseStructureLikeDIE. And we may need to figure out what does it mean to occupy a non-whole number of bytes, so I'd say that's orthogonal to this.

Yup, since it wasn't there before, happy to defer it.

The problem is really specific to this place in the code, where we've already completed the clang type, but we haven't created the layout for it.

Ack

…ons (llvm#110648) LLDB code for using the type layout data from DWARF was not kicking in for types which were initially parsed from a declaration. The problem was in these lines of code: ``` if (type) layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8; ``` which determine the types layout size by getting the size from the lldb_private::Type object. The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet). The reasons why this hasn't been noticed so far are: - this is a relatively new bug. I haven't checked but I suspect it was introduced in the "delay type definition search" patchset from this summer -- if we search for the definition eagerly, we will always have a cached size value. - it requires the presence of another bug/issue, as otherwise the automatically computed layout will match the real thing. - it reproduces (much) more easily with -flimit-debug-info (though it is possible to trigger it without that flag). My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

labath requested review from clayborg and Michael137 October 1, 2024 10:56

labath requested a review from JDevlieghere as a code owner October 1, 2024 10:56

labath commented Oct 1, 2024

View reviewed changes

labath mentioned this pull request Oct 1, 2024

[libc++] Replace __compressed_pair with [[no_unique_address]] #76756

Merged

Michael137 approved these changes Oct 1, 2024

View reviewed changes

labath merged commit 15f9020 into llvm:main Oct 2, 2024
8 checks passed

labath deleted the layout branch October 2, 2024 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[lldb/DWARF] Respect member layout for types parsed through declarations #110648

[lldb/DWARF] Respect member layout for types parsed through declarations #110648

labath commented Oct 1, 2024

labath Oct 1, 2024

Michael137 left a comment

labath commented Oct 1, 2024

Michael137 commented Oct 1, 2024

[lldb/DWARF] Respect member layout for types parsed through declarations #110648

[lldb/DWARF] Respect member layout for types parsed through declarations #110648

Conversation

labath commented Oct 1, 2024

labath Oct 1, 2024

Choose a reason for hiding this comment

Michael137 left a comment

Choose a reason for hiding this comment

labath commented Oct 1, 2024

Michael137 commented Oct 1, 2024