Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb/DWARF] Respect member layout for types parsed through declarations #110648

Merged
merged 1 commit into from
Oct 2, 2024

Conversation

labath
Copy link
Collaborator

@labath labath commented Oct 1, 2024

LLDB code for using the type layout data from DWARF was not kicking in for types which were initially parsed from a declaration. The problem was in these lines of code:

  if (type)
    layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;

which determine the types layout size by getting the size from the lldb_private::Type object. The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet). The reasons why this hasn't been noticed so far are:

  • this is a relatively new bug. I haven't checked but I suspect it was introduced in the "delay type definition search" patchset from this summer -- if we search for the definition eagerly, we will always have a cached size value.
  • it requires the presence of another bug/issue, as otherwise the automatically computed layout will match the real thing.
  • it reproduces (much) more easily with -flimit-debug-info (though it is possible to trigger it without that flag).

My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

LLDB code for using the type layout data from DWARF was not kicking in
for types which were initially parsed from a declaration. The problem
was in these lines of code:

```
  if (type)
    layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;
```

which determine the types layout size by getting the size from the
lldb_private::Type object. The problem is that if the type object does
not have this information cached, this request can trigger another
(recursive) request to lay out/complete the type. This one, somewhat
surprisingly, succeeds, but does that without the type layout
information (because it hasn't been computed yet). The reasons why this
hasn't been noticed so far are:
- this is a relatively new bug. I haven't checked but I suspect it was
  introduced in the "delay type definition search" patchset from this
  summer -- if we search for the definition eagerly, we will always have
  a cached size value.
- it requires the presence of another bug/issue, as otherwise the
  automatically computed layout will match the real thing.
- it reproduces (much) more easily with -flimit-debug-info (though it is
  possible to trigger it without that flag).

My fix consists of always fetching type size information from DWARF
(which so far existed as a fallback path). I'm not quite sure why this
code was there in the first place (the code goes back to before the
Great LLDB Reformat), but I don't believe it is necessary, as the type
size (for types parsed from definition DIEs) is set exactly from this
attribute (in ParseStructureLikeDIE).
@@ -251,6 +253,7 @@ c1:
.LB1:
.byte 6 # Abbrev [6] DW_TAG_class_type
.asciz "B1" # DW_AT_name
.byte 8 # DW_AT_byte_size
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this was necessary because lldb crashed without it (in llvm record layout code). Removal of the check was hiding this problem by recomputing the size without lldb's input. While this is something that we may to fix, I don't think we're likely to run into this situation (structure definition without a size attribute) outside of hyper-reduced test cases.

Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me, given we end up here only for definition DIEs. (did have couple of questions though)

My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

Was it possibly protecting against cases where we generate structure definitions without a size attribute? (as you allude to in your test comment)? Reporting back a ExternalLayout::Size of 0 back to the RecordLayoutBuilder seems problematic. If Alignment == 0, the RecordLayoutBuilder knows to infer it. But we don't have that for Size, so it just takes it at face-value. I don't know if that's something we should account for. Probably not?

Side-note, should we also check DW_AT_bit_size here? The DWARFv5 spec says:

A structure type, union type or class type entry may have either a
DW_AT_byte_size or a DW_AT_bit_size attribute (see Section 2.21 on page 56),
whose value is the amount of storage needed to hold an instance of the structure,
union or class type, including any padding.

The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet)

That sounds worrying. Wonder if we do this anywhere else around the DWARF AST parser. And is there any safeguarding against these recursive calls?

@labath
Copy link
Collaborator Author

labath commented Oct 1, 2024

Seems reasonable to me, given we end up here only for definition DIEs. (did have couple of questions though)

My fix consists of always fetching type size information from DWARF (which so far existed as a fallback path). I'm not quite sure why this code was there in the first place (the code goes back to before the Great LLDB Reformat), but I don't believe it is necessary, as the type size (for types parsed from definition DIEs) is set exactly from this attribute (in ParseStructureLikeDIE).

Was it possibly protecting against cases where we generate structure definitions without a size attribute? (as you allude to in your test comment)?

I can't disprove that, but I think it would be a very roundabout way of achieving that. To support the scenario of missing DW_AT_byte_size, I'd probably do something like set the size field to max(offsetof(Struct, field)+sizeof(Struct::field) for field in Struct), or change the record layout builder to infer the size as well. If i had to guess, I'd say this was an attempt to avoid parsing the attributes of the record DIE (again) in the completion step. In a world where where the size was already parsed in the parsing step, I can see how accessing the parsed value could be considered faster (though probably not measurably) than going to DWARF.

Reporting back a ExternalLayout::Size of 0 back to the RecordLayoutBuilder seems problematic. If Alignment == 0, the RecordLayoutBuilder knows to infer it. But we don't have that for Size, so it just takes it at face-value. I don't know if that's something we should account for. Probably not?

In principle I would say that we should, but I think this problem is in the same bucket as compiler putting a nonsensical (small) value in the DW_AT_byte_size field. In theory that should not crash lldb, but we're not likely to run into an actual compiler that does that.

Side-note, should we also check DW_AT_bit_size here? The DWARFv5 spec says:

A structure type, union type or class type entry may have either a
DW_AT_byte_size or a DW_AT_bit_size attribute (see Section 2.21 on page 56),
whose value is the amount of storage needed to hold an instance of the structure,
union or class type, including any padding.

We could do that, but then we'd also need to do that in ParseStructureLikeDIE. And we may need to figure out what does it mean to occupy a non-whole number of bytes, so I'd say that's orthogonal to this.

The problem is that if the type object does not have this information cached, this request can trigger another (recursive) request to lay out/complete the type. This one, somewhat surprisingly, succeeds, but does that without the type layout information (because it hasn't been computed yet)

That sounds worrying. Wonder if we do this anywhere else around the DWARF AST parser. And is there any safeguarding against these recursive calls?

We have a bunch of calls like that, but I don't think they're a problem, as they would just trigger a regular type completion, compute the dwarf-guided layout and then get the size from that. The problem is really specific to this place in the code, where we've already completed the clang type, but we haven't created the layout for it.

@Michael137
Copy link
Member

Yea I agree, I think this makes things easier to reason about. And if the "no-DW_AT_byte_size on definition" case comes up, that can be addressed differently.

We could do that, but then we'd also need to do that in ParseStructureLikeDIE. And we may need to figure out what does it mean to occupy a non-whole number of bytes, so I'd say that's orthogonal to this.

Yup, since it wasn't there before, happy to defer it.

The problem is really specific to this place in the code, where we've already completed the clang type, but we haven't created the layout for it.

Ack

@labath labath merged commit 15f9020 into llvm:main Oct 2, 2024
8 checks passed
@labath labath deleted the layout branch October 2, 2024 09:14
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Oct 2, 2024
…ons (llvm#110648)

LLDB code for using the type layout data from DWARF was not kicking in
for types which were initially parsed from a declaration. The problem
was in these lines of code:

```
  if (type)
    layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;
```

which determine the types layout size by getting the size from the
lldb_private::Type object. The problem is that if the type object does
not have this information cached, this request can trigger another
(recursive) request to lay out/complete the type. This one, somewhat
surprisingly, succeeds, but does that without the type layout
information (because it hasn't been computed yet). The reasons why this
hasn't been noticed so far are:
- this is a relatively new bug. I haven't checked but I suspect it was
introduced in the "delay type definition search" patchset from this
summer -- if we search for the definition eagerly, we will always have a
cached size value.
- it requires the presence of another bug/issue, as otherwise the
automatically computed layout will match the real thing.
- it reproduces (much) more easily with -flimit-debug-info (though it is
possible to trigger it without that flag).

My fix consists of always fetching type size information from DWARF
(which so far existed as a fallback path). I'm not quite sure why this
code was there in the first place (the code goes back to before the
Great LLDB Reformat), but I don't believe it is necessary, as the type
size (for types parsed from definition DIEs) is set exactly from this
attribute (in ParseStructureLikeDIE).
VitaNuo pushed a commit to VitaNuo/llvm-project that referenced this pull request Oct 2, 2024
…ons (llvm#110648)

LLDB code for using the type layout data from DWARF was not kicking in
for types which were initially parsed from a declaration. The problem
was in these lines of code:

```
  if (type)
    layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;
```

which determine the types layout size by getting the size from the
lldb_private::Type object. The problem is that if the type object does
not have this information cached, this request can trigger another
(recursive) request to lay out/complete the type. This one, somewhat
surprisingly, succeeds, but does that without the type layout
information (because it hasn't been computed yet). The reasons why this
hasn't been noticed so far are:
- this is a relatively new bug. I haven't checked but I suspect it was
introduced in the "delay type definition search" patchset from this
summer -- if we search for the definition eagerly, we will always have a
cached size value.
- it requires the presence of another bug/issue, as otherwise the
automatically computed layout will match the real thing.
- it reproduces (much) more easily with -flimit-debug-info (though it is
possible to trigger it without that flag).

My fix consists of always fetching type size information from DWARF
(which so far existed as a fallback path). I'm not quite sure why this
code was there in the first place (the code goes back to before the
Great LLDB Reformat), but I don't believe it is necessary, as the type
size (for types parsed from definition DIEs) is set exactly from this
attribute (in ParseStructureLikeDIE).
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Oct 3, 2024
…ons (llvm#110648)

LLDB code for using the type layout data from DWARF was not kicking in
for types which were initially parsed from a declaration. The problem
was in these lines of code:

```
  if (type)
    layout_info.bit_size = type->GetByteSize(nullptr).value_or(0) * 8;
```

which determine the types layout size by getting the size from the
lldb_private::Type object. The problem is that if the type object does
not have this information cached, this request can trigger another
(recursive) request to lay out/complete the type. This one, somewhat
surprisingly, succeeds, but does that without the type layout
information (because it hasn't been computed yet). The reasons why this
hasn't been noticed so far are:
- this is a relatively new bug. I haven't checked but I suspect it was
introduced in the "delay type definition search" patchset from this
summer -- if we search for the definition eagerly, we will always have a
cached size value.
- it requires the presence of another bug/issue, as otherwise the
automatically computed layout will match the real thing.
- it reproduces (much) more easily with -flimit-debug-info (though it is
possible to trigger it without that flag).

My fix consists of always fetching type size information from DWARF
(which so far existed as a fallback path). I'm not quite sure why this
code was there in the first place (the code goes back to before the
Great LLDB Reformat), but I don't believe it is necessary, as the type
size (for types parsed from definition DIEs) is set exactly from this
attribute (in ParseStructureLikeDIE).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants