[Java] Fix complaints from ErrorProne static analysis #3380
@parrt PTAL |
@@ -61,7 +61,7 @@ public ST getMessageTemplate(ANTLRMessage msg) {
}
if (msg.fileName != null) {
String displayFileName = msg.fileName;
if (format.equals("antlr")) { |
wow. weird that we never ran into this. seems like this would never be true.
thanks guys... |
I just pushed a commit to mimic the 2^31 fix for JavaScript; I added a comment that the Java class file format might not require the same limit for other targets. |
Hey, there is no such limit in JavaScript, it's plain text! I put 2^31 as max positive int = 2 GB |
But |
This code is in the tool, in Java |
The big lexer test shows that this value is not causing any problem. |
Yes, my mistake. But in Java it's also
It does not mean the value is correct and shouldn't be fixed. |
I don't quite remember how I ended up writing this misleading code, but what's certain is that the result is desired i.e. an array of readable strings, as opposed to a veeeeeryyyyy long string that no IDE is able to display nicely. |
Yes, it's possible in any runtime. It also depends on the size of the generated code.
In this case, I don't understand such a big difference between JavaScript (32) and other runtimes (65535 / 3). It looks weird. |
Should I just make it |
I would just return 32 or 64. |
Do you mean 2 to the power 32 or 64? It's the len of the string not num bits right? |
yes: return 32; not return Math.pow(2, 32) |
sorry i am not following. why would we limit strings to 32 char? |
Because it's much more convenient when opening the generated file. Most words require 6 chars. The above is much easier to load by editors than if we don't limit: |
In this case, I suggest using the most widespread values of recommended line length in text editors: 80 or 120. Also, it's still unclear why the value for JavaScript differs from other runtimes, they also can be changed. |
okay, I spent a few minutes and looked at the source code again. I had completely forgotten what this does. It is a segment not the entire maximum length so it's fine to do whatever the target developer wants for a particular target I guess. @ericvergnaud I think that Java is a special case here because it has such fundamental limits on strings in the class file. Should we go for consistency across all targets other than Java and then make Java a special case? there's definitely something about the maximum length of a single string in a class file, which I guess is why we had to break them up... it also might slow things down if we had, say, 2000 strings in the class file versus one big one. The initialization of strings pulled from the class file is weird if I remember. |
ANTLR generator returns concatenation for serialized ATN for Java. Java compiler folds constant string concatenation to one single string for efficiency. So, class file contains one big string instead of concatenation. Thus, it does not make sense to take care of string length in Java source files. Also, string concatenation could be replaced with StringBuilder. But actually, I don't think it's a good idea. I don't think there is a limit in class file since we haven't encountered such a problem before. Maybe it's a limit on string in Java compiler itself, but not in class file. |
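A minimal sketch of the constant folding mentioned above, relying only on standard Java behavior: a concatenation of compile-time constants is folded by the compiler into one interned string, while a concatenation with a non-constant operand creates a new object at run time. The class and method names are hypothetical, and the sketch says nothing about what the ANTLR-generated code actually contains.

```java
public class ConstantFoldingDemo {
    // Both operands are compile-time constants, so the compiler folds the
    // concatenation into the single constant "serializedATN".
    static final String FOLDED = "serialized" + "ATN";

    // Reference comparison is intentional here: it checks that the folded
    // constant is the same interned object as the equivalent literal.
    static boolean isFolded() {
        return FOLDED == "serializedATN";
    }

    // With a non-constant operand the concatenation happens at run time
    // and yields a newly created String, so the references differ.
    static boolean isRuntimeConcatSameObject(String prefix) {
        String joined = prefix + "ATN";
        return joined == "serializedATN";
    }

    public static void main(String[] args) {
        System.out.println("folded at compile time: " + isFolded());
        System.out.println("runtime concat is same object: "
                + isRuntimeConcatSameObject(new String("serialized")));
    }
}
```

If the folded constant ends up as one class-file entry, any per-string limit would apply to the folded result rather than to the individual pieces in the source.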
I remember the class file format limiting the length of any single string and I doubt that has changed for backward compatibility reasons. Maybe take a look at this to see if I'm still correct? https://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java https://stackoverflow.com/questions/45275732/increase-string-literal-length-limit |
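The second link doesn't relate to Java because it's about the C language. The first link is also unlikely to be related to string size in the class file; it's about bytecode:
It shows that the problem does not relate to string literals, but to array initializers.
Also, there are other quotes from the answer:
So, a pure ASCII string literal can have up to 65535 characters, while a string consisting of characters in the range U+0800 ...U+FFFF have only one third of these. And the ones encoded as surrogate pairs in UTF-8 (i.e. U+10000 to U+10FFFF) take up 6 bytes each.
The Java Language Specification does not mention any limit for string literals:
So, the current value of 65535 / 3 looks correct for Java (ideally it should be checked). But if it's a limit in class file I'm not sure this limit affects result generation because of compiler string folding.
I've found another answer: https://stackoverflow.com/a/18361275/1046374:
Note that a String object can have up to 2^31 - 1 characters. The 2^16 -1 limit is for String literals; e.g. String constants that are embedded in the source code of a Java program.
Maybe the generated code was never big enough, which is why we haven't encountered such a problem. |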
4.4.7. The CONSTANT_Utf8_info Structure
The CONSTANT_Utf8_info structure is used to represent constant string values:
CONSTANT_Utf8_info {
u1 tag;
u2 length;
u1 bytes[length];
}
The items of the CONSTANT_Utf8_info structure are the following:
tag
The tag item of the CONSTANT_Utf8_info structure has the value CONSTANT_Utf8 (1).
length
The value of the length item gives the number of bytes in the bytes array (not the length of the resulting string). The strings in the CONSTANT_Utf8_info structure are not null-terminated.
bytes[]
The bytes array contains the bytes of the string. No byte may have the value (byte)0 or lie in the range (byte)0xf0 - (byte)0xff.
… On 4 Dec 2021, at 22:27, Ivan Kochurkin ***@***.***> wrote:
The second link doesn't relate to Java because it's about the C language. The first link is also unlikely related to string size in class file, it's about bytecode:
It shows that the problem does not relate to string literals, but to array initializers.
Also, there are other quotes from the answer:
So, a pure ASCII string literal can have up to 65535 characters, while a string consisting of characters in the range U+0800 ...U+FFFF have only one third of these. And the ones encoded as surrogate pairs in UTF-8 (i.e. U+10000 to U+10FFFF) take up 6 bytes each.
The Java Language Specification does not mention any limit for string literals:
So, the current value of 65535 / 3 looks correct for Java (ideally it should be checked). But if it's a limit in class file I'm not sure this limit affects result generation because of compiler string folding.
I've found another answer: https://stackoverflow.com/a/18361275/1046374:
Note that a String object can have up to 2^31 - 1 characters. The 2^16 -1 limit is for String literals; e.g. String constants that are embedded in the source code of a Java program.
|
That converts into max string size = 65535 / 3
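The arithmetic behind 65535 / 3 can be checked with a rough sketch. It assumes that `DataOutputStream.writeUTF` is a fair stand-in for the class-file constraint, since it also writes modified UTF-8 behind a 2-byte length and throws `UTFDataFormatException` when the encoded form exceeds 65535 bytes. The class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class Utf8LimitDemo {
    // Largest value a 2-byte unsigned length field (u2) can hold.
    static final int MAX_UTF8_BYTES = 65535;

    // Worst case for a single char (U+0800..U+FFFF) is 3 bytes,
    // so this many chars are always safe: 65535 / 3 = 21845.
    static final int SAFE_CHARS = MAX_UTF8_BYTES / 3;

    // Returns true if s fits into a modified UTF-8 record with a u2 length.
    static boolean fits(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded form is longer than 65535 bytes
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("ASCII, 65535 chars fits: " + fits(repeat('a', 65535)));
        System.out.println("ASCII, 65536 chars fits: " + fits(repeat('a', 65536)));
        System.out.println("U+0800, " + SAFE_CHARS + " chars fits: " + fits(repeat('\u0800', SAFE_CHARS)));
        System.out.println("U+0800, " + (SAFE_CHARS + 1) + " chars fits: " + fits(repeat('\u0800', SAFE_CHARS + 1)));
    }
}
```

Dividing by 3 is the conservative choice: it holds regardless of which characters in the Basic Multilingual Plane the segment contains, and a surrogate pair takes two chars for 6 bytes, which is the same 3 bytes per char.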
|
Heh, I think we left this discussion incomplete. we need to update the JavaScript per @ericvergnaud right? just trying to get my to do list together. |
I think a small string size is highly desirable because the ability to read the code in an IDE is worth the microseconds spent joining those strings once at startup. |
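A rough sketch of the trade-off described above: a long string is cut into short segments for readability and joined back once at startup. The names `split`, `join`, and `segmentLimit` are hypothetical and are not taken from the ANTLR code base; the limit of 80 is only the value floated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentDemo {
    // Cuts s into consecutive pieces of at most segmentLimit chars.
    // Note: this naive cut may separate a surrogate pair across two segments;
    // joining restores the original string, so the round trip is still lossless.
    static List<String> split(String s, int segmentLimit) {
        if (segmentLimit <= 0) {
            throw new IllegalArgumentException("segmentLimit must be positive");
        }
        List<String> segments = new ArrayList<>();
        for (int start = 0; start < s.length(); start += segmentLimit) {
            int end = Math.min(s.length(), start + segmentLimit);
            segments.add(s.substring(start, end));
        }
        return segments;
    }

    // Joins the segments back into one string; done once, e.g. at class load.
    static String join(List<String> segments) {
        return String.join("", segments);
    }

    public static void main(String[] args) {
        String data = "0123456789".repeat(20); // 200 chars of sample data
        List<String> segments = split(data, 80);
        System.out.println("segments: " + segments.size());
        System.out.println("round trip ok: " + join(segments).equals(data));
    }
}
```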
For python3 it says:
Should all but java be about 80 char? |
Please check out #3438 |
This pull request addresses issues identified by https://github.com/google/error-prone.