charset GB18030 don't show all GB2312 char #264

Closed
dbitc opened this issue Nov 16, 2021 · 16 comments

Comments

@dbitc

dbitc commented Nov 16, 2021

The GB18030 charset doesn't display all GB2312 characters. Please add a GB2312 charset.

error log:
mytest(support)#factory
mytest(support)#
请整机重启,在主控(备主控)ctrl下删除配置文件 /config.text .
是否 绦こР馐 [N/y]?

@kingToolbox
Owner

Thank you, but I think GB18030 is backward compatible with GB2312.

Can you tell me which characters are garbled? I can check the specific encoding of these characters.

@dbitc
Author

dbitc commented Nov 16, 2021

fyi.
请整机重启,在主控(备主控)ctrl下?除配置文件 /config.text .
是否要继续工厂测试 [N/y]?

@kingToolbox
Owner

Thank you, I will investigate this issue and any progress will be updated here.

@bestv5

bestv5 commented Nov 17, 2021

> Thank you, I will investigate this issue and any progress will be updated here.

Please add GBK, GB2312 and GB18030 support. Right now the session config only lets you choose GB18030.

Encoding standards: GB2312, GBK, GB18030

@kingToolbox
Owner

Thank you @bestv5 for more detailed information. I have analyzed this problem. The reason is that one byte of the log data is missing, which causes the subsequent text to be decoded incorrectly. Let us compare the garbled text and the original text in detail.

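Garbled Text:

| Encoding | 是 | 否 | (0020) | (E01B) | 绦 | (E24C) | こ | Р | 馐 |
|---|---|---|---|---|---|---|---|---|---|
| GB2312 | CAC7 | B7F1 | 20 | invalid | CCD0 | invalid | A4B3 | A7B2 | E2CA |
| GBK | CAC7 | B7F1 | 20 | invalid | CCD0 | invalid | A4B3 | A7B2 | E2CA |
| GB18030 | CAC7 | B7F1 | 20 | AABC | CCD0 | F8B9 | A4B3 | A7B2 | E2CA |
| UTF-16BE | 662F | 5426 | 0020 | E01B | 7EE6 | E24C | 3053 | 0420 | 9990 |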

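Original Text:

| Encoding | 是 | 否 | 要 | 继 | 续 | 工 | 厂 | 测 | 试 |
|---|---|---|---|---|---|---|---|---|---|
| GB2312 | CAC7 | B7F1 | D2AA | BCCC | D0F8 | B9A4 | B3A7 | B2E2 | CAD4 |
| GBK | CAC7 | B7F1 | D2AA | BCCC | D0F8 | B9A4 | B3A7 | B2E2 | CAD4 |
| GB18030 | CAC7 | B7F1 | D2AA | BCCC | D0F8 | B9A4 | B3A7 | B2E2 | CAD4 |
| UTF-16BE | 662F | 5426 | 8981 | 7EE7 | 7EED | 5DE5 | 5382 | 6D4B | 8BD5 |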
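The rows for the original text can be regenerated with a few lines of Python. This is only a sketch for cross-checking the table: the helper name `hex_units` is made up for illustration, and the codec names are the ones Python's standard library uses, not anything taken from WindTerm.

```python
# Sketch: regenerate the "Original Text" rows from the intended string.
text = "是否要继续工厂测试"

def hex_units(data: bytes, width: int = 2) -> str:
    """Group bytes into space-separated hex units of `width` bytes each."""
    return " ".join(
        data[i:i + width].hex().upper() for i in range(0, len(data), width)
    )

# Python codec names (an assumption of this sketch, not WindTerm's names).
codecs_to_check = [
    ("GB2312", "gb2312"),
    ("GBK", "gbk"),
    ("GB18030", "gb18030"),
    ("UTF-16BE", "utf-16-be"),
]

rows = {name: hex_units(text.encode(codec)) for name, codec in codecs_to_check}

for name, row in rows.items():
    print(f"{name:9}{row}")
```

For this particular string the three GB encodings should produce identical bytes, which is what the table shows.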

If we join and align the above GB18030 bytes, we will get:

| Text | GB18030 Bytes |
|---|---|
| Garbled Text | CA C7 B7 F1 20 AA BC CC D0 F8 B9 A4 B3 A7 B2 E2 CA |
| Original Text | CA C7 B7 F1 D2 AA BC CC D0 F8 B9 A4 B3 A7 B2 E2 CA D4 |

Obviously, D2 was incorrectly stored as 20, which shifted the byte pairing and caused the rest of the text to be decoded incorrectly. Therefore, the following questions need to be confirmed:

  • Is the text displayed on the screen correct or garbled? I guess it is correct.
  • Are the bytes stored in the log file correct or garbled? This needs to be viewed with a hex editor, or you can send the file to me to check. We just need to confirm whether the byte is D2 or 20.
  • If the bytes in the log are correct, is the text editor making a decoding error when it opens the log? This can be verified by trying another text editor.
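If no hex editor is at hand, the D2-or-20 check can also be done with a short script. The following is a minimal sketch, assuming the log stores the raw GB18030 bytes; the function name `byte_after` and the file name in the comment are hypothetical.

```python
def byte_after(data: bytes, marker: bytes):
    """Return the byte following the first occurrence of `marker`, or None."""
    pos = data.find(marker)
    if pos == -1 or pos + len(marker) >= len(data):
        return None
    return data[pos + len(marker)]

# 是否 is CA C7 B7 F1 in GB18030; the next byte should be D2 (first half of 要).
MARKER = bytes.fromhex("CAC7B7F1")

# Two in-memory samples standing in for a correct and a damaged log.
good = bytes.fromhex("CAC7B7F1D2AABCCC")
bad = bytes.fromhex("CAC7B7F120AABCCC")

print(f"{byte_after(good, MARKER):02X}")  # byte found in the correct sample
print(f"{byte_after(bad, MARKER):02X}")   # byte found in the damaged sample

# With a real file (hypothetical path), read it in binary mode:
# data = open("session.log", "rb").read()
# print(byte_after(data, MARKER))
```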

In addition, it can be concluded from the tables above that GB18030 is fully backward compatible with GB2312 and GBK. The encodings still differ when illegal characters appear, so I will add GB2312 and GBK to the new WindTerm_2.2.0 version, but this is probably not the cause of this issue. For most use cases, GB18030 is sufficient and the best choice.

@dbitc
Author

dbitc commented Nov 17, 2021

error show mode
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#fa
mytest(support)#factory
ytest(support)#

请整机重启,在主控(备主控)ctrl下删除配置文件 /config.text .
是否 绦こР馐 [N/y]?

error hex mode:
WXWorkCapture_16371541958798

@dbitc
Author

dbitc commented Nov 17, 2021

Correct ASCII mode:
mytest(support)#factory
mytest(support)#

请整机重启,在主控(备主控)ctrl下?除配置文件 /config.text .
是否要继续工厂测试 [N/y]?

ASCII one byte
GB2312 one or two bytes
GBK one or two bytes
GB18030 one or two or four bytes // include GB18030 2000 and GB18030 2005
eg:
image
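The byte lengths listed above can be checked quickly in Python. This is an illustrative sketch only: the sample characters and the helper `encodable` are my own choices, and `U+20000` is used simply as an example of a character outside the Basic Multilingual Plane.

```python
# Sketch: encoded length of sample characters under each charset.
samples = [
    ("A", "ascii"),
    ("请", "gb2312"),
    ("请", "gbk"),
    ("请", "gb18030"),
    ("\U00020000", "gb18030"),  # a character outside the BMP
]

lengths = [(ch, codec, len(ch.encode(codec))) for ch, codec in samples]

def encodable(ch: str, codec: str) -> bool:
    """Return True if `ch` can be represented in `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for ch, codec, n in lengths:
    print(f"U+{ord(ch):04X} in {codec}: {n} byte(s)")

# GBK has no four-byte form, so this character cannot be encoded in it.
print(encodable("\U00020000", "gbk"))
```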

@dbitc
Author

dbitc commented Nov 17, 2021

"请" is stored as E8 AF B7.
Why is one Chinese character encoded with 3 bytes?

@kingToolbox
Owner

Each character is converted to UTF-8 when it is displayed, to make searching, coloring, folding, etc. of the text easier. This is why 请 is encoded as E8 AF B7. Otherwise, every feature would need to handle different encodings, which would increase complexity and reduce performance.
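The conversion described here can be illustrated in Python. This is a sketch of the general idea (decode the session charset once, keep UTF-8 internally), not WindTerm's actual code; the variable names are invented for the example.

```python
# Sketch: bytes arrive in the session charset and are re-encoded as UTF-8.
raw = "请".encode("gb18030")     # stand-in for the bytes sent by the server
text = raw.decode("gb18030")     # decode using the session charset
utf8 = text.encode("utf-8")      # internal representation for display/search

def spaced_hex(data: bytes) -> str:
    """Format bytes as upper-case hex pairs separated by spaces."""
    return " ".join(f"{b:02X}" for b in data)

print(len(raw), "byte(s) in GB18030")
print(spaced_hex(utf8), "in UTF-8")
```

UTF-8 uses three bytes for every code point from U+0800 to U+FFFF, which covers 请 (U+8BF7).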

If your screenshot shows the text displayed in WindTerm, then I guess something went wrong when receiving and parsing the text. If possible, can you collect the debug log for me? The steps are as follows:

  • Open your session.
  • Use the second button on the toolbar to disconnect the session.
  • Check the menu item Menubar - Mode (7) - Debug Mode.
  • Use the third button on the toolbar to reconnect the session.
  • Execute your command until 是否 绦こР馐 [N/y]? is displayed, then uncheck the menu item Menubar - Mode (7) - Debug Mode.
  • Click the menu item Right click menu - Log - Open Log Folder.
  • Open the folder debug and attach your log file to your comment here. You can zip it before uploading.

The log content is in plain text and will not contain any private information. You can use any editor to view it. Thank you.

@dbitc
Author

dbitc commented Nov 17, 2021

2021-11-17_23.37.48.zip
fyi.

@dbitc
Author

dbitc commented Nov 17, 2021

// 是 否 要 继 续 工 厂 测 试
expected  : E698AF E590A6 E8A681 E7BBA7 E7BBAD E5B7A5 E58E82 E6B58B E8AF95
error log : E698AF E590A6 20 EE809B E7BBA6 EE898C E38193 D0A0 E9A690 20 20
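To relate these UTF-8 bytes to the UTF-16BE code points in the earlier tables, both sequences can be decoded and listed as code points. This is a small sketch; `code_points` is a hypothetical helper written for this comparison.

```python
# Sketch: turn the UTF-8 hex dumps from the debug log into code points.
expected = bytes.fromhex(
    "E698AF E590A6 E8A681 E7BBA7 E7BBAD E5B7A5 E58E82 E6B58B E8AF95"
)
logged = bytes.fromhex(
    "E698AF E590A6 20 EE809B E7BBA6 EE898C E38193 D0A0 E9A690 20 20"
)

def code_points(data: bytes) -> list:
    """Decode UTF-8 bytes and return each character's code point as hex."""
    return [f"{ord(ch):04X}" for ch in data.decode("utf-8")]

print(" ".join(code_points(expected)))
print(" ".join(code_points(logged)))
```

The second line should contain E01B and E24C, the same private-use code points that appear in the garbled-text table.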

@kingToolbox
Owner

Thank you very much for providing the log, which is of great help in solving the problem. I have analyzed the log and found the cause of the problem.

When the server sends .\r\n是否要继续工厂 (2E 0D 0A CAC7 B7F1 D2AA BCCC D0F8 B9A4 B3A7), it splits the data into 8-byte packets. The first is 2E 0D 0A CAC7 B7F1 D2 and the second is AA BCCC D0F8 B9A4 B3. Obviously 要 (D2AA) is divided into two parts, D2 and AA. WindTerm had already taken this case into account, but I don't know why it was not handled correctly here. After the D2 was received, it was discarded immediately because it was incomplete. As a result, the subsequent text became garbled.
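The effect of a multi-byte character being split across packets can be reproduced with Python's standard codecs. This is a sketch of the general technique (a stateful incremental decoder that buffers an incomplete trailing byte), not WindTerm's implementation. The third packet containing A7 is an assumption added so that 厂 is complete.

```python
import codecs

packets = [
    bytes.fromhex("2E 0D 0A CAC7 B7F1 D2"),  # ends with the first half of 要
    bytes.fromhex("AA BCCC D0F8 B9A4 B3"),   # starts with the second half
    bytes.fromhex("A7"),                     # assumed later packet: rest of 厂
]

# Naive approach: decode each packet on its own and drop undecodable bytes.
naive = "".join(p.decode("gb18030", errors="ignore") for p in packets)

# Stateful approach: the decoder keeps an incomplete trailing byte and
# combines it with the start of the next packet.
decoder = codecs.getincrementaldecoder("gb18030")()
stateful = "".join(decoder.decode(p) for p in packets)
stateful += decoder.decode(b"", final=True)  # flush; raises if bytes remain

print(repr(naive))
print(repr(stateful))
```

Only the stateful result should match the text the server intended to send.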

I will fix this issue as soon as possible and release it in the WindTerm_2.2.0 version, which will be released today or tomorrow.

@kingToolbox
Owner

Of course, the GB2312 and GBK you mentioned will also be added.

I will take this opportunity to add other single-byte character sets too, such as hp-roman8, IBM850, IBM866, IBM874, KOI8-U, macintosh, TSCII, TIS-620, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258 and WINSAMI2. 😄

@kingToolbox
Owner

The new WindTerm_2.2.0 version has been released. It not only fixes this problem, but also adds GB2312, GBK and many single-byte character sets. Please download and check it, thank you.

@dbitc
Author

dbitc commented Nov 19, 2021

Windterm_2.2.0 version test ok.

log:

config GB2312.
image

mytest(support)#factory
est(support)#

请整机重启,在主控(备主控)ctrl下删除配置文件 /config.text .
是否要继续工厂测试 [N/y]?

dbitc closed this as completed Nov 19, 2021

@kingToolbox
Owner

I am glad that this issue has finally been resolved. Thank you very much for your great assistance, feedback and patience on this issue. Thanks also to @bestv5 for the great help. If you have any feature requests or find bugs, you are welcome to file a new issue.
