charset GB18030 don't show all GB2312 char #264

dbitc · 2021-11-16T07:43:05Z

The GB18030 charset does not display all GB2312 characters. Please add a GB2312 charset.

error log:
mytest(support)#factory
mytest(support)#
请整机重启，在主控(备主控)ctrl下删除配置文件 /config.text .
是否 绦こР馐 [N/y]?

kingToolbox · 2021-11-16T08:53:15Z

Thank you, but I think GB18030 is backward compatible with GB2312.

Can you tell me what are the garbled characters? I can check the specific encoding of these characters.

dbitc · 2021-11-16T10:48:27Z

fyi.
请整机重启，在主控(备主控)ctrl下?除配置文件 /config.text .
是否要继续工厂测试 [N/y]?

kingToolbox · 2021-11-16T13:14:03Z

Thank you, I will investigate this issue and any progress will be updated here.

bestv5 · 2021-11-17T02:26:00Z

Please add GBK, GB2312 and GB18030 support. Currently the session config only offers GB18030.

Encoding standards: GB2312, GBK, GB18030

kingToolbox · 2021-11-17T12:22:43Z

Thank you @bestv5 for more detailed information. I have analyzed this problem. The reason is that one byte of the log data is missing, which causes the subsequent text to be decoded incorrectly. Let us compare the garbled text and the original text in detail.

Garbled Text:

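| | 是 | 否 | (0020) | (E01B) | 绦 | (E24C) | こ | Р | 馐 |
|---|---|---|---|---|---|---|---|---|---|
| GB2312 | CAC7 | B7F1 | 20 | invalid | CCD0 | invalid | A4B3 | A7B2 | E2CA |
| GBK | CAC7 | B7F1 | 20 | invalid | CCD0 | invalid | A4B3 | A7B2 | E2CA |
| GB18030 | CAC7 | B7F1 | 20 | AABC | CCD0 | F8B9 | A4B3 | A7B2 | E2CA |
| UTF-16BE | 662F | 5426 | 0020 | E01B | 7EE6 | E24C | 3053 | 0420 | 9990 |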

Original Text:

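| | 是 | 否 | 要 | 继 | 续 | 工 | 厂 | 测 | 试 |
|---|---|---|---|---|---|---|---|---|---|
| GB2312 | CAC7 | B7F1 | D2AA | BCCC | D0F8 | B9A4 | B3A7 | B2E2 | CAD4 |
| GBK | CAC7 | B7F1 | D2AA | BCCC | D0F8 | B9A4 | B3A7 | B2E2 | CAD4 |
| GB18030 | CAC7 | B7F1 | D2AA | BCCC | D0F8 | B9A4 | B3A7 | B2E2 | CAD4 |
| UTF-16BE | 662F | 5426 | 8981 | 7EE7 | 7EED | 5DE5 | 5382 | 6D4B | 8BD5 |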

If we join and align the above GB18030 bytes, we will get:

| | GB18030 Bytes |
|---|---|
| Garbled Text | `CA C7` `B7 F1` `20` `AA BC` `CC D0` `F8 B9` `A4 B3` `A7 B2` `E2 CA` |
| Original Text | `CA C7` `B7 F1` `D2 AA` `BC CC` `D0 F8` `B9 A4` `B3 A7` `B2 E2` `CA D4` |

Obviously, the byte D2 was incorrectly stored as 20, which shifted the byte pairing and caused all the following text to be decoded incorrectly. Therefore, the following questions need to be confirmed:

1. Is the text displayed on the screen correct or garbled? I guess it is correct.
2. Are the bytes stored in the log file correct or garbled? This needs to be checked with a hex editor, or you can send the file to me to check. We just need to confirm whether the byte is D2 or 20.
3. If the bytes in the log are correct, could it be a decoding error in the text editor that opens the log? This can be verified by trying another text editor.
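The byte-level comparison above can be reproduced with a short Python sketch. The byte values are copied from the GB18030 rows of the two tables; the helper name `first_mismatch` is made up for illustration and is not part of WindTerm.

```python
# GB18030 bytes taken from the "Garbled Text" and "Original Text" tables.
garbled = bytes.fromhex("CAC7 B7F1 20 AABC CCD0 F8B9 A4B3 A7B2 E2CA")
original = bytes.fromhex("CAC7 B7F1 D2AA BCCC D0F8 B9A4 B3A7 B2E2 CAD4")


def first_mismatch(a: bytes, b: bytes):
    """Return the index of the first differing byte, or None if none differ."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    return None


pos = first_mismatch(garbled, original)
print(f"first difference at byte {pos}: "
      f"garbled={garbled[pos]:02X}, original={original[pos]:02X}")

# Every byte after the mismatch is identical in both streams, so only the
# pairing of bytes into characters has shifted.
print(garbled[pos + 1:] == original[pos + 1:len(garbled)])
```

Only one byte differs, and everything after it matches, which is why a single wrong byte is enough to garble the rest of the line.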

In addition, it can be concluded from the tables above that GB18030 is completely backward compatible with GB2312 and GBK. But if illegal characters appear, there is still a difference, so I will add GB2312 and GBK to the new WindTerm_2.2.0 version, although this is probably not the cause of this issue. For most use cases, GB18030 is sufficient and the best choice.

dbitc · 2021-11-17T13:03:54Z

error show mode
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#
mytest(support)#fa
mytest(support)#factory
ytest(support)#

请整机重启，在主控(备主控)ctrl下删除配置文件 /config.text .
是否 绦こР馐 [N/y]?

error hex mode:

dbitc · 2021-11-17T14:05:23Z

right ASCII mode:
mytest(support)#factory
mytest(support)#

请整机重启，在主控(备主控)ctrl下?除配置文件 /config.text .
是否要继续工厂测试 [N/y]?

ASCII: one byte
GB2312: one or two bytes
GBK: one or two bytes
GB18030: one, two or four bytes (includes GB18030-2000 and GB18030-2005)
eg:
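A minimal Python sketch of these byte lengths, using Python's built-in `gb2312`, `gbk` and `gb18030` codecs. It is not part of the original comment; the sample characters and the helper name `encoded_length` are chosen for illustration.

```python
ENCODINGS = ("gb2312", "gbk", "gb18030")

# Illustrative samples: an ASCII letter, a common Chinese character, and a
# character outside the Basic Multilingual Plane (U+20000).
SAMPLES = ("A", "请", "\U00020000")


def encoded_length(ch: str, encoding: str):
    """Return the encoded size in bytes, or None if the charset lacks ch."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in SAMPLES:
    sizes = ", ".join(f"{enc}={encoded_length(ch, enc)}" for enc in ENCODINGS)
    print(f"U+{ord(ch):04X}: {sizes}")
```

A value of `None` means the character cannot be represented in that charset at all, which is the case for U+20000 in GB2312 and GBK.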

dbitc · 2021-11-17T14:16:31Z

"请" is encoded as E8 AF B7.
Why is a Chinese character encoded with 3 bytes?

kingToolbox · 2021-11-17T15:09:34Z

Each character is converted to UTF-8 when it is displayed, to facilitate searching, coloring, folding, etc. of the text. This is why 请 is encoded as E8 AF B7. Otherwise, different encodings would need to be processed separately, which would increase complexity and reduce performance.
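As a rough illustration of that conversion (a Python sketch, not WindTerm's actual code), the GB18030 bytes of the original text can be decoded and re-encoded as UTF-8. The input bytes are the ones from the "Original Text" table; the variable names are illustrative.

```python
# GB18030 bytes as received from the server.
wire_bytes = bytes.fromhex("CAC7 B7F1 D2AA BCCC D0F8 B9A4 B3A7 B2E2 CAD4")

# Decode from the session charset, then re-encode as UTF-8 for internal use.
text = wire_bytes.decode("gb18030")
utf8_bytes = text.encode("utf-8")

print(len(wire_bytes), "bytes in GB18030")   # two bytes per character here
print(len(utf8_bytes), "bytes in UTF-8")     # three bytes per character here
print(utf8_bytes.hex(" ").upper())
```

Each of these nine characters takes two bytes in GB18030 and three bytes in UTF-8, which matches the three-byte E8 AF B7 seen for 请.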

If your screenshot shows the text as displayed in WindTerm, then I guess something went wrong when receiving and parsing the text. Could you collect the debug log for me, if possible? The steps are as follows:

1. Open your session.
2. Use the second button on the toolbar to disconnect the session.
3. Check the menu item Menubar - Mode (7) - Debug Mode.
4. Use the third button on the toolbar to reconnect the session.
5. Execute your command until 是否 绦こР馐 [N/y]? is displayed, then uncheck the menu item Menubar - Mode (7) - Debug Mode.
6. Click the menu item Right click menu - Log - Open Log Folder.
7. Open the folder debug and attach your log file to a comment here. You can zip it before uploading.

The log content is in plain text and will not contain any private information. You can use any editor to view it. Thank you.

dbitc · 2021-11-17T15:50:25Z

2021-11-17_23.37.48.zip
fyi.

dbitc · 2021-11-17T16:47:32Z

// 是否要继续工厂测试
expected:  E698AF E590A6 E8A681 E7BBA7 E7BBAD E5B7A5 E58E82 E6B58B E8AF95
error log: E698AF E590A6 20 EE809B E7BBA6 EE898C E38193 D0A0 E9A690 20 20

kingToolbox · 2021-11-17T16:55:13Z

Thank you very much for providing the log, which is of great help in solving the problem. I have analyzed the log and found the cause of the problem.

When the server sends .\r\n是否要继续工厂 (2E 0D 0A CAC7 B7F1 D2AA BCCC D0F8 B9A4 B3A7), it splits the data into 8-byte packets. The first is 2E 0D 0A CAC7 B7F1 D2, the second is AA BCCC D0F8 B9A4 B3. Obviously 要 (D2AA) is divided into two parts, D2 and AA. WindTerm had already taken this case into account, but I don't know why it was not handled correctly here. After receiving D2, WindTerm discarded it directly because it was incomplete. As a result, the subsequent text became garbled.
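The general technique for this situation is to keep an incomplete trailing byte until the next packet arrives instead of discarding it. Below is a minimal Python sketch using the standard library's incremental decoder. It is not WindTerm's implementation; the function names `decode_stream` and `decode_naive` are made up for illustration, and the packets are the two quoted above.

```python
import codecs

# The two 8-byte packets from the debug log; 要 (D2 AA) straddles the boundary.
packets = [
    bytes.fromhex("2E 0D 0A CAC7 B7F1 D2"),
    bytes.fromhex("AA BCCC D0F8 B9A4 B3"),
]


def decode_stream(chunks, encoding="gb18030"):
    """Decode packets in order, carrying incomplete bytes into the next one."""
    decoder = codecs.getincrementaldecoder(encoding)()
    return [decoder.decode(chunk) for chunk in chunks]


def decode_naive(chunks, encoding="gb18030"):
    """Decode each packet in isolation; an incomplete trailing byte is lost."""
    return [chunk.decode(encoding, errors="ignore") for chunk in chunks]


streamed = "".join(decode_stream(packets))
naive = "".join(decode_naive(packets))

print(repr(streamed))      # the split character is reassembled
print("要" in streamed)     # True
print("要" in naive)        # False: D2 was dropped with the first packet
```

The incremental decoder also holds back the final B3 of the second packet, since it is the first half of 厂 (B3 A7) and the second half has not arrived yet.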

I will fix this issue as soon as possible and release it in the WindTerm_2.2.0 version, which will be released today or tomorrow.

kingToolbox · 2021-11-17T17:12:10Z

Of course, the GB2312 and GBK you mentioned will also be added.

I will also take this opportunity to add other single-byte character sets, such as hp-roman8, IBM850, IBM866, IBM874, KOI8-U, macintosh, TSCII, TIS-620, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258 and WINSAMI2. 😄

kingToolbox · 2021-11-18T20:40:29Z

The new WindTerm_2.2.0 version has been released. It not only fixes this problem but also adds GB2312, GBK and many single-byte character sets. Please download and check it, thank you.

dbitc · 2021-11-19T04:58:53Z

WindTerm_2.2.0 tested OK.

log:

config GB2312.

mytest(support)#factory
est(support)#

请整机重启，在主控(备主控)ctrl下删除配置文件 /config.text .
是否要继续工厂测试 [N/y]?

kingToolbox · 2021-11-19T13:16:41Z

I am glad that this issue has finally been resolved. Thank you very much for your great assistance, feedback and patience on this issue. Thanks also to @bestv5 for the great help. If you have any feature requests or find bugs, you are welcome to file a new issue.

dbitc closed this as completed Nov 19, 2021
