Python IDLE stops outputting a string on encountering a null character both for STDOUT and STDERR #119614

Closed
A-Paint-Brush opened this issue May 27, 2024 · 7 comments
Labels: topic-tkinter, type-bug


@A-Paint-Brush

A-Paint-Brush commented May 27, 2024

Bug report

Bug description:

I noticed that if you pass a string containing a null character (\0) as an argument to the built-in print function, IDLE only outputs that string up to the null character, then jumps to the next argument to print as if the string had ended there. This is inconsistent with running the Python interpreter directly in a terminal, where the null character is usually printed as a space-looking character and does not terminate the string.

If further arguments to the same print call also contain null characters, the same thing happens (it skips to the next argument immediately on encountering a \0). If the sep string contains a null character, only the part before the null character is printed before the next positional argument starts, as if the sep string had ended there. The same thing happens with the end string, and when calling sys.stdout.write directly. Writing to STDERR is cut off at the null character in the same way. Basically anything that results in text being output to IDLE triggers this bug.

Maybe tk's Text widget treats null characters as string terminators because tk is written in C? By the way, the bug behaves the same in interactive mode and when running a file. Below is an example; run it in IDLE and you should get the same erroneous output:

>>> print("hello\0world!")  # expected output: hello world!
hello
>>> print("wait", "what?", sep="\0, ")  # expected output: wait , what?
waitwhat?
>>> print("fo\0o", "\0bar", "ba\0z", "spa\0m", "ha\0m", "e\0ggs", end=" xyz\0zy\n")  # expected: fo o  bar ba z spa m ha m e ggs xyz zy
fo  ba spa ha e xyz
>>> import sys
>>> _ = sys.stdout.write("same\0here\n")  # expected output: same here
same
>>> _ = sys.stderr.write("in errors\0too\n")  # expected output: in errors too
in errors
>>> raise RuntimeError("something ba\0d happened.")
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    raise RuntimeError("something ba\0d happened.")
RuntimeError: something ba
>>> 
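
The transcript above is consistent with a simple model: print hands each argument, each sep and the end string to the output stream as a separate write, and each write is cut at its first null character. The sketch below reproduces the observed output under that assumption. It is only an illustration, not IDLE's code, and `truncate_at_nul` and `simulate_truncated_print` are hypothetical names.

```python
def truncate_at_nul(text):
    """Return the part of text before the first null character."""
    return text.split("\0", 1)[0]


def simulate_truncated_print(*args, sep=" ", end="\n"):
    """Model print() when every individual write is cut at the first NUL.

    Assumption: arguments, separators and the end string are written
    one piece at a time, so each piece is truncated independently.
    """
    pieces = []
    for index, arg in enumerate(args):
        if index:
            pieces.append(sep)
        pieces.append(str(arg))
    pieces.append(end)
    return "".join(truncate_at_nul(piece) for piece in pieces)


if __name__ == "__main__":
    # repr() makes a missing trailing newline visible.
    print(repr(simulate_truncated_print("hello\0world!")))
    print(repr(simulate_truncated_print("wait", "what?", sep="\0, ")))
```

Note that in this model a null character inside end also swallows the trailing newline, because the newline comes after the cut.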

Version Info:
I have tested this on the 64 bit "Windows Installer" versions of Python 3.9.7 and 3.12.3. My OS is Windows 10 Version 22H2, build 19045.4412. The sys.version value on the 3.9.7 version is 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)], and the 3.12.3 version is 3.12.3 (tags/v3.12.3:f6650f9, Apr 9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]. Hope this helps.

CPython versions tested on:

3.9, 3.12

Operating systems tested on:

Windows


A-Paint-Brush added the type-bug label May 27, 2024
@terryjreedy
Member

I verified the behavior on 3.13 on Windows, and on Mac (at least the print part). It may be a tk/tkinter issue somehow, but I am not sure. I hypothesized that it was a socket issue, but starting IDLE from Command Prompt with -n (no subprocess or socket connection) resulted in the same clipping. @serhiy-storchaka Any ideas?

@A-Paint-Brush
Author

I just did some testing, and it does appear to be a tk/tkinter problem. I wrote the below test script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import tkinter as tk
root = tk.Tk()
root.geometry("300x300")
root.title("tk test")
tk.Label(root, text="hello\0world!").pack()
text = tk.Text(root)
text.pack(expand=True, fill="both")
text.insert("1.0", "foo bar \0baz spam ham eggs\n")
text.insert("2.0", "hello\0world!\n")
root.mainloop()

Both the label and the text widget exhibit the same bugged behavior of stopping on a null character. Running from Command Prompt with ".\tk_null_test.py", this is what the window looks like:
[Image: tk screenshot]

@A-Paint-Brush
Author

A-Paint-Brush commented May 28, 2024

Maybe we should try directly writing a tcl script next to test whether it's a problem in the tkinter binding or in the underlying tk?
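
One way to approximate that test without leaving Python might be to let Tcl build the string itself, so that no null character ever passes through the Python-to-Tcl string conversion. The sketch below uses `tkinter.Tcl()`, which creates a Tcl interpreter without Tk, and asks Tcl for the length of a three-character string whose middle character is written with the Tcl escape `\u0000`. The function name `tcl_length_of_nul_string` is hypothetical, and the function returns None when tkinter or a usable Tcl installation is not available. If Tcl keeps the character internally, the reported length should be 3.

```python
try:
    import tkinter
except ImportError:  # Python built without tkinter
    tkinter = None


def tcl_length_of_nul_string():
    """Ask Tcl for the length of "a<NUL>b", built entirely on the Tcl side."""
    if tkinter is None:
        return None
    try:
        interp = tkinter.Tcl()  # Tcl only: no Tk window or display needed
    except tkinter.TclError:
        return None
    # The script text contains a backslash escape, not a real NUL, so the
    # null character is only created once Tcl parses the quoted word.
    return int(interp.eval(r'string length "a\u0000b"'))


if __name__ == "__main__":
    print(tcl_length_of_nul_string())
```

This only probes Tcl's string handling, not how Tk widgets display the text, so it narrows the search rather than settling it.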

@serhiy-storchaka
Member

In 3.6:

>>> import tkinter
>>> b = tkinter.Button(text='abc\0def')
>>> b.tk.call(str(b), 'cget', '-text')
'abc\x00def'
>>> b.tk.eval(str(b) + ' cget -text')
'abc\x00def'

In 3.7:

>>> import tkinter
>>> b = tkinter.Button(text='abc\0def')
>>> b.tk.call(str(b), 'cget', '-text')
'abc\x00def'
>>> b.tk.eval(str(b) + ' cget -text')
'abc'

And only the portion before \0 is displayed in the GUI (if you add b.pack()).

This looks like a weird Tcl/Tk bug. It knows that there are characters past NUL, and cget can return them, but in other cases it truncates the string.

There are two internal representations of strings in Tcl. Before 3.7 all strings were created with Tcl_NewUnicodeObj(), which uses a sequence of Tcl_UniChar encoded as UCS-2/UTF-16 or UTF-32. In 3.7 some strings are created with Tcl_NewStringObj(), which uses a UTF-8 encoded char* string. Different representations are needed on different platforms to represent non-BMP Unicode strings. Tcl_NewStringObj() is now always used for ASCII-only strings as a pure optimization. But it seems that strings created with Tcl_NewStringObj() are truncated at NUL in some cases.

We need to check whether this issue is solved in Tcl 8.7 or 9.0. If it is solved on their side, we do not need to do anything. Otherwise we will need workarounds, which will never be perfect: the null character can be removed or replaced with some weird sequence, and the result will not be consistent across platforms or across the ways the string is used.
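
For code that has to run on an affected version, the two imperfect options named above (remove the character, or replace it with something else) can be sketched at the application level. This is an illustration only, not what tkinter does internally. `remove_nul` and `replace_nul` are hypothetical helpers, and the default placeholder U+2400 (SYMBOL FOR NULL) is an arbitrary choice that may not render in every font.

```python
def remove_nul(text):
    """Drop every null character before the text is handed to Tk."""
    return text.replace("\0", "")


def replace_nul(text, placeholder="\u2400"):
    """Swap every null character for a visible placeholder.

    The placeholder is an assumption of this sketch; pick whatever
    reads best in the widget's font.
    """
    return text.replace("\0", placeholder)


if __name__ == "__main__":
    sample = "hello\0world!"
    print(remove_nul(sample))
    print(replace_nul(sample, placeholder=" "))
```

Both helpers change the text that is stored in the widget, so reading it back will not return the original string.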

@terryjreedy
Member

Are either 8.7 or 9.0 near release, so we can use them?

@terryjreedy terryjreedy reopened this May 28, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Jun 23, 2024
…ers in Tkinter

Now the null character is always represented as \xc0\x80 for
Tcl_NewStringObj().
@serhiy-storchaka
Member

Nothing was changed in 8.6.14, 8.7 and 9.0.

I reported a bug in Tcl: https://core.tcl-lang.org/tcl/tktview/8c4082563d. But it seems that they are not going to change anything here. #120909 adds a workaround from our side.
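
The commit message for the workaround says the null character is represented as \xc0\x80 for Tcl_NewStringObj(). The idea behind that byte pair can be shown in pure Python: it is an overlong two-byte form of U+0000, so the encoded data contains no zero byte and code that stops at the first zero byte no longer cuts the string short. The sketch below is not the actual C change, and `encode_for_tcl`, `decode_from_tcl` and `c_string_view` are hypothetical names.

```python
def encode_for_tcl(text):
    """Encode text as UTF-8, writing each NUL as the byte pair C0 80."""
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_from_tcl(data):
    """Reverse encode_for_tcl().

    The byte C0 never occurs in valid UTF-8, so the replacement cannot
    touch any other character.
    """
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


def c_string_view(data):
    """Return what code that stops at the first zero byte would see."""
    return data.split(b"\x00", 1)[0]


if __name__ == "__main__":
    text = "hello\0world!"
    print(c_string_view(text.encode("utf-8")))   # plain UTF-8 is cut short
    print(c_string_view(encode_for_tcl(text)))   # C0 80 form stays whole
```

Python's own UTF-8 codec rejects the C0 80 pair, which is why the sketch converts it back to a zero byte before decoding.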

serhiy-storchaka added a commit that referenced this issue Jun 24, 2024
… Tkinter (GH-120909)

Now the null character is always represented as \xc0\x80 for
Tcl_NewStringObj().
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 24, 2024
…ers in Tkinter (pythonGH-120909)

Now the null character is always represented as \xc0\x80 for
Tcl_NewStringObj().
(cherry picked from commit c38e2f6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Jun 24, 2024
…ters in Tkinter (GH-120909) (GH-120939)

Now the null character is always represented as \xc0\x80 for
Tcl_NewStringObj().
(cherry picked from commit c38e2f6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Jun 24, 2024
…ters in Tkinter (GH-120909) (GH-120938)

Now the null character is always represented as \xc0\x80 for
Tcl_NewStringObj().
(cherry picked from commit c38e2f6)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@terryjreedy
Member

Thank you for fixing this non-trivial issue. For print('he\0lo'), IDLE and 3.12 REPL print \0 as a space. The current main REPL on Windows (debug free-thread build) skips the character. Not an issue for me.
