Textualize / rich Public
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API to support tracking progress while reading from a file #1759
Conversation
|
Hi Martin! For a moment there I thought I was on a different repos. This looks like a great addition. Would you mind updating the docs for this? A single sub-section under progress should do it. Thanks. |
|
Rebased against latest |
|
Hi @willmcgugan , any update on this? |
Codecov Report
@@ Coverage Diff @@
## master #1759 +/- ##
==========================================
- Coverage 99.46% 99.07% -0.40%
==========================================
Files 72 72
Lines 7339 7551 +212
==========================================
+ Hits 7300 7481 +181
- Misses 39 70 +31
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
|
Hi Martin, Sorry about the delay! It's been on the backburner for a while. Love the idea, but I think we can refine the API a bit to make it easier to grok. Am I right in saying it is overloaded to either open and return a file / context manager and also to read from an open file? I suspect that will be tricky to explain to users. Could we separate those so we have What do you think? |
|
No worries, I also have things that have been lying around for so long and I don't always get myself up to speed, so I can definitely understand. At the moment the code will either wrap a file through a context-manager that does need to be closed, or wrap a file-like object in a context-manager that does not need to be closed, and i can understand this may be confusing. I can totally split this into two functions. |
|
Okay, I've split the code into two methods:
I also allowed skipping specifiying the with Progress() as progress:
# specify a total in a task beforehand
task_id = progress.add_task("Reading...", total=100)
with progress.wrap_file(file_handle, task_id=task_id) as reader:
pass
# specify a total directly
with progress.wrap_file(file_handle, total=100) as reader:
pass |
|
Okay, I've updated In addition, I've fixed the type annotations, and added overloaded annotations to determine the return type of |
|
@darrenburns Could you give this one a look through. It's a feature to wrap a file so that reading updates a progress bar. |
Made a few small comments, really liking it overall though. Nice idea
| Reading from a file | ||
| ~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| You can obtain a progress-tracking reader using the :meth:`~rich.progress.Progress.open` method by giving it a path. You can specify the number of bytes to be read, but by default :meth:`~rich.progress.Progress.open` will query the size of the file with :func:`os.stat`. You are responsible for closing the file, and you should consider using a *context* to make sure it is closed :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the Progress.open API
|
Thanks @darrenburns ! Idea is actually coming from my Rust experience with |
|
Thanks for making those changes @althonos! Could you run |
|
Okay this should be it! I tried running |
A few more comments for your consideration
rich/progress.py
Outdated
| def readall(self) -> bytes: | ||
| block = self.handle.readall() # type: ignore | ||
| self.progress.advance(self.task, advance=len(block)) | ||
| return block # type: ignore | ||
|
|
||
| def readinto(self, b: Union[bytearray, memoryview, mmap]): # type: ignore | ||
| n = self.handle.readinto(b) # type: ignore | ||
| self.progress.advance(self.task, advance=n) | ||
| return n | ||
|
|
||
| def readline(self, size: int = -1) -> bytes: # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the thinking behind these type: ignore comments?
As far as I can see,self.handle which is typed as a BinaryIO doesn't contain readinto or readall (those come from RawIOBase). mypys warnings seem valid to me here.
Indeed, calling readall on the _Reader instance will result in AttributeError: '_io.BufferedReader' object has no attribute 'readall'. Did you mean: 'readable'?
I'm wondering if these methods are required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as far as I've seen while doing I/O related typing in Pyfilesystem2, the I/O typing is really inconsistent between the typing and the io modules.
Basically, when you open a file in binary mode with open, you get either an io.BufferedReader or an io.FileIO (if buffering is disabled), both of which have a readinto method, . However when mypy cannot determine the buffering statically, it will have open default to returning a BinaryIO, which doesn't have these methods, although the actual objects being returned will always have readinto:
That's why I decided to silence mypy here. I can definitely remove the readall implementation, but I'd leave the readinto, in particular because it can really improve performance in some cases by avoiding useless copies of byte data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will defer to your experience here, Martin! Regarding the readall, it is probably better to be defensive here even if it defies the typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's actually safer here to remove readall because I'll expect the majority of calls to progress.open to be buffered, so the returned io.BufferedReader won't have a readall method. It's also much more marginal than readinto in my experience, so I don't think you'll get a lot complaints that _Reader misses a readall method because there's very limited reason to use it instead of read() without arguments.
|
@althonos We added a Mypy switch recently that require the error code in any |
|
Thanks, @althonos ! This is a fantastic feature. Will do a release v soon. |
|
Don't mention it! As usual, thanks a lot both of you for the review, and very excited to use |
Type of changes
Checklist
Description
Hi Will! I've started using😉
richmore and more, so far it's a great experienceOne of my common use case for the
rich.progressmodule is tracking the progress made while reading from a file. In bioinformatics it's not rare to process gigabytes of files so I like to have an idea how long of a break I can take while processing my input data!Manually updating the progress bar works for certain things, but for instance, when I want to pass a file-like handle to an API that consumes it (e.g
json.load), there's no way to update the progress bar between each chunk. To do so, the only way I've found was to wrap the file-like object itself so that it updates the progress bar whenever the file is read from. This is something you could partially do withtqdm.tqdm.wrapattr.Since the
rich.progressAPI has no straightforward support for this (like it does withtrackfor an iterator), I made my own snippet for this, which ends up working well. I've been copying around in every project that usesrich, so I think it would make a good addition to the library!Changes
This PR adds the
rich.progress._Readerclass, which wraps a file-like object, and updates aProgressinstance while being read from. The class itself is private; to get a reader, you have to use the newProgress.readmethod, which takes either a path or a file-like object, and returns a file reader with progress support.This lets you do something like this (e.g. loading JSON-serialized data from a file, using
os.statto get the total number of bytes to be read):You can also directly pass a file-like object to
progress.read, in which case you must specify the total:In addition, I added a wrapper function
rich.progress.readlikerich.progress.trackwhich handles setting up a progress, so that you can get a progress reader in just two lines:Notes
rich.progress.readandProgress.read, I madetotalanintinstead of afloat, because it should be a number of bytes so a float would not make sense here.seekthe_Reader, it will also reset the position of the progress bar to the new position.