Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pathlib Paths #3149

Open
tadejsv opened this issue Aug 11, 2021 · 7 comments
Open

Support pathlib Paths #3149

tadejsv opened this issue Aug 11, 2021 · 7 comments

Comments

@tadejsv
Copy link
Member

@tadejsv tadejsv commented Aug 11, 2021

Right now, there does not seem to be a very good support for pathlib.Path objects in jina, for example

BaseExecutor.load_config(Path(__file__).parent / 'config.yml')

will throw an error (as jina tries to apply regex to Path). In (base) python, path strings and Path objects can be used interchangably, jina should follow this practice.

@JoanFM
Copy link
Member

@JoanFM JoanFM commented Aug 20, 2021

@tadejsv may u point to where jina tries to apply the regex, so that it may be easier for a community member to tackle this issue?

@tadejsv
Copy link
Member Author

@tadejsv tadejsv commented Aug 20, 2021

Yes, sure. One of the places it happens is for here

jina/jina/helper.py

Lines 1181 to 1189 in 706b41c

def is_yaml_filepath(val) -> bool:
"""
Check if the file is YAML file.
:param val: Path of target file.
:return: True if the file is YAML else False.
"""
r = r'^[/\w\-\_\.]+.ya?ml$'
return re.match(r, val.strip()) is not None

@Bharat123rox
Copy link

@Bharat123rox Bharat123rox commented Aug 20, 2021

I would like to take on this issue but I'm unaware of the process here, should I ask to be assigned or I can directly submit a PR?

@JoanFM
Copy link
Member

@JoanFM JoanFM commented Aug 20, 2021

@Bharat123rox we assigned this to you. You can ask for any help

@Bharat123rox
Copy link

@Bharat123rox Bharat123rox commented Aug 22, 2021

Right now, there does not seem to be a very good support for pathlib.Path objects in jina, for example

BaseExecutor.load_config(Path(__file__).parent / 'config.yml')

will throw an error (as jina tries to apply regex to Path). In (base) python, path strings and Path objects can be used interchangably, jina should follow this practice.

@tadejsv @JoanFM
Firstly, is there some sort of script/command I can come up with to search and find more (or possibly all?) occurrences or usages of Path objects?

I'm thinking of two possible approaches here to solve the problem:

  1. (Preferred by me) Convert all Path objects to strings inside each occurrence / each Jina function, so regex and other functions truly intended for strings will work as-is and we will not need to refactor the functions other than adding something like:
s = str(s) if isinstance(s, pathlib.Path) else s
  1. Continue to accept the path-like object as-is inside the Jina functions and refactor the base Python functions to their Path equivalent (For eg:- Path.match() for regex) but here in this case, it's likely that an equivalent Path function may not exist for some cases, so this may be better only for specific use-cases like a regex

Would you please be able to elaborate and let me know which approach would be better for this?

@tadejsv
Copy link
Member Author

@tadejsv tadejsv commented Aug 22, 2021

@Bharat123rox The proper approach is the second one, although it is harder. It might even be worth considering converting all string paths into Path objects, although I would not do this in this PR. Case in point: the regex used in the function I showed you is an overkill, all that needs to be checked is if the path ends in .yaml or .yml. Using Path this would be

return val.suffix in ['.yaml', '.yml']

@Bharat123rox
Copy link

@Bharat123rox Bharat123rox commented Sep 2, 2021

Hey @tadejsv, in case we are going with the 2nd approach, then does that mean we are only going to fix methods where there are simple string regexes or specific use-cases where Path manipulations can be directly used? In other words, should we go through all functions accepting strings and try to refactor only those which can be replaced by a Path equivalent method like the above?

(P.S: I'm still working on this, it's been a hectic week and I hope to get a PR by this weekend!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants