urllib.parse.urlunsplit makes relative path to absolute (http:g -> http:///g) #85110
Comments
|
path 'g' in 'http:g' becomes '/g'.
>>> urlsplit('http:g')
SplitResult(scheme='http', netloc='', path='g', query='', fragment='')
>>> urlunsplit(urlsplit('http:g'))
'http:///g'
>>> urlsplit('http:///g')
SplitResult(scheme='http', netloc='', path='/g', query='', fragment='')
>>> urljoin('http://a/b/c/d', 'http:g')
'http://a/b/c/g'
>>> urljoin('http://a/b/c/d', 'http:///g')
'http://a/g'
The problematic part of the code is:
def urlunsplit(components):
    [...]
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
--->    if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
Note also that urllib has decided on the interpretation of 'http:g' (in test).
def test_RFC3986(self):
    [...]
    #self.checkJoin(RFC3986_BASE, 'http:g','http:g') # strict parser
    self.checkJoin(RFC3986_BASE, 'http:g','http://a/b/c/g') #relaxed parser
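For illustration, here is a minimal sketch of a recombination step that prefixes '/' to the path only when a netloc is actually present, so that 'http:g' survives the round trip. `unsplit_keep_relative` is a hypothetical helper written for this thread, not the urllib implementation and not necessarily the fix that was merged; it also ignores details such as empty queries.

```python
from urllib.parse import urlsplit, urljoin


def unsplit_keep_relative(parts):
    """Hypothetical helper: recombine split parts without turning a
    relative path into an absolute one."""
    url = parts.path
    if parts.netloc:
        # A path that follows an authority must start with '/'.
        if url and not url.startswith('/'):
            url = '/' + url
        url = '//' + parts.netloc + url
    elif url.startswith('//'):
        # Keep an empty authority so the path is not re-read as a netloc.
        url = '//' + url
    if parts.scheme:
        url = parts.scheme + ':' + url
    if parts.query:
        url = url + '?' + parts.query
    if parts.fragment:
        url = url + '#' + parts.fragment
    return url


rebuilt = unsplit_keep_relative(urlsplit('http:g'))
print(rebuilt)
# Joining the rebuilt reference gives the same result as joining 'http:g'.
print(urljoin('http://a/b/c/d', rebuilt))
```

With this sketch 'http:///g' comes back as 'http:/g': the string differs, but it splits into the same components.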
|
@op368 I don't think this is a bug: [1] uses this exact example and shows the expected behaviour.
[1] https://datatracker.ietf.org/doc/html/rfc3986#section-5.4.2
|
Hello @jaswdr, I can't understand what's wrong with my point.
|
@op368 As far as I can see, regardless of any misinterpretation, the RFC does have this section. My understanding is that "http:g" is translated to "http:///g" for backward compatibility. This seems to be an edge case for the parser, since the RFC text also mentions that this usage should be avoided.
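To make the two readings from RFC 3986 section 5.4.2 concrete, here is a rough sketch. `join_reference` and its `strict` flag are hypothetical names used only for this example; the relaxed branch simply delegates to `urljoin`, and the strict branch skips dot-segment removal for brevity.

```python
from urllib.parse import urljoin, urlsplit


def join_reference(base, ref, strict=False):
    """Hypothetical wrapper contrasting strict and relaxed resolution."""
    if strict and urlsplit(ref).scheme:
        # Strict parser: a reference that carries a scheme is already
        # absolute and is not merged with the base.
        return ref
    # Relaxed parser: a scheme equal to the base scheme is ignored and
    # the rest is resolved as a relative reference (urljoin's behaviour).
    return urljoin(base, ref)


base = 'http://a/b/c/d'
print(join_reference(base, 'http:g', strict=True))
print(join_reference(base, 'http:g'))
print(join_reference(base, 'http:///g'))
```

Neither reading of 'http:g' gives 'http://a/g', which is what joining 'http:///g' produces.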
|
'http:///g' has the absolute path '/g':
>>> urljoin('http://a/b/c/d', 'http:///g')
'http://a/g' # 'a' is netloc
So you are proposing a third interpretation.
|
Not exactly. In the RFC example they use a/b/c for the path, but with http:g there is no nested path, so it should be http:///g, no?
|
I tried hard (even read RFC1630), |
|
The problem is that
An exception for an empty path was added so as not to break an existing test for #104139. All other existing tests pass with this change.
…b.parse.urlunsplit()
…b.parse.urlunsplit() (pythonGH-123179) (cherry picked from commit 90c892e) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…b.parse.urlunsplit() (pythonGH-123179)