urllib.parse.urlunsplit makes relative path to absolute (http:g -> http:///g) #85110

openandclose · 2020-06-10T11:18:48Z

BPO	40938
Nosy	@orsenthil, @openandclose, @jaswdr

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-06-10.11:18:48.328>
labels = ['3.7', '3.8', 'type-bug', 'library', '3.9']
title = 'urllib.parse.urlunsplit makes relative path to absolute (http:g -> http:///g)'
updated_at = <Date 2021-05-13.14:11:56.189>
user = 'https://github.com/openandclose'

bugs.python.org fields:

activity = <Date 2021-05-13.14:11:56.189>
actor = 'op368'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2020-06-10.11:18:48.328>
creator = 'op368'
dependencies = []
files = []
hgrepos = []
issue_num = 40938
keywords = []
message_count = 7.0
messages = ['371179', '393544', '393574', '393576', '393577', '393578', '393583']
nosy_count = 3.0
nosy_names = ['orsenthil', 'op368', 'jaswdr']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue40938'
versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

openandclose · 2020-06-10T11:18:48Z

After a urlsplit/urlunsplit round trip, the relative path 'g' in 'http:g' becomes the absolute path '/g'.

    >>> from urllib.parse import urlsplit, urlunsplit, urljoin
    >>> urlsplit('http:g')
    SplitResult(scheme='http', netloc='', path='g', query='', fragment='')
    >>> urlunsplit(urlsplit('http:g'))
    'http:///g'
    >>> urlsplit('http:///g')
    SplitResult(scheme='http', netloc='', path='/g', query='', fragment='')

    >>> urljoin('http://a/b/c/d', 'http:g')
    'http://a/b/c/g'
    >>> urljoin('http://a/b/c/d', 'http:///g')
    'http://a/g'
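The comparison above can be wrapped in a small check. This is a minimal sketch; `compare_join` is a hypothetical helper name, not part of urllib. The round-trip result depends on the Python version (the outputs quoted above are from the versions listed on this issue), so no expected output is given for it.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def compare_join(base, ref):
    """Join ref onto base directly and after a urlsplit/urlunsplit round trip."""
    direct = urljoin(base, ref)
    roundtrip = urljoin(base, urlunsplit(urlsplit(ref)))
    return direct, roundtrip


direct, roundtrip = compare_join('http://a/b/c/d', 'http:g')
print(direct)
print(roundtrip)
print('consistent' if direct == roundtrip else 'inconsistent')
```

If the two values differ, the round trip has changed the meaning of the relative reference.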

The problematic part of the code is:

    def urlunsplit(components):
        [...]
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/': url = '/' + url  # <--- turns 'g' into '/g'
            url = '//' + (netloc or '') + url
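To make the effect of that branch concrete, here is a simplified stand-alone imitation of the quoted logic. It is a sketch only: `legacy_unsplit` and `USES_NETLOC` are hypothetical names, the scheme set is a small subset standing in for `urllib.parse.uses_netloc`, and query and fragment handling is omitted.

```python
# Assumption: a small subset standing in for urllib.parse.uses_netloc.
USES_NETLOC = frozenset({'http', 'https', 'ftp'})


def legacy_unsplit(scheme, netloc, path):
    """Imitate the quoted branch of urlunsplit (no query or fragment)."""
    url = path
    if netloc or (scheme and scheme in USES_NETLOC and url[:2] != '//'):
        # A relative path gets a leading slash here, even with an empty netloc.
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    return url


print(legacy_unsplit('http', '', 'g'))      # http:///g
print(legacy_unsplit('http', 'a', 'g'))     # http://a/g
print(legacy_unsplit('mailto', '', 'x@y'))  # mailto:x@y
```

With a non-empty netloc the added slash is needed to separate authority and path; with an empty netloc it silently turns a relative path into an absolute one.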

Note also that urllib has already settled on an interpretation of 'http:g' in its tests, namely that of the relaxed parser:

    def test_RFC3986(self):
        [...]
        #self.checkJoin(RFC3986_BASE, 'http:g','http:g') # strict parser
        self.checkJoin(RFC3986_BASE, 'http:g','http://a/b/c/g') #relaxed parser

jaswdr · 2021-05-12T19:09:30Z

@op368 I don't think this is a bug; [1] uses this exact example and shows the expected behaviour.

[1] https://datatracker.ietf.org/doc/html/rfc3986#section-5.4.2

openandclose · 2021-05-13T11:57:17Z

Hello @jaswdr, I don't understand what is wrong with my point.
What is 'the expected behaviour'?

jaswdr · 2021-05-13T12:37:57Z

@op368 As far as I can see, setting aside any misinterpretation on my part, the RFC has this section:

  "http:g"        =  "http:g"         ; for strict parsers
                  /  "http://a/b/c/g" ; for backward compatibility

My understanding is that "http:g" is translated to "http:///g" for backward compatibility. This seems to be an edge case for the parser, since the RFC text also mentions that this form should be avoided.
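The two readings in the quoted RFC lines can be sketched as follows. RFC 3986 section 5.2.2 lets a non-strict parser ignore the reference's scheme when it equals the base scheme. `resolve` is a hypothetical helper, not a urllib API, and in the strict case it returns the reference unchanged without removing dot segments.

```python
from urllib.parse import urljoin, urlsplit


def resolve(base, ref, strict=True):
    """Resolve ref against base in the strict or the relaxed RFC 3986 reading."""
    ref_scheme = urlsplit(ref).scheme
    if ref_scheme and not strict and ref_scheme == urlsplit(base).scheme:
        # Relaxed parser: treat the reference as if it had no scheme.
        ref = ref[len(ref_scheme) + 1:]
        ref_scheme = ''
    if ref_scheme:
        # Strict parser: a reference with a scheme stands on its own.
        return ref
    return urljoin(base, ref)


base = 'http://a/b/c/d;p?q'
print(resolve(base, 'http:g', strict=True))   # http:g
print(resolve(base, 'http:g', strict=False))  # http://a/b/c/g
```

Neither reading produces a result with the absolute path '/g'.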

openandclose · 2021-05-13T13:08:12Z

'http:///g' has absolute path '/g',
and as urljoin shows:

    >>> urljoin('http://a/b/c/d', 'http:///g')
    'http://a/g'  # 'a' is netloc

So you are proposing a third interpretation:

  "http:g"        =  "http:g"         ; for strict parsers
                  /  "http://a/b/c/g" ; for backward compatibility
                  /  "http://a/g"     ; (yours)

jaswdr · 2021-05-13T13:10:32Z

Not exactly. In the RFC example they use a/b/c for the path, but with http:g there is no nested path, so it should be http:///g, no?

openandclose · 2021-05-13T14:11:56Z

I tried hard (I even read RFC 1630),
but I think the answer is no.

serhiy-storchaka · 2024-08-20T15:00:06Z

The problem is that urljoin(base, urlunsplit(urlsplit(relurl))) is not equivalent to urljoin(base, relurl). I think the problem cannot be solved without large breaking changes (like #67041), but this particular case can be mitigated: urlunsplit() can be made to preserve a non-empty relative path when the scheme is non-empty and the netloc is empty.

An exception for an empty path was added so as not to break an existing test for #104139. All other existing tests pass with this change.
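One way to read that description is sketched below. It is not the merged patch: `unsplit_preserving_relative` and `USES_NETLOC` are hypothetical names, the scheme set is a small subset standing in for `urllib.parse.uses_netloc`, and query and fragment handling is omitted.

```python
# Assumption: a small subset standing in for urllib.parse.uses_netloc.
USES_NETLOC = frozenset({'http', 'https', 'ftp'})


def unsplit_preserving_relative(scheme, netloc, path):
    """Rebuild a URL, keeping a relative path relative when netloc is empty."""
    if netloc:
        # With an authority, the path must be separated from it by a slash.
        if path and not path.startswith('/'):
            path = '/' + path
        url = '//' + netloc + path
    elif scheme in USES_NETLOC and (
            path == '' or (path.startswith('/') and not path.startswith('//'))):
        # Empty path (the exception mentioned above) or absolute path:
        # keep the old behaviour of emitting an empty authority.
        url = '//' + path
    else:
        # Non-empty relative path with empty netloc: leave it untouched.
        url = path
    return scheme + ':' + url if scheme else url


print(unsplit_preserving_relative('http', '', 'g'))   # http:g
print(unsplit_preserving_relative('http', '', '/g'))  # http:///g
print(unsplit_preserving_relative('http', '', ''))    # http://
print(unsplit_preserving_relative('http', 'a', 'g'))  # http://a/g
```

Under this reading, 'http:g' survives the round trip unchanged, so joining it onto a base gives the same result before and after.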

openandclose mannequin added the labels 3.8 (only security fixes), type-bug (an unexpected behavior, bug, or error), and stdlib (Python modules in the Lib dir) on Jun 10, 2020

wyz23x2 mannequin added the labels 3.7 (EOL, end of life) and 3.9 (only security fixes) on Aug 11, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

serhiy-storchaka mentioned this issue Dec 29, 2023

Wrong formatting of url in urlunsplit() function when used with _replace function to change scheme #99901

Closed

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Aug 20, 2024

2889a7b: pythongh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit()

bedevere-app bot mentioned this issue Aug 20, 2024

gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() #123179

Merged

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Aug 20, 2024

dfcf523: pythongh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit()

serhiy-storchaka added a commit that referenced this issue Aug 21, 2024

90c892e: gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179)

This was referenced Aug 21, 2024

[3.13] gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179) #123187

Merged

[3.12] gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179) #123188

Merged

blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024

a8199cd: pythongh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (pythonGH-123179)

ambv pushed a commit that referenced this issue Sep 6, 2024

0edfc66: [3.12] gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179) (#123188) (cherry picked from commit 90c892e) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

ambv closed this as completed Sep 6, 2024
