bug-wget
[bug #60956] stylesheet and icon <link> elements not properly classified


From: anonymous
Subject: [bug #60956] stylesheet and icon <link> elements not properly classified as page requisites
Date: Wed, 21 Jul 2021 16:25:51 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0

URL:
  <https://savannah.gnu.org/bugs/?60956>

                 Summary: stylesheet and icon <link> elements not properly classified as page requisites
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Wed 21 Jul 2021 08:25:49 PM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
                 Release: None
         Discussion Lock: Any
        Operating System: None
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: No

    _______________________________________________________

Details:

*Summary:*
stylesheet and icon <link> elements not properly classified as page
requisites

*The version of GNU Wget I was using:*
1.20.3 from
https://eternallybored.org/misc/wget/releases/wget-1.20.3-win64.zip
(None of the changes listed in NEWS for newer versions seem relevant to this
issue.)

*How I invoked wget:*

wget --debug --page-requisites https://mdn.github.io/css-examples/alt-style-sheets/


*What I expected wget to do:*
fetch https://mdn.github.io/css-examples/alt-style-sheets/ and save as
index.html
fetch https://mdn.github.io/css-examples/alt-style-sheets/default.css and save
as default.css
fetch https://mdn.github.io/css-examples/alt-style-sheets/simple.css and save
as simple.css
fetch https://mdn.github.io/css-examples/alt-style-sheets/fancy.css and save
as fancy.css

*What wget did:*
fetch https://mdn.github.io/css-examples/alt-style-sheets/ and save as
index.html
fetch https://mdn.github.io/css-examples/alt-style-sheets/default.css and save
as default.css

*Output messages (Hopefully the relevant portion; the rest can be provided if
needed):*

[IRI Enqueuing
'https://mdn.github.io/css-examples/alt-style-sheets/default.css' with
'utf-8'
Not following due to 'link inline' flag:
https://mdn.github.io/css-examples/alt-style-sheets/simple.css
Not following due to 'link inline' flag:
https://mdn.github.io/css-examples/alt-style-sheets/fancy.css
Not following due to 'link inline' flag:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link


*Comments:*

The page used in the example above references three stylesheets:

<link href="default.css" rel="stylesheet" type="text/css" title="Default Style">
<link href="simple.css" rel="alternate stylesheet" type="text/css" title="Simple">
<link href="fancy.css" rel="alternate stylesheet" type="text/css" title="Fancy">

Wget only considers the first one (with rel="stylesheet") to be a page
requisite as far as --page-requisites is concerned.  This seems incorrect
because the value of the rel attribute is specified by the HTML Standard
section 4.2.4 (The link element)
<https://html.spec.whatwg.org/multipage/semantics.html#attr-link-rel> to be an
unordered set of unique space-separated tokens.  All three in this example
include "stylesheet", so they should all be considered page requisites.

A similar problem occurs with favicons.  Wget seems to consider those
referenced with rel="shortcut icon" to be page requisites, but not those
referenced with rel="icon".  For an example of the latter, try the following:

wget --page-requisites https://savannah.gnu.org/

The relevant portion of the HTML is:

<link rel="icon" type="image/png" href="/images/Savannah.theme/icon.png" />

It seems that rel="shortcut icon" is actually deprecated but still allowed for
historical reasons.  See HTML Standard section 4.6.6.8 (Link type "icon")
<https://html.spec.whatwg.org/multipage/links.html#rel-icon>, especially the
note at the end of the section.

*Possible cause and solution:*

I haven't investigated thoroughly and I'm not very familiar with C, but I'm
guessing that the function tag_handle_link in the file src/html-url.c is
responsible for this issue and is just checking the rel attribute of the
<link> element for case-insensitive equality to "stylesheet" or "shortcut
icon".

A better approach might be to split the rel attribute on ASCII whitespace as
described by the HTML Standard section 4.6.6 (Link types)
<https://html.spec.whatwg.org/multipage/links.html#linkTypes> and then check
each of the resulting tokens for case-insensitive equality to "stylesheet" and
"icon".
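
The token-based check described above could be sketched roughly as follows.
This is only an illustration of the idea, not Wget's actual code: the helper
names (is_ascii_ws, ascii_lower, rel_has_token, link_is_page_requisite) are
hypothetical and nothing here is taken from src/html-url.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII whitespace per the HTML Standard: TAB, LF, FF, CR and SPACE. */
static bool is_ascii_ws(char c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Lowercase ASCII letters only; everything else is returned unchanged. */
static char ascii_lower(char c)
{
  return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

/* Hypothetical helper: true if REL, split on ASCII whitespace, contains
   TOKEN as a whole token (compared ASCII case-insensitively). */
static bool rel_has_token(const char *rel, const char *token)
{
  size_t tlen = strlen(token);
  const char *p = rel;

  if (tlen == 0)
    return false;

  while (*p)
    {
      const char *start;
      size_t len;

      /* Skip the whitespace that separates tokens. */
      while (*p && is_ascii_ws(*p))
        p++;

      /* Find the end of the current token. */
      start = p;
      while (*p && !is_ascii_ws(*p))
        p++;
      len = (size_t) (p - start);

      if (len == tlen)
        {
          size_t i;
          for (i = 0; i < len; i++)
            if (ascii_lower(start[i]) != ascii_lower(token[i]))
              break;
          if (i == len)
            return true;
        }
    }
  return false;
}

/* Hypothetical predicate: does this rel value mark a page requisite? */
static bool link_is_page_requisite(const char *rel)
{
  return rel_has_token(rel, "stylesheet") || rel_has_token(rel, "icon");
}
```

With this approach rel="alternate stylesheet", rel="stylesheet alternate",
rel="shortcut icon" and rel="icon" would all match, while a value such as
rel="apple-touch-icon" would not, because only whole tokens are compared.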

An alternative, probably less correct and less reliable but perhaps easier to
implement and still useful, would be to check the whole rel attribute for
case-insensitive equality to each of "stylesheet", "alternate stylesheet",
"stylesheet alternate", "shortcut icon", and "icon".
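
For comparison, the simpler whole-value check might look something like the
sketch below. Again, the names (ascii_ieq, rel_matches_known_value) are
hypothetical and this is not Wget's existing code.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII case-insensitive equality of two NUL-terminated strings. */
static bool ascii_ieq(const char *a, const char *b)
{
  for (; *a && *b; a++, b++)
    {
      char ca = (*a >= 'A' && *a <= 'Z') ? (char) (*a - 'A' + 'a') : *a;
      char cb = (*b >= 'A' && *b <= 'Z') ? (char) (*b - 'A' + 'a') : *b;
      if (ca != cb)
        return false;
    }
  /* Equal only if both strings ended at the same position. */
  return *a == *b;
}

/* Hypothetical predicate: compare the whole rel value against a fixed
   list of accepted spellings. */
static bool rel_matches_known_value(const char *rel)
{
  static const char *const known[] = {
    "stylesheet", "alternate stylesheet", "stylesheet alternate",
    "shortcut icon", "icon"
  };
  size_t i;

  for (i = 0; i < sizeof known / sizeof known[0]; i++)
    if (ascii_ieq(rel, known[i]))
      return true;
  return false;
}
```

This version would miss values that contain extra whitespace or additional
tokens (for example two spaces between "alternate" and "stylesheet"), which
is why it is likely to be less reliable than splitting into tokens.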


*See also:*
HTML Standard section 4.6.6.1 (Link type "alternate")
<https://html.spec.whatwg.org/multipage/links.html#rel-alternate>
HTML Standard section 4.6.6.22 (Link type "stylesheet")
<https://html.spec.whatwg.org/multipage/links.html#link-type-stylesheet>
<link>: The External Resource Link element (MDN)
<https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link>
Link types (MDN)
<https://developer.mozilla.org/en-US/docs/Web/HTML/Link_types>





    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60956>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



