bug#33844: Rename ghc-pandoc to pandoc


From: zimoun
Subject: bug#33844: Rename ghc-pandoc to pandoc
Date: Thu, 27 Feb 2020 14:10:15 +0100

Hi Mike,

On Thu, 27 Feb 2020 at 02:23, Mike Gerwitz <address@hidden> wrote:

> Ah, for the record, I had searched for pandoc using `guix package -s
> pandoc` in the past and didn't find what I was looking for, and so fell
> back to a Debian system.  It turns out what I wanted was ghc-pandoc
> after all.

Thank you for pointing out the issue.

My remark is *not* about the rename, which seems fine, for the very
same reason that the "git-annex" software is named 'git-annex' and not
'ghc-git-annex'.


Well, your comment points out that: a) the description is badly
written and b) the 'relevance' score is too rough.

The command "guix search pandoc" returns as the highest ranked
package: ghc-pandoc-citeproc with the relevance score of 17. The
package of interest 'ghc-pandoc' appears at the 6th position with a
relevance score of 8. (And after emacs-pandoc-mode, ghc-pandoc-types,
emacs-ox-pandoc and python-pandocfilters; all less relevant packages,
IMO.)
Why? Because of the number of occurrences of the term 'pandoc' in
synopsis+description+name:
ghc-pandoc-citeproc: 1+5+1
ghc-pandoc: 0+2+1

To be precise, the score uses weights and so it reads:

ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
ghc-pandoc: 3*0 + 2*2 + 4*1 = 8

And the rename bumps the score because there is an additional weight
(5) for an exact match (which normally happens only for the 'name'
field).

ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
pandoc: 3*0 + 2*2 + 4*1*5 = 24
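
To make the arithmetic above easy to replay, here is a minimal sketch in
Python. It is not the Guix implementation; it only assumes the weights
quoted in this message (3 for synopsis, 2 for description, 4 for name,
and a multiplier of 5 on the name term for an exact match). The names
`relevance`, `WEIGHTS` and `exact_weight` are hypothetical.

```python
# Field weights as quoted in this message (an assumption, not read from
# the Guix source code).
WEIGHTS = {"synopsis": 3, "description": 2, "name": 4}


def relevance(counts, exact_name=False, exact_weight=5):
    """Return the weighted score for one package.

    counts maps a field name to the number of occurrences of the
    search term in that field.
    """
    total = 0
    for field, weight in WEIGHTS.items():
        term = weight * counts.get(field, 0)
        # The exact-match bonus multiplies the 'name' term only.
        if field == "name" and exact_name:
            term *= exact_weight
        total += term
    return total


# Occurrences of 'pandoc' in synopsis, description and name.
citeproc = {"synopsis": 1, "description": 5, "name": 1}
pandoc = {"synopsis": 0, "description": 2, "name": 1}

print(relevance(citeproc))                 # ghc-pandoc-citeproc
print(relevance(pandoc))                   # ghc-pandoc
print(relevance(pandoc, exact_name=True))  # renamed to 'pandoc'
```

The three printed values correspond to the 17, 8 and 24 given above.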

It apparently fixes the issue and now the package named 'pandoc' will
show up first. But it is an artefact because it is easy* to find other
weights that invalidate this expected ranking; and the current weights
are a working rule of thumb, not something deeply thought through, AFAIK.


*For example, instead of 5, let us choose 2; then the score becomes:
3*0+2*2+4*1*2=12, which is less than 17. Well, not so easy because 2 is
the same weight as 'description' and it seems less natural; i.e., it
appears more natural to have a high weight for an exact match. But the
point is: it is possible to find another working rule of thumb which
will not return the expected result for all the packages.
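
As a rough illustration of how sensitive the ranking is to that choice,
the sketch below (Python, using only the counts and weights quoted in
this message; the names `renamed_score` and `ranks_first` are
hypothetical) tries several exact-match weights and reports whether the
renamed package would outrank ghc-pandoc-citeproc.

```python
# Score of ghc-pandoc-citeproc with the weights quoted in this message.
CITEPROC = 3 * 1 + 2 * 5 + 4 * 1


def renamed_score(exact_weight):
    """Score of the package renamed 'pandoc' for a given exact-match weight."""
    # synopsis: 0 occurrences, description: 2, name: 1 (exact match).
    return 3 * 0 + 2 * 2 + 4 * 1 * exact_weight


def ranks_first(exact_weight):
    """True when the renamed package scores strictly above citeproc."""
    return renamed_score(exact_weight) > CITEPROC


for w in range(1, 7):
    print(w, renamed_score(w), ranks_first(w))
```

With these numbers the renamed package only comes out on top once the
exact-match weight reaches 4 (score 20 against 17); with 2 or 3 it stays
behind, which is the fragility described above.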


The real problem is not the non-obvious name (ghc-pandoc instead of
simply pandoc) but rather that: a) some descriptions are badly written
and b) the 'relevance' scoring function is not "smart" enough to detect
them.



All the best,
simon




