Re: [Koha-devel] Searching and ILL (from: Searching Group Meeting Notes)


From:	Joshua Ferraro
Subject:	Re: [Koha-devel] Searching and ILL (from: Searching Group Meeting Notes)
Date:	Thu Jul 28 05:45:43 2005
User-agent:	Mutt/1.4.1i
Hi all,

Sorry to delay my response to MJ's original message. I hope what I have
to say is worth the wait ...

I've been thinking about the discussions we've had over the past couple
months about Zebra, CQL, SRW/SRU, Google Query Syntax, Opensearch, RSS,
RDF, Metasearching, OAI-PMH etc. and I've organized my thoughts a bit in
the hopes that we can begin planning implementation directions.

I think we all agree on a few things:

  1. Koha needs a query syntax

  2. Koha databases should be open to external sources

  3. Koha's OPAC should have union catalog / metasearching / federated
      searching capability and integrated ILL (inter-library loan)

  4. Koha should allow queries to be syndicated via some form of RSS

Where we get hung up is the actual implementation details: RDF vs. RSS 2.0
vs. Atom; OpenSearch vs. SRW/U vs. OAI-PMH; CQL vs. the still-unnamed ':'
syntax adopted by so many search engines and described by MJ in an earlier email.

So here's what I propose:

1. Koha needs a query syntax
A: CQL 'Common Query Language'

"Traditionally, query languages have fallen into two camps: Powerful and 
expressive languages which are neither easily readable nor writable by 
non-experts (e.g. SQL, PQF, and XQuery), on one hand; on the other hand, 
simple and intuitive languages not powerful enough to express complex 
concepts (e.g. CCL or google's query language). CQL's goal is to combine 
simplicity and intuitiveness of expression with the richness of Z39.50's 
type-1 query. Like any good text-based interface, CQL is intended to 'do 
what you mean' for simple, every day queries, while allowing means to 
express complex concepts when necessary."

Examples of CQL queries:

        cat
        title = fish
        title exact fish
        cat or dog
        cat not frog
        a or b and c not d
        "fish food" prox/unit=sentence

More examples at:

http://www.loc.gov/z3950/agency/zing/cql/sample-queries.html
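To make the grouping in an example like "a or b and c not d" concrete: as I
read the CQL spec, its boolean operators share a single precedence level and
are applied left to right, so that query means ((a or b) and c) not d. The
Python sketch below parses only a small subset (bare terms, double-quoted
phrases, "index relation term" clauses, and/or/not). Parentheses and
modifiers such as prox/unit=sentence are deliberately left out, and
parse_cql is a hypothetical helper for illustration, not Zebra's parser.

```python
import re

# Double-quoted phrases, or runs of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')
BOOLEANS = {"and", "or", "not"}
RELATIONS = {"=", "exact", "any", "all", "<", ">", "<=", ">=", "<>"}


def tokenize(query):
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in TOKEN.finditer(query)]


def parse_cql(query):
    """Parse a small CQL subset into nested tuples.

    A search clause becomes ("clause", index, relation, term); a boolean
    becomes (operator, left, right). Booleans share one precedence level
    and group left to right.
    """
    tokens = tokenize(query)
    pos = 0

    def clause():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ends where a search term was expected")
        # "index relation term", e.g. title = fish
        if pos + 2 < len(tokens) and tokens[pos + 1].lower() in RELATIONS:
            index, relation, term = tokens[pos:pos + 3]
            pos += 3
            return ("clause", index, relation.lower(), term)
        # Bare term: CQL's default index is cql.serverChoice.
        term = tokens[pos]
        pos += 1
        return ("clause", "cql.serverChoice", "=", term)

    tree = clause()
    while pos < len(tokens):
        op = tokens[pos].lower()
        if op not in BOOLEANS:
            raise ValueError(f"expected and/or/not, got {tokens[pos]!r}")
        pos += 1
        tree = (op, tree, clause())  # left-associative grouping
    return tree
```

With this sketch, parse_cql("a or b and c not d") returns a tree whose root
is "not", whose left child is the "and" node, and whose innermost node is
"or", which is exactly the left-to-right reading.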

CQL is a mature, well-defined, and easy-to-use syntax for searching
library catalogs and other sources, and support for it comes with
Zebra automatically. The downside is that it's not very widely
implemented (it's still new). I propose that Koha formally adopt CQL as
the default syntax for searching: looking forward, it's going to be the
next Z39.50 standard for library catalogs (and hopefully for other search
engines as well).

The main problem I have with MJ's suggestion is that we haven't found
a well-defined specification for that syntax, so we're not really sure
how to do things like proximity searching or other more complex search
types.

If we can find a well-defined document describing this "googlish"
syntax, it would be trivial to translate that syntax into CQL
so that Koha can support both syntaxes within the main input box.
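As a rough idea of what such a translation might look like: since we have
no formal definition, every rule here is an assumption. Whitespace means an
implicit AND; an uppercase OR between two terms means "or" and is treated as
binding more tightly than the implicit AND; a leading "-" means NOT;
"field:term" becomes "index = term"; double-quoted phrases pass through
unchanged. FIELD_MAP and its dc.* index names are hypothetical. Because CQL
groups booleans strictly left to right, OR runs are emitted in parentheses.

```python
import re

# Hypothetical mapping from "googlish" field prefixes to CQL index names.
FIELD_MAP = {"title": "dc.title", "author": "dc.creator", "subject": "dc.subject"}

# Optional leading "-", optional "field:" prefix, then a quoted phrase or a word.
TOKEN = re.compile(r'(-?)(?:(\w+):)?("[^"]*"|\S+)')


def google_to_cql(query):
    groups = []            # each entry: (negated, [clauses joined by "or"])
    join_with_or = False
    for m in TOKEN.finditer(query):
        negate, field, term = m.groups()
        if term == "OR" and not negate and not field:
            join_with_or = True
            continue
        clause = f"{FIELD_MAP.get(field, field)} = {term}" if field else term
        if join_with_or and groups:
            if negate:
                raise ValueError("negated terms inside an OR group are not supported")
            groups[-1][1].append(clause)
        else:
            groups.append((bool(negate), [clause]))
        join_with_or = False

    pieces = []
    for i, (negated, clauses) in enumerate(groups):
        text = clauses[0] if len(clauses) == 1 else "(" + " or ".join(clauses) + ")"
        if i == 0:
            if negated:
                raise ValueError("CQL 'not' is binary, so the first term cannot be negated")
            pieces.append(text)
        else:
            pieces.append(("not " if negated else "and ") + text)
    return " ".join(pieces)
```

For example, "a b OR c" becomes "a and (b or c)", and
"author:tolkien -title:hobbit" becomes
"dc.creator = tolkien not dc.title = hobbit".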

We should continue to support the 'advanced' search page for allowing
patrons to perform complex queries without having to learn the syntax.

Finally, it's important to remember that although some users will use
the syntax and advanced search, 99% probably won't. But that doesn't
mean that it's not important to have the syntax. There's a (mostly) 
nicely written article in Library Journal that brings up some good
points regarding research and the weaknesses of the keyword method:
http://www.libraryjournal.com/article/CA623006.html

I don't agree with everything there, but it's certainly worth some
consideration.

2. Koha databases should be open to external sources

With Zebra, Koha will automatically be open to SRW/U and Z39.50. I
see no harm in also including an OpenSearch gateway (the one I
wrote is basically an OpenSearch-to-Z39.50 proxy). OpenSearch enables
Koha's catalog to be searchable by A9's OpenSearch portal as well
as other OpenSearch portals out there. So I propose that Koha
support all three of the major standards for record sharing to
maximize the number of clients that can access the database.
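Part of why SRU (the URL flavor of SRW) and OpenSearch are easy to proxy is
that both are plain HTTP GET requests. The sketch below builds a request URL
for each. The endpoint URLs are hypothetical; the SRU parameter names are
those of SRU 1.1 searchRetrieve; the OpenSearch side fills an OpenSearch
1.1-style URL template ({searchTerms}, {count}, {startIndex}, {startPage},
with a trailing "?" marking optional parameters). Z39.50 is a binary
protocol that needs a client toolkit such as YAZ, so it is not shown.

```python
import re
from urllib.parse import quote, urlencode


def sru_url(base, cql, start=1, maximum=10, schema="dc"):
    """Build an SRU 1.1 searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": schema,
    }
    return f"{base}?{urlencode(params)}"


def opensearch_url(template, terms, count=10, start_index=1, start_page=1):
    """Fill an OpenSearch URL template such as http://host/os?q={searchTerms}."""
    values = {
        "searchTerms": quote(terms, safe=""),
        "count": str(count),
        "startIndex": str(start_index),
        "startPage": str(start_page),
    }

    def fill(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if optional:
            return ""  # unknown optional parameters are left empty
        raise ValueError(f"no value for required template parameter {name!r}")

    return re.sub(r"\{([\w:]+)(\??)\}", fill, template)
```

Note that urlencode encodes spaces as "+" (form encoding) while the
OpenSearch template is filled with percent-encoded terms.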

3. Koha's OPAC should have union catalog / metasearching / federated
        searching capability and integrated ILL (inter-library loan)

The goal here is to give Koha maximum flexibility when selecting
sources for the metasearch interface, so we don't want to limit
ourselves to the library world. While OpenSearch may be ideologically
flawed (RSS vs. RDF), it is so easy to implement (compared to SRW/U,
for instance) that lots of sources have appeared almost overnight. On
the other hand, Z39.50 and SRW/U do allow more targeted searching, but
SRW/U is not widely implemented and Z39.50 is limited to library
sources. So I propose a three-layer OPAC (at least conceptually): a
front end for syntax processing and interface design; a proxy that
picks the correct protocol to use for searching; and a series of
back-end search services that conform to the three major query
resolvers (SRW/U, Z39.50, OpenSearch).
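The middle layer is the interesting one. Here is a minimal sketch, assuming
each configured source lists the protocols it supports and that the proxy
prefers SRU, then Z39.50, then OpenSearch. I picked that order only because
the first two allow more targeted searching. Back ends are plain functions
here, and every name (Source, SearchProxy, PREFERENCE) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Back ends take (endpoint, cql_query) and return a list of result records.
Backend = Callable[[str, str], List[str]]

# Assumed preference order: targeted protocols first, OpenSearch as fallback.
PREFERENCE = ("sru", "z3950", "opensearch")


@dataclass
class Source:
    name: str
    endpoints: Dict[str, str] = field(default_factory=dict)  # protocol -> address


class SearchProxy:
    """Middle layer: route a CQL query to each source over the best protocol."""

    def __init__(self, backends: Dict[str, Backend]):
        self.backends = backends

    def choose(self, source: Source) -> Optional[str]:
        for protocol in PREFERENCE:
            if protocol in source.endpoints and protocol in self.backends:
                return protocol
        return None

    def search(self, sources: List[Source], cql: str) -> Dict[str, List[str]]:
        results: Dict[str, List[str]] = {}
        for source in sources:
            protocol = self.choose(source)
            if protocol is None:
                results[source.name] = []  # no usable back end for this source
                continue
            results[source.name] = self.backends[protocol](source.endpoints[protocol], cql)
        return results


# Usage with stub back ends standing in for real SRU/OpenSearch clients.
def stub(label: str) -> Backend:
    return lambda endpoint, cql: [f"{label}:{endpoint}:{cql}"]


proxy = SearchProxy({"sru": stub("sru"), "opensearch": stub("opensearch")})
sources = [
    Source("local", {"sru": "http://koha.example.org/sru",
                     "opensearch": "http://koha.example.org/os"}),
    Source("web", {"opensearch": "http://search.example.com/os"}),
    Source("legacy", {"z3950": "z3950.example.net:210/db"}),
]
results = proxy.search(sources, "title = fish")
```

Keeping the front end, proxy and back ends separate means a new protocol is
just another entry in the back-end table; the front end only ever hands the
proxy a CQL string.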

I also propose that we work together with the PINES project and
possibly Amazon.com to extend the OpenSearch standard to include
ranking, support for ILL, CQL in the query term, etc. Of course,
PINES is open to this (I've been working with Mike Rylander on
OpenSearch for a couple of months now) ... and it seems Amazon.com
may be as well. Here's an excerpt from an email I received from 
Amazon.com regarding OpenSearch and SRW/U:

        Thanks for your comments!  We've been speaking with the people over at
        NISO that are responsible for the SRW/SRU specifications.  There is a
        lot of value in there -- and we're definitely interested in making
        OpenSearch a useful tool for as many people as possible.

        In fact, the effort to define OpenSearch 2.0 is already under way.  We
        recently launched a blog over at blog.a9.com, and over the next several
        weeks I will be posting about our plans for future versions of
        OpenSearch and soliciting community involvement.  It would be great if
        you could add your thoughts on the blog when I post about where we'd
        like to see version 2.0 go.

        In fact, I'll be posting later today about OpenSearch 1.1.  This point
        release won't break back-compatibility, so it won't have most of the new
        features that you are referring to, but it is a good starting point for
        discussion.

        I really appreciate your work with OpenSearch, and hope that you don't
        hesitate to contact me directly with any ideas that you have about the
        project.


4. Koha should allow queries to be syndicated via some form of RSS

The best way I can address the RSS 0.9/1.0 (RDF) vs. RSS 2.0 question is
in the context of MJ's comments ... so here goes:

On Sun, Jun 26, 2005 at 06:46:11PM +0100, MJ Ray wrote:
> = Summary =
> 
> Resource Description Framework is popular with librarians and
> RDF Site Summary is RSS 1, which is not the same as Really
> Simple Syndication (RSS 2).  RDF Site Summary versions are 1.x
> and Really Simple Syndication are 2.x, so many developers go
> for the higher number and never mind the different words. I'm
> surprised it's happened in koha-devel, as RDF is popular with
> librarians and information scientists, who are using it to help
> build the Semantic Web, which is where this talk of distributed
> searching seems to be heading.
> >-> http://www.w3.org/RDF/
> >-> http://purl.org/rss/1.0/
> 
> I think RSS 1 already has solved some of the problems facing us
> if we use opensearch, I think more RDF use could open interesting
> applications for koha and I think RSS 2's namespace problem is
> a pain.
Good summary ... thanks for that.

> = The Namespace Problem =
> 
> The problem is that the spec RSS 2 says "the elements defined
> in this document are not themselves members of a namespace"
> and while that looks like a really smart idea to simplify
> parsing, it makes a few processes and applications difficult.
> There are these elements, floating around without a namespace,
> disconnected and trying to claim to be the root in any file
> containing RSS 2 elements.
> 
> Basically, imagine writing a large perl system without using
> modules at all, putting it all in the global namespace.  Yes,
> it used to be done and can still be done, but most people don't
> do it any more. Why don't we do it? Isolation. It helps to
> keep things in neat little units, making it easier to test and
> easier to change one with less risk of messing up the others.
> I know we're still not very good at unit testing koha modules,
> but can everyone agree with the general idea it's better we
> use modules than have it all in one big flat namespace?

I grok the Perl analogy and I agree that RSS 2 namespaces aren't
ideal. The problem is that OpenSearch is widely adopted and if
we want to tap into those sources we'll need an OpenSearch 
search and retrieval engine.
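MJ's namespace point shows up as soon as you parse the feeds. In this sketch
(Python's ElementTree, with invented sample feeds), RSS 1.0 items are found
by namespaced names under rdf:RDF, RSS 2.0 items by bare names under
rss/channel, and the OpenSearch extension elements are namespaced even
though they sit inside an un-namespaced RSS 2.0 channel. The OpenSearch
namespace URI is the one I believe OpenSearch 1.0 RSS responses use; treat
it as an assumption to check against the spec.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"
# Assumed OpenSearch 1.0 RSS response namespace.
OPENSEARCH = "http://a9.com/-/spec/opensearchrss/1.0/"


def item_titles(xml_text):
    """Return item titles from either an RSS 1.0 (RDF) or an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{RDF}}}RDF":
        # RSS 1.0: every element has a namespace; items are children of rdf:RDF.
        return [t.text for t in root.findall(f"{{{RSS1}}}item/{{{RSS1}}}title")]
    if root.tag == "rss":
        # RSS 2.0: core elements belong to no namespace at all.
        return [t.text for t in root.findall("channel/item/title")]
    raise ValueError(f"unrecognised feed root {root.tag!r}")


def total_results(xml_text):
    """Read the OpenSearch result count from an RSS 2.0 response, if present."""
    element = ET.fromstring(xml_text).find(f"channel/{{{OPENSEARCH}}}totalResults")
    return int(element.text) if element is not None else None
```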

> = Problems Already Solved =
> 
> Also interesting for libraries is the availability of the
> Dublin Core metadata elements in an RSS 1.0 main module.
> A lot of the things opensearch is trying to do have already
> been in RDF Site Summary for years, such as returning metadata
> appropriate to search results. Look at the mod_search module -
> what do we need to do that isn't already developed by the
> XML-DEV hackers?
> >-> http://purl.org/DC
> >-> http://purl.org/rss/1.0/modules/search/

We need to tap into the available sources using other standards
instead of just focusing on library-specific search applications.

> = Interesting Applications =
> 
> Almost certainly, ILL is one thing I've not seen yet. I think the
> OpenIll namespace is interesting and should be used, especially
> if we can build bridges to other system developers. I'm not sure
> what should be in there, as I've not done much with ILL. I hope
> that it can be used alongside RDF and maybe be more general for
> it, linking with Dublin Core and other useful namespaces. Is
> that possible?

Absolutely. I think that Koha's metasearching should definitely
support searching of DC and related namespaces.
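As a rough illustration of what searching DC inside RSS 1.0 involves (the
sample feed is invented; the "dc" prefix maps to the Dublin Core element set
1.1 namespace), this sketch collects the DC elements from each item and
filters on one of them. The function names are hypothetical, and a real
metasearch back end would push the query to the remote source rather than
filter locally.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def dc_records(xml_text, elements=("creator", "subject", "date", "identifier")):
    """Turn each RSS 1.0 item into a dict of its Dublin Core values."""
    records = []
    for item in ET.fromstring(xml_text).findall("rss:item", NS):
        record = {
            "about": item.get(f"{{{NS['rdf']}}}about"),
            "title": item.findtext("rss:title", default="", namespaces=NS),
        }
        for name in elements:
            values = [e.text for e in item.findall(f"dc:{name}", NS) if e.text]
            if values:
                record[name] = values
        records.append(record)
    return records


def filter_dc(records, element, needle):
    """Case-insensitive substring match on one DC element, e.g. 'creator'."""
    needle = needle.lower()
    return [r for r in records
            if any(needle in value.lower() for value in r.get(element, []))]
```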

> = Other Parts of OpenSearch =
> 
> So, if we avoid having OpenSearch Really Simple Syndication in
> the koha's core (use a translator or something loosely coupled),
> that leaves the query and description parts of OpenSearch. I
> wondered whether we can convince other library systems to put a
> <link rel="index" type="application/xml+rdf" href="..." /> tag
> or similar in their page's head. Then configuring an "external
> searches" setting in koha's parameters could be as simple as
> cut-and-pasting or drag-and-dropping URLs, with koha figuring
> out the details from that (actually, we could probably do some
> from a search form... but that's getting far too clever for now).
I think the COinS project may do something like this. Here's
a recent email on the web4lib list:

        A group of us have been working to crystallize a spec for putting
        OpenURL metadata into HTML (following on a paper by Dan Chudnov and
        friends http://www.ariadne.ac.uk/issue43/chudnov/. )

        Ross Singer came up with a catchy name for this: "COinS", short for
        ContextObject in Span. After a bunch of trials, we've declared it
        "stable enough for implementation", and put the spec at
        http://ocoins.info/

        Version 2.0 of our OpenURL Referrer Firefox plugin adds support for
        OpenURL COinS; we hope that soon there will be many other ways that
        COinS can be put to use, as well as many sites that support COinS.
        So far there is an open-access journal, the Wikipedia Book sources
        page, Peter Binkley's Blog and a few static web pages demonstrating
        how it may be used. 

        Eric Hellman
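Both MJ's idea and COinS come down to reading markup out of someone else's
page. Here is a sketch using Python's html.parser, assuming MJ's proposed
<link rel="index"> convention (a proposal, not an established standard) and
the COinS convention of a span with class "Z3988" whose title attribute
holds a URL-encoded OpenURL ContextObject. The registered media type for
RDF/XML is application/rdf+xml; the parser records whatever type attribute
it finds rather than matching one string. DiscoveryParser is a hypothetical
name.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class DiscoveryParser(HTMLParser):
    """Collect search-discovery links and COinS ContextObjects from a page."""

    def __init__(self):
        super().__init__()
        self.index_links = []  # (type, href) from <link rel="index" ...>
        self.coins = []        # decoded key/value ContextObjects

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "index" in (a.get("rel") or "").split() and a.get("href"):
            self.index_links.append((a.get("type"), a["href"]))
        elif tag == "span" and "Z3988" in (a.get("class") or "").split() and a.get("title"):
            # html.parser has already decoded &amp; in attribute values;
            # parse_qs then splits the key/value string into lists of values.
            self.coins.append(parse_qs(a["title"]))
```

Feeding a page to the parser leaves any advertised index URLs in index_links
and each embedded citation in coins, e.g. coins[0]["rft.btitle"].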

> The main attraction of opensearch seems to be that it lets
> your results appear on A9. I've yet to meet many A9 users:
> do any other search engines use opensearch yet? 
Yes ... lots. Browse the 'columns' section of the
opensearch.a9.com site and you'll see many, many search engines
that have adopted the standard. So in my view, the main attraction
of OpenSearch is that so many search engines have adopted (and will
adopt) it, because it's simple to implement and works 'well enough' to
get the job done 99% of the time. The fact that Koha catalogs can show
up in an A9 search is secondary to me.

>Also, given
> their past, have Amazon said that it is patentless? If it
> was through some loose part, it wouldn't be too painful if
> it's unusable by some later.
I'm curious about the patent issues ... I'll contact Amazon and
find out.


So ... I know that was long. I hope you made it this far. Please
give me some feedback. I'm not trying to polarize the discussion
so if you've got points to make please say them and I'll do my
best to understand and then respond ...

Cheers,
-- 
Joshua Ferraro               VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology       migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
address@hidden |Full Demos at http://liblime.com/koha |1(888)KohaILS