Re: [Koha-zebra] Koha Zebra Searching Report (from NPL)

From:	Sebastian Hammer
Subject:	Re: [Koha-zebra] Koha Zebra Searching Report (from NPL)
Date:	Wed, 22 Mar 2006 22:43:40 -0500
User-agent:	Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Joshua Ferraro wrote:

On Wed, Mar 22, 2006 at 08:28:26PM -0500, Sebastian Hammer wrote:
Can't do XOR today. I suppose it would be a possible new feature, but I've frankly never heard of it in an ILS.. can a XOR b be mapped to
(a OR b) NOT (a AND b)? Or am I just showing my fading math skills to ill effect, here?
Yep, that's the correct mapping. Voyager's where NPL originally
saw the XOR function.

Ok. It can be faked in the front-end then, or implemented deeper in the guts of Zebra.
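A minimal sketch of faking XOR in the front-end, using the mapping agreed on above. It assumes the front-end builds PQF (YAZ prefix query) strings, where `@not` is the binary and-not operator, and that `a` and `b` are single-word terms or already well-formed PQF subqueries. The function names are hypothetical and not part of Koha or Zebra.

```python
def pqf_xor(a: str, b: str) -> str:
    """Rewrite 'a XOR b' as (a OR b) AND-NOT (a AND b) in PQF prefix notation.

    Assumes a and b are single-word terms or complete PQF subqueries.
    """
    return f"@not @or {a} {b} @and {a} {b}"


def xor_hits(hits_a: set, hits_b: set) -> set:
    """Apply the same identity to two sets of record ids."""
    return (hits_a | hits_b) - (hits_a & hits_b)


if __name__ == "__main__":
    print(pqf_xor("cat", "dog"))
    print(sorted(xor_hits({1, 2, 3}, {3, 4})))
```

The set version is only there to check the identity: its result is the symmetric difference of the two hit lists.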

Why do you see yourself limited to Bib-1? Within Koha, you can do whatever you want -- specifically extend Bib-1 into the 8000-range (IIRC) for local USE attributes or define a private set.
Right, I was just hoping there was some way to map it to bib-1 as
I assume that would be useful in cross-domain searching. If not we
can certainly do a locally defined attribute or set.

I think beyond what's in the Bath profile or the US national profile, you have little hope of interoperable search.. in my experience, cross-domain searching still entails the need to do query-mapping independently per target or for groups of targets with similar characteristics. I use the CCL parser that's available through the YAZ ZOOM API, and include a reference to a set of mapping directives as part of the configuration for each target.. that allows you to get pretty far towards an interoperable-feeling search with a minimum of code.
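The per-target mapping idea can be sketched roughly as follows. This is not the YAZ CCL parser or its configuration format, just a plain-Python illustration of keeping one set of mapping directives per target. The target names, the local attribute 8001, and the helper `to_pqf` are all made up for the example; the Bib-1 USE numbers (4 = title, 1003 = author, 21 = subject heading, 1016 = any) are the standard ones.

```python
# Default field-to-USE-attribute mapping (standard Bib-1 numbers).
DEFAULT_MAP = {"ti": 4, "au": 1003, "su": 21, "any": 1016}

# Hypothetical per-target overrides; names and the 8001 attribute are invented.
TARGET_MAPS = {
    # A local server that indexes subjects under a locally defined USE attribute.
    "local-koha": {**DEFAULT_MAP, "su": 8001},
    # A remote target that offers no subject index at all.
    "remote-basic": {"ti": 4, "au": 1003, "any": 1016},
}


def to_pqf(target: str, field: str, term: str) -> str:
    """Build a one-term PQF query using the mapping configured for target.

    Fields the target does not know fall back to its 'any' index.
    """
    if '"' in term:
        raise ValueError("this sketch does not handle quotes inside terms")
    mapping = TARGET_MAPS.get(target, DEFAULT_MAP)
    use = mapping.get(field, mapping["any"])
    return f'@attr 1={use} "{term}"'


if __name__ == "__main__":
    for target in ("local-koha", "remote-basic", "some-other-target"):
        print(target, to_pqf(target, "su", "Dublin"))
```

The same user query then degrades gracefully: a target without a subject index gets a keyword search instead of an unsupported-attribute error.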

This would, I believe, require new development. It's possible that one of the experimental ranking algorithms that are included might provide better results for these people, but I *think* that boosting the score for one field in a ranked keyword search would require an extension to the index structure.
I've looked high and low for documentation on the ranking algorithms in
Zebra but haven't found much more than a few sentences in the official
docs and some list messages ...

It isn't documented beyond what's in the code, AFAIK.

AUTHOR SEARCHING

Again, the current relevance ranking doesn't quite cut it. A good
example is a relevance ranked author search on "James Joyce". Some
records sneak into high relevance because they have multiple authors
with names like "James Henry" and "Paul Joyce" (take "Bob the Builder"
in the NPL database as an example).
It might be worth checking whether one of the custom ranking algos did better on this.. you can look in the NEWS file for instructions on how to enable them.
Will do.
relevance ranking
should account for proximity and use that as the highest ranking
consideration to ensure that a search on "James Joyce" returns all the
books by "James Joyce" first. Also, they requested that the default
ranking secondarily sort the items by date as well because they often are asked to find the 'latest' book by so and so. We concluded that the copyright date stored in the 008 is probably the only date normalized enough to use for sorting though I'm not sure if zebra can use that for sorting.
It could with the XSLT index rules of Zebra 1.4.
Cool, and are there docs on that somewhere? :-)

There will be by the time Zebra 1.4 is released. For now, it's pre-release stuff. However, the CVS version of Zebra contains an example setup under examples/alvis-oai/conf. I think for really gnarly indexing schemes, this is probably the wave of the future, since it's pretty much infinitely flexible. It should also be pretty easy to perl-map one of the existing ABS files into this format.

Same thing. I don't know how hard it would be to add a score for proximity.. that data is at least in the index structure, but I've no idea how hard it would be to fit into the code. We can ask the Zebra wranglers what it would entail if you're interested.
Yes, please do, we're very interested in that particular one.

Ok.
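To make the requested behaviour concrete, here is a toy sketch of the ordering NPL describes: adjacency of the query words within a single author name ranks highest, and date breaks ties, newest first. It is not how Zebra ranks and says nothing about how proximity scoring would fit into Zebra's code. The record layout, the years, and the function names are assumptions made for the example, and author names are assumed to be stored in direct order ("James Joyce").

```python
def proximity_tier(query_words, authors):
    """Return 2 if the query words are adjacent within one author name,
    1 if they all occur somewhere among the authors, 0 otherwise."""
    author_words = [name.lower().split() for name in authors]
    n = len(query_words)
    for words in author_words:
        for i in range(len(words) - n + 1):
            if words[i:i + n] == query_words:
                return 2
    seen = {w for words in author_words for w in words}
    return 1 if all(w in seen for w in query_words) else 0


def rank(records, query):
    """Order hits by proximity tier, then by year (newest first)."""
    q = query.lower().split()
    scored = [(proximity_tier(q, rec["authors"]), rec) for rec in records]
    hits = [(tier, rec) for tier, rec in scored if tier > 0]
    hits.sort(key=lambda pair: (-pair[0], -pair[1]["year"]))
    return [rec for _, rec in hits]


# Illustrative records only; the years are placeholders.
RECORDS = [
    {"title": "Bob the Builder", "authors": ["James Henry", "Paul Joyce"], "year": 2001},
    {"title": "Dubliners", "authors": ["James Joyce"], "year": 1914},
    {"title": "Ulysses", "authors": ["James Joyce"], "year": 1922},
    {"title": "Emma", "authors": ["Jane Austen"], "year": 1815},
]

if __name__ == "__main__":
    for rec in rank(RECORDS, "James Joyce"):
        print(rec["title"])
```

With this ordering the record whose authors merely share the two words between them still matches, but it can no longer outrank a true "James Joyce" record.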

SUBJECT HEADING SEARCH

NPL would like to see a demonstration of a 'Subject Heading' search
using authorities generated from the data to compile a list of
authoritative headings (which would be compiled from multiple fields
within a given subject tag such as $650$a$v$x, etc.). So I think to do this right we'd need to look at putting our authority records
in Zebra as well.
Hmm. Not sure I fully grok the requirement here.. you seem to suggest both constructing a specific index key based on a concatenation of multiple fields (easy in the XSLT indexing rules of 1.4, not compatible with the 'melm' directive.
I'm unclear about the differences between 'elm' and 'melm'. The docs
seem to indicate that they are the same...

They are actually described as being quite different, but I can see how the nature of the difference could be made clearer.

The 'elm' directive is the original.. its parameter structure is based on the way that Z39.50 abstract record models were typically represented in the old days.. hence the weird ordering of elements, etc. It also has the limitation that you can't address attributes, because the old Z39.50 record model didn't have attributes. The xelm directive was introduced to fix that.. it allows you to express tag paths in the XPATH style, and to address attributes, either in [predicates] or directly, for indexing.

The usmarc.abs file that comes with Zebra assumes that records were ingested in ISO2709 using the record type grs.marc.<absfilename>. The grs.marc input filter actually generates an internal abstract structure which is incompatible with MARCXML.. it looks more like <245><11><a>content</a></11></245>. When MARCXML came along it became clear that it'd be nicer to work with that.. so the grs.marcxml input filter was introduced to parse ISO2709 records and map them internally to MARCXML. Of course, if you're starting with MARCXML, you can just use grs.xml with the same effect.

But now the old usmarc.abs file won't work anymore, because MARCXML is all about attributes for field names and subfield codes, and the 'elm' directive can't handle that... in fact, to index 245$a, you'd have to write something like


xelm /*/address@hidden/address@hidden     title

At some point, we got a bit of money from the LoC to develop a simple set of Bath level 0 indexing rules for Zebra.. I started working on that, but got so fed up with the syntax above that I rebelled and implemented the 'melm' directive (and it takes a lot for me to touch the innards of Zebra, in my old days), so instead of the above, I could write


melm 245$a  title

Which is totally equivalent to the above, but nice and to the point.. however, none of these mechanisms allows you to construct phrase indexes that span multiple subfields.. and they don't allow you to do cool stuff like extract a date from the guts of 008... in fact, there are lots of situations where you'd like to do some form of massaging on the input before processing. In the past, I would sometimes translate MARC records to an ASCII-line based format, and use the magic of the regexp input filters (http://www.indexdata.com/zebra/doc/record-model.tkl#id2530050) to massage the data at index/retrieval time... because I can write Tcl code in the input filters to do stuff to the data, the sky is the limit.. but, because I have to write Tcl code to accomplish anything, I become sad and gray-haired. So when I build applications on Zebra these days, I am more likely to do some form of preprocessing of the records in Perl or similar BEFORE feeding them to Zebra.. not very satisfying, but it brings home the bacon.
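A rough sketch of that kind of preprocessing step, in Python rather than Perl, covering the two cases mentioned: pulling a date out of 008 and building a phrase key that spans several 650 subfields. It assumes the input is MARCXML in the standard MARC21 slim namespace and uses 008 positions 07-10 (Date 1). The function name, the " -- " separator, the output shape, and the sample record are choices made for this example; how the result would be handed to Zebra is left open.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record for the demonstration below.
SAMPLE = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="008">060322s1922    fr            000 1 eng d</controlfield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">City and town life</subfield>
    <subfield code="v">Fiction.</subfield>
  </datafield>
</record>
"""


def extract_keys(marcxml: str) -> dict:
    """Extract a sortable date and multi-subfield subject phrases."""
    record = ET.fromstring(marcxml)

    # 008 positions 07-10 hold Date 1; keep it only if it is all digits.
    field_008 = record.findtext(
        "marc:controlfield[@tag='008']", default="", namespaces=NS
    )
    date1 = field_008[7:11]

    # Join $a, $v and $x of each 650 into one phrase, in record order.
    headings = []
    for field in record.findall("marc:datafield[@tag='650']", NS):
        parts = [
            sf.text.strip()
            for sf in field.findall("marc:subfield", NS)
            if sf.get("code") in ("a", "v", "x") and sf.text
        ]
        if parts:
            headings.append(" -- ".join(parts))

    return {
        "date": date1 if date1.isdigit() else None,
        "subject_headings": headings,
    }


if __name__ == "__main__":
    print(extract_keys(SAMPLE))
```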

Well, in Zebra 1.4, XSLT comes to the rescue, in a way that only XSLT can do it, with lots of angular brackets and much verbosity.... for instance, in an XSLT index filter,


melm 245$a title:w

becomes

<xsl:template match="marc:record/marc:address@hidden'245']/marc:address@hidden'a']">
 <z:index name="title" type="w">
   <xsl:value-of select="."/>
 </z:index>
</xsl:template>

Eek.

But of course the magic of that is that you could put just about anything you could possibly imagine instead of that simple <xsl:value-of> in the middle... using substr() to extract a date from 008, a code from the leader, combining subfields, doing math, looking stuff up in supporting tables, etc... the sky is the limit, and I'd prefer this to programming in Tcl anytime. And of course, if you want a more compact configuration file, you could write something like


<koha:melm field="245$a" index="title:w"/>

and use XSLT to map that into the diatribe above before sending it to Zebra.. we might even offer some options like that as part of the software down the road. In addition to the stylesheet which maps records to 'index documents' like above, Zebra 1.4 can be configured to support multiple retrieval schemas (i.e. DC, MODS, MARCXML), simply by providing stylesheets for each desired schema -- the translation is done on the fly when records are retrieved.
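The expansion from a compact rule to the verbose template could be sketched like this. The message proposes doing the expansion with XSLT; this sketch does it in Python purely for illustration. It assumes the standard MARCXML element and attribute names (datafield/@tag, subfield/@code) for the match path and reuses the z:index element with name and type attributes as shown above. The function name is hypothetical, and both the subfield code and the index type are treated as required.

```python
def melm_to_xslt(field: str, index: str) -> str:
    """Expand a compact rule such as ('245$a', 'title:w') into an
    xsl:template that emits a z:index element."""
    tag, sep, code = field.partition("$")
    if not (tag and sep and code):
        raise ValueError("field must look like '245$a'")
    name, sep, index_type = index.partition(":")
    if not (name and sep and index_type):
        raise ValueError("index must look like 'title:w'")

    # Match path built from standard MARCXML names (an assumption here).
    match = (
        f"marc:record/marc:datafield[@tag='{tag}']"
        f"/marc:subfield[@code='{code}']"
    )
    return (
        f'<xsl:template match="{match}">\n'
        f'  <z:index name="{name}" type="{index_type}">\n'
        '    <xsl:value-of select="."/>\n'
        '  </z:index>\n'
        '</xsl:template>'
    )


if __name__ == "__main__":
    print(melm_to_xslt("245$a", "title:w"))
```

A list of such rules could be expanded in a loop and wrapped in a stylesheet element with the namespace declarations the index filter expects.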


--Sebastian

Thanks!


--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853
