koha-zebra

From: Thomas Dukleth
Subject: Re: [Koha-zebra] Differently index same field numbers from different record types
Date: Fri, 4 Aug 2006 21:58:26 -0000 (GMT)
User-agent: SquirrelMail/1.4.7

In my hurry to obtain a quick answer, I assembled the original question
with excess haste.  The question was not well explained at the beginning
of the message, and I left out a possibility which might be an important
part of an efficient solution.  As there has been no answer yet, I am
replacing the original question entirely with this reorganised and
corrected message.  I also made a minor mistake in attributing a quote
from a previous thread on the Zebra list, which I correct here.

I am trying to obtain at least a quick partial answer with enough
information to know how, or whether, to start designing a schema for an
XML meta-record containing other related MARCXML records.  If there is an
easy enough answer to point the meta-record schema work in the right
direction, I would like to have that answer as soon as possible.  If an
answer which would help with other aspects of the question takes longer
to think about, please give an additional, more complete answer later.


1.  INDEXING PROBLEM.

We need to be able to index fields with the same field number differently
when they come from different record types.  How can this be accomplished
without storing the different record types in separate databases?

One example is the case of MARC 21 bibliographic records with 500, general
note, and also MARC 21 authority records with 500, see also from
tracing--personal name.

The records are liable to be in some XML form that may be easy to work
with and helpful for indexing.


1.1.  STANDARD MARCXML.

<record type="Bibliographic">
    <leader>content</leader>
    <controlfield tag="some field">content</controlfield>
    <datafield tag="some field" ind1=" " ind2=" ">
        <subfield code="some subfield">content</subfield>
    </datafield>
    <datafield tag="some other field" ind1=" " ind2=" ">
        <subfield code="some subfield">content</subfield>
    </datafield>
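    <datafield tag="500" ind1=" " ind2=" ">
        <subfield code="a">content</subfield>
    </datafield>
</record>

<record type="Authority">
    <leader>content</leader>
    <datafield tag="some field" ind1=" " ind2=" ">
        <subfield code="some subfield">content</subfield>
    </datafield>
    <datafield tag="500" ind1="1" ind2=" ">
        <subfield code="a">content</subfield>
    </datafield>
</record>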
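Outside Zebra, the underlying requirement can be sketched in a few lines of Python: carry the record type from the record element down to each datafield and build the index name from both.  This is only an illustration, not a Zebra configuration; the index naming scheme (for example `Bibliographic-500`) is hypothetical, and the sample omits the MARCXML namespace to keep it short.

```python
import xml.etree.ElementTree as ET

# Two simplified MARCXML records (MARCXML namespace omitted: an assumption).
SAMPLE = """
<collection>
  <record type="Bibliographic">
    <datafield tag="500" ind1=" " ind2=" ">
      <subfield code="a">general note</subfield>
    </datafield>
  </record>
  <record type="Authority">
    <datafield tag="500" ind1="1" ind2=" ">
      <subfield code="a">see also from personal name</subfield>
    </datafield>
  </record>
</collection>
"""

def index_entries(xml_text):
    """Yield (index_name, term) pairs; the hypothetical index name joins type and tag."""
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        # The type is on the record element, not on datafield,
        # so it must be carried down while walking the record.
        record_type = record.get("type", "Unknown")
        for field in record.iter("datafield"):
            for sub in field.iter("subfield"):
                yield f"{record_type}-{field.get('tag')}", sub.text or ""

for name, term in index_entries(SAMPLE):
    print(name, "->", term)
```

The two 500 fields end up under different index names even though they share a tag.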


1.2.  VARIATIONS ON STANDARD MARCXML.

1.2.1.  SUPPLEMENTARY ATTRIBUTES VARIATION FROM STANDARD MARCXML.

Adding attributes to standard elements provides a short, predictable path
between the field needing indexing and the point where record types are
distinguished.  In this variation, non-standard syntax and recordtype
attributes are added to each datafield element.

    <datafield tag="500" ind1=" " ind2=" " syntax="MARC21" recordtype="Bibliographic">

    <datafield tag="500" ind1="1" ind2=" " syntax="MARC21" recordtype="Authority">
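A minimal sketch of the transformation this variation assumes, done before indexing: copy the record-level type onto every datafield.  The function name is hypothetical, and the default `syntax="MARC21"` simply follows the example attributes above.

```python
import xml.etree.ElementTree as ET

def add_type_attributes(record, syntax="MARC21"):
    """Copy the record-level type onto every datafield as non-standard attributes."""
    record_type = record.get("type", "Unknown")
    for field in record.iter("datafield"):
        field.set("syntax", syntax)
        field.set("recordtype", record_type)
    return record

record = ET.fromstring(
    '<record type="Authority">'
    '<datafield tag="500" ind1="1" ind2=" ">'
    '<subfield code="a">see also from personal name</subfield>'
    '</datafield>'
    '</record>'
)
add_type_attributes(record)
print(ET.tostring(record, encoding="unicode"))
```

After this step each datafield carries its own record type, so an indexer looking only at the datafield element has what it needs.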


1.2.2.   CHANGED ELEMENT NAMES FROM STANDARD MARCXML.

Changing the names of standard elements provides a short, predictable path
between the field needing indexing and the point where record types are
distinguished.  However, this variation requires additional record
transformations before and after using standard MARCXML tools, to change
element names to and from the standard names.  In this variation,
non-standard element names encode the record syntax and record type.

    <datafield_marc21_bib tag="500" ind1=" " ind2=" ">

    <datafield_marc21_auth tag="500" ind1="1" ind2=" ">
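The extra transformations mentioned above can be sketched as a pair of renaming functions.  The mapping table and function names are hypothetical; only the two element names come from the example, and other record types are deliberately not handled (an unknown type raises `KeyError`).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from record type to non-standard element name.
RENAMES = {
    "Bibliographic": "datafield_marc21_bib",
    "Authority": "datafield_marc21_auth",
}

def to_nonstandard(record):
    """Rename datafield elements according to the record's type attribute."""
    new_tag = RENAMES[record.get("type")]
    for field in list(record.iter("datafield")):
        field.tag = new_tag
    return record

def to_standard(record):
    """Reverse step, so that standard MARCXML tools can read the record again."""
    for element in record.iter():
        if element.tag in RENAMES.values():
            element.tag = "datafield"
    return record

rec = ET.fromstring('<record type="Bibliographic"><datafield tag="500" ind1=" " ind2=" "/></record>')
print(ET.tostring(to_nonstandard(rec), encoding="unicode"))
print(ET.tostring(to_standard(rec), encoding="unicode"))
```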


1.3.  XML META-RECORDS.

The meta-records may have a structure like the following simplified example.

<collection>
    <bibliographic_record>
        <related_authority_records>
        </related_authority_records>
        <related_holdings_records>
        </related_holdings_records>
    </bibliographic_record>
</collection>
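A sketch of assembling such a meta-record with Python's ElementTree, assuming (this is not specified in the outline) that the bibliographic MARCXML record itself sits directly inside `bibliographic_record`.  Each embedded record keeps its own `type` attribute.

```python
import xml.etree.ElementTree as ET

def build_meta_record(bib, authorities, holdings):
    """Wrap a bibliographic MARCXML record together with its related records."""
    collection = ET.Element("collection")
    wrapper = ET.SubElement(collection, "bibliographic_record")
    wrapper.append(bib)  # placement of the bibliographic record is an assumption
    ET.SubElement(wrapper, "related_authority_records").extend(authorities)
    ET.SubElement(wrapper, "related_holdings_records").extend(holdings)
    return collection

meta = build_meta_record(
    ET.Element("record", type="Bibliographic"),
    [ET.Element("record", type="Authority")],
    [ET.Element("record", type="Holdings")],
)
print(ET.tostring(meta, encoding="unicode"))
```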


1.3.1.  STANDARD MARCXML INSIDE.

Indexing standard MARCXML inside meta-records is much the same as indexing
standard MARCXML alone: the type attribute is on the record element and is
not part of the datafield element being indexed.  The conclusion for this
issue should be the same as for indexing standard MARCXML outside a
meta-record.

1.3.2.  NON-STANDARD MARCXML INSIDE.

Indexing non-standard MARCXML inside meta-records is much the same as
indexing non-standard MARCXML alone: the means of determining the record
type is part of the element being indexed.  The conclusion for this issue
should be the same as for indexing non-standard MARCXML outside a
meta-record.


2.  PREVIOUS THREAD FOR SIMILAR ISSUE.

This issue revisits an indexing problem related to one which appeared in
the thread "[Zebralist] how to index everything ?".  Perhaps the answer
here would be similar to one of the answers given in that earlier thread.

Sebastian Hammer wrote:
> Hi Paul,
>
> I don't know if this helps, but if you add the line 'xpath enable' to
> your .abs file, Zebra will build additional index structures to enable
> searches like:
>
> Z> find @attr 1=/*/title someterm
>
> What is supported is a subset of the XPATH spec, but I *think* you can do:
>
> Z> find  @attr 1=/*/datafield[@tag='245'] someterm
>
> In other words, XPATH-statements are used to select elements for
> searching, as an alternative to numerical USE attributes.
>
> Performance is not quite as good as for the regular indexes, so it's not
> something you want to do a lot in production on a 10M record database...
> but it's fine for smaller applications.

Unlike the issue presented in the earlier thread, this issue requires high
performance.

Marc wrote:
> Hmm. You can speed things up by having a specialized tag index.
>
> something like
>
> xelem /record/datafield/@tag tag
>
> in your abs file.
>
> then you can query something like
>
> Z> find  @and @attr 1=tag '245' @attr 1=/*/datafield/subfield[code='9'] someterm
>
> to speed things up a bit.
>
>
> You could also define an index for each combination of tag/subfields,
> but that might be an administration nightmare.

Sebastian Hammer wrote:
> That wouldn't work out of the box. But this 'should work' (haven't tried
> it):
>
> Z> find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm

Maybe we will need an administration nightmare to have the system function
as needed.
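To illustrate what attribute predicates in a path expression select, here is the same kind of selection in Python's ElementTree, whose XPath subset is not Zebra's.  Adding a predicate on the record element separates the two kinds of 500.  Whether Zebra's XPath support accepts such a predicate on the record element, and how it would perform, is not shown by this sketch.

```python
import xml.etree.ElementTree as ET

META = """
<collection>
  <record type="Bibliographic">
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">A title</subfield></datafield>
    <datafield tag="500" ind1=" " ind2=" "><subfield code="a">general note</subfield></datafield>
  </record>
  <record type="Authority">
    <datafield tag="500" ind1="1" ind2=" "><subfield code="a">see also from personal name</subfield></datafield>
  </record>
</collection>
"""

root = ET.fromstring(META)

def texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [element.text for element in root.findall(path)]

# Select by tag and subfield code only.
titles = texts("record/datafield[@tag='245']/subfield[@code='a']")

# Same tag, distinguished by a predicate on the enclosing record element.
bib_notes = texts("record[@type='Bibliographic']/datafield[@tag='500']/subfield[@code='a']")
auth_500 = texts("record[@type='Authority']/datafield[@tag='500']/subfield[@code='a']")
print(titles, bib_notes, auth_500)
```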


3.  DISTINGUISHING MARC RECORD TYPES.

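The MARC record type can be distinguished by the value of 000/06 (Leader
position 06), but I am uncertain whether that will work properly in all
circumstances, particularly where we do want to search across multiple
record types within a meta-record when the records are related.

Furthermore, there are multiple values of 000/06 for the same major
record type.  In MARC 21, for example, bibliographic records use codes a,
c, d, e, f, g, i, j, k, m, o, p, r and t, while holdings records use u,
v, x and y.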
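Collapsing the many 000/06 codes into a major record type is a small table lookup.  The sketch below covers MARC 21 Leader/06 only; UNIMARC defines its own leader codes, and the function name is hypothetical.

```python
# MARC 21 Leader/06 (000/06) codes grouped by major record type.
LEADER06_TYPES = {
    **dict.fromkeys("acdefgijkmoprt", "Bibliographic"),
    **dict.fromkeys("uvxy", "Holdings"),
    "z": "Authority",
    "w": "Classification",
    "q": "Community information",
}

def major_record_type(leader):
    """Return the major MARC 21 record type for a 24-character leader."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    return LEADER06_TYPES.get(leader[6], "Unknown")

print(major_record_type("00000nam a2200000 a 4500"))
print(major_record_type("00000nz  a2200000n  4500"))
```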


4.  MARC CONFLICT EXAMPLES.

I have not yet examined all the cases which risk false results if the
record types are not properly distinguished.

4.1.  FIXED LENGTH FIELD CASE.

I have always believed that the basic fixed-length data element fields
need local use field analogues with appropriate values to ease searching,
because the record type, and even the bibliographic level within a record
type, changes the meaning of fixed-length data elements.  MARC 21 008 and
UNIMARC 100 have this variance problem.

Supplementary local use fields might also be a reasonable way to solve
other problems with the fixed-length data element fields.
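As a sketch of why such analogues help: in MARC 21 bibliographic records the meaning of 008/18 depends on Leader/06 and Leader/07.  Only two of the material configurations (books and continuing resources) are handled here; the function names and the labelled output are hypothetical stand-ins for a local use field whose tag the message leaves open.

```python
def material_config(leader):
    """Classify a MARC 21 bibliographic leader as books (BK) or continuing resources (CR)."""
    record_type, level = leader[6], leader[7]
    if record_type in "at" and level in "acdm":
        return "BK"
    if record_type == "a" and level in "bis":
        return "CR"
    return None  # other configurations are not covered by this sketch

def interpret_008_18(leader, field_008):
    """Return a labelled value for 008/18, whose meaning depends on the record."""
    config = material_config(leader)
    if config == "BK":
        return ("illustrations", field_008[18:22])  # 008/18-21 for books
    if config == "CR":
        return ("frequency", field_008[18])         # 008/18 for continuing resources
    return ("unhandled", field_008[18])

BOOK_LEADER = "00000nam a2200000 a 4500"
SERIAL_LEADER = "00000nas a2200000 a 4500"
SAMPLE_008 = "060804s2006" + " " * 7 + "a" + " " * 21  # 40 characters, 008/18 = "a"

print(interpret_008_18(BOOK_LEADER, SAMPLE_008))
print(interpret_008_18(SERIAL_LEADER, SAMPLE_008))
```

The same character in 008/18 is read as an illustrations code for a book and as a frequency code for a serial.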


4.2.  A MARC 21 CASE.

The problem statement gave one case of conflicting field use: MARC 21
bibliographic records with 500, general note, and MARC 21 authority
records with 500, see also from tracing--personal name.


4.3.  A UNIMARC CASE.

One UNIMARC case of conflicting field use would be UNIMARC bibliographic
records with 200, title and statement of responsibility, and UNIMARC
authority records with 200, heading--personal name.
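Taken together, the MARC 21 and UNIMARC cases suggest that the key for choosing an index must include the syntax as well as the record type and tag, because the conflicting tags differ between formats.  A minimal lookup-table sketch follows; the index names and the `any-<tag>` fallback are hypothetical.

```python
# Index choice for the conflicting fields named above,
# keyed by (syntax, record type, tag). Index names are hypothetical.
FIELD_INDEXES = {
    ("MARC21", "Bibliographic", "500"): "note-general",
    ("MARC21", "Authority", "500"): "see-also-from-personal-name",
    ("UNIMARC", "Bibliographic", "200"): "title",
    ("UNIMARC", "Authority", "200"): "heading-personal-name",
}

def index_for(syntax, record_type, tag):
    """Pick the index for a field; unlisted combinations fall back to a generic index."""
    return FIELD_INDEXES.get((syntax, record_type, tag), f"any-{tag}")

print(index_for("MARC21", "Authority", "500"))
print(index_for("UNIMARC", "Bibliographic", "200"))
```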


5.  ISO 2709 PROBLEM RESTATEMENT.

The immediate need is to know what direction to go in designing an XML
meta-record.

However, I would still be interested in knowing how to index the same
field number differently based on the value in 000/06 or an abstracted
value in a local use field.

An abstracted value might be stored in a local use field such as 01k, as
follows.

000 content
001 content
01k ## $a MARC21 $b Bibliographic
100 ## $a content
245 ## $a content $c content
500 ## $a general note

000 content
001 content
01k ## $a MARC21 $b Authority
100 ## $a content
500 ## $a see also from personal name
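A sketch of routing fields by the abstracted value in 01k $b.  It assumes the records arrive in the simplified line display shown above rather than as binary ISO 2709, and the function names and index naming are hypothetical; control fields are simply left out.

```python
# The two records from the message, in its simplified line display.
BIB = """\
000 content
001 content
01k ## $a MARC21 $b Bibliographic
100 ## $a content
245 ## $a content $c content
500 ## $a general note
"""

AUTH = """\
000 content
001 content
01k ## $a MARC21 $b Authority
100 ## $a content
500 ## $a see also from personal name
"""

def parse_record(text):
    """Split each 'tag indicators $code value ...' line into (tag, subfields)."""
    fields = []
    for line in text.strip().splitlines():
        tag, rest = line[:3], line[4:]
        if "$" not in rest:
            # Control field: no indicators or subfield codes.
            fields.append((tag, [("", rest)]))
            continue
        # Skip the two indicator characters and the following space.
        chunks = rest[3:].split("$")
        fields.append((tag, [(c[0], c[1:].strip()) for c in chunks if c.strip()]))
    return fields

def route_fields(fields, type_tag="01k"):
    """Name each data subfield's index after the record type found in 01k $b."""
    record_type = "Unknown"
    for tag, subfields in fields:
        if tag == type_tag:
            record_type = dict(subfields).get("b", "Unknown")
    entries = []
    for tag, subfields in fields:
        if tag == type_tag:
            continue
        for code, value in subfields:
            if code:  # control fields are skipped in this sketch
                entries.append((f"{record_type}-{tag}${code}", value))
    return entries

print(route_fields(parse_record(BIB)))
print(route_fields(parse_record(AUTH)))
```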


Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
212-674-3783





