Internet subject gateways: evaluation of knowledge organisation systems

M.ª Jesús García Mateu, *T. Sánchez Sanz
Biblioteca de Ciencias de la Salud. Universitat de València
*Área de Biblioteca Docente. Universidad Politécnica de Valencia
Avda. Blasco Ibáñez, 17
46010 Valencia (Spain)
mjgarcim@uv.es, marsanch@bib.upv.es




1. Objectives and methodology
2. Background
3. Subject gateways definition and objectives
4. The role of classification in a subject gateway
5. Evaluable issues
5.1. Classification scheme basic elements
5.2. Semantic issues
5.3. Use of the classification by the classifier
5.4. User's tools
5.5. Updating
6. Conclusions
7. Bibliography
8. Appendix: web sites
 

1. Objectives and Methodology

Subject gateways are value-added services that select, evaluate, catalogue and classify Internet resources in order to help users retrieve information. They are very useful resources for their respective user communities.

This work studies how these services use classification schemes, highlighting the issues that improve their quality, increase their usability and make information more accessible.

We intend to analyse how a medium like the Internet makes it possible to take advantage of traditional classification schemes while avoiding some of their problems. The Internet is also allowing new classification schemes to appear, as a consequence of a new medium and a new vision of the world.

Our starting point was a literature review. Although there is a good deal of information about subject gateways, much of it institutional, few studies have examined the classification schemes they use. We then selected a number of services on the net that meet the requirements to be considered subject gateways and are of interest for their subject or their classification scheme, and analysed this set of gateways in detail. We focused on how the gateways use their chosen scheme to classify the resources they select. The issues considered evaluable have been grouped into the basic elements of the classification, its use by the classifier and the tools offered to users.

Why evaluate? On the one hand, evaluation is needed when we have to choose a classification scheme in order to start one of these services. On the other hand, studying the positive and negative features of existing gateways can help us improve our own service.
 

2. Background

One of the first works to study the classification schemes used on the Internet is Gerry McKiernan's web page "Beyond Bookmarks" (McKiernan, 1996). As a librarian at Iowa State University Library, he compiles and maintains a list of web sites that have applied or adopted standard classification schemes or controlled vocabularies to provide access to Internet resources. Ordered by type of classification scheme, the list includes some subject gateways as well as other services on the net.

This work was the starting point for the report of the American Library Association SAC Subcommittee on Metadata and Classification on the application of classification schemes as metadata for digital resources. Only DDC, LCC and NLM are studied.

In most of the sites studied, classifications are used only as an access structure to the resources and not as a metadata element in the description of the resources. From 1998 onwards, the subcommittee worked on a questionnaire covering the main characteristics that should be evaluated. The answers to its final version were presented at the 1999 American Library Association Annual Conference (Beall, 1999).

Meanwhile, in 1997 the European Union project DESIRE I started. DESIRE I (Development of a European Service for Information on Research and Education), part of the Telematics Applications Programme, aimed to improve information discovery and retrieval for European researchers. One of its main goals was to support the creation of subject gateways.

As part of this project, report D3.2.3, The role of classification schemes in Internet resource description and discovery (Koch, 1997), describes the benefits of classifying resources in an Internet subject gateway and analyses the advantages and disadvantages of the different kinds of classification scheme.

The DESIRE Information Gateways Handbook (DESIRE, 1999-2000) appeared in 1999. One chapter is devoted to subject classification, browsing and searching: it examines the advantages and disadvantages of using a classification scheme to organise web resources and gives advice on choosing the most appropriate scheme for a service and on putting it into operation.

In Spain, several universities are collaborating in a new project, DARWIN (Directorio Analítico de Recursos Web Informativos, Analytical Directory of Informative Web Resources), which aims to help organise and evaluate web resources. An adapted version of the UDC is used to classify the evaluated resources (Merlo Vega, 1999).

The most recent work focuses on improving interoperability between gateways (cross-searching and cross-browsing) and on automating some processes (automatic production of indexes and classification).


3. Subject Gateways Definition and Objectives

Subject gateways can be defined (DESIRE, 1999-2000) as quality-controlled information services, produced by a public or private organisation, that operate through the Web and that locate, select and describe Internet resources following an established methodology and quality criteria. Their aim is to make information search and retrieval easier for their users, who are generally linked to the academic world.

The following characteristics distinguish a subject gateway from a simple collection of links:

  • It is an online service that provides links to web sites or documents on the Internet.
  • Resources are selected through an intellectual process, according to a quality policy that reflects the information needs of the audience.
  • It produces subject descriptions, since each item is catalogued in greater or lesser depth.
  • The order or structure follows a scheme or classification that facilitates navigation.
  • Each resource is assigned a set of metadata.
All these operations are intellectual, and they are carried out manually by a human team, usually made up of librarians or subject specialists (Koch, 2000).

The main objective of subject gateways is to give their users fast and effective access to high-quality information on the Internet. From the beginning, subject gateways have been linked to the academic world, which is still their main producer, and they are very useful tools for students, researchers and educators, as well as for members of the general public who want to access general or specialised information.
 

4. The role of classification in a subject gateway

Classification is one of the central elements of a subject gateway, since one of the gateway's main functions is to provide access to the collection. This is usually done in two ways: searching and browsing. Browsing relies on the gateway's use of a classification scheme to organise the information hierarchically and to provide a method of retrieving it.

The advantages of using a classification scheme on the web (Koch, 1997) are the following:

  • As an aid to browsing, it is especially useful for users who are inexperienced or unfamiliar with the subject.
  • Thanks to its hierarchical structure, it allows a search to be broadened or narrowed, increasing recall or improving precision.
  • Terms appear in a context that helps resolve homonymy and allows the search to be broadened with related terms.
  • In some cases and under certain conditions, it permits multilingual access to the collection. This is achieved through notation and through indexes in several languages, which can give multilingual access to the same resources without major changes to the collection.
  • Using a widely known classification facilitates cross-searching and cross-browsing. Moreover, such a scheme is subject to a continuous process of revision and updating.

Koch also points out some disadvantages that are associated with classification as such and are independent of the medium. He highlights the arbitrary division of logically related material, which can be solved with a good system of cross-references; the illogical division of classes, which creates difficulties when the scheme is used for browsing; and the slow assimilation of new material, above all in established classifications whose updating depends on complex bodies and committees.

According to the ALA (Beall, 1999), classification schemes have the following main functions:

  • Location 
  • Browsing 
  • Hierarchical movement 
  • Retrieval 
  • Identification 
  • Limiting/partitioning 
  • Profiling

Subject gateways use the following types of classification scheme:

  • Universal classification systems. They are widely used and cover all areas of knowledge, and they make it possible to browse and cross-search collections and services on different subjects. Their notation, together with the fact that most of them have been fully translated, gives them good potential for multilingual access to collections. Another advantage is that these classifications are available in electronic format and there are tools that organise web sites with them.
  • National general schemes.
  • Subject-specific schemes: they offer a structure and terminology closer to their subject and are more easily updated than the general schemes above.
  • Home-grown schemes: they are created for a specific Internet service and are therefore fully adapted to this medium. They are flexible and easily updated.


5. Evaluable issues

Before beginning the evaluation, we should place the subject gateway in its context, specifying its name, URL, date, browser and browser version used, classification name and version, subject coverage and target audience. It is very important to explain how the classification has been adapted or developed and how it works.

5.1. Classification scheme basic elements

5.1.1. Classes and subclasses division

First of all, we considered how subject gateways display their division into classes and subclasses.

Gateways try to encourage users to search by browsing. It is therefore essential for users to see the main classes when they enter the web site, which is why the list of main classes usually appears on the first page. This is not the case in OMNI or Mednets, perhaps because of their large number of main classes: the whole list would not fit on one screen, so these gateways adopt other solutions, showing information about the service or news on the main page and placing the search option in a frame.

There is a relationship between the number of main classes and the number of sublevels. A classification with few main classes has several sublevels: Yahoo!, for example, goes down as far as seven levels. In contrast, OMNI has 84 main classes that lead directly to the results. The same happens with Mednets (48 main classes), which has no hierarchy.

The scope of a subject gateway must be clear from the outset, so that users can decide whether the service is useful to them. In the services analysed, the scope is made clear by the main classes. In addition, some selected subclasses are also shown on the first page, reinforcing the meaning of the class above them.

This is useful, on the one hand, to provide direct access to the resources and, on the other, to delimit the subject more precisely. Yahoo!, MedicalMatrix, BUBL Link and SOSIG use dots to indicate that there are more subcategories than those shown. However, none of them explains the criteria used in selecting these subcategories: alphabetical order, number of resources contained, frequency of access...

Some gateways, such as Mednets and BUBL Link, mix categories from different levels on the same page. These subcategories are emphasised, which makes them easier to reach, because of the large number of resources they contain, the importance of the subject they cover or the frequency of users' visits.

5.1.2. Notation

Notation is an important element of classification schemes. In the new medium, however, its traditional function of locating items has been broadened.

The presence of notation has several advantages. It allows advanced users to search faster, makes searching possible whatever the language of the gateway (multilingual searching), helps users discover related resources starting from one of interest and avoids the need to translate the scheme into other languages.

However, most services do not use this powerful resource. Only BUBL Link displays the notation and allows it to be searched. This may be because notation is poorly understood by users even in libraries, and gateway managers may have considered it unnecessary.

BUBL Link uses the Dewey classification, and notation is very helpful there. Categories belonging to different levels are shown on the same screen, and the notation indicates their level in the hierarchy. This gateway may use notation in this way because its managers come from the library world and its users are familiar with classification schemes.

Browsing by Dewey classification is not the default option in BUBL Link, and the meaning and use of the notation are not explained, although such an explanation would be useful to users. However, searching by notation is allowed, including truncation of the class number and combination with other search options.
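The paper does not say how BUBL Link implements class number truncation. As a rough illustration only, the following Python sketch treats a trailing `*` as a truncation operator on Dewey class numbers; the `Resource` type, the `search_by_notation` function and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    ddc: str  # Dewey class number, kept as a string so leading zeros survive


# Hypothetical sample records; titles and class numbers are illustrative only.
CATALOGUE = [
    Resource("Neurology resources", "616.8"),
    Resource("Medicine (general)", "610"),
    Resource("Cataloguing of electronic documents", "025.04"),
]


def search_by_notation(resources, query):
    """Return resources whose class number matches the query.

    A trailing '*' truncates the query, so '61*' matches every class
    number that begins with '61' (610, 616.8, ...). Without '*', the
    class number must match exactly.
    """
    if query.endswith("*"):
        stem = query[:-1]
        return [r for r in resources if r.ddc.startswith(stem)]
    return [r for r in resources if r.ddc == query]


for r in search_by_notation(CATALOGUE, "61*"):
    print(r.ddc, r.title)
```

Because the query is a class number rather than a word, the same search works whatever the language of the interface, which is the multilingual advantage of notation mentioned earlier.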

5.1.3. Captions

Whatever the subject of the gateway, it is important to show the category labels (captions). They must be complete, not truncated, and clear. This is the case in all the gateways studied.

It is essential to display each category together with the labels of all the levels above it, so that users do not need to use the browser's back button. This removes ambiguity and places the subcategory in its context. Good examples are Yahoo!, Biz/Ed and SOSIG. In gateways such as OMNI or Mednets this is not necessary, because they have no hierarchical levels.
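A minimal sketch of this kind of display, assuming the hierarchy is stored as a mapping from each category to its parent. The category names and the `full_caption` helper are hypothetical and are not taken from any of the gateways studied.

```python
# Hypothetical fragment of a hierarchy: each category points to its parent.
PARENT = {
    "Genetics": "Biology",
    "Biology": "Science",
    "Science": None,  # top-level class
}


def full_caption(category, parent=PARENT, sep=" > "):
    """Build a category caption preceded by the captions of all its ancestors."""
    path = []
    node = category
    while node is not None:
        path.append(node)
        node = parent[node]
    # The path was collected from the category upwards, so reverse it.
    return sep.join(reversed(path))


print(full_caption("Genetics"))  # Science > Biology > Genetics
```

In a real interface each caption in the path would normally also be a link back to that level, which is what spares users the back button.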

It could also be helpful to add a brief explanation of the content of categories and subcategories, as Argus Clearinghouse does.

Another useful option is searching for words within the captions. Not all services offer it, but Yahoo! does. It is useful not only to find the category we are looking for but also to see whether a concept appears in several categories and can therefore be approached from different points of view.
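Caption search can be sketched as follows, assuming each category is stored as the tuple of captions on its path. The data and the `search_captions` function are hypothetical; matching is on whole words of the category's own caption, and the full path is returned so that each hit appears in its context.

```python
# Hypothetical categories, each given by the full path of its captions.
CATEGORIES = [
    ("Science", "Biology", "Genetics"),
    ("Health", "Medicine", "Genetics"),
    ("Health", "Nursing"),
]


def search_captions(word, categories=CATEGORIES):
    """Return the full path of every category whose own caption contains the word.

    Matching is case-insensitive and on whole words of the last caption.
    """
    w = word.lower()
    return [
        " > ".join(path)
        for path in categories
        if w in path[-1].lower().split()
    ]


for hit in search_captions("genetics"):
    print(hit)
```

Here "Genetics" is found under two different branches, which is the kind of result that tells users a concept can be approached from more than one point of view.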

5.2. Semantic issues

5.2.1. Indexes and vocabularies

Information searching relies not only on classification schemes but also on indexes and controlled vocabularies. Most gateways neglect this issue. Indexes to the classification are available in OMNI (MeSH) and BUBL Link (LCSH).

OMNI adopts a sound solution: there are two different ways to search, by NLM classification and by MeSH terms, and both can be browsed hierarchically and associatively.

In BUBL Link, the "Subject menu" option lets us use the Library of Congress Subject Headings (LCSH). However, there is no direct link between the vocabulary and the classification. On the other hand, BUBL Link does allow the classification to be used in combination with other subject metadata.

5.2.2. Cross references

Cross references are links to categories belonging to different hierarchies. They are a solution to the illogical division of material characteristic of some classifications, and a feature favoured by the hypertextual medium. They must be clearly indicated.

Biz/Ed shows them under the heading "Related sections" and Argus Clearinghouse as "Cross-listed under...". Yahoo! uses the "@" symbol to indicate that a subcategory is listed under more than one category.

5.3. Use of the classification by the classifier

5.3.1. Specificity

Specificity is closely related to the objectives of the gateway, its audience and the size of its collection. It is difficult to measure.

The highest degree of specificity was found in Yahoo!, one of the gateways with the most resources. Without such specificity, browsing through so much information would clearly be difficult. A high degree of specificity implies more hierarchical levels to browse.

Subject gateways can adapt specificity to their needs: as the number of evaluated and classified resources grows, new categories and subcategories can be added. In existing schemes (as with the NLM classification in OMNI), some categories are left unused until the number of resources on the subject becomes significant, at which point they are brought into service.

5.3.2. Assignment

Assignment refers to the possibility of assigning a resource to several different classes. The hypertextual medium is better than paper in this respect, and this is one of the main advantages of using a classification in an Internet service.

Multiple assignment is another way to avoid the illogical division of the collection inherent in some classification schemes.
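In data terms, multiple assignment means that a record is stored once but linked to several classes. The sketch below, with hypothetical resources and class names, inverts a resource-to-classes mapping into the class-to-resources lists that a browsing interface would display.

```python
from collections import defaultdict

# Hypothetical assignments: one resource may be placed in several classes.
ASSIGNMENTS = {
    "Bioethics portal": ["Medicine > Ethics", "Philosophy > Ethics"],
    "Anatomy atlas": ["Medicine > Anatomy"],
}


def build_class_index(assignments):
    """Invert resource -> classes into class -> alphabetically sorted resources."""
    index = defaultdict(list)
    for resource, classes in assignments.items():
        for cls in classes:
            index[cls].append(resource)
    return {cls: sorted(res) for cls, res in index.items()}


index = build_class_index(ASSIGNMENTS)
print(index["Philosophy > Ethics"])  # ['Bioethics portal']
```

Only the links are duplicated, not the record, which is why multiple assignment is so much easier in hypertext than on a physical shelf, where each item can occupy only one place.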

Quite often, the only indication found on this point is in the gateway policy, in the explanations of the procedure for suggesting or correcting documents. We find it in MedMatrix and Yahoo!, in the form used to suggest new resources.

5.3.3. Consistency

To evaluate consistency, we need to check whether similar items are assigned to the same categories, which requires a very detailed examination. Consistency depends partly on the gateway's policy and staff: many gateways incorporate resources proposed by users, and the classification these users propose will affect consistency.

Another factor to consider is whether the main classes are balanced in the number of resources they contain. In MedicalMatrix there is a great difference between the "Specialties" category (3,328 resources) and the other categories. In BUBL, by contrast, most subclasses are balanced and contain about 10 to 15 resources.

5.4. User's tools

This section considers issues that affect how people use the gateway and whose purpose is to help them find information effectively.

5.4.1. Language

Internet documents and services can be used by people all over the world. We must therefore study whether the gateway offers multilingual access through translated labels or indexes; most of the services analysed do not. Even notation, which could help here, is absent, as we have seen.

In Yahoo!, from each category or subcategory users can go to the same category on another server in a different language. This tool aims to provide multilingual access to the collection.

5.4.2. Interface

The interface is the intermediary between the user and the service. Gateways seem to have taken great care over this issue: designs tend to be simple and practical, without graphics, and very similar across services.

However, the gateways studied give no information about their classification scheme. Perhaps this is not important to users, since the medium makes the scheme easy to use through labels and hypertext. Users sometimes do not even know which classification is being used.

Suggestion and feedback mechanisms are essential to the functioning of a subject gateway. All the gateways studied provide them, since they are an unrivalled way to obtain new resources, correct mistakes and modify the classification of existing resources. In some cases the users giving feedback are experts, and their opinions can be very valuable.

5.4.3. Results order

The most common ordering of results is alphabetical, although many gateways also present results by resource type. MedMatrix also offers ordering by quality.

Quality is a very useful criterion for ordering results. It is especially recommended for services with large collections, since they are likely to return many results; in these cases it helps users to know which resources have been rated best and to see them first.

When a search returns many results, ordering by type is also practical. Given the variety of resources on the net, this kind of grouping helps users go directly to the type of resource they are looking for. SOSIG and MedicalMatrix offer this ordering in their results.
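Both orderings are simple to express. In this sketch the `Result` type, the resource types and the numeric `rating` are assumptions for illustration; the paper does not say how MedMatrix expresses its quality ranking.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Result:
    title: str
    resource_type: str  # hypothetical types, e.g. "Database", "Journal"
    rating: int         # hypothetical quality score given by the evaluators


def order_by_quality(results):
    """Highest-rated first; ties broken alphabetically by title."""
    return sorted(results, key=lambda r: (-r.rating, r.title.lower()))


def group_by_type(results):
    """Group titles by resource type, with types and titles in alphabetical order."""
    # groupby only merges adjacent items, so sort by type first.
    ordered = sorted(results, key=lambda r: (r.resource_type, r.title.lower()))
    return {t: [r.title for r in grp]
            for t, grp in groupby(ordered, key=lambda r: r.resource_type)}


SAMPLE = [
    Result("Cardiology journal", "Journal", 3),
    Result("Anatomy atlas", "Image collection", 5),
    Result("Bioethics database", "Database", 5),
]

print([r.title for r in order_by_quality(SAMPLE)])
# ['Anatomy atlas', 'Bioethics database', 'Cardiology journal']
```

The two functions can also be combined, for example grouping by type first and then ordering each group by quality.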

5.4.4. Records

The aim here is to study the number of records in each class and the information given about them. It is useful for users to know in advance how many records a category or subcategory contains, so that they can anticipate what they will find.

In OMNI and EEVL the number of records appears in parentheses after the label; in ADAM it appears in the page heading. Both are good solutions.
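A count in parentheses after each label can be produced from the classification itself. The sketch below assumes, hypothetically, that the number shown next to a label includes the records in all its subcategories; a flat scheme without hierarchy would only need the direct count. The hierarchy, counts and function names are illustrative.

```python
# Hypothetical hierarchy (category -> subcategories) and direct record counts.
CHILDREN = {
    "Medicine": ["Cardiology", "Neurology"],
    "Cardiology": [],
    "Neurology": [],
}
DIRECT = {"Medicine": 2, "Cardiology": 5, "Neurology": 3}


def total_records(category):
    """Records in a category plus, recursively, those in all its subcategories."""
    return DIRECT.get(category, 0) + sum(
        total_records(child) for child in CHILDREN.get(category, [])
    )


def label_with_count(category):
    """Caption followed by its record count in parentheses."""
    return f"{category} ({total_records(category)})"


print(label_with_count("Medicine"))  # Medicine (10)
```

If the gateway uses multiple assignment, a resource listed in two subcategories is counted twice by this sketch; counting distinct records would require collecting resource identifiers in a set instead of adding numbers.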

It is important to analyse the full display of the record, because this is what distinguishes a subject gateway from other Internet services that just list links. We must check whether a full display is given and which cataloguing elements it includes: title, statement of responsibility, short description or abstract, classification, keywords, date of last update, date of addition to the gateway, URL... Title, short description or abstract and URL can be considered essential. Services differ greatly on this point: some include only the title and a one-line explanation, while others give a great deal of detail.
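The record elements listed above can be modelled as a simple data structure. In this sketch the field names and the `missing_essentials` check are hypothetical; following the text, only title, description and URL are treated as required.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GatewayRecord:
    # Elements the text treats as essential.
    title: str
    description: str  # short description or abstract
    url: str
    # Further cataloguing elements mentioned in the text; optional here.
    responsibility: Optional[str] = None
    classification: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    last_updated: Optional[str] = None  # e.g. an ISO date such as "2000-12-14"
    date_added: Optional[str] = None


def missing_essentials(record):
    """Names of essential elements that are empty or blank in the record."""
    return [name for name in ("title", "description", "url")
            if not getattr(record, name).strip()]


record = GatewayRecord(title="Example resource", description="",
                       url="http://www.example.org/")
print(missing_essentials(record))  # ['description']
```

A check of this kind could be applied to a sample of records to compare how complete the displays of different gateways are.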

5.4.5. Hierarchical display

In most gateways, hierarchical relationships are shown explicitly. However, only a few of the possibilities offered by the hypertextual medium are used, and browsing tools are not exploited to their full potential.

Where hierarchy is indicated, it is done with indented labels. We found no examples of tree diagrams or other devices.

Moving between levels of the hierarchy means passing from one screen to another with more specific information; after doing this several times, the context is completely lost.

This could be improved. Several levels are not usually shown at once, and users cannot interact with the display to change the number of upper or lower levels they want to see.
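An adjustable display of this kind is not difficult to build. The following sketch, with a hypothetical hierarchy and a hypothetical `show_context` function, lets the caller choose how many levels above and below the current category to show, using indentation to mark depth as the gateways studied do.

```python
# Hypothetical hierarchy: category -> subcategories, plus the derived child -> parent map.
CHILDREN = {
    "Science": ["Biology", "Physics"],
    "Biology": ["Genetics", "Zoology"],
    "Physics": [],
    "Genetics": [],
    "Zoology": [],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def show_context(category, levels_above=1, levels_below=1):
    """Return indented lines showing a category with some ancestors and descendants."""
    # Collect up to levels_above ancestors, nearest first.
    ancestors = []
    node = PARENT.get(category)
    while node is not None and len(ancestors) < levels_above:
        ancestors.append(node)
        node = PARENT.get(node)
    # Print ancestors from the top down, two spaces of indentation per level.
    lines = ["  " * depth + name for depth, name in enumerate(reversed(ancestors))]

    def descend(name, depth, remaining):
        lines.append("  " * depth + name)
        if remaining > 0:
            for child in CHILDREN.get(name, []):
                descend(child, depth + 1, remaining - 1)

    descend(category, len(ancestors), levels_below)
    return lines


print("\n".join(show_context("Biology", levels_above=1, levels_below=1)))
```

Exposing `levels_above` and `levels_below` as controls in the interface would let users widen or narrow the context instead of losing it screen by screen.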

The care designers take in showing the path followed during the search will definitely influence navigation and orientation.

5.4.6. Navigation and orientation

In this section we study how easy it is to navigate the services analysed. Some gateways clearly show the path followed from the beginning of the search (Yahoo!), which improves navigation and orientation. However, users still have to rely heavily on the browser's own tools.

Most gateways use frames with menus and usually offer a simple design, which improves these aspects. An important aid to orientation is opening the selected resource in a new window, so that the gateway page remains available. Other features that help navigation and orientation are marking visited links (a change of colour is enough), labels marking new or revised items, and the absence of banners or other annoying elements.

5.5. Updating

It is important to know which version of the classification is being used and how often it is updated or new categories are added. Few gateways give information on this topic. How easily a classification can be updated depends on its type: home-grown schemes are more easily updated than traditional classifications, which depend on many revision bodies. A classification should be quick and easy to update, and updating should not require reclassifying the resources already in the system.

The medium favours frequent updating. Subject gateways modify their classifications when there is a significant number of documents that justify opening a new subcategory. In Yahoo!, users themselves suggest categories when they propose a new resource for inclusion.
 

6. Conclusions

Subject gateways are quality services that have proliferated on the Internet, including in specific areas of knowledge.

Classification occupies a central place in subject gateways, which use it mainly to organise information by subject and to facilitate retrieval through browsing.

Evaluation is necessary in order to choose a classification scheme that suits our users and fits the collection, and in order to improve our own service.

The classification schemes used by subject gateways can be traditional or home-grown. Traditional schemes are almost always adapted to the gateway's needs, while new schemes are more flexible and better reflect the user's view.

The Internet is an ideal medium for using classifications, provided the possibilities offered by hypertext are exploited. In the services studied, updating is quick and simple; the concept of assignment is changing and becoming much more dynamic; and specificity is favoured by the very structure of hypertext.

However, in some cases the advantages of hypertext do not seem to have been fully exploited. Notation, for example, is underused, although wider use of it would allow access to the collections without language barriers. As for the hierarchical display of categories, not all the browsing tools characteristic of hypertext are used. Lastly, the ordering of results could also be improved: presentation by resource type or by evaluation ranking would be very useful.

Subject gateways have existed for only a short time, and all of them are in continuous evolution and improvement. They are very useful services, so it is essential that we make them known to our users and encourage their use in information searching.
 

7. Bibliography

- BEALL, J., MASSEY, K., MICHEL, D., WHITED, M., WILSON, MD. ALCTS CCS Subject Analysis Committee, Subcommittee on Metadata and Classification. Final report [on line]. ALA Annual, 1999
http://www.ala.org/alcts/organization/ccs/sac/metaclassfinal.pdf [Consultation: 25th May 2000]
- BELCHER, Martin, PLACE, Emma, CONOLE, Grainne. Quality assurance in subject gateways: creating high quality portals on the Internet. Quality Assurance in Education, 2000, v. 8, n. 1, p. 38-47
- BRÜMMER, Anna. Subject based information gateways: an introduction [on line]. Internet'98. 1998
http://ix.db.dk/inettema/rapport98/03_ab.htm [Consultation: 14th December 2000]
- DESIRE. DESIRE Information gateways handbook [on line], 1999-2000. http://www.desire.org/handbook/ [Consultation: 29th May 2000]
- HEERY, Rachel. Information gateways: collaboration on content. Online Information Review, 2000, v. 24, n. 1
- KIRRIEMUIR, John, BRICKLEY, Dan, WELSH, Susan, KNIGHT, Jon, HAMILTON, Martin. Cross-searching subject gateways: the query routing and forward knowledge approach [on line]. D-Lib Magazine, 1998, jan. http://mirrored.ukoln.ac.uk/lis-journals/dlib/dlib/dlib/january98/01kirriemuir.html [Consultation: 14th December 2000]
- KOCH, Traugott. Automatic classification. DESIRE II workplan WP D3.6a [on line], 1999.
http://www.lub.lu.se/tk/desire2/desire2-autoclass-plan.html [Consultation: 18th December 2000]
- KOCH, Traugott, DAY, Michael, HIOM, Debra, PEEREBOOM, Marianne, POULTER, Alan, WORSFOLD, Emma. The role of classification schemes in Internet resource description and discovery [on line]. 1997
http://www.lub.lu.se/desire/radar/reports/D3.2.3 [Consultation: 14th December 2000]
- KOCH, Traugott. Quality controlled subject gateways. Online Information Review, 2000, v. 24, n. 1, p. 24-34
- KOCH, Traugott. Quality controlled subject services [on line].
http://www.lub.lu.se/tk/SBIG-definition.txt [Consultation: 26th April 2001]
- MACKIE, Morag, BURTON, Paul F. The use and effectiveness of the eLib subject gateways: a preliminary investigation. Program, 1999, oct., v. 33, n. 4, p. 327-337
- McKIERNAN, Gerry. Beyond bookmarks: schemes for organizing the Web [on line], 1996.
http://www.iastate.edu/~CYBERSTACKS/CTW.htm [Consultation: 18th December 2000]
- McKIERNAN, Gerry. Hand-made in Iowa: organizing the Web along the Lincoln Highway [on line]. D-Lib Magazine, 1997, feb.
http://www.dlib.org/dlib/february97/02mckiernan.html [Consultation: 18th December 2000]
- MERLO VEGA, JA., GRACIA ARMENDÁIZ, J., ZAPICO ALONSO, FF., RODRIGUEZ GAIRÍN, JM. DARWIN: Una propuesta de organización y evaluación del conocimiento accesible en línea. In ISKO (4. 1999. Granada). La representación y la organización del conocimiento en sus distintas perspectivas: su influencia en la recuperación de la información. Actas del IV Congreso ISKO-España EOCONSID'99, 22-24 April 1999, Granada.
- PLACE, Emma. International collaboration on Internet subject gateways. IFLA Journal, 2000, v. 26, n. 1, p. 52-56
- UKOLN METADATA GROUP. Selection criteria for quality controlled information gateways [on line], 1998
http://www.ukoln.ac.uk/metadata/desire/quality/toc.html [Consultation: 26th May 2000]
- VIZINE-GOETZ, Diane. Online classification: implications for classifying and document (-like object) retrieval [on line]. In International ISKO Conference (4. 1996. Washington). Knowledge organization and change: proceedings of the 4th International ISKO Conference, 1996
http://orc.rsch.oclc.org:6109/dvgisko.html [Consultation: 18th December 2000]
- VIZINE-GOETZ, Diane. Using library classification schemes for Internet resources [on line]. In OCLC Internet Cataloguing Project Colloquium, [s.a.].
http://www.oclc.org/oclc/man/colloq/v-g.html [Consultation: 18th December 2000]
- XIE, H. Web browsing: current and desired capabilities. In National Online Meeting (20. 1999. New York). 20th Annual National Online Meeting: proceedings 1999. New York: Information Today, 1999, p. 523-537