Saturday, February 10, 2007

Creating a Metadata Schema for the MSIM iSchool Website at the University of Washington

Approach
When considering how to administer the University of Washington's iSchool MSIM website, and what metadata might help with that, I considered human issues first, then the larger scope of the organization's needs, the users' needs, and lastly the system's or machine-level needs. My questions were: "What are the needs of a website administrator?", "What is this organization and what are its needs?", "What kind of content am I working with?", and "How may the information be accessed, and why?"

An alternative approach is to view metadata as just another level of information in the metadata stack, one which can aid the site administrator, one which can be automated using systems, and one which can be accessed using interoperable systems.

Starting with the human element first: the website administrator acts as a consultant and advisor, in effect a domain expert working to improve the interface between users and their needs, the organization and its needs, and what computers can do, balanced against the costs. Administrators also have needs of their own when handling data, so I focused on those real needs first and foremost.

Considering which elements are most important to include in any metadata schema, I first thought about human input and process issues. My baseline question was: "How much time could a website administrator reasonably be expected to spend entering this kind of data into pages?" Even with tools to perform this function, adding data adds cost to publication. This question led me to conclude that administrators have little time to add metadata, no matter how passionately they feel about the semantic web, or about the possible eventual use of such data.

What are the organization's needs? The needs of institutions are many and varied, whether at the University of Washington or elsewhere. I came to the following conclusions:

The common needs of large educational organizations can be summed up in one paragraph: they need three major things:

1. Educational resources: books, images, and other stored or linked materials

2. Knowledgeable speakers: teachers and other knowledge holders

3. Support from the community to maintain these resources: funds

But in order for the combination of these to survive and thrive, they also require a fourth:

4. Communication, so that relationships among educators and students, as well as others (other teachers, donors, students in distance-learning programs, etc.), can be built by finding, influencing, and interacting with each other.

Marketing arises as a delta of communication and support; it may be described as "communication to increase support." This means knowing where hits are coming from and to what resource. (See more about this subject in section two, on Impact.)

The metadata elements and controls included are based on the minimum requirements for an administrator to bring those things together, literally on the same page. They answer several questions, shown in Table 1 below: What, When, Who, Where, and How Valuable? Providing the answers to these questions in the metadata would support the administrator's needs. The elements also capture feedback for the decision-making process of reworking the site structure, and support inquiry into whether the metadata elements themselves need modification.

Exploring the various existing schemes on the Metamap, I found that many referred to or leveraged the Dublin Core Metadata Element Set, with its 15 elements. I reasoned that approximately 15 elements would be the maximum; this number has the practical appeal of keeping administrative overhead both effective and economical. The new elements I added were derived from Dublin Core's options.

The Dublin Core Metadata Element Set has the main attributes both the administrator and the institution need, as well as the interoperability computers require, because many other standards incorporate it. And because Dublin Core is an accepted standard, it would not require much selling to team members unfamiliar with metadata schemas, or with standards in general.
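As a minimal sketch of how such elements could land "literally on the same page," the following Python snippet renders a dictionary of Dublin Core elements as HTML meta tags. The helper name and the sample page values are my own illustration, not iSchool code; only the "DC." naming convention comes from the Dublin Core recommendations.

```python
# Sketch: emit Dublin Core elements as HTML <meta> tags for a page's <head>.
# The helper and sample values are illustrative, not actual MSIM site code.
from html import escape

def dc_meta_tags(elements):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = []
    for name, value in elements.items():
        lines.append('<meta name="DC.%s" content="%s">'
                     % (name, escape(value, quote=True)))
    return "\n".join(lines)

page = {
    "title": "MSIM Program Overview",   # hypothetical page
    "format": "text/html",
    "creator": "MSIM Web Team",
    "language": "en",
}
print(dc_meta_tags(page))
```

Even a helper this small hints at why tooling matters: hand-typing these tags on every page is exactly the administrative cost discussed above.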

Impact

The conclusion that information is only valuable in context implies a corollary: contextualized information can create more relevant information for the user. From such relevant information users can realize knowledge.

Table 1

| Question | Expanded question and answer | Metadata element |
| --- | --- | --- |
| What? | What format am I handling? "Format" tells the administrator what to do with a file and what to expect (.html, etc.). It is of limited use for non-text content such as graphics, because you cannot label an image inside the image; image headers describe the image itself, not its context or the elements within it. | Format |
| What? | What is it called? "Title" is what the document of a given format is called. | Title |
| When? | When was it created? "Date" records when it was created or last saved, which can loosely imply versioning. | Date |
| Who? | Who provided it? "Source" tracks where the file originated. | Source |
| What/Who? | What language is it in (who is it for)? "Language" helps ensure it is located in the right area, for the right audience. | Language |
| Who? | Who owns the legal rights? "Rights" provides ownership information and establishes some idea of copyright. | Rights |
| Who? | Who created it? "Creator" identifies who made it; web pages often have more than one author. | Creator |
| Who? | Who approved it? "dc.Creator.approver" gives another set of eyes[1]: who approved the document for publication. | dc.Creator.approver |
| Who? | Which organization owns it? | dc.Creator.org |
| What? | What is the subject? | Subject |
| Where/When? | What does it cover? | Coverage |
| When? | For what dates or time period is it valid? | dc.coverage.date |
| Where? | For what location does it apply? "dc.coverage.location" ties a document to a place, such as a specific campus. | dc.coverage.location |
| How valuable/How found? | How valuable is it as ranked by an outside search entity such as Google? "pageRank" is the page rank provided by the Google search engine, through Google's API.[2] A future improvement could add pageRank.searchengine. | pageRank |
| Value? | How valuable is it in terms of hits? "pageRank.hits" is the total number of hits compared with other pages on the MSIM site, from Clicktracks data. | pageRank.hits |
| Value? | How valuable is it within the scope of the MSIM site? "pageRank.score" is the mean of pageRank and pageRank.hits, providing a single score. Combined with date or location, for example, this could enable authority control for other search systems. | pageRank.score |

The last three elements of Table 1, the pageRank group, attempt to show how valuable a page is; they are included as an experiment. They would be generated automatically, on the fly, as the page is assembled, based in part on the Google PageRank API and on the indexing and hits logged by the system.
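The "mean of pageRank and pageRank.hits" idea can be sketched in a few lines. One wrinkle the text leaves open is scale: Google PageRank runs 0 to 10, while hit counts are unbounded, so before averaging them the hits need to be put on a comparable scale. The normalization below (hits relative to the site's busiest page, mapped onto 0 to 10) is my assumption, not part of the original design.

```python
# Sketch: combine Google PageRank (0-10) with site-relative hit counts into a
# single pageRank.score. The scaling of hits onto a 0-10 range relative to
# the busiest page on the site is an assumption made for illustration.

def page_rank_score(pagerank, hits, max_site_hits):
    """Mean of PageRank and hits after putting both on a 0-10 scale."""
    hits_scaled = 10.0 * hits / max_site_hits if max_site_hits else 0.0
    return (pagerank + hits_scaled) / 2.0

# A page with PageRank 6 and 450 hits, on a site whose busiest page
# has 1000 hits, scores (6 + 4.5) / 2 = 5.25.
print(page_rank_score(6, 450, 1000))
```

Any monotonic combination would do; the mean is simply the choice the text proposes, and iterating on the formula fits the experimental framing above.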

Why these elements, and why does it matter? Helping end users find the school they want, and helping the school welcome them, is just one reason. In my own experience as a web designer for the Seattle Community Colleges, reading the log files made it obvious that the colleges' sites were not serving many of the foreign nationals inquiring there, in part because no one was examining the logs. The few who did examine them did not take the next logical step of advising senior leadership on how to improve the website, their most functional and least expensive form of marketing. It is likely that such a value field, cross-referenced with other metadata, could help systems enable end users to locate the most current, authoritative information on a subject.

Nervana's Sharon L. Bolding advised students to "iterate to improve over time" in her class presentation. During class discussions she noted that an organization's information needs change over time. If you try to anticipate everything you might need, the administrative overhead becomes too large and quite simply you will fail. It is better to treat what you need as experimental: try something and iterate as you go, just as in all web design and application development.


Understanding "what is being used and by whom" in order to identify patterns can draw on automated information from trusted authorities. Combining that need for pattern recognition with the idea of capturing active logging, it occurred to me to expose some of that data to users. Click tracking at the University of Washington is done with Clicktracks, according to Joel Larson. But what if that data were exposed at some level, such as in metadata comments? How could this be helpful, and for whom?
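As a sketch of what "exposed in metadata comments" might look like, a hit summary drawn from log data could be written into the page source as an HTML comment during page assembly. The field names and values here are my own invention, not Clicktracks output.

```python
# Sketch: expose a page's hit summary as an HTML comment in the page source.
# Field names and values are invented for illustration; a real feed would
# come from the site's click-tracking data.

def hits_comment(path, hits, top_referrer):
    """Format a log-derived hit summary as an HTML comment."""
    return "<!-- pageRank.hits: %s | path: %s | top referrer: %s -->" % (
        hits, path, top_referrer)

print(hits_comment("/msim/admissions.html", 1240, "google.com"))
```

Because comments are invisible to casual visitors but present in the source, this is one low-cost way to let curious users, and other systems, see usage data without changing the rendered page.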

The reasons someone visits the University website are fairly well known; however, the right to view some materials depends on authentication. In this environment user rights are controlled by PubCookie as an authority grantor, with databases as control agents. Customers such as students are common end users. But can they find what they want right away through navigation? How many use a search engine instead, and how often? Which search engines do they use? Where do they come from, and where are they going?


This solution could make a difference in the user experience on the MSIM website, and it might help improve the logic of the user interface design. From the standpoint of site administration, it might show where certain static materials are helpful in particular ways, and also how completely static materials fall out of use over a longer period. What it may show we do not know, and that itself is interesting.

Problems

Even with this additional data, we still need real human beings to observe, analyze, accept critical feedback, and try exceptional things in order to improve information findability and refine that slippery thing known as user experience. Computers are here to serve people, not the other way around.

The very reason to ask these questions of value lies in large organizations' need to continually improve the findability and usability of their information. In a sense, one wants to take the data out of the hands of those who believe they already know what, when, where, and how information is being used, and put it into the hands of end users to make those decisions themselves. The real question is how to help users do that, and what logical steps make it possible.

What constrained my choices was time, and a lack of experience with what I could, or actually would, do with this metadata. Once exposed, how useful is it really? Does it serve only me as the administrator? How can I make it more useful?

While many of the metadata schemas were intriguing, I felt I would need more hands-on experience actually using and "munging" data to make it useful for end users, the organization, and the machine level.

Resources

  • Dublin Core. (February 1, 2007) Dublin Core Metadata Element Set, Version 1.1.
    http://dublincore.org/documents/dces/

Considering metadata elements over several days, I realized I could not conceive of better buckets myself, only derivative ones. Other appealing systems I considered were RDF, OWL, and GEMS. I researched RDF and OWL because my own interests tend toward technical issues; GEMS is included because it is a metadata schema specific to education.

But ultimately I returned to Dublin Core for its simplicity. I extended that schema, and expect I would learn much more from it "in the wild."
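The refinements proposed in Table 1 (dc.coverage.location, for example) could pair with an encoding scheme acting as authority control, in the sense the class notes below describe: a rule set restricting the values an element may take. A minimal sketch, with a hypothetical campus vocabulary standing in for a real controlled list:

```python
# Sketch: an encoding scheme as authority control for dc.coverage.location.
# The campus vocabulary is hypothetical, chosen only to illustrate the idea.

UW_CAMPUSES = {"Seattle", "Tacoma", "Bothell"}

def validate_location(value, vocabulary=UW_CAMPUSES):
    """Return the value if it appears in the controlled vocabulary, else raise."""
    if value not in vocabulary:
        raise ValueError("'%s' is not in the controlled vocabulary" % value)
    return value

print(validate_location("Seattle"))  # accepted; an unknown campus raises
```

Rejecting free-text values at input time is what makes the element reliable enough to cross-reference with other metadata later.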

“Module 3a: Metadata” PowerPoint presentation, slide 4, from the IMT530b class website.

· Element refinements allow narrower definition of the elements for specific purposes http://dublincore.org/documents/dcmi-terms/#H3

· Encoding schemes (authority control) can be used as rules for the values allowed within elements http://dublincore.org/documents/dcmi-terms/#H4

Email: Date: Thu, 25 Jan 2007 12:29:54 -0800 Subject: RE: [Imt530b_wi07] FW: Exercise 3 instructions
… examine metadata standards that are related to content/resource description… looking for is repeated elements across many schemas, which will give you clues about what is commonly used for describing content… The standards I list in the first part of the assignment are recommended starting points, because many of the others are more technical and not appropriate for this work.

· w3schools. (February 7, 2007) RDF Reference.
http://www.w3schools.com/rdf/rdf_reference.asp

· Herman, I. (February 7, 2007) Web Ontology Language OWL.
http://www.w3.org/2004/OWL/

· Seeley, R. (February-7-2007) The Semantic Web: The OWL has landed.
http://www.adtmag.com/article.aspx?id=8144&page=

“The language provides a standard way to define Web-based ontologies so that data can be described as what it is -- an enzyme in a biological application or a hotel in a travel industry application -- instead of as document in a tree structure or other database abstraction.”

W3.org information on how metatags should be formatted

  • Berners-Lee, T. (February-7-2007) Paper Trial: Web Architecture Ideas.
    http://www.w3.org/DesignIssues/PaperTrail.html

    Introduction
    “Social processes look like state machines. However, they don't exist as a state variable stored in one place, but as a trail of documents. You know the true state of the machine only if you have access to the latest documents. (This is not the problem addressed here, this is real life being modeled.) Paper-trail is a system which allows one to follow a strict process by creating new documents in a constrained fashion. Every paper-trail document has:

· a pointer to a "paper-trail schema" which defines its document type (eg "constitutional amendment")

· a pointer to its justification documents (maybe)

· a notarization of when it was checked against the schema by the paper-trail program.

    The schema defines:

· Prerequisites for a document being valid, in terms of other documents

· Hints to other document types you can make from this one (state transitions)”


