Saturday, March 10, 2007

Del.icio.us Metadata Tagging

NEEDS EDITING: This is a document, a JPG

Of a subject, A person named Linda Lane, In a program, categorized as a student, and an assistant teacher

at the University of Washington in Seattle, USA
c. March 10th, 2007

Tag You’re It! What? Who? Where? When?
dc.Format
dc.Format=PDF
dc.Format=JPEG
dc.Format=doc

dc.Subject

ll.Location

ll.People
ll.People=ProspectiveStudents
ll.People=CurrentStudents
ll.People=InternationalStudents
ll.People=ExecutiveStudents
ll.People=DayStudents
ll.People=Women
ll.People=Disabled

ll.Program
ll.Program=Capstone
ll.Program=Cost
ll.Program=Cost.Scholarship
ll.Program=Curriculum
ll.Program=Curriculum.Subject
ll.Program=Dates
ll.Program=Degrees
ll.Program=Titles

Linda Murry Lane

IMT 530- Organization of Information Resources
Winter Quarter Final, 2007
Michael Crandall | Instructor

• Approach
This document is a reflection on my experience of constructing vocabularies, based on the MSIM website, and posting them as metatags to my Del.icio.us site. I tried and evaluated many approaches, including working closely with a leading thinker on the subject of metadata, Mike Crandall, my instructor and academic advisor at the iSchool, University of Washington, Seattle.

There are many ways to develop a classification scheme. Some experts recommend four: rational, empirical, cultural, and contextual (Taylor, 2004). Rational is logical. Empirical uses your observations. Cultural is what everyone believes. Contextual is what works best within a situation. The site we are asked to tag is the Master of Science in Information Management (MSIM) website, with the tags displayed at http://www.del.icio.us.com.

Since I know little about education as a field I decided to review the MSIM Web site and build a model to see if I can understand what the designers intended and represented by looking at their navigation and the related pages, and build on that. This is the contextual method, and it relies on the rational, empirical, and cultural methods.

But having second thoughts I finally arrived at a purely personal and subjective point of view based on Dublin Core and what I know about social tagging (Golder, 2006). I want a classification / categorization of facets and controlled vocabularies, a system I understand and agree with especially because this is an exercise, and play is one of the ways we learn. (Bransford, 2000)

Originally I thought I would use an approach assisted by computers, which is to say that I would strip the existing metadata comments off the HTML pages, as I have seen search engine accumulators do.

As a result of having a multi-page Web site with one of the most popular personal resumes published on the World Wide Web I inferred information about search before I read anything about it. My resume page is optimized for a number of returns in Google, Live, Yahoo! and other search engines, floating to the top of more than 4.5 million similar items indexed on the Web. Such queries include ‘resume + Product Manager’, ‘Program Manager’, ‘Microsoft’, ‘Information Architect’, and similar keywords.

Following the queries back to their sources using a source tracker taught me several things about searching, and in this way I have been exposed to ways that people search, actual text strings, languages, personal settings in their browsers, data accumulators, products related to resumes and searches, and so forth. At least two other of the resumes which rate as high as mine do so because I advised those people how to publish their resumes on the Web, obtain search results, and reinforce them. Very few people that I can detect locate my site through any other method except search.

The key lesson of that exercise is that publishing a document (Buckland, 1997) on the Web takes place as a part of human endeavor, as does the reinforcement. The machines and related software perform their indexing and search algorithms; then human beings, seeking specific information through search engines, reinforce results by making selections. Their intuitive and intellectual results are thus based on what is returned in an iterative cycle. But at the beginning and end are people doing things for human reasons.

The Freebase database, announced in a recent news article as a service for the Web, is at its core just another way to include human-side facets so that computers can mechanically serve information.

What I had decided to do:
1. Take a compilation of all of the MSIM pages - All 97 MSIM URLs:
http://docs.google.com/Doc?id=dvcqpwq_17d4cqn6*

2. Run each of them through the Dublin Core metadata editor engine linked off the DC site, (http://www.ukoln.ac.uk/cgi-bin/dcdot.pl)

3. Group the results of each header set into bundles

4. Then enter all of those into Del.icio.us
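Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads `<meta>` elements with the standard library's `html.parser` as a local stand-in for the DC-dot service (it does not call DC-dot), the sample pages and `example.org` URLs are hypothetical rather than taken from the MSIM site, and the function names are invented for the example. Step 4, entering the bundles into Del.icio.us, is left as the manual step described above.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> elements in one page."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.pairs.append((name, content))


def extract_meta(html):
    """Return the (name, content) pairs found in one HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.pairs


def bundle_by_element(pages):
    """Group values from many pages under their metadata element name.

    `pages` maps a URL to its HTML source; the result maps an element
    name such as "DC.format" to the set of values seen across pages.
    """
    bundles = defaultdict(set)
    for html in pages.values():
        for name, content in extract_meta(html):
            bundles[name].add(content)
    return dict(bundles)


# Hypothetical sample pages, not taken from the MSIM site.
sample_pages = {
    "http://example.org/capstone": (
        '<html><head>'
        '<meta name="DC.format" content="PDF">'
        '<meta name="DC.subject" content="Capstone">'
        '</head><body></body></html>'
    ),
    "http://example.org/cost": (
        '<html><head>'
        '<meta name="DC.format" content="JPEG">'
        '<meta name="DC.subject" content="Cost">'
        '</head><body></body></html>'
    ),
}

bundles = bundle_by_element(sample_pages)
for element, values in sorted(bundles.items()):
    print(element, sorted(values))
```

Grouping by element name rather than by page is one possible reading of "bundles"; grouping by page would be an equally reasonable choice.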

This would be one rather obsessive way to respond to the assignment, but it would work. In this way it would all be computer generated with the exception of my manual inputs into fields. It is mechanical in nature, and no doubt Darth Vader would be proud. (Lucas, 1977)

There is also a manual way to do it, based on the intention of Del.icio.us as a more human social tagging tool but I liked the first way because it is complete. However the project assignment had several human components. I felt that a controlled vocabulary would arise naturally as I had more experience with the MSIM site, even though my choice had been to view the site through the different eyes of the various intended users – mostly students -- prospective, current, international, disabled, and alumni.

When I suggested an experiment mixing manually created metatags with computer-generated ones, my academic advisor responded that the idea behind the assignment was itself an experiment:
“I wanted you to think about what the intersection of an organized tagging system and a freeform one would mean, and what it would look like over time. I'm curious about how this grows over time as subsequent classes come along (also interested to see how many people actually discover and use previous tags).”

Thinking about intelligent information system design, I had already concluded, based on class discussions and my own experience with Web-based information systems, that human systems, especially those based on human emotion and not purely intellectual motivations, are a preferable design factor.

“Unintelligent information system design assumes that the 'heavy' end of the 'intelligence' of the system is with the software and its designers, and that the frustration or 'feelings' of end users are of little consequence” was an understanding I reached with a friend of mine, Brent Barr, a former Boeing engineer, when discussing the MSIM site and the security and login issues I experienced in just the first few weeks of the MSIM program. Together we concluded that the reality is the reverse for information systems: we cannot completely remove our subjective views, nor should we want to, because such systems are for humans to use, and we are naturally guided by our feelings, if we let ourselves be.

Intelligent information design is achieved when it actively engages the highest intelligence present in the system. From a keenly aware design point of view the highest intelligence is not purely intellectual, abstract, or objective, but includes these 'feelings' of the user. Content should be structured so that targeted intellectual information may be found, and so that personal choices in reading, navigating, sorting, sense making, and decision making can then occur without undue stress to the weighted side; that is what benefits the high end of an information system. The goal is to drive systems to deliver information for users, for human beings. “Computers are here to serve mankind not the other way around.” (H.H. The 14th Dalai Lama, 1999)

Humans naturally categorize, sift, structure, and organize physical stuff and information (Taylor, 2004). This is a part of human intelligence; it is crucial to determining knowledge and belief, making decisions, and storing and retrieving objects and ideas later. But categorizing content is difficult, because it is based on human views of ‘what and how and why’ we observe and organize. Ranganathan’s view in the faceted classification he developed was especially of interest to me, I want to say “charming”, because it was universal in approach. He saw it much as we do today, very much like the news (who, what, where, when, and why), or as he described the fundamental structure of his facets: Personality, Matter, Energy, Space, and Time. (Steckel, 2002)

Our scientific goal is to be objective as much as possible. However with more experience information scientists have realized that all categorization and classification is performed within weighted human structures, colored by belief, politics, history, education, etc, as well as existing information structures. These are all categories in themselves.

A culture that has a category called “Women, Fire, and Dangerous Things” (Lakoff, 1987), a heavily weighted political view, or the intention to be humorous will each charge a classification structure. All of our knowledge and our environment do so as well.

So our goals really are to decide on what is meaningful in context. That is a human system. How we can teach the machine to understand the content and context will help us fulfill personal aspirations. Applied ontologies are explicit mirrors for conceptual modeling of ideas, and data in contextual frameworks. They help computers appear like they think!

This means that the most important starting places are structuring information so that it makes the most sense within the environment where it will be used, and understanding, in at least a basic way, how an information system works and what it will be used for. A corollary is the need to be aware of what systems are actually capable of doing, what they do well and what they do poorly.

From pure observation, we all have maps of the world that are some combination of rational and emotional, objective and subjective, and that, when combined with motivations, needs, and desires, apparently impel our actions. Metatagging is a relatively small part of this mapping; it allows us and others to see what we are doing and where we are going, and to show it through links.

Regarding the specific question of the MSIM website, my feelings tell me the navigation is not structured well and that the tone of the site is old-fashioned and too cluttered. My reasoning is that I continue to use search instead of navigating, and as I am sensitive to the number of mouse clicks this probably means I could not find things in the past. Emotionally, the turquoise color in particular looks like nearly everything I designed for the conservative Microsoft sites I worked on a couple of years ago. I would love to see a neutral, clean, clear, fashion-forward, extremely slimmed-down site. The RSS feeds I expect may be provided to me as a logged-in user of the tools the University of Washington provides, but when I am logged in they do not appear on the MSIM site. Also, pages which receive more attention through searching might be given preeminence in structuring the presentation of content.

• Implementation
As I moved my metadata schema and vocabularies (Leise, Fast, & Steckel, 2003) into Del.icio.us, I discovered many frustrating problems. Del.icio.us bundles worked differently than I expected: creating a new bundle overwrote the prior one, and I could not understand what the system was trying to tell me. I just wanted to create a bunch of bundles and add the groups of predetermined metatags to them.

Alternatively, since I am working on a slow machine, I wanted to create a bundle and subtags and just copy and paste all the complete URLs on one subject into the open field. Doing this produced hidden tags, which I had to delete multiple times, because all I had were piles of unrelated metatags that made less sense to me than a simple structure of Capstone.PDF.NameofSubject would.

Was there any reasonably automatic way to derive the same information, such as I first proposed, from the information stored in the headers? The problems meant I kept redesigning the vocabulary, which is not a workable strategy. Naturally I began to use open tagging (Guy, 2006), with just a few keywords, to help me personally recall what posters look like, for example, and so get anything out of the system in the future, because I could not easily copy and paste using older technology on a Win2K machine. Gifted with a good visual memory, I should be able to relate a few tags combined with the title and “pink” or “yellow arrows” if I have seen the image before. This is unlikely to help anyone else locate specific capstone posters or remember them.

Hierarchies provide direction in understanding, and I believe one could develop a flat style of tagging in Del.icio.us to describe a hierarchy but it would not appear hierarchical easily.
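As a rough illustration of that point, the sketch below stores a hierarchy as flat, dotted tags in the style of Capstone.PDF.NameofSubject and rebuilds the tree from them. It assumes the dot is reserved as the level separator; the tag list and the function names are illustrative, and nothing here relies on a Del.icio.us feature. The flat list is all a tagging site would store, and the hierarchy only appears once a separate step such as `build_tree` reconstructs it.

```python
def build_tree(tags):
    """Fold dotted flat tags into a nested dict, one level per segment."""
    tree = {}
    for tag in tags:
        node = tree
        for part in tag.split("."):
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


# Illustrative flat tags in the style of Capstone.PDF.NameofSubject.
flat_tags = [
    "Capstone.PDF.NameofSubject",
    "Capstone.JPEG.NameofSubject",
    "Cost.Scholarship",
    "Curriculum.Subject",
]

print("\n".join(render(build_tree(flat_tags))))
```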

The MSIM Microsoft scholarship page is an apparent victim of Web rot. Considering the Web rot issue, I wondered why I should spend time storing URLs which invariably fade over time. It is really the entire content, or the key components of it, that is needed. One of the favorite videos presented to the Informatics INFO 344 class has already been removed by the user, possibly someone with no legal right to post internal Second Life materials to a public site. Embedded as a video link from YouTube, it does no good nicely linked on a blog; it is now an empty reference, and it took significant time to create. This example reinforces the idea of quoting text more fully and copying wanted images from original sources, because it is possibly the last time they will be present in the system.

That we can only control our own creation and placement of materials on the Web, not anyone else’s stuff, implies that search should be fast, continuous, and retrieve and store the original if possible or important enough.

It is increasingly obvious how difficult it is to tell systems and data how to interact as more data and systems are created; digital almanacs may help return one answer to a specific question. (Markoff, 2007)

Some things, such as personal names, don’t work visually when crammed together without spaces or with underlines between them, because a name is generally brief, and a name is what it is, not a metadata chunk of what it is. Names can be represented in many ways, and what a name is varies culturally (middle names, surnames, nicknames, which of the names you would use an initial for, and so forth), so making names more complex still muddies the water further. (McCulloch, 2004)

Doubtless Del.icio.us will cut out spaces (just look at the name) and make names into separate tags, just like other less useful tagging structures. Flickr allows for spaces, but I am not certain why “The Beatles” are also “TheBeatles” or what this means in search results over the entire Web.
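The effect on names can be shown with a small sketch. It assumes a tag field that treats every space as a separator, and it assumes that a site comparing tags folds case and ignores spaces and underscores; neither is presented as the documented behavior of Del.icio.us or Flickr, and the three function names are made up for the example.

```python
def split_on_spaces(raw):
    """Treat every space as a separator, as a space-delimited tag field does."""
    return raw.split()


def join_phrase(raw, joiner=""):
    """Keep a phrase as one tag by replacing runs of whitespace with `joiner`."""
    return joiner.join(raw.split())


def normalize(tag):
    """Fold case and drop spaces and underscores so variants compare equal."""
    return "".join(ch for ch in tag.lower() if ch not in " _")


name = "The Beatles"
print(split_on_spaces(name))      # two unrelated tags
print(join_phrase(name))          # one run-together tag
print(join_phrase(name, "_"))     # one tag with an underline
print(normalize("The Beatles") == normalize("TheBeatles") == normalize("the_beatles"))
```

Under these assumptions, a system that normalizes tags before comparing them would treat “The Beatles” and “TheBeatles” as the same tag, which is one possible explanation for seeing both forms.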

• Influences
My solution would make a difference in the management and user experience of the MSIM website by creating a completely flexible, highly discoverable interface to which individual end users can subscribe. For example, I have no interest in seeing anything other than the links to the class Websites I attend. For the sake of time I am simply not interested in the other materials, except for a list of teachers and staff, their positions of responsibility, and contact information, which I could find by searching and add to My Site at will.

Experienced end users may prefer that, when they go to the MSIM Web site, the only materials available are what they choose to put there, with changes and updates offered through mechanisms such as an RSS feed. The same goes for the email inbox: students are required to subscribe to a number of listservs. Listserv was an innovative technology in 1987 (Internet.com), fully twenty years ago. The iSchool announcements should instead scroll along the right of the screen, for example, so one can scroll through them; in this way the information would not arrive mixed with the end user’s business and personal email.

As far as improving searches, the input device needs to be updated to work in conjunction with the new software tools. Mice should be capable of storing multiple query objects so end users can click on Web objects, text, images, or whatever and have search understand that they want the deltas of those things, and related things or, contrariwise, their opposites (Lane, 2007). Searches should be able to lean in a particular direction based on the user’s selections (Steckel, 2002; McCulloch, 2004). This can be accomplished with algorithms and metadata that consider related items and the date of things; the system should be able to gradually learn from the end user’s selections and from the more easily anticipated needs, such as those of a newly enrolled student.
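One very simple way a search could "lean" toward a user's selections is to count the tags of the items the user opens and rank later results by those counts. The sketch below assumes exactly that scheme; the `LeaningSearch` class, its method names, and the sample items are hypothetical, and a real system would also weigh dates and related items as described above.

```python
from collections import Counter


class LeaningSearch:
    """Re-ranks results toward tags the user has selected before."""

    def __init__(self):
        self.preferences = Counter()

    def record_selection(self, tags):
        """Remember the tags of an item the user chose to open."""
        self.preferences.update(tags)

    def score(self, tags):
        """Sum the learned weights of the tags attached to one item."""
        return sum(self.preferences[tag] for tag in tags)

    def rank(self, items):
        """Order items by descending score; ties keep their input order."""
        return sorted(items, key=lambda item: -self.score(item[1]))


# Hypothetical items as (title, tags); tag names echo the vocabulary above.
items = [
    ("Tuition overview", ["Cost"]),
    ("Scholarship list", ["Cost", "Scholarship"]),
    ("Capstone posters", ["Capstone", "PDF"]),
]

search = LeaningSearch()
search.record_selection(["Capstone", "PDF"])
search.record_selection(["Capstone"])

for title, tags in search.rank(items):
    print(title, search.score(tags))
```

After two capstone-related selections, the capstone item moves to the top while the unselected items keep their original order.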

As we know, search is not the answer to every human need; random access or browsing is also a way to learn. (Taylor, 2004)

With familiarity, when there is too much information to cope with, or when the stakes get too high, people get lazy and selfish and want systems to stand in for them. From a system point of view, an example is that any operating system should learn where (in what directory) the work happens, what the work flow is, and how the user expects it to be sorted, instead of returning to the preset configuration. There are serious physical implications, because one’s wrist gets tired of clicking.

That is the same reason the MSIM website should be completely in the end users’ control, to discover upon their invitation and at their will. In short, the MSIM website should suit individual needs completely rather than be a general presentation for everyone, which they then need to negotiate each time they use it; it should be self-configuring. We need to trust end users to add those webparts that will help them study or work or do whatever they want at the University, and to scan through the information the University feels its people should review, without making them store it.

The empirical standard against which 'intelligent' systems should be judged is the real, accurate data communicated by the greatest active intelligence present: the feelings of the user. "Trust your feelings, Luke," advised the young Skywalker’s Jedi Master Obi-Wan Kenobi, and Luke lowered his targeting computer to rely on his mastery of skill and instinct.

• Trusting Problems
In the end, considering relativistic versus unambiguous objectivist empiricism is beside the point of all this and beyond our ability to confirm either way (Gould, 1983); the only truth to be found in these things is change itself. Empirical science changes, and in that respect it is relativistic, so a middle path appears to be the logical one.

As far as is practical, websites that are used frequently should give the end user a measure of control over appearance and change.

While working on this project I was advised that I was going down to too low a level with the data analysis (Crandall, 2007). Once I had higher-level clusters of metatags, the project seemed doable.

I believe I will understand the mechanics of Del.icio.us again, because when using it just for my own purposes I did understand exactly how bundles worked; now, perhaps due to the complexity of my needs, it appears not to function as I expected.

The readings helped and were enjoyable, but in the end they were just too much to guide me effectively in this project, although I highly enjoyed our investigation of the related topics. The trees and forest got lost in the volcano of information. I needed to just try something, without concern for being right. The introduction of so much information, too many three-letter acronyms, arguments for and against styles and ideas (Gould, 1983), significant information from the readings and book, and data about data about data (Shirky, 2005) can be an overload. It created a feeling that I am an unworthy person because I cannot do all of this, figure out the easy-to-use software, and deliver a paper on time. Similarly, when seeking and sifting through information, other people must encounter such feelings of frustration around the massive amount of information and their goals, when all they seek is just this one little thing, which they may have difficulty finding in a reasonable amount of time.

In reading various arguments back and forth I found myself wondering, “Where’s the fun?” The adventurous Han Solo is a human character, with all his limitations and helplessness (no magic powers, no high Jedi status, no hip light-sword waving guru “Master Yoda”), and simply with his desires and inherent human capacity for understanding to control and influence his environment. He would cheer that sentiment and just enjoy himself and his surroundings.

• Informants

• Taylor, Arlene G. The Organization of Information, 2nd ed. Westport, Conn. : Libraries Unlimited, 2004.

Returned to Taylor for the basics and reread sections such as the Metadata chapters.

• Buckland, M. (1997). What is a "Document"? Journal of the American Society for Information Science, 48(9), 804-809.
“What is a document?” I asked myself several times. Is it OK if I call an image a document?

• Internet.com, What is a Listserv? http://www.webopedia.com/TERM/L/Listserv.html Retrieved March 9, 2007

• Markoff, John. Published: March 9, 2007 http://www.iht.com/articles/2007/03/09/business/webdata.php
“Creating a database to organize the Internet, Metaweb envisions a repository that is like a digital almanac”

• Lakoff, George. 1987. Women, Fire and Dangerous Things: What Categories Reveal About the Mind. Chicago: Univ. of Chicago Press.

Class work provided a framework around the terms “cognitive and linguistic semantics”.

• McCulloch, E. (2004). Multiple terminologies: An obstacle to information retrieval. Library Review, 53 (5/6), 297-300.

It is becoming progressively impractical for users to consult the wide range of sources available to satisfy an information query. Will Mapping prevail?

• Guy, Marieke; Tonkin, Emma. (2006). "Folksonomies: tidying up tags?" D-Lib Magazine 12(1). doi:10.1045/january2006-guy.

Description of ‘Tag’ = a word that signifies a relationship between an online thing and someone’s mental construct (Context). Mario Popish discussion: “Flaws in folksonomies [social tagging (Del.icio.us / flickr)] Nonsense tags like lbnl, that limit effective searching, misspellings, Single words like runonsentences, trying to satisfy the personal and the collective's organizational need. Interesting theory presented that tags will tend to converge thus “evolving” meta-data in a natural way. Tagging Best Practices: pluralize, lowercase, use_underscores_in_phrases, follow conventions, add synonyms (computer should do that).”

• Golder, S.A. & B.A. Huberman. (2006). The structure of collaborative tagging systems. Journal of Information Science, Vol. 32, No. 2, 198-208.

Taxonomies are hierarchical and exclusive, while tagging is non-hierarchical and inclusive.

• Peterson, Elaine. (2006). "Beneath the Metadata. Some Philosophical Problems with Folksonomy". D-Lib Magazine, November 2006, Volume 12 Number 11. http://www.dlib.org/dlib/november06/peterson/11peterson.html

Brian Dorsey discussion – “Peterson seems to completely miss one critical difference between library style subject classification and tagging. Subject classification is a binary choice - a given object either does or does not belong to that subject. Tagging however is more like voting for a subject, the systems know exactly how many people have used a specific tag for that object and can choose to only present the most common, agreed upon tags.”

• Shirky, Clay. (2005). Ontology is overrated. http://www.shirky.com/writings/ontology_overrated.html
Brian Dorsey discussion – “Religion = Christian from a Christian point of view - unstable categories (East Germany). A significant break -- by users tag URLs and aggregating those tags, allows alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another.”

• Steckel, Mike. (2002). Ranganathan for IAs. Boxes and Arrows. http://www.boxesandarrows.com/view/ranganathan_for_ias

Description of Facets, and Ranganathan’s thoughts. Ranganathan’s reductionism in his classification facets, as related to ideas in classical Indian literature, is appealing to me because the very basis of wisdom, shunyata (Sanskrit), is the concept of everything arising from emptiness; in the ontology of Mahayana Buddhism it is stated in the celebrated paradox:

“form is emptiness; emptiness is form”

This is the core of the Heart Sutra. The Buddhist notion of emptiness logically concludes that ultimate reality is knowable, that there is a clear-cut ontological basis for phenomena, and that we can communicate and derive useful knowledge from it about the world, which Ranganathan restates in his modern terminology. From the human experience side, it is the same for form, feeling, perception, intention, and consciousness.

Portions of this explanation derived from my own education and the following website: http://www.thebigview.com/buddhism/emptiness.html It is interesting to note that a copy of “The Diamond Sutra” is the oldest known dated printed book in existence.

• Spender, J. (1998). Pluralist Epistemology and the Knowledge-Based Theory of the Firm. Organization, 5(2), 233-256.

Bill Marriott discussion: “One definition of knowledge is something that others can gain evidence from, and this knowledge can be separated from the knower (Spender, 1998)”

• Gould, S. (1983). What, if anything, is a zebra? Ch. 28 of Hen's Teeth and Horse's Toes: Further Reflections on Natural History. New York: Norton, 355-365.

• Lane, L. (2007) Modal, multiple input search mouse is an original idea, I did not read it elsewhere, it just occurred to me when trying to think of ways to save my wrist when combining ideas for search attempts on the MSIM site. It would work by stored selections or groups of stored selections either with or without positive reinforcement (like a fly out collection of lists of search parameters or locations). The search input box would be contained in the listing mechanism.

• Lucas, George. (1977). Star Wars. Film (USA).
The social view in the computer industry of computers as the dark side of the Force, or as essentially non-human, goes back to conversations with co-workers during the boom and bust cycle of that industry.

• Bransford, J. and Brown, Ann L. (2000). How People Learn: Brain, Mind, Experience, and School: Expanded Edition. National Research Council (U.S.), Committee on Learning Research and Educational Practice. (Paperback)

Learning is a natural, playful process, which in children is frequently accompanied by joy.

• HH The 14th Dalai Lama Ethics for the New Millennium, Riverhead Books, 1999

His Holiness’s work has mentioned computers since 1980 that I am aware of, and I have attended more than one speech at which he said “Computers are here to serve mankind not the other way around.” When you are working 60+ hours a week at Microsoft, that seems like a novel approach to life, liberty, and the pursuit of happiness.

• Leise, Fred; Fast, Karl; Steckel, Mike. (2003). Creating a Controlled Vocabulary. Boxes and Arrows. http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary
Construct sets of CVs, and related terms, look inward, look outward, ask people -- sounds like great advice.
