From this week’s reading I learned three new things and added a fourth from the class discussions.
1. Data and information are fluid to information managers in the same way that steel is a liquid to a steel worker.
Imagine a person holding a piece of data between their hands the way someone might hold a basketball. This blob of information is superflexible and changes completely in shape or dimension as it is moved around. Because it is data, it can be viewed from any position, reshaped by a model, or compared to other data.
2. Data has three basic states:
• Stored
• Processed
• Communicated
The original data should not be overwritten at its source. If altered information is going to be kept, rather than just the results, it needs to be stored somewhere other than the original source location. Storage is expensive.
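The rule about not writing altered data back over its source can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the file names, the `derived` directory, and the `process_to_new_location` helper are hypothetical and do not come from the readings.

```python
import json
import tempfile
from pathlib import Path


def process_to_new_location(source: Path, derived_dir: Path) -> Path:
    """Read the source, transform it, and write the result somewhere else."""
    records = json.loads(source.read_text())          # stored -> read
    cleaned = [r.strip().lower() for r in records]    # processed
    derived_dir.mkdir(parents=True, exist_ok=True)
    target = derived_dir / f"{source.stem}.cleaned.json"
    target.write_text(json.dumps(cleaned))            # stored again, elsewhere
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical original source file.
    source = Path(tmp) / "raw" / "names.json"
    source.parent.mkdir()
    original_text = json.dumps(["  Alice ", "BOB"])
    source.write_text(original_text)

    target = process_to_new_location(source, Path(tmp) / "derived")

    # The source is untouched; the altered copy lives in a different place.
    source_unchanged = source.read_text() == original_text
    derived = json.loads(target.read_text())
    print(source_unchanged, derived)
```

The cost is the one noted above: keeping both the original and the altered copy takes more storage than keeping the results alone.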
3. Managing data and transforming it into information or actionable knowledge means moving the data by applying different dimensions and techniques.
Storing data is expensive. Generally, unless it is a secret that retains value by staying secret, like CIA intelligence or a corporate secret such as the Coca-Cola recipe, stored data is not useful.
Data in motion, that is, data being viewed, altered, or accessed, is commercially valuable and can produce revenue.
The Two Crows Corporation article "Introduction to Data Mining and Knowledge Discovery" was particularly engaging to me, beginning with its definition:
"Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions."
Wow, that sounds like software can do something spooky, as in supernatural: "predict the future." But this is in fact what software can do, and it requires three things to do it:
+ data from a source
+ the ability to munge data into information
+ a place to store either the data or just the results; to be effective, though, the data must affect something else.
All this means is that either you move and store, or you move and munge and never permanently store the results. At any moment data is either stored or in motion: being moved, processed, or munged, which includes modifying other data. (See: http://www.twocrows.com/booklet.htm)
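The two choices, "move and munge without storing" versus "move, munge, and store," can be sketched with Python generators. This is a minimal illustration, not anything taken from the Two Crows booklet: the `source`, `munge`, and `summarize` functions and the sample readings are all hypothetical.

```python
from typing import Iterable, Iterator


def source() -> Iterator[str]:
    """Hypothetical data source: raw readings arriving as text."""
    yield from ["12.5", " 13.0 ", "bad", "14.5"]


def munge(raw: Iterable[str]) -> Iterator[float]:
    """Turn raw text into usable numbers, skipping what cannot be parsed."""
    for item in raw:
        try:
            yield float(item.strip())
        except ValueError:
            continue


def summarize(values: Iterable[float]) -> dict:
    """Consume the values and keep only the result, not the data."""
    count, total = 0, 0.0
    for v in values:
        count += 1
        total += v
    return {"count": count, "mean": total / count if count else None}


# Move and munge: the data streams through and only the result is kept.
result_only = summarize(munge(source()))

# Move, munge, and store: the munged data is kept as well as the result.
stored = list(munge(source()))
result_from_stored = summarize(stored)

print(result_only, stored)
```

Both paths give the same summary. The difference is only whether the munged data still exists afterward, which is the storage decision described above.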
From this it appears that removing one of these nodes, such as the upload into memory, might improve speed. Logically, this means that computers which never turn off and continuously churn data in some way, such as very refined "data supermarts," would be the most efficient.
The illustrations provided in these articles were very helpful and made these models and concepts easy to understand. Reflecting on this made me think about what a visual model for the entire process of Data Engineering into Actionable Information would look like. What I envisioned, but haven't had time to draw, is a person holding a glob or blob of data between their hands the way someone might hold a basketball. This blob is superflexible and changes shape completely as it is moved around. Because it is data, it can be viewed from any position, reshaped by a model, or compared to other data.
Reviewing data to detect empirical patterns and so forth makes sense – but this section was particularly interesting:
"Data mining is a tool for increasing the productivity of people trying to build predictive models."
If this isn't the most interesting thing a futurist, a scientist, a medical researcher, or a sales team can hear and understand about computer science and data modeling, I don't know what would be. Predictive models in and of themselves are recursively fascinating. This leads to the question: what does that take?
"While the power of the individual CPU has greatly increased, the real advanced in
Yes, that is more exciting news. This is the same way that linked computers are used during their idle time as a massive array, searching for unexplained patterns in space's background noise in the hope of hearing a signal. They are looking for alien life. Clearly such a method is helped by all that linked volunteer processing power. But that's the lesson: if you really want to find something out, it's possible, and lots of businesses and individuals use these techniques in all kinds of applications because it's cheap enough.
"Visualization works because it exploits the broader information bandwidth of graphics as opposed to text or numbers. It allows people to see the forest and zoom in on the trees. Patterns, relationships, exceptional values and missing values are often easier to perceive when shown graphically, rather than as lists of numbers and text."

One of the interesting ideas that came from reading about link analysis, and its two common kinds of inquiry, "association discovery" and "sequence discovery" (with the factors of support, relative frequency, confidence, and association), is that a database linked to an additional database, besides finding associations and sequences, might over time arrive at many expected detections in patterns of data. Suppose, for example, that the additional database was the "Life Database of Patterns of Obvious Qualities". Such a database would contain many thousands of facts such as 'dead people do not buy anything', and corollaries such as 'so there is no point in sending advertising to their residence.'
Such a mammoth database would be a unique scientific challenge to create, maintain, and link to. Of particular interest to me is how much of the data collected would be true and how much would be useful. We cannot always tell how solving a problem may serve to inform something else. One example is the discussion of Microsoft's edge checking algorithm, used in the email scanning software that weeds out spam on Microsoft email servers: as it turned out, this edging method was also used to sort through DNA in the successful search for an AIDS vaccine going into clinical trials. [2]
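Going back to the support and confidence factors from the link analysis reading, here is a small sketch of how they are commonly computed for an association rule such as "bread implies butter." The market baskets are invented for illustration, and the function names are my own rather than anything from the articles.

```python
# Hypothetical market baskets, invented for illustration.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]


def support(itemset: set, baskets: list) -> float:
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent: set, consequent: set, baskets: list) -> float:
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule: bread -> butter
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

With these four baskets, bread and butter appear together in half of them (support 0.5), and two of the three bread baskets also contain butter (confidence about 0.67).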
Even the acronyms at this level sound spacey: MARS, for Multivariate Adaptive Regression Splines. This is a much more interesting field than I was prepared to encounter. In summation, I come away quite curious about how far our creative intelligence, and our need and desire to know, will be able to drive the technology to the computational limit and over into helping humanity on a mass level.
1. West, T. "In the Mind's Eye: Visual Thinkers, Gifted People With Dyslexia and Other Learning Difficulties, Computer Images and the Ironies of Creativity". Prometheus Books, updated edition (September 1997).
2. The application description is from Phil Fawcett, Microsoft Research Liaison PM, in-person presentation on "optimized applications," http://research.microsoft.com/ivm/HDView/HDGigapixel.htm, University of Washington, Seattle, April 17, 2007.
Week 5: Modalities of Information Delivery
Data Mining
The Two Crows Corporation, "Introduction to Data Mining and Knowledge Discovery, Third Edition" 1999. Accessed on 2/25/2006 from http://www.twocrows.com/intro-dm.pdf.
Witten, I.H. and Frank, E. (2000). "What's It All About?" In Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. (Chap. 1). San Francisco: Morgan Kaufmann. pp. 1-35. (Focus on Sections 1.5 and 1.6)
Editorial Review & Delivery
McGovern, G. & Norton, R. (2002). "Editing Content." In Content Critical: Gaining Competitive Advantage through High-Quality Web Content. (Chap. 6). Pearson Education Limited. pp. 109-122.
IT Help Desk
Clarke, S. and Greaves, A. (2002). "IT Help Desk Implementation: The Case of an International Airline." In Annals of Cases on Information Technology, 4, pp. 241-259.
Walko, D. 1999. "Implementing a 24-Hour Help Desk at the University of Pittsburgh." In Proceedings of the 27th Annual ACM SIGUCCS Conference on User Services: Mile High Expectations (Denver, Colorado, United States). SIGUCCS '99. ACM Press, New York, NY, pp. 202-207.
Duhart, T., Monaghan, P., and Aldrich, T. 1999. "Creating the Customer Service Team: An Ongoing Process." In Proceedings of the 27th Annual ACM SIGUCCS Conference on User Services: Mile High Expectations (Denver, Colorado, United States). SIGUCCS '99. ACM Press, New York, NY, pp. 51-55. DOI= http://doi.acm.org/10.1145/337043.337090.
Padeletti, A., Coltrane, B., and Kline, R. 2005. "Customer service: help for the help desk." In Proceedings of the 33rd Annual ACM SIGUCCS Conference on User Services (Monterey, CA, USA, November 06 - 09, 2005). SIGUCCS '05. ACM Press, New York, NY, pp. 299-304. DOI= http://doi.acm.org/10.1145/1099435.1099504.
1 comment:
I suggest KDnuggets as a resource:
http://www.kdnuggets.com/