FeelingElephants’s Weblog

19 October, 2007

Split session: Scaling up

Filed under: GHC07, Metadata, politics-tech — feelingelephants @ 3:51 pm

Paradigm shifts, new technology requirements, and a solution path: the future of computing

Ramune Nagisetty of Intel gave a talk that, more than anything, conveyed the zooming, Moore’s Law-following shifts in the high tech industry. She focused her talk on:

    The Power Wall
    -Power and how it is used is becoming a more and more prominent and unavoidable issue, especially within technical industries. From cell phone batteries to PG&E bills for company or school labs, the use of power must be considered as technology is developed.
    Many Core Architecture
    -Using two or more processors to make computing more efficient is a trend the presenter sees emerging in the field.
    The Future: Sensors and Data Sets
    -iPhones and Wiis are impressive because they know where they are. Use of sensors is cool. Stories about the US Government data mining for private information, and about general-use databases, all reflect the current state of a growing field.

Finding Semantically Similar Objects

Using ontology to find similar objects. Relevant questions:

Yu Deng of the IBM T.J. Watson Research Center presented on the friend of everyone who is not used to working their way into other people’s brains: technology which searches or categorizes information based on semantically similar objects. Most experienced researchers get used to trying odd words to find what they need. For example, in a conversation about Disney’s collected biometrics (which will be blogged about soon), we hardcore geeks started talking about what words someone else (Disney’s web content writers in particular) might use in an explanation of a guest’s Privacy Rights. Here are some of the keyword combinations which might produce results:

1) “Fingerprint” Disney

2) “Digital Scanning” “Disney world”

3) Disney biometrics

4) “Privacy policy” “Disney World”

5) “Guest Information” usage

However, as I understand it, the point of using ontology in next-generation searching is that any of these phrases could pull up the Disney Policy if it existed on the Internet. This would be because the words “Fingerprint”, “Digital Scanning” and “Biometrics” would all be linked in the search structure. NOT equivalent, but linked. By contrast, the word “Disney” would *not* need to be linked to “Disney World”, because under current search algorithms “Disney World” would already come up when “Disney” was searched for, since that phrase includes the searched-for keyword. Anyhoo, next presentation, here I come!
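Here is a tiny sketch of that linking idea as I understand it, not anything shown in the talk. The link table, the two "documents" and the function names are all made up for illustration. A query term matches a document if the term itself, or any term linked to it, appears in the text.

```python
# Hypothetical toy "ontology": each term maps to the terms it is linked to.
# These links are illustrative assumptions, not taken from any real search engine.
LINKS = {
    "fingerprint": {"biometrics", "digital scanning"},
    "biometrics": {"fingerprint", "digital scanning"},
    "digital scanning": {"fingerprint", "biometrics"},
}

# Made-up documents standing in for pages on the Internet.
DOCUMENTS = {
    "policy": "Disney World privacy policy: biometrics collected at the gate",
    "parade": "Disney World parade schedule for October",
}


def expand(term):
    """Return the term plus every term linked to it (linked, not equivalent)."""
    term = term.lower()
    return {term} | LINKS.get(term, set())


def search(terms):
    """Return the sorted names of documents that match every query term,
    where a term matches if it or any of its linked terms appears in the text."""
    hits = []
    for name, text in DOCUMENTS.items():
        lowered = text.lower()
        if all(any(t in lowered for t in expand(term)) for term in terms):
            hits.append(name)
    return sorted(hits)
```

Searching for "Fingerprint" and "Disney" finds the policy page even though the page only says "biometrics". "Disney" needs no link to "Disney World" because plain substring matching already covers it.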

Inspirational Quote:

“We’ll try to be a catalyst for development projects that include women at all levels - in the design and development process.” Anita Borg

Jessica Dickinson Goodman

Official GHC 2007 Blogger
You may comment on this blog by visiting the GHC (Grace Hopper Conference) Forum.

5 August, 2007

Metadata in the news

Filed under: DRM, Judicial Branch, Metadata, copyright, news, politics-tech — feelingelephants @ 9:37 pm

Hey,

So, a little bit less political today. Ok, just a little calmer. Here are 3 really cool articles on where metadata shows up in the real world.

http://diveintomark.org/archives/2007/06/26/piracy-lessons

http://www.commondreams.org/archive/2007/07/29/2837/

http://www.economist.com/world/international/displaystory.cfm?story_id=9546242&CFID=13851491&CFTOKEN=81503514

The first is about how pirates use metadata to accurately show their downloaders the facts about whatever they’re downloading (and explains why legit movie retailers could learn something about product information from said pirates). It is nice to see a comparison of the quality of legally and illegally obtained movies, simply because this kind of discussion does not happen in mainstream media. I would like to point out that I abstain from pirating movies and music because, though I think the current life + 70 years term of copyright is unreasonable, it is the law, and I need the moral high ground to argue for the reduction of copyright to some more reasonable period.

The second link is to a NYT article on the deeper aspects of the allegations that the NSA holds vast databases full of metadata about private US citizens. This is interesting in the first place because I don’t like the idea of the NSA going on datamining expeditions, nor do I like them having huge databases about US citizens just in a general way. In the second place, it is suggested that this data mining program was the program which the current Attorney General (previously White House Counsel) asked John Ashcroft (then Attorney General) to continue when Ashcroft was in the hospital. Attorney General Gonzales has testified under oath that there was never any major controversy in the DOJ over the wiretapping and that the discussion with then Attorney General Ashcroft was about other security matters, potentially this data mining operation. As The Economist says, “And perhaps Mr Gonzales is merely a weasel and not a perjurer” (this week’s ed).

The final story does not actually mention metadata; it describes how aid agencies are using online databases of people’s names and locations to allow families separated by disasters to find each other. This is an incredible article about the shifting relationship between donor and victim and how technology allows people in aid-needing countries to ask for the aid they need. A few cool quotes:

“Télécoms sans Frontières (TSF), a French voluntary agency (total staff: a dozen), goes in with the UN team that does the first needs-assessment in the hours after disaster strikes.”

“The Tsunami Evaluation Coalition, a group of agencies bent on learning from past mistakes, notes that “local people themselves provided almost all immediate life-saving action and the early-emergency support, as is commonly the case in disasters.””

“Family remittances are already a bigger source of transfers to poor countries than government aid.”

I wonder if TSF wants an intern? Just kidding, I need college. But seriously, I think it’s amazing that the same technology which can be used to track digitized books and DVDs can be used to track everyone from innocent civilians to innocent victims of disasters. It’s all in how that metadata is used. But wouldn’t it rock to program for the UN? On that note, here’s my goodnight quote:

“No distance of place or lapse of time can lessen the love of those who are thoroughly persuaded of each other’s worth.” Robert Southey

30 July, 2007

Feeling Elephants: the title explained. Also, a selection of tech vocabulary–not for the non-geeky!

Filed under: DRM, Metadata, copyright, news, open source, politics-tech, workflow — feelingelephants @ 9:15 pm

This summer I had my first non-family non-babysitting actually-being-paid-with-money kind of job. One of the things I realized is that when I am learning about something new (jbpm (java business process management), java, whatever) I spend a great deal of time getting detailed knowledge of only one aspect of it. This reminded me of the old story about the blind men and the elephant. See below for pretty shiny hyperlinks.

I wasted a good part of my day with this. It is my vocabulary list for my job as a software developer. I found dozens of definitions for each of these, so please tell me if I just described the elephant’s tail in detail but missed its trunk or foot. For an explanation of this metaphor see here.

For a less clinical description, see here.

These are a mix of jargon I knew and jargon I’m learning. Most people don’t care what DSL stands for, but knowing the technical definition helps a true understanding.

And now, for the tech vocab:

CVS: Concurrent Versions System. CVS is an open source version control and collaboration system.

component: [Definition quoted from the CCA Forum] A component is a software object, meant to interact with other components, encapsulating certain functionality or a set of functionalities. A component has a clearly defined interface and conforms to a prescribed behavior common to all components within an architecture. Multiple components may be composed to build other components.

beans: A collection of Java components

ide: Integrated Development Environment.

jbpm: java business process management

server: A process that runs on a host that relays information to a client upon the client sending it a request. Servers come in many forms: application servers, web servers, database servers, and so forth. All IP-based servers can be load balanced. See Web Server.

SDR: Stanford Digital Repository

execute: To perform a data processing operation described by an instruction or a program.

sql: Structured Query Language (SQL), pronounced “sequel”, is a language that provides an interface to relational database systems. It was developed by IBM in the 1970s for use in System R. SQL is a de facto standard, as well as an ISO and ANSI standard.

CGI: common gateway interface

Perl: (Short for Practical Extraction and Report Language), is a programming language specifically designed for processing text, and because of this trait is one of the most popular languages for writing CGI scripts. Note from me: this is actually wrong. On more research I found that Perl was just a name the creator came up with and liked, and the expansion was defined afterwards. Go figure.

W3C: The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding.

Schema: A schema is the set of objects (tables, views, indexes, etc) belonging to an account. It is often used as another way to refer to an Oracle account. The CREATE SCHEMA statement lets one specify (in a single SQL statement) all data and privilege definitions for a new schema. One can also add definitions to the schema later using DDL statements.

XML or here: (eXtensible Markup Language) A widely used system for defining data formats. XML provides a very rich system to define complex documents and data structures such as invoices, molecular data, news feeds, glossaries, inventory descriptions, real estate properties, etc. As long as a programmer has the XML definition for a collection of data (often called a “schema”) then they can create a program to reliably process any data formatted according to those rules.
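To see what "reliably process any data formatted according to those rules" can look like, here is a minimal sketch using Python’s standard xml.etree.ElementTree module. The invoice layout (element and attribute names) is my own invention, standing in for a real schema.

```python
import xml.etree.ElementTree as ET

# A made-up invoice document; the element and attribute names are assumptions
# for illustration, not part of any real invoice standard.
INVOICE_XML = """
<invoice number="42">
  <item name="widget" quantity="3" price="2.50"/>
  <item name="gadget" quantity="1" price="10.00"/>
</invoice>
"""


def invoice_total(xml_text):
    """Sum quantity * price over every <item> directly under the root element."""
    root = ET.fromstring(xml_text.strip())
    return sum(
        int(item.get("quantity")) * float(item.get("price"))
        for item in root.findall("item")
    )
```

Because the program only relies on the agreed structure, it works on any invoice that follows the same layout, however many items it has.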

JMX: Java Management Extensions or JMX is a Java technology that supplies tools for managing and monitoring applications, system objects, devices (e.g. printers) and service oriented networks. An interesting detail of the API is that classes can be dynamically constructed and changed.

API or here: Application Programming Interface. In the world of software, APIs are structured abstraction layers between the gory details of an individual application, operating system or hardware item and the world outside that software or hardware. Or: A formalized set of software calls and routines that can be referenced by an application program in order to access supporting system or network services.

UI: User Interface. The user interface of a program is the part of it with which a user (person) interacts, such as a menu, button or toolbar. Mozilla’s user interface is often referred to as the Chrome.

DIP: Dissemination Information Package: the means by which information in a digital archive is conveyed to a user of the archive. The term comes from the Open Archival Information System (OAIS) model.

DSL: Digital Subscriber Line is a technology for bringing high-bandwidth information to homes and small businesses over ordinary copper telephone lines. A DSL line can carry both data and voice signals and the data part of the line is continuously connected.

bre: business rules engine

lisp or here: (which stands for “LISt Processing”) is a programming language oriented towards functional programming. Its prominent features include prefix-notation syntax, dynamic typing (variables are type-neutral, but values have implicit type), and the ability to treat source code as first-class objects. Or: List Processing Language — A high-level computer language invented by Professor John McCarthy in 1958 to support research into computer based logic, logical reasoning, and artificial intelligence. It was the first symbolic (as opposed to numeric) computer processing language.

LAS: Log ASCII Standard (file format)

Stub: A temporary implementation of part of a program for debugging purposes.
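A quick sketch of a stub, with every name made up by me: the real rate lookup is not written yet, so a stand-in returns a fixed value and lets the rest of the program be run and debugged.

```python
def fetch_exchange_rate_stub(currency):
    """Stub: stands in for a real rate lookup that has not been written yet.
    It ignores the currency and returns a fixed, made-up rate."""
    return 2.0


def convert(amount, currency, fetch_rate=fetch_exchange_rate_stub):
    """Convert an amount using whichever rate function is plugged in."""
    return amount * fetch_rate(currency)
```

Once the real lookup exists, it can be passed in as fetch_rate and the stub thrown away.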

jpeg2000: JPEG 2000 is a wavelet-based image compression standard. It was created by the Joint Photographic Experts Group committee with the intention of superseding their original discrete cosine transform-based JPEG standard. The usual file extension is .jp2.

pointers: In computer science, a pointer is a programming language datatype whose value is used to refer to (“points to”) another value stored elsewhere in the computer memory. Obtaining the value that a pointer refers to is called dereferencing the pointer. A pointer is a simple implementation of the general reference datatype, although it is quite different from the facility referred to as a reference in C++.

metadata: Data about other data, commonly divided into descriptive metadata such as bibliographic information, structural metadata about formats and structures, and administrative metadata, which is used to manage information.

mets: a standard for encoding descriptive, administrative, and structural metadata about objects within a digital library, expressed using XML. METS is being developed by the Digital Library Federation (DLF) and is maintained by the Library of Congress.

abstraction: In computer science, abstraction is a mechanism and practice to reduce and factor out details so that one can focus on few concepts at a time. It is by analogy with abstraction in mathematics. The mathematical technique of abstraction begins with mathematical definitions; this has the fortunate effect of finessing some of the vexing philosophical issues of abstraction.

EDI: (Electronic Data Interchange) This is a set of computer interchange standards for business documents such as invoices, bills, and purchase orders. or here. The inter-organizational, computer-to-computer exchange of structured information in a standard, machine-processable format.

mapping: A process of matching a Client to a specific Console system, so that it cannot be controlled by another Console system with unauthorized access.
or here. It is the association of data field contents from an internal computer system to the field contents in the EDI standard being used. The same mapping takes place in reverse during the receipt of an EDI document.
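The EDI sense of mapping can be sketched with a plain lookup table. The field names on both sides are hypothetical (real EDI standards define their own segment and element codes). The point is only that the same table is used one way when sending and the other way when receiving.

```python
# Hypothetical field names on both sides; a real EDI standard would supply
# its own codes for the right-hand side.
INTERNAL_TO_EDI = {
    "customer_name": "N1_NAME",
    "order_number": "PO_NUMBER",
    "total": "TOTAL_AMOUNT",
}

# The reverse mapping, used when an EDI document is received.
EDI_TO_INTERNAL = {edi: internal for internal, edi in INTERNAL_TO_EDI.items()}


def to_edi(record):
    """Rename internal fields to their EDI counterparts (outgoing document)."""
    return {INTERNAL_TO_EDI[field]: value for field, value in record.items()}


def from_edi(document):
    """Rename EDI fields back to internal ones (incoming document)."""
    return {EDI_TO_INTERNAL[field]: value for field, value in document.items()}
```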

relational database: (1) A data structure organized so that it is perceived by its users as a collection of tables. (2) A database that is organized and accessed according to relations. A relational database has the flexibility to generate new tables from existing records that meet specified criteria.
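The "generate new tables from existing records that meet specified criteria" part is easy to try with the sqlite3 module that ships with Python. This is a minimal sketch; the table names, columns and rows are invented for the example.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a made-up books table, then derive
    a new table from the records that meet a criterion (year >= 2000)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [("Old Book", 1920), ("New Book", 2005), ("Newer Book", 2007)],
    )
    conn.execute(
        "CREATE TABLE recent_books AS "
        "SELECT title, year FROM books WHERE year >= 2000"
    )
    return conn


def recent_titles(conn):
    """Return the titles in the derived table, oldest first."""
    rows = conn.execute("SELECT title FROM recent_books ORDER BY year").fetchall()
    return [title for (title,) in rows]
```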

domain model: “The domain model should serve as a unified, definitive source of reference when ambiguities arise in the analysis of problems or later during the implementation of reusable components, a repository of the shared knowledge for teaching and communications, and a specification to the implementer of reusable components. …

Object-oriented: Programming languages and techniques where data carries with itself the “methods” (also known as “functions”) used to handle that data. An OO programmer, for instance, can write a statement such as “object.print()” without having to be concerned about what kind of object will be involved at “run time” or what its printing method is. Object-oriented code is both more flexible and more organized, so it is far easier to write, read, and change than procedural code. …
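Here is the “object.print()” idea as a small sketch. The two classes are my own examples. The calling code never checks what kind of object it has; each object carries its own printing method.

```python
class Book:
    def __init__(self, title):
        self.title = title

    def print(self):
        """Print this book in its own way and return the printed text."""
        text = f"Book: {self.title}"
        print(text)  # inside the method body this is the built-in print
        return text


class Dvd:
    def __init__(self, title):
        self.title = title

    def print(self):
        """Print this DVD in its own way and return the printed text."""
        text = f"DVD: {self.title}"
        print(text)
        return text


def print_all(objects):
    """Call print() on each object without caring what kind of object it is."""
    return [obj.print() for obj in objects]
```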

Hibernate or here: Hibernate is an Object-relational mapping (ORM) solution for the Java language. It is free, open source software that is distributed under the LGPL. Hibernate was developed by a team of Java software developers around the world. It provides an easy to use framework for mapping an object-oriented domain model to a traditional relational database.

dtd: Document Type Definition file that specifies how elements inside an XML document should relate to each other. It provides “grammar” rules for an XML document and each of its elements. DLESE’s metadata records are XML documents.
www.dlese.org/documents/glossary.html

tei: A project to represent texts in digital form, emphasizing the needs of humanities scholars. Also the DTD used by the program.
www.cs.cornell.edu/wya/DigLib/MS1999/glossary.html

Sandbox: A network or series of networks that are not connected to other networks.
www.krollontrack.com/legalresources/glossary.asp

QC: Quality Control. The regulatory process through which we measure actual performance, compare it with standards, and act on the difference. Also sometimes used to distinguish inspection and test activities from other quality activities (see QA: Quality Assurance).


Observer pattern: The observer pattern is a design pattern used in computer programming to observe the state of an object in a program: the observed object keeps a list of observers and notifies each of them when its state changes.
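A minimal sketch of the observer pattern, with class and method names of my own choosing: the subject keeps a list of observers (here, plain callables) and calls each one whenever its state changes.

```python
class Subject:
    """An object whose state can be observed."""

    def __init__(self):
        self._observers = []
        self._state = None

    def attach(self, observer):
        """Register a callable to be notified of every state change."""
        self._observers.append(observer)

    def set_state(self, state):
        """Store the new state, then notify every registered observer."""
        self._state = state
        for observer in self._observers:
            observer(state)
```

For example, attaching a list’s append method as the observer leaves a running log of every state the subject has been in.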

asynchronous: A type of two-way communication that occurs with a time delay, allowing participants to respond at their own convenience. Literally not synchronous, in other words, not at the same time. An example of an application of asynchronous communication is an electronic bulletin board.

beanshell: BeanShell is a Java scripting language, invented by Pat Niemeyer. It runs in the Java Runtime Environment (JRE) and utilizes Java’s own syntax.
