Is Freebase worth much?

There has been some speculation that Freebase is a vehicle for Metaweb to prosper from its semantic web infrastructure when it is put to commercial use.  As I recall, Metaweb raised over $40 million in Series B around the time they started building Freebase.  The investment was led by Goldman Sachs.  Metaweb’s seasoned investors would have been unlikely to invest so much in a business that could not project a return on that investment.  Almost certainly, Metaweb has firm plans for realizing over $100 million in revenues.  Most likely, given these investors and this amount of capital, target revenues by 2014, five years after the second round, would be in the vicinity of $1 billion.  Obviously, there is a lot of work to get there from roughly zero today.

Some of the bubble in which those funds were raised has since burst.  The economy would crimp both the valuation and the investment if they were made today.  And the semantic web has yet to produce a winner, so with less enthusiasm the investment would be still less favorable today.  All this is modulo the business plan.  If the business plan withstands scrutiny and the rate of return from credibly achievable projections justifies investment, they could get the money again, even now.  But no one that I have heard or read over the past few years can explain the business plan adequately – that is, concretely.  I would appreciate any insights or opinions on the topic.  I believe these are smart people, in the company and among its investors, so I am sure the plan is there.  I just don’t believe in the “we’ll figure out how to make money eventually” business plan in this case.

Some Freebase terms that are worth knowing, although they are commercially reasonable for any site that provides a free service, include:

  1. The terms of service are subject to change (upon posting).
  2. The service may be changed or discontinued at any time and without notice.
  3. Limits concerning access to or use of the services may be established.
  4. Any disputes shall be heard in San Francisco and governed by California law.

Continue reading “Is Freebase worth much?”

Extended Enterprise Ontology

In a recent post I mentioned comments by Sir Tim Berners-Lee concerning the overlap between enterprise information models and semantic web ontology supporting the concept of linked data.  Sir Tim argued that the overlap is already sufficient to have a transformative effect on mainstream IT.  I think he is right, but also that we are not there yet.  There are many obstacles to adoption, not the least of which is the inertia of enterprise IT.  Disruptive approaches to software development typically require ten years or so to cross the chasm from visionaries and early adopters to the mainstream.  We are only a few years into this, and the technology is not ready.

First, let’s establish that there is plenty of semantics available for reuse now.  There are existing models, some of which are well-designed, mature, and widely used.  Unfortunately, most of what exists has little apparent relevance to enterprises.  There is little on this diagram that would draw the attention of an enterprise architect, for example.

Continue reading “Extended Enterprise Ontology”

Time for the next generation of knowledge automation

In preparing for my workshop at the Business Rules Forum in Las Vegas on November 5th, I have focused on the following needs in reasoning about processes, about events, and about or over time:

  1. Reasoning at a point within a [business] process
  2. Reasoning about events that occur over time
  3. Reasoning about a [business] process (as in deciding what comes next)
  4. Reasoning about and across different states (as in planning)

Enterprise decision management (EDM) addresses the first.  Complex event processing (CEP) is concerned with the second.  In theory, EDM could address the third, but it does not in practice.  This third item includes the issue of governing and defining workflow or event-driven business processes rather than point decisions within such business processes.
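
To make the distinction concrete, here is a minimal sketch of rules reasoning about a process rather than within one.  The claim-handling steps, event kinds, and the next_step rule are hypothetical and reflect no particular product or standard; the point is only that the rules consult a model of the process and its events in order to decide what comes next.

```python
# A minimal, hypothetical sketch of rules reasoning *about* a process rather than
# making a point decision *within* it.  The claim-handling steps and event kinds
# are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str                                        # e.g., "claim-filed"
    data: dict = field(default_factory=dict)

@dataclass
class ProcessState:
    completed: list = field(default_factory=list)    # steps already performed
    events: list = field(default_factory=list)       # events observed so far

def next_step(state: ProcessState) -> str:
    """A rule about the process itself: given what has happened, decide what comes next."""
    kinds = {e.kind for e in state.events}
    if "claim-filed" not in kinds:
        return "await-claim"
    if "documents-received" not in kinds:
        return "request-documents"
    if "adjudicate" not in state.completed:
        return "adjudicate"
    return "close-claim"

state = ProcessState(events=[Event("claim-filed"), Event("documents-received")])
print(next_step(state))   # -> "adjudicate"
```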

Business applications of rules have not advanced to include the fourth item.  That is to say, business has yet to significantly leverage the reasoning and problem-solving techniques that are common in artificial intelligence.  For example, artificially intelligent question-and-answer systems, which are being developed for the semantic web, can do more than retrieve data – they perform inference.  Commercial database and business intelligence queries are typically much less intelligent, which presents a number of opportunities that I don’t want to go into here but would be happy to discuss with interested parties.  The point here is that business does not use reasoning much at all, let alone to search across the potential ramifications of alternative decisions or courses of action before making or taking one.  Think of playing chess, or of a soccer-playing robot planning how to advance the ball on goal.  Why shouldn’t business strategies or tactical business decisions benefit from a little simulated look-ahead along with a lot of inference and evaluation?
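
As for the fourth item, the toy sketch below illustrates what a little simulated look-ahead amounts to: enumerate alternative decisions, project the states they lead to, evaluate those states, and only then act.  The inventory example, the candidate orders, and the scoring function are all assumptions invented for illustration; a real application would plug in its own state model and evaluation.

```python
# A toy sketch of simulated look-ahead: enumerate alternative decisions, project
# the states they lead to, and evaluate them before acting.

def best_decision(state, actions, value, depth=2):
    """Depth-limited look-ahead: return (first action, score) of the best projected path."""
    if depth == 0:
        return None, value(state)
    best_action, best_score = None, value(state)
    for name, next_state in actions(state):
        _, score = best_decision(next_state, actions, value, depth - 1)
        if best_action is None or score > best_score:
            best_action, best_score = name, score
    return best_action, best_score

# Example: decide how much stock to order, looking two periods ahead.
def actions(stock):
    for order in (0, 10, 20):
        yield f"order {order}", max(stock + order - 12, 0)   # 12 units of demand per period

def value(stock):
    return -abs(stock - 5)   # prefer ending each period near a buffer of 5 units

print(best_decision(8, actions, value, depth=2))   # -> ('order 10', -1)
```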

Even though I have recently become more interested in the fourth of these areas, I expect the audience at the Business Rules Forum to be most interested in the first two points above.  There will also be some with enough experience of the complex business processes common in larger enterprises to be interested in the third item.  Only those working on the most advanced applications, such as biochemical process planning, will be interested in the fourth.  I don’t expect many of them to attend!

The notion of enterprise decision management (EDM) is focused on point decision making within a business process.  For enterprises that are concerned with governing business processes, a model of the process itself must be available to the business rules that govern its operation.  I’ve written elsewhere about the need for an ontology of events and processes in order to effectively integrate business process management (BPM) with business rules.  Here, and in the workshop, I intend to get a little more specific about the requirements, what is lacking in current standards and offerings, and what we’re trying to do about it. Continue reading “Time for the next generation of knowledge automation”

Sir Tim Berners-Lee on Ontology

A panel on whether or not ontology is needed to achieve a collective vision for the semantic web was held on Tuesday at the International Semantic Web Conference (ISWC 2009) near Washington, DC.  For most of the panelists the question was rhetorical.  But a few interesting points were made, including that machine learning of ontology is one extreme of a spectrum that extends to human authoring of ontology (however authoritative or coordinated).  Nobody on the panel or in the audience felt that the extreme of human-authored ontology was viable for the long-term vision of a comprehensively semantic and intelligent web.  The panelists clearly believed that machine learning will substantially enrich and automate ontology construction, although the timeframe was not discussed; the subjective opinion was that substantial ontology will be acquired automatically within the next decade or so.  There was much discussion about the knowledge being in the data, and so on.  The discussion had a bit of the statistics-versus-logic debate to it.  Generally, the attitude was “get over it”, and even Pat Hayes, who gave a well-received talk on Blogic and whom one would expect to take the strict logic side of the argument, pointed out seminal work on combining machine learning and logic in natural language understanding of text.

David Karger of MIT’s AI lab challenged the panel from the audience by asserting that the data people post on the web is much more important than any ontology that might define what that data means.  This set off a bit of a firestorm.  There was consensus that data itself is critically important, if not central.  For the most part, panelists were aghast at the notion that spreadsheets of data would be useless to computers unless the meanings of their headings, for example, were related to concepts defined by reference to an ontology those computers understood.

With respectful deference, the panel and audience yielded.  Sir Tim Berners-Lee took the floor. Continue reading “Sir Tim Berners-Lee on Ontology”

A Common Upper Ontology for Advanced Placement tests

I have previously written about the lack of a common upper ontology in the semantic web and commercial software markets (e.g., business rules).  For example, the lack of understanding of time limits the intelligence and ease of use of software in business process management (BPM) and complex event processing (CEP).  The lack of understanding of money limits the intelligence and utility of business rules management systems (BRMS) in financial services and the capital markets.   And, more fundamentally, understanding time and money (among other things, such as location, which includes distance) requires a core understanding of amounts.
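
To make that concrete, here is a minimal sketch of a shared core of amounts, in which money, time, and distance are all amounts of something, expressed in units that convert within a dimension.  The class name, units, and conversion factors are assumptions made up for the example; real money, in particular, needs dated exchange rates rather than a fixed factor.

```python
# A minimal sketch of a shared core of "amounts": money, durations, and distances
# are all amounts expressed in units that convert within a dimension.
# The unit table and factors below are illustrative assumptions only.

from dataclasses import dataclass

# unit -> (dimension, factor relative to that dimension's base unit)
UNITS = {
    "USD":    ("money",    1.0),
    "EUR":    ("money",    1.3),        # a fixed rate is a simplification for this sketch
    "second": ("time",     1.0),
    "hour":   ("time",     3600.0),
    "meter":  ("distance", 1.0),
    "mile":   ("distance", 1609.344),
}

@dataclass(frozen=True)
class Amount:
    value: float
    unit: str

    def to(self, unit: str) -> "Amount":
        dim_from, f_from = UNITS[self.unit]
        dim_to, f_to = UNITS[unit]
        if dim_from != dim_to:
            raise ValueError(f"cannot convert {dim_from} to {dim_to}")
        return Amount(self.value * f_from / f_to, unit)

    def __add__(self, other: "Amount") -> "Amount":
        return Amount(self.value + other.to(self.unit).value, self.unit)

print(Amount(2, "hour") + Amount(30, "second"))    # Amount(value=2.00833..., unit='hour')
print(Amount(50000, "USD") + Amount(1000, "EUR"))  # converts EUR first; adding hours to USD would raise
```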

The core principle here is that software needs to have a common core of understanding that makes sense to most people and across almost every application.  Selecting these concepts is an exercise in Pareto’s 80/20 principle.  A concept like building could easily be out, but concepts like money and time (and whatever it takes to really understand money and time) are in.  Location, including distance, is in.  Luminosity could be out, but probably not if color is in.  Charge and current could be out, but not if electricity or magnetism is in.  The cutoff is less scientific than practical, but what is in has to be deeply consistent and completely rational (i.e., logically rigorous).[2] Continue reading “A Common Upper Ontology for Advanced Placement tests”

Cyc is more than encyclopedic

I had the pleasure of visiting with some fine folks at Cycorp in Austin, Texas recently.  Cycorp is interesting for many reasons, but chiefly because they have expended more effort developing a deeper model of common world knowledge than any other group on the planet.  They are different from current semantic web startups.  Unlike Metaweb’s Freebase, for example, Cycorp is defining the common-sense logic of the world, not just populating databases (which is an unjust simplification of what Freebase is doing, but is proportionally fair when comparing their ontological schemata to Cyc’s knowledge).  Not only does Cyc have the largest and most practical ontology on earth, it also has an almost incomprehensible number of formulas[1] describing the world.  Continue reading “Cyc is more than encyclopedic”

Over $100m in 12 months backs natural language for the semantic web

Radar Networks is accelerating down the path towards the world’s largest body of knowledge about what people care about, using Twine to organize their bookmarks.  Unlike social bookmarking sites, Twine uses natural language processing technology to read and categorize people’s bookmarks against a substantial ontology.  Using this ontology, Twine not only organizes their bookmarks intelligently but also facilitates social networking and collaborative filtering that result in more relevant suggestions of others’ bookmarks than other social bookmarking sites can provide.

Twine should rapidly eclipse social bookmarking sites like Digg and Reddit.  This is no small feat!

The underlying capabilities of Twine present Radar Networks with many other opportunities, too.  Twine could spider out from bookmarks and become a general competitor to Google, as Powerset hopes to become.  Twine could become the semantic web’s Wikipedia, to which Metaweb’s Freebase aspires. Continue reading “Over $100m in 12 months backs natural language for the semantic web”

Oracle should teach Siebel CRM about location and money

Not long ago I posted on the need to understand common concepts well. My example then concerned the need to understand time well enough to answer a question like “How much did IBM’s earnings increase last quarter?” Recently, in contemplating some training issues related to the integration of Haley Authority within Siebel, I came across example phrasings from the documentation on Siebel’s web site, including:

  • if an account’s location contains “CA” then add 50000 in “USD” for the account
  • if an account’s location contains “CA” then add 70000 in “USD” on today for the account

Two things are immediately obvious.

  1. Oracle does not understand location.
  2. Oracle has an interesting, but nonetheless poor, understanding of money.
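
To illustrate, the toy sketch below contrasts string matching against a model that treats location as geographic containment and money as an amount of a currency.  The place hierarchy, the located_in function, and the Money class are invented for the example and reflect neither Siebel’s nor Authority’s actual representations.

```python
# A toy illustration of the difference between string matching and understanding:
# "located in California" is geographic containment over a (made-up) place
# hierarchy, and the amount added is money with a currency, not a bare number.

from dataclasses import dataclass

# child place -> parent place (a tiny, hypothetical geographic hierarchy)
PARENT = {"San Francisco": "California", "Casper": "Wyoming",
          "California": "USA", "Wyoming": "USA"}

def located_in(place: str, region: str) -> bool:
    """True if place is region or lies within it, by walking the hierarchy."""
    while place is not None:
        if place == region:
            return True
        place = PARENT.get(place)
    return False

@dataclass
class Money:
    amount: float
    currency: str

    def __add__(self, other: "Money") -> "Money":
        assert self.currency == other.currency, "needs an exchange rate, not string math"
        return Money(self.amount + other.amount, self.currency)

# String matching confuses "CASPER, WY" with California; containment does not.
print("CA" in "CASPER, WY", located_in("Casper", "California"))                # True False
print("CA" in "SAN FRANCISCO, CA", located_in("San Francisco", "California"))  # True True
print(Money(50_000, "USD") + Money(20_000, "USD"))   # Money(amount=70000, currency='USD')
```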

Of course, I am intimately familiar with Authority’s understanding of money. However, Siebel needs more than Authority understands. Continue reading “Oracle should teach Siebel CRM about location and money”

Rules are not enough. Knowledge is core to reuse.

James Taylor’s blog today on rules being core to BPM and SOA, in which he discussed reuse, had a particularly strong impact on me following a trip yesterday.  During a meeting with the insurance and retail banking practice leaders at a large consulting firm, we looked for synergies between applications related to investment and applications related to risk.  Of course, during that conversation we discussed whether operational rules could be usefully shared across these currently siloed areas, but we ended up discussing what they had in common in terms of business concepts, definitions, and fundamental truths or enterprise-wide governance.  It was clear to us that this was the most fruitful area in which to develop core, reusable knowledge assets.

In his post, James agrees with the Butler Group’s statement:

Possibly the most important aspect of a rules repository, certainly in respect of the stated promise of BPM, Service Oriented Architecture (SOA), and BRMS, is the ability for the developer to re-use rules within multiple process deployments.

I have several problems with this statement: Continue reading “Rules are not enough. Knowledge is core to reuse.”