Semantic web as future reality 2005-03-21


This entry at Fred on Something neatly summarises my painful experiences while reading the W3C specs and assorted tutorials this weekend:

The thing is that RDF is not intended to be easily understood by humans like simple XML documents. RDF is intended to be understood by machines.

However, I still think the lack of accessibility of the W3C specs is a big problem. The XML spec is reasonably accessible. Even the XML Schema spec is. I can sit down with them, read them, and start writing a parser. Granted, it wouldn't be a very good parser if I knew no more than what I'd learned from a single reading of the specs, but I could do it.

It matters less that the formats themselves are inaccessible if the specs are accessible enough that we get good tools to deal with them.

Nobody cares that PostScript is painfully obtuse to read in a text editor, and that doing so won't really tell you much about the document it describes, because we have good tools to manipulate PostScript files and few of us need to interpret the files directly.

However, the RDF and OWL specs are painfully dense, painfully fluffy, and full of mathematical terms that to me and most software engineers I know read as mostly nonsense.

This massively complicates the issue of getting good tools to work with these technologies, and at this early stage it even makes it hard to get people to understand their potential.

I'm sure these specs represent great work, but they could have been so much better if more effort had been put into 1) examples, and 2) presenting the normative semantics by specifying the intended effects in terms of observable effects on the RDF graph, or as the conceptual addition of RDF triples (even if an implementation wouldn't necessarily have to store those triples).

The triples aren't hard to understand. The RDF graph isn't hard to understand. The bloody description of the OWL semantics IS.
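To illustrate: the kind of semantics I'd like the specs to spell out can be sketched in a few lines of code. Here is the RDFS subclass entailment rule (rdfs9) expressed as conceptual triple addition - a toy sketch in Python, where the rdf:/rdfs: names are real vocabulary but the example data is made up:

    # Rule rdfs9 expressed as conceptual addition of triples. The rdf:/rdfs:
    # names are real vocabulary; the example data is made up.
    RDF_TYPE = "rdf:type"
    RDFS_SUBCLASSOF = "rdfs:subClassOf"

    graph = {
        ("ex:Dog",  RDFS_SUBCLASSOF, "ex:Animal"),
        ("ex:fido", RDF_TYPE,        "ex:Dog"),
    }

    def entail(graph):
        # "If (?c rdfs:subClassOf ?d) and (?x rdf:type ?c) hold,
        #  then (?x rdf:type ?d) holds" - applied until nothing new appears.
        changed = True
        while changed:
            changed = False
            subclasses = [(s, o) for (s, p, o) in graph if p == RDFS_SUBCLASSOF]
            for c, d in subclasses:
                for x, p, o in list(graph):
                    if p == RDF_TYPE and o == c and (x, RDF_TYPE, d) not in graph:
                        graph.add((x, RDF_TYPE, d))
                        changed = True
        return graph

    # Observable effect: ("ex:fido", "rdf:type", "ex:Animal") is now in the graph.
    print(entail(graph))

An implementation wouldn't have to work this way internally, but as a statement of intended effects this is something any programmer can read and check.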

I wish the W3C would take some cues from ECMA, and do what ECMA did for ECMA-262 (the ECMAScript / JavaScript specification), where the document specifies the semantics of the language by presenting expected behaviour in terms of code-like pseudo-code and concrete results rather than abstract mathematical notation.
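Something in this direction - a hypothetical illustration I've made up, not actual ECMA-262 text (which uses step-by-step pseudo-code algorithms to similar effect):

    # A made-up illustration of "specification by expected results": state
    # observable outcomes as executable checks that any conforming
    # implementation must pass, instead of abstract notation.
    def spec_sort(xs):
        "The function under specification; here just Python's built-in sort."
        return sorted(xs)

    assert spec_sort([3, 1, 2]) == [1, 2, 3]   # elements come out in order
    assert spec_sort([]) == []                 # the empty input is allowed
    assert spec_sort([2, 2, 1]) == [1, 2, 2]   # duplicates are preserved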

Personally I have an intense hatred for these kinds of specs, as they're hardly ever needed.

I have no problem understanding how to implement a backpropagation neural network, for instance. But that is thanks to plain-English or pseudo-code descriptions of the algorithms involved. If somebody tried showing me a mathematical representation of it I'd glaze over instantly.
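For comparison, this is roughly what the plain-code description of one backpropagation training step looks like - a minimal sketch with arbitrary layer sizes and learning rate, no biases and no batching:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(2, 3))   # input -> hidden weights
    W2 = rng.normal(size=(3, 1))   # hidden -> output weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_step(x, target, lr=0.5):
        global W1, W2
        # Forward pass: each layer is a weighted sum squashed by the sigmoid.
        h = sigmoid(x @ W1)                           # hidden activations
        y = sigmoid(h @ W2)                           # output activation
        # Backward pass: push the output error back through the layers.
        delta_out = (y - target) * y * (1.0 - y)      # output error term
        delta_hid = (W2 @ delta_out) * h * (1.0 - h)  # hidden error terms
        # Move each weight a small step against the error gradient.
        W2 -= lr * np.outer(h, delta_out)
        W1 -= lr * np.outer(x, delta_hid)

    # Nudge the network towards outputting 1.0 for this input:
    for _ in range(100):
        train_step(np.array([0.0, 1.0]), np.array([1.0]))

Anyone who can program can follow that; the equivalent formulas reach a much smaller audience.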

I've yet to see a single example of something presented in this kind of notation that couldn't be expressed just as well in natural language, in a form that would be significantly more accessible to a significantly larger audience.

If you want to win the Fields Medal, accessibility to the general public isn't needed as long as other leading mathematicians understand you. If you are trying to write specifications with the goal of transforming the web - which became successful largely because it was accessible and anybody could easily understand how to make use of the technology - it is.

Hello Mr Hokstad,


You are completely right. Certainly, if the base specification documents are not readable and understandable, we are in trouble. Personally I was talking about the resulting code, but the problem you raise is much more important. All future software will be based on these documents.

The problem of documentation runs deep in computer science. That's why universities hold popularization contests. Another problem is that many people think that adding useless mathematical formulas or incomprehensible words is essential for credibility. In the present case, it is not. You are right: these technologies need to be understood by the average computer scientist or hobbyist. It's essential to spread the good news and encourage them to develop software that uses it.


I hope that people will write code examples and clearer documents about these new technologies. Personally I would have some trouble doing it, considering my current English writing, but I'm sure some will.

Salutations,


Fred

Fred,

Thanks for taking the time to comment. I agree with you that the code is complex too. I think the main thing is that if the code/markup is simple, then it doesn't matter as much if the specification is hard to read, and the other way around. The problem with things like RDF and OWL is that both are hard.

I'm hoping to spend some time trying to understand the technologies and writing some tools, but I know my time is far too limited to make a major impact. I've gotten part way through an N3 parser, though, and that's helping me understand a lot about the technologies.
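To give an idea of the shape of the data: the trivial core of N3 is just one triple per line. A drastically simplified sketch of that core in Python (real N3 adds prefixes, literals, blank nodes, lists and formulae on top of this):

    # Parses only the trivial subset of N3: one "subject predicate object ."
    # statement per line, with whitespace-free URI-style terms.
    def parse_simple_n3(text):
        triples = []
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue                    # skip blank lines and comments
            if not line.endswith("."):
                raise ValueError("statement must end with '.': %r" % line)
            subject, predicate, obj = line[:-1].split()
            triples.append((subject, predicate, obj))
        return triples

    doc = """
    # made-up example data; foaf:knows is a real property URI
    <#vidar> <http://xmlns.com/foaf/0.1/knows> <#fred> .
    """
    print(parse_simple_n3(doc))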

I don't necessarily think the mathematical formulas used in many computer science papers are "useless" as such, but I agree that for the average practitioner of software engineering - as opposed to someone researching computer science - they are generally a hindrance rather than an advantage.

I think you're right that part of it is an attempt to gain credibility, but I also think a lot of it is a result of a gap between computer scientists and software engineers - that is, people who pursue computer science as a research subject are far more likely to have substantial experience with maths, and often find it easier to throw in a formula when a sentence in natural language would do just as well and reach more readers.

It's sad to see this percolate down into specifications for important technologies, though. I think perhaps a key reason is the restricted membership of the W3C. Unlike, for instance, the IETF, which is a volunteer organization, the W3C is based on paid membership mostly targeted at corporations.

This naturally limits who can participate, and weeds out a lot of the useful corrective input you get from a membership that represents the average software engineer.

Vidar

Try reading the InfiniBand specs for a bit: it's even worse. Almost 2000 PDF pages of mostly unreadable data. :D

Hello Mr. Hokstad,

Yeah, I was trained as a computer scientist, and math formulas are essential to get rid of ambiguities (that is, for example, one of the goals of formal software specification).

But the current problem is that the technology we are discussing is intended to be used by anybody, computer scientist or not. That's why we will need to write both scientific and popularized articles: the first to remove ambiguities and provide formal foundations; the second to be understood and implemented in software of any kind.


Salutations,


Fred

I really don't agree that math formulas are essential to get rid of ambiguities. In my experience, math formulas create more ambiguities in software specifications than they remove, as they are rarely used correctly, and even when they are, they are rarely understood correctly by whoever reads them.

If you want to get rid of ambiguities, you need to use language that is simple enough that people can both understand it and write it correctly.

As an example, at a previous job I had a developer who wrote his specs using lots of maths, and his specs were useless. Partly because they were not verbose enough - the formulas were correct, but without a well-defined context (and it would be close to impossible to define one using maths alone) they could be interpreted in infinitely many ways. More importantly, most of the other developers did not know how to interpret them correctly.

A good specification is verbose: it describes the desired outcome from several different perspectives that can be trivially checked against each other, and that can be communicated to both developers and "customers"/end users.

Using maths for this is counterproductive, even if your audience is all well educated in maths.

Personally I believe that the moment you resort to maths in a software specification, it is an indication that you don't understand the problem well enough to describe it in natural language, and that you're "running away" from the problem by retreating into a language that covers only certain aspects of the solution.

It might be that I'm overly cynical, but after reading countless computer science research papers, I've yet to see a single work, even there, where anything beyond fairly basic maths was necessary.

I'm sure there are subfields outside my experience where it may be different, such as, for instance, cryptography, but I believe they are a small minority.

Take the field of AI, for instance. I've read lots of papers full of maths, and I've read papers that managed perfectly well to explain the same subjects in plain English. The latter were invariably more accessible, and hence provided more value by reaching a wider audience.

Computer scientists often seem to suffer from some kind of maths fetish, when they have a perfectly adequate alternative that is well understood by their audience: Code.

That is really what annoys me the most: when describing algorithms to be implemented in software, why pick an abstract notation that most software engineers have had limited exposure to, instead of one of the notations the algorithm is likely to be implemented in, in the "real world"?

This isn't about "popularizing" anything. It is about learning to communicate efficiently, which is one of the most important skills a researcher should have.

An idea is worthless if people don't understand it well enough to make use of it, and I believe this is one of the key reasons why so much computer science research sits entirely unused and never draws any interest from industry - there's a huge communication gap between computer scientists and software engineers.

Hello Mr. Hokstad,

What I was talking about was the formal specification of a program. Formal specification languages were created decades ago to handle certain problems in software development, particularly in the field of critical code. This is not an end in itself; it doesn't answer all questions and it's not foolproof. But it's a good way to define the needs of your clients while checking that your system or algorithms don't fall into undesired states. I won't give a course on formal software specification here, but check out this project for a radiation therapy machine, first specified in Z and then coded in C:


http://staff.washington.edu/jon/z/rationale.html
http://staff.washington.edu/jon/z/machine.html

Read the introduction; it'll tell you a lot (pros and cons) about formal specifications. Certainly it's not intended for all software projects, and in practice you won't specify everything, but personally I think it can have its place in software development (especially in the future).
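To give a flavour (a fragment I just made up, not taken from the paper): a Z-style specification constrains the state of the machine with invariants that must hold at all times, something like

    % Made-up safety invariants in the style of a Z state schema:
    %   the delivered dose may never exceed the prescribed dose,
    %   and the beam may only be on when the interlock check has passed.
    delivered\_dose \le prescribed\_dose
    beam = on \Rightarrow interlock\_checked

and the proof obligations are then to show that every operation preserves these invariants.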

The problem with natural languages, in an international environment, is that English is not everybody's native language. For example, many teams of American developers outsource to India, and not all Indian graduate programmers are perfectly bilingual. Formal specification could (not would) be a partial solution to future problems.

Why are mathematical versions of algorithms essential? Because you can prove them. The whole present system works because it's proven.
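For example, a textbook Hoare-logic annotation of a summing loop (my own illustration, not from the specs we discussed) states exactly what must be proved:

    \{\, s = 0 \wedge i = 0 \wedge n \ge 0 \,\}
    \mathbf{while}\ i < n\ \mathbf{do}\ i := i + 1;\ s := s + i\ \mathbf{od}
    \{\, s = n(n+1)/2 \,\}

The loop invariant s = i(i+1)/2 \wedge i \le n holds initially, is preserved by every iteration, and together with the exit condition i = n gives the postcondition.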


So it's a really interesting discussion with two different points of view. Thanks for your interest in it.


Salutations,

Fred


Formal specifications of software can be useful. However, my problem with something like Z notation, versus for instance extensive use of unit tests plus a natural-language specification, is that Z notation and similar languages aren't anyone's native language: given a random group of software developers from all over the world, I'd be much more likely to find a reasonable number of people with sufficient English skills than people with reasonable Z notation skills.

In fact, throughout my career I've met perhaps 3-4 people who might have had sufficient skills in Z to create proper specifications with it.

I wish it weren't so, and that we could use formal methods in reasonably sized projects, but the technology isn't there yet.

And this is important, because even though you can prove that the model is consistent, you can't prove that it does what the software developer wants - if the developer doesn't fully understand the notation, you can prove the model as much as you want and still end up with junk.

Personally I'd rather have unit tests - well-written unit tests demonstrate very clearly that the software does what is expected of it in the tested cases. You're unlikely to get full test coverage, but at least you're likely to be able to find developers who understand them.
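A trivial, made-up example of what I mean - the expected behaviour is stated as concrete cases any developer can read and run:

    import unittest

    def normalize_whitespace(s):
        "Collapse runs of whitespace into single spaces and trim the ends."
        return " ".join(s.split())

    class NormalizeWhitespaceSpec(unittest.TestCase):
        # Each test doubles as a line of the specification.
        def test_collapses_internal_runs(self):
            self.assertEqual(normalize_whitespace("a   b\t\tc"), "a b c")

        def test_trims_leading_and_trailing(self):
            self.assertEqual(normalize_whitespace("  hello  "), "hello")

        def test_empty_input_stays_empty(self):
            self.assertEqual(normalize_whitespace(""), "")

    if __name__ == "__main__":
        unittest.main()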

Regards,
Vidar

Hello Mr Hokstad,


Yeah, I agree. Formal specifications are just a tool, and as with other tools, we need to learn how to use them. What was interesting about the link I mentioned above is that none of the developers knew what formal specifications were, and none (except one, if I remember right) had ever done any Z specification before.

So, certainly, well-done unit tests are indispensable at the moment. They're probably the best tool we have, for anybody, right now. Certainly the technology of formal specification still needs development (particularly semi-automated proof tools).


Salutations,

Fred

