SV: [RDF] Version handling

Jonas Liljegren jonas@liljegren.org
27 Sep 2000 20:32:43 +0200


Stefan Andersson <Stefan.Andersson@ullmans.com> writes:

> > The basic thing we want to do is to filter out previous versions of a
> > statement.  Let's say that I change the phone number. The old
> > statement remains but should not normally be shown.
> 
> Yeah, there are a few 'atomic' filters - such as 'current valid version
> (a.k.a. latest)' - that deserve an extra optimization boost, but (as you
> know) I think it's an interesting idea to think of a filter as a 'virtual
> model': you have this vast ocean of resources, and looking through the
> 'latest' spectacles, you'll only see the model consisting of all the
> 'latest' resources.

How do you combine this with trust?

Let's say that an untrusted person updates the information. He revokes
the earlier data and creates a new version.

If you say: $s->latest->get($person) it would get the person in the
model produced by the latest() filter. If you put trust *AFTER*
latest(): $s->latest->trust($my_list)->get($person) you could end up
with nothing at all.  The latest() filter will only return the latest
triples, but some of those triples may be untrusted.

You would have to create a combined filter because you want to get the
latest trusted information.  Maybe like:

$filtered_person = $s->filter( version => 'latest', trust => $my_list )->get($person)

The returned person must be a filtered person. Subsequent calls to
phone_number must use the same filter. Either that, or you have to
supply the filter to every operation.
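
A minimal sketch of why the order matters, written in Python for
brevity rather than in the Perl of RDF::Service.  The Statement class,
the latest() and trusted() helpers and the sample data are all
hypothetical, and "latest" is simplified here to mean one statement per
subject and predicate:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    agent: str    # who made the statement (assumed field)
    version: int  # higher means more recent (assumed field)


def latest(statements):
    """Keep only the most recent statement per (subject, predicate)."""
    newest = {}
    for st in statements:
        key = (st.subject, st.predicate)
        if key not in newest or st.version > newest[key].version:
            newest[key] = st
    return list(newest.values())


def trusted(statements, trusted_agents):
    """Keep only statements made by an agent on the trust list."""
    return [st for st in statements if st.agent in trusted_agents]


store = [
    Statement("jonas", "phone", "111", agent="jonas", version=1),
    Statement("jonas", "phone", "999", agent="mallory", version=2),
]
my_list = {"jonas"}

# latest() first, then trust: the only "latest" triple is untrusted.
chained = trusted(latest(store), my_list)

# Combined: the latest among the trusted statements.
combined = latest(trusted(store, my_list))
```

Here the chained form ends up empty, while the combined form keeps the
older but trusted phone number.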

I'm actually considering having you submit a context variable with every
call.  But that could be done implicitly with a thin, unique filter
object that encapsulates the real object.  The real object would be
cached and would hold all the data, but the encapsulating thin object
would be new every time and contain the specific context of the call,
to know what to filter out.  The calling object is not enough context.
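
One way the thin wrapper idea might look, again as a Python sketch and
not the actual RDF::Service design.  Node, FilteredNode, Service and
the keyword arguments trust and latest_only are names made up for this
illustration:

```python
class Node:
    """The cached 'real' object: holds every version of every property."""

    def __init__(self, uri):
        self.uri = uri
        self.props = {}  # predicate -> list of (value, agent, version)

    def add(self, predicate, value, agent, version):
        self.props.setdefault(predicate, []).append((value, agent, version))


class FilteredNode:
    """Thin per-call wrapper: a reference to the node plus the call context."""

    def __init__(self, node, trust=None, latest_only=False):
        self._node = node
        self._trust = trust
        self._latest_only = latest_only

    def get(self, predicate):
        rows = self._node.props.get(predicate, [])
        if self._trust is not None:
            rows = [r for r in rows if r[1] in self._trust]
        if self._latest_only and rows:
            rows = [max(rows, key=lambda r: r[2])]
        return [r[0] for r in rows]


class Service:
    def __init__(self):
        self._cache = {}  # uri -> Node, shared by all wrappers

    def node(self, uri):
        if uri not in self._cache:
            self._cache[uri] = Node(uri)
        return self._cache[uri]

    def filter(self, uri, **context):
        # A new thin object every time; the heavy Node is reused.
        return FilteredNode(self.node(uri), **context)


s = Service()
s.node("jonas").add("phone", "111", "jonas", 1)
s.node("jonas").add("phone", "999", "mallory", 2)

view = s.filter("jonas", trust={"jonas"}, latest_only=True)
unfiltered = s.filter("jonas")
```

Both wrappers share the same cached node, but each carries its own
context, so subsequent calls through view keep using the same filter.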


> > We could base the version handling on statements bound to a point in
> > time.  Every statement belongs to a model and the model has a creation
> > date (and maybe should have a last-updated-date (in the case of "open
> > models")).
> > 
> > The date information could be used to see what information is the most
> > recent.  But I could have two phone numbers so it wouldn't be right to
> > just exclude one of the numbers if the other is more recent.  The old
> > number must be expired in some way.
> 
> Actually, that's a rather interesting example. It illuminates the point that
> something like 'home phone' alone is not enough. And I still maintain that you
> never 'change' a statement. You revoke an earlier statement, and make a new one.
> It kind of solves your problem, doesn't it?

I think so.

But there could be many agents revoking a statement. The person
revoking the statement would often not be the same person that made
the statement.  That means that there is no cheap way to integrate
that status in the system. It would be one full-blown statement for
each revoked statement. Or maybe you would revoke an entire model.  You
could also collect the revoked statements in a collection. But that
would really hurt performance.

For the DBI interface: could we catch most cases by assuming that
there will usually be at most _one_ revocation per statement?

Maybe we could even let the person insert the revocation statement in
the original model, and in that way allow only the model owner to revoke
the statement?

Think about the case where a statement has been revoked by an
untrusted (anonymous?) agent.  How can we handle that in an efficient
manner?


> > I just ask for an efficient model for the implementation.
> 
> Well. Based on my work with this kind of model in content management, I'd say
> that what you need is a 'created' - 'revoked' model, where you'd apply a
> cached filter on what resources are 'created' but not 'revoked'; these make up
> your current model.
> (Actually, our CM model was a wee bit more complicated, as you had four
> states: 'created', 'updated', 'published', 'revoked' to separate the two
> levels of publishing - the level where only the author saw the changes, and
> the level where everybody could see the changes.)

Nothing stopping you from adding properties like "published".
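
The 'created' but not 'revoked' filter quoted above could be sketched
roughly as follows (Python, illustrative only).  The function name
current_model, the shape of the inputs, and the choice to ignore
revocations from untrusted agents are assumptions of this sketch, not
a decision made in this thread:

```python
def current_model(statements, revocations, trusted_agents):
    """Return the statements that are created but not (validly) revoked.

    statements:     dict mapping statement id -> value
    revocations:    list of (revoking_agent, target_statement_id)
    trusted_agents: set of agents whose revocations are honoured
    """
    revoked = {
        target for agent, target in revocations if agent in trusted_agents
    }
    return {sid: v for sid, v in statements.items() if sid not in revoked}


statements = {1: "phone 111", 2: "phone 222", 3: "phone 333"}
revocations = [("jonas", 1), ("anonymous", 2)]

current = current_model(statements, revocations, trusted_agents={"jonas"})
```

Statement 1 drops out because a trusted agent revoked it; statement 2
stays because its only revocation comes from an untrusted agent.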


Now. Thinking about it:  I have planned for many forms of distributed
properties. They can be distributed over uri-prefixes, collections or
models.  One solution could be to distribute the revocation statement
over the intended target.

There are two things in this discussion:
  1. How to represent VC (version control) in RDF::Service
  2. How to represent VC in the DBI interface


A request for a property of a node will trigger the init_props() call
to the involved interfaces.  The type property is handled separately
by the init_types() call.  I think that this is the most efficient way
to do it.

That means that if you are interested in any property of a node, all
properties will be retrieved.

This will later be combined with "secondary properties" that will be
initiated one by one, such as the dynamic properties.

Maybe we should allow the interface to not set up all the properties.
But then we must have a way to know what has been returned.  Let's say
that the interface only returns the latest version. How will the
program know that there are other properties if another session wants
to have all properties?  Should we just make an exception for
versions, or is there a general solution?

Each interface can have its own solution for how to store/handle
versions. Some may not support versions.  But the DBI interface is
intended to be general purpose. It could implement revoke statements
by distributing them over the target statements, if there is more than
one.
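
As a rough illustration of distributing a revocation over its target
statements, here is a sketch using Python's sqlite3 module with an
in-memory database.  The table layout, the revoked_by column and the
'revokes' predicate are invented for this example and are not the
actual DBI schema.  It relies on the assumption of at most one
revocation per statement, so that a single column is enough:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statement (
        id         INTEGER PRIMARY KEY,
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        model      TEXT NOT NULL,
        revoked_by INTEGER REFERENCES statement(id)  -- NULL while valid
    )
""")

rows = [
    (1, "jonas", "phone", "111", "m1"),
    (2, "jonas", "phone", "222", "m1"),
    (3, "m2", "revokes", "m1", "m2"),  # hypothetical revocation statement
    (4, "jonas", "phone", "333", "m2"),
]
conn.executemany(
    "INSERT INTO statement (id, subject, predicate, object, model) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Distribute revocation 3 over every statement it targets
# (here: all statements in model m1).
conn.execute("UPDATE statement SET revoked_by = 3 WHERE model = 'm1'")

# The current phone numbers are the ones no revocation points at.
current = [
    r[0]
    for r in conn.execute(
        "SELECT object FROM statement "
        "WHERE predicate = 'phone' AND revoked_by IS NULL ORDER BY id"
    )
]
```

The lookup for current statements then needs no join, at the cost of
not being able to record a second revocation of the same statement.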


> > > Actually - one could say 'version' roughly means 'sufficiently
> > > equivalent', that is, 'all those instances could fill roughly the same
> > > function, but with small, maybe critical, differences'.
> > 
> > Ah. Now I think that I understand.
> 
> Versioning points to the fact that you should be able to say 'this instance
> is a sub-instance of this instance' a.k.a. 'version'.

That can't be done if all you do is revoke one statement and
create another.  That doesn't say that the new statement is a version
of the other one.


> > You can produce two statements about different things, but that
> > contradict each other in some way.  Which of the statements should we
> > follow?
> 
> The trick is to isolate the discrepancy - the very thing I was hoping we
> could implement intelligently in WRAF.

Well. If you have two models, some of the statements in the models
could be about the same thing. It would be up to the constraints of
the object to point out that a contradiction exists.  The
contradicting statements could turn up only after some logic machine
has inferred new statements from the originals.

But this doesn't say how you will resolve the conflict.  I think that
this is the same problem as in DB replication.  One way or another,
you choose the one to follow and the one to ignore.  This could
be done by authority, by date, or by just asking the agent or someone
who has the authority to make the choice.
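
A small sketch of choosing by authority and then by date (Python,
illustrative only).  The resolve() function, the dictionary keys and
the numeric authority ranks are assumptions made for this example;
asking the agent is not covered:

```python
from datetime import date


def resolve(candidates, authority=None):
    """Pick the statement to follow among contradicting candidates.

    authority: optional dict mapping agent -> rank (higher wins).
    Ties on authority are broken by the most recent date.
    """
    authority = authority or {}
    return max(
        candidates,
        key=lambda st: (authority.get(st["agent"], 0), st["date"]),
    )


a = {"agent": "jonas", "value": "111", "date": date(2000, 9, 1)}
b = {"agent": "anonymous", "value": "999", "date": date(2000, 9, 27)}

by_date = resolve([a, b])
by_authority = resolve([a, b], authority={"jonas": 10})
```

Without any authority information the more recent statement is
followed; with it, the higher-ranked agent wins even though the
statement is older.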

But that can wait until a later stage.  The question now is how to encode
that one statement or model should be used instead of another (once
that choice has been made).

-- 
/ Jonas  -  http://jonas.liljegren.org/myself/en/index.html