During the CDCL design effort, it became clear, fairly early in the process, that the requirement for Boolean aggregation of atomic rule particles would cause a lot of grief for semitechnical users. (Face it: Boolean logic has caused confusion even among professional programmers. Not even the least logically challenged among programmers wouldn't not be reluctant to deny that there isn't doubt about likelihood of logical reversals of sense when one is not careful not to avoid non-confusing Boolean constructs.)
There are several features of formal Boolean expressions that are likely to cause a semitechnical user to feel lost, but the most forbidding is probably the parentheses that delimit logical blocks. Parentheses are visually confusing, dense, and worst of all they look like math. It's also kind of difficult to convey, to a nontechnical user, the topology of parenthetic notation:
"See, you have this concept of a matched pair of parentheses, which means one opening parenthesis, followed by some content, followed by the closing parenthesis. Okay?"
"Uh, sure. I guess."
"Now. That content inside? it can only contain other matched pairs, or no parentheses at all. If it contains any parenthesis without a matching partner, we're only looking at some kind of mismatched fragment, and we can't say that our two parentheses on the right and left ends are a matched pair, because they're not."
"Whatever you say."
"Yeah now this is important, because the content of every matched pair has to resolve to true or false. And you have to figure out which it is before you apply any adjacent operator."
"...
"...
"...Look, is this math? It looks like algebra.""It's not algebra, it's boolean algebra. That's not the same --"
"AAAAAAAAAAHHHHHHH! It's math! It's math! Get it off me get it off me! Math!"
Kidding aside: Boolean Algebra, which is a terse, analytically rigorous form of notation convenient for mathematicians and logicians, simply doesn't play well amongst nontechnical users. We need to find an alternative notation form that can suit our needs.
So: What would be the characteristics of a "suitable notation form"?
Well, well. What do you know?
That bulleted list can be represented as Boolean Algebra:
A (B (C + D))
Could that possibly work in reverse? i.e., could a bulleted list express
any Boolean Algebra expression, in general?
OK, since the title of the page (as well as the name of the notation system itself) really gives it away, there's little use trying to maintain dramatic tension. Booliette is the name of a system of notation, designed to be usable and understandable by nontechnical people, that is premised on the idea that
Given sufficient conventions, a bulleted list, with indented sublists, can be used to represent any Boolean expression.
In other words, appropriate use of bulleted lists with indented sublists satisfies A
in the requirements list above. Proof of the soundness of that central idea awaits
rigorous mathematical treatment. But, we're not going to let that slow us down.
Let's retry our earlier thought experiment, but with Booliette instead of parenthesized Boolean Algebra.
"In Booliette there is one simple rule: every bullet point comes up true or false.".
"What does that mean, 'comes up true or false'?""
"It means that at every bullet point, you have some information; and a statement about that information; and you can tell if the statement is true or false."
"So... no matter what bullet point it is, you can say it's true or it's false."
"Right."
"Okay, got it. Every bullet is true or false."
"OK, now: there are two kinds of bullets. There's the kind with a sublist, and the kind without a sublist."
"Wait wait. How can I tell what's a sublist?"
"It's, um, indented."
"Oh. D'oh. I use those all the time. Sorry, go on."
"Now, bullets without a sublist have to be true or false entirely on their own. They don't need any information from any other bullet. Whatever they say, is whatever it is.
"But for the bullets with a sublist, it's the other way around: Whether they come up true or false depends on the bullets in their sublist. The bullet with a sublist can only state conditions about its sublist."
"Hey, you're losing me here...conditions about its sublist? Like what?"
"Like, 'all of my sublist's bullets must be true', or 'at least one of them must be true', or 'none of them can be true', or --"
"Ah, ok, I get it. Because if what it's saying about its sublist is true, then the bullet itself comes up true."
"Yes yes, right, that's it exactly!"
"And... this does what for me, exactly?
"Well, every Booliette expression has just one bullet at the top level: everything else is in a sublist, or a sub-sublist, or whatever. All of those bullets underneath represent logic statements - rules, if you want - and then it all feeds up to whether that top bullet comes up true or false. Even if the rules are complicated, you can always work it out."
"And that's all there is to it?"
"Well, yeah, basically. See, bullets without sublists are the equivalent of Boolean variables, and bullets with sublists are the equivalent of Boolean operators, and the sublists themselves are like parenthesized--"
"AAAAAAAAAAHHHHHHH! It's math! It's math! Get it off me get it off me! Math!"
These examples prove nothing, of course. They are only rhetorical demonstrations of the very real advantage in comprehensibility of Booliette over Boolean Algebra as an expression of complex Boolean propositions.
Booliette uses characters from the 7-bit ASCII set. Source code written or generated by any Booliette editor may contain only the following characters:
| dec | hex | description |
|---|---|---|
| 9 | 9 | tab |
| 10 | A | linefeed |
| 13 | D | carriage return |
| 32-126 | 20-7E | printing ASCII characters |
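The character table above can be checked mechanically. The following is a minimal sketch, not part of the Booliette definition; the names `ALLOWED` and `illegal_characters` are hypothetical and chosen only for illustration.

```python
# Code points permitted by the table: tab, linefeed, carriage return,
# and the printing ASCII range 32-126.
ALLOWED = {9, 10, 13} | set(range(32, 127))


def illegal_characters(source: str):
    """Return (offset, character) pairs for every character outside the table."""
    return [(i, ch) for i, ch in enumerate(source) if ord(ch) not in ALLOWED]
```

An editor could run such a check before saving, so that prohibited characters such as the formfeed (decimal 12) never reach a processor.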
Booliette uses indentation levels, rather than block delimiters, to demarcate sublists. In the spirit of trying not to reinvent any unnecessary wheels, Booliette follows the lexical conventions from Python (a similarly indentation-oriented language) with respect to defining lines and standardizing indentation treatment. In particular, the following topics from the Python specification should be considered definitive for Booliette:
2.1.1 Logical lines
2.1.2 Physical lines (except for the description of "embedding Python", which is inapplicable)
2.1.5 Explicit line joining
2.1.6 Implicit line joining (although we are not yet certain whether this applies to block delimiters () and {}; we haven't worked out comments yet; and the reference to a "triple-quoted line" is not applicable)
2.1.8 Indentation (except for the references to "Formfeed characters", i.e. ASCII decimal 12, which are prohibited in Booliette code)
2.1.9 Whitespace between tokens (again, except for the references to "Formfeed characters")
Further parts of the Python specification may also be adopted as part of the Booliette lexical definition. If so, they will be posted here.
Every logical line in Booliette is a bullet. A bullet takes the general lexical form
(indent-whitespace) * (whitespace) (proper content)
Where indent-whitespace establishes level of indentation, the asterisk
character literal * (ASCII decimal 42, hex 2A) represents the bullet itself, and proper content
is all characters, from the first non-whitespace character after the bullet, up to the
end of the line of code (i.e., the first unescaped end-of-line sequence).
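As an illustration of the lexical form just described, here is a rough sketch that splits one logical line into its indent and its proper content. It assumes the line has already been stripped of its end-of-line sequence and that at least one space or tab must follow the asterisk; `BULLET_RE` and `split_bullet` are hypothetical names.

```python
import re

# (indent-whitespace) * (whitespace) (proper content)
BULLET_RE = re.compile(r"^(?P<indent>[ \t]*)\*[ \t]+(?P<content>\S.*)$")


def split_bullet(line: str):
    """Return (indent, proper_content) for a bullet line, or None otherwise."""
    match = BULLET_RE.match(line)
    if match is None:
        return None
    return match.group("indent"), match.group("content")
```

Explicit and implicit line joining, which the Python conventions allow, are deliberately left out of this sketch.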
Note that the bullet representation is not a logical necessity, since a bullet and a line are the same thing in Booliette. It's included as a syntactic requirement because:
Every bullet point has two semantic characteristics:
true or false.
Booliette recognizes two kinds of bullet point:
| common name | logical name | description | Boolean equivalent |
|---|---|---|---|
| standalone | particle | The standalone is the kind of bullet without a sublist. It evaluates as true or false based on information stated within the bullet's proper content. | a Boolean variable |
| sublist | operator | A sublist is the kind of bullet that does have a sublist. Its proper content is a statement about the occurrence of truth or falsity concerning information based on its sublist bullets. It evaluates as true if that statement is determined to be a truthful expression. | a Boolean operator |
A standalone bullet's proper content can take one of three general forms:
* key word
* left-hand value (whitespace) argument (whitespace) right-hand value
* protocol [ protocol-specific part ]
And a sublist bullet's proper content is a "sublist Boolean assertion", which is often equivalent to a Boolean operator. A sublist bullet precedes an indented list of bullets. It takes the general form
* sublist boolean assertion
    * bullet
    * bullet
    * bullet
    * ...
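To make the relationship between indentation and sublists concrete, the sketch below turns indented bullet lines into a nested structure. It is only an approximation: it assumes indentation uses spaces alone, ignores line joining, and does not reject inconsistent dedents as the Python indentation rules would. The name `parse_tree` and the `(content, children)` node shape are hypothetical.

```python
def parse_tree(source: str):
    """Turn indented bullet lines into nested (content, children) nodes."""
    roots = []
    stack = []  # (indent width, children list) for each bullet still open
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        body = line.lstrip(" ")
        if not body.startswith("* "):
            raise ValueError(f"not a bullet: {line!r}")
        width = len(line) - len(body)
        node = (body[2:].strip(), [])
        # Close every open bullet that is at the same depth or deeper.
        while stack and stack[-1][0] >= width:
            stack.pop()
        (stack[-1][1] if stack else roots).append(node)
        stack.append((width, node[1]))
    return roots
```

A bullet whose children list ends up empty corresponds to a standalone bullet; one with children corresponds to a sublist bullet.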
Propositions are dynamically evaluated to either true or false or, under exceptional circumstances, may generate error(s). Multiple propositions or complex propositions may be organized by conjunctions (from a grammar model) that are, in fact, n-place Boolean logical operators (from a mathematical/logical model) through use of "sub-lists" and the various kinds of bullet points.
A proposition appears like natural language and consists of an argument relating two subjects, where one subject exists in the proposition's left-hand value and the other in the right-hand value.
An example of a proposition could appear like:
* today is-later-than '7/4/1776'
where "today" is the left-hand value, "7/4/1776" is the right-hand value, and "is-later-than" is the
argument relating the two. This proposition, were it to be included in policy, would be dynamically
evaluated to determine the applicability for the disclosure rule in which it existed.
A more complex example illustrating use of a sublist bullet together with proposition form standalone
bullets (i.e. a proposition containing a Boolean operator -- a "conjunction") could appear like:
* all-true
    * today is-later-than '7/4/1776'
    * today has-semantic weekday
This is to say "Today is later than 7/4/1776, AND today has semantic weekday
(i.e. today is a weekday)."
A left-hand value or right-hand value may be a literal, quoted string, such as '7/4/1776', using matched pairs of either apostrophes (i.e. single quotation marks) or double quotation marks. Nothing within the matched pair of marks is prohibited as long as it is a part of the supported character set for literal strings. Alternatively, a left-hand value or right-hand value may be a dynamically evaluated reference.
A dynamic reference takes the form (independent of sequence as long as whitespace is properly handled)
(optional possessive determiner and whitespace) subject (optional whitespace and manipulator, repeated)
The reserved words for possessive determiners and for manipulators may be included in the dynamic reference, but which of them are valid depends on the nature of the subject. For example, a possessive determiner may be included if the subject derives from user context, and a manipulator may be included if the subject is of a type that corresponds to the manipulator. If both a possessive determiner and one or more manipulators are present in either a left-hand value or right-hand value, then the possessive determiner acts first upon the subject, followed by the manipulators acting in any order upon the subject. For example, "any-user's years-of-service plus-4" would be evaluated by first considering any-user's years-of-service. Before the value is fed to the (unstated) argument of the proposition, the plus-4 manipulator is applied: once the individual's years-of-service has been found, four is added to it, and the result is then fed to the argument. This recurs until the argument and the possessive determiner are satisfied or until the inspection ultimately exhausts all possibilities. Next we discuss the expectation of the commutative behavior of coincidentally-applied manipulators.
No more than one manipulator of a given, specific word is permitted to exist in either a left-hand value or right-hand value. Yet multiple, different manipulator words may be used, and if so, the entire set of manipulator words used must be commutative because there is no guarantee of the order of processing. For example, one should refrain from mixing multiplication manipulators with addition manipulators. Also, manipulators may be subject-specific. For example,
| status | example | reason |
|---|---|---|
| prohibited: | identity-full-name plus-3-minutes | the subject and manipulator do not correspond in type |
| prohibited: | now plus-3-minutes plus-7-minutes | we seek to avoid unlimited occurrences of the same manipulator word |
| permitted: | now plus-12-minutes minus-2-minutes | OK, since two different manipulator words are used |
| permitted: | 3-days-ahead-from-now plus-30-minutes | should be obvious |
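The "no repeated manipulator word" rule in the table can be sketched as a simple check. What exactly counts as the same word is not defined in this text, so the sketch assumes two manipulators share a word when they differ only in their numeric part (plus-3-minutes and plus-7-minutes both become plus-N-minutes). The function names are hypothetical, and the type-correspondence rule is not covered.

```python
import re


def manipulator_word(manipulator: str) -> str:
    """Reduce a manipulator to its word by masking digits (an assumption)."""
    return re.sub(r"\d+", "N", manipulator)


def repeated_manipulators(manipulators):
    """Return the manipulators whose word already occurred earlier in the list."""
    seen = set()
    repeated = []
    for manipulator in manipulators:
        word = manipulator_word(manipulator)
        if word in seen:
            repeated.append(manipulator)
        seen.add(word)
    return repeated
```

An empty result means the value passes this one rule; whether the remaining manipulators are commutative with each other is a separate question that this check does not answer.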
A subject should have value, semantic meaning for that value, and value type. A subject value has a scalar value part and an optionally defined unit part. The scalar value part is a byte-stream, and the unit part is a string. When the unit part is defined, the subject as a whole is considered a "measure", such as fifteen dollars (US$15) or three kilometers (3 km). When the unit part is not defined, the subject as a whole is considered a byte-stream.
The proposition relies on its particular argument's implementation to interpret the proposition's subjects. For example, one argument may use and interpret a subject's semantics while a different argument may use and interpret a subject's measure.
Irrespective of whether the subject is a measure or a byte-stream, a subject may or may not be a container for other subjects. Because the subject's scalar value is atomic and is not interpreted as anything more than a character set encoded byte-stream, the values and scalars of subjects are never natively considered as multivalued (although a non-native, third party implemented call-out of IsRelatedTo may do whatever it wishes). Certainly, multiple instances of a subject may exist within a given context. A hypothetical example is multiple subjects of "authenticated-session-start", one for each of several authenticated users possibly found within the user context. In other words, a subject may contain multiple values as independent subjects but does not exhibit multiple values or multiple scalars itself. For example, a flag does not exhibit a color of three values (red, white, blue), but rather the flag ought to contain a collection of its own "color subjects": color with value (red), color with value (white), and color with value (blue). In order to handle sets of data, where one may treat multiple contained subjects as a single entity, such as treating a flag as though it exhibits color that is a set of three values (red, white, blue), refer to the result set bullet. In such a circumstance, this hypothetical flag's set of colors, were it to be dynamically referenced within a proposition bullet, would exhibit a value of [red, white, blue] (independent of order), which may then be compared with other sets. So, the result set bullet will often be preferable for data like this.
If the Gatepoint discovered in the present document a flag exhibiting a single color subject with value of the literal string "red, white, blue" (actually, in whatever order as presented in the document), then this single value is preserved and made available whenever it may be referenced within a proposition. It is not considered a set of three colors; it is a single color, in this case called "red, white, blue". The party that generated the document is responsible for ensuring data is properly formatted or characterized. A rulesheet author is responsible for ensuring that written policy is comprehensive enough to cover disclosure control under many circumstances, including prerogatives exercised by parties generating documents in selection of formatting or characterization schemes for their data.
To illustrate what has been covered thus far, here is an example of a subject from the user context:
authentication-level-of-assurance
And here is an example of using a possessive determiner with that same subject:
every-recipient-user's authentication-level-of-assurance
And now a full proposition with this subject:
* every-recipient-user's authentication-level-of-assurance has-value "Level_4"
Possessive determiners allow the proposition to specify how multiple
instances of a subject within a given context shall be interpreted.
Another example of a subject:
client-date-and-time
This same subject with a manipulator:
client-date-and-time minus-5-minutes
And a full proposition:
* client-date-and-time minus-5-minutes not-is-earlier-than now
or alternatively as:
* now not-is-later-than client-date-and-time minus-5-minutes
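One way to picture how a proposition divides into left-hand value, argument, and right-hand value is the sketch below. It assumes the processor knows the set of argument keywords in advance; the set used here contains only the arguments that appear in the examples on this page, and the names `TOKEN_RE`, `KNOWN_ARGUMENTS`, and `split_proposition` are hypothetical.

```python
import re

# A token is a double-quoted literal, a single-quoted literal, or a run of
# non-whitespace characters (so "user's" stays one token).
TOKEN_RE = re.compile(r'''"[^"]*"|'[^']*'|\S+''')

# Arguments seen in the examples; a real processor would have its own list.
KNOWN_ARGUMENTS = {
    "is-later-than",
    "not-is-later-than",
    "not-is-earlier-than",
    "has-value",
    "has-semantic",
}


def split_proposition(content: str):
    """Split proper content into (left tokens, argument, right tokens)."""
    tokens = TOKEN_RE.findall(content)
    for index, token in enumerate(tokens):
        if token in KNOWN_ARGUMENTS:
            return tokens[:index], token, tokens[index + 1:]
    raise ValueError(f"no known argument in {content!r}")
```

The left and right token lists would then be read as a literal or as a dynamic reference (possessive determiner, subject, manipulators), which this sketch leaves unparsed.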
These standalone bullets are assertions formulated in two parts:
We've got plenty of unresolved questions about the particular nature of the protocol-specific part. In particular: the distinction between opaque and semantically structured types of entities. (This may end up being resolved by something similar to the syntax defined in the URI Generic Syntax reference from the Network Working Group: there are several possible formulations for parts of URIs, of varying degrees of opacity to Booliette's Syntax.)
A sublist bullet must be accompanied by an indented list of one or more bullets. The constraints on the quantity and kind of bullets in the sublist depend on the containing sublist bullet's implementation. A sublist bullet takes one form, and that is an assertion expression:
* assertion key word
An example is the equivalent of the logical operator AND.
* all-true
    * bullet
    * ...
Another example is the equivalent of the logical operator OR.
* any-true
    * bullet
    * ...
For more examples, see the next section on result set assertions
or see sublist boolean assertions.
Result set assertions come in two groups: set builders and set operators.
Set builders come in two varieties: intensional and extensional. Extensional set builders are a topic reserved for the future, but it's currently thought they may take a form similar to that of a result set bullet (see historic note below). Herein, the discussion will concern exclusively the intensional set builder. Some intensional set builders can be expressed as a valid kind of proposition form stand-alone bullet.
Set operators come in two varieties: order-independent and order-dependent. Order-dependent set operators are a topic reserved for the future. Herein, the discussion will concern exclusively the order-independent set operators. Some set operators can be expressed as a valid kind of sublist bullet.
Intensional Set Builder
An intensional set builder is a way to create a set by means of satisfying membership criteria. This is opposed to an extensional set builder, which by definition creates a set by means of explicitly specifying each member (i.e. by listing the members). An intensional set builder kind of result set assertion bullet follows the proposition form of a stand-alone bullet. To illustrate:
* left-hand value (whitespace) argument (whitespace) right-hand value
As is the case of any proposition form stand-alone bullet, the constraints on the left-hand and right-hand values
are dependent on the implementation of the given argument in use. In the case of intensional set builders,
the constraints are that one side's value be a literal string or a dynamic reference node and that the other side's
value be only a dynamic reference node. The following example intensionally builds a set of all the nodes
found within the present document where the nodes satisfy set membership by having type currency:
* "currency" being-type-of-elements-within present-document
If the argument's implementation fails to satisfy the assertion (in our example, that there are elements of type "currency" within the present document), then the bullet returns false. Note that the argument's implementation still creates a set in this case, albeit an empty one. Conversely, if a non-empty set is created, then the bullet returns true.
Order-independent Set Operator
An order-independent set operator is one which cares not for any declared order in which its operands exist (e.g. the top-down appearance in authoring form of bullets in its sublist). There are three order-independent operator families:
* these-have-no-value-in-common
Another example,
* these-have-no-type-in-common
And yet another,
* these-have-a-meaning-in-common
And still another,
* these-have-every-caption-in-common
And here is an example of the use of a set operator and two set builders:
* these-have-no-value-in-common
    * check-kiter being-meaning-of-elements-within present-document
    * "fullName" being-caption-of-elements-within identity-attributes
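The interplay of set builders and a set operator can be pictured with ordinary sets. The sketch below is illustrative only: it assumes each element is a plain dictionary with "value", "type", "meaning", and "caption" keys, and it reads "no value in common" as pairwise disjointness of the value sets. Neither the data shape nor that reading is specified in this text, and the function names are hypothetical.

```python
from itertools import combinations


def build_set(elements, facet, wanted):
    """Intensional builder: keep the elements whose facet equals the wanted string."""
    return [element for element in elements if element.get(facet) == wanted]


def have_no_value_in_common(*built_sets):
    """Order-independent operator: true when no two sets share a value."""
    value_sets = [{element["value"] for element in built} for built in built_sets]
    return all(a.isdisjoint(b) for a, b in combinations(value_sets, 2))
```

How the operator should treat a builder that produced an empty set (and therefore came up false on its own) is left open here, as it is in the text.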
At this point, we're not sure whether this third, supplemental bullet type is a good idea or not. It may not be a logical necessity, because any proposition expressible by this kind of bullet is also expressible by the other kinds. However: expressing it those other ways is cumbersome and error-prone, so we're considering this as a user-friendliness measure. It's possible that references to a set, say when a data node exhibits a set as its value, may require this kind of bullet.
The basic concept of a resultset bullet is that it is like a specialized standalone bullet, with an assertion in its proper content. And it is also like a sublist bullet, because that assertion describes a condition applying to a nested list. However, this nested list is not a bullet list (i.e., not a "sublist" as such). It's a list of non-bullet data representations. Ordinarily, these data items would be compared to elements in the result of a query against a resource; hence the name "resultset" for this type of bullet.
Lexically, the resultset is represented one item per logical line. Each line is prefaced by indent whitespace and then (instead of an asterisk, as would be the case for a bullet) a single minus sign "-" (ASCII decimal 45, hex 2D), and another whitespace character. Everything from the first subsequent non-whitespace character to the first unescaped end-of-line sequence is the value of the item.
A resultset bullet would therefore take the general form
* (generation of the resultset) (assertion about the resultset):
    - item
    - item
    - item
    - ...
* at-least-one-true:
    * bullet
    * bullet
    * ...
* all-true:
    * bullet
    * bullet
    * ...
* exactly-one-true:
    * bullet
    * bullet
    * ...
* none-true:
    * bullet
    * bullet
    * ...
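The four assertions listed above, together with the any-true keyword used in an earlier example, can be sketched as functions over the truth values of a sublist. This is a minimal illustration under stated assumptions: standalone bullets are taken as already evaluated to Python booleans, any-true is treated as a synonym of at-least-one-true, the trailing colon is ignored, and the names `ASSERTIONS` and `evaluate` are hypothetical.

```python
# Each assertion receives the list of truth values of its sublist bullets.
ASSERTIONS = {
    "all-true": lambda values: all(values),
    "at-least-one-true": lambda values: any(values),
    "any-true": lambda values: any(values),  # assumed synonym
    "exactly-one-true": lambda values: sum(values) == 1,
    "none-true": lambda values: not any(values),
}


def evaluate(node):
    """Evaluate a bool (standalone) or a (keyword, children) pair (sublist)."""
    if isinstance(node, bool):
        return node
    keyword, children = node
    assertion = ASSERTIONS[keyword.rstrip(":")]
    return assertion([evaluate(child) for child in children])
```

Because every bullet comes up true or false, the evaluation simply feeds upward until the single top-level bullet has a value.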
One of the goals of Booliette notation is to provide easily readable and understandable statements of logical requirements. One impediment to achieving this is the permissibility of including visually awkward content within the body of standalone or resultset bullets. For example:
It's very much in our interest to create a mechanism to move such content elsewhere, so that the structure of bullets is clearly understood. The most straightforward mechanism is aliasing.
An alias is a statement that a short, convenient, visually unobtrusive identifier is to be substituted for something less tractable. An alias is not a "variable" assignment in any sense. (Note, though, that programmers working in languages that do permit variable assignment sometimes use the alias mechanism to achieve some of the same advantages. Within Booliette, the proposition form of bullet points includes the concept of a variable known as a subject pronoun; the keywords "that-item", "whilst", and "ibid" are synonyms for one another. But back to the topic at hand.)
Aliasing
An alias declaration takes the general form
alias:(alias identifier){(referent)}
in which
alias: is a literal character sequence
the (alias identifier) is a character sequence containing only lowercase letters, digits, or underscores, which is (rather arbitrarily) limited to thirty characters in length. This length limitation is far too generous for best practice: 2 to 10 characters make the most visually simple aliases. But there may be some situations in which practitioners can get value out of long aliases... time will tell.
And, the prohibition on uppercase letters in the alias is included simply to make the alias less visually "noisy". We may relax that.
(referent) is always enclosed in opening and closing curly braces (ASCII decimal 123 and 125, hex 7B and 7D respectively); the content may include any characters permitted in Booliette except the following sequence:
} (optional-whitespace)(end-of-line)
where the closing curly brace is followed by zero or more whitespace characters, then one of the end-of-line sequences accepted by the Booliette Grammar. This sequence is prohibited because it is identical to the closing syntax of the referent.
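The alias declaration form can be approximated with a regular expression. The sketch below assumes an identifier of 1 to 30 characters, treats LF, CR, and CRLF (or the end of the text) as end-of-line sequences, and ends the referent at the first closing brace that is followed only by spaces or tabs and an end of line. `ALIAS_RE` and `parse_alias` are hypothetical names.

```python
import re

ALIAS_RE = re.compile(
    r"alias:(?P<name>[a-z0-9_]{1,30})"         # literal prefix and identifier
    r"\{(?P<referent>.*?)\}"                   # referent, matched lazily
    r"[ \t]*(?:\r\n|\r|\n|\Z)",                # closing syntax: ws, then EOL
    re.DOTALL,
)


def parse_alias(text: str):
    """Return (identifier, referent) if text starts with an alias declaration."""
    match = ALIAS_RE.match(text)
    if match is None:
        return None
    return match.group("name"), match.group("referent")
```

The lazy match is what enforces the prohibited sequence: the first occurrence of a closing brace, optional whitespace, and an end of line always terminates the referent, so that sequence can never appear inside it.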
void and null
In order to deal with certain authoring and processing situations, Booliette defines two
supplemental, or quasi-logical, states
a bullet may assume, besides the pure logical states of true and
false. These states are void and null.
A bullet can be assigned a value of void. This is a comparatively simple
state: it simply means "this bullet is ignored when evaluating the Booliette
expression of which it is a part".
Examples of void bullets:
plain,
denoting that they are a non-processable ("plain language") description of
a logical condition, would be void
void bullets) in its sublist.
null is a more difficult (and possibly logically flawed -
we await rigorous mathematical treatment on this) state. Conceptually, it is the situation
that occurs during processing, when the order of logical evaluation of bullets
prescribed for Booliette requires evaluation of a bullet that cannot, for whatever reason, be
resolved to true or false at the moment.
In such a case, the bullet may be assigned a logical value of null, and
processing may continue, if possible, in the evaluation of other bullets. There are two
immediately obvious cases in which this would be useful:
null
value to indicate a "pending" state.
null may not prove to be rigorously usable, in which case it would need to be
discarded. However, superficial analysis does yield a plausible truth table of Boolean
operations defined for null values. It's not overwhelmingly useful, but it
does appear to be logically consistent:
| expression containing null | resolves to |
|---|---|
| null AND true | null |
| null AND false | false (this is a logically demonstrable case in which the value of the null variable can make no difference to the value of the overall expression) |
| null OR true | true (this is a logically demonstrable case in which the value of the null variable can make no difference to the value of the overall expression) |
| null OR false | null |
| null XOR true | null |
| null XOR false | null |
| NOT null | null |
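The truth table above can be written out as a small three-valued logic. In this sketch Python's `None` stands in for null; that mapping and the function names `and3`, `or3`, `xor3`, and `not3` are assumptions made for illustration, and void is not modeled.

```python
def and3(a, b):
    """AND over True, False, and None (null)."""
    if a is False or b is False:
        return False  # a false operand decides the result on its own
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """OR over True, False, and None (null)."""
    if a is True or b is True:
        return True  # a true operand decides the result on its own
    if a is None or b is None:
        return None
    return False


def xor3(a, b):
    """XOR over True, False, and None (null): null whenever either side is null."""
    if a is None or b is None:
        return None
    return a != b


def not3(a):
    """NOT over True, False, and None (null)."""
    return None if a is None else not a
```

When neither operand is null, these functions reduce to ordinary Boolean logic, which is consistent with the table covering only the cases where a null is present.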