Quick-link index of major CDCL topics
Authoring Form of CDCL for expressing policy in human-readable form,
covering the lexical and logical structure of CDCL rulesheets and rules
The Booliette notation for expressing Boolean propositions, used within Authoring Form
Community involvement and, specifically, the Collation
of policies from the custodian community
Decision making process flow when evaluating CDCL policies
Statement of the Problem:
-
Municipal and tribal law enforcement agencies do not have ready access
to free or low-cost automated privacy policy solutions.
-
There is a dearth of human interfaces to existing automated privacy policy solutions that are
easy enough for non-technical personnel to use when drafting and auditing policy.
-
Existing tools in the marketplace do not ease the self-administration of
community-centric privacy policies. In other words, there is no tooling that allows
a community of like-minded stakeholders in data-disclosure compliance to govern
community membership and to collate multiple stakeholders' policies.
-
Existing automated privacy policy solutions do not yet offer semantic-based decision making.
Overview
Disclosure is the rhinoceros in Information Sharing's living room.
You know? The one people don't talk about much? But still, the one whose potential
for havoc stays at the back of everyone's mind as they chitchat about other topics.
No one really knows when the brute might do something awful; and nobody would
be able to do anything much about it, except perhaps escape.
Inappropriate disclosure of data raises a lot of very serious concerns,
including (but by no means limited to) legal jeopardy, privacy
infringement, compromise of operational confidentiality, and public
policy repercussions. To some extent, all parties in an
Information Sharing activity share exposure to these risks.
Options.
A workable comprehensive solution to the problem of Disclosure Policy
is indispensable. Without such a solution, sharers of data are left
with three unsatisfactory alternatives:
-
Don't share the data in the first place. Safe, but regrettable.
Obviously, to the extent that sensitive information is held back due
to concerns about inappropriate disclosure, the value of any
Information Sharing initiative is diminished.
-
Ignore the risk. The whistling-past-the-graveyard approach (often
favored by the naive) will last until the first civil suit, or
incident of operational compromise, or crusading newspaper multi-part
Big Brother feature story.
-
Bake local, ad-hoc policies into software. It's possible (and indeed has been
done more than once) to get help from privacy experts to define
policies, and then write those policies directly into the
program code of an Information Sharing application. However, this doesn't
work over the long term. Policy imperatives are not stable over time.
Statutes, operating policies and the preferences of specific
agency sponsors constantly change. Nor, for that matter, are the
imperatives consistent across contexts. Disclosure policy can differ so much
from one jurisdiction or data provider to another that, when policies are baked
into the exchange software, the resulting complex designs and the refitting
required whenever policy changes can discourage any kind of sharing or exchange.
Any attempt to adapt the software to such changes tends to become an expensive
and underperforming Software Project Without End.
Creating that comprehensive solution.
Discussions of Information Sharing (and to a lesser degree,
Information Exchange) tend to acknowledge Disclosure as an issue,
but to defer addressing it in any substantive fashion. The general
sentiment appears to be that, sooner or later, someone will solve
the problem adequately; and in the meantime there are other things
to work on.
Fair enough. But (to torture the metaphor) everyone is getting ready
to attend the big Information Sharing Dinner Party. WIJIS is
saying that we're all going to need that living room, and it's
time to get the damned rhinoceros out. To that end, WIJIS has begun
conceptual work on a technology called Cascading Disclosure Control
Language, or CDCL.
Initial results are promising. CDCL can, in principle, solve a
remarkably large array of problems. As WIJIS has been pursuing
this initial design, it has been very encouraging to uncover many
other, initially unconsidered, problems it fixes. Just as significantly,
CDCL creates many new opportunities and enables previously
unconsidered functionality. In our experience, these unanticipated
benefits are hopeful indications that a design concept is highly
appropriate to the problem space.
Essential characteristics of CDCL
In order to be usable for practical disclosure control deployment,
CDCL will need certain specific characteristics.
-
Human requirements
-
Easy to learn. CDCL should be reasonably easy
for a semi-technical person to learn to write. This facilitates
distributed authorship and editing.
-
Readable. Sight-reading a CDCL rulesheet should
be reasonably easy for a semi-technical person. This would enable
review and audit by many pairs of eyes (for transparency) and
further promote distributed authorship.
-
Terse. Rulesheets should be compact.
-
Accommodating. CDCL should accommodate a community of parties, all of whom
claim a stake in ensuring that the disclosure of information complies with policy.
-
Machine requirements
-
Inexpensive to parse. The software-development
cost of writing applications that can parse CDCL
declarations should be low.
-
Transformable. There are many cases in which
the ability to transform CDCL content into an alternate
form would be of great value. Example: transforming the rules
into a lawyerly form of English, in order to provide
policymakers with an authoritative translation of the
practical effect of a given CDCL rule. Example: transforming the
rules into a widely adopted, machine-readable format such as XACML.
-
Platform-independent. Nothing in the CDCL design
should preclude reasonably cost-effective deployment in support
of operations on any significant computing platform.
Obviously, the Machine requirements strongly favor representing
CDCL as XML, while human requirements militate against it. This
conflict is an interesting design and implementation challenge.
-
Legal/administrative requirements
-
No Intellectual Property encumbrances.
No part of the CDCL datamodel or syntax should be vulnerable
to any kind of IP challenge.
-
No licensing lockout of vendors.
Similarly, CDCL reference implementations should not be
licensed in such a manner as to impede the use of CDCL
by for-profit vendors.
-
Project requirements
-
CDCL should be fast-moving and should
deliver workable content soon. This project should not
be bogged down in an unending quest for
perfect solutions. A 90% level of completion and
logical consistency, if its limitations are well
understood, is acceptable for first-approximation
publication.
-
CDCL should be sturdy, practical and effective
rather than abstractly perfect. Those
abstractions are very important, and should not be ignored;
but reconciling them with pragmatic early implementations
can be done in subsequent releases.
-
CDCL should be easy to implement for early
adopters; every effort should be made to reduce the implementation
and operational burdens imposed on anyone coming on board.
-
CDCL deliverables should be published as free and open source
so that the cost to deploy and operate CDCL is extremely low, if not zero.
WIJIS' objective is to create a solution that may be employed by
any law enforcement agency, even those with the most constrained financial and technical
resources.
-
CDCL must be semantically savvy
so that brittle dependencies may be minimized between authors of policy
and engineers of data exchange systems.
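As a rough illustration of the Transformable requirement, the sketch below renders one rule (a condition on the recipient, a selection of data, and an outcome) into a plain-English sentence. It assumes a hypothetical XML encoding: the `rule`, `condition`, `nodeset` and `outcome` elements, their attributes, and the `urn:example:cdcl:` outcome URIs are invented here and are not taken from any CDCL specification. Only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of one rule; element, attribute and URI names are invented.
RULE_XML = """
<rule custodian="County Sheriff">
  <condition attribute="role" equals="investigator"/>
  <nodeset select="//person/ssn"/>
  <outcome uri="urn:example:cdcl:outcome:redact"/>
</rule>
"""

# Plain-English phrasing for each (hypothetical) Outcome URI.
OUTCOME_PHRASES = {
    "urn:example:cdcl:outcome:disclose": "shall be disclosed",
    "urn:example:cdcl:outcome:redact": "shall be redacted",
    "urn:example:cdcl:outcome:deny": "shall be withheld and its existence not confirmed",
}


def rule_to_english(xml_text: str) -> str:
    """Render a single <rule> element as one English sentence."""
    rule = ET.fromstring(xml_text.strip())
    cond = rule.find("condition")
    nodes = rule.find("nodeset")
    outcome = rule.find("outcome")
    return (
        f"Per the policy of {rule.get('custodian')}: when the recipient's "
        f"{cond.get('attribute')} is '{cond.get('equals')}', the data selected by "
        f"{nodes.get('select')} {OUTCOME_PHRASES[outcome.get('uri')]}."
    )


print(rule_to_english(RULE_XML))
```

A second transform walking the same tree could just as well emit a different machine format; the structured encoding is what makes such transforms cheap, which is the trade-off against human readability that the Machine requirements raise.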
CDCL summary
From Wikipedia: "...in most areas of law in most jurisdictions in the United States, there are 'statutes'
enacted by a legislature, 'regulations' promulgated by executive branch agencies pursuant to a delegation
of rule-making authority from a legislature, and common law or 'case law', i.e. decisions issued by courts
(or quasi-judicial tribunals within agencies)." Combining law with other "expectations of behavior", including
operational procedures and business processes, yields a body of work in which CDCL and its potential
derivatives may play a critically important role in compliance. A case in point is this guidance from the Wisconsin
Department of Justice (August 2007): "Under Wis. Stat. § 19.36(6), however, the custodian is required to
delete or redact confidential information contained in a record before the parts of a record that are subject
to disclosure [are disclosed]."
For further information, please see papers from
California and GLOBAL.
Controlling disclosure in a manner that is both conducive to current practices and compliant
with law and procedure is therefore the problem, across various domains, that CDCL is proposed to solve.
The results of our preliminary conceptual work are as follows:
Design Principles
CDCL rests on a few fundamental principles:
-
It's better to express rules in a declarative language ("what")
than to code logic in an imperative language ("how"). Accordingly, CDCL
provides a purely declarative construct called the Authoring Form,
which users employ to write disclosure policy rules.
-
CDCL Authoring Form is a severely abstract language. Its syntax will include
minimal embedded assumptions about expected usage; semantics of
the desired disclosure policies will be expressed at the rule
level, not at the syntactical level.
-
Collation of policies will be governed by Data Custody. Any entity
that authors rules is called a Custodian of the data being constrained.
Custodians come in two types:
-
Stakeholder Custodians can both authorize and restrict disclosure. Stakeholder
policies need not be defined at the time of collation, policy decision making, or policy enforcement.
-
Primary Custodians can both authorize and restrict disclosure. Unlike stakeholder
policies, the Primary Custodian's policy must exist at the time of collation, policy
decision making, and policy enforcement.
(In general, Primary custodianship would reside in the entity
ordinarily referred to as the "owner" of the data.)
-
Disclosure control at the "document" level may not be sufficient in many cases.
Rules should be able to constrain disclosure of individual data
elements when appropriate. CDCL is therefore node-based, not document-based
(document-level constraints, if desired, are easily enough
expressed by referencing the document root node.)
-
CDCL Rules are defined as logical triples:
-
A Condition-set of the circumstances that must be satisfied for the rule to apply (defined
by making Boolean assertions about properties of the user to whom the data is destined,
run-time event properties, or
authentication-event properties, e.g. LDAP attributes,
second-factor confirmation, time of day, etc.)
-
A Nodeset of data elements (membership is defined by
making boolean assertions about the elements or their
containing document, e.g.
XPath expressions in an XML
Document, SPARQL result assertions in an RDF Document, semantic
assertions, literal value comparisons, etc.)
-
An Outcome stating how any data element in the Nodeset will
be treated when information is provided under the conditions
described in the Condition-set. Outcomes will be expressed as URIs, both for
unambiguous worldwide identification and for future-proofing.
In preliminary analysis, WIJIS has identified seven
likely Outcomes:
-
disclose
-
redact (confirm the info exists but do not disclose content)
-
deny (disclose neither content nor the existence of the info)
-
hold pending human review (policy enforcement is concluded manually)
-
do not quote out of context (prevents "line-item" redaction intended to influence data's interpretation)
-
obligation or "entailment" (describes the duties or covenants attached to the recipient's acceptance or use of the data, e.g. "destroy within 30 days")
-
alert (sends a bulletin to a specified monitor)
It seems likely at this point that the eventual set of Outcome definitions will
be somewhat more developed than this preliminary list.
The Condition-set and Nodeset are referred to as the Rule's
cites.
-
Any collection of CDCL rules is called a rulesheet.
The rulesheet is the administrative unit of rule authorship, maintenance, and
publication. Characteristically, a rulesheet consists of
rules for a single Custodian.
-
Both Condition-set definitions and Nodeset definitions can be
elaborated with the familiar Boolean Algebra
AND, OR,
XOR, and NOT operations to express
intersections, unions, symmetric differences
and complements respectively.
NOTE: recent work on
Booliette Notation is providing a
framework for fulfilling this in a way that's reasonably accessible to nontechnical users.
-
An unbounded number of rules can be written to apply to a
given data element (because the data element can exist in the
intersection of an arbitrarily large number of Nodeset
definitions, each cited in a different rule.)
-
Because of the multiplicity of rules, a deterministic logic
of outcome resolution must be defined. Following the terminological lead of the
W3C's Style Activity,
WIJIS refers to this reliable resolution of multiple rule
outcomes as a cascade.
The Cascade is a fixed order of Outcome reconciliation, giving priority to restrictions
on disclosure over any authorizations of disclosure. This gives Custodians consistent
expectations about how their policies will behave when collated with others' policies.
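The design principles above can be drawn together in a short sketch. It is not the WIJIS design: the Outcome URIs, the cascade order (deny, then hold, then redact, then disclose), the deny-by-default behavior when no rule applies, and the representation of nodes as path strings (rather than evaluated XPath) are all assumptions made for illustration. The obligation, alert and do-not-quote Outcomes are omitted, since it is not obvious that they belong in a single restrictiveness ranking. The sketch shows a Rule as a Condition-set/Nodeset/Outcome triple, Boolean algebra on Nodesets as set operations, collation of a mandatory Primary Custodian rulesheet with optional stakeholder rulesheets, and a cascade in which restrictions beat authorizations.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Mapping, Optional

# Hypothetical Outcome URIs: CDCL expresses Outcomes as URIs, but these are invented.
DISCLOSE = "urn:example:cdcl:outcome:disclose"
REDACT = "urn:example:cdcl:outcome:redact"
HOLD = "urn:example:cdcl:outcome:hold"
DENY = "urn:example:cdcl:outcome:deny"

# Assumed cascade order: earlier entries are more restrictive and always win.
CASCADE = [DENY, HOLD, REDACT, DISCLOSE]

Context = Mapping[str, str]  # recipient, run-time and authentication properties


@dataclass(frozen=True)
class Rule:
    condition: Callable[[Context], bool]  # the Condition-set
    nodeset: FrozenSet[str]               # paths of the data elements covered
    outcome: str                          # an Outcome URI


def collate(primary: Optional[List[Rule]], stakeholders: Iterable[List[Rule]]) -> List[Rule]:
    """Merge rulesheets; only the Primary Custodian's rulesheet is mandatory."""
    if primary is None:
        raise ValueError("a Primary Custodian rulesheet must exist at collation")
    return list(primary) + [rule for sheet in stakeholders for rule in sheet]


def resolve(rules: List[Rule], context: Context, node: str) -> str:
    """Cascade the Outcomes of every applicable rule for one node."""
    outcomes = [r.outcome for r in rules if node in r.nodeset and r.condition(context)]
    if not outcomes:
        return DENY  # assumption: with no authorizing rule, nothing is disclosed
    return min(outcomes, key=CASCADE.index)


# Nodesets as sets of node paths; Boolean algebra maps onto set operators:
# AND -> &, OR -> |, XOR -> ^, NOT s -> ALL_NODES - s
ALL_NODES = frozenset({"/case/id", "/case/person/name", "/case/person/ssn", "/case/narrative"})
PERSON = frozenset({"/case/person/name", "/case/person/ssn"})
SENSITIVE = frozenset({"/case/person/ssn", "/case/narrative"})

primary_sheet = [
    # Investigators may see the whole document (every node under the root).
    Rule(lambda c: c.get("role") == "investigator", ALL_NODES, DISCLOSE),
    # The public may see everything that is NOT sensitive.
    Rule(lambda c: c.get("role") == "public", ALL_NODES - SENSITIVE, DISCLOSE),
]
stakeholder_sheet = [
    # PERSON AND SENSITIVE (the SSN) is redacted for every recipient.
    Rule(lambda c: True, PERSON & SENSITIVE, REDACT),
]

rules = collate(primary_sheet, [stakeholder_sheet])
for role in ("investigator", "public"):
    print(role, {node: resolve(rules, {"role": role}, node).rsplit(":", 1)[1]
                 for node in sorted(ALL_NODES)})
```

In this example the stakeholder's redaction of the SSN overrides the primary custodian's authorization for both roles, and the narrative is denied to the public because no rule authorizes it for that role.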
Design Goals
The CDCL design project will have to make some compromises and
choices when defining semantics and declaring syntax. For guidance in
doing so, the following desirable
characteristics will be considered:
-
The authoring and review of CDCL Rulesheets should be as easy
as possible for a semi-technical person. This would ideally
include the ability of someone with less than Software Developer
skills to sight-read the rules and understand what they're reading.
-
CDCL Rules should be parseable and transformable with minimal
cost and effort.
-
Compatibility with the W3C
Semantic Web stack is highly desirable.
-
It's better to maintain rulesheets in extrinsic configuration
files, as opposed to compiling them into the application.
CDCL Design should avoid any characteristics
that might impair this capability.
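A minimal sketch of keeping a rulesheet in an extrinsic file, so that policy authors can edit it without rebuilding the application. The JSON layout (the `custodian`, `rules`, `conditions`, `nodeset` and `outcome` keys) and the file name are hypothetical, chosen only for illustration; the loader simply reads the file at run time and checks that every rule carries all three parts of the rule triple.

```python
import json
import os
import tempfile

REQUIRED_PARTS = {"conditions", "nodeset", "outcome"}  # the rule triple


def load_rulesheet(path: str) -> dict:
    """Read and minimally validate a rulesheet at run time."""
    with open(path, encoding="utf-8") as f:
        sheet = json.load(f)
    for i, rule in enumerate(sheet["rules"]):
        missing = REQUIRED_PARTS - set(rule)
        if missing:
            raise ValueError(f"rule {i} lacks {sorted(missing)}")
    return sheet


# Demonstration: write a rulesheet to a scratch directory, then load it.
example = {
    "custodian": "County Sheriff",
    "rules": [
        {"conditions": "role = investigator", "nodeset": "//person/ssn",
         "outcome": "urn:example:cdcl:outcome:redact"},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sheriff.rules.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example, f)
    sheet = load_rulesheet(path)

print(sheet["custodian"], len(sheet["rules"]))
```

Because the application only depends on the loader, replacing the file (or pointing the loader at a different custodian's file) changes policy without touching compiled code.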
Unanswered Design Questions
-
What are the target languages/platforms for CDCL implementation? The most
practically useful would seem to be Java, Python, and Ruby;
deployability to a .NET stack would be essential; some of the older
legacy platforms like OS/400 might be kind of tricky to implement
in spite of the possible applicability. C/C++ may also have a role;
this is a very open question.
-
What is the feasibility of implementing non-XML CDCL syntax? Doing
this kind of work requires a Parser Generator or other such "compiler-compiler"
that can consistently generate, for every desired target language, a parser
that builds the same syntax tree. How many of these
are available? How consistent are they? Can they all consume the same
grammar specification (e.g. EBNF)? And so forth. Lots of work to do on this.
-
OASIS,
the Organization for the Advancement of Structured Information Standards,
has in fact published a very elaborate, heavyweight, thorough specification called
XACML,
which addresses many of the same problem spaces as CDCL. Initial review suggests that
XACML is not very well-suited for the purposes to which CDCL will be applied. However,
it seems to be a very well-engineered specification, and WIJIS supports OASIS as a
general rule. While designing CDCL, it will be important to investigate XACML
much more closely, with particular attention paid to the following questions:
-
Can XACML (contrary to current impressions) actually
be conveniently used for the intended purposes
of CDCL after all? Because if it can, then CDCL represents a wheel reinvention,
and WIJIS's efforts should be shifted to XACML implementation instead.
Our current thinking is that XACML and CDCL are complementary, rather
than duplicative, technologies. They address different needs, and have
different capabilities and different goals. XACML is a powerful and
complete solution to the general problem of resource permissions; CDCL
is, by contrast, a lightweight and agile technique to address
a subset of that general problem. If this view turns out to be accurate,
the relationship between XACML and CDCL may be comparable to the relationship
between XSLT/FO and CSS: everything CSS can do can also be done with XSLT and FO,
and they can do many things that CSS cannot.
However, people overwhelmingly use CSS to author Web pages because it's much more tractable and
practical when styling Web content.
-
Can CDCL be transformed into a subset of XACML, and vice versa? And, should it? Answering this
question will involve a nontrivial comparative analysis of the XACML and CDCL
datamodel topologies. There may or may not be incompatibilities that make
transformations between the two forms of dubious usefulness, or perhaps even
logically impossible.
-
How much of the XACML work product can the CDCL design effort
reuse? Again, looking at this question requires
a thorough comparative abstract analysis. But reuse of XACML artifacts for
CDCL design would confer two significant benefits:
-
The quality of the XACML analysis can be presumed to be quite high.
It would be nice to be able to benefit from that.
-
Borrowing constructs from XACML would improve the likelihood of
interoperability between the two standards.
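On the question of non-XML syntax, it may help to see how small one piece of the parsing problem is. The sketch below is a hand-written recursive-descent parser for Boolean condition expressions (AND, OR, XOR, NOT and parentheses), following the EBNF grammar in its opening comment, plus an evaluator over named facts. The grammar and its precedence (NOT binds tightest, then AND, then XOR, then OR) are assumptions for illustration; this is not Booliette notation. A parser generator would produce an equivalent parser from the grammar for each target language, which is where the consistency concern arises.

```python
import re

# Hypothetical EBNF for a non-XML condition syntax:
#   expr     = xor_expr { "OR" xor_expr } ;
#   xor_expr = and_expr { "XOR" and_expr } ;
#   and_expr = unary { "AND" unary } ;
#   unary    = "NOT" unary | "(" expr ")" | NAME ;
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z_][A-Za-z0-9_]*)")
KEYWORDS = {"AND", "OR", "XOR", "NOT"}


def tokenize(text: str) -> list:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    LEVELS = ["OR", "XOR", "AND"]  # loosest to tightest binary operator

    def __init__(self, text: str):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def parse(self):
        tree = self.binary(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def binary(self, level: int):
        # One grammar rule per precedence level; the last level falls through to unary.
        if level == len(self.LEVELS):
            return self.unary()
        op, left = self.LEVELS[level], self.binary(level + 1)
        while self.peek() == op:
            self.take()
            left = (op, left, self.binary(level + 1))
        return left

    def unary(self):
        tok = self.peek()
        if tok == "NOT":
            self.take()
            return ("NOT", self.unary())
        if tok == "(":
            self.take()
            tree = self.binary(0)
            self.take(")")
            return tree
        if tok is None or tok in KEYWORDS or tok == ")":
            raise SyntaxError(f"unexpected {tok!r}")
        return self.take()


def evaluate(tree, facts: dict) -> bool:
    """Evaluate a parsed tree against named Boolean facts about a request."""
    if isinstance(tree, str):
        return facts[tree]
    if tree[0] == "NOT":
        return not evaluate(tree[1], facts)
    a, b = evaluate(tree[1], facts), evaluate(tree[2], facts)
    return {"AND": a and b, "OR": a or b, "XOR": a != b}[tree[0]]


tree = Parser("investigator AND NOT after_hours OR supervisor").parse()
print(tree)
```

The parse produces a plain nested-tuple tree, so the same tree could feed an evaluator, an English renderer, or an XML serializer; keeping that tree identical across target languages is the part a parser generator would need to guarantee.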
Content of this site remains the property of its poster.