Tuesday, May 08, 2007

Relational normalization and OOD

As always, the comp.object group hosts good discussions on object-oriented design, frequently intertwined with procedural programming or relational databases, particularly owing to the participation (or interference) of some advocate of the procedural style. The case at hand is how to represent third normal form using classes; in the words of H.S. Lahman:

Responding to V4vijayakumar...

I really hope this was not a homework question... B-)

> How object-oriented design can be normalized? Normalization, anyway
> related to OOD?

The Class Model in UML is underlain by the same relational data model
branch of set theory that underlies DBMSes. As a result the Class Model
needs to be normalized to Third Normal Form just like an RDB schema.
[Normal Forms above third are rarely relevant because they mostly deal
with identifier conventions and we rarely use explicit identifiers for
objects.]

Most OOA/D authors don't talk explicitly about normalization. However, every OOA/D author will provide a suite of rules for constructing Class Models that essentially ensure Third Normal Form under the guise of things like one-fact-one-place. For example, the most comprehensive book available on Class Modeling is Leon Starr's "Executable UML: How to Build Class Models". Leon doesn't even mention Normal Form as far as I recall, yet he provides the most comprehensive set of guidelines for normalization that I have seen.

In a nutshell we have:

1NF: all responsibilities must be a simple domain. For knowledge attributes this means that the attribute must be described in terms of an abstract data type (ADT) that can be manipulated as if it were a scalar. For attributes that can be expressed in terms of fundamental values, the domain of data values must have a single semantics. So a domain of {UNSPECIFIED, 5, 6, 7} is invalid because it captures two separate semantics: valid data values of 5, 6, 7 and whether or not the data is specified at all.

For behaviors this means that the behavior responsibility must be cohesive and self-contained. Self-contained means it can depend on knowledge attributes but it can't depend upon other objects' behavior responsibilities. (Note that this comes for free if one follows the methodology's dictums about encapsulation.)
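The knowledge-attribute half of 1NF can be sketched in Python. This is only an illustration: the names `Speed`, `Setting`, `is_specified` and `specify` are hypothetical and do not come from the post. The idea is that the mixed domain {UNSPECIFIED, 5, 6, 7} is split into two attributes, each with a single semantics.

```python
from dataclasses import dataclass
from enum import Enum


class Speed(Enum):
    """A single-semantics domain: every member is a valid data value."""
    FIVE = 5
    SIX = 6
    SEVEN = 7


@dataclass
class Setting:
    # Hypothetical class. Two attributes separate the two semantics that
    # the domain {UNSPECIFIED, 5, 6, 7} mixed together.
    is_specified: bool = False
    speed: Speed = Speed.FIVE  # only meaningful when is_specified is True

    def specify(self, value: int) -> None:
        # Enum lookup by value raises ValueError outside {5, 6, 7}.
        self.speed = Speed(value)
        self.is_specified = True
```

Whether the data has been specified is now its own fact, and the `speed` domain contains nothing but valid data values.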

2NF: all responsibilities are fully dependent on the object identity. Typically objects do not have explicit identity attributes but they do have an unambiguous mapping to some uniquely identifiable problem space entity. This means the "value" of the property depends solely on what problem space entity is abstracted in the object.

As a practical matter 2NF is not very relevant to OO development because it is really about compound identifiers (i.e., multiple attributes combine to define the object identity). What 2NF is saying is that if there are multiple explicit identity attributes, then the "value" of a non-identity attribute must be dependent on /all/ of the identity attributes, not just some of them. A classic example of this is:

[Housing Development]
+ developmentID // identifier
...

[Subdivision]
+ developmentID // identifier
+ subdivisionID // identifier
...

[House]
+ developmentID // identifier
+ subdivisionID // identifier
+ houseID // identifier
+ style
+ builder

The style attribute is clearly dependent on the particular House identity, which must be fully specified. The same thing seems true for the 'builder' attribute since each House is built by one builder. But suppose construction policy is that a builder builds all the houses in a particular subdivision. Now the 'builder' value is fully specified if one only knows {developmentID, subdivisionID}. So the 'builder' attribute really belongs in the [Subdivision] class. [Note that if the development is seriously homogenized, all Houses in the same subdivision might have the same style. In that case, style also belongs in [Subdivision].]
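As a rough sketch of the 2NF fix, assuming the one-builder-per-subdivision policy described above, the classes might look like this in Python. The snake_case field names and the `builder` property on `House` are illustrative choices, not part of the original model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subdivision:
    development_id: str
    subdivision_id: str
    builder: str  # depends only on {development_id, subdivision_id}


@dataclass(frozen=True)
class House:
    subdivision: Subdivision  # carries development_id and subdivision_id
    house_id: str
    style: str  # depends on the full House identity

    @property
    def builder(self) -> str:
        # Navigate the relationship instead of storing a copy per House.
        return self.subdivision.builder
```

Two houses in the same subdivision now necessarily report the same builder, because that fact is recorded in exactly one place.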


3NF: all responsibilities depend upon nothing but the object identity. Essentially this means that the "value" of a responsibility cannot depend upon knowledge attributes that are not explicit identity attributes. A classic example of this problem is:

[House]
+ address // identifier
+ builder
+ style
+ cost
...

The problem here is that it is highly unlikely that 'cost' is only dependent on the House identity. In fact, it is probably dependent on the style or on the combination of {builder, style}. IOW, only the /combination/ of {builder, style, and cost} is dependent solely on the identity of House, not the individual values. So, assuming cost is solely dependent on style, we need:

[House]
+ address
+ builder
+ style // referential attribute
...

[Style]
+ style
+ cost

where the unique combination of values is captured indirectly through
the relationship to [Style].
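A minimal Python sketch of that 3NF split, under the same assumption that cost depends solely on style. The `name` field and the derived `cost` property are hypothetical conveniences added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    name: str  # identifies the style
    cost: int  # depends only on the style, so it is stored here


@dataclass
class House:
    address: str  # identifier
    builder: str
    style: Style  # referential attribute: the relationship to Style

    @property
    def cost(self) -> int:
        # Derived through the relationship; never stored on House itself.
        return self.style.cost
```

Replacing the cost of a style then updates, in effect, every House that refers to it, which is the one-fact-one-place property the normal form is meant to secure.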

---

One must be careful not to confuse coincidental values or data domains with dependency. Consider Washing Machine and Refrigerator objects that both have a 'color' attribute. If the colors are designed to be color coordinated from the same manufacturer, they will have identical data domains for 'color'. It is quite possible that an object from both sets may be colored chartreuse. Nonetheless they are quite different things. How is it that the 'color' attribute doesn't violate 1NF (same data domains semantics) and 3NF (both have the same color)?

The trick is to think of such generic qualities in terms of 'color of'. IOW, the color of a Washing Machine is chartreuse and the color of a Refrigerator is chartreuse. Thus the color of a Washing Machine is not semantically the same as the color of a Refrigerator even though the value is the same. Similarly, we have:

             [Appliance]
             + color
                  A
                  |
       +----------+----------+
       |                     |
[Refrigerator]       [Washing Machine]

The notion of the color of an Appliance is something shared; it raises the level of abstraction of color of a Refrigerator and color of a Washing Machine to a common ground for both. Since a Refrigerator is an Appliance, it has the color attribute.

This example, though, underscores an important difference between normalization applied to OO Class Models and the Data Models used for RDB Schemas. In the OO case Refrigerator can implement a different data domain than a Washing Machine for the 'color' attribute (i.e., they aren't required to be color coordinated). That is not true in Data Models where there will be exactly one data domain for 'color'.

The reason is that in Data Modeling the [Appliance] table is instantiated separately from the subclass tables and the subclass tables do not have a 'color' attribute. So there is only one attribute in one table with one data domain.

In contrast, in the OO Class Model the superclasses cannot be instantiated separately so an object resolves the entire tree. Thus the object identity /includes/ the [Appliance] properties. In addition, the Class Model only identifies the responsibility (i.e., What it is); it does not define its implementation (i.e., How it does it).

So Refrigerator and Washing Machine can provide different implementations of the 'color' attribute, such as different data domains. IOW, we resolve object identity at the leaf level of the tree through inheritance. Thus unique data domains can be associated with subclasses even though the responsibility is identified in a superclass.
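One way to picture this in Python, purely as a sketch: the superclass names the 'color' responsibility and each leaf class supplies its own data domain. The enum members and the `color_domain` class attribute are assumptions made up for this example.

```python
from enum import Enum
from typing import ClassVar, Type


class RefrigeratorColor(Enum):
    WHITE = "white"
    CHARTREUSE = "chartreuse"


class WashingMachineColor(Enum):
    CHARTREUSE = "chartreuse"
    STEEL = "steel"


class Appliance:
    """Identifies the 'color' responsibility; leaves define its domain."""
    color_domain: ClassVar[Type[Enum]]  # supplied by each leaf subclass

    def __init__(self, color: str) -> None:
        # Enum lookup by value raises ValueError outside the leaf's domain.
        self.color = self.color_domain(color)


class Refrigerator(Appliance):
    color_domain = RefrigeratorColor


class WashingMachine(Appliance):
    color_domain = WashingMachineColor
```

Here `Refrigerator("white")` is accepted while `WashingMachine("white")` raises `ValueError`, and the chartreuse of a Refrigerator does not compare equal to the chartreuse of a Washing Machine, which mirrors the 'color of' reading of the attribute.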
