ER Limits

Overcoming the Limitations of ER Modeling

BRIEF DESCRIPTION:

A look at the semantics implied by an ER/Relational data structure, and then the difficulties stemming from thinking in terms of relational tables. This one really challenges you to think about what we do as data modelers. It not only tells you what is wrong with ER/Relational modeling but also points to a better way to do data modeling using Object Role Modeling (ORM).

TIME: best for half day.

WHERE: DAMA Minnesota, 2001; DAMA Iowa, 2005; Minnesota chapter of Professional Association of SQL Server users (PASSMN), 2007 (Overall presentation rating was 4.9 on a 5 point scale, with a 5.0 on speaker's presentation skills); presented as a 3 hour workshop at DAMA International EDW Symposium, 2008 - very well received with a rating of 8.8/10 overall, and Effectiveness of speaker = 9.1, one attendee said "We need to have this talk again next year for those who missed it"; DAMA Portland and Sacramento, 2008

ABSTRACT: (shorter version)

ER/Relational data modeling is dominant today, in practice and in DBMSs. We will explore the semantics implied by the simplest ER diagram (one entity with an identifier X and attributes A and B) then look at the problems in modifying the diagram when other or different semantics are true. The traditional focus on entities/records with descriptive attributes leads to "table think" and results in confusion. Consequently, even experienced data modelers often get the design wrong, especially in complex design situations. Normalization is one technique for detecting design problems.

We need a different way of thinking about and doing data modeling that avoids the pitfalls of record-based modeling, one that does not require normalization. We start by focusing on object domains and the relationships between objects, rather that on entity tables with identifiers and attributes, that is, rather that on relations with multiple columns. (NOTE: We did not say relationships.) The root problem is having to distinguish between entities and attributes; sometimes an attribute is actually an entity (aka foreign key). The initial "conceptual" stage of data modeling needs to be based on just two constructs - objects (entities) and relationships. Examples of such modeling schemes are Object Role Modeling (ORM), NIAM, and Fact (Oriented) modeling (FOM). Tables are formed after the conceptual, fact-based model is validated. The tables can be generated automatically and guaranteed in fifth normal form. This workshop has several interactive group activities.

ABSTRACT: (longer version)

In 1976 Peter Chen introduced a data model diagraming scheme to transcend thinking about physical records. It focused on entities and their relationships to produce Entity-Relationship (ER) Diagrams. It was a natural fit for today's relational DBMSs. Consequently, variants of ER Modeling are widely used today.

ER and its many "cousins" (including data modeling CASE tools), such as Information Engineering (Finkelstein, and adapted by Martin), ERwin (CA), PowerDesigner (Sybase), IDEF1X (Appleton; Bruce), Oracle Designer*2000 (Barker), Extended ER (Teorey), and even class diagrams in UML, have one key common characteristic -- they model in records. That means they still utilize attributes as a basic modeling construct. Even if attributes are not shown in an ER diagram, they are understood to be there. The ER Diagram is an abstraction of a more detailed underlying data model.

In ER modeling, the data modeler assembles value domains as descriptors of an entity into entity records. Within the record, relationships among the attributes (columns, including identifiers) are not explicitly defined. Unfortunately, this is where even seasoned data modelers make mistakes -- attributes are included in a record when they shouldn't be. Normalization is one series of tests we perform to detect errors in our data models. (As an aside, no DBMS today is able to help the modeler produce a fully normalized data model. Do you know why? Come and find out!) As long as data items are clustered into records, there remains the possibility of violating one of the normal forms.

Data modeling is the foundation of information systems development - if you don't get the database design "right" then the systems you build will be like a house of cards, collapsing under the weight of inevitable future forces for revision, enhancement, integration, and quality improvement. Therefore, we need a scheme to guide our data modeling efforts which helps produce data models that are a clear and accurate representation of the users' domain of discourse, and which facilitate human communication, understanding, and validation.

Today, increasing numbers of people, data modelers, are beginning to recognize some serious limitations in all record-based data modeling schemes.

This workshop presents some examples to help data modelers recognize problems and develop 'correct' data models. It begins by having the participants verbalize all the semantics implied by a simple ER diagram consisting of a single record type:

[ X | A | B ] ... There are at least 15 distinct statements.

Then we focus in on the root cause and problems inherent in any modeling scheme that uses three basic constructs (entity, attribute, relationship), and implicitly or explicitly clusters attributes into entity records. For any given data item to be included in the database, the modeler must a priori decide whether it is an entity or an attribute. Sometimes it is both!

The common approach to designing databases is to first think of tables, what (entities) they represent, and what attributes to include. A good data modeler must avoid "table think" too early in the process. Hopefully this workshop will motivate data modelers to explore and seriously consider using a data modeling scheme with two basic constructs, such as Object Role Modeling (ORM). This does not mean abandoning traditional ER modeling; but it does sharpen our ability to do a better job of data modeling and overcome some of the limitations of ER modeling.