Candidate keys: Choosing the right primary key

HOME \| NEWSLETTER \| WATERCOOLER
	Candidate keys: Choosing the right primary key It is the business that has the knowledge about their entities (things of importance) and what kind of unique information that separate them (invoice numbers, customer IDs, and so on). Your role as an analyst/designer is to validate such properties, and to fit them into a normalized database model that reflects the needs of the business. During this, you will discover many candidate keys and issues concerning how they could be defined. With wrong decisions, chances are high that you will have a worse job in normalizing the database model, as well as achieve a high performing database in the end. (Read the eBook on database normalization). ). Three absolute demands on the candidate key, if it is to be regarded as a possible primary key. There are three fundamental demands on the candidate key that we should never deviate from, if it is to become the subject for a primary key: The candidate key must be unique within its domain (the entity it represents, and beyond, if you also intend to access external entities, with is clearly illustrated in this article). The candidate key can not hold NULLs (NULL is not zero. Zero is a number. NULL is 'unknown value'). The candidate key should never change. It must hold the same value for a given occurrence of an entity for the lifetime of that entity. In 1, we say that the main purpose of the candidate key is to help us to identify one single row in a table, regardless of whether there exists billions of row. This sets high demands on the flexibility in terms of delivering uniqueness. In 2, we say that a candidate key always, without exception, must hold a value. Without it, we break post 1. In 3, we say that no matter what happens, inside our business or outside of it, we are not allowed to change the value of the candidate key. Adding an attribute (expanding) the candidate key, would pose a change. Some say that you should be able to change the primary key, and do a 'cascade update' to all underlying tables, ref above. Actually, 3 it not a formal rule as such, but it is my personal experience through the years that make me claim this as a rule for myself, due to the limitations of today's database products. Cascading updates are very critical operations, not well supported by the different database system vendors. We will deal with that in a later chapter. If the candidate key can withstand all these three demands, we have most likely identified our primary key. As we proceed in this eBook, we will keep these three demands in our minds. You may also find interest in reading 'A Relational Model of Data for Large Shared Data Banks' by E.F. Codd (the founder/co-founder of the relational theory): A Relational Model of Data for Large Shared Data Banks Notations in graphic examples. Throughout this eBook, I will be using conceptual models; Entity Relationship diagrams, as well as logical models; what is called Server diagrams in Oracle Designer. I will also examplify the different constructions with SELECT statements where appropriate. First, a few words about notation in the conceptual model: Fig. 2: In this figure, a # in front of an attribute means the attribute is (part of) the primary key. Likewise, the horizontal bar over the crow-foot in the drawing of the relationship between departments and accounts is a sign that the primary key from departments is also a part of the primary key in the accounts entity. How smart that is, is a totally different issue: (ACCOUNTS is breaking 2nd Normal Form: Name is dependent on ACCOUNT_NO alone, not DEPT_ID). A * in front of an attribute signs that this attribute is mandatory (always must have a value filled in), while an o in front of an attribute tells that it is optional to give the attribute a value. Likewise with the logical model: Fig. 3: Notice the difference between the two graphics: in the logical model in Fig.3, the relationship in Fig. 2 is incorporated in the accounts table. The relationship in Fig. 3 now indicates that there exists an unconditional constraint between the two tables: You are not allowed to enter a value into DEPT_ID in accounts if it doesn't exist in the departments table. The horizontal line above the crow-bar does not necessarily indicate participation in the primary key of that table: only the # tells you if a column is a part of the primary key. By the way, Fig. numbers do not start at Fig. 1, because this is just an excerpt from the eBook. you can check it out here: Primary and Foreign Keys. I have also used the plural form on entity names as well as table names. The normal is singular form for entities, and plural for tables. I have done this exception for reading purposes only. Details on uniqueness A primary key is, as stated earlier in post 1, a column, or a collection of columns, that together give us the ability to uniquely read one single row from a table, no matter how many rows that exists in that table. In the logical world this is just theory, but in the physical, we get into a lot of trouble if we are unable to do that. Consider a simple example: You want to give a 10% pay raise to your top DBA, John Wilson. You do that by writing: UPDATE EMPLOYEES SET SALARY = SALARY * 1.1 WHERE NAME ='John Wilson'; If you have two employees with that name, they both get a pay raise. Not exactly what you intended? Names are very unlikely candidates for primary keys. You need something unique, and names are not unique. Never. Details on null values In post 2, about no null values in a primary key: How can you identify something you don't know? This point should go without saying, but I bring it up, because, believe or not, I have read articles claiming that there exists the possibility of a weak primary key; meaning null values are allowed. Absolutely nonsense! All meaning of the term primary key vanishes into thin air. Most serious database systems will actually reject your attempts and deny you to go further on that path, and with a very good reason. If you ever come across this situation, just defy it. It is a path of failure. Actually, it is a non-sense discussion. Details on no change Think about Africa. After World War II, and the end of colonialism, how many countries have changed their names? A more recent example: There is no longer a country called The Union of Soviet Socialist Republics; Today we call it Russia. We even have standards for naming countries: the former Russia was known all over the world as USSR, and Russia is abbreviated as RUS today (or is it?). Look at these two fantastic tables: Fig. 4: Country definitions from the United Nations Fig:5: Another table of country definitions from the ISO standardization organization: Isn't that great? Western Samoa is abbreviated WSM in the United Nations, and it is abbreviated WS is ISO! It has even 3 identifiers in ISO; WS, WSM and 882! Take a look at USSR (Sorry, RUS): In United Nations they call it RUS, while ISO use RUS, RU, 643, and on Internet it's .ru. But according to FIPS 10-4 it should be abbreviated RS! How can we identify a country by using standards from these helpful people? If you do a search on Google for the term 'standard country codes' (without the quotes), you will find the sites for these two tables among the first ten hits. Do not go to the ISO page; they actually want you to walk around with a shopping cart and pay for it. In fig.4, every column is a candidate key: No two countries have the same name or any other shortcuts of the name, but they differ all the same, if you put on a broader perspective: In ISO, the name is Russia and the (latest determined standard) abbreviation is RU; but in the United Nations, the name for the same country is Russian Federation, and their abbreviation is RUS. Furthermore, in FIPS 10-4, the abbreviation is RS. Do they never meet to talk with each other? Looking at it isolated, all columns in Fig. 5 are also candidate keys. But not if you want to match entries in Fig. 4 and Fig. 5. Actually, I can find only one common denominator here: Both ISO and UN give Russia a common number: 643. Here is something to investigate. Are all the other abbreviations and names meant only for each organization, while they share the 643 number for identification? Have they been talking to each other, after all? It should be clear by now that we cannot rely on any of the character columns in either organization: While each of them are candidate keys in their own world, we would soon get into trouble if we tried to use them to talk to all the world (that is, outside the scope of our little application). As you can see, defining (or choosing) the right primary keys is not neccessarily a simple, straight-forward task. How about another case: You are a manufacturer, and one of your customers places an order on behalf of his customer, asking you to use his customer's customer number as identification: How long will it take before another of your customers does the same, and by sheer coincidence, they supply you with the same id (let us say cust_no 1000), but they are totally different companies? You will have to organize your system in such a way that you can uniquely identify all information going into the system, regardless of external systems, and regardless of external changes. That's the bottom line: You have to make the primary key foolproof. Return to Primary and Foreign Keys	Exclusive interviews with: Steven Feuerstein, PLSQL expert Donald Burleson, Top IT consultant Free eBook Subscribe to my newsletter and get my ebook on Entity Relationship Modeling Principles as a free gift: What visitors say... "I just stumbled accross your site looking for some normalization theory and I have to say it is fantastic. I have been in the database field for 10+ years and I have never before come across such a useful site. Thank you for taking the time to put this site together." Mike, USA Read more Testimonials
Theory & Practice DB Normalization Analysis Phase Database Keys DB Glossary Appl.Architecture Oracle DBA MySQL DBA SQL Server DBA Install Oracle Install SQL Server Proj.Management Oracle Constraint Programming Tips Database Normalization eBook:
Copyright © www.databasedesign-resource.com / All rights reserved. All information contained on this website is for informational purposes only. Disclaimer: www.databasedesign-resource.com does not warrant any company, product, service or any content contained herein. Return to top Copyright acknowledgement note: The name Oracle is a trademark of Oracle Corporation. The names MS Access/MS SQL Server are trademarks of Microsoft Corporation. Any other names used on this website may be trademarks of their respective owners, which I fully respect.