Programming skills for Java profile job: SQL

Showing posts with label SQL. Show all posts

What is normalization ?

Defination : Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. There are several benefits for using Normalization in Database.
Value addition :
a. Eliminate data redundancy
b. Improve performance
c. Query optimization
d. Faster update due to less number of columns in one table
e. Index improvement

There are four types of normalization:
1. First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Remove repetative groups
Create Primary Key

2. Second Normal Form (2NF)Second normal form (2NF) further addresses the concept of removing duplicative data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.
Remove columns which create duplicate data in a table and related a new table with Primary Key – Foreign Key relationship

3. Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

4. Fourth Normal Form (4NF): The 4NF also known as BCNF NF
Finally, fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.

If PK is composed of multiple columns then all non-key attributes should be derived from FULL PK only. If some non-key attribute can be derived from partial PK then remove it

Chapter 3. SQL Essentials - D

Transactions

A transaction is the logical unit of work when performing database operations. A transaction is created by grouping a sequence of one or more DML statements into a single, all-or-nothing proposition with the following characteristics.

Atomicity— All statements must complete successfully or the entire set is aborted.
Consistency— The statements' net effect is to leave the database in a consistent state.
Isolation— Intermediate statements should not be visible to other transactions.
Durability— When the transaction is complete, the changes are made permanent.

These four characteristics of a transaction are often abbreviated ACID. To ensure the consistency of the data, we need to identify the boundaries of a transaction and the SQL statements it comprises. Oracle then does the rest of the work necessary for ensuring that the transaction passes the ACID test.

The environment in which we are executing SQL, whether it's SQL*Plus, SQLJ, or JDBC, normally allows us to execute our statements in one of two modes, manual commit or auto-commit. In auto-commit mode, changes made to the database with DML statements are immediately made permanent in the database and are immediately visible to other transactions. In other words, transactions can include only a single SQL DML statement. In this case, Oracle ensures that the transaction meets the ACID requirements.

In manual commit mode, Oracle provides isolation and durability. Changes made to the database with DML are not visible to other transactions and are made permanent only once the transaction has been successfully completed. We take responsibility for atomicity and consistency. We determine what set of DML statements we want to group as an atomic set, and we perform any intermediate validation or error checking that might be necessary.

We group DML statements by marking the boundaries of the transaction—the beginning and the end. Oracle has no statement to mark the start of a transaction explicitly; a transaction begins either at the start of a session (or connection) or after the previous transaction has ended by being committed or cancelled. There are two statements used to mark the end of a transaction:

COMMIT— Save the results of the transactions permanently in the database.
ROLLBACK— Abort the transaction and return the database to its state at the beginning of the transaction.

In addition, DDL statements, such as CREATE TABLE, ALTER TABLE, and DROP TABLE, have an implied COMMIT.

A Transaction: All or Nothing

SQL*Plus, like other interfaces to Oracle, has an auto-commit feature. When this feature is enabled, every SQL statement is automatically committed as it is entered. By default, auto-commit is off in SQL*Plus, so we can manually control a transaction by entering a sequence of SQL statements, then decide at the end whether to call COMMIT or ROLLBACK.

In other interfaces, we need to be aware of the default and change it if necessary to support transaction processing. In SQLJ, auto-commit is off by default. In JDBC, it is on by default, meaning that each SQL statement we send to the database is committed automatically; this is less efficient and less flexible than auto-commit.

Suppose that we are adding a CD to our CD collection by typing SQL statements directly into SQL*Plus. We need to update two tables—the CD_COLLECTION table, which contains information about the CD as a whole, and the SONGS table, which contains information for all the songs. We don't want to have incomplete information available in the database, so we'll consider the SQL statement that inserts the CD information and the SQL statements that insert each of the songs into the database as a single transaction.

Assuming that the session just started or that we have just called either COMMIT or ROLLBACK for the previous transaction, we are ready to begin a new transaction:

INSERT INTO CD_COLLECTION
(CD_ID, ARTIST, ALBUM_TITLE, RELEASE_DATE, COUNTRY, LABEL)
VALUES
(100, 'PJ Harvey','Rid of Me','1-JAN-1993','UK','Island');

INSERT INTO SONGS
(SONG_ID, CD_ID, SONG_TITLE, COMPOSER, TRACK)
(1,100, 'Rid of Me','Harvey',1);

INSERT INTO SONGS
(SONG_ID, CD_ID, SONG_TITLE, COMPOSER, TRACK)
VALUES (2, 100, 'Missed','Harvey', 2);

INSERT INTO SONGS
(SONG_ID, CD_ID, SONG_TITLE, COMPOSER, TRACK)
VALUES (NULL, 100, 'Missed','Harvey', 3);

Suppose that, at this point, we realize it is far too tedious to enter each song manually and decide we'll write a script to do this, instead. We can undo all that we've done so far by typing ROLLBACK. All of the records that we entered up until that point in this session will be removed from the database.

If, on the other hand, we decide we'll just pick up from here later, either with a script or manually, we would commit the transaction so far by entering:

COMMIT;

Up until the time we enter the COMMIT statement, the records we enter into SQL*Plus are not visible in other users' sessions. We can stop partway through, for example, and query the database, and we will see any records we have entered into the CD_COLLECTION table or the SONGS table. But nobody else can see the new records. You can verify this by opening another SQL*Plus session and querying the databases in the second session. You will find that until you type COMMIT in the first, you will not see the updates in the second.

Partial Rollbacks

In addition to allowing us to commit or roll back a group of SQL statements as a whole, Oracle also allows us to mark intermediate points to allow a partial rollback. The commands to do this are:

SAVEPOINT savepoint_name— Names a point to which we can selectively roll back a transaction.
ROLLBACK TO savepoint_name— Rolls back a transaction to previously named SAVEPOINT.

This is not as generally useful as the basic COMMIT/ROLLBACK arrangement, but it allows creating a more sophisticated scheme for recovering from a partial failure while processing a transaction.

For example, suppose that we have an application with a user interface that allows a user to enter the information for a CD. Suppose we will allow the users to enter just the general CD information or the general information and all the songs. Now suppose that a user tries to enter information and songs, but an error occurs while inserting the songs. We can prompt the user with three choices:

Save just the general CD information.

Try to pick up where the error occurred.

Abandon the entire transaction.

We won't present the application code here to support this logic, but here is a sequence of SQL statements that could cause an error like this—notice NULL in the third INSERT into the SONGS table:

INSERT INTO CD_COLLECTION
(CD_ID, ARTIST, ALBUM_TITLE, RELEASE_DATE, COUNTRY, LABEL)
VALUES (100, 'PJ Harvey','Rid of Me','1-JAN-1993','UK','Island');
SAVEPOINT CD_INFO;

INSERT INTO SONGS
(SONG_ID, CD_ID, SONG_TITLE, COMPOSER, TRACK)
(1,100, 'Rid of Me','Harvey',1);
SAVEPOINT SONG_1;

INSERT INTO SONGS
(SONG_ID, CD_ID, SONG_TITLE, COMPOSER, TRACK)
VALUES (2, 100, 'Missed','Harvey', 2);
SAVEPOINT SONG_2;

INSERT INTO SONGS
(SONG_ID, CD_ID, SONG_TITLE, COMPOSER, TRACK)
VALUES (NULL, 100, 'Missed','Harvey', 3);
SAVEPOINT SONG_3;

Because the third insert is missing its primary key, this causes an error. At this point, if the user chose option 1, to save just the CD information, we would perform a partial rollback and commit:

/* Save just CD information */
ROLLBACK TO CD_INFO;
COMMIT;

If the user chose instead to pick up from where the error occurred, we could roll back to SONG_2, the last successful insert, and let the user start again from there:

/* Start again just before first error */
ROLLBACK TO SONG_2;
/* Continue inserting more songs ... */

If the user chose to abandon the entire transaction, we would use an unqualified ROLLBACK, which returns the database to its state prior to starting the transaction:

/* Abandon transaction entirely */
ROLLBACK;

Partial rollbacks should be approached cautiously. It is easy to create complicated scenarios that can be hard to manage and which can leave the application in an invalid or undetermined state from which it is impossible to recover.

Chapter 3. SQL Essentials - C

SQL Functions

Oracle SQL has three main types of functions: single-row functions, user-defined functions, and aggregate functions. Single-row functions are usually referred to simply as functions and are the type we will be describing here. User-defined functions are functions that we write ourselves, using either Java or PL/SQL, and can be used like regular SQL functions; "PL/SQL." Aggregate functions are a special kind of function that combines multiple values returned by a query that returns multiple rows of data—to calculate a total or an average, for example.

Single-row functions are similar to Java methods: They take zero or more parameters and return a single value. (Unlike Java, however, they must return a value; there is no equivalent to the void return type.) SQL functions can be used anywhere that a literal constant or a select list item can be used, and its parameters can be either constants or select list items.

We can use functions in a SELECT statement to modify a select list item. The following example, using the UPPER() function, will convert the names of fruits to uppercase:

SQL> SELECT UPPER(FRUIT) FROM FRUITS;

UPPER(FRUI
----------
APPLE
APPLE
MANGO
MANGOSTEEN
DURIAN
ORANGE

We can also use functions in a SELECT statement's WHERE clause. The following example selects fruits that have names that are six letters long:

SQL> SELECT FRUIT FROM FRUITS
  2  WHERE LENGTH(FRUIT)=6;

FRUIT
----------
Durian
Orange

Note that when we call the function for a value in the select list, the function is called once per each returned row. (If the table is large, we normally restrict the size of the result set to a manageable size by using a WHERE clause.) But if we use a function in the WHERE clause, it will be called for every row in the table; unless we have created a function-based index, this can cause performance problems for large tables.

There are four main categories of functions:

Numeric
Character
Date
Miscellaneous

Functions can be categorized either on the basis of the first and principal parameter or on the basis of the return value. The function LENGTH(), for example, would be a character function based on the parameter it takes, but a numeric function based on its return value. Here, they are categorized according to their first parameter. A few functions do not take any parameters and are categorized as miscellaneous.

Because Oracle has a very large number of functions, only a few common ones will be briefly covered. Refer to the Oracle SQL Reference for a complete list.

Numeric Functions

All numeric functions take one or more numeric values and return a numeric value. These typically perform mathematical operations, such as rounding, calculating the square root, raising to a power, etc. Here are a few commonly used functions:

ABS(n)— Returns the absolute value of n.

Example: ABS(-123)

Returns: 123
MOD(m, n)— Returns m modulo n, that is, the remainder that results from dividing m integrally by n.

Example: MOD(15, 4)

Returns: 3
POWER(m, n)— Returns m raised to the power n. Both m and n can be floating point numbers, unless m is negative, in which case, n must be an integer.

Example: POWER(2, 8)

Returns: 256
ROUND(m [, n])— Rounds m to the nearest integer, unless n is specified, in which case, m is rounded to the number of decimal places corresponding to n. If n is positive, it refers to places to the right of the decimal point; if n is negative it refers to places to the left of the decimal point.

Example: ROUND(2.718281828459)

Returns: 3

Example: ROUND(2.718281828459, 5)

Returns: 2.71828

Example: ROUND(186282.397, -3)

Returns: 186000
SQRT(n)— Returns the square root of n.

Example: SQRT(256)

Returns: 16

Character Functions

Character functions take a string as the first parameter and sometimes additional character or numeric parameters. Most return a string, but a few, such as LENGTH(), return a numeric value. Here are some commonly used character functions:

ASCII(c)— Returns decimal value of the character c in the database character set. Note that this function is misnamed, because the database character set is unlikely to actually be ASCII.

Example: ASCII('Æ')

Returns: 146

(Assuming the database character set is WE8MSWIN1252)
CHR(n)— Returns the character that the decimal value n represents in the current database character set.

Example: CHR(65)

Returns: A
CONCAT(string1, string2)— Returns a string that is the concatenation of string1 and string2. This is equivalent to the (||) concatenation operator.

Example: CONCAT('top', 'hat')

Returns: tophat
LOWER(string)— Returns string with all letters in lowercase.

Example: LOWER('Your VOICE!')

Returns: your voice!
LTRIM(string [, chars])— Returns string with all blanks preceding the first nonblank character removed. If a string chars is specified, any characters appearing in chars are removed up to the first character that does not appear in chars.

Example: LTRIM(' ...uh, hello?')

Returns: ...uh, hello?

Example: LTRIM(' ...uh, hello?', ' .')

Returns: uh, hello?
SUBSTR(string, start [, length])— Returns the portion of string beginning with the character at position start. (The first character in the string is 1.) If length is specified, only the number of characters corresponding to length are returned; otherwise, the remainder of the string is returned.

Example: SUBSTR('Supercalifragilisticexpialidocious', 21)

Returns: expialidocious

Example: SUBSTR('Supercalifragilisticexpialidocious', 15, 4)

Returns: list
UPPER(string)— Returns string with all letters in lowercase.

Example: UPPER('crust')

Returns: CRUST

Date Functions

ADD_MONTHS(date, n)— Returns a date n months later than date. If the day of the month in date is greater than the last day of the resulting month, the last day of the resulting month is returned.

Example: ADD_MONTHS('7-JUL-2000', 1)

Returns: 07-AUG-00

Example: ADD_MONTHS('31-DEC-2003', 2)

Returns: 28-FEB-04
CURRENT_DATE— Returns current date and time according to the current database session's time zone. Takes no arguments and, consequently, does not use parentheses. See TO_CHAR() below for formatting information.

Example: CURRENT_DATE

Returns: 14-SEP-02
LAST_DAY(date)— Returns the date of the last day of the month that contains date.

Example: LAST_DAY('19-DEC-2002')

Returns: 31-DEC-02
MONTHS_BETWEEN(date1, date2)— Returns the number of months between date1 and date2. This includes a fractional part unless they are the same day of the month or both are the last day of the month. The value is positive if date1 is later than date2, negative if date1 is earlier than date2.

Example: MONTHS_BETWEEN('21-MAR-2003', '15-JAN-2003')

Returns: 2.19354839

Example: MONTHS_BETWEEN('31-JAN-2003', '1-FEB-2003')

Returns: -.03225806
NEXT_DAY(date, dayOfWeek)— Returns the date of the next weekday corresponding to dayOfWeek (for example, Tuesday) following date. The weekday is in the current session language and can be abbreviated.

Example: NEXT_DAY('1-MAR-2003','FRI')

Returns: 07-MAR-03
SYSDATE— Returns current date and time according to the database. Takes no arguments and, consequently, does not use parentheses.

Example: SYSDATE

Returns: 14-SEP-02
TO_CHAR(date [, format])— Returns a string representing date. If format is not provided, the default date format is used. The optional format parameter is a string that provides a template for the date formatting. Table 3-9 lists the most common format elements that it can include:

Table 3-9. Most Common Format Elements
am	Meridian indicator (lowercase)
AM	Meridian indicator (uppercase)
Day	Name (initial uppercase)
DAY	Name of day of week (uppercase)
DD	Day of month (1–31)
DY	Abbreviated day of week
HH	Hour of day (12-hour clock)
HH12	Hour of day (12-hour clock)
HH24	Hour of day (24-hour clock)
MI	Minutes (0–59)
MM	Month (01–12)
MON	Abbreviated name of month (uppercase)
Mon	Abbreviated name of month (initial uppercase)
MONTH	Name of month (uppercase)
Month	Name of month (initial uppercase)
PM	Same as AM
pm	Same as am
SS	Seconds (0–59)
YYYY	Year (four digits)
YY	Year (two digits)

In addition to these elements, the format string can include punctuation and double-quoted text.

Example: TO_CHAR(TO_DATE('19-JAN-2003', 'Month DD,YYYY HH:MI:SS am')

Returns: January 19,2003 12:00:00 am

Example: TO_CHAR(CURRENT_DATE, '"Today is" Day')

Returns: Today is Saturday

Chapter 3. SQL Essentials

SQL is a special-purpose language, sometimes described as a relational language, which can be used with a database for a number of different purposes. SQL can be considered a standard language, with some qualification. There are two significant standards, SQL-92 and SQL-99. Most database vendors, including Oracle, are largely compliant with SQL-92. SQL-99 greatly extends the scope of SQL, introducing new features such as persistent modules and multidimensional analytical capabilities. Most database vendors are largely compliant with the core features of SQL-99, but not the new features. This core compliance allows them to claim that they are compliant with SQL-99, even though this is virtually the same as being compliant with SQL-92. To confuse things further, most vendors, including Oracle, also have extensions to standard SQL, which are sometimes unavoidable. SQL commands can generally be grouped into a number of general categories, according to their purpose:

Data Definition Language (DDL), used for defining tables and the relationships between the tables and the data in them.
Data Manipulation Language (DML), used for adding, retrieving, modifying, and deleting data.
Transaction control commands, used to group sets of DML statements into a single unit of work; used to ensure data integrity.
Database administration commands.
SQL/PSM, used for writing procedural programs using persistent stored modules.

The first two categories of commands, DDL and DML, are the core SQL commands used for defining data models and for storing and querying data. These are the commands we will be concentrating on in this chapter. These commands are generally implemented in a standard way by vendors. We will generally adhere to the features defined in the SQL-92 standard for DML and DDL commands.

Transaction control commands are used to isolate groups of commands so that they can be treated as a unit. SQL-92 defines very basic support for transactions; there is no command to mark the start of a transaction, for example. We will largely use this basic model, but we will also consider SQL-99 features as implemented in Oracle.

Database administration commands are largely vendor-dependent. In Oracle, they are used to maintain the physical structure of the database, create users, grant rights to users, create database policies of various sorts, etc. Database administration is a large topic in its own right and well beyond the scope of this book. We'll cover only a few essential commands incidentally, when we set up a sample database.

The last category is an optional part of the SQL standard to allow procedural programming. It defines persistent stored modules that provide control flow statements and bind variables, similar to Oracle's PL/SQL. This standard was accepted in 1996, but since then, vendors have begun to converge on Java stored procedures.

Chapter 3. SQL Essentials - A

Data Definition Language

SQL has three main commands for data definition: CREATE, for creating tables and databases; ALTER, for changing tables and databases; and DROP, for deleting tables and databases. Because creating, altering, and deleting databases are database administration tasks, we will consider only tables in this section.

Creating Tables

We used the CREATE TABLE command in the previous chapter when we started building our sample database. The most basic form of this command is:

CREATE TABLE table_name (
  column_1_name   column_1_type,
  column_2_name   column_2_type,
  column_n_name   column_n_type
);

which can have as few as one or as many as 1,000 columns. Although it is perfectly legal to omit a primary key in Oracle, it's a good practice to include one, as follows:

CREATE TABLE table_name (
  column_1_name   column_1_type,
  column_2_name   column_2_type,
  column_n_name   column_n_type,
  PRIMARY KEY (column_list)
);

This example shows only a single column as the key. This is commonly the case, especially when we use an identification number of some kind to identify each record uniquely. In some cases, the items in the tables have a unique number already associated with them that make a good key—for individuals in the United States, a Social Security number is sometimes used in this way. Sometimes, it is necessary or just convenient to create a number for this purpose. Oracle, in particular, provides a feature—sequences—that can be used to generate numbers for keys.

In some cases, a combination of an item's properties serve to identify it uniquely. In the example we started discussing in the previous chapter, we mentioned that we could use the combination of an album title and an artist as the key.

We created our CD_COLLECTION table in the previous chapter with the following SQL command:

CREATE TABLE CD_COLLECTION (
  ALBUM_TITLE     VARCHAR2(100),
  ARTIST          VARCHAR2(100),
  COUNTRY         VARCHAR2(25),
  RELEASE_DATE    DATE,
  LABEL           VARCHAR2(25),
  PRIMARY KEY (ALBUM_TITLE, ARTIST)
);

The primary key is treated as a separate object in the database and has a name associated with it. If we don't explicitly name it, Oracle will give it a name automatically—something fairly cryptic, such as SYS_C001427.

It's a good idea to name our objects explicitly whenever possible. It makes things easier to understand and simplifies maintaining our database. In this case, the way to do it is to name the PRIMARY KEY constraint when we create it. This is a better way to create a table:

CREATE TABLE table_name (
  column_1_name   column_1_type,
  column_2_name   column_2_type,
  column_n_name   column_n_type,
  CONSTRAINT name PRIMARY KEY (column_list)
);

If we were to do it all over again, we could use a command such as this to create our CD table:

CREATE TABLE CD_COLLECTION (
  ALBUM_TITLE     VARCHAR2(100),
  ARTIST          VARCHAR2(100),
  COUNTRY         VARCHAR2(25),
  RELEASE_DATE    DATE,
  LABEL           VARCHAR2(25),
  CONSTRAINT CD_COLLECTION_PK PRIMARY KEY (ALBUM_TITLE, ARTIST)
);

Another option that we should consider applies to the columns in a table, NOT NULL. We can add this to our column definition after the datatype if we wish to require that a value be provided for a particular column. (This is automatically a requirement for key columns.) We may, for example, want to make the COUNTRY a mandatory field in the CD_COLLECTION table. We can do that by changing the CREATE statement like this:

CREATE TABLE CD_COLLECTION (
  ALBUM_TITLE       VARCHAR2(100),
  ARTIST            VARCHAR2(100),
  COUNTRY           VARCHAR2(25) NOT NULL,
  RELEASE_DATE      DATE,
  LABEL             VARCHAR2(25),
  CONSTRAINT CD_COLLECTION_PK PRIMARY KEY (ALBUM_TITLE, ARTIST)
);

Although COUNTRY is the only column we explicitly declare NOT NULL, it is not the only column for which we must provide a value. Because the ALBUM_TITLE and ARTIST columns together are the primary key, they are implicitly NOT NULL also.

Finally, let's consider another type of constraint, a FOREIGN KEY. Before we knew we should name our constraints, we created our SONGS table with this statement:

CREATE TABLE SONGS (
  SONG_TITLE   VARCHAR2(100),
  COMPOSER     VARCHAR2(100),
  LENGTH       NUMBER,
  TRACK        NUMBER,
  ARTIST       VARCHAR2(100),
ALBUM_TITLE    VARCHAR2(100),
  PRIMARY KEY (SONG_TITLE),
FOREIGN KEY (ARTIST, ALBUM_TITLE)
               REFERENCES CD_COLLECTION(ARTIST, ALBUM_TITLE)
);

The FOREIGN KEY clause indicates that each record in this table is a child of a record in the CD_COLLECTION table. This means that we cannot enter a song in the SONGS table without first entering an album in the CD_COLLECTION table. This also means that we cannot delete any albums from the CD_COLLECTION table without first deleting the corresponding child records in the SONGS table. Were we to attempt to delete a record from CD_COLLECTION that had child records in SONGS, we would get an error like this:

SQL> delete from cd_collection;
delete from cd_collection
*
ERROR at line 1:
ORA-02292: integrity constraint (MYDB.SYS_C002745) violated - child record found

We can add an option to the foreign key constraint that will automatically delete child records when the parent record is deleted, ON DELETE CASCADE.

This is the form our statement for creating the SONGS table takes after we name our constraints and add the ON DELETE CASCADE OPTION to it:

CREATE TABLE SONGS (
  SONG_TITLE   VARCHAR2(100),
  COMPOSER     VARCHAR2(100),
  LENGTH       NUMBER,
  TRACK        NUMBER,
  ARTIST       VARCHAR2(100),
  ALBUM_TITLE  VARCHAR2(100),
  CONSTRAINT SONGS_PK PRIMARY KEY (SONG_TITLE),
  CONSTRAINT SONGS_FK_CD_COLLECTION
FOREIGN KEY (ARTIST, ALBUM_TITLE)
        REFERENCES CD_COLLECTION(ARTIST, ALBUM_TITLE)
        ON DELETE CASCADE
);

Keys and Sequences

In the preceding examples we used an existing column or group of columns that uniquely identify a row as the primary key in a table. This is called a natural key. Sometimes, however, a natural key does not exist or it is not practical to use a natural key. If we have a list of names, for example, we may have many John Smiths. As mentioned earlier, one solution that is common in the United States is to use a person's Social Security number to identify unique individuals, but this is a practice that is often—and justifiably—criticized.

Using long and complex keys, such as combinations of columns, makes it more difficult to use a table. This is a problem with using the combination of ALBUM and TITLE as the primary key for the CD_COLLECTION table.

The way to avoid these problems with natural keys is to use an artificial key. This is an arbitrary unique number that is assigned to each row in the table. This is such a common requirement for relational databases that it is surprising there is no standard way to generate artificial keys for a table. Every RDBMS has its own special way of doing this.

In Oracle, the way to generate artificial keys is by using a SEQUENCE. This is a database object that generates numbers guaranteed to be unique, even for multiple clients in a distributed environment.

Let's redefine our CD_COLLECTION table to use an artificial primary key, CD_ID. But first, we need to drop the SONGS table because it contains a foreign key that refers to this table, then we can drop the CD_COLLECTION table.

DROP TABLE SONGS;
DROP TABLE CD_COLLECTION;
CREATE TABLE CD_COLLECTION (
    CD_ID             NUMBER,
    ALBUM_TITLE       VARCHAR2(100),
    ARTIST            VARCHAR2(100),
    COUNTRY           VARCHAR2(25) NOT NULL,
    RELEASE_DATE      DATE,
    LABEL             VARCHAR2(25),
    CONSTRAINT CD_COLLECTION_PK PRIMARY KEY
(CD_ID)
);

Next, we'll create a SEQUENCE to generate CD_IDs:

CREATE SEQUENCE CD_ID_SEQUENCE;

Now, when we want to add an album, in addition to the other information, we'll need to include a CD_ID, which we can obtain from CD_ID_SEQUENCE, using the NEXTVAL function:

INSERT INTO CD_COLLECTION (CD_ID, ...)
  VALUES(CD_ID_SEQUENCE.NEXTVAL, ...);

(We'll learn more about the INSERT statement later in this chapter; the important thing to note is that calling the CD_ID_SEQUENCE.NEXTVAL function returns a unique integer.)

In the same way, we can recreate the SONGS table to use a sequence for the primary key. Notice that we need to change the foreign key (and the corresponding columns) to reflect the new primary key in the CD_COLLECTION table, from ARTIST and ALBUM_TITLE to CD_ID:

CREATE TABLE SONGS (
    SONG_ID      NUMBER,
    CD_ID        NUMBER,
    SONG_TITLE   VARCHAR2(100),
    COMPOSER     VARCHAR2(100),
    LENGTH       NUMBER,
    TRACK        NUMBER,
    CONSTRAINT SONGS_PK PRIMARY KEY (SONG_ID),
    CONSTRAINT SONGS_FK_CD_COLLECTION
FOREIGN KEY (CD_ID)
       REFERENCES CD_COLLECTION(CD_ID)
       ON DELETE CASCADE
);
CREATE SEQUENCE SONG_ID_SEQUENCE;

We don't need any of the details of the numbers that a sequence generates, other than their uniqueness. In fact, because they are entirely arbitrary, our applications should use them only internally, should not depend on their having any particular properties, and should not expose them to users.

By default, sequences generate consecutive numbers starting at 1, but as an optimization in a distributed environment—to minimize the number of round trips between the client application and the server—the client session may reserve a set of numbers for later use. If it doesn't use all of these numbers, they are silently discarded. This means that there will be gaps in the numbers that get used as primary keys in the database. This shouldn't matter, but if the number appears in the user interface, as a sales order number, for example, nonconsecutive numbers will be perceived as a bug, not a performance feature!

Indexes and Functional Indexes

When we insert data into a database, the data typically is entered in an unordered fashion. One of the benefits of identifying a primary key for a table is that we are indicating to the database that we will be using this key to locate records in the database. Rather than keeping the data sorted by this key, the database creates an index behind the scenes by keeping a separate table of our primary keys in sorted order, together with a pointer to the record in our table.

Creating Indexes

If we know we will be searching the table often, using a specific set of criteria, we sometimes want to create additional indexes of our own. This is especially the case where we are using an artificial key. Using a CD_ID in our COLLECTION_TABLE will make programming easier, as we will see later, but we still will find ourselves frequently searching for ARTIST and ALBUM_TITLE. We may want to create additional indexes for ARTIST and TITLE or the combination of the two, to speed up queries.

There is a cost for indexes, however; each time we add a record to a table in the database, each index for that table must be updated. Adding indexes to support every likely query can make inserting records, especially into large tables, noticeably slow. This might be acceptable if the table is primarily used for queries (particularly ad hoc queries), but more typically, a balance must be struck, and we must be selective about what columns to index.

Suppose we anticipate that we will most frequently query our CD_COLLECTION by ARTIST and only occasionally by ARTIST and ALBUM_TITLE. We might decide to create a single column index for ARTIST. We do this as follows:

CREATE INDEX CD_COLL_ARTIST_IDX ON CD_COLLECTION(ARTIST);

But now let's suppose that, as we develop our application, we realize that although querying on ARTIST is, in fact, a common query, this returns a list of the ARTIST's albums, from which the user will typically then select a specific album. In other words for every search based on ARTIST, there will be another search based on ARTIST plus TITLE. We may want to create instead an index based on the combination of ARTIST and ALBUM_TITLE:

CREATE INDEX CD_COLL_ART_ALB_IDX
ON CD_COLLECTION(ARTIST, ALBUM_TITLE);

Because ARTIST is the first column in this index, the database can use this composite index as an index for the ARTIST column too, so we don't need the single-column index we created above.

If, on the other hand, we found that we were often searching on the combination of ARTIST and ALBUM_TITLE, and never on ARTIST, we may want to reverse the order of the columns in this index. It is more efficient if the column with the most unique values appears first in an index. In this case, we can expect that each artist will have multiple albums and album titles for the most part will be unique.

Creating Function-Based Indexes

In Oracle 9i, it is also possible to create indexes based on functions. To support case-insensitive searches, for example, we may want to create a functional index. A functional index is like a regular index, except that instead of specifying a column name, we specify the function (including, of course, any applicable columns). The following will create an index on the uppercase version of ARTIST:

CREATE INDEX CD_COLL_UC_ARTIST_IDX
  CD_COLLECTION(UPPER(ARTIST));

As each record is inserted into the database, the UPPER() value of the ARTIST column will be indexed. This can greatly ameliorate the performance problems that using functions as part of a search criterion can cause.

Indexes and Performance

It's important to understand the purpose of indexes and their interaction on the code we write—this understanding often leads us to make better choices in how we design our queries. But as a practical matter, in a development environment, with small test databases, indexes make little or no difference.

Setting up indexes properly in a production environment is a complex task. Typically, indexes are assigned to their own tablespaces, which are usually on a separate disk drive than the data table. (The commands in the examples above default to using the same tablespace as the data tables.) Oracle provides a number of performance tools for verifying that queries are using indexes and other optimizations properly. If you will be deploying to such a system, you likely need to work together with the DBA (or refer to additional resources, such as Oracle's documentation) to build the appropriate indexes and make sure your application's queries are properly tuned to use these indexes.

Altering Existing Tables

Once a table is created, it's not too late to modify it. We can add, alter, or drop columns, and disable or enable the constraints, among other things. Here are some of the most commonly used formats that the ALTER command can take:

ALTER TABLE tablename ADD(column_name column_type);
ALTER TABLE tablename ADD(CONSTRAINT constraint_name
  PRIMARY KEY (column_list));
ALTER TABLE tablename MODIFY(column_name, column_type)
ALTER TABLE tablename ENABLE CONSTRAINT constraint_name;
ALTER TABLE tablename DISABLE CONSTRAINT constraint_name;
ALTER TABLE tablename DROP COLUMN column_name;
ALTER TABLE tablename DROP CONSTRAINT constraint_name [CASCADE];
ALTER TABLE tablename DROP CONSTRAINT PRIMARY KEY [CASCADE];
ALTER TABLE tablename RENAME table_name TO new_table_name;

We'll first take a look at how to change the name of the primary key. Because there are no commands to rename columns or constraints, in order to rename our primary key, we first need to drop it, then add a new one with the correct name.

To drop a constraint, we need to know its name. If we've used a reasonable naming convention, we can guess what that name is. Otherwise we can find out from the database's data dictionary—a set of views (views are a type of virtual table) that we can query to obtain information about the database. The view USER_CONSTRAINTS contains information about constraints such as primary keys. The following query displays the constraints associated with our SONGS table.

SQL> SELECT CONSTRAINT_NAME
  2  FROM USER_CONSTRAINTS
  3  WHERE TABLE_NAME='SONGS';

CONSTRAINT_NAME
------------------------------
SONGS_PK
SONGS_FK_CD_COLLECTION

Knowing the constraint's name, we can drop it:

ALTER TABLE SONGS DROP CONSTRAINT SONGS_PK;

Now we can add a new primary key:

ALTER TABLE SONGS ADD CONSTRAINT SONGS_NEW_PK
  PRIMARY KEY(SONG_ID);

Querying again for the constraints for this table, we find:

SQL> SELECT CONSTRAINT_NAME
  2  FROM USER_CONSTRAINTS
  3  WHERE TABLE_NAME='SONGS';

CONSTRAINT_NAME
------------------------------
SONGS_NEW_PK
SONGS_FK_CD_COLLECTION

Probably the most common changes we will make to tables are to add columns or to change the type or size of an existing column. Let's take a look at the SONGS table, using the SQL*Plus DESCRIBE table command.

SQL> DESC SONGS
Name                              Null?    Type
--------------------------------- -------- -----------
SONG_ID                           NOT NULL NUMBER
CD_ID                                      NUMBER
SONG_TITLE                                 VARCHAR2(100)
COMPOSER                                   VARCHAR2(100)
LENGTH                                     NUMBER
TRACK                                      NUMBER

(Notice that we can abbreviate DESCRIBE as DESC and that we don't need to add a semicolon at the end because it is an SQL*Plus command, not an SQL command.)

First we'll add a column:

ALTER TABLE SONGS ADD(PUBLISHER VARCHAR2(50));

We can use DESCRIBE again to see that it has been added:

SQL> DESC SONGS
Name                               Null?    Type
---------------------------------- -------- -------------
SONG_ID                            NOT NULL NUMBER
CD_ID                                       NUMBER
SONG_TITLE                                  VARCHAR2(100)
COMPOSER                                    VARCHAR2(100)
LENGTH                                      NUMBER
TRACK                                       NUMBER
PUBLISHER                                   VARCHAR2(50)

Next, we'll change an existing column. But before we do that, we need to consider that existing data may interfere with our ability to do that. If we try to shorten the length of the SONG_TITLE field, but we have data that would not fit in the shortened column, the following error would occur:

SQL>    ALTER TABLE SONGS MODIFY(SONG_TITLE VARCHAR2(20));
        ALTER TABLE SONGS MODIFY(SONG_TITLE VARCHAR2(20))
                                 *
ERROR at line 1:
ORA-01441: cannot decrease column length because some value is too big

As we'll see later, we can use an SQL UPDATE statement to find and change the value that is too long, then run this command again.

Another alternative is to drop the table and start fresh. In fact, this isn't as drastic an alternative as it may seem. It's not generally a good idea to create and alter tables ad hoc at an SQL*Plus prompt. To properly plan, document, and manage a database schema, it's better to write scripts to perform DDL commands. If our database is being used for development, we would modify our scripts for creating tables, rather than adding additional scripts to alter tables. In a development environment, we often create and recreate our tables, as we'll see, so scripts for creating tables often begin with a command to drop the table first. If we were in a production environment, we would need to consider preserving user data; we don't need to worry about that, because we also have scripts for creating test data—although if we were to decide to change the column length of the SONG_TITLE column, we'd have to remember to change the data to fit within the length restriction.

Dropping Tables

The command for dropping a table is easy. The format is:

DROP TABLE table_name;

But if we try this with our CD_COLLECTION table, we find that it fails, with a complaint from Oracle about the primary key in this table being referenced as a foreign key by another table:

SQL> drop table cd_collection;
drop table cd_collection
           *
ERROR at line 1:
ORA-02449: unique/primary keys in table referenced by foreign keys

There are two ways to solve this problem, depending on our intention. If we are going to delete all the tables in our database—in order to rebuild it, perhaps—we can do this by deleting the tables in an order that doesn't violate these dependencies. In this case, it means dropping the SONGS table first, which removes the reference to the CD_COLLECTION table's primary key, allowing us to drop CD_COLLECTION next.

If we have a large database, however, determining the right order to drop tables can be nontrivial. (It may even be impossible, given that it is possible to create circular dependencies.) We can find out which constraints reference other constraints by looking at the USER_CONSTRAINTS table using the following query:

SQL> SELECT CONSTRAINT_NAME, R_CONSTRAINT_NAME, TABLE_NAME FROM USER_CONSTRAINTS;

CONSTRAINT_NAME         R_CONSTRAINT_NAME       TABLE_NAME
----------------------  ----------------------  ----------
CD_COLLECTION_PK                                CD_COLLECTION
SONGS_NEW_PK                                    SONGS
SONGS_FK_CD_COLLECTION  CD_COLLECTION_PK        SONGS

3 rows selected.

Notice that the constraint SONGS_FK_CD_COLLECTION belonging to the SONGS table references the constraint CD_COLLECTION_PK, as listed under R_CONSTRAINT_NAME. If it weren't already clear from the name, we could also see that the CD_COLLECTION_PK constraint belongs to the CD_COLLECTION table. If there were more tables with additional dependencies, we could continue going through this listing to determine how to proceed. As it is, we can drop the tables with two commands:

DROP TABLE SONGS;
DROP TABLE CD_COLLECTION;

Another way of deleting the CD_COLLECTION table is to remove the dependency between the two tables. We can do that by dropping the constraint in the SONGS table. If there were other tables that referenced CD_COLLECTION, we would need to drop the constraints in those, as well. We can do that in one step, however, by dropping the referenced constraint in CD_COLLECTION with option CASCADE, like this:

ALTER TABLE CD_COLLECTION
  DROP CONSTRAINT CD_COLLECTION_PK CASCADE;

In one step, this drops the CD_COLLECTION primary key constraint, CD_COLLECTION_PK, as well as any other constraints that reference it. Now we can drop the CD_COLLECTION table without a complaint from Oracle:

DROP TABLE CD_COLLECTION;

Chapter 3. SQL Essentials - B

Data Manipulation Language

Once we've created tables using DDL, we use DML to add, modify, delete, and query the data. In the process of creating and exploring our database, we've had a chance to become briefly acquainted with a few of these commands already.

This is the set of SQL DML statements that we will learn to use in this section:

Adding, changing, and deleting data
`INSERT` `UPDATE` `DELETE`
Querying data
`SELECT`
Transaction control statements
`COMMIT` `ROLLBACK` `SAVEPOINT` `SET TRANSACTION`

Adding Records

The command for adding new records to a database table is INSERT. There are two general formats for the INSERT command:

INSERT INTO tablename VALUES (value1, value2, ...);
INSERT INTO tablename (column_name1, column_name2, ...)
  VALUES (value1, value2, ...);

In the first format we do not list the columns that we will be supplying values for. The values must exactly match the type and default order of all the columns in the table. We can find this out with the DESC command.

This is the result for the CD_COLLECTION table:

SQL> DESC CD_COLLECTION
Name                                Null?     Type
----------------------------------  --------  -----------
CD_ID                               NOT NULL  NUMBER
ALBUM_TITLE                                   VARCHAR2(100)
ARTIST                                        VARCHAR2(100)
COUNTRY                                       VARCHAR2(25)
RELEASE_DATE                                  DATE
LABEL                                         VARCHAR2(25)

To add a row, we need to provide values for all six columns, in order:

INSERT INTO CD_COLLECTION VALUES
   (CD_ID_SEQUENCE.NEXTVAL, 'Horses', 'Patti Smith', 'USA',    '1-JAN-1975', 'Arista');

If we wish to omit a value (and it is valid to do so), we can use the keyword NULL, as in this example:

INSERT INTO CD_COLLECTION VALUES
   (CD_ID_SEQUENCE.NEXTVAL, 'Doolittle', 'Pixies', 'USA',    '1-JAN-1989', NULL);

You may be wondering whether it is true that all the albums we've entered so far really have a release date of 1 January. Actually, in most cases, the only available information is the year. Oracle, however, is very particular about the format of dates, so we need to provide a day and a month, as well. (In addition, hours and minutes are implied, so this really means zero hours, zero minutes, zero seconds after midnight on 1 Jan 1989.) We are essentially setting a convention that we will use only the year part of a date. Another way of representing just years would be to use a character field.

One problem with using an INSERT statement that doesn't specify a column list is that the table might change, and the statement will no longer be valid. If a column were added to the CD_COLLECTION table, for example, the previous INSERT statements would fail with the complaint from Oracle of not enough values. It is usually better to write INSERT statements with an explicit column list:

INSERT INTO CD_COLLECTION (CD_ID, ARTIST, ALBUM_TITLE)
   VALUES(CD_ID_SEQUENCE.NEXTVAL, 'The Breeders', 'Last Splash');

Notice that this INSERT statement has only the CD_ID, ARTIST, and ALBUM_TITLE columns and the order in which the values appear corresponds to the statement's column list, not the table as listed by the DESCRIBE command.

Updating Records

After a record has been entered, we may wish to change it or add more information to it. The SQL UPDATE statement is used to add information to existing records. The basic format is:

UPDATE tablename SET column_name1=value1, column_name2=value2, ...
[WHERE condition]

The WHERE clause is optional and can be used to select which record or records are to be updated. If we omit it, all records will be updated. WHERE clauses can be very complicated and powerful, as we will see when we examine queries, but for now, we can get by with knowing just the basics. WHERE is followed by a logical expression; every record for which this expression is true is updated. We'll see a complete list of logical operators later, but for now, we need to know only that the comparison operator for equality is the equal sign (=). (It is distinguished from the assignment operator, which is also an equal sign, by context.) We also need to know that we can combine logical expressions with the Boolean operators AND and OR.

In the last two examples of the INSERT statement, we deliberately omitted some information. We'll correct that now. To identify a record in a table uniquely, we need to specify either its primary key or a set of columns that we know will identify the record uniquely. In the CD_COLLECTION table, we can use the ALBUM_TITLE and the ARTIST columns. (You may remember that these two columns previously served as a natural primary key.) The WHERE clause to identify the Pixies album, "Doolittle," is:

WHERE ARTIST='The Pixies' AND ALBUM_TITLE='Doolittle'

We can use this clause in an UPDATE statement to supply the LABEL, which we previously specified as NULL:

UPDATE CD_COLLECTION SET LABEL='Elektra'
WHERE ARTIST='The Pixies' AND ALBUM_TITLE='Doolittle';

There is more information to add to the Breeders album, but the format is similar:

UPDATE CD_COLLECTION SET RELEASE_DATE='1-JAN-1993',
   COUNTRY='USA', LABEL='4AD/Elektra'
   WHERE ARTIST='The Breeders'
   AND ALBUM_TITLE='Last Splash';

Used without a WHERE clause, an UPDATE statement will change the specified field or fields for all records. The following statement will change the COUNTRY column to USA for every record in the CD_COLLECTION table:

UPDATE CD_COLLECTION SET COUNTRY='USA';

As it happens, the COUNTRY column for all the albums so far was USA anyway, so no harm has been done by this command.

Deleting Records

The DELETE statement for deleting records from the database is similar to the UPDATE statement in that it takes an optional WHERE clause for specifying which records to delete:

DELETE FROM tablename
[WHERE condition]

Like the UPDATE command, omitting the WHERE clause means the command will be performed on all records: All records will be deleted. It is frightening—but worth remembering—how easy it is to delete all records.

Let's suppose that these are the songs in the SONGS table:

SQL> SELECT SONG_TITLE FROM SONGS;

SONG_TITLE
------------------------------------------------
Black Sheets of Rain
Crystal
Don't want to know if you are lonely
I don't know for sure

If we wanted to delete I don't know for sure we would enter the following DELETE statement (remembering to use two single quotes for the literal single quote in the title):

SQL> DELETE FROM SONGS
  2  WHERE SONG_TITLE='I don''t know for sure';

1 row deleted.
Listing the songs again:
SQL> SELECT SONG_TITLE FROM SONGS;

SONG_TITLE
-------------------------------------------------
Black Sheets of Rain
Crystal
Don't want to know if you are lonely

Now let's see what albums are in the CD_COLLECTION table and delete one of them:

SQL> SELECT ALBUM_TITLE FROM CD_COLLECTION;

ALBUM_TITLE
------------------------------------------------------
Black Sheets of Rain
Candy Apple Grey
Horses
Doolittle
Last Splash

SQL> DELETE FROM CD_COLLECTION
  2  WHERE ALBUM_TITLE='Candy Apple Grey';

1 row deleted.

When we created the SONGS table, you may have noticed that we added the ON DELETE CASCADE option to the foreign key constraint, as we suggested in the section on DDL. This means that when a parent record is deleted from the CD_COLLECTION table, the children records in the SONGS table are deleted, as well. If we now list the SONGS table, we'll see that the songs from this album have silently been removed, as well:

SQL> SELECT SONG_TITLE FROM SONGS;

SONG_TITLE
-----------------------------------------------------
Black Sheets of Rain

Querying Data

The most important statement in SQL is the one we use to query tables, the SELECT statement. At its simplest, it can be used to select data from a single table, as we've already seen in previous examples. However, because of the many ways it can be used to combine tables and conditions, it has a versatility that we can only begin to explore here.

We'll be using the following table for this section:

FRUIT       COLOR       QUANTITY  PRICE  PICKED
----------  ----------  --------  -----  ---------
Apple       Green             12     .5  12-SEP-02
Apple       Red               12     .5  15-SEP-02
Mango       Yellow            10    1.5  22-SEP-02
Mangosteen  Purple             5      2  25-SEP-02
Durian                         2     15
Orange      Orange            10      1  28-AUG-02

These are the commands needed to create this table:



CREATE TABLE FRUITS (
  FRUIT VARCHAR2(12),
  COLOR VARCHAR2(12),
  QUANTITY NUMBER,
  PRICE NUMBER,
  PICKED DATE
);

INSERT INTO FRUITS VALUES('Apple', 'Green', 12, 0.50, 
'12-Sep-2002');
INSERT INTO FRUITS VALUES('Apple', 'Red', 12, 0.50, 
'15-Sep-2002');
INSERT INTO FRUITS VALUES('Orange', 'Orange', 10, 1.50, 
'28-Aug-2002');
INSERT INTO FRUITS VALUES('Durian', 'NULL', 2, 15.00, NULL);
 INSERT INTO FRUITS VALUES(NULL, NULL, NULL, NULL, NULL);

This is the basic format of the SELECT statement:

SELECT select_list
FROM table_name
[WHERE condition]

When executed, the SELECT statement returns one or more rows, called the result set. If we don't specify a WHERE clause, it simply returns one row for every row in the table.

The select list specifies the columns that are returned in the result set. The simplest form of the column list is an asterisk, which represents all the columns. This statement, therefore, returns all the columns and rows of the table:

SQL> SELECT * FROM FRUITS;

FRUIT      COLOR        QUANTITY      PRICE PICKED
---------- ---------- ---------- ---------- ---------
Apple      Green              12         .5 12-SEP-02
Apple      Red                12         .5 15-SEP-02
Mango      Yellow             10        1.5 22-SEP-02
Mangosteen Purple              5          2 25-SEP-02
Durian                         2         15
Orange     Orange             10          1 28-AUG-02

6 rows selected.

Tables sometimes contain columns that we are not interested in. We can select the columns we want displayed by specifying them in the select list:

SQL> SELECT FRUIT, PRICE FROM FRUITS;

FRUIT           PRICE
---------- ----------
Apple              .5
Apple              .5
Mango             1.5
Mangosteen          2
Durian             15
Orange              1

6 rows selected.

Select list items can include not just column names, but also expressions, including combinations of columns, SQL functions, and literal values. For example, we can calculate the total value of each type of fruit we have by multiplying the quantity by the price:

SQL> SELECT FRUIT, QUANTITY*PRICE FROM FRUITS;

FRUIT      QUANTITY*PRICE
---------- --------------
Apple                   6
Apple                   6
Mango                  15
Mangosteen             10
Durian                 30
Orange                 10

6 rows selected.

We can also change the name of a column by giving it a column alias. Oracle lets us do this by simply following the column name or expression with the alias:

SELECT FRUIT, QUANTITY*PRICE VALUE FROM FRUITS;

Standard SQL, however, uses the keyword AS to indicate the alias. Because Oracle supports this as well, we'll use that instead, in an effort to remain as standard as possible:

SQL> SELECT FRUIT, QUANTITY*PRICE AS VALUE FROM FRUITS;

FRUIT           VALUE
---------- ----------
Apple               6
Apple               6
Mango              15
Mangosteen         10
Durian             30
Orange             10

6 rows selected.

Column aliases are important, not because they provide us with nicer headings in SQL*Plus but because later, they will provide us with a way of accessing the value of SQL expressions and functions in Java, particularly when we use SQLJ.

The WHERE Clause

In the examples we have just seen, the SELECT statement returns one row of results for every row it finds in the table. One of the more powerful and commonly used abilities of SQL is to specify criteria for selecting subsets of the data in a table. The WHERE clause is used to set a condition that each row must meet in order to be included in the result set.

A condition can be a simple condition with a single comparison, such as:

FRUIT='Apple'

A condition can also be a compound condition, including multiple single conditions joined logically using AND or OR:

FRUIT='Apple' AND COLOR='Red'

Comparison Operators

The following comparison operators are available in Oracle SQL:

Table 3-1. Comparison Operators in Oracle SQL
=	Equivalent
>	Greater than
>=	Greater than or equal to
<	Less than
<=	Less than or equal to
<>, !=	Not equivalent
BETWEEN…AND	In range (closed interval)
IN	In set
IS NULL	Is NULL
IS NOT NULL	Is not NULL
LIKE	String comparison

The first set of operators should be familiar from Java. But unlike Java, where they can be used only with primitive types, in SQL, the comparison operators can also be used with any of the basic datatypes or expressions, including strings and dates.

There are two other differences between the SQL operators and Java. The first is that the equality comparison and assignment operators, both represented by a single equal sign (=), are the same in SQL. Which is meant is determined by context. The second is that the standard SQL inequality operator is represented by angle brackets (<>), though Oracle also supports the Java-style (!=).

The last set of operators may be unfamiliar. BETWEEN…AND and IN are a kind of wordy shorthand for comparisons that you can also perform using the other operators. BETWEEN…AND is used to compare a value with a range. IN is used to find whether a value is a member of a set of values.

The next two, IS NULL and IS NOT NULL, are necessary to test specifically for the presence or absence of a NULL value. They are necessary because when NULL is compared with other values, the result is neither TRUE nor FALSE, but NULL. Because NULL represents an unknown value, the result of a comparison remains unknown, as well. This is true even (or maybe especially) if both values being compared are NULL. It may seem counterintuitive, but the condition NULL=NULL is neither TRUE nor FALSE, it is NULL.

The final comparison operator, LIKE, is a special comparison operator for strings. Unlike the equivalence (=) operator, LIKE also supports wildcards.

Comparisons with Numbers

Comparisons are most frequently done with numbers, so we'll begin by very briefly looking at some examples comparing numbers—familiar territory for a Java programmer.

To find a specific value in the database, we use the equivalence operator—the equal sign. The following query finds any fruit with a price of 1:

SQL> SELECT FRUIT, PRICE
  2  FROM FRUITS WHERE PRICE = 1;

FRUIT           PRICE
---------- ----------
Orange              1

We don't need to limit our comparisons to simple values; we can also use expressions. For example, we can search for any fruits with a combined value, QUANTITY*PRICE, greater than 10:

SQL> SELECT FRUIT, QUANTITY*PRICE
  2  FROM FRUITS
  3  WHERE QUANTITY*PRICE > 10;

FRUIT      QUANTITY*PRICE
---------- --------------
1 Mango                15
2 Durian               30

We can also select records that are within a given range. BETWEEN… AND is used to select a closed interval—one that includes the end points. For example, the range BETWEEN 1 AND 15 includes both 1 and 15, as the following query demonstrates:

SQL> SELECT FRUIT, PRICE
  2  FROM FRUITS
  3  WHERE PRICE BETWEEN 1 AND 15;

FRUIT           PRICE
---------- ----------
Mango             1.5
Mangosteen          2
Durian             15
Orange              1

Using BETWEEN…AND is exactly the same as using the inequality operators (>=) and (<=):

SQL> SELECT FRUIT, PRICE
  2  FROM FRUITS
  3  WHERE PRICE >= 1 AND PRICE <= 15;

FRUIT           PRICE
---------- ----------
Mango             1.5
Mangosteen          2
Durian             15
Orange              1

Using inequality operators is more flexible than using the BETWEEN…AND syntax, because we can choose whether to include either, both, or neither of the end points.

To select an open interval, one that excludes the end points, we can use (>) and (<). The following searches for fruit with prices between $1 and $15, not including $1 and $15—finding none, of course:

SQL> SELECT FRUIT, PRICE
  2  FROM FRUITS
  3  WHERE PRICE > 1 AND PRICE < 15;

FRUIT           PRICE
---------- ----------
Mango             1.5
Mangosteen          2

Half-open intervals are important for sorting things into categories; otherwise, you end up with values that don't belong in any category. For example, we bought these fruits from a grower who gives us a discount, depending on the quantity of each type of fruit purchased. Someone purchasing less than 5 gets no discount, someone purchasing at least 5 but less that 10 gets a 10% discount, someone purchasing at least 10 but less than 50 gets a 20% discount, and someone purchasing 50 or more gets a 20% discount.

These ranges are:

No discount:QUANTITY < 5
10%: QUANTITY >= 5 AND QUANTITY < 10
20%  QUANTITY >= 10 AND QUANTITY < 50
30%  QUANTITY >= 50

There are two ways to calculate the discount for each type of fruit: either with a somewhat complex SQL statement or procedurally, as separate SQL statements, using Java, for example. SQL can be extraordinarily powerful, and a single statement can replace many lines of code. Sometimes, however, that statement can be quite complex and difficult to compose and debug. If it can't be solved with a simple query, programmers tend to use a procedural approach to combine multiple simple queries instead. This is usually inefficient for three reasons: first, using multiple queries to arrive at a single result means multiple round trips between the client application and the database; second, the database can perform data-related operations, such as searching and sorting, much more efficiently than a procedural program (not to mention that database servers tend to be much more powerful than client machines); and finally, if the operation is performed in the database, it is often possible to improve the performance even further, using indexes and other optimization techniques.

Programmers often find it surprising how much can be accomplished with an SQL statement. We don't mean to suggest that programmers need to become SQL experts, only that we need to be aware of the possibilities as much as possible. That way, we can make an informed decision about whether we should find a database expert to help us with the query, do the research necessary to write the query ourselves, or resort to a procedural approach—perhaps because, in fact, that is the only way to do it.

As we'll see in the next chapter, the SQL solution in this case, using a union, is actually quite straightforward.

Simple String Comparisons

String comparisons are performed using the numeric value of the characters. These values are determined by the database character set—something that we selected when we created our database. We can generally assume that the character set is compatible with ASCII, as long as we limit ourselves to the Latin characters, without diacritical marks. The decimal values of the numbers and letters, for example, are found in Table 3-2.

Table 3-2. Decimal Values of ASCII Numbers and Letters
Character range	Decimal value range
0–9	48–39
A–Z	65–90
a–z	97–122

This limits the usefulness of making simple string comparisons for two related reasons: First, it defies intuition that 'Zoo' < 'animal'. Second, and more generally, it fails to take into account cultural conventions, such as rules regarding diacritics and non-Western European characters. We could almost go as far as to say that comparing two strings with the inequality operators (>), (>=), (<), and (<=) is usually meaningless. A better way to compare strings in a linguistically meaningful way is to use the NLSSORT function, which we will cover in the next chapter.

The most useful simple comparisons for strings are done with the equality and inequality operators "=" and "<>". They can be used to select or omit specific records. For example, to find the price of apples, we can query the database:

SQL> SELECT FRUIT, COLOR, PRICE
  2  FROM FRUITS
  3  WHERE FRUIT='Apple';

FRUIT      COLOR           PRICE
---------- ---------- ----------
Apple      Green              .5
Apple      Red                .5

Supposing we have an aversion to green fruits, we could also query to find out which fruits are not green:

SQL> SELECT FRUIT, COLOR, PRICE
  2  FROM FRUITS
  3  WHERE COLOR<>'Green';

FRUIT      COLOR           PRICE
---------- ---------- ----------
Apple      Red                .5
Orange     Orange              1

The comparison operators, when used with strings, have the drawback that we must match case correctly. In order for queries to work reliably, we need to make sure that our text strings are stored in a consistent manner. There are SQL functions that we can use to convert strings in the database to all upper- or lowercase, so we can enter our criterion in all upper- or lowercase to find a case-insensitive match:

SQL> SELECT FRUIT, COLOR, PRICE
  2  FROM FRUITS
  3  WHERE UPPER(COLOR)<>'GREEN'
  4  ;

FRUIT      COLOR           PRICE
---------- ---------- ----------
Apple      Red                .5
Orange     Orange              1

We'll see more about functions later. But keep in mind that, for large tables, we need to be careful when using functions in complex WHERE clauses.

String Comparisons with Wildcards

We've seen that we can compare character strings for equality, using the same comparison operators that we use for dates and numbers. However, this works only if we want to compare entire strings for exact matches. SQL provides a special comparison operator, LIKE, which allows us to match portions of strings by using wildcard characters as placeholders in pattern strings to perform the comparison. There are two wildcard characters in SQL, underscore and percent, as shown in Table 3-3:

Table 3-3. Wildcard Characters in SQL
_	Matches any one single character
%	Matches any number of occurrences (including zero) of any character

These are similar to the wildcards (?) and (*), respectively, used in the DOS and Unix environments.

The underscore (_), is easiest to understand. To match a pattern string, one character—any character—must appear wherever the (_) appears. Any literals in the rest of the string (anything except the percent symbol, in other words) must appear exactly as it appears in the pattern strings. Table 3-4 shows some examples:

Table 3-4. Examples of the Underscore Wildcard
Pattern	Matches
'_'	Any single letter string
'__'	Any string two letters long
'_BC'	Any string three letters long, ending in BC
'A_C'	Any string three letters long, beginning with A and ending with C
'_A_'	Any string three letters long, with middle letter A

The percent sign (%) is probably harder to explain than it is to understand. It can represent any number of any character or characters, including none. However, if there are any literals in the string or (_) in the string, these must be matched, as well. For example (Table 3-5):

Table 3-5. Examples of the Percent Sign Wildcard
Pattern	Matches
'%'	Any string, but not NULL
'%A'	Any string that ends with A
'A%'	Any string that begins with A
'%A%'	Any string that contains the letter A
'_%A%'	Any string that contains the letter A, except as the first letter

We'll try a few sample queries. For the first one, we'll select all fruits that begin with Mango:

SQL> SELECT FRUIT FROM FRUITS WHERE FRUIT LIKE 'Mango%'

FRUIT
----------
Mango
Mangosteen

Next, we'll look for those that contain the letters an

SQL> SELECT FRUIT FROM FRUITS WHERE FRUIT LIKE '%an%';

FRUIT
----------
Mango
Mangosteen
Durian
Orange

Next, those that contain the letters an, except at the end:

SQL> SELECT FRUIT FROM FRUITS WHERE FRUIT LIKE '%an%_';

FRUIT
----------
Mango
Mangosteen
Orange

Finally, those that end with the letters an:

SQL> SELECT FRUIT FROM FRUITS WHERE FRUIT LIKE '%an';

FRUIT
----------
Durian

Date and Time Comparisons

Date and time comparisons are similar to comparisons with numbers because underlying every date and time is a number. The only thing that makes this a little more difficult is that, in Oracle, there is a single datatype, DATE, that represents both date and time with a single number.

If we want to compare a date column with another date, the easiest thing to do is use a string literal in the default date format and let Oracle perform the conversion for us behind the scenes, like this:

SQL> SELECT FRUIT, PICKED FROM FRUITS WHERE PICKED>'20-SEP-2002';

FRUIT      PICKED
---------- ---------
Mango      22-SEP-02
Mangosteen 25-SEP-02

We can get the current date and time in Oracle by using the pseudo-column, SYSDATE. A pseudo-column is an Oracle function that we use as though it were a column that automatically exists in any table—we can use it anywhere we'd use any of the table's real column names. Now let's suppose that today is 25 September 2002. This query will list all of the fruits in our table, the day they were picked, and the current system date:

SQL> SELECT FRUIT, PICKED, SYSDATE FROM FRUITS;

FRUIT      PICKED    SYSDATE
---------- --------- ---------
Apple      12-SEP-02 25-SEP-02
Apple      15-SEP-02 25-SEP-02
Mango      22-SEP-02 25-SEP-02
Mangosteen 25-SEP-02 25-SEP-02
Durian               25-SEP-02
Orange     28-AUG-02 25-SEP-02

7 rows selected.

It looks as though one of the fruits, mangosteen, was picked today: SYSDATE and PICKED are the same. But if we query for fruits picked today using SYSDATE, we won't find any. When we entered the PICKED dates, we entered the date using the default date format that doesn't have a time part. The time defaulted to 12:00:00 a.m. SYSDATE, the current date and time, does have a time part, even though only the date part displays by default:

SQL> SELECT FRUIT, PICKED FROM FRUITS
  2  WHERE PICKED=SYSDATE;

no rows selected

To remove the time part of an Oracle DATE, we can use the TRUNC function. This way, we can compare the dates in the table with today's date, disregarding any hours, minutes, or seconds:

SQL> SELECT FRUIT, PICKED FROM FRUITS
2  WHERE PICKED=TRUNC(SYSDATE);

FRUIT      PICKED
---------- ---------
Mangosteen 25-SEP-02

Logic in SQL: AND, OR, NOT, and NULL

Now that we've seen how to create simple conditions, we'll look at how conditions can be negated using NOT or combined with AND or OR to form a compound condition. We'll also look at how the unknown value, NULL, complicates things.

In SQL, there are three logical values, TRUE, FALSE, and NULL. Every condition, simple or compound, evaluates to one of these three values. In a WHERE clause, if this condition evaluates to TRUE, the row is returned if it's part of a SELECT statement, for example. If it's FALSE or NULL, it is not.

NOT

FALSE and NULL are not the same, though. When we negate FALSE, we get TRUE. But when we negate NULL, we still get NULL. Table 3-6 is the truth table for NOT:

Table 3-6. Truth Table for NOT
	NOT
TRUE	FALSE
FALSE	TRUE
NULL	NULL

If we query the FRUIT table with the condition COLOR='Green', we can see that only one row meets this criterion:

SQL> SELECT FRUIT, COLOR FROM FRUITS
  2  WHERE COLOR='Green';

FRUIT      COLOR
---------- ----------
Apple      Green

Now let's try the negation of this, NOT(COLOR='Green'):

SQL> SELECT FRUIT, COLOR FROM FRUITS
  2  WHERE NOT(COLOR='Green');

FRUIT      COLOR
---------- ----------
Apple      Red
Mango      Yellow
Mangosteen Purple
Orange     Orange

If you're used to regular Boolean logic, this result may seem strange, because between the fruits that are green and the fruits that are not, there appears to be some fruit missing. The entry for durian doesn't appear in either of these tables, because its color is unknown, NULL.

AND

AND is used to combine two conditions. In order to satisfy the combined condition, both must be true. If both are TRUE, the combination is TRUE. If either (or both) are FALSE, the combination is FALSE. If either (or both) are NULL, the combination is NULL.

Table 3-7 is the AND truth table:

Table 3-7. Truth Table for AND
AND	TRUE	FALSE	NULL
TRUE	TRUE	FALSE	NULL
FALSE	FALSE	FALSE	NULL
NULL	NULL	NULL	NULL

Here is a query combining conditions with AND:

SQL> SELECT FRUIT, COLOR FROM FRUITS
  2  WHERE FRUIT ='Apple' AND COLOR='Green';

FRUIT      COLOR
---------- ----------
Apple      Green

Only one entry, obviously, is both green and an apple. We already know that the other fruits are either not apples or not green. This doesn't demonstrate anything about the NULL entries in the truth table, however.

Because we previously demonstrated that NOT(NULL) evaluates to NULL, we can demonstrate that TRUE AND NULL evaluates to NULL in the following query:

SQL> SELECT FRUIT, COLOR FROM FRUITS
  2  WHERE FRUIT='Durian' AND NOT(COLOR='Green');

no rows selected

Because we know there is a row that satisfies the condition FRUIT='Durian', the condition NOT(COLOR='Green') is either FALSE or NULL for that row. If it's FALSE and we negate it again, it will be TRUE, and the following query should return this row:

SQL> SELECT FRUIT, COLOR FROM FRUITS
  2  WHERE FRUIT='Durian' AND NOT(NOT(COLOR='Green'));

no rows selected

Because it still doesn't return anything, we know that the original condition, NOT(COLOR='Green') was NULL. We still don't know, though, whether the row wasn't returned because the combined condition is NULL or because it is FALSE. We can find that out by negating the combined condition and seeing whether we get the durian in the results:

SQL> SELECT FRUIT, COLOR FROM FRUITS
  2  WHERE NOT(FRUIT='Durian' AND NOT(COLOR='Green'));

FRUIT      COLOR
---------- ----------
Apple      Green
Apple      Red
Mango      Yellow
Mangosteen Purple
Orange     Orange

If the combined condition had originally evaluated to FALSE, negating it would have made it TRUE. Because we still don't get the durian row in the results, we know that TRUE AND NULL evaluate to NULL.

OR

OR is also used to combine two conditions. In order to satisfy the combined condition, one (or both) must be TRUE. If both are FALSE, the result is FALSE. If one is UNKNOWN, and the other is FALSE or UNKNOWN, the result is UNKNOWN.

Table 3-8 is the OR truth table:

Table 3-8. Truth Table for OR
OR	TRUE	FALSE	UNKNOWN
TRUE	TRUE	TRUE	TRUE
FALSE	TRUE	FALSE	UNKNOWN
UNKNOWN	TRUE	UNKNOWN	UNKNOWN

Here is a sample query with OR:

SQL> SELECT FRUIT, COLOR FROM FRUITS
  2  WHERE FRUIT ='Durian'  OR COLOR='Green';

FRUIT      COLOR
---------- ----------
Apple      Green
Durian

We can see here, from the presence of the durian row, that if one of the conditions is NULL and the other is TRUE, the combined result is still TRUE.