Quantcast
Channel: OracleBuffer
Viewing all articles
Browse latest Browse all 58

Oracle: The myth, reality and the curious case of count(*)

$
0
0

Today’s article is all about the popular count(*) method of checking cardinality and is dedicated to novice Oracle DBAs like me. We use count(*) almost everyday at some point to check the number of rows in a particular table.

Sometime it gives us the result quick enough while other times it takes a long time to provide the result. You can google about ‘speed up count(*) oracle‘ and can find a vast number of healthy debate on this topic.

The takeaway from these debates is that there is no way to speed up count(*) operation. All the methods of counting rows be it count(*), count(1) or count(column_name) are in some or the other way the same.

However, if you are dealing with a poorly designed Oracle database system, there is a huge possibility that while counting table rows using the count() functions; you can speed it up. Well, read on to explore….

Demonstration

We are about perform a simulation of different count() methods on a poorly designed, partially ideal and ideal database system and then see the difference as well as explore as to why there is a difference if any.

In the database, I have a table called TEST_COUNT and it has 20000000 row. Lets explore, how much time the different count() methods take to provide the count of rows.

SQL> desc test_count
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER(10)
 NAME                                               VARCHAR2(15)
 GENDER                                             CHAR(1)

Count on a poorly designed Oracle Database

Lets start our simulation with a poorly designed Oracle database.

count(*)

SQL> select count(*) from test_count;

  COUNT(*)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 1844084873

-------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 | 15990  (11)| 00:03:12 |
|   1 |  SORT AGGREGATE    |            |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST_COUNT |    20M| 15990  (11)| 00:03:12 |
-------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      52440  consistent gets
      52429  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed 

count(1)

SQL> select count(1) from test_count;

  COUNT(1)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 1844084873

-------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 | 15990  (11)| 00:03:12 |
|   1 |  SORT AGGREGATE    |            |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST_COUNT |    20M| 15990  (11)| 00:03:12 |
-------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        215  recursive calls
          0  db block gets
      52476  consistent gets
      52429  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

count(column_name)

SQL> select count(id) from test_count;

 COUNT(ID)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 1844084873

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 |     6 | 15990  (11)| 00:03:12 |
|   1 |  SORT AGGREGATE    |            |     1 |     6 |            |          |
|   2 |   TABLE ACCESS FULL| TEST_COUNT |    20M|   114M| 15990  (11)| 00:03:12 |
---------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      52440  consistent gets
      52429  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

As we can observe, all of the three methods have the same execution plan and the same execution cost. All of the three queries are doing a FULL TABLE SCAN to provide the count of table records. So, there is technically no difference between them.

Count on a partially ideal Oracle Database

Now, lets test the three different methods again on a partially ideal oracle database system.

count(*)

SQL> select count(*) from test_count;

  COUNT(*)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 1844084873

-------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 | 15990  (11)| 00:03:12 |
|   1 |  SORT AGGREGATE    |            |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST_COUNT |    20M| 15990  (11)| 00:03:12 |
-------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        209  recursive calls
          0  db block gets
      52476  consistent gets
      52429  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

count(1)

SQL> select count(1) from test_count;

  COUNT(1)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 1844084873

-------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 | 15990  (11)| 00:03:12 |
|   1 |  SORT AGGREGATE    |            |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TEST_COUNT |    20M| 15990  (11)| 00:03:12 |
-------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      52440  consistent gets
      52429  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

count(column_name)

SQL> select count(id) from test_count;

 COUNT(ID)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 3868281374

---------------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               |     1 |     6 | 11107  (11)| 00:02:14 |
|   1 |  SORT AGGREGATE       |               |     1 |     6 |            |          |
|   2 |   INDEX FAST FULL SCAN| TEST_COUNT_PK |    20M|   114M| 11107  (11)| 00:02:14 |
---------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      40018  consistent gets
      39981  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

I believe, you have already figured it out at this point. There is a better execution plan for the third method i.e. count(column_name). It is now doing a INDEX FAST FULL SCAN rather than a FULL TABLE SCAN. We all know, INDEX scans are always faster than the table scans (There are exceptions. However, those are very rare where table scan is faster than index scan) and hence there is reduction in the execution time as well as the execution cost for the count(column_name)

Now, the question is why the same method was not doing a INDEX FAST FULL SCAN in the earlier example? The reason is there was no index defined on the ID column for the table TEST_COUNT.

Before simulating the second set of example, I had created the following INDEX on ID column of TEST_COUNT table.

SQL> select OWNER,INDEX_NAME,TABLE_NAME,BLEVEL,NUM_ROWS,LAST_ANALYZED,STALE_STATS
  2  from dba_ind_statistics where TABLE_NAME='TEST_COUNT';

OWNER      INDEX_NAME        TABLE_NAME          BLEVEL   NUM_ROWS LAST_ANAL STA
---------- ----------------- --------------- ---------- ---------- --------- ---
MYAPP      TEST_COUNT_PK     TEST_COUNT               2   20000000 16-MAY-15 NO

So, now we know.. if we do a count(column_name) and if the column is indexed, we can have better response time compare to the count(*) and count(1) methods.

Count on an ideal Oracle Database

Now, lets test the three different methods for one last time on an ideal oracle database system.

count(*)

SQL> select count(*) from TEST_COUNT;

  COUNT(*)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 3868281374

-------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               |     1 | 13356  (10)| 00:02:41 |
|   1 |  SORT AGGREGATE       |               |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| TEST_COUNT_PK |    20M| 13356  (10)| 00:02:41 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      44499  consistent gets
      44471  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

count(1)

SQL> select count(1) from TEST_COUNT;

  COUNT(1)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 3868281374

-------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               |     1 | 13356  (10)| 00:02:41 |
|   1 |  SORT AGGREGATE       |               |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| TEST_COUNT_PK |    20M| 13356  (10)| 00:02:41 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      44499  consistent gets
      44471  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

count(column_name)

SQL> select count(id) from TEST_COUNT;

 COUNT(ID)
----------
  20000000


Execution Plan
----------------------------------------------------------
Plan hash value: 3868281374

-------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               |     1 | 13356  (10)| 00:02:41 |
|   1 |  SORT AGGREGATE       |               |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| TEST_COUNT_PK |    20M| 13356  (10)| 00:02:41 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      44499  consistent gets
      44471  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

WOW!!! In this example, all of the three methods are doing a INDEX FAST FULL SCAN to provide the row count. In this ideal Oracle Database, all of the three count() methods have the optimal execution plan.

Now, what changes could have resulted in to this improvement???

Well, in the first set of examples (on a poorly designed oracle database), there was no index on the primary table column (ID) for the TEST_COUNT table. I had added an INDEX on this column for the second set of examples (on a partially ideal Oracle Database) and that helped in improving execution time for the count(column_name) method used for counting table rows. However, in the second set of examples; there was no primary key defined for the table TEST_COUNT. I had just added a primary key on ID column for the TEST_COUNT table in the third set of examples (on an ideal oracle database) as shown below.

SQL> alter table TEST_COUNT add constraint PK_TEST_COUNT primary key (id);

Table altered.

Once, I has added this consistent, the INDEX TEST_COUNT_PK got linked to the primary key (PK_TEST_COUNT) automatically (If index was not there, the ADD CONSTRAINT PRIMARY KEY command would have created an INDEX by default on the primary key column).

SQL> select OWNER,CONSTRAINT_NAME,CONSTRAINT_TYPE,TABLE_NAME,INDEX_NAME,STATUS
  2  from dba_constraints where TABLE_NAME='TEST_COUNT';

OWNER      CONSTRAINT_NAME                C TABLE_NAME      INDEX_NAME        STATUS
---------- ------------------------------ - --------------- ----------------- --------
MYAPP      PK_TEST_COUNT                  P TEST_COUNT      TEST_COUNT_PK     ENABLED

If the Oracle database tables have primary key defined on them (in turn will have index), the count methods count(*) and count(1) will by default utilize the index defined on the primary key while evaluating the query result. Thereby it will improve the execution time.

Now, at this point; you might have understood the reason as to why I had tagged the three set of examples with POORLY DESIGNED, PARTIALLY IDEAL and IDEAL Oracle Database system.

In a ideal Oracle RDBMS system , it is very rare to to have a table without a primary key column and therefore not indexed, whereas in a poorly designed system, there are possibilities of having tables without primary keys or indexes.

One observation, you might have made in all of these three set of examples that the count(*) and count(1) methods of counting rows always had the same set of execution plans whether the column being indexed or not. This is due to the fact that, Oracle database internally converts count(1) to count(*). For a better understanding on count(*) and count(1), you may refer Thomas Kyte’s response here.

Another point to mention here is that, the count(column_name) provides the count of rows having NOT NULL values. Therefore, a count on column_name and count(*) will only provide same count of rows if and only if the column_name being used in count is having no NULL values. If the column has any NULL values, then there would be a difference between count(*) and count(column_name) in terms of row count.


Viewing all articles
Browse latest Browse all 58

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>