40.3. Materialized Views

Materialized views in PostgreSQL use the rule system like views do, but persist the results in a table-like form. The main differences between:

    CREATE MATERIALIZED VIEW mymatview AS SELECT * FROM mytab;

and:

    CREATE TABLE mymatview AS SELECT * FROM mytab;

are that the materialized view cannot subsequently be directly updated and that the query used to create the materialized view is stored in exactly the same way that a view’s query is stored, so that fresh data can be generated for the materialized view with:

    REFRESH MATERIALIZED VIEW mymatview;
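The snapshot behavior described above can be observed with a small experiment. This is only a sketch: demo_tab and demo_matview are hypothetical scratch objects, named so that they do not clash with mytab and mymatview.

```sql
-- Hypothetical scratch objects (not part of the examples in this section).
CREATE TABLE demo_tab (id integer PRIMARY KEY, val text);
INSERT INTO demo_tab VALUES (1, 'first');

-- The view is populated from the stored query at creation time.
CREATE MATERIALIZED VIEW demo_matview AS SELECT * FROM demo_tab;

-- Change the base table after the materialized view was populated.
INSERT INTO demo_tab VALUES (2, 'second');

SELECT count(*) FROM demo_matview;   -- still 1: the stored snapshot is unchanged

-- Re-run the stored query and replace the snapshot.
REFRESH MATERIALIZED VIEW demo_matview;

SELECT count(*) FROM demo_matview;   -- now 2

-- Clean up the scratch objects.
DROP MATERIALIZED VIEW demo_matview;
DROP TABLE demo_tab;
```

The new row becomes visible only after the refresh, because reading the materialized view never re-executes the stored query.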

The information about a materialized view in the PostgreSQL system catalogs is exactly the same as it is for a table or view. So for the parser, a materialized view is a relation, just like a table or a view. When a materialized view is referenced in a query, the data is returned directly from the materialized view, like from a table; the rule is only used for populating the materialized view.
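One way to see how the catalogs record a materialized view is to query pg_class and the pg_matviews system view. The sketch below assumes that mytab and mymatview were created as shown earlier and are visible in the current database.

```sql
-- relkind is 'r' for an ordinary table, 'v' for a view,
-- and 'm' for a materialized view.
SELECT relname, relkind
FROM pg_class
WHERE relname IN ('mytab', 'mymatview');

-- pg_matviews exposes the stored query and whether the
-- materialized view currently holds data.
SELECT matviewname, ispopulated, definition
FROM pg_matviews
WHERE matviewname = 'mymatview';
```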

While access to the data stored in a materialized view is often much faster than accessing the underlying tables directly or through a view, the data is not always current; yet sometimes current data is not needed. Consider a table which records sales:

    CREATE TABLE invoice (
        invoice_no    integer        PRIMARY KEY,
        seller_no     integer,       -- ID of salesperson
        invoice_date  date,          -- date of sale
        invoice_amt   numeric(13,2)  -- amount of sale
    );

If people want to be able to quickly graph historical sales data, they might want to summarize, and they may not care about the incomplete data for the current date:

    CREATE MATERIALIZED VIEW sales_summary AS
      SELECT
          seller_no,
          invoice_date,
          sum(invoice_amt)::numeric(13,2) AS sales_amt
        FROM invoice
        WHERE invoice_date < CURRENT_DATE
        GROUP BY
          seller_no,
          invoice_date
        ORDER BY
          seller_no,
          invoice_date;

    CREATE UNIQUE INDEX sales_summary_seller
      ON sales_summary (seller_no, invoice_date);

This materialized view might be useful for displaying a graph in the dashboard created for salespeople. A job could be scheduled to update the statistics each night using this SQL statement:

    REFRESH MATERIALIZED VIEW sales_summary;
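A plain REFRESH locks out readers of the materialized view while it runs. Since sales_summary has a unique index on plain columns that covers all rows (sales_summary_seller), the nightly job could instead use the CONCURRENTLY option, available in PostgreSQL 9.4 and later. The following is a sketch, not a recommendation for every workload; the seller number 42 and the 30-day window in the dashboard query are made up for illustration.

```sql
-- Refresh without blocking concurrent SELECTs on sales_summary.
-- Requires an already populated materialized view with at least one
-- unique index on column names only, with no WHERE clause.
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;

-- Example dashboard query (hypothetical seller and date range):
-- daily totals for one salesperson over the last 30 days.
SELECT invoice_date, sales_amt
FROM sales_summary
WHERE seller_no = 42
  AND invoice_date >= CURRENT_DATE - 30
ORDER BY invoice_date;
```

A concurrent refresh can be slower than a plain one when many rows change, so the plain form may still be the better choice for a job that runs while nobody is reading the view.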

Another use for a materialized view is to allow faster access to data brought across from a remote system through a foreign data wrapper. A simple example using file_fdw is shown below, with timings. Because this example reads a file that is cached on the local system, the performance difference against a genuinely remote system would usually be greater than shown here. Notice that we are also exploiting the ability to put an index on the materialized view, whereas file_fdw does not support indexes; this advantage might not apply to other sorts of foreign data access.

Setup:

    CREATE EXTENSION file_fdw;
    CREATE SERVER local_file FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE words (word text NOT NULL)
      SERVER local_file
      OPTIONS (filename '/usr/share/dict/words');
    CREATE MATERIALIZED VIEW wrd AS SELECT * FROM words;
    CREATE UNIQUE INDEX wrd_word ON wrd (word);
    CREATE EXTENSION pg_trgm;
    CREATE INDEX wrd_trgm ON wrd USING gist (word gist_trgm_ops);
    VACUUM ANALYZE wrd;

Now let’s spell-check a word. Using file_fdw directly:

    SELECT count(*) FROM words WHERE word = 'caterpiler';

     count
    -------
         0
    (1 row)

With EXPLAIN ANALYZE, we see:

     Aggregate  (cost=21763.99..21764.00 rows=1 width=0) (actual time=188.180..188.181 rows=1 loops=1)
       ->  Foreign Scan on words  (cost=0.00..21761.41 rows=1032 width=0) (actual time=188.177..188.177 rows=0 loops=1)
             Filter: (word = 'caterpiler'::text)
             Rows Removed by Filter: 479829
             Foreign File: /usr/share/dict/words
             Foreign File Size: 4953699
     Planning time: 0.118 ms
     Execution time: 188.273 ms

If the materialized view is used instead, the query is much faster:

     Aggregate  (cost=4.44..4.45 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1)
       ->  Index Only Scan using wrd_word on wrd  (cost=0.42..4.44 rows=1 width=0) (actual time=0.039..0.039 rows=0 loops=1)
             Index Cond: (word = 'caterpiler'::text)
             Heap Fetches: 0
     Planning time: 0.164 ms
     Execution time: 0.117 ms
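The statement behind this plan is not shown. Presumably it is the same lookup directed at wrd instead of words, along these lines (a reconstruction, not taken from the original text):

```sql
-- Same spell-check lookup, but against the materialized view;
-- the unique index wrd_word makes an index-only scan possible.
EXPLAIN ANALYZE
SELECT count(*) FROM wrd WHERE word = 'caterpiler';
```

Timings will differ from machine to machine, so the figures above should be read as relative rather than absolute.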

Either way, the word is spelled wrong, so let’s look for what we might have wanted. Again using file_fdw:

    SELECT word FROM words ORDER BY word <-> 'caterpiler' LIMIT 10;

         word
    ---------------
     cater
     caterpillar
     Caterpillar
     caterpillars
     caterpillar's
     Caterpillar's
     caterer
     caterer's
     caters
     catered
    (10 rows)

     Limit  (cost=11583.61..11583.64 rows=10 width=32) (actual time=1431.591..1431.594 rows=10 loops=1)
       ->  Sort  (cost=11583.61..11804.76 rows=88459 width=32) (actual time=1431.589..1431.591 rows=10 loops=1)
             Sort Key: ((word <-> 'caterpiler'::text))
             Sort Method: top-N heapsort  Memory: 25kB
             ->  Foreign Scan on words  (cost=0.00..9672.05 rows=88459 width=32) (actual time=0.057..1286.455 rows=479829 loops=1)
                   Foreign File: /usr/share/dict/words
                   Foreign File Size: 4953699
     Planning time: 0.128 ms
     Execution time: 1431.679 ms

Using the materialized view:

     Limit  (cost=0.29..1.06 rows=10 width=10) (actual time=187.222..188.257 rows=10 loops=1)
       ->  Index Scan using wrd_trgm on wrd  (cost=0.29..37020.87 rows=479829 width=10) (actual time=187.219..188.252 rows=10 loops=1)
             Order By: (word <-> 'caterpiler'::text)
     Planning time: 0.196 ms
     Execution time: 198.640 ms
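As with the exact-match lookup, the statement for this plan is not shown. A likely form, reconstructed from the file_fdw version of the query, is:

```sql
-- Nearest-match search against the materialized view; the GiST
-- trigram index wrd_trgm can return rows in <-> distance order,
-- so no separate sort step is needed.
EXPLAIN ANALYZE
SELECT word FROM wrd ORDER BY word <-> 'caterpiler' LIMIT 10;
```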

If you can tolerate periodic update of the remote data to the local database, the performance benefit can be substantial.
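The periodic update itself could be as small as the following sketch, run from whatever scheduler is available. How often to run it is an assumption that depends on how stale the local copy is allowed to become.

```sql
-- Re-read the foreign table and replace the contents of wrd.
-- The indexes wrd_word and wrd_trgm are maintained automatically.
REFRESH MATERIALIZED VIEW wrd;

-- Optional: update planner statistics right away instead of
-- waiting for autovacuum to analyze the refreshed data.
ANALYZE wrd;
```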