# postgres
The superuser account is `postgres`.
When a server instance is first created, the only database on the server is also named `postgres`.
## Resources
- [Official Documentation](https://www.postgresql.org/docs/current/index.html)
## Schemas
```sql
CREATE SCHEMA rpt AUTHORIZATION postgres;
COMMENT ON SCHEMA rpt IS 'Reporting';
GRANT ALL PRIVILEGES ON SCHEMA rpt TO postgres;
ALTER USER postgres SET search_path TO rpt, public;
```
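With `rpt` first in the `search_path`, unqualified object names resolve to the `rpt` schema. A quick sketch (the `demo` table is hypothetical):
```sql
SHOW search_path;             -- rpt, public
CREATE TABLE demo (id int);   -- created as rpt.demo, not public.demo
SELECT * FROM rpt.demo;       -- the same table, schema-qualified
```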
## Setup
### Create database
Create server instance > right click on databases > new database
### Create Tables
Expand your database > Expand Schemas > Tables > Create > table creation GUI
#### Constraints and more
In the column menu of the table GUI you can select the edit icon next to any column,
which opens a sub-menu with items that let you do things like add constraints
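The same can be done in plain SQL with `ALTER TABLE`; a sketch against the `rpt.report_data` table defined later in these notes (the constraint name is made up):
```sql
-- Add a check constraint to an existing column
ALTER TABLE rpt.report_data
    ADD CONSTRAINT "CK_Amount_Positive" CHECK (amount >= 0);

-- Tighten a column to disallow NULLs
ALTER TABLE rpt.report_data
    ALTER COLUMN amount SET NOT NULL;
```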
## Stored Procedures
```sql
CREATE OR REPLACE FUNCTION rpt."FN_Nuke_From_Orbit"() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
TRUNCATE TABLE rpt.report_data;
TRUNCATE TABLE rpt.report_data_clean;
TRUNCATE TABLE rpt.location_trended;
TRUNCATE TABLE rpt.location_top;
END;
$$;
ALTER FUNCTION rpt."FN_Nuke_From_Orbit"() OWNER TO postgres;
COMMENT ON FUNCTION rpt."FN_Nuke_From_Orbit"()
IS 'Wipe all data in the rpt schema from the face of the Earth';
CREATE OR REPLACE PROCEDURE rpt."USP_Refresh"() LANGUAGE plpgsql AS $$
BEGIN
/*********************************************************
USAGE:
This stored procedure can be run at any interval desired.
Since this performs a full truncate and load of the entire
rpt schema and all table data this will be expensive
in resources.
Recommendation is, depending on geographic factors of
store locations affecting end of day accounting totals,
to run the stored procedure in the off hours such as
at midnight or directly after C.O.B. so the day's
results are posted immediately after the conclusion
of that day's business.
*********************************************************/
/*********************************************************
Wipe The entire structure of the rpt schema for a clean
full rebuild via truncate and load ETL
*********************************************************/
PERFORM rpt."FN_Nuke_From_Orbit"();
/*********************************************************
Grab all raw data and kick off the live rebuild process
driven by triggers.
*********************************************************/
INSERT INTO rpt.report_data
SELECT p.payment_date
, a.address
, p.amount
FROM public.payment AS p
LEFT JOIN public.staff AS s ON s.staff_id = p.staff_id
LEFT JOIN public.store AS st ON st.store_id = s.store_id
LEFT JOIN public.address AS a ON a.address_id = st.address_id;
COMMIT;
END;
$$;
COMMENT ON PROCEDURE rpt."USP_Refresh"()
IS 'Refresh the entire reporting structure';
```
## Tables
```sql
CREATE TABLE rpt.report_data
(
"payment_date" timestamp without time zone NOT NULL,
"address" character varying (255) NOT NULL,
"amount" numeric
);
ALTER TABLE rpt.report_data OWNER to postgres;
```
## Triggers
```sql
CREATE OR REPLACE FUNCTION rpt.FN_Clean_Data() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO rpt.report_data_clean
SELECT DATE_PART('year', payment_date) AS Year
, address AS Location
, CAST(amount AS money) AS Revenue
FROM rpt.report_data;
RETURN NULL;
END;
$$;
ALTER FUNCTION rpt.FN_Clean_Data() OWNER TO postgres;
COMMENT ON FUNCTION rpt.FN_Clean_Data()
IS 'New data in rpt.report_data gets cleaned and inserted into rpt.report_data_clean';
CREATE TRIGGER TR_ETL AFTER INSERT ON rpt.report_data
FOR STATEMENT
EXECUTE FUNCTION rpt.FN_Clean_Data();
COMMENT ON TRIGGER TR_ETL ON rpt.report_data
IS 'Update reports when new data is added to the rpt.report_data table';
/*========================================================================*/
CREATE OR REPLACE FUNCTION rpt.FN_ETL() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
TRUNCATE TABLE rpt.location_trended;
INSERT INTO rpt.location_trended
SELECT "Year", "Location", SUM("Revenue")
FROM rpt.report_data_clean
GROUP BY "Year", "Location";
TRUNCATE TABLE rpt.location_top;
INSERT INTO rpt.location_top
SELECT "Year", "Location", SUM("Revenue")
, RANK() OVER(ORDER BY "Year" DESC, SUM("Revenue") DESC) AS "Rank"
FROM rpt.report_data_clean
WHERE "Year" IN (
-- CAST(DATE_PART('year', NOW()) AS INT) -- Current Year
-- , CAST(DATE_PART('year', NOW()) AS INT) - 1 -- Prior Year
(SELECT MAX("Year") FROM rpt.report_data_clean)
, (SELECT MAX("Year") - 1 FROM rpt.report_data_clean)
)
GROUP BY "Year", "Location";
RETURN NULL;
END;
$$;
ALTER FUNCTION rpt.FN_ETL() OWNER TO postgres;
COMMENT ON FUNCTION rpt.FN_ETL()
IS 'New data in rpt.report_data_clean so update all reports';
CREATE TRIGGER TR_Update_Reports AFTER INSERT ON rpt.report_data_clean
FOR STATEMENT
EXECUTE FUNCTION rpt.FN_ETL();
COMMENT ON TRIGGER TR_Update_Reports ON rpt.report_data_clean
IS 'Update reports when new data is added to the rpt.report_data_clean table';
```
## Data Types
### Numeric
#### Serial
preferred for auto-incrementing values like primary keys
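A minimal sketch (the `widgets` table is hypothetical):
```sql
CREATE TABLE widgets (
    widget_id SERIAL PRIMARY KEY,  -- backed by an auto-created sequence
    name      text NOT NULL
);
-- the id is generated automatically
INSERT INTO widgets (name) VALUES ('sprocket') RETURNING widget_id;
```
On newer Postgres versions, `GENERATED ALWAYS AS IDENTITY` is the standards-compliant alternative to `SERIAL`.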
### Array
#### Array data
```sql
SELECT ARRAY['Lemon', 'Bat Limited Edition' ] AS example_purchased_products;
/*
example_purchased_products
------------------------------
{Lemon,"Bat Limited Edition"}
*/
```
#### Unnest array
```sql
SELECT UNNEST(ARRAY[123, 456, 789]) AS example_ids;
```
![[2022-01-26-12-29-12.png]]
This is a great way to add tags to an item in a single column in a better way than using string split on csv values!
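For example, assuming a hypothetical `items` table with a `tags` array column, rows can be filtered without any string splitting:
```sql
-- rows that carry a given tag
SELECT name FROM items WHERE 'on_sale' = ANY (tags);

-- rows whose tags contain ALL of the listed values (containment operator)
SELECT name FROM items WHERE tags @> ARRAY['on_sale', 'clearance'];
```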
#### Array aggregate
```sql
SELECT product_type, ARRAY_AGG(DISTINCT model) AS models FROM products GROUP BY 1;
```
![[2022-01-26-12-31-31.png]]
#### String splitting to array
```sql
SELECT STRING_TO_ARRAY('hello there how are you?', ' ');
```
![[2022-01-26-12-32-26.png]]
### Casting
Unlike [[TSQL]], casting doesn't require the `CAST(column AS type)` function (though Postgres supports it too). Instead, within Postgres you can cast datatypes like `column::datatype` using the double colon syntax `::`
```sql
SELECT product_id
, model
, year::TEXT
, product_type
, base_msrp
, production_start_date
, production_end_date
FROM products;
```
## Functions
### Coalesce
> To illustrate a simple usage of the COALESCE function, let's return to the customers table. Let's say the marketing team would like a list of the first names, last names, and phone numbers of all male customers. However, for those customers with no phone number, they would like the table to instead write the value 'NO PHONE'. We can accomplish this request with `COALESCE`:
```sql
SELECT first_name
, last_name
, COALESCE(phone, 'NO PHONE') AS phone
FROM customers
ORDER BY 1;
```
> When dealing with creating default values and avoiding `NULL`, `COALESCE` will always be helpful.
### Copy
> The COPY statement retrieves data from your database and dumps it in the file format of your choosing. For example, take the following statement:
#### Data To STDOUT
```sql
COPY (SELECT * FROM customers LIMIT 5) TO STDOUT WITH CSV HEADER;
```
- `COPY` is simply the command used to transfer data to a file format.
- `(SELECT * FROM customers LIMIT 5)` is the query that we want to copy.
- `TO STDOUT` indicates that the results should be printed rather than saved to a file on the hard drive. "Standard Out" is the common term for displaying output in a command-line terminal environment.
- `WITH` is an optional keyword used to separate the parameters that we will use in the database-to-file transfer.
- `CSV` indicates that we will use the CSV file format. We could have also specified `BINARY` or left this out altogether and received the output in text format.
- `HEADER` indicates that we want the header printed as well.
#### Data to a file
```sql
COPY (SELECT * FROM customers LIMIT 5) TO '/path/to/my_file.csv' WITH CSV HEADER;
```
#### Use psql CLI to get data into the database
```bash
psql -h my_host -p 5432 -d my_database -U my_username
\copy (SELECT * FROM customers LIMIT 5) TO 'my_file.csv' WITH CSV HEADER
```
- `\copy` is invoking the Postgres `COPY` ... `TO STDOUT`... command to output the data.
- `(SELECT * FROM customers LIMIT 5)` is the query that we want to copy.
- `TO 'my_file.csv'` indicates that psql should save the output from standard out into my_file.csv.
- The `WITH CSV HEADER` parameters operate the same as before.
##### Configuring the copy command
- `DELIMITER 'delimiter_character'` can be used to specify the delimiter character
for CSV or text files (for example `','` for CSV files, or `'|'` for pipe-separated files)
##### Loading data into a table
```bash
\copy customers FROM 'my_file.csv' WITH (FORMAT csv, HEADER, DELIMITER ',')
```
### Least Greatest
Unlike aggregate functions such as `MIN` or `MAX`, which operate across rows, `LEAST` and `GREATEST` are scalar functions that compare the values passed to them within a single row.
> Two functions that come in handy for data preparation are the `LEAST` and `GREATEST` functions. Each function takes any number of values and returns the least or the greatest of the values, respectively.
>
> A simple use of this variable would be to replace the value if it's too high or low. For example, the sales team may want to create a sales list where every scooter is $600 or less than that. We can create this using the following query:
```sql
SELECT product_id
, model
, year
, product_type
, LEAST(600.00, base_msrp) AS base_msrp
, production_start_date
, production_end_date
FROM products
WHERE product_type='scooter'
ORDER BY 1;
```
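`GREATEST` works the same way in the other direction, e.g. to enforce a price floor (the $400 floor here is made up):
```sql
SELECT product_id
, model
, GREATEST(400.00, base_msrp) AS base_msrp
FROM products
WHERE product_type='scooter'
ORDER BY 1;
```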
### Now
#### Details
`NOW()` is like `GETDATE()` in [[TSQL]]: it returns the current timestamp, but as a
`timestamp with time zone` value, so the relevant time zone info comes along with it.
#### Usage
```sql
SELECT NOW();
```
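Because the result carries a time zone, it can be shifted or truncated inline:
```sql
SELECT NOW() AT TIME ZONE 'UTC';  -- the same instant as a timestamp in UTC
SELECT NOW()::date;               -- just the date portion
```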
### Window Functions
Examples of window functions
```sql
SELECT {columns}
, {window_func} OVER (PARTITION BY {partition_key} ORDER BY {order_key})
FROM table1;
```
more realistic example:
```sql
SELECT customer_id
, title
, first_name
, last_name
, gender
, COUNT(*) OVER (PARTITION BY gender ORDER BY customer_id) AS total_customers
, SUM(CASE WHEN title IS NOT NULL THEN 1 ELSE 0 END) OVER (PARTITION BY gender ORDER BY customer_id) AS total_customers_title
FROM customers
ORDER BY customer_id;
```
The window Keyword:
```sql
SELECT customer_id
, title
, first_name
, last_name
, gender
, COUNT(*) OVER w AS total_customers
, SUM(CASE WHEN title IS NOT NULL THEN 1 ELSE 0 END) OVER w AS total_customers_title
FROM customers
WINDOW w AS (PARTITION BY gender ORDER BY customer_id) -- Reduces typing and improves legibility
ORDER BY customer_id;
```
## Indexes
```sql
CREATE UNIQUE INDEX "CIX_Year_Loc"
ON rpt.location_trended USING btree
("Year" ASC NULLS LAST, "Location" COLLATE pg_catalog."default" ASC NULLS LAST)
TABLESPACE pg_default;
ALTER TABLE rpt.location_trended CLUSTER ON "CIX_Year_Loc";
```
## Tips and Tricks
### date intervals
```sql
SELECT TIMESTAMP '2016-03-01 00:00:00' + INTERVAL '7 days' AS new_date;
/*
new_date
-------------------
2016-03-08 00:00:00
(1 row)
*/
```
### extract date parts
```sql
SELECT current_date
, EXTRACT(dow FROM current_date) AS day_of_week
, EXTRACT(week FROM current_date) AS week_of_year
, EXTRACT(quarter FROM current_date) AS quarter;
/*
 current_date | day_of_week | week_of_year | quarter
--------------+-------------+--------------+---------
 2019-04-28   |           0 |           17 |       2
(1 row)
*/
```
### query tuning with explain
```sql
EXPLAIN (FORMAT JSON)
SELECT *
FROM public.test
WHERE hello IN (SELECT hello FROM public.test ORDER BY hello DESC LIMIT 1);
```
![[2022-01-26-10-01-27.png]]
Using `EXPLAIN (ANALYZE)` and then pasting the output of the plan into this tool gives a flame-graph-style view:
<https://explain.depesz.com/>
![[2022-01-26-10-09-32.png]]
This tool is also amazing for displaying graphics on a plan
<https://explain.dalibo.com/>
![[2022-01-26-10-30-43.png]]
To have `EXPLAIN` output generated automatically, load the `auto_explain` module (you must be a superuser to load it):
```sql
LOAD 'auto_explain';
```
`EXPLAIN` on its own only provides the plan estimates without running the query.
`EXPLAIN ANALYZE` actually executes the statement, so be careful when testing it against a `DELETE` or other destructive statement.
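Once the module is loaded, `auto_explain` is controlled through settings; a sketch of two common ones:
```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 250; -- log plans for statements over 250 ms
SET auto_explain.log_analyze = true;     -- include actual row counts and timings
```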
### returning data you touched
```sql
INSERT INTO person (name) VALUES ('Groot') RETURNING id;
/*
id
----
4
(1 row)
*/
```
### returning newly inserted record
#### Return Newly Inserted Record
When inserting a new record into a table, if you also want that record immediately
returned to the output viewer after insertion just add the `RETURNING` keyword
##### Before
```sql
INSERT INTO products ( col1, col2, col3 ) VALUES ( val1, val2, val3 );
```
##### After
```sql
-- This returns all fields of the newly inserted record but you can also make
-- the returned value something like just the new PK ID of the record
INSERT INTO products ( col1, col2, col3 ) VALUES ( val1, val2, val3 ) RETURNING *;
```
### writeable cte
![[2022-01-26-10-39-53.png]]
```sql
-- Delete rows while simultaneously
-- inserting them elsewhere
WITH moved_rows AS (
DELETE FROM products
WHERE date >= '2010-10-01' AND date < '2010-11-01'
RETURNING *
)
INSERT INTO products_log SELECT * FROM moved_rows;
```
## pgagent and pgadmin
### pgadmin
#### Setup
pgAdmin stores your passwords for all Postgres server instances.
The master password you provide for this is not the same as the installation password for Postgres.
### pgagent
Reference: <https://severalnines.com/database-blog/overview-job-scheduling-tools-postgresql>
Like [[sql-agent]], it is a job scheduling agent for PostgreSQL that allows the execution of stored procedures, SQL statements, and shell scripts.
The agent runs as a daemon on Linux systems and periodically connects to the database to check whether there are any jobs to execute.
Scheduling is easily managed from pgAdmin 4, but pgAgent is not installed by default with pgAdmin; it must be downloaded and installed separately.
#### Step 1
Installation of pgAdmin 4
```bash
sudo apt install pgadmin4 pgadmin4-apache2
```
#### Step 2
Creation of plpgsql procedural language if not defined
```sql
CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql'
    HANDLER plpgsql_call_handler
    VALIDATOR plpgsql_validator;
```
#### Step 3
Installation of pgAgent
```bash
sudo apt-get install pgagent
```
#### Step 4
Creation of the pgagent extension
```sql
CREATE EXTENSION pgagent;
```
To define a new job, right click on "pgAgent Jobs", select "Create", give the job a
name, and define the steps it should execute:
![[2022-01-14-23-43-37.png]]
![[2022-01-14-23-44-04.png]]
![[2022-01-14-23-44-19.png]]
#### Step 5
Finally, to have the agent running in the background it's necessary to launch the
following process manually:
```bash
/usr/bin/pgagent host=localhost dbname=postgres user=postgres port=5432 -l 1
```
## workflow
### Adding Data To a Table manually
When you select values from a table you can edit the output grid and add new rows.
Once you want to commit the data, click the save icon to write the changes to the table:
![[2022-01-10-14-30-12.png]]
### migrating databases
To migrate a database:
1. Right click on the database on the source server
2. Backup
3. Save the file as `*.sql`
4. Create a new database on the target server
5. Right click the new database and select `restore`
6. Choose the `*.sql` file you made to restore from
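The same migration can be scripted with `pg_dump` and `psql` (host and database names here are placeholders):
```bash
# Dump the source database to a plain-SQL file
pg_dump -h source_host -U postgres -d my_database -f my_database.sql

# Create an empty database on the target server, then replay the dump
createdb -h target_host -U postgres my_database
psql -h target_host -U postgres -d my_database -f my_database.sql
```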