<!-- ACTIVE -->
> My primary use case for indexed views is using them to mirror data warehouse tables. I do not allow anybody that is not a data engineer access to the original tables. Not even on a read only basis. I create the index views for reports analyst. These are people that know SQL, but have no need to see the audit columns used to run the database.
>
> -- <https://tutorials.massstreet.net/v/transact-sql/advanced-topics/lesson-36.-indexed-views>
## ETL Developers Field Guide
### Python Software Engineering Considerations
Use [[DevLog/Z/yaml|yaml]] configuration files for python [[etl]]
**Directory Permissions**
You should use a Proxy to run Python just like you would SSIS. The Windows account that is being used for the credential has to be explicitly named in the folder permissions. For me, I was in dev and was confused because I was an administrator and administrator was named. It had to be my specific login.
**Absolute File Paths**
When executing from SQL Server Agent, your script actually executes at `C:\WINDOWS\system32`. That means you can't use `os.getcwd()` as the base of file paths. Instead, if you use absolute paths, everything works fine.
### EDW ETL Overview
![[2022-07-06-16-39-18.png]]
#### Acquire
Each ETL process that pulls data into the system is entirely independent of any other process. Each dataset is pulled from its source system and placed in a staging table.
#### Consolidate
All data that is collected in staging tables is moved to common model tables together. The common model is a unified representation of all data across systems.
#### Integrate
Moving data from consolidate to integrate is accomplished in the same step as moving from acquire to consolidate. Integrate is where we move the data from the common model into the warehouse tables.
#### Deliver
At this time, there are no OLAP cubes in the data architecture. Data is delivered through various vectors. Most of those vectors pass through the warehouse. A few bypass the warehouse and go straight to de-normalized reporting tables.
### ETL Environment Databases
- `EDW`: The Data Warehouse
- `ODS`: Operational Data Store
- This is where all ETL functions take place.
- It is where staging tables and the tables that support the common model live.
- The stored procedures that perform ETL live here as well as any views that support monitoring of the ETL processes.
- This database is not accessible to business users.
- `REPORTING`:
- The database consists of de-normalized reporting tables and views that are built from tables in the data warehouse.
- This database allows rapid access to data without having to build complex reports.
- Every model in production, is represented here as a de-normalized view.
- `SSISManagement`:
- SSISManagment is the database that holds the logging information from the execution of SSIS packages.
- This database is FAR more useful and user friendly than the execution information provided by SQL Server.
### Data Acquisition Paradigms
#### Demand Pull
- trigged by the data warehouse environment.
- The grand majority of your processes should be demand pull.
- Demand pull is when the system reaches out to other systems on a schedule defined by whatever is stet in your specific orchestration tool.
- The key element being that the data warehouse box has access to the remote system.
- For example, a database to database pull that is executed via a linked server.
#### Supply Push
- Supply push processes are initiated by processes external to the data warehouse.
- The general scenario here is data is usually dropped by a 3rd party into a folder that sits on an edge server.
- That folder is then scanned by data warehouse processes and moved into the `InterfacesAndExtracts` folder.
- Supply push processes should be rare and only done when there are no other options.
- A good example are bank feeds like bank reconciliation files.
- Those are published are published by the bank and the data warehouse does not have the ability to pull the data from the source system.
### The Common Model
> The common model represents the unified data model for the enterprise. The common model is necessary to process master data.
### The Semantic Layer
- The semantic layer exist in the Reporting database.
- The Reporting database is filled with a bunch of de-normalized 2 dimensional datasets.
- These datasets, rather they be views, or physical tables, are build with pretty names.
- `DimCustomer.FirstName` becomes `Customer.[First Name]`.
- This is so when the user connects to Reporting with their BI tool, they can simply click and drag ready made easy to read columns for their reports.
### Database Object Naming Conventions
<https://github.com/ktaranov/sqlserver-kit/blob/master/SQL%20Server%20Name%20Convention%20and%20T-SQL%20Programming%20Style.md>
| Object | Code | Notation | Length | Plural | Prefix | Suffix | Abbreviation | Char Mask | Example |
|------------------------------------------|------|------------|-------:|--------|---------|--------|--------------|--------------|---------------------------------------|
| [Database] | | UPPERCASE | 30 | No | No | No | Yes | [A-z] | `MYDATABASE` |
| [Schema] | | lowercase | 30 | No | No | No | Yes | [a-z][0-9] | `myschema` |
| [Global Temporary Table] | | PascalCase | 117 | No | No | No | Yes | ##[A-z][0-9] | `##MyTable` |
| [Local Temporary Table] | | PascalCase | 116 | No | No | No | Yes | #[A-z][0-9] | `#MyTable` |
| [File Table] | | PascalCase | 128 | No | `FT_` | No | Yes | [A-z][0-9] | `FT_MyTable` |
| [Memory-optimized SCHEMA_AND_DATA Table] | | PascalCase | 128 | No | `MT_` | `_SD` | Yes | [A-z][0-9] | `MT_MyTable_SD` |
| [Memory-optimized SCHEMA_ONLY Table] | | PascalCase | 128 | No | `MT_` | `_SO` | Yes | [A-z][0-9] | `MT_MyTable_SO` |
| [Temporal Table] | | PascalCase | 128 | No | No | `_TT` | Yes | [A-z][0-9] | `MyTable_TT` |
| [Disk-Based Table] | U | PascalCase | 128 | No | No | No | Yes | [A-z][0-9] | `MyTable` |
| [Disk-Based Wide Table - SPARSE Column] | U | PascalCase | 128 | No | No | `_SPR` | Yes | [A-z][0-9] | `MyTable_SPR` |
| [Table Column] | | PascalCase | 128 | No | No | No | Yes | [A-z][0-9] | `MyColumn` |
| [Table Column SPARSE] | | PascalCase | 128 | No | No | `_SPR` | Yes | [A-z][0-9] | `MyColumn_SPR` |
| [Columns Check Constraint] | C | PascalCase | 128 | No | `CTK_` | No | Yes | [A-z][0-9] | `CTK_MyTable_MyColumn_AnotherColumn` |
| [Column Check Constraint] | C | PascalCase | 128 | No | `CK_` | No | Yes | [A-z][0-9] | `CK_MyTable_MyColumn` |
| [Column Default Values] | D | PascalCase | 128 | No | `DF_` | No | Yes | [A-z][0-9] | `DF_MyTable_MyColumn` |
| [Table Primary Key] | PK | PascalCase | 128 | No | `PK_` | No | Yes | [A-z][0-9] | `PK_MyTableID` |
| [Table Unique (Alternative) Key] | UQ | PascalCase | 128 | No | `AK_` | No | Yes | [A-z][0-9] | `AK_MyTable_MyColumn_AnotherColumn` |
| [Table Foreign Key] | F | PascalCase | 128 | No | `FK_` | No | Yes | [A-z][0-9] | `FK_MyTable_ForeignTableID` |
| [Table Clustered Index] | | PascalCase | 128 | No | `CIX_` | No | Yes | [A-z][0-9] | `CIX_MyTable_MyColumn_AnotherColumn` |
| [Table Non Clustered Index] | | PascalCase | 128 | No | `NCIX_` | No | Yes | [A-z][0-9] | `NCIX_MyTable_MyColumn_AnotherColumn` |
| [DDL Trigger] | TR | PascalCase | 128 | No | `TR_` | `_DDL` | Yes | [A-z][0-9] | `TR_LogicalName_DDL` |
| [DML Trigger] | TR | PascalCase | 128 | No | `TR_` | `_DML` | Yes | [A-z][0-9] | `TR_MyTable_LogicalName_DML` |
| [Logon Trigger] | TR | PascalCase | 128 | No | `TR_` | `_LOG` | Yes | [A-z][0-9] | `TR_LogicalName_LOG` |
| [View] | V | PascalCase | 128 | No | `VW_` | No | No | [A-z][0-9] | `VW_LogicalName` |
| [Indexed View] | V | PascalCase | 128 | No | `VWX_` | No | No | [A-z][0-9] | `VWX_LogicalName` |
| [Statistic] | | PascalCase | 128 | No | `ST_` | No | No | [A-z][0-9] | `ST_MyTable_MyColumn_AnotherColumn` |
| [Stored Procedure] | P | PascalCase | 128 | No | `usp_` | No | No | [A-z][0-9] | `usp_LogicalName` |
| [Scalar User-Defined Function] | FN | PascalCase | 128 | No | `udf_` | No | No | [A-z][0-9] | `udf_FunctionLogicalName` |
| [Table-Valued Function] | FN | PascalCase | 128 | No | `tvf_` | No | No | [A-z][0-9] | `tvf_FunctionLogicalName` |
| [Synonym] | SN | camelCase | 128 | No | `sy_` | No | No | [A-z][0-9] | `sy_logicalName` |
| [Sequence] | SO | PascalCase | 128 | No | `sq_` | No | No | [A-z][0-9] | `sq_TableName` |
| [CLR Assembly] | | PascalCase | 128 | No | `CA` | No | Yes | [A-z][0-9] | `CALogicalName` |
| [CLR Stored Procedures] | PC | PascalCase | 128 | No | `pc_` | No | Yes | [A-z][0-9] | `pc_CAName_LogicalName` |
| [CLR Scalar User-Defined Function] | | PascalCase | 128 | No | `cudf_` | No | No | [A-z][0-9] | `cudf_CAName_LogicalName` |
| [CLR Table-Valued Function] | | PascalCase | 128 | No | `ctvf_` | No | No | [A-z][0-9] | `ctvf_CAName_LogicalName` |
| [CLR User-Defined Aggregates] | | PascalCase | 128 | No | `ca_` | No | No | [A-z][0-9] | `ca_CAName_LogicalName` |
| [CLR User-Defined Types] | | PascalCase | 128 | No | `ct_` | No | No | [A-z][0-9] | `ct_CAName_LogicalName` |
| [CLR Triggers] | | PascalCase | 128 | No | `ctr_` | No | No | [A-z][0-9] | `ctr_CAName_LogicalName` |
| [Linked Server] | | PascalCase | 128 | No | `ls_` | No | No | [A-z][0-9] | `ls_LogicalName` |
#### Primary keys
| Object Type | Naming Convention | Naming Model | Example |
|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|--------------|
| Dimension Table | The primary keys of dimension tables are named after the name of the dimension without the Dim modifier and adding a CK for contrived key at the end. | `[DimensionNameWithOutDim]CK` | `CustomerCK` |
| Fact Table | MDM table primary keys should share the name of the table and end in EK for enterprise key. All staging tables have identically named primary keys. | `Row` | `RowID` |
| Master Data Table | MDM table primary keys should share the name of the table and end in EK for enterprise key. | `[TableName]EK` | `CustomerEK` |
| Staging Table | All staging tables have identically named primary keys. | `ETLKey` | `ETLKey` |
#### Schemas
Most schemas are predefined by this framework. However, in ODS, staging tables should have schema names that are some sort of recognizable acronym for where the data came from. Below are some examples.
| Schema | Source |
|--------|-----------------|
| `boa` | Bank of America |
| `ss` | Smart Soft |
| `sf` | Sales Force |
#### Stored Procedures
| Process Step | Naming Convention | Naming Model | Example |
|---------------------------|----------------------------------------------------------------------------------------------------------------|---------------------------------|-----------------------------|
| Pull Data | Name should start with the word pull and match the staging table the proc dumps to. | `usp_Pull[DescriptionOfData]` | usp_PullCustomerData |
| Clean Data | Name should start with the word clean and match the staging table the proc dumps to. | `usp_Clean[DescriptionOfData]` | usp_CleanCustomerData |
| Process MDM | Name should start with the word process and MDM and the dimension name it works on. | `usp_ProcessMDM[DimensionName]` | usp_ProcessMDMDimCustomer |
| Process Dimensions | Name should start with the word process and the dimension name it works on. | `usp_Process[DimensionName]` | usp_ProcessDimCustomer |
| Process Facts | Name should start with the word process and match the table name it works on. | `usp_Process[FactTableName]` | usp_ProcessFactCustomer |
| Finalize And Audit | Name should be descriptive. | `usp_[DescribeProcess]` | usp_MarkRecordsAsProcessed |
| Populate Reporting Tables | Name should start with the word load and end with the word table and should have the table name that it loads. | `usp_Load[TableName]Table` | usp_LoadCustomerReportTable |
| Monitoring | Name should describe the process. | `usp_[DescribeProcess]` | usp_DisplayTablesNotLoading |
#### Tables
| Table Type | Naming Convention | Naming Model | Example |
|----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------|--------------------------|
| Common Model Tables | Common model tables should be nearly identical to the warehouse tables they mimic but stripped of all | Dim[BaseDimensionTableName] | DimCustomer |
| | indexes, keys, and most constraints with the exception of the index on the column used to match records. | | |
| Conventional Staging Tables | Table names should describe what the data is and end in the word "Data". | [DescriptiveName]Data | CustomerData |
| | Additionally, tables should be assigned to a schema that identifies the source system of the data. | | |
| Dimension Tables | All dimension tables should start with the word "Dim". | Dim[YourDimensionName] | DimCustomer |
| Fact Tables | All fact tables should start with the word "Fact". | Fact[YourFactTableName] | FactCustomerDetail |
| Junk Dimension Tables | All junk dimension names should end in the word "Information" in addition to inheriting the convention for naming standard dimensions. | Dim[YourDimensionName]Information | DimCustomerInformation |
| MDM Tables | MDM tables should be named after the data they are storing and end in MasterData. | [Dimension]MasterData | CustomerMasterData |
| Reporting Tables | Reporting tables should be obviously named with spaces between words. | no model | Customer Activity Report |
| Staging Tables Designed For Historical Loads | Tables that are specially modified to load large amounts of historical data should share | [DescriptiveName]DataHistorical | CustomerDataHistorical |
| | the name of the stage table that manages the normal batch process with the addition of the word "Historical" at the end. | | |
### ETL Reference Architectureure
![[2022-07-07-10-15-08.png]]
#### Pull Data
In general, all processes that acquire data from outside systems pull and clean data in the same step.
#### Data Cleansing
As with any ETL process, some of the data that comes in to the system needs to be cleaned and standardized before it is loaded. Cleaning dates is a common task.
#### Load Warehouse
The actual warehouse load processes is a combination of loading tables and auditing processes that are used to check and make sure everything loaded ok.
As you build ETL processes, you will need to make sure to add code that takes these processes into account.
All dimensions are processed first; then all fact tables are processed.
Fact table loads communicate back to staging to take note of what exactly was loaded to the warehouse.
The Finalize and Audit processes checks staging records and reports discrepancies.
#### Load Reporting Tables
This job is for those processes that load de-normalized reporting tables that reside in the Reporting database only.
The reporting tables that live in ODS are only for data professionals and are loaded by different processes.
#### Monitor DW Table Loads
Check Tables Loads will report on unusual record load amounts.
Either more than usual or none at all over a three day period are the trigger events.
In the final step, volumetric data is dumped into a de-normalized reporting table for easy digest.
### Standard Schemas And Their Definitions
| Schema | Database | Description | Definition |
|--------|----------|------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| dw | EDW | Data Warehouse | All base data warehouse tables that are physically implemented belong in the dw schema. |
| | | | This is an opinionated approach. |
| | | | The whole point of an EDW is that it is all supposed to be one integrated thing. |
| | | | I have seen data warehouses implemented in a manner that uses schemas to group logically related objects. Mediocre. |
| | | | There is no law that says you can't do that, but it violates the spirit of a data warehouse. |
| cm | ODS | Common Model | All common model tables go in the WAIT FOR IT.....the common model schema. |
| rpt | ODS | Report | Any table that records de-normalized reporting information in support of load management goes in the rpt schema. |
| vol | ODS | Volumetrics | Tables that are involved in statistical control processes used to manage the EDW load go in the vol schema. |
| | | | The difference between vol and rpt is that vol tables are normalized in support of small applications that perform statistical analysis. |
| | | | rpt tables are just flat tables to dump de-normalized data into. |
| mdm | ODS | Master Data Management | Any tables that are involved in the processing of mdm data but are not staging tables go in the mdm schema. |
| ms | ODS | Multi-System | Stage tables that combine data from more than one system go in the ms schema. |
### Feedback and Control Systems
#### Passive Monitoring System Implementation
| Object Name | Object Type | Function |
|---------------------------------|-------------|-----------------------------------------------------|
| vol.Tables | Table | Holds a list of fact tables that the load monitors. |
| vol.LoadObservations | Table | Holds the history of observations for the load. |
| usp_RecordRowCounts | Stored Proc | Populates the volumetric tables. |
| usp_TableLoadMonitoring | Stored Proc | Sends out email alerts of anomalous events. |
| usp_LoadTableLoadReportingTable | Stored Proc | Populates a flat table of the nightly load report. |
### Security Access Model
| User/Database | EDW (dw schema) | EDW (dbo schema) | ODS (all schemas) | Reporting (all schemas) |
|---------------|-----------------|------------------|-------------------|--------------------------|
| Data Engineer | rwx | rwx | rwx | rwx |
| Data Analyst | no access | r | no access | r |
| Business User | no access | no access | no access | r |
---
[Database]: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-database-transact-sql
[Schema]: https://docs.microsoft.com/en-us/sql/relational-databases/security/authentication-access/create-a-database-schema
[Global Temporary Table]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables
[Local Temporary Table]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables
[File Table]: https://docs.microsoft.com/en-us/sql/relational-databases/blob/filetables-sql-server
[Memory-optimized SCHEMA_AND_DATA Table]: https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/introduction-to-memory-optimized-tables
[Memory-optimized SCHEMA_ONLY Table]: https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/defining-durability-for-memory-optimized-objects
[Temporal Table]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
[Disk-Based Table]: https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/comparing-disk-based-table-storage-to-memory-optimized-table-storage
[Disk-Based Wide Table - SPARSE Column]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables#wide-tables
[Table Column]: https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql
[Table Column SPARSE]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns
[Columns Check Constraint]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/create-check-constraints
[Column Check Constraint]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/create-check-constraints
[Column Default Values]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-default-values-for-columns
[Table Primary Key]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/create-primary-keys
[Table Unique (Alternative) Key]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/create-unique-constraints
[Table Foreign Key]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/create-foreign-key-relationships
[Table Clustered Index]: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described
[Table Non Clustered Index]: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described
[DDL Trigger]: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-trigger-transact-sql
[DML Trigger]: https://docs.microsoft.com/en-us/sql/relational-databases/triggers/dml-triggers
[Logon Trigger]: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-trigger-transact-sql
[View]: https://docs.microsoft.com/en-us/sql/relational-databases/views/views
[Indexed View]: https://docs.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views
[Statistic]: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-statistics-transact-sql
[Stored Procedure]: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-procedure-transact-sql
[Scalar User-Defined Function]: https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/create-user-defined-functions-database-engine#Scalar
[Table-Valued Function]: https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/create-user-defined-functions-database-engine#TVF
[Synonym]: https://docs.microsoft.com/en-us/sql/relational-databases/synonyms/synonyms-database-engine
[Sequence]: https://docs.microsoft.com/en-us/sql/relational-databases/sequence-numbers/sequence-numbers
[CLR Assembly]: https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration/assemblies/creating-an-assembly
[CLR Stored Procedures]: https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/clr-stored-procedures
[CLR Scalar User-Defined Function]: https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-functions
[CLR Table-Valued Function]: https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-table-valued-functions
[CLR User-Defined Aggregates]: https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-functions/clr-user-defined-aggregates
[CLR User-Defined Types]: https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-types/clr-user-defined-types
[CLR Triggers]: https://docs.microsoft.com/en-us/sql/relational-databases/triggers/create-clr-triggers
[Linked Server]: https://docs.microsoft.com/en-us/sql/relational-databases/linked-servers/linked-servers-database-engine