
I once consulted at a company with a very large monolithic SQL Server. It actually wasn't Postgres but let's pretend it was.

It had been around for decades and over time it had ended up being used for all sorts of things within the company. In fact, it was more or less true that every application and business process within the whole company stored its data within this database.

A key issue was that, because many different applications queried this database and a huge number of processes and procedures inserted or updated data within it, queries would sometimes break when an upstream insert/update process was amended, when a new one was added that violated application-level invariants, or when a normal process behaved differently in the presence of bad data.

It was very difficult to work out what had happened, because often everything you looked at had been written a decade before you arrived, and the employees responsible had long since left the company.

Would it be possible to capture changes from a Postgres database in some kind of DAG, so that you could find out things like:

- What processes are inserting, updating or deleting data, and how have they behaved historically? For example, do they ever behave differently?

- How are different applications querying this data? Are there any statistics about their queries that hold generally? How have those statistics changed over time?

I don't know whether there is prior art here, or what kind of approach might allow a tool like this to be built.

(I've thought of making something like this before but I think this is an area in which you'd want to be a core Postgres engineer to make good choices.)



Logical replication in Postgres contains all of the information about each change statement (insert/update/delete) needed to logically recreate the same state in another database.

You won't get client-level provenance data with each change...

However you could hack around that. The logical replication stream can also include informational messages from the "pg_logical_emit_message" function to insert your own metadata from clients. It might be possible to configure your clients to emit their identifier at the beginning of each transaction.
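As a sketch of that idea (table and client names are illustrative): `pg_logical_emit_message(transactional, prefix, content)` writes a message into the WAL, and passing `true` for the first argument makes it transactional, so it only appears in the stream if the surrounding transaction commits.

```sql
BEGIN;

-- Emit the client's identity so it travels with this transaction's
-- changes in the logical replication stream.
SELECT pg_logical_emit_message(true, 'client_id', 'billing-service v2.3');

UPDATE accounts SET balance = balance - 100 WHERE id = 42;

COMMIT;
```

A consumer of the replication stream (configured to receive messages) would then see the `client_id` message immediately before the row changes from that transaction.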


I’m not sure how to handle queries, but for inserts/updates I have a column that tracks the event source (last updated by). Maybe this is an anti-pattern - I’d love a more robust solution.
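One way to make that column harder to get wrong is to populate it from a trigger rather than trusting every writer to set it. A Postgres sketch, with hypothetical table and setting names, where each application declares its identity via a session variable:

```sql
ALTER TABLE orders
    ADD COLUMN updated_by text,
    ADD COLUMN updated_at timestamptz;

-- Each application runs once per session (or per transaction):
--   SET app.client_id = 'inventory-worker';

CREATE FUNCTION stamp_updated_by() RETURNS trigger AS $$
BEGIN
    -- Second argument 'true' means: return NULL instead of erroring
    -- when the setting has not been set by the client.
    NEW.updated_by := current_setting('app.client_id', true);
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_stamp
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION stamp_updated_by();
```

If you want to enforce it, the trigger could instead RAISE an exception when the setting is absent.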


Technically, logical replication captures everything done by every client, and if you are careful with triggers you can also track everything yourself, using a DDL/DML capture table (DCL too, if you're worried!).

These approaches work on basically every SQL database that uses a WAL or supports triggers.
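For the DDL side, Postgres event triggers can feed a capture table like this (a sketch; table and function names are illustrative):

```sql
CREATE TABLE ddl_audit (
    occurred_at     timestamptz DEFAULT now(),
    username        text        DEFAULT current_user,
    command_tag     text,
    object_identity text
);

-- pg_event_trigger_ddl_commands() is only callable from an event
-- trigger function; it lists each DDL command in the statement.
CREATE FUNCTION log_ddl() RETURNS event_trigger AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO ddl_audit (command_tag, object_identity)
        VALUES (r.command_tag, r.object_identity);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER ddl_audit_trigger
    ON ddl_command_end EXECUTE FUNCTION log_ddl();
```

Row-level DML capture works the same way with ordinary per-table triggers writing into an audit table.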

For your specific question, I have used a trigger approach many times in SQL Server, but it tends to slow things down if you are logging every query, so designing an insertion mechanism that doesn't bog down production is tricky; you might want to perform some sampling.


You get a lot just by making sure each application has its own user access.


This seems like the right approach to me.

An approach I've taken is temporal tables with Application and UpdatedBy fields. That gives you a permanent record of every change: what application did it, what user performed the action, and at what time. It also lets you query the database as if you were querying it at that point in time. You can add triggers that fail CRUD operations when those fields are not set, if you want to get really paranoid.
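In SQL Server terms, that combination might look like the following sketch (table and column names are illustrative):

```sql
CREATE TABLE dbo.Orders (
    Id          int IDENTITY PRIMARY KEY,
    Amount      decimal(10,2) NOT NULL,
    Application nvarchar(100) NOT NULL,
    UpdatedBy   nvarchar(100) NOT NULL,
    -- System-versioning columns maintained automatically by the engine:
    ValidFrom   datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo     datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));

-- Query the table as it looked at a point in time:
SELECT Id, Amount, Application, UpdatedBy
FROM dbo.Orders
FOR SYSTEM_TIME AS OF '2024-01-01T00:00:00'
WHERE Id = 42;
```

Every update or delete moves the old row version into the history table, which is where the storage overhead mentioned below comes from.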

There's a lot of overhead to this in terms of storage, so it's not suitable for high-throughput or cost-constrained transactional systems, but it's something for the toolbox.


Even if it doesn't, you can start by adding the application name to the connection string, and you could probably do something gnarly with triggers to write that to a table and get it pushed into the logical replication stream.
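In Postgres specifically, this is the built-in `application_name` setting (the service name below is illustrative):

```sql
-- Set via a libpq-style connection string:
--   host=db dbname=app user=svc application_name=billing-service
-- or at runtime:
SET application_name = 'billing-service';

-- The name is then visible per connection in pg_stat_activity,
-- alongside the currently executing query:
SELECT application_name, query FROM pg_stat_activity;

-- And a trigger can read it to stamp rows:
SELECT current_setting('application_name');
```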


yeah, for SQL Server the connection string has an ApplicationName property for this purpose, it's pretty useful :)


This is an interesting idea


I had the following idea recently: go through all scripts/programs that send queries to the db and append a comment to each query containing a unique id that links it back to the originating script/program. The query log should then show the comment with the id, so you can trace the origin.
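A tagged query might look like this (the file path and id format are illustrative); since the comment is part of the statement text, it survives into `pg_stat_activity` and the server query log:

```sql
SELECT o.customer_id, sum(o.amount)
FROM orders o
GROUP BY o.customer_id
/* query_id: reports/daily_summary.py#q1 */;
```

This is essentially what tools like sqlcommenter do automatically at the driver level.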


For the "I just need to emit and visualize a DAG" problem, at one point I wrote a python script that would filter for the relevant data, emit line-by-line mermaidJS-flavored markdown, and then shove that into an HTML file that imported the MermaidJS library.

MermaidJS lays out the DAG and visualizes it, and your browser gives you enough context to Ctrl-F for any interesting phrases you put in the labels.
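The emitted markdown for the writer/reader question in this thread might look like (node names are illustrative):

```mermaid
flowchart LR
    billing_service --> orders_table
    nightly_etl --> orders_table
    orders_table --> reporting_app
    orders_table --> finance_export
```

MermaidJS renders each `A --> B` line as an edge, so the Python side only has to print one line per observed writer/reader relationship.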


I think it's users + pgAudit.
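A minimal pgAudit setup, assuming the `pgaudit` extension is installed (role name illustrative):

```
# postgresql.conf
shared_preload_libraries = 'pgaudit'
pgaudit.log = 'write, ddl'     # log INSERT/UPDATE/DELETE and DDL statements
pgaudit.log_relation = on      # include the table name in each log entry
```

Combined with per-application users, you can also scope the auditing per role, e.g. `ALTER ROLE billing_app SET pgaudit.log = 'read, write';`, so the audit log records which user (and therefore which application) ran each statement.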



