Document data modeling to avoid write skew anomalies

In a previous post, I explained that a serializable isolation level is not the only solution to avoid write skew anomalies. Normalizing the database schema, I converted the read-write conflict into a write-write conflict. This strategy allows us to achieve serialization within the read-committed isolation level using a SELECT FOR UPDATE.
I used Martin Kleppmann’s example of doctors’ on-call shifts to demonstrate write skew anomalies.

Document data modeling is often confused with denormalization, but this is inaccurate. I will store the doctors’ on-call shifts in a JSONB document in this post. Using a single table as a document database allows me to implement the same solution as the normalized schema to prevent write skew.

I create a single table to store the shifts. The doctors are stored in a JSONB array containing their names and on-call status:

yugabyte=# create table shifts (
 shift_id int primary key
 ,doctors jsonb
);
CREATE TABLE

yugabyte=# insert into shifts (shift_id, doctors)
values (1, '[
 {"name": "Alice", "on_call": true},
 {"name": "Bob",   "on_call": true}
]'::jsonb)
returning shift_id, jsonb_pretty (doctors) as doctors
;

 shift_id |         doctors
----------+--------------------------
        1 | [                       +
          |     {                   +
          |         "name": "Alice",+
          |         "on_call": true +
          |     },                  +
          |     {                   +
          |         "name": "Bob",  +
          |         "on_call": true +
          |     }                   +
          | ]
(1 row)

INSERT 0 1

Bob starts a transaction and checks the on-call doctors for shift 1:

yugabyte=# -- Bob

yugabyte=# begin isolation level read committed;
BEGIN

yugabyte=*# select shift_id, doctor
 from shifts
    , jsonb_array_elements(doctors) as doctor
 where shift_id = 1 and doctor->>'on_call' = 'true'
 for update
;

 shift_id |               doctor
----------+------------------------------------
        1 | {"name": "Alice", "on_call": true}
        1 | {"name": "Bob", "on_call": true}
(2 rows)

As there are two on-call doctors, Bob can release himself, and another doctor will remain:

yugabyte=# -- Bob

yugabyte=*# update shifts
set doctors = jsonb_set(
  doctors,
  (
    -- get the array index ( 1-based provided by ordinality to 0-based for json_set)
    select format('{%s,on_call}', idx - 1)::text[]
    from jsonb_array_elements(doctors) with ordinality arr(doc, idx)
    where doc->>'name' = 'Bob'
  ),
  'false'::jsonb
)
where shift_id = 1
returning shift_id, jsonb_pretty (doctors) as doctors;


 shift_id |         doctors
----------+--------------------------
        1 | [                       +
          |     {                   +
          |         "name": "Alice",+
          |         "on_call": true +
          |     },                  +
          |     {                   +
          |         "name": "Bob",  +
          |         "on_call": false+
          |     }                   +
          | ]
(1 row)

UPDATE 1

Updating a JSON document in an SQL database is not straightforward. I will do the same in a future blog post with a NoSQL API to see if it is easier.

This configuration is valid and Bob can commit his changes. However, at the same time Alice has the same idea and checks the on-call doctors for the same shift:

yugabyte=# -- Alice

yugabyte=# begin isolation level read committed;
BEGIN

yugabyte=*# select shift_id, doctor
 from shifts
    , jsonb_array_elements(doctors) as doctor
 where shift_id = 1 and doctor->>'on_call' = 'true'
 for update
;

...

Because of transaction isolation, Alice cannot see Bob’s ongoing changes. However, when she uses SELECT FOR UPDATE, the database waits to see if Bob’s transaction will commit or roll back so that an accurate result can be returned to her.

Bob terminates his transaction with a commit to make his changes durable and visible to others:

yugabyte=# -- Bob

yugabyte=*# commit;
COMMIT

Immediately, Alice’s transaction continues and displays the current state of the database:

yugabyte=# -- Alice

yugabyte=# select shift_id, doctor
yugabyte-#  from shifts
yugabyte-#     , jsonb_array_elements(doctors) as doctor
yugabyte-#  where shift_id = 1 and doctor->>'on_call' = 'true'
yugabyte-#  for update
yugabyte-# ;

...

 shift_id |               doctor
----------+------------------------------------
        1 | {"name": "Alice", "on_call": true}
(1 row)

Alice sees that she is the only on-call doctor for this shift. She must cancel her attempt step back from being on-call:

yugabyte=# -- Alice

yugabyte=# rollback;
ROLLBACK

The state of the database is consistent, with one doctor on-call:

yugabyte=# select shift_id, jsonb_pretty (doctors) as doctors
from shifts
;
 shift_id |         doctors
----------+--------------------------
        1 | [                       +
          |     {                   +
          |         "name": "Alice",+
          |         "on_call": true +
          |     },                  +
          |     {                   +
          |         "name": "Bob",  +
          |         "on_call": false+
          |     }                   +
          | ]
(1 row)

yugabyte=#

Understanding how your database works, mainly locking and conflict detection, is essential for getting consistent results with high performance. YugabyteDB maintains runtime compatibility with PostgreSQL, so you won’t need to learn new behaviors. Contrary to others (like Spanner, CockroachDB, or AWS Aurora DSQL whose compatibility is limited to the wire protocol and dialect), YugabyteDB operates like PostgreSQL, supporting all isolation levels, explicit locking, and the same data types, including JSON and JSONB.

At the logical level, we identify two business entities: shifts and doctors, which are connected through a one-to-many relationship, the on-call status of a doctor for each shift. In this and the previous post, we have explored three data modeling options: two tables connected by a foreign key, a single doctor table with a shift attribute, and a single shift table with doctors in an embedded document. The appropriate choice depends on data volume, access patterns, and performance needs.

I used name attributes like “Bob” and “Alice,” which might introduce data duplication since doctors are likely stored in another collection or table. However, such applications typically use immutable keys, like UUIDs, and are not subject to update anomalies, which is the main reason for normalization.

Embedding documents can provide a consistent and efficient alternative to the traditional normal forms approach. Document databases are particularly well-suited for scenarios where related data is frequently accessed together and must be locked together, as in this example.

Source link
lol

Document data modeling to avoid write skew anomalies

By stp2y

Leave a Reply Cancel reply