Your Web News in One Place

Help Webnuz

Referal links:

Sign up for GreenGeeks web hosting
October 13, 2022 03:01 pm GMT

Is CosmosDB a new SQL database? Is CitusDB a distributed SQL database? Has Hyperscale vanished in the Hyperspace?

Azure Cosmos DB is a fully managed NoSQL database for modern app development is what I've read from the documentation. However, Azure Cosmos DB for PostgreSQL has just been announced like a new service in Azure. PostgreSQL is supposed to be a SQL database, right? I'll test that in this series of posts, where I run the simplest SQL schema, the old EMP/DEPT, showing what any Relational Database for OLTP should support: unique constraints and foreign keys.

There's a free trial, with no credit card required and a 7 days limit

In the "Recommended APIs", NoSQL is recommended, but PostgreSQL is there:
Recommended API

I didn't find immediately how to connect. The "Connectivity Method" has a "copy" button but it puts Public access (allowed IP addresses) in my clipboard, which is not very useful:
Overview

No worry, all is in the "Connection String" pane:
Connect strings

Nothing new, this is Hyperscale (Citus)

This Azure Cosmos DB for PostgreSQL is actually CitusDB. This new service is a marketing rename from Hyperscale. I've been working a lot with Oracle Database. Renaming features by marketing is not new. We can see it from the https://learn.microsoft.com/en-us/azure/postgresql/hyperscale/overview link which is an HTTP redirect to https://learn.microsoft.com/en-us/azure/cosmos-db/postgresql/introduction

Distributed tables

Great, I didn't test CitusDB yet in this blog post series about SQL features on distributed databases. Let's create the same sample schema:

citus=> CREATE TABLE dept (  deptno integer NOT NULL,  dname text,  loc text,  description text,  CONSTRAINT pk_dept PRIMARY KEY (deptno));CREATE TABLEcitus=> CREATE TABLE emp (  empno integer generated by default as identity (start with 10000) NOT NULL,  ename text NOT NULL,  job text,  mgr integer,  hiredate date,  sal integer,  comm integer,  deptno integer NOT NULL,  email text,  other_info json,  CONSTRAINT pk_emp PRIMARY KEY (empno),  CONSTRAINT emp_email_uk UNIQUE (email),  CONSTRAINT fk_deptno FOREIGN KEY (deptno) REFERENCES dept(deptno),  CONSTRAINT fk_mgr FOREIGN KEY (mgr) REFERENCES emp(empno));CREATE TABLE

I start to distribute the tables, following the CitusDB documentation:

citus=> SELECT create_distributed_table('dept', 'deptno'); create_distributed_table--------------------------(1 row)citus=> SELECT create_distributed_table('emp', 'empno');ERROR:  cannot distribute relation: empDETAIL:  Distributed relations must not use GENERATED ... AS IDENTITY.citus=>

Ok, this standard SQL feature exists in PostgreSQL since version 12 (it is also in YugabyteDB even with PostgreSQL 11.2 compatibility by the way) but not it is not supported in CitusDB. Let's try with serial:

citus=> drop table emp;DROP TABLEcitus=> CREATE TABLE emp (  empno serial NOT NULL,  ename text NOT NULL,  job text,  mgr integer,  hiredate date,  sal integer,  comm integer,  deptno integer NOT NULL,  email text,  other_info json,  CONSTRAINT pk_emp PRIMARY KEY (empno),  CONSTRAINT emp_email_uk UNIQUE (email),  CONSTRAINT fk_deptno FOREIGN KEY (deptno) REFERENCES dept(deptno),  CONSTRAINT fk_mgr FOREIGN KEY (mgr) REFERENCES emp(empno));citus=> SELECT create_distributed_table('emp', 'empno');ERROR:  cannot create foreign key constraintDETAIL:  Foreign keys are supported in two cases, either in between two colocated tables including partition column in the same ordinal in the both tables or from distributed to reference tablescitus=>

Another problem, very similar to what I have seen in the first post of this series, we cannot have referential integrity between distributed tables. In this example, dept is a good candidate for a broadcasted reference table, but OLTP system of records have relationships like ORDERS/ITEMS, ACCOUNT/TRANSACTION, CUSTOMER/PAIEMENT. Let's continue with this limitation of not distributing the parent table.

Foreign key limited to reference tables

I change the parent table dept table to not distribute it, as it is not supported for referential integrity, and try to distribute the child table emp:

citus=> SELECT undistribute_table('dept',true);NOTICE:  creating a new table for public.deptNOTICE:  moving the data of public.deptNOTICE:  dropping the old public.deptNOTICE:  renaming the new table to public.dept undistribute_table--------------------(1 row)citus=> SELECT create_reference_table('dept');NOTICE:  local tables that are added to metadata automatically by citus, but not chained with reference tables via foreign keys might be automatically converted back to postgres tablesHINT:  Executing citus_add_local_table_to_metadata($$public.emp$$) prevents this for the given relation, and all of the connected relations create_reference_table------------------------(1 row)citus=> SELECT create_distributed_table('emp', 'empno');ERROR:  cannot create constraint on "emp"DETAIL:  Distributed relations cannot have UNIQUE, EXCLUDE, or PRIMARY KEY constraints that do not include the partition column (with an equality operator if EXCLUDE).

Ok, same problem as with many New SQL sharded databases that are not Distributed SQL: anything that involves cross-shard transactions is unsupported. Foreign keys and Unique constraints cannot be created. There's no global integrity enforced by the database. This is annoying in OLTP system of records where there is, in addition to the generated primary key, multiple natural keys. Those need to be validated to avoid duplicates, especially on highly available databases whre transactions may be retried.

Let's drop all those constraints to see if we can go further:

citus=> alter table emp drop constraint emp_email_uk;ALTER TABLEcitus=> alter table emp drop constraint fk_mgr;ALTER TABLEcitus=> SELECT create_distributed_table('emp', 'empno'); create_distributed_table--------------------------(1 row)

Good, this works. But I had to get rid about foreign key (except to the mostly static reference tables that can be distributed) and all unique secondary indexes.

No Serializable alternative

Without Foreign key, if the application wants to avoid data corruption in a scalable way, it needs to run in Serializable isolation level.

Here is what could be a race condition, one user dropping a manager after checking it has no employees, another user adding an employee:

citus=> -- add the boss in department 10citus=> insert into dept(deptno) values (10);INSERT 0 1citus=> insert into emp(empno,ename,deptno,mgr) values(1, 'boss',10, null);INSERT 0 1citus=> begin transaction isolation level serializable;BEGINcitus=*> -- add an employee referring to the bosscitus=*> insert into emp(empno,ename,deptno,mgr) values(2, 'employee',10, 1);INSERT 0 1-- connect a concurrent sessioncitus=*> \! psqlpsql (13.7, server 14.5)citus=> begin transaction isolation level serializable;BEGINcitus=*> -- check if boss has employeescitus=*> select * from emp; empno | ename | job | mgr | hiredate | sal | comm | deptno | email | other_info-------+-------+-----+-----+----------+-----+------+--------+-------+------------     1 | boss  |     |     |          |     |      |     10 |       |(1 row)citus=*> -- remove the boss as it has no employeescitus=*> delete from emp where empno=1;DELETE 1citus=*> commit;COMMIT-- back to session one commiting its transactioncitus=> \qcitus=*> commit;COMMIT-- final state: one employee with inexistent bosscitus=> select * from emp; empno |  ename   | job | mgr | hiredate | sal | comm | deptno | email | other_info-------+----------+-----+-----+----------+-----+------+--------+-------+------------     2 | employee |     |   1 |          |     |      |     10 |       |(1 row)

This is a surprising result: no error but orphan child. This should not happen with serializable isolation level.

We should not have to read the documentation to know that an successful command actually did nothing. I can find it in the list of CitusDB limitations: https://docs.citusdata.com/en/v11.1/admin_guide/table_management.html?highlight=serializable#limitations.
I also see No support for tuple locks there, so no need to test for other alternatives to foreign keys. Locking whole tables is not a solution.

Being Distributed and PostgreSQL compatible is not easy. There are not a lot of players in this area which, like YugabyteDB, support all global constraints, referential integrity, isolation levels like PostgreSQL does. Actually, I know only YugabyteDB doing that but please comment if you think there are others.

CitusDB, like many sharded databases, is good for datawarhouses. They partition the data first and distribute the queries to a few shards, possibly each replicated with a primary-standby. This is sharding on top of monolithic SQL databases.

Distributed SQL like YugabyteDB, on the opposite, provide all SQL features on top of a distributed and replicated storage, which is seen as one global database with global ACID transactions. I detailed this architecture in https://www.yugabyte.com/blog/distributed-sql-yugabytedb-two-layer-architecture/


Original Link: https://dev.to/yugabyte/is-cosmosdb-a-new-sql-database-is-citusdb-a-distributed-sql-did-hyperscale-vanished-in-the-hyperspace-472d

Share this article:    Share on Facebook
View Full Article

Dev To

An online community for sharing and discovering great ideas, having debates, and making friends

More About this Source Visit Dev To