April 24, 2019 11:25 pm

Databricks Open-Sources Delta Lake To Make Data Lakes More Reliable

Databricks, the company founded by the original developers of the Apache Spark big data analytics engine, today announced that it has open-sourced Delta Lake, a storage layer that brings ACID transactions to data lakes, making it easier to ensure data integrity as new data flows into an enterprise's repository.

TechCrunch reports: Delta Lake, which has long been a proprietary part of Databricks' offering, is already in production use by companies like Viacom, Edmunds, Riot Games and McGraw Hill. The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things. What's important to note here is that Delta Lake runs on top of existing data lakes and is compatible with the Apache Spark APIs.

The company is still looking at how the project will be governed in the future. "We are still exploring different models of open source project governance, but the GitHub model is well understood and presents a good trade-off between the ability to accept contributions and governance overhead," said Ali Ghodsi, co-founder and CEO at Databricks. "One thing we know for sure is we want to foster a vibrant community, as we see this as a critical piece of technology for increasing data reliability on data lakes. This is why we chose to go with a permissive open source license model: Apache License v2, same license that Apache Spark uses." To invite this community, Databricks plans to take outside contributions, just like the Spark project.
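To make the schema enforcement and snapshot features mentioned above more concrete, here is a minimal toy sketch in plain Python. It is not Delta Lake's API or implementation: the names `ToyTable`, `SchemaMismatch`, `append` and `snapshot` are hypothetical and exist only for illustration. The sketch assumes an in-memory, append-only commit log in which a batch is either committed whole or rejected whole, and in which a reader can ask for the table as of an earlier version.

```python
from dataclasses import dataclass, field


class SchemaMismatch(Exception):
    """Raised when a batch does not match the table schema (hypothetical name)."""


@dataclass
class ToyTable:
    # Illustrative stand-in for a table with an append-only commit log.
    schema: dict                               # column name -> Python type
    log: list = field(default_factory=list)    # one entry per committed batch

    def append(self, rows):
        rows = list(rows)
        # Validate the whole batch before committing anything,
        # so a single bad row rejects the entire batch.
        for row in rows:
            if set(row) != set(self.schema):
                raise SchemaMismatch(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise SchemaMismatch(f"{col!r} must be {typ.__name__}")
        self.log.append(rows)
        return len(self.log)                   # version number after this commit

    def snapshot(self, version=None):
        # Read the table as of a given version (default: latest).
        if version is None:
            version = len(self.log)
        return [row for batch in self.log[:version] for row in batch]
```

A short usage example under the same assumptions: `t = ToyTable(schema={"id": int, "name": str})`, then `t.append([{"id": 1, "name": "a"}])` commits version 1, a later batch containing `{"id": "x", "name": "c"}` raises `SchemaMismatch` and leaves the log unchanged, and `t.snapshot(1)` keeps returning the rows as of version 1 after further commits.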

Read more of this story at Slashdot.


Original Link: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/y_sec_a8W7Y/databricks-open-sources-delta-lake-to-make-delta-lakes-more-reliable


Slashdot

Slashdot was originally created in September of 1997 by Rob "CmdrTaco" Malda. Today it is owned by Geeknet, Inc.
