Apache Wayang Makes Data Processing A Cross-platform Job


Most data processing frameworks are built on a single execution engine. Not Apache Wayang.

Last week, the Apache Software Foundation (ASF) debuted the Wayang data processing framework as a top-level Apache project.

Named after the Indonesian puppet theater, Wayang is a data processing framework that unifies sets of data by orchestrating multiple data processing frameworks.

For an organization with a lot of data systems, this software is a bit of a Swiss Army knife, able to execute different types of jobs depending on the needs at hand. It speaks both SQL and Java.

“In Wayang, users can specify any data processing application using one of Wayang’s APIs and then Wayang can choose the data processing platform(s), e.g., Postgres or Apache Spark, that best fits the application,” the GitHub site explains. “Wayang will orchestrate the execution, thereby hiding the different platform-specific APIs and coordinating inter-platform communication.”

Wayang can be used to run federated SQL queries across different relational databases. Or, it can pick the most cost-effective processing platform for a given job, and then run that job. For optimal results, it can even break up a job to run across multiple platforms.
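To make the federated-query idea concrete, here is a minimal, hand-rolled Python sketch; it does not use Wayang’s API. Two in-memory SQLite databases stand in for separate relational systems (the table names and rows are invented). Each database runs its own SQL sub-query, and a coordinating process joins the partial results, which is the kind of cross-database work Wayang automates.

```python
import sqlite3

# Two independent databases stand in for separate relational systems.
# Hypothetical data; in practice these might be, say, Postgres and SQLite.
customers_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

customers_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
customers_db.executemany("INSERT INTO customers VALUES (?, ?)",
                         [(1, "Ada"), (2, "Grace")])

orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?)",
                      [(1, 30.0), (1, 12.5), (2, 7.0)])


def federated_totals():
    """Push work down to each source, then join the partial results."""
    # Sub-query 1: the aggregation runs inside the orders database.
    totals = dict(orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
    # Sub-query 2: a plain scan runs inside the customers database.
    names = customers_db.execute("SELECT id, name FROM customers ORDER BY id")
    # The join happens in the coordinating process, outside both databases.
    return [(name, totals.get(cid, 0.0)) for cid, name in names]


print(federated_totals())  # [('Ada', 42.5), ('Grace', 7.0)]
```

A framework like Wayang would make these pushdown and join-placement decisions itself instead of leaving them to hand-written glue code.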

“Users face a zoo of specialized platforms to execute data analytics. They typically run their data analytics at a higher cost than necessary, as selecting the right platform is daunting,” some of the technology’s creators wrote in a 2023 paper, explaining the need for the technology. “Furthermore, modern applications often require to execute data analytics that goes beyond the limits of a single platform, making the selection of platforms even more difficult.”

(The originator of Wayang, Dr. Jorge-Arnulfo Quiané-Ruiz, died unexpectedly in 2023.)

The new project status, “combined with strong community momentum, positions us to enhance the project and reach even more developers,” said Zoi Kaoudi, Apache Wayang PMC Chair, in a statement.

Wayang’s Three-Layer Abstraction

Wayang’s three-layer architecture wedges an abstraction between an application and the supporting data systems, where it can make rule-based decisions about which systems should execute a given job, and then orchestrate those jobs.

Data processing happens at the platform layer, but platform selection is done through Wayang.

In this setup, the application holds business logic as usual, but the underlying core layer acts as an intermediary, translating application logic into an intermediate representation called a “Wayang plan.”
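The idea of an intermediate, platform-agnostic plan can be sketched in a few lines of Python. The LogicalOp class and its operator kinds are hypothetical stand-ins, not Wayang’s actual plan classes: the plan records only what to compute, and run_locally is just one possible backend that interprets it in a single process.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class LogicalOp:
    """A platform-agnostic operator: what to compute, not where to compute it."""
    kind: str                 # "filter", "map", or "reduce"
    fn: Callable[..., Any]


def build_plan():
    # Hypothetical application logic: the sum of squares of even numbers.
    return [
        LogicalOp("filter", lambda x: x % 2 == 0),
        LogicalOp("map", lambda x: x * x),
        LogicalOp("reduce", lambda a, b: a + b),
    ]


def run_locally(plan, data):
    """One possible backend: interpret the plan inside a single Python process."""
    for op in plan:
        if op.kind == "filter":
            data = [x for x in data if op.fn(x)]
        elif op.kind == "map":
            data = [op.fn(x) for x in data]
        elif op.kind == "reduce":
            data = reduce(op.fn, data)
        else:
            raise ValueError(f"unknown operator kind: {op.kind}")
    return data


plan = build_plan()
print([op.kind for op in plan])        # ['filter', 'map', 'reduce']
print(run_locally(plan, range(1, 7)))  # 56 (4 + 16 + 36)
```

Because the plan is plain data, an optimizer could inspect it and hand each operator to a different engine instead of run_locally.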

A cross-platform optimizer automates data system selection. The user doesn’t have to worry about the specific platform being used for the task.

This allows the application to use and mix multiple processing engines in one pipeline. For instance, Apache Flink, Apache Spark and TensorFlow can all be used together in a single job. Wayang then orchestrates the work.

One Workflow, Multiple Engines

Data is stored within a single repository, and performance can be enhanced by selectively offloading data to more powerful engines.

Take, for example, a common machine learning task: executing a stochastic gradient descent (SGD) algorithm. This algorithm is essentially a set of Map/Reduce functions interspersed with some parsing work.

The Wayang query optimizer can determine which of these jobs would best be executed on Apache Spark, and which would be done more efficiently in a single Java process.
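The following Python sketch shows how SGD breaks down into the parse, map and reduce steps mentioned above, for a one-parameter linear model. The data, learning rate and batch size are invented for illustration, and everything runs in one process; in Wayang, the optimizer would decide where each step executes.

```python
import random
from functools import reduce

# Hypothetical raw input: "x,y" text records lying on the line y = 3x.
RAW = [f"{x},{3 * x}" for x in range(1, 11)]


def parse(line):
    """Parsing step: turn a text record into a numeric (x, y) pair."""
    x, y = line.split(",")
    return float(x), float(y)


def gradient(w, point):
    """Map step: gradient of the squared error 0.5 * (w*x - y)**2 for one point."""
    x, y = point
    return (w * x - y) * x


def sgd(lines, lr=0.005, batch_size=4, steps=200, seed=0):
    rng = random.Random(seed)
    points = list(map(parse, lines))            # parse every record once
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(points, batch_size)  # draw a random mini-batch
        grads = map(lambda p: gradient(w, p), batch)  # map: per-point gradients
        total = reduce(lambda a, b: a + b, grads)     # reduce: sum them
        w -= lr * total / batch_size            # gradient step on the weight
    return w


print(sgd(RAW))  # converges toward the true slope, 3.0
```

The parse and map steps are embarrassingly parallel, while the per-iteration update is tiny, which is why splitting such a job across a cluster engine and a single process can pay off.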

The optimizer translates the Wayang plan into a specific workflow, weighing factors such as operator costs and data movement costs, with the goal of minimizing total cost.
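As a toy illustration of this trade-off, the Python sketch below assigns one platform to each operator in a chain-shaped pipeline and minimizes operator cost plus data movement cost with dynamic programming. All cost numbers are made up, and Wayang’s real optimizer handles general plans rather than simple chains, so this shows the idea, not Wayang’s algorithm.

```python
# Hypothetical per-operator costs on each platform (arbitrary units).
OPERATOR_COST = {
    "parse":   {"java": 2.0,  "spark": 7.0},
    "compute": {"java": 40.0, "spark": 9.0},
    "collect": {"java": 1.0,  "spark": 5.0},
}
# Hypothetical cost of moving intermediate data from one platform to another.
MOVE_COST = {("java", "spark"): 4.0, ("spark", "java"): 3.0}


def move_cost(src, dst):
    return 0.0 if src == dst else MOVE_COST[(src, dst)]


def optimize(pipeline, platforms=("java", "spark")):
    """Pick one platform per operator, minimizing operator + movement cost.

    best[p] holds (total cost, assignment) of the cheapest plan for the
    operators seen so far that ends on platform p.
    """
    best = {p: (OPERATOR_COST[pipeline[0]][p], [p]) for p in platforms}
    for op in pipeline[1:]:
        best = {
            p: min(
                (cost + move_cost(prev, p) + OPERATOR_COST[op][p], plan + [p])
                for prev, (cost, plan) in best.items()
            )
            for p in platforms
        }
    return min(best.values())


cost, plan = optimize(["parse", "compute", "collect"])
print(plan, cost)  # ['java', 'spark', 'java'] 19.0
```

With these numbers, the cheapest plan runs the heavy middle step on Spark and the light first and last steps in a Java process, paying twice for data movement because the savings outweigh it.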

Costs can be measured in terms of energy consumption or the compute cost of the runtime execution. By default, Wayang uses linear cost formulas, but the user can plug in their own optimizer, perhaps a machine learning (ML)-based one.
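A linear cost formula has the form cost(n) = fixed + per_unit * n. The Python sketch below uses invented coefficients, not Wayang’s calibrated values, to show why such formulas favor a single Java process for small inputs and Spark for large ones, and where the crossover lies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCost:
    """cost(n) = fixed + per_unit * n, e.g. startup time plus per-record time."""
    fixed: float
    per_unit: float

    def __call__(self, n):
        return self.fixed + self.per_unit * n


# Hypothetical calibration in milliseconds; n is thousands of input records.
PLATFORMS = {
    "java":  LinearCost(fixed=100, per_unit=20),    # cheap to start, slow per record
    "spark": LinearCost(fixed=10_000, per_unit=2),  # costly to start, fast per record
}


def cheapest(n):
    """Name of the platform with the lowest estimated cost for input size n."""
    return min(PLATFORMS, key=lambda name: PLATFORMS[name](n))


def break_even(a, b):
    """Input size where two linear cost lines cross (slopes must differ)."""
    return (b.fixed - a.fixed) / (a.per_unit - b.per_unit)


print(cheapest(10), cheapest(5_000))                      # java spark
print(break_even(PLATFORMS["java"], PLATFORMS["spark"]))  # 550.0
```

Here the two lines cross at n = 9,900 / 18 = 550, i.e. about 550,000 records; below that the Java process wins on startup cost, above it Spark wins on throughput.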

Figure: The Wayang optimizer (source: Wayang).

The frameworks that Wayang presently supports:

  • Apache Flink
  • Apache Giraph
  • GraphChi
  • Java Streams
  • JDBC-Template
  • Postgres
  • Apache Spark
  • SQLite3

Commercialization of Wayang

One of the main committers of the project, Kaustubh Beedkar, helped launch a company around the technology, Scalytics. Scalytics uses Wayang as the basis for the federated data processing feature in its Scalytics Streaming Intelligence platform, which is marketed as a way to extend the Databricks platform out to edge platforms.

In effect, Wayang can be used to create a “virtual data lake,” according to the company.

“The ultimate goal is to replicate the success of [database systems] for cross-platform applications: users formulate platform-agnostic data analytic tasks and an intermediate system decides on which platforms to execute each subtask with the goal of minimizing costs such as runtime or monetary cost,” noted the company literature.

In addition to Wayang, the ASF also announced that the Apache Artemis messaging platform is now an Apache top-level project (TLP).
