Spring XD

Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export. The project's goal is to simplify the development of big data applications.


NOTE: This project has been in EOL/EOS status since July 2017. Please review Spring Cloud Data Flow and the ecosystem-of-projects for further research, exploration, and use-case implementation.


Notice

Building upon Spring Boot and Spring Cloud capabilities, Spring XD has been redesigned as Spring Cloud Data Flow, a cloud-native orchestration service for composable microservice applications on modern runtimes. For more information and the reasons behind this redesign, please refer to Spring Cloud Data Flow's launch blog.

Spring XD's 1.3 GA will be the last release in the 1.x line and with this release, 1.x will be officially in maintenance mode, addressing only bug fixes. Spring XD is scheduled for End-of-Life, End-of-Availability, and End-of-Support by July 2017.

For any new feature requests or improvements, please submit them on GitHub issues. We welcome your help and feedback!

Benefits

Unified Platform

Spring XD is a unified platform for a fragmented Hadoop ecosystem. It’s built on top of battle-tested open source projects, and dramatically simplifies orchestration of Big Data workloads and data pipelines.

Open and Extensible

Spring XD is built from the ground up to be adapted to your enterprise’s unique needs rather than to dictate your technology choices. Extend it in any direction through open plug-in points for your existing technology investments, implemented with simple Java classes.

Developer Productivity

Developers new to Big Data can use a no-coding, configuration-driven tool to develop Spring XD applications. Java developers can also easily extend the platform or the DSL with familiar extensibility, testing, and automation tools inherited directly from Spring Batch and Spring Integration.

The unified platform for big data

Features

Data from anywhere, to anywhere

Data-driven apps require refined and consolidated data at scale. Spring XD’s stream and batch workflows let you build pipelines that consume data from various endpoints and consolidate it in Hadoop, in-memory data grids such as Redis or GemFire, and virtually any other data store.

Rock-solid distributed runtime

Flexibility in distributing workloads across your existing cloud or on-prem hardware is key to maximizing ROI on hardware or IaaS spend. That’s why the Spring XD runtime is distributed, scalable, fault-tolerant, and highly available. It is instrumented to recover under failure conditions, load balance, and scale dynamically on demand, all out of the box.

Deep Analytics

Spring XD provides PMML model scoring to compute predictions in real time. Apache Spark Streaming is available as an out-of-the-box processor module in Spring XD, and can be plugged in to perform online machine learning with the help of MLlib algorithms.

Developer Friendly

It’s easy to integrate data with Hadoop and any data store, such as Greenplum Database, HAWQ, or GemFire. No coding is required to use the DSL (Domain Specific Language), and you interact with the server via REST from any programming language.

Monitoring and Management

Remote monitoring and management of the runtime components are supported via JMX endpoints. A built-in Admin UI allows visualization and remote management of containers in the distributed setup.

Portable and Extensible Runtime

Spring XD runs anywhere Java does - on-prem, Pivotal Cloud Foundry, YARN, EC2, Mesos, Docker, etc. A plug-in based architecture allows Java/Hadoop experts to extend the runtime components, allowing DSL (Domain Specific Language) users to leverage the extensions immediately.

Use-Cases

Closed-loop Analytics

Spring XD orchestrates the entire analytics loop - gathering data from any source, triggering actions, handling feedback loops from machine learning models, and computing real-time predictions.

Internet of Things

Enable predictive analytics in real time over large amounts of machine data, driving business and operational improvements. Spring XD’s data-integration adapters connect with various data-producing devices, and can be extended to support any unique device or protocol.

Batch Workflow Orchestration

Traditional enterprise “Big Data” was often done with batch processing. Get productive by using out-of-the-box jobs as templates, avoiding the need to write code. The infrastructure, environment specifics, and automation are handled by Spring XD, allowing the enterprise to focus solely on business logic.

Complex Event Processing

Spring XD provides integration with Project Reactor Streams, RxJava Observables, and Spark Streaming. Creating a data stream processor in XD allows you to use a functional programming model to filter, transform and aggregate data in a very concise and performant way. By working with events as you would with collections, Spring XD’s reactive-stream integration allows you to build complex event processors to respond to events in real-time.

Quick Start

Prerequisites

  • To get started, make sure your system has Java JDK 7 or newer installed.

Manual Installation

  • Download spring-xd-1.3.1.RELEASE.zip.
  • Unzip the distribution. This yields the installation directory spring-xd-1.3.1.RELEASE. All the commands below are executed from this directory, so change into it before proceeding:
$ cd spring-xd-1.3.1.RELEASE
  • Set the environment variable XD_HOME to the installation directory <root-install-dir>\spring-xd\xd

Create a stream

  • Start the runtime: The single-node option is the easiest to get started with. It runs everything you need in a single process. To start it, cd to the xd directory and run the following command: xd/bin>$ ./xd-singlenode
  • Start XD Shell: ./xd-shell
  • Create your first stream by typing: xd:> stream create --definition "time | log" --name ticktock --deploy
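
Because interaction with the server is done via REST, the same stream could in principle be created without the shell. The following is a minimal sketch only, not the project's documented client. It assumes the single-node admin server listens on http://localhost:9393 and accepts a form-encoded POST to /streams/definitions with name, definition, and deploy parameters; check these against the REST API reference for your release. The class name XdStreamRequest and its methods are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of Spring XD.
class XdStreamRequest {

    // Assumption: address of a locally running single-node admin server.
    static final String ADMIN_URL = "http://localhost:9393";

    // Form-encode the same name and definition that the shell command passes.
    static String formBody(String name, String definition, boolean deploy)
            throws UnsupportedEncodingException {
        return "name=" + URLEncoder.encode(name, "UTF-8")
                + "&definition=" + URLEncoder.encode(definition, "UTF-8")
                + "&deploy=" + deploy;
    }

    // POST the form to the assumed /streams/definitions resource
    // and return the HTTP status code of the response.
    static int createStream(String name, String definition, boolean deploy)
            throws IOException {
        byte[] body = formBody(name, definition, deploy).getBytes("UTF-8");
        HttpURLConnection conn = (HttpURLConnection)
                new URL(ADMIN_URL + "/streams/definitions").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(body);
        } finally {
            out.close();
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Always print the request body; only contact the server when asked to.
        System.out.println(formBody("ticktock", "time | log", true));
        if (args.length > 0 && "send".equals(args[0])) {
            System.out.println("HTTP " + createStream("ticktock", "time | log", true));
        }
    }
}
```

Run without arguments, the program only prints the encoded form body, so it can be tried without a server. Passing send as the first argument attempts the request against the assumed local address. The sketch sticks to java.net classes available on JDK 7, matching the prerequisite above.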

Create a Stream using Flo

The 'time | log' stream definition in Flo

The 'time | log' metrics in Flo