
The busy developer's guide to Apache NiFi

TL;DR: NiFi is an open-source tool for building workflows that move and transform data. In NiFi’s user interface, you create Flows by adding pre-built Processors.

You then link the Processors together. Data moves through the Flow as a sequence of FlowFiles (the original data plus metadata).

Why use NiFi?

Moving data between systems is a problem almost every organization faces today. NiFi takes much of the pain out of this process: use it to automate the movement of your data in a simple, reliable, and scalable way.

NiFi also offers plenty of cool features that make it an absolute pleasure to work with. More advanced capabilities include data provenance (a record of where each piece of data came from and what happened to it), data replay, and configurable queues.

Benefits of Using NiFi

  • Simple UI to build and manage Flows
  • Open-source
  • Guaranteed delivery of data

NiFi Overview

NiFi is an Apache open-source project built to automate the flow of data. It is a Java application with a web-based UI, where you build a workflow by connecting pre-built Processor components. NiFi’s purpose is to transform and distribute data.

An example NiFi Flow

Basic NiFi terminology

  • Flow – this is what you create by connecting various NiFi processors together to move data.
  • Processor – a built-in NiFi component. Each Processor performs a specific action on the FlowFiles it receives. See the List of NiFi Processors in the NiFi documentation.
  • FlowFile – the single unit of data passed between Processors. It contains both the actual content of your data and the metadata (attributes) that NiFi attaches. Processors let you change the content and/or attributes of a FlowFile.
  • Connection – the link from one Processor to another. Connections join Processors into a Flow. Each Connection holds a queue of FlowFiles, and NiFi lets you configure that queue.
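If it helps to see these terms as code, here is a toy Python model of them. This is not NiFi’s API: real Processors are Java classes built on NiFi’s own framework. Every class and function name below is hypothetical and only mirrors the vocabulary above. A FlowFile holds content plus attributes, a Processor acts on a FlowFile, and a Connection is a queue between two Processors.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict

# Toy model only: not NiFi's API; all names here are hypothetical.

@dataclass
class FlowFile:
    content: bytes                                            # the data itself
    attributes: Dict[str, str] = field(default_factory=dict)  # metadata key/value pairs

# In this toy, a "Processor" is any function that takes one FlowFile and returns a new one.
Processor = Callable[[FlowFile], FlowFile]

class Connection:
    """A FIFO queue of FlowFiles sitting between two Processors."""

    def __init__(self) -> None:
        self.queue: deque = deque()

    def enqueue(self, flowfile: FlowFile) -> None:
        self.queue.append(flowfile)

    def dequeue(self) -> FlowFile:
        return self.queue.popleft()

    def __len__(self) -> int:
        return len(self.queue)

def upper_case_content(ff: FlowFile) -> FlowFile:
    # Acts on the content only; attributes are carried along unchanged.
    return FlowFile(ff.content.upper(), dict(ff.attributes))

def tag_source(ff: FlowFile) -> FlowFile:
    # Acts on the attributes only; the content is untouched.
    return FlowFile(ff.content, {**ff.attributes, "source": "demo"})

# A tiny two-Processor Flow: upper_case_content -> Connection -> tag_source.
link = Connection()
link.enqueue(upper_case_content(FlowFile(b"hello nifi", {"filename": "greeting.txt"})))
result = tag_source(link.dequeue())
print(result.content, result.attributes)
# prints: b'HELLO NIFI' {'filename': 'greeting.txt', 'source': 'demo'}
```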

How NiFi Works

To use NiFi, you create a Flow by connecting pre-built Processors. A Flow generally starts with a Processor that reads in some data.

When NiFi reads this data in, it becomes a series of FlowFiles. These FlowFiles move through the Connections between the Processors in your Flow, and each Processor may act on a FlowFile’s content or attributes. If you need to reset your Flow, right-click a Processor and clear its state. This matters for stateful Processors such as ListS3, which remember what they have already read.

If a Processor throws an error, FlowFiles will remain in the queue between Processors. To remove FlowFiles from a queue automatically, set a FlowFile expiration time on the Connection.
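Here is a rough sketch of what FlowFile expiration does, written as toy Python with hypothetical names (this is not NiFi code). It follows the NiFi User Guide's description: a FlowFile's age is measured from when the data entered NiFi, not from when it reached this queue. For simplicity, the sketch uses a single explicit purge call in place of NiFi's own background handling.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

# Toy model only: hypothetical names, not NiFi's API.

@dataclass
class FlowFile:
    content: bytes
    entry_time: float  # seconds; when the data first entered (this toy) NiFi

class ExpiringQueue:
    """Toy Connection queue: FlowFiles older than expiration_seconds are dropped."""

    def __init__(self, expiration_seconds: Optional[float] = None) -> None:
        self.expiration_seconds = expiration_seconds  # None means "never expire"
        self.items: deque = deque()

    def enqueue(self, ff: FlowFile) -> None:
        self.items.append(ff)

    def purge_expired(self, now: float) -> List[FlowFile]:
        """Remove and return every FlowFile whose age exceeds the expiration period."""
        if self.expiration_seconds is None:
            return []
        kept, expired = deque(), []
        for ff in self.items:
            if now - ff.entry_time > self.expiration_seconds:
                expired.append(ff)
            else:
                kept.append(ff)
        self.items = kept
        return expired

# The downstream Processor is failing, so nothing is being pulled off this queue.
q = ExpiringQueue(expiration_seconds=60)
q.enqueue(FlowFile(b"old", entry_time=0))
q.enqueue(FlowFile(b"new", entry_time=50))
dropped = q.purge_expired(now=100)
print([ff.content for ff in dropped], [ff.content for ff in q.items])
# prints: [b'old'] [b'new']
```

Without an expiration period, the stuck FlowFiles would simply pile up until someone fixes the Processor or empties the queue.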

How to Use NiFi

Basics of NiFi UI

The NiFi Documentation does a fantastic job of showcasing the UI and getting you started. Rather than recreate all that work, I’ll point you to some highlights from those docs. Each item below names a specific section of the NiFi documentation. Read that entire section (up to the next subheading).

  • NiFi User Interface will show you the basics of the UI. This lays out and explains all the menus and toolbars that are visible when you first log in to NiFi.
  • Adding a NiFi Component shows how to create a new Processor from the Components Toolbar. Don’t worry too much about Input and Output ports or Remote Process Groups yet. Your goal for this section is to get a feel for the Components Toolbar, searching for a Component, and what the right-click menu looks like on a Processor.
  • Configuring a Processor is the meat of setting up a NiFi flow. This section illustrates the menu tabs you use to set up a Processor to do what you need. Since Processors are pre-built, you must configure each Processor you add.
    • Settings Tab – basic information, such as the name you want to give this Processor. Note the list of relationships on the right side of this tab: each relationship must either be automatically terminated or connected to another Processor.
    • Scheduling Tab – specify how often this Processor will run.
    • Properties Tab – the core of a Processor. You will spend most of your time adding or modifying the default properties on a Processor.
    • Quick note – Processors must be in the stopped state to allow modifications. If you right-click and see “View Configuration” instead of “Configure”, your Processor is still running.
  • Connecting Processors shows how to link Processors to build your flow. Like Processors, Connections can be configured. Here you can set the queue’s size limits (its back pressure thresholds, at which NiFi stops scheduling the upstream Processor) or how old a FlowFile may get before it is “expired” and removed. NiFi measures that age from when the data entered the NiFi instance, not from when it reached this particular queue.
    • One of my favorite things about NiFi is the ability to view the data in a queue. Right-click a queue and choose List queue to open a listing of the FlowFiles it holds.
    • Click the information button on the left of a FlowFile’s row, then click View to see that FlowFile’s actual content.
  • Relationships – depending on the outcome of a Processor, a FlowFile could take several different Connection paths. Relationships let you define which path it takes. For example, if a FlowFile encounters an error in a Processor, it is routed to the Processor connected by the failure relationship.
  • Monitoring a Flow explains the details you see when viewing a Processor in your Flow.
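To show the idea behind relationships, here is a toy Python sketch in which a Processor reports which relationship each FlowFile should follow. A router then places the FlowFile on the matching Connection, or drops it if that relationship is auto-terminated. The names parse_int and route are hypothetical; this mimics the concept, not NiFi’s implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Toy model only: hypothetical names, not NiFi's API.

@dataclass
class FlowFile:
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)

def parse_int(ff: FlowFile) -> Tuple[str, FlowFile]:
    """Hypothetical Processor: returns the relationship name plus the (possibly updated) FlowFile."""
    try:
        value = int(ff.content.decode("utf-8").strip())
    except ValueError:  # also covers UnicodeDecodeError, a ValueError subclass
        return "failure", ff
    return "success", FlowFile(ff.content, {**ff.attributes, "parsed.value": str(value)})

def route(processor: Callable[[FlowFile], Tuple[str, FlowFile]],
          flowfiles: List[FlowFile],
          auto_terminated: Set[str]) -> Dict[str, List[FlowFile]]:
    """Send each FlowFile down the Connection for the relationship the Processor chose.

    Relationships listed in auto_terminated are dropped instead of queued.
    """
    queues: Dict[str, List[FlowFile]] = {}
    for ff in flowfiles:
        relationship, out = processor(ff)
        if relationship in auto_terminated:
            continue
        queues.setdefault(relationship, []).append(out)
    return queues

queues = route(parse_int, [FlowFile(b"42"), FlowFile(b"not a number")], auto_terminated=set())
print({rel: [ff.content for ff in ffs] for rel, ffs in queues.items()})
# prints: {'success': [b'42'], 'failure': [b'not a number']}
```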

Creating a Flow

In general, most Flows read in some data, process it, and move it elsewhere. Of course, you can build any type of Flow you need, but here is a very basic example of how I often use NiFi.

  1. Read/consume some data. Common Components for this:
    1. S3 – ListS3, FetchS3
    2. Kafka – ConsumeKafka
    3. Poll an API – GetHTTP
    4. Local Files – GetFile
  2. Process/transform the data. The options here are endless, but here are some common Components:
    1. SplitText – break a single text file into individual lines. Now each FlowFile is a single line for the following Processors to act on.
    2. ReplaceText – modify content, optionally using a regular expression (regex).
    3. UpdateAttribute – add or modify attributes on a FlowFile, often using the Expression Language. To pull information out of the content into attributes, use ExtractText instead.
    4. RouteOnAttribute – create if-then logic in your flow based on FlowFile attributes.
    5. MergeContent – combine FlowFiles into one, optionally grouping them by a common attribute.
    6. CompressContent – compress FlowFiles.
  3. Write/move the data to a different location. Common Components:
    1. S3 – PutS3
    2. Kafka – PublishKafka
    3. Local file – PutFile
    4. DB – PutSQL
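To get a feel for what these Components do to FlowFiles, here is a toy Python pipeline. It uses hypothetical stand-ins for SplitText, ReplaceText, ExtractText, RouteOnAttribute, and MergeContent. The real Processors have many more properties and options; these functions only mimic the core idea of each one, on a made-up log file.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model only: hypothetical stand-ins, not the real NiFi Processors.

@dataclass
class FlowFile:
    content: str
    attributes: Dict[str, str] = field(default_factory=dict)

def split_text(ff: FlowFile) -> List[FlowFile]:
    # SplitText-like: one FlowFile per line (empty lines skipped in this toy).
    return [FlowFile(line, dict(ff.attributes)) for line in ff.content.splitlines() if line]

def replace_text(ff: FlowFile, pattern: str, replacement: str) -> FlowFile:
    # ReplaceText-like: regex substitution on the content.
    return FlowFile(re.sub(pattern, replacement, ff.content), dict(ff.attributes))

def extract_text(ff: FlowFile, attribute: str, pattern: str) -> FlowFile:
    # ExtractText-like: copy the first regex group from the content into an attribute.
    attrs = dict(ff.attributes)
    match = re.search(pattern, ff.content)
    if match:
        attrs[attribute] = match.group(1)
    return FlowFile(ff.content, attrs)

def route_on_attribute(ffs: List[FlowFile], key: str, expected: str) -> Dict[str, List[FlowFile]]:
    # RouteOnAttribute-like: a simple if-then split into "matched" and "unmatched".
    routes: Dict[str, List[FlowFile]] = {"matched": [], "unmatched": []}
    for ff in ffs:
        routes["matched" if ff.attributes.get(key) == expected else "unmatched"].append(ff)
    return routes

def merge_content(ffs: List[FlowFile], delimiter: str = "\n") -> FlowFile:
    # MergeContent-like: concatenate contents into a single FlowFile.
    return FlowFile(delimiter.join(ff.content for ff in ffs), {"merged.count": str(len(ffs))})

source = FlowFile(
    "INFO user=alice login\nERROR user=bob bad password\nINFO user=carol logout\nERROR user=dave locked",
    {"filename": "app.log"},
)
lines = split_text(source)
masked = [replace_text(ff, r"user=\w+", "user=***") for ff in lines]  # hide usernames
tagged = [extract_text(ff, "level", r"^(\w+)") for ff in masked]      # first word -> "level"
routes = route_on_attribute(tagged, "level", "ERROR")
errors = merge_content(routes["matched"])
print(errors.content)
# prints:
# ERROR user=*** bad password
# ERROR user=*** locked
```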

Expression Language

To configure Processors, you will rely on NiFi’s built-in Expression Language. It lets you reference FlowFile attributes or call built-in functions, such as math operations or getting the current time.

The NiFi Expression Language always begins with the start delimiter ${ and ends with the end delimiter }. Between the start and end delimiters is the text of the Expression itself. In its most basic form, the Expression can consist of just an attribute name. For example, ${filename} will return the value of the filename attribute.
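To make the delimiter rules concrete, here is a toy Python evaluator that handles only the simplest forms: a bare attribute reference like ${filename}, and one chained function such as ${filename:toUpper()}. toUpper and toLower are real Expression Language functions, but the evaluator itself is a hypothetical simplification, not NiFi’s parser. Treating a missing attribute as an empty string is an assumption of this sketch.

```python
import re
from typing import Dict

# Toy evaluator only: hypothetical, supports a tiny subset of the Expression Language.
# Matches ${attribute} or ${attribute:function()}.
_EXPR = re.compile(r"\$\{([^:}]+)(?::(\w+)\(\))?\}")

_FUNCTIONS = {
    "toUpper": str.upper,
    "toLower": str.lower,
}

def evaluate(template: str, attributes: Dict[str, str]) -> str:
    """Replace each ${...} expression with the attribute's value (missing -> empty string)."""
    def substitute(match: re.Match) -> str:
        name, function = match.group(1), match.group(2)
        value = attributes.get(name, "")
        if function is not None:
            if function not in _FUNCTIONS:
                raise ValueError(f"unsupported function in this sketch: {function}")
            value = _FUNCTIONS[function](value)
        return value
    return _EXPR.sub(substitute, template)

attrs = {"filename": "report.csv"}
print(evaluate("/archive/${filename}", attrs))   # prints: /archive/report.csv
print(evaluate("${filename:toUpper()}", attrs))  # prints: REPORT.CSV
```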

Next Steps with NiFi

Download and install NiFi on your local machine, then set up your first Flow.

Build a simple beginner Flow that polls an API, processes the results, and stores them as a local file.

  1. Start with the GetHTTP Processor. Connect its success relationship to a “dummy” Processor so that GetHTTP is valid and you can switch it on. Leave the dummy Processor stopped, and watch your first FlowFiles arrive in the queue!
  2. Inspect the FlowFiles on the queue to view your data.
  3. Change your FlowFiles by connecting more Processors, and inspect your data at each step.
  4. Use the PutFile Processor to finish this Flow and output your data to a local file.
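Here is the same beginner Flow sketched as plain Python, so you can see what each step does to the data. The API call is replaced by a hypothetical fake_api function so the sketch runs offline. The conflict handling in put_file is loosely modeled on the replace/ignore/fail choices that PutFile offers. None of these functions are NiFi APIs.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, Dict

# Toy model only: hypothetical stand-ins for GetHTTP, a transform step, and PutFile.

@dataclass
class FlowFile:
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)

def get_http(fetch: Callable[[], bytes], filename: str) -> FlowFile:
    # Step 1 stand-in: "poll an API". fetch is injected so the sketch runs offline.
    return FlowFile(fetch(), {"filename": filename})

def pretty_print_json(ff: FlowFile) -> FlowFile:
    # Step 3 stand-in: a transformation that reformats JSON content.
    data = json.loads(ff.content)
    return FlowFile(json.dumps(data, indent=2, sort_keys=True).encode("utf-8"), dict(ff.attributes))

def put_file(ff: FlowFile, directory: str, conflict: str = "fail") -> str:
    # Step 4 stand-in: write content to directory/<filename attribute>.
    if conflict not in ("replace", "ignore", "fail"):
        raise ValueError(f"unknown conflict strategy: {conflict}")
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, ff.attributes["filename"])
    if os.path.exists(path):
        if conflict == "ignore":
            return path
        if conflict == "fail":
            raise FileExistsError(path)
        # conflict == "replace": fall through and overwrite the existing file
    with open(path, "wb") as out:
        out.write(ff.content)
    return path

def fake_api() -> bytes:
    # Hypothetical API response used instead of a real HTTP call.
    return b'{"status": "ok", "count": 3}'

out_dir = tempfile.mkdtemp()
flowfile = pretty_print_json(get_http(fake_api, "response.json"))
written = put_file(flowfile, out_dir)
with open(written, encoding="utf-8") as f:
    print(f.read())
```

Running put_file a second time on the same FlowFile raises FileExistsError under "fail", which is one reason to watch the failure relationship on a real PutFile.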

Read up on more advanced NiFi concepts and check out some other NiFi tutorials.


Have fun playing around with NiFi!