Do you need to retrieve and process huge amounts of data, such as videos, images and documents, and move it from a source system to a target?
One potential solution is streaming in DataWeave (DW), which helps you handle massive amounts of data and is explained in this blog.
The term "streaming" refers to processing data as a continuous flow, piece by piece as it arrives, instead of first downloading or loading the whole payload.
Streaming improves efficiency and scalability because there is no need to load a huge amount of data into memory before a service runs. It also speeds up the processing of large documents without overburdening memory. DataWeave supports what is known as "end-to-end streaming" in Mule applications.
When streaming, DW processes the data as it arrives instead of first scanning and indexing the entire document. With the `deferred` option, DW does not write its whole output up front; it passes the streamed output directly to the next message processor. This behaviour allows DataWeave in Mule to process data more quickly while using less memory.
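A minimal sketch of the `deferred` option, assuming Mule 4's Transform Message component and an incoming JSON array payload that was read with `streaming=true`. The `id` and `name` fields are hypothetical and only stand in for whatever the real content contains.

```xml
<!-- Hypothetical mapping over a streamed JSON array.
     deferred=true makes DataWeave hand its output to the next processor
     as a stream instead of writing the complete result first. -->
<ee:transform doc:name="Map Content (deferred)">
  <ee:message>
    <ee:set-payload><![CDATA[%dw 2.0
output application/json deferred=true
---
// map visits each element once, in order, which suits streamed input
payload map (item) -> {
  id: item.id,
  name: item.name
}]]></ee:set-payload>
  </ee:message>
</ee:transform>
```

Because `map` visits each element once and in order, the script can consume the input as it arrives; an operation that needs the whole dataset at once, such as sorting, cannot work this way.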
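To enable streaming, we need the following configuration properties: the `streaming` reader property, set on the component that reads the data (for example, `outputMimeType="application/json; streaming=true"` on an HTTP request), and the `deferred` writer property, set in the DataWeave output directive when a transformation produces output.
Because the data is huge, streaming is enabled to avoid overloading memory and to speed up processing. In this scenario, one file is processed at a time.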
In this example, a third-party system stores the content and exposes it through a REST API.
The steps to enable streaming are as follows.
1. In the system API, use an HTTP Request operation to invoke the REST API, get the content, and return it in the response without any Transform Message component. Set `streaming=true` in the operation's output MIME type so the response is read as a stream.
```xml
<!-- streaming=true is a DataWeave JSON reader property: the response is read as a stream -->
<http:request method="GET" doc:name="Request TARGET API To Get Content" doc:id="918128c2-fe2c-4523-be70-860da6941aa8" config-ref="HTTPS_Request_configuration" url="${getcontenturl}" outputMimeType="application/json; streaming=true"/>
```
Note: If any transformation is applied to the received response, the data will be loaded into memory.
2. In the process API, call the system API above to get the content. Streaming is enabled in this step as well, so the data keeps streaming through the pipeline.
```xml
<http:request method="GET" doc:name="GET Content Details" doc:id="f58b7c40-0444-4a72-bbda-a46c4438a708" config-ref="HTTP_Request_configuration" path="/getcontent" outputMimeType="application/json; streaming=true"/>
```
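Putting the two steps together, a minimal sketch of both flows might look like the following. In practice the system and process APIs are usually separate Mule applications. The flow names, the listener configuration `HTTP_Listener_config`, the listener paths, and the `File_Config` File connector standing in for "the target" are hypothetical; only the two request operations come from the steps above.

```xml
<!-- System API: proxies the third-party REST API with no Transform Message,
     so the streamed response is returned to the caller as-is -->
<flow name="system-api-get-content-flow">
  <http:listener config-ref="HTTP_Listener_config" path="/getcontent"/>
  <http:request method="GET" config-ref="HTTPS_Request_configuration"
                url="${getcontenturl}"
                outputMimeType="application/json; streaming=true"/>
</flow>

<!-- Process API: calls the system API with streaming enabled and passes the
     streamed payload straight to a hypothetical file target -->
<flow name="process-api-get-content-flow">
  <http:listener config-ref="HTTP_Listener_config" path="/process/content"/>
  <http:request method="GET" config-ref="HTTP_Request_configuration"
                path="/getcontent"
                outputMimeType="application/json; streaming=true"/>
  <!-- file:write uses the current payload as its content by default -->
  <file:write config-ref="File_Config" path="content.json"/>
</flow>
```

Neither flow contains a Transform Message, so the content passes through without being mapped; if the process API does need a mapping, the `deferred` writer property keeps that mapping's output streamed as well.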
In this way, DW streaming is utilised in this scenario to deliver better performance. If multiple attachments have to be processed, concurrency and streaming should be used together. Also note that DataWeave streaming is supported for the JSON, XML and CSV data formats.
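For the multiple-attachment case, one way to combine concurrency with streaming is Mule 4's Parallel For Each scope, whose `maxConcurrency` attribute caps how many attachments are fetched at once. This is a sketch under assumptions: the payload holding a list of attachment IDs, the `/getcontent/{id}` path on the system API, and the `File_Config` target are all hypothetical. Because Parallel For Each collects the results of every route, the sketch writes each stream to its target inside the route instead of returning the content.

```xml
<!-- Hypothetical input: payload is an array of attachment IDs, e.g. ["a1", "a2"] -->
<parallel-foreach collection="#[payload]" maxConcurrency="4">
  <!-- Inside each route, payload is the current attachment ID -->
  <http:request method="GET" config-ref="HTTP_Request_configuration"
                path="/getcontent/{id}"
                outputMimeType="application/json; streaming=true"
                target="content">
    <http:uri-params><![CDATA[#[{ id: payload }]]]></http:uri-params>
  </http:request>
  <!-- Write each streamed response to its own file, one attachment per route -->
  <file:write config-ref="File_Config" path="#['content-' ++ payload ++ '.json']">
    <file:content><![CDATA[#[vars.content]]]></file:content>
  </file:write>
</parallel-foreach>
```

Storing the response in the `content` variable via `target` keeps the attachment ID available as the payload for naming the output file.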
There are a few limitations of the DataWeave streaming solution. For example:
If you would like to find out more about how we can help you leverage RAML and MuleSoft, give us a call or email us at salesforce@coforge.com.