Improving Performance in a File-Based FTP to S3 Flow

I wanted to share a recent improvement I made to a flow that might be useful to others running into performance issues with large files.

Original Setup

The flow was moving a ~30MB XML file from FTP to S3. It used:

  • A transform script to process each record
  • An output filter to exclude unwanted records
  • The default page size of 20

This setup worked functionally, but performance was a problem — the flow consistently took more than 20 minutes to complete.

What I Changed

To improve performance, I made two changes:

  1. Increased the page size from 20 to 1000
    This cut the number of pages the platform had to process, which meant far less per-page overhead.

  2. Moved logic into a preSavePage script
    Instead of using a transform script and an output filter (which both run record-by-record), I moved all the logic into a single preSavePage script (there's a rough sketch of what that looks like right after this list). This allowed me to:

    • Process entire pages of records at once
    • Filter out records directly inside the script, eliminating the need for the output filter entirely
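
Here's a sketch of the shape that script took. Treat it as a sketch rather than my exact code: the field names (record.status, record.amount) are placeholders for whatever your own records contain.

```javascript
/*
 * preSavePage hook sketch. integrator.io passes the whole page of
 * records in options.data, and whatever you return as `data` is what
 * continues through the flow, so filtering is just a matter of not
 * including a record in the returned array.
 */
function preSavePage(options) {
  const keptRecords = [];

  options.data.forEach((record) => {
    // Placeholder filter condition: drop the records we used to
    // exclude with the output filter.
    if (record.status === 'CANCELLED') {
      return;
    }

    // Placeholder transform: the same per-record logic that used to
    // live in the transform script.
    record.amount = Number(record.amount);

    keptRecords.push(record);
  });

  return {
    data: keptRecords,
    errors: options.errors,
    abort: false,
  };
}
```

Because the hook runs once per page instead of once per record, bumping the page size to 1000 also means this function gets invoked far less often.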

Results

After these changes, the flow now finishes in about 1 minute — a significant improvement over the original 20+ minute run time.

Key Takeaways

  • preSavePage scripts are more efficient for large volumes of records because they run once per page rather than once per record.
  • You can replace both transform logic and filtering with a single, streamlined script.
  • Increasing the page size can reduce the total number of requests, which helps with throughput — especially when working with large files.

If anyone else is looking to improve performance in similar scenarios, this is one approach that might be worth trying.


Hi Tyler,

I’ve noticed a couple of flows in our environment that take a long time to run. They were built by third-party developers before I got involved, so I don’t know the full ins and outs of all the steps. I noticed that the page sizes are set to 1, though, which strikes me as odd. After reading your post I’m going to experiment with increasing the size, but I just wondered whether you were aware of any scenarios where a page size of 1 is required or recommended. Many thanks.

Hey @Matthew_Lacey, there are typically a few reasons someone might use a page size of 1. Here are the ones I can think of:

  • Enforcing serial processing of records. This is used in conjunction with a connection concurrency of 1 as well. Having both of these set ensures exported records are processed serially through the flow. This is a fairly rare requirement, as most systems can handle parallel requests.

  • If you have many lookup steps in the flow, you may eventually hit the 5 MB page size limit. By limiting the page to a single record, you leave more room for the additional information the lookups add. This isn't super common either, and you can usually mitigate it by using transformations to trim the payloads down to only the fields you need.

  • If you have a file import step later in the flow and you need it to create a file per record, then you can set the page size to 1 and turn on skip aggregation on the import. Doing this ensures that only 1 record gets into each file.

I may be forgetting some, so I’ll add to this list if I remember any more. There is a good article on choosing page size here: https://docs.celigo.com/hc/en-us/articles/360043927292-Fine-tune-integrator-io-for-optimal-performance-and-data-throughput#exports-and-listeners-2

Thanks Tyler, that’s really helpful!