ILM Complete Failure - Cancelled deployment and a big mess to clean up

This post is part cautionary tale for anyone using or planning to use (and rely on) Integration Lifecycle Management (ILM), part rant and part request for Celigo to contact me and learn why ILM failed and help me out of this hole I’m now in as a result.

Last night I initiated a deployment that had been scheduled for weeks with a client. Development work had been done in a fresh Clone of the production Integration into Sandbox. The plan was simply to Pull/Merge the changes back into production.

Below are the steps that were taken:

  1. Merged the “branch” from Sandbox and it worked as expected - no conflicts and many objects were updated: 7 Exports, 10 Imports, 2 Flows and 1 Script (note: Many of the Import and Export updates were just connectionId changes) - 20 objects total
  2. While reviewing the result I realized I had left two Input filters on some imports (for testing). I also didn't have confidence in some of the static maps for NetSuite Imports (due to NetSuite internal IDs possibly being out of sync).
  3. I reverted production to before the merge so that I could take screenshots of some of the static lookup setups and also remove the filtering from the feature branch clone.
  4. Went to the SB branch and removed the Input filters. Took screenshots of Production to verify the Static Lookups post-merge (see #2). That’s it, no other changes were made.
  5. Initiated the Merge and received a conflict about the filters (WHY?! They're gone!). Where the first Merge updated 20 objects, this time only 4 were listed in the Preview
  6. Updated the client via Teams that the deployment had to be cancelled and I would contact Celigo to try to find out what in the world happened.

All of that was over the course of about 17 minutes and I was the only developer involved.

Regarding item #5, I mean what the hell is going on that there is a conflict with data that doesn’t exist? The filters are gone, they’re deleted from the source branch, yet they appear as a conflict and in the post-merge result. Something is very broken.

Now it’s the next morning and I have a big mess on my hands and a project that is further behind schedule. Worse, I have ZERO confidence in ILM. I need to decide if I should wait for Celigo to contact me (not going the support route) or manually create these objects in Production or Clone my feature Branch to Production. Ugh!….

Celigo\Tyler\Tony - I’ll drop whatever I’m doing and make myself available, North American or Indian timezone to collaborate on this, if you’re willing.

Technical Details

  • Merge #1, revision ID: 6940d5c7d9ca26673ecd0abb
  • Revert #1, revision ID: 6940d974d9ca26673ecd0c2f

Here you can see that all I did was remove the two filters:

And here’s a Pull preview (freshly attempted this AM)

Where are those Zombie filters coming from???

Hi Steve,

I had a similar issue yesterday. I had reorganized some scripts in an integration and deleted one of them. ILM saw it as a conflict. I was able to clear the conflict but then the merge operation failed. I had to reclone the integration. I deleted all the evidence already.

This was surprising and unexpected. ILM has been solid for a while now.

Hi @steveklett ,

I wanted to follow up with you on this and first acknowledge how frustrating this must have been. I also appreciate you documenting everything so clearly.

We've looked into the issue you've described and identified where it's coming from. It's related to how ILM tracks the "base" revision between your integrations (in your case the one in Sandbox and Production).

In ILM, the original merge you performed becomes the reference point for all subsequent merges. So even though you reverted your changes to take screenshots of your production, the base revision from that merge was still being used. This results in the conflicts you're seeing. It's also why your new pull showed far fewer changes than expected in the review screen: ILM considers the pull to have already been done.
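
To illustrate the base-revision explanation above, here is a minimal Python sketch of a generic three-way merge. It is not Celigo's implementation: the function `three_way_merge`, the field names (`mapping`, `script`, `input_filter`) and all values are hypothetical, and the last scenario (a reverted production that stores an empty filter rather than no filter) is purely an assumption used to show how a conflict over a removed value could arise.

```python
from typing import Any, Dict, List, Tuple

MISSING = "<absent>"  # sentinel for "this field does not exist on this side"


def three_way_merge(
    base: Dict[str, Any], source: Dict[str, Any], target: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge source into target relative to base.

    Returns (changes to apply to target, fields in conflict).
    """
    changes: Dict[str, Any] = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(source) | set(target)):
        b = base.get(key, MISSING)
        s = source.get(key, MISSING)
        t = target.get(key, MISSING)
        if s == t:
            continue  # both sides already agree
        if s == b:
            continue  # source unchanged since base: keep target's value
        if t == b:
            changes[key] = s  # only the source changed: take its value
        else:
            conflicts.append(key)  # both sides changed, in different ways
    return changes, conflicts


# Hypothetical states, loosely following the steps in the original post.
original_prod = {"mapping": "v1", "script": "v1"}
sandbox_first = {"mapping": "v2", "script": "v2", "input_filter": "testing only"}
sandbox_fixed = {"mapping": "v2", "script": "v2"}  # test filter removed
reverted_prod = dict(original_prod)                # production after the revert

# Base as one might expect after a revert: the pre-merge production state.
fresh = three_way_merge(original_prod, sandbox_fixed, reverted_prod)

# Base as described in the reply: still the snapshot from the first merge.
stale = three_way_merge(sandbox_first, sandbox_fixed, reverted_prod)

# Assumption only: the revert leaves an empty filter instead of no filter.
reverted_with_empty_filter = {**original_prod, "input_filter": ""}
stale_conflict = three_way_merge(
    sandbox_first, sandbox_fixed, reverted_with_empty_filter
)

print("fresh base:", fresh)
print("stale base:", stale)
print("stale base, empty filter on target:", stale_conflict)
```

With the pre-merge state as base, both sandbox changes are proposed. With the first merge as base, the sandbox looks unchanged for those fields, so nothing is proposed, which matches the "far fewer changes" symptom. A conflict appears only when both sides differ from the base in different ways.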

I wanted to check where you're at right now:

  • Have you already moved forward by cloning the sandbox integration into production?
  • Or manually recreating the changes?

At the same time, I can discuss with our engineering teams whether we can reset the base on our end.

Hi @atinavbajaj23 ,

I’m very glad to receive your reply and even happier that your team was so quick to find the cause - great work!

No, I’ve not done anything yet; I was giving Celigo a little time to see how my message was received. I would be happy to have the base reset and continue to use ILM - IF…. you’re confident the system will behave correctly. If there is risk it could still be confused, I would rather not risk corrupting the production environment in any way. If it’s a clear-cut, straightforward fix?… yes, let’s do it.

Do you agree that the observed behavior is incorrect? That the base would remain post-reversion? If so, I’m assuming this will be treated as a bug and fixed in the future? Please confirm.

Regards,
Steve

Hi @atinavbajaj23

I just sent you a “Connective Chat” as well, but I’m not sure if you’ll notice it, I know I never notice them when people message me.

I'm in a holding pattern awaiting your reply so I know what my plan will be. When you get a minute can you please look at my last message and let me know what you think?

Regards,
Steve

Hi @steveklett ,

I discussed this with our dev team and unfortunately we can’t safely delete or edit individual revisions; each revision is linked to another. Doing a surgical delete or manually changing values would risk corrupting both integration histories, which is what we were afraid of.
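
As a rough illustration of why removing a single linked revision is risky, here is a minimal Python sketch of a revision chain in which each revision points at its parent. The `Revision` class, the `history` function and the revision names are hypothetical and say nothing about how ILM actually stores revisions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    rev_id: str
    parent_id: Optional[str]  # None marks the first revision in the chain
    description: str


def history(revisions: Dict[str, Revision], head_id: str) -> List[str]:
    """Walk parent links from head_id back to the first revision.

    Raises KeyError if a revision is referenced but no longer stored.
    """
    chain: List[str] = []
    current: Optional[str] = head_id
    while current is not None:
        if current not in revisions:
            raise KeyError(f"revision {current!r} is referenced but missing")
        rev = revisions[current]
        chain.append(rev.rev_id)
        current = rev.parent_id
    return chain


# Hypothetical chain: initial snapshot -> merge -> revert.
revisions = {
    "initial": Revision("initial", None, "state before the pull"),
    "merge-1": Revision("merge-1", "initial", "pull from Sandbox"),
    "revert-1": Revision("revert-1", "merge-1", "revert to before merge-1"),
}

full_history = history(revisions, "revert-1")

# "Surgically" deleting the middle revision leaves a dangling parent link.
damaged = {k: v for k, v in revisions.items() if k != "merge-1"}
try:
    history(damaged, "revert-1")
    chain_broken = False
except KeyError:
    chain_broken = True

print(full_history, chain_broken)
```

In this toy model the later revision can no longer be traced back to the start once the middle one is gone, which is the kind of corruption the reply is warning about.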

But regarding a workaround that works for you now, here's what I recommend:

  • In your production environment, under the Revisions tab, locate your first pull from Sandbox (the one with the input filters still there), described in Step #1 of your replication steps (Merge #1, revision ID: 6940d5c7d9ca26673ecd0abb).
  • Select "Revert to after this version".

This will bring you back to a state where your production environment is as close as possible to the changes you've made in your exports, imports, flows and scripts. The only extras will be the unwanted input filters.

Once you do this you have two options:

  • Remove the input filters manually from Production.
    OR
  • Disregard your current Sandbox clone and create a brand new clone from Production in your Sandbox. Remove the input filters there, along with making any other changes, and then do a pull from this newer clone.

Please let me know if this works out.

As for whether the observed behaviour is correct: this is more than a case of "working as designed".

While the underlying version control logic might be behaving as expected, we expect you and @davidgollom to have a more predictable outcome when you use ILM features. So it does warrant investigation and improvements.

Once the current issue that is delaying your deployment is in a better place, we'd like to get back in touch with you when we have thought through improvements in this area, which could be guidance, guardrails, or both as a feature improvement.