Recreating from Legacy - a Developer Story.

22 min read · Sep 23, 2022

INTRODUCTION

This is the story of our attempt to recreate a big part of the onboarding experience (currently built for the web) natively on Android and iOS, such that both platforms share the same business and application rules while implementing their own native views.

CURRENT LEGACY SYSTEM BIG PICTURE

Our current legacy system is the whole point of this write-up. But before I describe it in detail, let me make it clear that there are no culprits and no individuals to blame. Every piece of code and every architecture comes with an expiry date, so there is nothing to be cynical about. A system doesn’t become legacy in a day; it happens over time, due to time constraints, business constraints, hot fixes and procrastination.

The system we are talking about is part of the registration process of the shaadi.com app. There is a series of screens that users have to go through in order to create their profile on shaadi.com. Each of these screens has multiple form inputs such as drop-downs, text inputs, checkboxes, autocomplete text inputs and multi-selection widgets. Additionally, these widgets are conditionally shown or hidden based on complex form rules. These rules depend on the user’s inputs, the user’s country, or the availability of data from the API.

Widgets like drop-downs or multi-selection widgets show various options to the user. These options are either statically available in the system or downloaded from the back end by passing some arguments to the API. For example, cities are populated based on the state selected by the user, and ethnicity is downloaded based on user attributes like mother tongue, religion and country of living.
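The two kinds of option sources described above can be sketched roughly as follows. This is only an illustration under assumptions: the names Option, OptionsSource, StaticOptions and RemoteOptions are hypothetical and do not come from the actual library, and the fetch lambda stands in for a real network call.

```kotlin
// Hypothetical types for illustration; none of these names come from the real library.
data class Option(val id: String, val label: String)

// A source of drop-down options. Static sources ignore the arguments;
// remote sources use them (e.g. the selected state when loading cities).
fun interface OptionsSource {
    fun load(args: Map<String, String>): List<Option>
}

// Options that ship with the app.
class StaticOptions(private val options: List<Option>) : OptionsSource {
    override fun load(args: Map<String, String>): List<Option> = options
}

// Options fetched from the back end. `fetch` stands in for a real network call.
class RemoteOptions(
    private val requiredArgs: Set<String>,
    private val fetch: (Map<String, String>) -> List<Option>
) : OptionsSource {
    override fun load(args: Map<String, String>): List<Option> {
        // Without the arguments it depends on, the widget has nothing to show yet.
        if (!args.keys.containsAll(requiredArgs)) return emptyList()
        // Pass only the arguments this endpoint cares about.
        return fetch(args.filterKeys { it in requiredArgs })
    }
}
```

With this shape, a city drop-down would be a RemoteOptions that requires a "state" argument, while a fixed list would be a StaticOptions; the widget code does not need to know which one it is talking to.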

A typical user starts with page 1, fills in the data and presses the continue button, at which point the inputs are validated. This is followed by page 2 and its validation, then page 3 and its validation, and finally the create-profile call-to-action button. We have considered this as our scope of development and will focus on these three pages, called registration page 1, 2 and 3. There are certain screens before page 1 and a certain flow after page 3, but those are out of our scope.

On a closer look we noticed that the labels also depend on some attributes, such as the profile creator, their relationship with the bride or groom, or gender. We also noticed that there are validation messages, some very straightforward and some again dependent on the actors.

Additionally, the user can go back and forth between these pages. When going back, all the fields should contain the values the user had filled in, and when going forward, the page must be validated again before landing on the next one. Finally, there is a feature where users can resume from where they left off, irrespective of session and platform, i.e. a user could start the journey on the shaadi.com website in a browser and later resume on a mobile app.

At this point we had no idea about the implementation details or how it worked internally. The current system was rendered in a WebView with very old code in action, and the screens were embedded in the native screens of the Android and iOS apps. As these screens drive most of our user acquisition, no one dared to touch them, given the legacy code and the direct impact any change could have on the business. The developers who initially worked on this part of the system left long ago with little to no knowledge transfer. The story is similar with the product managers and quality assurance teams. What we have is fragmented knowledge preserved by some individuals who have worked on the system for a long time, plus fragmented and outdated documentation. There is no single person, team or document to rely on completely.

Gathering the requirement

This was essential for:

Understanding the current system
Identifying the problem and proposing a long term solution
Estimating the efforts and time required to achieve our goal

We already knew that the current system was a legacy one and that addressing it was a long-pending decision. Since it is rendered in a browser, it is deprived of a lot of native capabilities. However, there must be a very good reason to touch such a sensitive part of the system.

We said: let’s rewrite this flow natively. This opens up a plethora of opportunities to change and tailor the experience according to ever-changing business needs, and gives the solution a longer shelf life.

One thing we clearly understood about this legacy system is that its business rules are tightly coupled with the platform and the framework. Moreover, it accesses the database directly to fetch data, so reusing the existing code was not an option, and we would need REST APIs to access that information. These APIs fell into three groups: some were already present in our libraries; some did not exist, so we had to understand and draw up the requirement and get them developed; and some existed but returned data in a shape not compatible with mobile, so we needed a decorated, compatible version of the same API.

We also hunted for documentation and found some pieces of information, thanks to individuals who had faced the same problem and tried to sketch some documentation around the system’s behavior. But since these documents were not updated regularly, and people from different teams collected the information at very different times, not everything in them is true and some of it conflicts. So no matter who we talked to or what we read, we still needed some black box testing to understand exactly how the system behaves in production. When even observing inputs and outputs was not enough, our last resort was to dig into the code, or reach out to people who could dig faster because of their expertise with the platform and the language.

We didn’t hit these obstacles all at once; they came up at every stage of development, even after the first few rounds of QA.

The KMM Pitch

Registration Part 2 is not the first KMM (Kotlin Multiplatform Mobile) project we have worked on. Before this, we had already deployed a new native version of Registration Part 1 to production, which has been fruitful for the product as well as for the development team working on it. Registration Part 1 was written using test-driven development, hence all the rules and behaviors are documented in tests. A redesigned version of Registration Part 1 was released recently and it looks very promising.

A good architecture is one where the business rules are separated from the platform code. This is not particular to KMM; it applies to an Android or an iOS app architecture individually too. What KMM enables us to do is share the same business rules as one code base, while the UI code remains local and native to the main app. These business rules are then accessed as a library.

With some good learning from our experience with Registration Part 1, we understood some problems that needed attention across both platform teams, Android and iOS. One big challenge was to make our iOS developers comfortable with this foreign concept and make them feel at home. As we understand it, the iOS UI system is more like a tree structure, whereas the Android view system is flat. This was the reason the iOS developers had to write an additional wrapper to make things compatible with their own architecture. We have tried to address this in Registration Part 2, so that iOS developers might not have to write that additional wrapper layer. We have consolidated the individual widget data into a thing called ViewData, which represents the complete state of the view/screen at any given point in time. iOS developers no longer have to subscribe to multiple channels for the states of various widgets. However, Android developers now temporarily have to write a small layer which compares and updates the widgets based on the change in version of the data.
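A minimal sketch of what such an Android-side comparison layer might look like, assuming each widget's data carries a version number that changes whenever the widget needs re-rendering. The names VersionedWidget and WidgetDiffer are hypothetical and the real layer may work quite differently.

```kotlin
// Hypothetical shape: the real widget data is richer than this.
data class VersionedWidget(val version: Int, val visible: Boolean, val value: String)

// Remembers the last version rendered per widget and reports only the widgets that changed.
class WidgetDiffer {
    private val renderedVersions = mutableMapOf<String, Int>()

    fun changedWidgets(widgets: Map<String, VersionedWidget>): Map<String, VersionedWidget> {
        // A widget needs re-rendering if we have never seen it or its version moved on.
        val changed = widgets.filter { (id, widget) -> renderedVersions[id] != widget.version }
        changed.forEach { (id, widget) -> renderedVersions[id] = widget.version }
        return changed
    }
}
```

The Android screen would then update only the views returned by changedWidgets for each new snapshot, leaving untouched widgets alone.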

VIEWDATA

What do I mean by consolidated data? As background, the complete state of a screen is represented by data through which we control the widgets on the screen. This data carries information or commands for a widget, such as its visibility, whether it is enabled or disabled, what the drop-down options are, what the field value is, and so on. Each screen is a form with loads of widgets, and each widget is controlled by a chunk of data. If we consolidate all these chunks we get ViewData, and that is our proposed solution. I also mentioned that Android developers temporarily have to write an additional, optional layer to achieve optimized output, but in the near future this flat view system will be replaced with a tree-like view system, Jetpack Compose. Hence both iOS and Android view rendering will be more or less similar (of course they will remain native in nature).
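To make the idea concrete, here is a rough sketch of how per-widget chunks could be consolidated into one snapshot. The field names (visible, enabled, options, value, error) and the update helper are assumptions for illustration, not the library's actual ViewData.

```kotlin
// Hypothetical model: field names are illustrative, not the library's actual API.
data class Option(val id: String, val label: String)

// The chunk of data that controls a single widget.
data class WidgetData(
    val visible: Boolean = true,
    val enabled: Boolean = true,
    val options: List<Option> = emptyList(), // drop-down choices, if any
    val value: String? = null,               // currently selected or typed value
    val error: String? = null                // validation message, if any
)

// One immutable snapshot of the whole screen, keyed by widget id.
data class ViewData(val widgets: Map<String, WidgetData>) {
    // Returns a new snapshot with one widget changed; the old snapshot is untouched.
    fun update(id: String, change: (WidgetData) -> WidgetData): ViewData =
        copy(widgets = widgets + (id to change(widgets[id] ?: WidgetData())))
}
```

Because every change produces a new immutable snapshot, a client only ever needs to render "the latest ViewData" instead of tracking each widget's stream separately.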

So now you have a picture of how this reactive system keeps updating the state while both Android and iOS subscribe to this data and render or update their respective views. We will see other technical details in later sections of this article.

Approaches and Execution

Before I was introduced to this project, one of my colleagues had already been gathering most of the requirements, in the form of documents and API specs. He had started some discussions, made trackers for pending acceptance criteria, and opened and assigned to-dos to various stakeholders while closing some points. His work gave us that initial thrust.

The legacy nature of the project forced us to ask the right questions before anyone could help us, so most of the clarity came only once we were able to ask them. As I said, the documentation was not enough to connect the dots. Other developers and testers helped us find answers, but they were busy with their own priorities, so the information did not come in at the rate we needed.

We had already spent a lot of time trying to find just enough acceptance criteria, so we started development with whatever we knew. The acceptance criteria became our units for testing: we simply added each criterion as a test. This way we started writing business and application rules in our code, and made all those tests pass. The ViewData approach, where the state of all the widgets on the screen is consolidated as a single piece of data, was new to us, so we had to test it with a user interface. One of my colleagues helped me with the UI. I explained to him how to subscribe to the data and how to interpret it, such that just changing the rules would properly update the user interface. And he nailed it. In a very short time he developed a UI that could talk to my business logic and vice versa. Now, during development, I could publish my business rules locally (as a library to the main Android project) and actually verify the behavior of the system.
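As an illustration of turning an acceptance criterion into a test, consider a made-up criterion: "the city field is shown only once a state is selected". The rule, the FormInput type and the function names below are hypothetical; a real project would use a test framework such as kotlin.test rather than bare check calls.

```kotlin
// Hypothetical rule for illustration: the city field is shown only once a state is chosen.
data class FormInput(val state: String? = null, val city: String? = null)

fun isCityVisible(input: FormInput): Boolean = !input.state.isNullOrBlank()

// Each acceptance criterion becomes one small, named check.
fun cityIsHiddenUntilStateIsSelected() {
    check(!isCityVisible(FormInput(state = null)))
}

fun cityIsShownOnceStateIsSelected() {
    check(isCityVisible(FormInput(state = "Maharashtra")))
}
```

When a new criterion or a QA finding arrives, it gets its own named check first, and the rule is changed until every check passes again.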

This is also how we started a client-level implementation with the Registration library on Android. From then on, shared library development and Android UI implementation happened simultaneously.

Confidence in Acceptance Criteria

At any point in time the system was a reflection of only what we knew. We created a portion of page 1 consisting of some related input fields, applying whatever form rules we knew about. Comparing it with the production app, we saw some differences. So we added whatever was missing, only to realize that now something else was behaving differently. Eventually we were able to identify a pattern, and with some conclusion about the pattern we reworked the same functionality again. In the cases where we could not find a pattern, this made our life even more difficult. This trial and error was slowing us down and we needed another strategy.

We asked QA for more and more acceptance criteria, making the life of an already busy QA even busier. I remember asking QA: if they don’t know the end-to-end acceptance criteria, how will they be able to sign off on completion? What will the testing strategy be? The answer was obvious: they were going to compare the live version and the testing version side by side. This answer shook us for a while, and later it became our source of acceptance criteria. We simply started sharing a crude version of our Registration build with QA and asked them to verify these builds (not test them). We called them daily builds. To make their life easier, we gave them the impact areas and asked them not to spend more than an hour or two on any build. QA would go through the build, compare its behavior with production, and maintain a tracker/to-do list where each issue was described in detail. These to-dos became our acceptance criteria; they were the links missing from our initial set.

This way of gathering acceptance criteria worked for us, but it meant more work for QA. After some cycles of these daily builds we stopped sharing builds with QA (on demand). By that time we had collected plenty of acceptance criteria. Some of them we already knew, but now they were confirmed.

The Back End Support

When we started development, we considered the backend a detail and deferred any work on network or API related code. The idea was to work only on business and application rules and mock everything else. After spending a considerable amount of time on form rules, at some point we had to take some APIs into consideration. We had to identify where the data comes from in order to populate the various drop-downs on screen. When I say field data, I mean the data that populates a drop-down, and when I say field value, I mean the item selected from that drop-down.

Some field data is statically available and we could simply copy it into our project, while other field data is only accessible via an API. For this reason we collaborated with a developer from the back-end team. This guy is super supportive. The time and effort he gave brought so much to the table. He showed us where we could find static data. He helped us find the various API endpoints from which field data can be populated. He also shared his Postman entries with us, a very thoughtful collection of APIs that are usually used together.

While searching for endpoints we learned that some of the API responses were not compatible with mobile, so they needed some extra development. Some APIs had never existed, as the web was accessing the database directly. He started developing these APIs as soon as possible and shared a sandbox environment where the current and upcoming back-end development would be deployed.

There was some back and forth during this phase, but I was happy with the progress.

The Reverse Engineering

All this time we had been chasing correct acceptance criteria by following the documentation and comparing behavior with production (by sharing daily builds). Still, a myriad of things were happening: on various occasions there was some magic happening in production which had no pattern. In such situations we always turned to an experienced developer who is more familiar with the legacy web code than anyone else on the floor. He would perform the stunt of diving into the legacy code and finding answers for us. His contribution is really laudable.

We would spend a lot of time together finding missing answers and identifying whether a rule or an API was adding some behavior in production behind the curtains. He was our last resort. He helped us understand what data was being accessed directly from the database and which API was being called to achieve certain results. Based on his findings we either found and integrated the API or developed a fresh one for our needs.

Such findings sometimes resulted in rewriting a complete portion of the feature. We rewrote wherever and whenever needed, sticking to the idea with which we started this project: that the system should be a reflection of what we know.

System design

Throughout the journey we stood by some values and followed them religiously. The first is to maintain the quality of the system by using best practices. We followed established programming principles and approaches. The idea was to first make it work, then make it organized, and finally make it optimized.

We wrote a working version using test-driven development. If a missing case was reported, it was covered by a test. Similarly, if an issue was reported, it was covered by a test, ensuring the mistake was not repeated. What we thought of as just three screens with user input fields turned out to be a hierarchy of complex rules. There were form rules. There were resetting rules (where, if a field is set to a certain value, all the dependent fields must reset, not reset, or partially reset).
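A resetting rule of the simplest kind (full reset of dependents) could be sketched like this. The rule table and field names are made up for illustration; the real rules also cover "do not reset" and "partially reset" cases that this sketch leaves out.

```kotlin
// Hypothetical rule table: which fields must be cleared when a given field changes.
val resetRules: Map<String, List<String>> = mapOf(
    "country" to listOf("state", "city"),
    "state" to listOf("city")
)

// Applies a new value and clears every dependent field, leaving unrelated fields alone.
fun applySelection(values: Map<String, String>, field: String, newValue: String): Map<String, String> {
    if (values[field] == newValue) return values // same value: nothing to reset
    val dependents = resetRules[field].orEmpty()
    return values - dependents + (field to newValue)
}
```

Keeping the rules in a table rather than in scattered conditionals makes it cheap to correct a rule when production turns out to behave differently from what was assumed.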

Session restore: starting from where the user left off

There are rules for session restore: when users start profile creation on some other platform, drop off there, and later want to continue in the phone app, their progress must be restored as it was left. According to the acceptance criteria, each page calls an API on load to check whether the user has made any progress. If yes, that progress is applied; if not, all fields are defaulted accordingly. Then, when the user tries to navigate to the next page, the fields are validated and backed up to the API before navigating. This backup and restore functionality applied to all three pages. We saw an opportunity here to benefit from the native environment, so we optimized it. Instead of calling the same API on load of each page to get the progress, we call the API once and cache the response before starting Registration Part 2. On each page, before navigating to the next page, we update this cached data and send the consistent data to the draft API, where the progress is saved. This is how we implemented the use cases responsible for backing up and restoring progress.
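The optimized flow can be sketched as a small cache in front of the draft API. Everything here is an assumption for illustration: DraftApi, DraftCache and the map-of-strings draft format are hypothetical, and the real calls would be asynchronous network requests.

```kotlin
// Hypothetical API: stands in for the real draft endpoints, whose shape is not shown here.
interface DraftApi {
    fun fetchDraft(): Map<String, String>
    fun saveDraft(draft: Map<String, String>)
}

// Loads the draft once before the flow starts, then keeps a local copy in sync with the server.
class DraftCache(private val api: DraftApi) {
    private var draft: Map<String, String> = emptyMap()

    // Called once, before the first page of the flow is shown.
    fun restore() {
        draft = api.fetchDraft()
    }

    // Pages read saved progress from the cache instead of calling the API on load.
    fun valueOf(field: String): String? = draft[field]

    // Called after a page validates: merge its values and push the whole draft.
    fun backup(pageValues: Map<String, String>) {
        draft = draft + pageValues
        api.saveDraft(draft)
    }
}
```

With this arrangement the restore call happens once per flow rather than once per page, while every successful page transition still persists the full, consistent draft.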

Exposing Public Interfaces for Client

Since we are writing a library, it is important that our public interfaces (the methods used to interact with the library, and their signatures) are stable. An update should not break the client, yet it should be easy to update and extend the functionality. For this we used the MVI (Model-View-Intent) architectural pattern, where we expose only three elements to the client for each ViewModel:

A Stream for States
A Stream for Events
Actions: A mechanism to add input to ViewModel.

States

There are 2 possible states:

Loading
Update(ViewData)

Loading is what the name suggests, while Update carries the current snapshot of ViewData; the UI reacts based on this data.

sealed class UIState {
   object Loading : UIState()
   data class Update(val viewData: ViewData) : UIState()
}

Events

Events are one-time, quick instructions, like showing an error or navigating away from the screen.

sealed class UIEvents {
   object Finalized : UIEvents()
   object ErrorLoadingData : UIEvents()
}

Actions

Actions are the output of user interaction on screen, sent as input to our business logic. Our library processes them through its rules and updates ViewData accordingly.
For example, the user selects Maharashtra as the Location-State. This serves as input to our library (VM), and as a reaction the cities of the selected state are downloaded and populated. Additionally, Actions should be specific rather than verbose, and there should be one Action per kind of input. For example, the Action for a checkbox being checked should be different from the Action for a drop-down selection. Hence Actions are typed, so that there is little or no verbosity when choosing which Action to fire.

sealed class Action {
   object Start : Action()
   object Refresh : Action()
   object Visit : Action()
   object Submit : Action()
   data class Selected(
       val field: Reg2x2SelectionFields,
       val value: Selection,
       val tag: String = ""
   ) : Action()
   // ...
}

Actions go up, while States and Events flow down.

Going into detail about what a ViewModel, Use Case or Repository is falls outside the scope of this article. All I can say is that these are different layers, each with its own responsibility, and that they are decoupled from each other, which makes our system flexible to change.

This approach was used for every part of Reg 2 in our scope. Any client (say Android or iOS) can now listen to these streams and update its UI accordingly.
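Putting the three elements together, a ViewModel of this kind might look roughly like the sketch below. It is a simplification under stated assumptions: the sealed classes are reduced copies of the ones above so that the sketch compiles on its own, ViewData is shrunk to a single hypothetical field, and Stream is a hand-rolled stand-in for whatever reactive stream type the real library exposes.

```kotlin
// Reduced copies of the article's types so the sketch compiles on its own.
data class ViewData(val selectedState: String? = null) // hypothetical single field

sealed class UIState {
    object Loading : UIState()
    data class Update(val viewData: ViewData) : UIState()
}

sealed class UIEvents {
    object Finalized : UIEvents()
}

sealed class Action {
    object Start : Action()
    object Submit : Action()
    data class Selected(val field: String, val value: String) : Action()
}

// Hand-rolled stand-in for a real reactive stream (an assumption, to avoid dependencies).
class Stream<T> {
    private val listeners = mutableListOf<(T) -> Unit>()
    fun subscribe(listener: (T) -> Unit) { listeners += listener }
    fun emit(value: T) = listeners.forEach { it(value) }
}

class RegistrationViewModel {
    val states = Stream<UIState>()   // flows down to the client
    val events = Stream<UIEvents>()  // flows down to the client
    private var viewData = ViewData()

    // The only way in: the client describes what happened, never how to react.
    fun dispatch(action: Action) {
        when (action) {
            Action.Start -> {
                states.emit(UIState.Loading)
                states.emit(UIState.Update(viewData))
            }
            is Action.Selected -> {
                if (action.field == "state") viewData = viewData.copy(selectedState = action.value)
                states.emit(UIState.Update(viewData))
            }
            Action.Submit -> events.emit(UIEvents.Finalized)
        }
    }
}
```

Because Action is a sealed class, the when block has to handle every Action type, so adding a new Action forces the library to decide how to react before it compiles. The client, on the other hand, only ever sees the two streams and the dispatch method.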

So far so good..

A lot of time was invested. A lot of effort was put into finding acceptance criteria. Being skeptical about each rule sometimes paid off well. We now had so much information that it felt stupid to wait for even more, so I started full-fledged development. We were already behind the deadline and things were on fire; investing any more time in doubting the information was costly. It is so difficult to estimate timelines for such a project that I had already given estimates multiple times, depending on what was known. Although initially there was no pressure, my own guilt was driving me to take no rest and make progress no matter what.

Day in and day out, I started every morning from where I had left off the evening before. Every day was a Registration day. To test anything on Registration 2, one needs to go through all the form filling right from the start (Reg 1), finally reach the screen under test, and check whether things worked out this time. By my estimate there were more than 100 registrations every week (of course not on production). We could have mocked Reg 1, but we did not want to confuse system behavior with mocked behavior, so we kept it a natural, production-like experience. To optimize our testing, we would address multiple things and check them all together.

There were so many sources of information that it was difficult to map them mentally. So I also created a doc to dump everything that I knew, or that was known, in simple layman’s English. Here I documented everything regarding form rules, validations, API specifications, and even the static data to be used in the project. It is still a work in progress and we try to update it regularly. I don’t know whether this document will meet the same fate as the documentation before it. Still, we would like to contribute, considering the future developers who are going to revisit these screens (it could be us too).

We created the first cut. It was not perfect, but it was high time and we needed at least one round of QA. This would take us closer to our goal and eventually close some of the to-dos added during the daily build rounds.

QA Started

At that point our QA was handling multiple projects and was quite occupied. We shared a build with her and she started QA. But with QA, the hunt for even more acceptance criteria started. Our build was scrutinized and some very delicate information started to come to light, to the point that our build was rather unstable for QA. This was when every bit of the system was zoomed in on and checked. The result of this first round was mind-bending and somewhat disappointing for me. It would have been much better if these details had been available to me earlier.

Apart from these, there were certain ambiguities in the system which initially didn’t make any sense to us, so we had hesitated to carry them into the new system. It turned out those ambiguities needed to be reintroduced because they might be driving some product metrics. Consider a very simple example: we have a country drop-down in Reg Part 1 where the user is presented with a list of countries and needs to select one. In Reg Part 2 we also had a country drop-down, but for a different purpose. We thought we could reuse the country list from Reg Part 1, but it turned out there were subtle differences between the lists expected on the two screens: one accepts “UK” as a country and the other accepts “United Kingdom”. It was important to keep things as they are in production, because there must be some metrics running on these which expect “United Kingdom” and keep count of it. Finding such a list, or making the current list match what was expected, was one such task.

As I said, from the start we only added those bits that were clear to us. We wanted the system to fail, and fail quickly, so that we could update it with correct information in hand. Such an approach is not good for stats like bug count, but in the longer run it ensures stability and control over the system.

Hence, although feedback from the first round made us bleed, we were not sweating over it. Now we had better information in hand, and most importantly correct information. Again we had to revisit some sections of the module and even rewrite some. We also revisited our tests and updated them without exception. The process was slow but steady. We could see the functionality of Reg 2 improving compared to the build first shared with QA. We closed some of the obvious issues and missing cases, while others were closed after detailed discussions with other stakeholders. However, some issues were again a gray area: we knew the behavior was wrong but did not know what the closest correct answer would be. So we had to turn again to the people I mentioned, our super helpful backend developer and our veteran friend who could dive into the legacy codebase.

Reinforcement

Although I was the only developer working on Reg Part 2 until then, holding all these things together was a delicate affair. Having stretched every day for the past few months, I was completely immersed in Reg. Thinking about code and approaches, handling communications, and sitting in long meetings every day made this uphill climb more difficult day by day. I was handling both the library and the client implementation. Every day was a gamble with my time: where should I put it today? Library coding, addressing clients, closing QA issues, chasing people to understand the correct behavior for certain issues, getting UI feedback closed, and so on. It was not simple to keep the work balanced. On various occasions I found myself in the middle of everything after many days of unproductive hard work. And quitting was never an option.

To make my world a little more breathable, I asked my manager for reinforcement: a senior developer who could work shoulder to shoulder with me, someone with good endurance who could work with me in this mine of fire. And it was granted. I now had a partner! He was fresh and energetic, and after coming on board he quickly took responsibility for client-side programming and closed various pieces of design feedback. He chased people to get the work done, something I was saturated with. I was about to become a father, which added some constraints, and my attention was channeled away from Reg 2 for some days. During those days my partner kept the ship sailing. He single-handedly closed various library and client issues even though he was not very familiar with KMM. There was a learning curve before he could completely take charge, and he climbed it very quickly.

After I returned from my short paternity leave I joined him again and we kept on closing the new issues that were raised. At this stage there was no difference between an issue and an acceptance criterion; we had already lost our sense of smell for telling which was which. Every week we would close all the issues by Friday, and we were greeted with dozens of new ones on Monday. These issues were not the visible kind. By now our Reg Part 2 screens were working fantastically. They obeyed all the form rules, all the correct data was presented to the user, and the flows worked as expected. Then what kind of issues were they?

Data Discrepancy Issues

With Reg Part 2 we were able to re-engineer the front end of the legacy system, but the job was not done yet. Our backend services are still legacy ones. So although we had nailed the user-facing presentation part, our backend was expecting the old type of data. There were subtle differences between the data we were sending and the data the system expected, due to which the flows after Reg Part 2 were breaking. Metric tracking was breaking, and god knows what else.

We needed a seam that could convert the data to exactly what is needed. Take for example the CastNoBar checkbox. A checkbox is either true or false, and throughout the journey of Reg Part 2 its value was stored as a boolean. But the server was expecting something else. The following was expected:

A String
If Checked then “Yes”
If Unchecked then “0”

We jokingly named this type of data BoolShit: boolean data that is expected in string format, where true is “Yes” and false is “0”.
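The conversion itself is tiny. A minimal sketch, where the extension function names are hypothetical and the reverse direction is an assumption (the article only states what the server expects to receive):

```kotlin
// Mapping described above: checked -> "Yes", unchecked -> "0".
fun Boolean.toLegacyFlag(): String = if (this) "Yes" else "0"

// Reverse direction is an assumption: anything other than "Yes" is treated as unchecked.
fun String.fromLegacyFlag(): Boolean = this == "Yes"
```

The value stays a plain Boolean everywhere inside the library and is converted only at the moment it is sent to the legacy back end.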

A huge amount of time was invested in investigating the data discrepancies, their sources, and ultimately what the server expects from the client. In most cases our super supportive backend developer and our veteran friend played a huge role. We were totally dependent on their findings, although we were present in some day-long meetings with them (just for moral support). It felt like ages before we got even a small insight into whether we were on the right track or not. By now all of us were stretching every day to support each other. It was exhausting, but I was happy with the progress we made.

Oh, let’s not forget the contribution of our dearest Product Managers. None of the product managers were aware, end to end, of all the business rules currently in production. All they cared about was their metrics. My existing Product Manager was serving his notice period. He helped us check which database tables needed to be verified once registration was completed, and left the company a day or two later. The new Product Manager was not really the right audience for our tricky queries. But he was still able to help us with a magical statement which kept all of us motivated. He said, “Let’s keep everything the same as Production”. And… we did. In some cases we even had to replicate a production bug (not that we are proud of it). Having said that, we never polluted our code with it. Those legacy-facing functions and processes are hidden in a separate layer called a seam. This seam protects the good part from the legacy part. Once our legacy back ends are addressed in future, we can remove these seams or add new and better ones.
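As a rough sketch of what such a seam could look like, the layer below is the only place that knows the legacy payload format. All names here (PartnerPreferences, LegacySeam, the payload keys) are hypothetical, and the direction of the country alias is an assumption chosen purely for illustration.

```kotlin
// Clean domain model used everywhere inside the shared library (hypothetical fields).
data class PartnerPreferences(val castNoBar: Boolean, val country: String)

// The seam: the only place that knows what the legacy back end expects.
interface LegacySeam<T> {
    fun toLegacyPayload(model: T): Map<String, String>
}

class PartnerPreferencesSeam(
    // Hypothetical alias table: which spelling the back end wants is an assumption here.
    private val countryAliases: Map<String, String> = mapOf("UK" to "United Kingdom")
) : LegacySeam<PartnerPreferences> {
    override fun toLegacyPayload(model: PartnerPreferences): Map<String, String> = mapOf(
        // Legacy string encoding of a boolean.
        "cast_no_bar" to (if (model.castNoBar) "Yes" else "0"),
        // Legacy spelling of the country, falling back to the value as-is.
        "country" to (countryAliases[model.country] ?: model.country)
    )
}
```

When the back end is modernized, only the seam implementation needs to be replaced or removed; the business rules that produce PartnerPreferences stay as they are.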

Conclusion

Re-engineering a legacy system needs good hands-on experience with both the tech and the domain. It needs dependable people who are ready to squeeze time out of their busy schedules. It needs better documentation, either present before development or written during it. It needs understanding stakeholders who are flexible enough. Once started, de-prioritizing such development will worsen the situation: information is very volatile, holding it together is a delicate affair, and the chances of miscommunication increase. The larger the project, the larger the communication gaps can be. We need strong and decisive QAs who can take calls and understand what is really required for closure and, in particular, what can be picked up later. Similarly, we need strong UI designers who take calls, help close design faster, and think ahead, especially about the intermediate states of the UI which are missing in current legacy systems. Finally, a wise Product Manager is the cherry on top.

I personally feel pair programming is the answer to complex projects, as there is always someone to validate each other’s thoughts.

Since such features are always rolled out in parts after running experiments, certain things can be deferred and picked up later. Engineering is not only about what can be done; it is also about identifying what is not really required and can be deferred. Our module is still in QA (at the time of writing), but it is in a much better state and we are confident it will bloom some day.