The great Angular to React Migration - the challenge to get the basics right

"In total, only 8.5 percent of projects hit the mark on both cost and time. And a minuscule 0.5 percent nail cost, time, and benefits." - How Big Things Get Done

The sobering reality

All stakeholders in a project start with the best intentions of delivering on time. But delays tend to creep in, and it's the same case at Fabrik. As the scale of a project increases, so do team sizes, complexities, interdependencies, and inevitable delays. While the great Angular to React migration at Fabrik is not the Taj Mahal (construction duration of 17 years + 5 years for polish), the same pitfalls apply to us. We have already had 3 false starts over the last two years, and we are giving stiff competition to the Taj Mahal's timelines. That's the point of capturing our progress in long-form content: to hold ourselves accountable to each other in this fourth attempt. Writing also helps us think harder about the failure points.

Based on our experience building Fabrik, we believe a dedicated team of 3 engineers can complete the migration in 6-9 months. Budgeting for delays of all kinds, 12 months is a reasonable estimate to complete the process. At the end of 12 months, we will have an exact replica of Fabrik's current Angular version on React, with obvious corrections in code quality and performance. With the broad constraints in place, we are breaking down the product itself into multiple segments, independent of each other. Each segment is rebuilt using a combination of AI tools and common sense rather than rewriting the codebase from scratch.

At the moment, I see the first pitfall here: we constantly lose sight of the big picture by getting stuck in minute details that are important, but not so consequential that they should delay the entire process. That's where leadership involvement and intervention is key. I say this because senior folks have broad visibility of market sentiment and internal engineering capabilities, they get feedback from the sales team, and they are generally in a better position to prioritise or deprioritise activities than teams working in specific functions like sales or engineering. For now, we have made quick decisions on minor things that could otherwise drag on indefinitely. We keep an eye on these through weekly internal sync-ups where decisions are made.

The inputs driving prioritisation

Even as we talk of the big picture, people once again start losing the forest for the trees. There's a constant need to align them back to the specific tasks at hand while keeping the big picture in view. That requires meticulous involvement from all teams, and a decision-maker (independent of these teams, with goals aligned to the outcome) who will work with the teams to break the monolith into big rocks. We've reached this stage and identified the big rocks:

View port - the customer uses the view port every time they use Fabrik (a digital twin, a virtual reality, or a collaborative experience). There are multiple features within the view port and there's some amount of prioritisation needed here.
Communications with the view port - internally we use JSONs (like almost everybody else) to tell the view port what should be displayed, which data sources to use, which types of content to fetch, and how to present them.
The authoring studio - experiences displayed on the view port are authored in the studio, and users create the JSONs (mentioned above) through the studio. This is primarily used internally at Fabrik, and hence is lower priority.

Storage, APIs, and servers - React is a frontend library, so these parts of the solution are out of scope here. We can complete the entire migration without any dependency on these modules.

The key connecting component between the older and newer versions is the JSON. We are tempted to fix this component first since we are combining the view port and authoring studio, but changing it would bring down the entire system. To avoid conflicts with our old version, we cannot make any edits to the JSON from the new platform. The authoring studio, on the other hand, has only a few folks authoring experiences and is a bit lower in priority for migration. And finally, there's great demand from customers to improve the performance and visual experience of the view port.

Given these considerations, the only big rock open to start migration is the view port, which will consume the current JSON while being robust enough to consume the modified JSON of the future. As of this writing, we are enabling one feature at a time in the view port on React Three Fiber, as independent modules, with the corresponding functional and performance unit test cases written as part of each module.
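One way to picture a view port that reads today's JSON and tolerates tomorrow's is a tolerant reader: required fields fail loudly, optional fields get defaults, and unknown fields are reported but ignored. The sketch below is only an illustration in TypeScript. The field names (schemaVersion, modelUrl, steps) and the parseScene function are assumptions made up for this example, not Fabrik's actual schema or code.

```typescript
// All field names below are hypothetical; the real Fabrik JSON schema is not shown in this post.
interface Scene {
  schemaVersion: number;
  modelUrl: string;
  steps: { title: string }[];
}

type ParseResult =
  | { ok: true; scene: Scene; warnings: string[] }
  | { ok: false; error: string };

const KNOWN_KEYS = new Set(["schemaVersion", "modelUrl", "steps"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseScene(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "Scene is not valid JSON" };
  }
  if (!isRecord(data)) {
    return { ok: false, error: "Scene must be a JSON object" };
  }

  // Required field: fail with a specific message instead of rendering nothing.
  const modelUrl = data["modelUrl"];
  if (typeof modelUrl !== "string" || modelUrl === "") {
    return { ok: false, error: "Scene is missing a 'modelUrl' string" };
  }

  // Fields added by a future JSON revision are reported, never fatal.
  const warnings = Object.keys(data)
    .filter((key) => !KNOWN_KEYS.has(key))
    .map((key) => `Ignoring unknown field '${key}'`);

  // Optional fields fall back to defaults so older JSON keeps working.
  const version = data["schemaVersion"];
  const schemaVersion = typeof version === "number" ? version : 1;
  const rawSteps = data["steps"];
  const steps = (Array.isArray(rawSteps) ? rawSteps : [])
    .filter(isRecord)
    .map((step) => {
      const title = step["title"];
      return { title: typeof title === "string" ? title : "" };
    });

  return { ok: true, scene: { schemaVersion, modelUrl, steps }, warnings };
}
```

With a reader like this, a JSON carrying a field the view port has never seen still loads, and the warning list gives developers a trail to follow when the schema is eventually optimised.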

Beyond the view port, we will add the authoring capabilities to the view port without fundamentally changing the JSON structure. Finally, we will optimise the JSON structure itself to improve performance.

The current status (6 weeks in)

And now finally, let's dive into the specifics of our approach and capture everything that could fall through the cracks:

Foundation - the bedrock of the platform to ensure subsequent layers built on top do not crumble at the first sign of trouble. We are breaking the foundation into these components:
1. Environment and dependencies - set up a functioning environment with dependencies installed on the local machine. Ensure all libraries point to the most recent version or the most recent stable version.
2. Development pipeline - the entire exercise of compile, build, integrate, and deploy should be robust to ensure:
  1. compile and build times are small, so developers can quickly make changes and test/validate. A compile/build cycle should not take 1-2 minutes; it should happen within seconds. We've spent effort to keep our compile and build times small, and it's an ongoing effort.
  2. integrate and deploy lands the code in staging after due process (captured below) is complete.
3. Functional empathy - developers should be aware of their customers and the challenges they might encounter, and proactively correct them. Every module will have its customers (other modules and/or developers), and it should be a self-contained unit that tells them its success and failure criteria, with clear messages indicating why the module is not giving the expected results.
4. Performance - a minimum performance measured as frames per second (30-60 FPS) on 4GB RAM. We might not be there on day one, but we will get there and this is the gatekeeper. It will flag when a new build or module drops performance to below 30 FPS.
5. Logging - we are in the process of capturing our logs into a dump for retrieval and review. Logs provide a lot of data for us to improve our developer pipeline, capture most common issues, and proceed to solve them. In addition, any obscure error that is not easily reproduced can be handled here.
6. Bug capture - we cannot look at every session of every customer, and we will miss out on challenges they face. Catching bugs is one way to prevent some of these challenges, and connecting them to our git to automatically create tickets is baked into our application now.
7. Storage structure - we save all our models into a single location with primitive role-based or access-based restrictions. We are working on a storage structure that is more robust and customer-specific to ensure security of assets between customers.
8. API structure - we do not have strong API guidelines. We need to create them so that we can work closely with a developer community in the future (12-18 months down the line).
9. 3D pipeline - we manage 3D models through different tools and platforms. We are unifying them into a single interface so that end-customers are not jumping between tools to get something useful. This includes model optimisation, format conversion, materials, etc.
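The performance gatekeeper from item 4 above could be sketched roughly as follows: take frame timestamps recorded during a test run, compute the average FPS, and flag the module if it drops below the 30 FPS floor. This is a minimal sketch under assumptions. The function names and the idea of feeding it timestamps collected from the render loop are illustrative, not a description of our actual tooling.

```typescript
const MIN_FPS = 30; // the floor stated in the post

interface GateResult {
  passed: boolean;
  fps: number;
  message: string;
}

// Average FPS from frame timestamps in milliseconds
// (for example, values collected from requestAnimationFrame callbacks).
function averageFps(frameTimestampsMs: number[]): number {
  if (frameTimestampsMs.length < 2) return 0;
  const first = frameTimestampsMs[0];
  const last = frameTimestampsMs[frameTimestampsMs.length - 1];
  if (first === undefined || last === undefined || last <= first) return 0;
  // N timestamps span N - 1 frame intervals.
  return ((frameTimestampsMs.length - 1) * 1000) / (last - first);
}

// moduleName is a hypothetical label used only for the message.
function checkFpsGate(
  moduleName: string,
  frameTimestampsMs: number[],
  minFps: number = MIN_FPS
): GateResult {
  const fps = averageFps(frameTimestampsMs);
  const passed = fps >= minFps;
  const message = passed
    ? `${moduleName}: ${fps.toFixed(1)} FPS meets the ${minFps} FPS floor`
    : `${moduleName}: ${fps.toFixed(1)} FPS is below the ${minFps} FPS floor`;
  return { passed, fps, message };
}
```

A check like this can run per module in the pipeline, so a regression is attributed to the build that introduced it rather than discovered later by a customer.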
Process - an exercise in enforcement, and we are still trying to make this a habit. 6 weeks in, we are still not sticking to mandated release cycles for many reasons. This is again a work-in-progress that requires constant nagging and follow-up. Almost all our processes follow standard best practices as a way to ease developers into our ecosystem without any unlearn/relearn blockers.
1. Versioning - major.minor.patch.
2. Release cycles - biweekly sprints.
3. Ownership - this is a challenge and ties to functional empathy. Without empathy, it is really hard to get a team to perform like a single unit. I notice empathy is often translated as being nice and saying thank you/sorry appropriately. But that's not it. That doesn't make deadlines, and that doesn't ship products. It is being proactive: anticipating the challenges customers will face and solving for them, so that customers do not spend time figuring out what I own. And then being nice to them beyond that because it's tough as it is :) This is what ownership means, and we generally misinterpret it.
4. Review and commit - peer review process.
5. Dev. -> staging -> prod.
6. Clean console log/logging habits - make it easy for everybody else.
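To make the major.minor.patch scheme from item 1 above concrete, here is a small sketch of how a release script might bump a version. The bumpVersion helper is hypothetical and only handles plain three-part versions; it is an illustration, not our release tooling.

```typescript
type Bump = "major" | "minor" | "patch";

// Bumps a plain "major.minor.patch" string. Pre-release tags and build
// metadata are deliberately not handled in this sketch.
function bumpVersion(version: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Expected major.minor.patch, got '${version}'`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (bump) {
    case "major":
      return `${major + 1}.0.0`; // breaking change: reset minor and patch
    case "minor":
      return `${major}.${minor + 1}.0`; // new feature: reset patch
    case "patch":
      return `${major}.${minor}.${patch + 1}`; // bug fix only
  }
}
```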
Programming - these are the small rocks that together build the big rocks.
1. Features.
  1. Boilerplate application.
  2. Login (backward compatibility).
  3. Listing existing 3D experiences.
  4. Launching one of them (loading a scene).
  5. Displaying a 3D model.
  6. Ensure the scene has lights/shadows to see the model.
  7. Controls -> mouse/keyboard.
  8. Displaying text (WIP).
  9. Navigating between steps (WIP).
  10. Adding hotspots (WIP).
2. Messaging - our team is running fast towards the goal and communication to end-users is sidelined (again, this ties back to functional empathy and ownership). For every action, we have only one successful path and all other paths fail. But not all the failure criteria are captured, and users are not informed why their actions fail. The reason is generally tucked under 100s of lines of console log that other developers spend hours searching through. This is true for all features listed above; we haven't done a good job on this.
3. Unit test cases - we have not started on this for any of the features, and this will come back to haunt us if not corrected immediately. It was the same mistake we made in our Angular version, where fixes went in as patches and all errors became reactive firefighting at 11pm on Friday nights.
4. Code quality/hygiene - we are expected to follow industry standards, but it is not reflected in the code. I'm sure we are using AI co-pilots to help us here, but very little is done on this front.
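For the messaging gap in item 2 above, one possible pattern is to model every failure path as a named case and require a user-facing message for each, so the compiler complains when a new failure is added without one. The failure codes and the describeFailure function below are invented for illustration and are not our actual error list.

```typescript
// Hypothetical failure cases for a "launch experience" action.
type LaunchFailure =
  | { code: "NOT_LOGGED_IN" }
  | { code: "EXPERIENCE_NOT_FOUND"; experienceId: string }
  | { code: "MODEL_LOAD_FAILED"; modelUrl: string; httpStatus: number };

type LaunchResult =
  | { ok: true; experienceId: string }
  | { ok: false; failure: LaunchFailure };

// Every failure case must map to a message the user can act on.
function describeFailure(failure: LaunchFailure): string {
  switch (failure.code) {
    case "NOT_LOGGED_IN":
      return "Please log in to open this experience.";
    case "EXPERIENCE_NOT_FOUND":
      return `Experience '${failure.experienceId}' does not exist or was removed.`;
    case "MODEL_LOAD_FAILED":
      return `Could not load the 3D model from ${failure.modelUrl} (HTTP ${failure.httpStatus}).`;
    default: {
      // If a new failure code is added without a message, this line stops compiling.
      const unhandled: never = failure;
      return unhandled;
    }
  }
}

// Turns a result into the single line shown to the user or written to the log.
function summarise(result: LaunchResult): string {
  return result.ok
    ? `Launched experience '${result.experienceId}'.`
    : describeFailure(result.failure);
}
```

The design choice here is that the reason for a failure travels with the result itself, instead of being buried in console output that someone has to search for later.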

The results

We'd probably score ourselves 4/10, with significant room to improve, and it is critical to set things right at the beginning. We will continue this exercise, try not to compete with the Taj Mahal or other big projects for the notorious distinction of delays, and capture the progress of the next 4-6 weeks in another blog.
