
How to Build a Platform (in two short years)

9 min read · Written by Jake Miller

Outcomes are the smallest atomic unit of business value. They are the key deliverables or goal achievements by which businesses define success. In the age of collaboration, outcomes aren't one-sided. They're shared among multiple parties, each of which has its own contributions to make and needs to be met. Think of these as shared outcomes.

The platforms that have become synonymous with B2B software simply weren't built to deliberately facilitate the achievement of goals between people who hail from different organizations and play different roles in the relationship, such as buyers, suppliers, or partners.

After decades of watching businesses solve problems by leveraging technology, we have learned that a one-sided approach to B2B software is an obstacle to true collaboration between buyers and suppliers. Key insights that might otherwise be discovered are hampered, and, consequently, so is the potential of the relationship.

Suppliers want to make this digital transformation, but the nature of enterprise architecture is in the way. Enterprise software is one-sided, designed for an organization to collect and store information relevant to its customers, services, and operational data, without regard to the value that a more collaborative architecture would provide.

That’s why we set out to build a platform.

Two Roads Diverge

There are two valid paths to build a product. The lean method is to crank out features, go to market early, solicit customer feedback, and iterate as rapidly as possible. The second approach, which is much less common, is to build ‘platform-first’ where the underlying architecture is designed more abstractly, and if done correctly, can serve a wider variety of use cases over a longer period of time.

Given the problems to be solved and characteristics required of a shared data platform, we deliberately chose the second path, the one less traveled. To make this all work, we realized there was a lot that would need to be built.

So, here was our challenge: We wanted to build bridges where people share information and collaborate around outcomes.

We wanted a system to process and return real-time metrics, success milestones, and CRUD updates to the browser given any number of data sources.  

We wanted to normalize the identity of people across organizations. The event payloads would all have identifiers for companies and users but depending on the source system, those may not all map neatly to each other. That mapping would have to be done in our platform.

We wanted to keep track of all raw events ingested from the n-number of data sources so that those events could be replayed later to create new metrics.

We wanted to be able to turn the clock back on any screen in our product to see what the data looked like at any specific point in time. That not only includes the values of metrics, but also the state of success milestones, and values of records in data sets, effectively change data capture (CDC).

And we wanted it to all be real-time.

And, we wanted to do all of this at IoT scale.

Now do you understand why it took us two years?

Here are a few points that drastically influenced our approach:

Goals are lost during handoffs

The investment your team made to build a relationship, and to agree on what each party wants to give and receive out of it, in many cases vanishes after the sales process is complete. Crucial context is lost because cross-department collaboration is difficult without a common place to maintain that context, and it is made even harder without a direct line to the customer.

The world happens in real-time

Not all situations demand real-time information, but many B2B businesses have discovered that surfacing insights as they occur gives them a competitive edge, making them more effective and efficient with their resources.

Measuring outcome achievement at scale is hard today

Managing a small book of business can be done via a spreadsheet. When your book of business scales, it becomes much more difficult to track your customers' interactions.

Collaboration is important, and so is presence

Presence is different from collaboration. In-person collaboration invokes a sense of connectedness. You are together actively working to solve problems. At MetaCX, we think that human connection is a first-class concern, important enough that a user experience should foster a sense of presence.

The line between physical space and virtual space is blurring. We don't need physical offices anymore. During the pandemic of 2020, massive swaths of the population have been working remotely. It's time that the virtual experience offered through software fosters collaboration and connectedness.

Product analytics are only part of the story

In his article, The Most Important Metrics You're Not Tracking (Yet), Gene Cornfield articulates my point exactly. Customer performance indicators (CPIs) extend beyond button clicks and page views. The context of those user interactions, and how they align with outcome achievement, is more important.

As a simple example, what we found during development is that most companies have a very difficult time determining which customers logged into their products, a seemingly simple data point. More importantly, user logins are not contextualized as part of an outcome, which is much more interesting information.

Again, it all centers around desired business outcomes, the goals buyers and suppliers want to achieve together. And, because your product is where your customers are, it makes sense that you would want to know if your customers are using your product. You want to close the loop by establishing a true correlation between your products and the achievement of desired outcomes.

The MetaCX Taxonomy

The CXReactor, which is the name for the MetaCX event engine, is responsible for ingesting raw events from n-number of data sources, applying rules, emitting signals, and handling reactions to those signals.

Signals are events with inherent business value, filtered from live event streams ingested from any number of data sources.

Metric calculations are a type of reaction to signals.

Success milestones are used to measure success at scale.

Outcomes are established in shared success plans, and collaboration on these plans is done on bridges.

Bridges are the intersection point between organizations where collaboration occurs, and data is shared.
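To make the taxonomy concrete, here is a minimal sketch of the ingest → signal → reaction flow in JavaScript. The rule and reaction shapes here are hypothetical illustrations for this article, not the actual CXReactor interfaces:

```javascript
// Minimal sketch of an event engine: ingest raw events, apply rules to
// filter them into signals, then run reactions (e.g. metric updates).
function createReactor(rules, reactions) {
  return {
    ingest(event) {
      const signals = [];
      for (const rule of rules) {
        if (rule.matches(event)) {
          signals.push({ name: rule.signalName, event });
        }
      }
      for (const signal of signals) {
        for (const react of reactions) react(signal);
      }
      return signals;
    },
  };
}

// Example: emit a "login" signal and count logins per company as a metric.
const loginCounts = new Map();
const reactor = createReactor(
  [{ signalName: 'login', matches: (e) => e.type === 'user.login' }],
  [
    (signal) => {
      if (signal.name !== 'login') return;
      const company = signal.event.companyId;
      loginCounts.set(company, (loginCounts.get(company) || 0) + 1);
    },
  ]
);

reactor.ingest({ type: 'user.login', companyId: 'acme' });
reactor.ingest({ type: 'page.view', companyId: 'acme' });
reactor.ingest({ type: 'user.login', companyId: 'acme' });
// loginCounts.get('acme') === 2
```

The metric calculation here is just one kind of reaction; milestone evaluation would be another function in the same reactions list.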

Characteristics of a Shared Data Platform

Ask any engineer for the most important characteristics of a platform, and the list will likely include scalability, resiliency, extensibility, and security. These are price-of-entry features. The modern shared data platform has several additional features that should be considered core components.

Normalized identity

Normalizing the identity of a person or customer across n-number of data sources is a challenge most organizations must solve.

We built a generalized identity mapping service that issues arbitrary universal identifiers for alternate keys. This identity service is incredibly efficient, and the resolution of identity only has to occur once for each event, at ingestion. That eliminates downstream complexity that would otherwise require complex and unmanageable queries. We call these universal identifiers MetaKeys.
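A minimal sketch of the idea in JavaScript. The MetaKey format and the linking function are hypothetical (the real service persists its mappings); the point is that identity resolves once, at ingestion:

```javascript
// Sketch of an identity mapping service: alternate keys from different
// source systems resolve to one stable universal identifier ("MetaKey"),
// minted on first sight. In-memory storage here is illustrative only.
let nextId = 0;
const metaKeyByAlternateKey = new Map();

// Resolve (sourceSystem, localId) to a MetaKey, minting one if unseen.
function resolveMetaKey(sourceSystem, localId) {
  const altKey = `${sourceSystem}:${localId}`;
  if (!metaKeyByAlternateKey.has(altKey)) {
    metaKeyByAlternateKey.set(altKey, `mk-${nextId++}`);
  }
  return metaKeyByAlternateKey.get(altKey);
}

// Link two alternate keys known to refer to the same person or company.
function linkAlternateKeys(aSystem, aId, bSystem, bId) {
  const metaKey = resolveMetaKey(aSystem, aId);
  metaKeyByAlternateKey.set(`${bSystem}:${bId}`, metaKey);
}

const a = resolveMetaKey('crm', 'acct-42');
linkAlternateKeys('crm', 'acct-42', 'billing', 'cust-9');
const b = resolveMetaKey('billing', 'cust-9');
// a === b: both alternate keys resolve to the same MetaKey
```

Once every event carries a MetaKey, downstream services never need to join across source-system identifiers again.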

Streaming

To preserve our real-time requirements, we need to pre-compute as much as possible at time of ingestion. This includes resolving MetaKeys, performing metric calculations, and computing the state of success milestones. Not only would data need to be streamed as input to the CXReactor, but data also needed to be streamed back to users' browsers.
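The compute-at-ingest-then-push shape can be sketched like this; the subscription interface stands in for connected browsers, and the transport (e.g. WebSockets) is omitted as an assumption of this illustration:

```javascript
// Sketch of pre-compute-and-push: each incoming event updates a metric
// immediately at ingestion, and subscribers (standing in for browsers)
// are notified with the new value. No batch queries at read time.
const subscribers = new Set();
const activeUsers = new Set();

function subscribe(fn) { subscribers.add(fn); }

function ingest(event) {
  if (event.type === 'user.login') activeUsers.add(event.userId);
  const value = activeUsers.size; // pre-computed at ingestion
  for (const fn of subscribers) fn({ metric: 'activeUsers', value });
}

const pushed = [];
subscribe((update) => pushed.push(update.value));
ingest({ type: 'user.login', userId: 'u1' });
ingest({ type: 'user.login', userId: 'u2' });
ingest({ type: 'user.login', userId: 'u1' }); // repeat login, count unchanged
// pushed === [1, 2, 2]
```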

Immutable Events

A permanent record of raw events received by the system supports auditing, traceability, and event replay.

This concept was borrowed from blockchain technology, which is compelling because it ensures a complete and verifiable historical record of events. Blockchain's cryptographic signatures make the log of events on a chain verifiable; that is not an immediate requirement for our platform, because our log is immutable and we can already guarantee that it is unchanged. Still, our platform will be well suited to employ blockchain technology in the future for appropriate use cases.
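The replay property is the practical payoff of an immutable log: a metric defined months after launch can be computed over the full history. A minimal sketch, with illustrative event shapes:

```javascript
// Sketch of an append-only event log with replay: raw events are stored
// immutably at ingest, so new metrics can be derived later by replaying
// history. Shapes are illustrative, not a production schema.
const eventLog = [];

function ingest(event) {
  eventLog.push(Object.freeze({ ...event, sequence: eventLog.length }));
}

// Replay the full log through a new metric definition.
function replay(reduceFn, initial) {
  return eventLog.reduce(reduceFn, initial);
}

ingest({ type: 'order.placed', amount: 120 });
ingest({ type: 'order.placed', amount: 80 });
ingest({ type: 'order.refunded', amount: 80 });

// A metric defined after the fact: net revenue.
const netRevenue = replay(
  (total, e) =>
    e.type === 'order.placed' ? total + e.amount :
    e.type === 'order.refunded' ? total - e.amount : total,
  0
);
// netRevenue === 120
```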

N-Number Dimensions

Data cubes have been a mainstay of data warehousing for decades and still offer an effective medium to pre-aggregate and store information indexed for quick retrieval. Because we wanted all of our metrics to be live, to be viewable at any specific point in time, and to be queryable in < 50 ms, the values had to be precomputed in multi-dimensional arrays.

All metrics and milestone achievement states are automatically dimensioned by time on one axis, and the customer on the other. The platform will support n-number of dimensions, though to control cost the total number of non-time dimensions per metric is capped by default.
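The time-by-customer pre-aggregation can be sketched as a keyed cube; the daily bucketing and key format here are assumptions for illustration, not the platform's actual storage layout:

```javascript
// Sketch of a pre-aggregated cube keyed by (customer, time bucket):
// values are accumulated at ingest, so a point-in-time read is a single
// map lookup rather than a scan over raw events.
const cube = new Map(); // key: `${customerId}|${dayBucket}` -> running value
const MS_PER_DAY = 86_400_000;

function record(customerId, timestampMs, value) {
  const day = Math.floor(timestampMs / MS_PER_DAY); // daily granularity
  const key = `${customerId}|${day}`;
  cube.set(key, (cube.get(key) || 0) + value);
}

function metricAt(customerId, timestampMs) {
  const day = Math.floor(timestampMs / MS_PER_DAY);
  return cube.get(`${customerId}|${day}`) || 0;
}

const day1 = 1 * MS_PER_DAY;
const day2 = 2 * MS_PER_DAY;
record('acme', day1, 5);
record('acme', day1, 3);
record('acme', day2, 7);
// metricAt('acme', day1) === 8, metricAt('acme', day2) === 7
```

Adding a third dimension (say, product line) just extends the key, which is why the dimension count per metric is capped to control storage cost.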

N-Number Data Sources

Data from any source can be ingested, filtered, and emitted as a signal. To garner a rich understanding of customer behavior, data from many different sources is needed. We built a generic event endpoint to make it easy to point-and-shoot event feeds to the CXReactor.
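From a source system's perspective, shipping an event might look like the sketch below. The endpoint URL and envelope fields are hypothetical; the idea is simply that every source posts into one common shape:

```javascript
// Sketch of shipping an event to a generic ingest endpoint. The URL and
// envelope fields are hypothetical illustrations of a common event shape.
function buildEnvelope(event, now = new Date()) {
  return {
    type: event.type,
    occurredAt: now.toISOString(),
    companyId: event.companyId,
    userId: event.userId,
    properties: event.properties || {},
  };
}

async function sendEvent(event, endpoint = 'https://ingest.example.com/v1/events') {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildEnvelope(event)),
  });
  if (!res.ok) throw new Error(`ingest failed: ${res.status}`);
}
```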

Idempotent Services

An idempotent operation produces the same output no matter how many times the same input is received. Event-driven architectures are almost always designed to guarantee at-least-once delivery of events, which by its very nature makes them susceptible to some data duplication.

An idempotent architecture is nothing to scoff at. It turns out to be pretty difficult to achieve, but it is essential to maintaining data integrity. It is easy enough to update a historical value in a columnar table, but when the inputs to one system affect the outputs of a different system, and you want to keep complexity low, each system has to guarantee that it applies each event only once.
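The standard building block is deduplication by event ID before applying side effects. A minimal sketch, with an in-memory seen-set standing in for durable storage:

```javascript
// Sketch of an idempotent event handler: at-least-once delivery means
// duplicates arrive, so each event carries a unique ID and the handler
// records processed IDs before applying side effects. A production
// system would persist the seen-set durably.
const processedIds = new Set();
let total = 0;

function handle(event) {
  if (processedIds.has(event.id)) return false; // duplicate: skip
  processedIds.add(event.id);
  total += event.amount; // the side effect runs once per unique ID
  return true;
}

handle({ id: 'evt-1', amount: 10 });
handle({ id: 'evt-2', amount: 5 });
handle({ id: 'evt-1', amount: 10 }); // redelivery, ignored
// total === 15
```

With this in place, a broker can safely redeliver events after a timeout or crash without corrupting downstream metrics.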

Shared Data

Data shipping and direct cross-organization data sharing both have a place in this architecture. Data shipping is most common today, where information is literally copied from one system to another. A shared data platform would, well, allow data to be shared not only across users of the same organization, but with users across third-party organizations. That means sellers can share data like metrics or data feeds directly with buyers, and vice versa. This eliminates the need to duplicate data, and it introduces a mechanism for data owners to revoke access at any time and maintain control over their information.

Native GDPR & CCPA Support

GDPR, and the similar CCPA framework, introduced major challenges for businesses. A modern data platform will make traversal of data warehouses for customer or person data a first-class operation. It will automatically track the vendors with which data is shared, along with the customer and user context.

The Technology Stack

We built our platform entirely on the Google Cloud Platform. Our goal was to leverage managed products that would facilitate rapid prototyping. Most of the managed products work well for our needs. Several don’t. A topic for another article.

Our primary datastore is Bigtable. Given the near-real-time requirement, BigQuery, while an incredibly powerful product that we use in many other capacities, isn't well suited to serve requests that require < 100 ms latency.

We wanted to build a great developer experience, too. It should take less than an hour for developers to set up their machines, and our goal is that the entire platform can be run on a developer's local machine. That means for any managed product we used, there would need to be an emulator.

We wanted to standardize all code on JavaScript. That removes the friction of jumping from language to language and broadens the number of folks who can work on any portion of the codebase.

We wanted an automated CI/CD pipeline. We chose to use a mono-repo to reduce management of multiple repositories. We could very easily have twenty or more repositories which would have quickly become unmanageable.

Finally, and most importantly, we wanted to build a ‘yes, and’ culture. An engineer or team that authored code can’t say ‘no, you can’t do that’, but instead will say ‘yes, and also do x, y, and z.’ That empowers individuals to touch any part of the system, while ensuring that the changes made address any concerns of the ‘owner’ of that code.

Reflections and Lessons Learned

Invention is rewarding, but it can be painful along the way. One of the most difficult challenges was creating efficient algorithms. We got the technology and the algorithm wrong for two of our major services, metrics and milestone evaluation. Then we got the algorithm right, but the technology wrong. Finally, we got both the algorithm and the technology right.

When setting out to build a platform, accept that you’re going to get it wrong, at first. You’re going to rewrite and refactor code, sometimes large swaths. But, you’re going to invent things along the way and the experience, in our case, was well worth the investment.

One of my favorite quotes, often attributed to Hemingway, is "Write drunk, edit sober." That is to say, start your work unconstrained, and then tailor. Of course, reasonable parameters should be set and monitored, but in the early stages of prototyping, adding constraints too soon may cause unique and novel approaches to solving problems to be overlooked, or worse, dismissed.

We have laid the groundwork for a powerful platform. We have a long road ahead of us, and I am more excited than ever to continue this journey.