The one thing I always enjoyed growing up was figuring out how things worked. Sometimes I would remove the cover to see what was inside, but most of the time I would push buttons to see what happened. It wasn't until later that I learned the technique is called black box testing, and the process is called reverse engineering.
When it came to computers and digital electronics, I would read anything and everything I could about how the lowest layers worked. It started with why the computer is organized the way it is, with a central processor and a bus that interfaces it with RAM and I/O devices from disks to keyboards. What the BIOS does, and how the computer bootstraps from nothing into running an operating system such as Linux, Windows, or macOS. How each of these operating systems abstracts away the underlying fundamentals. How the compiler builds the software that runs on these systems, and why it works the way it does. What a web browser does when you ask it to retrieve a page, from how it downloads the page over the Internet to how it renders it. What happens when you add an item to your shopping cart, and how your purchase turns into a package you receive later.
This is what I think of when someone is looking for a person who is familiar with the full stack. Someone who knows how all of the pieces fit together, where to look when things don't fit quite right, and what to add to provide value at the end of the day. For web-based companies, the term usually means someone who is familiar with a particular web framework such as Django or Rails, a database such as Postgres or MySQL, and a JavaScript UI framework such as VueJS or React. While the two are similar, I would argue there is a fundamental difference; I will save that for another blog post. This kind of full-stack experience makes it much easier to know what to look for and where, and to recognize the limits of certain solutions.
This underlying knowledge is the main tool I employ to solve problems, and it forms the basis of my skill set of bridging technologies. The same techniques apply not just to bridging technologies together, but also to debugging systems where you may not have access to the underlying code.
The common flow for me goes something like:

1. Build up the high-level view of what the process should be
2. Tap points at the boundaries of those processes (a minimal tap sketch follows this list)
3. Capture data from static sources as well as data in flight
4. Look for patterns / analyze the data
5. Replicate and tweak
6. Systemize the tweaks
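Step 2 is easier to show than tell. Below is a minimal sketch of a tap point for a networked component: a logging TCP proxy that sits between a client and the service it talks to. The addresses are placeholders; the idea is to repoint the client at the proxy so every byte in flight passes through it.

```python
import socket
import threading

# Hypothetical addresses: the component under observation listens on
# UPSTREAM, and we point its clients at LISTEN instead so all traffic
# flows through this tap.
LISTEN = ("127.0.0.1", 8000)
UPSTREAM = ("127.0.0.1", 9000)

def pump(src, dst, label):
    """Forward bytes in one direction, hex-dumping them as they pass."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        print(f"[{label}] {len(data)} bytes: {data[:64].hex()}")
        dst.sendall(data)
    dst.close()  # propagate EOF to the other side

def handle(client):
    upstream = socket.create_connection(UPSTREAM)
    threading.Thread(target=pump, args=(client, upstream, "->"), daemon=True).start()
    pump(upstream, client, "<-")

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(LISTEN)
server.listen(5)
while True:
    conn, _ = server.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()
```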
The process I am outlining applies to the class of bugs that produce incorrect results. Performance issues follow a similar approach, with the difference that you capture timing or size data for each component to ascertain where the troublesome component is.
The first thing any complex system requires is building up a mental model of what I call "the world." From my clients' perspective, this means building a mental model of the flow of, for instance, what should happen when onboarding a new customer. This should be the happy scenario where everything works fine, or what things should look like in the future. The goal is to create a mental model of the process from the business perspective and map it to the components performing the actions. The end result provides clues as to how the various pieces implementing the business process interact with the technology.
There is a secondary goal to this exercise: identifying the gaps between what the client believes is happening and how it is actually implemented. These gaps are worth finding because they represent potential risks when bridging technologies, or possible points of failure, and they make good points of interest for observation later on.
With a rough idea of what the pieces are, you can start poking at them. The key to black box testing or reverse engineering is to identify the interfaces. If the component is open source or off the shelf, this becomes much easier with research. Be warned that documentation can have gaps or be outright wrong, hence the need for observing.
To observe an interface, it helps to have a rough idea of what to look at. Is this a shared library running inside an application? Is this a network application? How is the data shared? Is the remote service sitting in the cloud, or deployed in a high-availability configuration? The answers to these questions determine the types of tools that can be used and how to deploy them. I wish there were a nice cheat sheet for this; for performance problems, Brendan Gregg has a page dedicated to Linux Performance.
Most applications have some kind of logging that can be adjusted to provide additional data. What gets logged is up to the vendor, though, and at times it can be useless, although it is better than nothing.
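Cranking that logging as far as it will go is the cheapest tap available. A minimal sketch, assuming the component embeds a Python library that routes its messages through the standard logging module; "vendorlib" is a placeholder name:

```python
import logging

# Send everything to stderr with timestamps so log lines can later be
# correlated with captured network traffic.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# "vendorlib" is a placeholder for whatever library the component uses;
# this only works if the vendor routes its logs through `logging`.
logging.getLogger("vendorlib").setLevel(logging.DEBUG)
```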
The goal of this exercise is to determine what the inputs to the system are and what the resulting output will be. Logs help illustrate how the data is transformed, but capturing network traffic for an online service, or the system calls an application makes to the operating system, is often required to figure out what the component is doing and how it works. Capturing a memory "core" dump, especially around the time of failure, can be invaluable: the dump gives you the program's state at that moment, its instructions, and the allocated data, all of which can be traversed.
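For a command-line component, a thin wrapper that keeps byte-exact copies of everything crossing the stdin/stdout boundary is often enough to get started; on Linux, running the same command under strace -f -o trace.txt additionally captures the system calls. A sketch, where ./component and the file names are placeholders:

```python
import subprocess
import sys

# "./component" is a placeholder for the binary under observation.
# Feed it a canned input and keep byte-exact copies of both sides
# of the exchange for later analysis.
request = open("request.bin", "rb").read()

proc = subprocess.run(
    ["./component"],
    input=request,
    capture_output=True,
)

with open("response.bin", "wb") as f:
    f.write(proc.stdout)

print(f"exit={proc.returncode}, {len(proc.stdout)} bytes out,"
      f" {len(proc.stderr)} bytes of stderr", file=sys.stderr)
```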
Note that there may be quite a bit of back and forth between observing and capturing the data on one hand, and analyzing it on the other. Some custom tools may be needed in addition to the more traditional Unix tools to make sense of the data.
What I've found incredibly helpful is to look for structured patterns, sometimes in the raw data itself, and other times in human-readable text and numbers that provide clues about what kind of data is being sent. With these two focal points, you can inspect the neighboring data to work out the semantics of a segment: whether it is, for instance, a modification of a customer's address versus the user placing an order.
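A small example of that pattern hunt: scanning a raw capture for runs of printable ASCII, much like the Unix strings tool, but keeping offsets so the neighboring bytes can be inspected afterwards. The file name is a placeholder for whatever was captured earlier:

```python
import re

# Pull printable ASCII runs (4+ characters) out of a raw capture,
# printing each run with its offset so the surrounding bytes can be
# examined for length fields, IDs, and other structure.
data = open("response.bin", "rb").read()

for match in re.finditer(rb"[\x20-\x7e]{4,}", data):
    print(f"0x{match.start():08x}  {match.group().decode('ascii')}")
```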
With a rough understanding of what the data looks like entering or leaving the component, you can attempt to craft your own data packets and inject them into the component to see how it behaves. This is what I call the pushing-buttons phase.
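Here is what that phase can look like in code: replay a captured request with a single byte flipped and compare the responses. The host, port, and mutated offset are placeholders for whatever the captured data suggests:

```python
import socket

# Replay a captured request with one byte tweaked, then compare the
# response against the original's. The mutated offset is a guess at a
# flag byte spotted during the pattern hunt.
original = bytearray(open("request.bin", "rb").read())
mutated = bytearray(original)
mutated[12] ^= 0xFF  # flip one suspected flag byte

for label, payload in (("original", original), ("mutated", mutated)):
    with socket.create_connection(("127.0.0.1", 9000), timeout=5) as conn:
        conn.sendall(payload)
        reply = conn.recv(4096)
    print(f"{label}: {len(reply)} bytes, first 32: {reply[:32].hex()}")
```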
For me, I'll start with some debugging tools, or I'll write some custom code to verify things. Once I'm able to verify what I'm seeing and have a good understanding of what is going on, I will clean up my initial experimental proof of concept and make it generic enough that it can be used as a production-ready bridge.
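As a sketch of what that cleanup can produce, the throwaway socket code grows retries, timeouts, and a narrow interface so the rest of the system never needs to know the protocol details. All names here are hypothetical:

```python
import socket
import time

class ComponentBridge:
    """Narrow, reusable interface around a reverse-engineered protocol.

    Everything protocol-specific learned during the experiments lives
    here; callers just see request bytes in, response bytes out.
    """

    def __init__(self, host, port, retries=3, timeout=5.0):
        self.addr = (host, port)
        self.retries = retries
        self.timeout = timeout

    def send(self, payload: bytes) -> bytes:
        last_error = None
        for attempt in range(self.retries):
            try:
                with socket.create_connection(self.addr, timeout=self.timeout) as conn:
                    conn.sendall(payload)
                    return conn.recv(4096)
            except OSError as exc:
                last_error = exc
                time.sleep(2 ** attempt)  # simple backoff between retries
        raise RuntimeError(f"component unreachable after {self.retries} tries") from last_error
```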
This is where experience comes in. Knowing where to look and what to look for is the biggest unknown when it comes to time, and it is where experience makes the biggest difference. These techniques are applicable to any type of business, because they provide the glue that binds business processes to the underlying technology.