Data Prep Demystified

It All Starts With Intelligent Information Capture

In my first job out of college, I was hired as a research assistant on the Investment Management Database (IMDB) team at the investment consulting firm Cambridge Associates. Investment management firms that sought to meet our clients had to complete a questionnaire, and our team manually entered the information from that questionnaire into a proprietary internal database.

I didn’t realize this at the time, but the essence of my job, which was largely a data entry role, was to facilitate the capture of information as defined by AIIM:

“Capture is the process of getting information from its source into some type of more formal information management environment or system, and then recording its existence in the system” - CIP Study Guide 2016, AIIM.

Information Capture is a broad topic and the beginning of Information Management. Strictly speaking, one step precedes Capture, and that is Creation: to be captured, information must exist, which implies it was created at some point. However, Capture requires more strategy and effort.

I’d like to qualify what I mean by information. Information is both unstructured and structured.

Unstructured information consists of all content types (paper, office documents, audio files, video files, etc.) as well as the content within them. For example, a Word document is information, but so is the content of the Word document.

Structured information is more commonly known as Data: specific values (numeric, character, etc.) that can be stored in a table format (rows and columns).

To summarize:

Information = Unstructured + Structured

Information = Content + Data

I have observed a few core challenges with regard to Information Capture, and have some recommendations on how to address them.

Technology: One of the culprits of poor information capture is manual data entry, which is time-consuming, error-prone, and not scalable. Information needs to be captured intelligently, using technology that scales and enables automation. The right technology will not be a single application, but an ecosystem of connected applications that captures information for all processes.

Process: Intelligent business processes encourage information capture by abiding by the rules of a governance framework. This could range from something as simple as data validation for email addresses and social security numbers to prevent errors, to complex conditional business logic that accounts for a wide variety of outcomes. Without governance in the process there will be chaos.
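To make the simple end of that range concrete, here is a minimal sketch of validation at the point of capture. It is an illustration only: the field names ("email", "ssn"), the function name validate_record, and the two patterns are my own assumptions, and the patterns check format only, not whether an address or number actually exists.

```python
import re

# Deliberately simple, assumed patterns for illustration.
# They check shape only: something@something.tld and NNN-NN-NNNN.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def validate_record(record):
    """Return a list of error messages for a captured record (empty if valid)."""
    errors = []
    # Missing or empty fields are treated as invalid.
    if not EMAIL_RE.match(record.get("email") or ""):
        errors.append("email: invalid format")
    if not SSN_RE.match(record.get("ssn") or ""):
        errors.append("ssn: expected NNN-NN-NNNN")
    return errors


if __name__ == "__main__":
    print(validate_record({"email": "jane@example.com", "ssn": "123-45-6789"}))
    print(validate_record({"email": "jane-at-example", "ssn": "123456789"}))
```

Rejecting a record at entry, with a message that names the field, is usually cheaper than finding and correcting the same error after it has reached the database.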

People: Last, but certainly not least, are the employees working with information at the point of capture, and those beyond capture. Employees need to learn how to work with information, how to use technology that can automate information capture, and how to operate within the confines of a governance framework for business processes. This is always the most challenging part of the process: fundamentally changing people's mindsets and how they work with information.

Done correctly, information capture should occur intelligently, fueling information management and setting the stage for data preparation.