The importance of quality data is a constant topic of discussion these days, and you may already be familiar with the reasons it is so highly valued. From giving business leaders better customer intelligence and improving customer service to optimizing internal operations and making the organization more efficient, data has the power to truly transform a business. We’ve already explored how to begin leveraging data to drive business growth, and now you may be curious: how do data projects actually start?
The stages of a data project may vary across consultancies and will depend on project size and goals, but in this post we will outline the steps that our team at GFAIVE follows, which are likely much the same at many experienced AI and data science consultancies.
All projects begin with a scoping and research phase, the aim of which is to establish the goals of the project and get on the same page with the client. Below, we discuss each of the steps GFAIVE follows in more detail.
Meeting the Team
Everything starts with getting acquainted with the client. It is important for us to establish a sense of trust, security and transparency from the very beginning. Hence, an introductory call (Skype or Zoom) or a face-to-face meeting is set up, during which all relevant stakeholders meet and discuss their vision for the project. From the client’s side this typically includes someone responsible for technological advancement and strategy, such as a CEO or CTO. From our side, a Project Manager and a Lead Data Scientist become actively involved from the get-go.
Defining the Scope
Next, it is imperative to establish what the goals, objectives, deliverables and general scope of the project will be. This step involves having a meaningful conversation with the client’s team during which we ask and discuss the following questions:
What issues is the business currently facing? What problem is the client looking to fix? (E.g. they want to be able to accurately predict customer lifetime value (CLV) in order to know which customers to focus on).
What is the desired effect that the project will have on the client’s business?
How will the success of the end solution be measured?
What won’t we look at? (E.g. only historical data from the last 3 years will be considered).
In what time frame must the project be completed?
Is it necessary to integrate the solution into existing infrastructure?
These questions aim to establish common ground, so that both sides are on the same page and confusion is avoided going forward. If the client is not aware of the real potential of AI and data science, we make sure to dispel any illusions and clarify what is possible. When done effectively, everyone walks away with a clear understanding of what the project will entail and what the deliverables will be.
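To make the CLV example above concrete, here is a minimal sketch of what a purely historical CLV calculation might look like. The order records, column names and amounts are all hypothetical; a real project would start from the client’s own transaction data.

```python
# A minimal sketch of the CLV example above: a naive historical
# customer lifetime value computed from hypothetical order records.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount": [120.0, 80.0, 40.0, 55.0, 60.0, 300.0],
})

# Naive historical CLV: total revenue per customer to date.
clv = orders.groupby("customer_id")["amount"].sum()

# Ranking customers by CLV shows which ones to focus on.
top_customers = clv.sort_values(ascending=False)
print(top_customers)
```

A predictive CLV model would go further and forecast future revenue per customer, but even this simple historical view already supports the "which customers to focus on" question from the scoping conversation.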
Establishing Security Measures
Before we start working with data, we want to make sure that our clients feel secure transferring their data to us or granting access to it. So, a strict non-disclosure agreement (NDA) is signed, which outlines the confidential materials, knowledge and information that both sides agree not to disclose to anyone else. Occasionally, there are additional regulatory requirements to follow, such as GDPR when working with EU countries, HIPAA with healthcare companies, and FINRA in the financial sector. On a case-by-case basis, we determine how best to provide our services while adhering to these guidelines.
During this stage, we take the time to determine the appropriate level of security for working with the data. If needed, we organize our work within a secure environment, with safe instruments and encrypted data when dealing with personal information. It is important for us to ensure this level of data security so that clients feel at ease from the beginning of our collaboration and everyone can focus on the goals of the project.
Initial Data Set Analysis
Once the objectives are established and the NDA has been signed, we can gain data access and proceed with data exploration. We take a look at the data the client has provided and analyze it for missing variables and sparse fields. Through data linking and data cleaning we connect information from diverse databases and aggregate it for an efficient workflow. Moreover, by observing the company’s current data collection processes, our data scientists begin identifying ways to optimize them and shaping an overall data strategy for the client.
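The checks described above can be sketched in a few lines. This is an illustrative example only: the two tables, their columns and the shared key are hypothetical stand-ins for a client’s real databases.

```python
# A sketch of the initial data-set checks: profiling missing values
# and linking two hypothetical tables on a shared key.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["EU", None, "US"],   # a sparse field with a gap
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 3],
    "amount": [10.0, 25.0, 7.5],
})

# The share of missing values per column flags sparse fields early.
missing_share = customers.isna().mean()
print(missing_share)

# Data linking: join the sources on a shared key into one workflow.
linked = customers.merge(transactions, on="customer_id", how="left")
print(linked)
```

Even this small profile surfaces the kinds of issues the exploration step is meant to catch: the gap in `region`, and the fact that customer 2 has no transactions at all after the join.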
Once this part is complete, we have an understanding of what can be done and whether the pre-discussed objectives are feasible. Data exploration can take a while if the client has not been collecting quality data and needs to adopt a data collection strategy to ensure future success.
It is rare for clients to already have quality data when our work begins, so if there isn’t enough for our data scientists to work with, they will either establish a data collection scheme for the missing data or synthesize data for hypothesis testing. This is done by making estimations based on historical evidence, which results in artificially generated data that nevertheless closely resembles the real thing.
Researching Possible Solutions

Now that our experts have examined the available data and know what they have to work with, they start researching possible and optimal solutions. At this stage, data scientists examine current and trending approaches to similar challenges and identify solutions that can be applied to the specific use case.
Throughout the process, data scientists evaluate the pros and cons of different approaches so that they are able to justify the choice they will ultimately make. Once a solution is chosen, our data scientists are ready to move on to the technical validity check.
Technical Validity Check
The technical validity check is performed in order to evaluate the complexity of producing the chosen solution. Specifically, the data team assesses how the needed data will be stored, how long processing may take, whether the solution will be scalable and how much the work may cost. Put simply, this helps us and the client determine whether the solution is feasible and worthwhile. During the validity check we always take our client’s budget into consideration and can propose a simplified but scalable solution that can be developed further in stages. Essentially, we offer several development options, and the client chooses the one that best suits their needs, budget, technical capabilities and existing infrastructure.
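The storage, processing-time and scalability questions raised during the validity check often start as back-of-the-envelope estimates. The figures below (row count, row size, pipeline throughput) are purely illustrative assumptions, not numbers from any real project.

```python
# A rough feasibility estimate of the kind made during the validity
# check: approximate storage footprint and processing time.
# All input figures are illustrative assumptions.
rows = 50_000_000          # assumed number of records
bytes_per_row = 200        # assumed average record size
rows_per_second = 250_000  # assumed pipeline throughput

storage_gb = rows * bytes_per_row / 1e9
processing_seconds = rows / rows_per_second
print(f"~{storage_gb:.0f} GB of storage, ~{processing_seconds:.0f} s per full pass")
```

Estimates like this are deliberately crude, but they are enough to tell early on whether a proposed solution fits the client’s infrastructure and budget, or whether a simplified staged version should be proposed instead.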
Approving the Solution

At this final stage of the scoping and research phase, our Project Manager, Lead Data Scientist and the client go over the proposed solution and determine whether it is approved and what its KPIs will be. Here, both sides make sure that they are on the same page before solution development begins by agreeing on measurable model metrics and on the time frame and price of each project phase. This ensures a mutual understanding of the results that the solution should provide. At GFAIVE, we follow an agile approach to project management, with short milestones and payment upon delivery. So, the client determines what would be considered a success for each subsequent phase, which keeps the collaboration transparent and comfortable.
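The "measurable model metrics" agreed at this stage can be as simple as an error threshold on held-out predictions. The values and the success threshold below are hypothetical, shown only to illustrate what such an agreed metric might look like.

```python
# A minimal example of an agreed, measurable model metric:
# mean absolute error (MAE) on held-out predictions.
actual = [100.0, 150.0, 90.0]       # hypothetical observed values
predicted = [110.0, 140.0, 95.0]    # hypothetical model predictions

mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(f"MAE: {mae:.2f}")

# Success criterion agreed during approval (illustrative threshold).
meets_kpi = mae <= 15.0
print("KPI met:", meets_kpi)
```

Fixing a concrete metric and threshold like this before development begins is what allows both sides to judge each milestone objectively under the agile, payment-on-delivery arrangement.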
As you can see, data science projects begin with a scoping and research phase that incorporates meeting the team, defining the scope, establishing security measures, performing an initial data set analysis, researching possible solutions, carrying out a technical validity check and approving the chosen solution. Once these steps are complete, the development phase begins, during which both sides remain in close communication to ensure successful project completion.
In one of our upcoming posts, we will continue this topic and discuss in detail the subsequent steps of the data science project process and the factors that influence its duration and complexity.