According to the PMI PMBOK, development lifecycles can be predictive, iterative, incremental, adaptive, or hybrid. To understand which one suits a data science project best, we should first understand the distinguishing characteristics of each type.
Predictive lifecycles follow the waterfall model, where scope and cost are estimated in the initial phase. In an iterative lifecycle, cost and scope are also determined at the beginning, but they can be modified over the course of the lifecycle. The incremental lifecycle delivers the product through successive iterations, each of which adds to the value of the final product. Adaptive lifecycles are agile, iterative, or incremental. Finally, hybrid lifecycles combine elements of the previous categories: the fixed requirements are specified up front using predictive elements, while the evolving requirements follow an adaptive lifecycle.
The hybrid model is the best fit for a data science project, and the reason becomes clear when we walk through the phases of such a project.
The first phase of a data science project is gathering requirements from the business. This phase is unpredictable by nature: data science is relatively new, and business people are often unfamiliar with the predictive nature of the tools we are trying to build. They have only a high-level overview of what we want to achieve, and it is mostly BI experts who provide the necessary consulting on the objectives.
The next phase, exploratory data analysis, is when we identify the data sources available to us in order to build the data pipeline and achieve our objectives. Again, this requires a certain degree of flexibility, as we need to communicate with multiple departments and IT specialists to get the right data.
The final phase is going back to the business and showing what we have built. It sounds like a stable phase with rigid timelines, but trust me, it is not. Given that a data science project can take up to 6 months, many requirements will have changed by then, and business people need extra education about the business benefits we bring.
We control all these flexible phases by using a hybrid model. Every project manager wants to meet their timelines and should follow predictive lifecycles as much as possible, but only for those elements of a data science project that allow it.