Building effective analytics requires interdisciplinary knowledge and skills. It goes beyond understanding the context of a problem, gathering and cleaning data, engineering features, training and testing a model, and making solutions available (though all of these are critical). Understanding when to buy, build, or outsource a solution is essential to innovating, scaling, and sustaining data products. Let’s walk through the full stack of skills and considerations needed to enable an effective analytical environment.
First, let’s set a baseline: as the data science field matures, different types of data scientist skills, roles, and responsibilities will emerge (a fun aside here on the “Battle of the Data Science Venn Diagrams”; let’s not take ourselves too seriously). Being an “analytical leader” tasked with deploying effective data products will require deep context on why, when, where, and how data will help achieve desired outcomes.
At a glance this may be overwhelming, but read it as a team roster, not one person’s résumé. Let’s start with five major skills and add details as we go.
Establish a Vision and Strategy for Organization, Community, or Platform.
- Start with Why? Set a vision that is clear, compelling, codifying, and complete – this will inspire all types of people.
- Translate the vision into a strategy which guides the “What are we going to do?” questions.
- Be clear on the purpose of vision and strategy: “to inspire and give purpose.” This is not the “HOW?” part.
Define a relevant problem or opportunity.
- Ask targeted questions; define the problem, its impact, and its urgency.
- Understand the end user, the supporting process, and the desired outcome.
- Analyze data within context of user, process, and outcome (think – exploratory data analysis).
- Visualize data and tell a story that matters (including the impact on business metrics and real, well-understood outcomes).
Assess when to build, buy, or outsource analytical solutions, and find unique value.
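To make “analyze data within context of user, process, and outcome” concrete, here is a minimal Python sketch that slices one outcome metric by a user field and by a process field. The record layout, field meanings, and values are hypothetical; in practice this grouping step is often done in SQL or with a dataframe library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (user_segment, process_step, outcome_value).
# Field names and values are illustrative only.
records = [
    ("new", "onboarding", 12.0),
    ("new", "onboarding", 18.0),
    ("returning", "checkout", 40.0),
    ("returning", "checkout", 44.0),
    ("returning", "onboarding", 21.0),
]

def summarize_by(rows, key_index):
    """Group outcome values by one context field and report count and mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: {"n": len(vals), "mean": mean(vals)} for key, vals in groups.items()}

by_segment = summarize_by(records, 0)  # context: who the user is
by_step = summarize_by(records, 1)     # context: where in the process
print(by_segment)
print(by_step)
```

Slicing the same outcome by both user and process often reveals that an apparent segment difference is really a process difference (or vice versa), which is exactly the context the bullet above asks for.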
- Do we have access to data? Do we trust data? Can we govern data? Does data change often?
- What data latency is required to support the frequency of decisions (think: velocity of decisions, speed of process)?
- Does the current data infrastructure support the volume and variety of data needed to meet latency requirements? What’s the cost to get there and to maintain it?
- Do we have the right skills and resources to support solutions across the life-cycle (e.g., definition, development, deployment, maintenance, and evolution)? Each stage requires very different skills.
- What do other teams, vendors, and/or products do well already? Can we do it better?
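The latency and cost questions above can be framed as a simple screen: keep only options whose worst-case data age fits the decision cadence, then rank the survivors by cost. This is a minimal sketch; the option names, the staleness formula (refresh interval plus processing time), and every figure are illustrative assumptions, not benchmarks.

```python
from dataclasses import dataclass

@dataclass
class PipelineOption:
    name: str                   # hypothetical label for a build/buy/outsource option
    refresh_interval_min: float # how often new data lands (assumed)
    processing_time_min: float  # time to ingest, transform, and score (assumed)
    annual_cost: float          # build + run cost per year (placeholder figure)

def worst_case_staleness(option):
    # Data can be up to one full refresh interval old, plus processing time.
    return option.refresh_interval_min + option.processing_time_min

def viable_options(options, decision_interval_min):
    """Keep options whose worst-case data age fits the decision cadence, cheapest first."""
    fits = [o for o in options if worst_case_staleness(o) <= decision_interval_min]
    return sorted(fits, key=lambda o: o.annual_cost)

# Columns: name, refresh interval (min), processing time (min), annual cost.
options = [
    PipelineOption("build_batch", 1440, 30, 50_000),
    PipelineOption("buy_streaming", 1, 5, 120_000),
    PipelineOption("build_microbatch", 15, 10, 80_000),
]
hourly = viable_options(options, decision_interval_min=60)
print([o.name for o in hourly])
```

For an hourly decision, the nightly batch option is screened out despite being cheapest, and the remaining options are ordered by cost. A real assessment would also weigh trust, governance, and skills, which this screen deliberately ignores.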
Be hands-on: establish the ability to quickly build, test, fail, iterate, and deploy usable solutions.
- Do we have the right data and analytics workbench to support the analytics solution life-cycle?
- Do these workbench environments remain cost-effective as technology changes (e.g., consider an open-source stack)?
- Are these workbench environments scalable, configurable, supportable, and portable?
- Do we have people who can learn quickly and demonstrate a solution within the context of the business problem? Actual, workable solutions?
Ensure analytical solutions are effective.
- Are users aware of the solutions and their potential impact?
- Are they actually using them and getting the desired outcome?
- Are users providing feedback on how to improve the solution, and are improvements being made?
- Assess the total cost of ownership (including data platform, support, storage, micro-services, and the enhancements needed to remain competitive).
- Is the solution effective for users and economical to maintain and evolve (giving a strategic advantage)?
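Several of these questions reduce to arithmetic once the inputs are tracked. The sketch below assumes hypothetical annual cost components and user counts, sums the costs over a planning horizon without discounting, and relates the total to adoption.

```python
# Hypothetical annual cost components (all figures are placeholders).
cost_components = {
    "data_platform": 40_000,
    "support": 25_000,
    "storage": 8_000,
    "micro_services": 12_000,
    "enhancements": 15_000,
}

def total_cost_of_ownership(components, years):
    """Sum annual cost components over a planning horizon (no discounting)."""
    return sum(components.values()) * years

def adoption_rate(active_users, target_users):
    """Share of intended users actually using the solution."""
    return active_users / target_users if target_users else 0.0

def cost_per_active_user(components, years, active_users):
    """TCO spread over the users who actually use the solution."""
    return total_cost_of_ownership(components, years) / active_users

print(total_cost_of_ownership(cost_components, years=3))
print(adoption_rate(active_users=150, target_users=200))
print(cost_per_active_user(cost_components, years=3, active_users=150))
```

Dividing by active users rather than target users ties cost back to whether people are actually using the solution, which links the adoption and economics questions in the list above.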
‘Must-Have’ Technical Skills
- Acquiring Data (Python, SQL, APIs, web scraping, spreadsheets)
- Storing Data (databases, data ingestion methods)
- Transforming Data (Python and SQL)
- Extracting Insights (Hypothesis Testing, ANOVA, Anomaly Detection, Optimization, etc.)
- Automating Tasks (Python, Task Schedulers, Alerts)
- Applying Machine Learning (use-cases, purpose, pros/cons, limitations)
- Selecting best visualization (fit for use)
- Harnessing cloud resources (containerization, virtual machines, AWS, GCP, Docker)
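A compact way to see how the first few skills chain together is a tiny acquire, store, transform pipeline using only the Python standard library. The CSV text stands in for a spreadsheet export or API response, an in-memory SQLite database stands in for a production database, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract standing in for a spreadsheet export or API response.
RAW = """order_id,region,amount
1,east,120.50
2,west,80.00
3,east,
4,west,99.50
"""

def acquire(text):
    """Parse CSV text into a list of dicts (acquiring data)."""
    return list(csv.DictReader(io.StringIO(text)))

def store(rows, conn):
    """Load rows into SQLite, skipping rows with a missing amount (storing + cleaning)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    clean = [(int(r["order_id"]), r["region"], float(r["amount"])) for r in rows if r["amount"]]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
    conn.commit()
    return len(clean)

def revenue_by_region(conn):
    """Aggregate with SQL (transforming data)."""
    cur = conn.execute(
        "SELECT region, SUM(amount), COUNT(*) FROM orders GROUP BY region ORDER BY region"
    )
    return {region: (total, n) for region, total, n in cur.fetchall()}

conn = sqlite3.connect(":memory:")
loaded = store(acquire(RAW), conn)
summary = revenue_by_region(conn)
print(loaded, summary)
```

Dropping the row with a blank amount is only one cleaning choice; depending on the problem, imputing or flagging it may be more appropriate. Scheduling this script (the “Automating Tasks” skill) would turn it into a recurring job.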
Useful Analytics Techniques and Concepts
- Forecasting
- Classification
- Anomaly Detection
- Linear Programming
- Stochastic Programming
- Decision Trees
- Modern Portfolio Theory
- Expected Value
- Distribution Fitting
- Probability Density Function (PDF)
- Cumulative Distribution Function (CDF)
- Mean, Variance, Standard Deviation, Coefficient of Variation (CV)
- Exploratory Data Analysis (EDA)
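Several of the concepts listed above fit in a few lines of Python. This sketch uses the standard-library `statistics` module with population (not sample) statistics, flags anomalies with a simple z-score rule (one approach among many; the 2.0 threshold is an assumption), computes an expected value, and builds an empirical CDF from observed data. The demand figures and outcome probabilities are made up for illustration.

```python
from bisect import bisect_right
from statistics import mean, pstdev, pvariance

def describe(values):
    """Mean, population variance, population standard deviation, and CV."""
    mu, sigma = mean(values), pstdev(values)
    return {"mean": mu, "variance": pvariance(values), "std": sigma, "cv": sigma / mu}

def zscore_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]

def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(v * p for v, p in outcomes)

def empirical_cdf(values):
    """Return F(x) = share of observations less than or equal to x."""
    data = sorted(values)
    return lambda x: bisect_right(data, x) / len(data)

# Hypothetical daily demand with one spike.
daily_demand = [10, 12, 11, 13, 12, 11, 40]
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
spikes = zscore_anomalies(daily_demand)
ev = expected_value([(100, 0.2), (50, 0.5), (0, 0.3)])  # hypothetical payoffs
F = empirical_cdf(daily_demand)
print(stats, spikes, ev, F(12))
```

With this sample data the z-score rule flags only the spike of 40, and `F(12)` gives 5/7 because five of seven observations are at or below 12. Note that a large outlier inflates the standard deviation it is measured against, which is one reason robust alternatives are often preferred in practice.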