Building your own machine learning systems!
There are plenty of reasons to do this. You might want to better understand what’s going on; you might have data security requirements that mean you need to build models on your own data while keeping everything in-house; you might want to compete in Kaggle competitions; or you might just want to nerd out in this space.
You may have played with stand-alone generative AI services like OpenAI’s ChatGPT and Google’s Bard. Perhaps you’ve started using integrated services like Bing Chat while you search or GitHub Copilot while you code. You may even be on an enterprise Microsoft Office license and using Office Copilot (at the time of writing, not available on consumer or education licenses).
Where to start – what’s a good place to learn about this technology?
So. Many. Choices. Which is great – and perhaps a little daunting!
So here’s a list of various options. I’ll aim to update this article from time to time as things change, so feel free to let me know if you find other options that should be on this list.
Kaggle is an online community supporting data scientists and machine learners. Kaggle hosts competitions with rewards ranging from the joy of learning through to hard cash prizes from major sponsors.
Kaggle provides free hosting of Jupyter notebooks (backed by GPUs), datasets and a bunch of excellent courses. Each course and competition has a community chat, and the whole site is governed by Community Guidelines that start with “Be patient. Be friendly.”
Some excellent starting points are:
- Intro to Machine Learning
Learn the core ideas in machine learning, and build your first models.
This is a launching point to other ML courses – starting with bite-sized chunks of theory and exercises and a gentle introduction to Kaggle competitions.
Once you’ve done that you’ll be linked through to courses like Data Cleaning, Intermediate Machine Learning, Intro to Deep Learning and more!
- If you’re approaching this from closer to the beginning of your programming journey you might want to start with courses like Intro to Programming, Python and Pandas.
fast.ai provides a number of comprehensive top-down courses. The team behind the courses focuses on getting students started as quickly as possible with working code, using high-level abstractions so that students do not need to understand the details until they need to. They describe this as a “whole game” approach: “That means that if you’re teaching baseball, you first take people to a baseball game or get them to play it. You don’t teach them how to wind twine to make a baseball from scratch, the physics of a parabola, or the coefficient of friction of a ball on a bat.”
While there’s a lot to learn in these courses the content is approachable (and depending on how you’re going you can lean into video / reading / workbooks). Throughout the course Jupyter notebooks are used to provide working exercises.
If you’ve already started down a lower-level course like DeepLearning.ai, fast.ai courses can give you some tangible builds and higher level understanding that might help you frame and give context to your lower level learning.
The courses are:
- Practical Deep Learning
This was the first course and is being updated over time. The basic course can be completed by watching videos and completing notebooks, but for a deeper understanding the course is backed by a book available for free online and also as a printed book if you’d like to leaf through it on the table while you learn.
- From Deep Learning Foundations to Stable Diffusion (or just “Part 2” for short!) is a continuation – with the promise that by the completion of the course you will have reconstructed the text-to-image Stable Diffusion model.
In contrast to fast.ai’s “whole game” or top-down approach, DeepLearning.ai is generally considered to start from the other end and promises to teach from the bottom up. These are paid courses (unlike the options above) hosted on the Coursera platform.
Courses can be taken individually as short courses or combined into specialisations – grouped into Introductory, Intermediate and Advanced levels.
If you’ve already started down the fast.ai path and need some more low-down details, working through a DeepLearning.ai course might be what you’re after.
- Introductory “Machine Learning Specialization”
- Intermediate “Deep Learning Specialization”
- Advanced “Machine Learning Engineering for Production (MLOps)”
Microsoft ML courses on GitHub
Microsoft have released a number of courses on their GitHub account. These are Jupyter notebook courses that can be forked and hosted wherever you like (for example on GitHub Codespaces) so you can work through the examples.
- Machine Learning for Beginners
Focusing primarily on “classic” machine learning – a good place to start!
- Data Science for Beginners
How to get, store and get information out of data!
- Artificial Intelligence for Beginners
Covering “… Symbolic AI, Neural Networks, Computer Vision, Natural Language Processing, and more”
- Generative AI for Beginners
Focusing on the fundamentals of building generative AI applications.
These are presented as Foundational and Advanced courses. In this case they do not use a Jupyter notebook format; they are a more traditional read-through with quizzes to test understanding.
There is also a set of Guide documents for various topics.
Something I really like is the Machine Learning Glossary – this would be an excellent resource to bookmark while you’re working through Machine Learning courses in general. Not only is the content comprehensive and helpful, it also includes cross-references between related topics.
Modules / Libraries
The courses above usually use the Python programming language and rely on a number of powerful data science and machine learning libraries. Here are a few of the top libraries you’ll want to know about and bookmark.
A library which is fundamental to doing data science with Python is Pandas!
As with scikit-learn, the documentation for Pandas is excellent and often a good place to go when you are reading code examples and wondering exactly how one of the data reading / transforming / investigation / visualisation steps has been pulled off!
- The User Guide contains heaps of worked examples of using Pandas and…
- The API Reference tells you everything else you need to know!
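To give a feel for the reading / transforming / investigation steps mentioned above, here is a minimal sketch. The tiny housing dataset, its column names and the `load_and_summarise` helper are all made up for illustration; they are not taken from any of the courses.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real dataset file.
CSV_TEXT = """city,rooms,price
Auckland,3,900000
Auckland,2,650000
Wellington,3,800000
Wellington,4,
"""


def load_and_summarise(csv_text: str) -> pd.DataFrame:
    """Read CSV text, clean it, and return the mean price per room by city."""
    # Reading: parse the CSV text into a DataFrame.
    df = pd.read_csv(io.StringIO(csv_text))
    # Cleaning: drop rows where the price is missing.
    df = df.dropna(subset=["price"])
    # Transforming: derive a new column from two existing ones.
    df = df.assign(price_per_room=df["price"] / df["rooms"])
    # Investigating: average the derived column for each city.
    return df.groupby("city", as_index=False)["price_per_room"].mean()


summary = load_and_summarise(CSV_TEXT)
print(summary)
```

Each step here (`read_csv`, `dropna`, `assign`, `groupby`) has its own page in the API Reference, which is a handy way to see what other arguments are available.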
scikit-learn is a machine learning library, often used by the courses above, that comes with lots of documentation. This can be helpful when trying to understand specific examples in courses or to get a feel for other options when coding.
- The User Guide provides detailed information about the library, the maths behind its components and how it can be used.
- The API Reference drops you quickly into exact details of how to use module classes and functions, and also provides a way to quickly look at other similar options (for example, if you’ve started using a specific model, you can see others in the same category that you might want to try).
- Copious examples of uses of the code and visualisations.
- And a blog to keep you up to date with what’s new!
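To make the idea of trying other models in the same category concrete, here is a minimal sketch using scikit-learn. The dataset (the small iris data bundled with the library), the two linear models and the split settings are my own choices for illustration, not something prescribed by the courses above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows to check each model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two classifiers from the same module (sklearn.linear_model).
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RidgeClassifier": RidgeClassifier(),
}

# Fit each model and record its accuracy on the held-back rows.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Because the models share the same `fit` / `score` interface, swapping in another option found in the API Reference is usually a one-line change to the `candidates` dictionary.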
fastai (the library) is “… a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains”. It was created by fast.ai and of course is used throughout the fast.ai courses.
The aim of this library is to provide high-level abstractions (to facilitate the top-down pedagogy of fast.ai) while also allowing developers to get under the hood and tweak the code when needed.