Newsletter #85 - Sloppy Use of Machine Learning Is Causing a ‘Reproducibility Crisis’ in Science
...And we are back with more AI-related news, updates and insight into the impact AI is having on global grand challenges. I hope your summer has been a restful one; it clearly hasn't been for the AI community. See below for some of the latest AI updates!
Packed inside we have:
- BlenderBot 3: Meta's AI Chatbot Iteration That Improves Through Conversation
- How a Former Academic Helped Launch a Company That Makes Cities Healthier
- and Sloppy Use of Machine Learning Is Causing a ‘Reproducibility Crisis’ in Science
If you would like to support our continued work from £1, click here!
Marcel Hedman
Key Recent Developments
BlenderBot 3: Meta's AI Chatbot That Improves Through Conversation
What: Meta are releasing BlenderBot 3 to the AI community. BlenderBot is a chatbot that can search the internet to talk about nearly any topic. A key feature of the bot is that it learns from feedback in its conversations to improve both its safety and its conversational ability.
Size is not always indicative of performance; however, this model is roughly 58x the size of its predecessor, BlenderBot 2, and represents a significant step forward in its ability to display personality, long-term memory and knowledge.
Key Takeaway: Sharing the model with the research community opens the door to even greater progress in conversational AI. While chatbots may seem trivial on the surface, their use cases are becoming increasingly important in areas such as healthcare and manufacturing, so work that advances them should be welcomed.
Working AI: How a Former Academic Helped Launch a Company That Makes Cities Healthier
Link: https://www.deeplearning.ai/working-ai-jared-webb/
What: BlueConduit is a company that uses machine learning to locate lead pipes — which can leach the highly toxic material into drinking water — in cities where old infrastructure poses a hazard to residents.
What began as an academic problem was quickly seen to affect many cities and to require rapid intervention.
Key Takeaway: This is a great example of the transition from academia to the business world with real-world impact. The article explores some of the major challenges that come with this transition, including limited datasets, shifting to business- and goal-oriented work, and staying up to date with the latest academic research.
Sloppy Use of Machine Learning Is Causing a ‘Reproducibility Crisis’ in Science
What: As I'm sure many have seen, there has been a wave of results claiming to use AI to make predictions with near-perfect accuracy. At the peak of the COVID-19 pandemic, a whole host of papers claimed to detect COVID-19 from everything from facial expressions to whatever predictor came next on the list. A group of Princeton researchers has been examining claims made by researchers using AI in a range of fields and has found that the results are rarely reproducible.
Key Takeaway: The group found errors in 329 studies across a range of fields, and they attribute this to researchers rushing to use machine learning without a thorough understanding of its techniques and their limitations.
Common pitfalls of AI use included (https://www.deeplearning.ai/):
- Data leakage, including the lack of a test set, training on the test set, selecting features based on how well they performed on the test set, and testing on datasets that contain duplicate examples
- Drawing erroneous conclusions from insufficient data
- Applying machine learning when it’s not the best tool for the job
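Two of the leakage pitfalls above (duplicate examples and decisions that peek at the test set) can be guarded against with very little code. The sketch below is a minimal illustration, not the Princeton group's method: it assumes a small in-memory dataset of tuples, and the helper names `deduplicate`, `split_train_test` and `leaked_duplicates` are hypothetical. The key idea is to remove duplicates and split once, up front, before any feature selection or tuning, and then verify that no test example also appears in the training set.

```python
import random

def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle and split once, before any modelling decisions are made."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def leaked_duplicates(train, test):
    """Return test rows that also appear verbatim in the training set."""
    train_keys = {tuple(row) for row in train}
    return [row for row in test if tuple(row) in train_keys]

# Hypothetical toy data: (feature_1, feature_2, label); row 0 and row 2 are duplicates.
data = [(1.0, 0.5, 1), (2.0, 0.1, 0), (1.0, 0.5, 1), (3.0, 0.9, 1), (4.0, 0.2, 0)]

train, test = split_train_test(deduplicate(data), test_fraction=0.25)
print(leaked_duplicates(train, test))  # → []
```

Any feature selection or hyperparameter tuning would then use only `train` (or a validation split carved out of it), with `test` touched once at the very end.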
AI Ethics and 4 good
🚀 Artificial intelligence uncovers carcinogenic (cancer-causing) human metabolites
🚀 Advances, challenges and opportunities in creating data for trustworthy AI
Other interesting reads
🚀 Testing Firefox more efficiently with machine learning
Cool companies found this week
Artificial General Intelligence
Keen Technologies - A new AGI company founded by John Carmack (the game developer who co-founded id Software and served as Oculus’s CTO) has just raised $20m from Sequoia and angel investors.
Web data
Common Crawl - An open repository of web crawl data that can be accessed and analyzed by anyone.
Climate
Bearing AI - Green shipping powered by Artificial Intelligence. Recently raised $7m in post-seed funding.
...and Finally
AI/ML must knows
Foundation Models - any model trained on broad data at scale that can be fine-tuned to a wide range of downstream tasks. Examples include BERT and GPT-3. (See also Transfer Learning)
Few-shot learning - Supervised learning in which a model learns a new task from only a handful of labeled examples.
Transfer Learning - Reusing parts or all of a model designed for one task on a new task with the aim of reducing training time and improving performance.
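Generative adversarial network (GAN) - A generative model in which two networks, a generator and a discriminator, are trained against each other so that the generator learns to create new data instances that resemble the training data. GANs can be used, for example, to generate realistic fake images.
Deep Learning - A form of machine learning based on artificial neural networks with many layers.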
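To make the few-shot learning entry more concrete, here is a minimal sketch of one simple few-shot baseline: a nearest-centroid (prototype) classifier, similar in spirit to prototypical networks. It assumes each example has already been turned into a feature vector; in practice those vectors would often come from a pretrained model (see Transfer Learning), but here they are hypothetical hand-written 2-D numbers, and the function names are illustrative only.

```python
from math import dist

def class_prototypes(support):
    """Average the few labeled feature vectors available for each class."""
    sums, counts = {}, {}
    for features, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(features, prototypes):
    """Assign the label of the nearest class prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: dist(features, prototypes[label]))

# Hypothetical "support set": just two labeled examples per class.
support = [
    ((0.9, 0.1), "cat"), ((1.1, 0.0), "cat"),
    ((0.0, 1.0), "dog"), ((0.2, 0.8), "dog"),
]
prototypes = class_prototypes(support)
print(classify((1.0, 0.2), prototypes))  # → cat
```

The design choice here is to avoid training any new parameters at all: with so few examples, averaging them into one prototype per class is far less prone to overfitting than fitting a fresh model.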
Best,
Marcel Hedman
Nural Research Founder
www.nural.cc
If this has been interesting, share it with a friend who will find it equally valuable. If you are not already a subscriber, then subscribe here.
If you are enjoying this content and would like to support the work financially then you can amend your plan here from £1/month!