Australian Data Science Student/Enthusiast at University of California, San Diego | Intern at Floodlight | Writer with W&B | www.linkedin.com/in/briggs599

How Uber’s Manifold and Weights & Biases’ model tracking tool can help evaluate and improve the quality of your next machine learning project.

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)


The more time you spend working with machine learning models the more you realize how important it is to properly understand exactly what your model is doing and how well it is doing it. In practice, keeping track of how your model is performing, especially when testing a variety of model parameter combinations, can be tedious in the best of circumstances. In most cases I find myself building my own tools to debug and analyze my machine learning models.

Recently, while working on a slew of different models for MAFAT’s doppler-pulse radar classification challenge (read more here), I found myself wasting time manually building these debugging tools. This was especially tedious because I was building an ensemble: a collection of machine learning models whose predictions are combined, often by majority vote, a strategy that can be very effective if done correctly. The problem with creating an ensemble is the variety of models and diversity of classifications required to make the strategy effective. That means training more models, performing more analysis, and understanding the impact of more parameters on the overall accuracy and effectiveness of your system, which again required me to spend more time building my own debugging tools and strategies. To better use my time and resources, I decided to turn to the range of tools available online for debugging and analyzing machine learning models. After trialing a few different options, I was able to narrow my list down to two great tools every data scientist should consider when developing and refining their machine learning…
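The majority-vote idea behind an ensemble can be sketched in a few lines. This is an illustrative helper, not the code used in the challenge, and the model names and predictions below are made up:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions into one label per sample.

    predictions: a list of lists, one inner list of labels per model,
    all the same length (one label per sample).
    """
    combined = []
    for sample_preds in zip(*predictions):  # labels every model gave one sample
        label, _count = Counter(sample_preds).most_common(1)[0]
        combined.append(label)
    return combined

# Three hypothetical classifiers labeling four radar segments
model_a = ["human", "animal", "human", "animal"]
model_b = ["human", "human", "human", "animal"]
model_c = ["animal", "animal", "human", "animal"]

print(majority_vote([model_a, model_b, model_c]))
# ['human', 'animal', 'human', 'animal']
```

The payoff comes from diversity: when the models make uncorrelated mistakes, the vote washes individual errors out.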


Ever want to track your model’s progress while you are out and about? Check out this easy guide for passing progress notifications to your phone.


We’ve all been there. Whether you are experimenting with a fun new model or grinding for that Kaggle competition prize pool, it can be hard to leave your models running in peace. The problem with overparenting your projects is that training time is the perfect time to step away from the computer, whether that means grocery shopping, exercising, or whatever you enjoy doing when you aren’t being a data scientist.

On a recent project, I had this exact problem. I wanted to experiment with new parameters and see how they performed after a few epochs, but I didn’t want to spend all day in my room watching a model train, adjusting parameters, and repeating the process. So I devised a solution I thought worth sharing: rather than spend all day waiting around for portions of the model to finish, I would have the script send notifications to my phone with updates on its progress. If the model began to underperform, I could finish what I was doing, return home, halt the process, change the parameters, and start again. This meant that while I was out I could still keep an eye on the project, which gave me some much-needed peace of mind. …
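The core of that solution is tiny: format a progress message at the end of each epoch and POST it to a notification service. The webhook URL below is a placeholder, and the service (IFTTT, Pushover, a Slack webhook, etc.) is your choice; this is a minimal sketch, not the exact script from the project:

```python
import json
import urllib.request

# Placeholder endpoint -- substitute the webhook your notification
# service (IFTTT, Pushover, Slack, ...) gives you.
WEBHOOK_URL = "https://example.com/my-training-webhook"

def progress_message(epoch, total_epochs, val_loss):
    """Format a short progress update suitable for a push notification."""
    return f"Epoch {epoch}/{total_epochs} finished, val_loss={val_loss:.4f}"

def notify(message, url=WEBHOOK_URL):
    """POST the message as JSON to the webhook, which fires the phone alert."""
    data = json.dumps({"text": message}).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)  # network call; wrap in try/except in practice

# Inside a training loop you might call:
#   notify(progress_message(epoch, 50, history["val_loss"][-1]))
print(progress_message(3, 50, 0.4817))
# Epoch 3/50 finished, val_loss=0.4817
```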


One of the key skills a data scientist should have is being able to wrangle data from a variety of sources. In this blog, I will discuss three unorthodox data types and how you can get started working with them.


We live in the age of information, a time when there is more data at our fingertips than at any other point in history, and it's growing.

IDC predicts the world’s data will grow to 175 zettabytes by 2025 … If you attempted to download 175 zettabytes at the average current internet connection speed, it would take you 1.8 billion years to download.

- Bernard Marr, 2019

That is a lot of data. So why are people still using the same airline CSVs and soccer-player statistics? This is a trap I myself have fallen into on occasion, and I think it happens mainly for two reasons: laziness and familiarity. Laziness, because we know how easy it is to use precleaned CSVs. …


How Scikit-Learn provides a quick and simple way to dip your toes into the world of linear regression and modeling.


Mathematical modeling and machine learning can often feel like difficult topics to explore and learn, especially for those unfamiliar with computer science and mathematics. It surprises me to hear from my non-STEM friends that they feel overwhelmed trying to use basic modeling techniques in their own projects, and that they get caught up in the semantics of the field. This is a shame, because linear modeling can be very helpful in a number of instances, and with all the open-source code on the internet, implementing your own model has never been easier. …
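To show just how little machinery one-variable linear regression needs, here is the ordinary-least-squares fit written out by hand; this is a sketch of the same closed-form answer scikit-learn’s `LinearRegression().fit(X, y)` gives you in the single-feature case, and the sample points are invented:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept).

    slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

With scikit-learn the equivalent is two lines: fit the model, then read `coef_` and `intercept_`.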


How GetOldTweets3 and Textblob can help perform basic sentiment analysis on stock-related tweets.


The 2020 pandemic has shaken up the economic world, coinciding with a huge trend toward financial independence and independent investing and bringing many new, inexperienced traders into a turbulent market.

[On the Australian Market] Average daily turnover by retail brokers increased from $1.6 billion in a regular period to $3.3 billion between the end of February and the start of May. And each day during this COVID-19 period a staggering 4,675 new accounts were registered — up by a factor of 3.4 times. At the same time there was a large spike in dormant accounts. …
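TextBlob reduces a tweet to a polarity score in [-1, 1]. The toy function below mimics that idea with a hand-rolled lexicon so the mechanics are visible; it is a stand-in for TextBlob’s `TextBlob(text).sentiment.polarity`, not its actual (much richer) implementation, and the word lists and tweets are made up:

```python
# Tiny illustrative lexicon -- TextBlob ships a far larger, weighted one.
POSITIVE = {"bullish", "gain", "up", "buy", "great", "moon"}
NEGATIVE = {"bearish", "loss", "down", "sell", "crash", "dump"}

def polarity(tweet):
    """Score a tweet in [-1, 1], in the spirit of TextBlob's polarity:
    (positive hits - negative hits) / total hits, or 0.0 with no hits."""
    words = [w.strip(".,!?") for w in tweet.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

print(polarity("Feeling bullish, time to buy before it goes up"))  # 1.0
print(polarity("Huge loss today, market crash incoming"))          # -1.0
```

Averaging such scores over many stock-related tweets is the usual next step toward a crude market-sentiment signal.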


A retrospective on my summer as a data scientist and how GSI Technology’s summer program breaks the internship status quo.


Data science is a field that can be hard to break into, especially if you are an undergraduate student. My name is Braden Riggs, and some of you reading this might be familiar with my previous blogs, which have dived into a plethora of different data science topics and projects. This blog, however, will be a little different. As my summer comes to an end, I wanted to write a reflective piece on my experience interning with GSI Technology, a high-performance memory manufacturer based out of Sunnyvale, California.

Interning as a data scientist at a majority-hardware company placed me at an interesting intersection between the hard and soft sides of the technology industry. As a data science student at the University of California, San Diego, my background is primarily in software, so I was surprised to see an opportunity on my college job board to work as a data science intern with a hardware company. …


Given the vast range of choices for approximate nearest-neighbor search algorithms, how can you be sure you are picking the best one for your project?

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)


Whether you are new to the field of data science or a seasoned veteran, you have likely come across the terms “nearest-neighbor search” or “similarity search”. In fact, if you have ever used a search engine, recommender, translation tool, or pretty much anything else on the internet, you have probably made use of some form of nearest-neighbor algorithm. These algorithms, which permeate most modern software, solve a simple yet incredibly common problem: given a data point, what is the closest match among a large collection of points, or rather, which point is most like the given one?
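Stated as code, the exact version of that problem is a brute-force scan: measure the distance from the query to every point and keep the minimum. This sketch (with invented points) is the baseline that approximate nearest-neighbor algorithms exist to beat at scale:

```python
import math

def nearest_neighbor(query, points):
    """Exact (brute-force) nearest-neighbor search under Euclidean distance.

    O(n) per query -- fine for small collections, far too slow at the
    scales where approximate methods take over.
    """
    best, best_dist = None, math.inf
    for p in points:
        d = math.dist(query, p)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best, best_dist = p, d
    return best

corpus = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.2), (9.0, 2.0)]
print(nearest_neighbor((1.0, 1.0), corpus))  # (1.0, 1.2)
```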


How humans and animals leave different Doppler-pulse footprints, and MAFAT’s latest data science prize for creating a model that can distinguish between them.

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)


In the world of data science, the industry, academic, and government sectors often collide as enthusiasts and experts alike work together to tackle the challenges we face day-to-day. A prime example of this collaboration is the Israeli Ministry of Defense Directorate of Defense Research & Development (DDR&D)’s MAFAT challenges, a series of data science challenges with real-world applications and lucrative prize pools. In the program’s own words:

The goal of the challenge is to explore the potential of advanced data science methods to improve and enhance the IMOD current data products. The winning method may eventually be applied to real data and the winners may be invited to further collaborate with the IMOD on future projects. …


The trials and tribulations of attempting to benchmark approximate nearest-neighbor algorithms on a billion scale dataset.

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)


Note: this article is a continuation of a project I wrote about here.

Where to start?

As discussed in my previous blog, benchmarking approximate nearest-neighbor (ANN) algorithms is a vital task in a world where accuracy and efficiency reign supreme. To know when to use each of the many ANN implementations, we must first understand how performance shifts across a variety of ANN algorithms and datasets.

In my first scuffle with the challenge of benchmarking ANN algorithms, I discussed the work of Aumüller, Bernhardsson, and Faithfull and the paper they wrote exploring ANN benchmarking. Whilst the paper and the subsequent GitHub repo were well constructed, there were some issues I wanted to investigate further, relating to the scope of the algorithms selected and the size of the dataset we were benchmarking on. …
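The bookkeeping at the heart of such a benchmark boils down to two measurements: how close the approximate answers come to the true neighbors (recall), and how long each query takes. The helpers below are a minimal sketch of that idea, not the ann-benchmarks code itself, and the example ids are invented:

```python
import time

def recall_at_k(approx_ids, true_ids):
    """Fraction of the true k nearest neighbors the ANN result found --
    the accuracy axis of an ANN benchmark."""
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)

def seconds_per_query(search_fn, queries):
    """Average wall-clock seconds per query for a search callable --
    the speed axis of the benchmark."""
    start = time.perf_counter()
    for q in queries:
        search_fn(q)
    return (time.perf_counter() - start) / len(queries)

# Toy example: the index returned ids [3, 7, 9, 2], but the exact
# 4 nearest neighbors are [3, 7, 1, 2] -> it found 3 of the 4.
print(recall_at_k([3, 7, 9, 2], [3, 7, 1, 2]))  # 0.75
```

Plotting recall against queries per second for each algorithm and parameter setting produces the trade-off curves these benchmarks are known for.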


by Braden Riggs and George Williams (gwilliams@gsitechnology.com)


Natural Language Processing?

As an undergraduate data scientist, I am often exposed to a range of new topics and ideas shaping the world of data science. Because of the nature of the field, many of these topics span different disciplines or even form disciplines of their own, such as natural language processing, or NLP. Born out of a combination of linguistics, mathematics, and computer science, NLP is the practice of drawing data, meaning, and understanding from the swaths of text data in our world. To the untrained eye, NLP may not seem especially important or difficult. However, its application and importance in our world are undeniable. Spam detection, predictive text, virtual assistants, search engines, sentiment analysis, and translators are but a few examples of the plethora of applications for NLP that have become staples in our world. Unfortunately for NLP specialists, human language, be it English or Mandarin, is complicated, messy, and regularly changing, making it very challenging to translate for machines. …
