
Australian Data Science Student/Enthusiast at University of California, San Diego | Intern at Floodlight | Writer with W&B | www.linkedin.com/in/briggs599

How Uber’s Manifold and Weights & Biases’ model tracking tool can help evaluate and improve the quality of your next machine learning project.

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)

The more time you spend working with machine learning models, the more you realize how important it is to understand exactly what your model is doing and how well it is doing it. In practice, keeping track of how your model is performing, especially when testing a variety of model parameter combinations, can be tedious in the best of circumstances. In most cases I find myself building my own tools to debug and analyze my machine learning models.

Recently while working on a slew of different models for MAFAT’s doppler-pulse radar classification…


Ever want to track your model’s progress while you are out and about? Check out this easy guide for passing progress notifications to your phone.

We’ve all been there. Whether you are experimenting with a fun new model or grinding for that Kaggle competition prize pool, it can be hard to leave your models running in peace. The problem with overparenting your projects is that the time your model spends training is the perfect time for you to get out and about and take a break from the computer, whether that be grocery shopping, exercising, or whatever you enjoy doing when you aren’t being a data scientist.

On a recent project, I had this exact problem. I wanted to experiment with new parameters and see…


One of the key skills a data scientist should have is being able to wrangle data from a variety of sources. In this blog, I will discuss three unorthodox data types and how you can get started working with them.

We live in the age of information, a time when there is more data at our fingertips than at any other point in history, and it's growing.

IDC predicts the world’s data will grow to 175 zettabytes in 2025 … If you attempted to download 175 zettabytes at the average current internet connection speed, it would take you 1.8 billion years to download.

- Bernard Marr, 2019

That is a lot of data. So why are people still using the same airline CSVs or soccer player statistics? This is a trap I myself have fallen into on occasion and I…


How Scikit-Learn provides a quick and simple way to dip your toes into the world of linear regression and modeling.

Mathematical modeling and machine learning can often feel like difficult topics to explore and learn, especially to those unfamiliar with the fields of computer science and mathematics. It surprises me to hear from my non-STEM friends that they feel overwhelmed trying to use basic modeling techniques in their own projects and that they can get caught up in the semantics of the field. This is a shame because linear modeling can be very helpful in a number of instances, and with all the open-source code on the internet, implementing your own model has never been easier. …


How GetOldTweets3 and Textblob can help perform basic sentiment analysis on stock-related tweets.

Coinciding with a huge trend towards financial independence and independent investing, the 2020 pandemic has shaken up the economic world, bringing with it a turbulent market and many new and inexperienced traders.

[On the Australian Market] Average daily turnover by retail brokers increased from $1.6 billion in a regular period to $3.3 billion between the end of February and the start of May. And each day during this COVID-19 period a staggering 4,675 new accounts were registered — up by a factor of 3.4 times. At the same time there was a large spike in dormant…


A retrospective on my summer as a data scientist and how GSI Technology’s summer program breaks the internship status quo.

Data science is a field that can be hard to break into, especially if you are an undergraduate student. My name is Braden Riggs, and some of you reading this might be familiar with my previous blogs, which have dived into a plethora of different data science topics and projects. This blog, however, will be a little different. As my summer is coming to an end, I wanted to write a reflective piece on my experience interning with GSI Technology, a high-performance memory manufacturer based out of Sunnyvale, California.

Interning as a Data Scientist in a majority hardware company places…


Given the vast range of choices for approximate nearest-neighbor search algorithms, how can you be sure you are picking the best one for your project?

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)

Whether you are new to the field of data science or a seasoned veteran, you have likely come into contact with the term ‘nearest-neighbor search’ or ‘similarity search’. In fact, if you have ever used a search engine, recommender, translation tool, or pretty much anything else on the internet, then you have probably made use of some form of nearest-neighbor algorithm. These algorithms, the ones that permeate most modern software, solve a very simple yet incredibly common problem. Given a data point, what is the closest match from a large selection of data…


How humans and animals leave different doppler-pulse footprints and MAFAT’s latest data science prize for creating a model that can distinguish between them.

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)

In the world of data science, the industry, academic, and government sectors often collide when enthusiasts and experts alike work together to tackle the challenges we face day-to-day. A prime example of this collaboration is the MAFAT challenges run by the Israeli Ministry of Defense Directorate of Defense Research & Development (DDR&D), a series of data science challenges with real-world applications and lucrative prize pools. In the program’s own words:

The goal of the challenge is to explore the potential of advanced data science methods to improve and enhance the IMOD current data products. The…


The trials and tribulations of attempting to benchmark approximate nearest-neighbor algorithms on a billion scale dataset.

by Braden Riggs and George Williams (gwilliams@gsitechnology.com)

Note: this article is a continuation of a project I wrote about here

Where to start?

As discussed in my previous blog, benchmarking Approximate Nearest-Neighbor (ANN) algorithms is a vital task in a world where accuracy and efficiency reign supreme. To better understand when to use any of the range of ANN implementations, we must first understand how performance shifts across a variety of ANN algorithms and datasets.

In my first scuffle with the challenge of benchmarking ANN algorithms, I discussed the work of Aumüller, Bernhardsson, Faithfull, and the paper they had…


by Braden Riggs and George Williams (gwilliams@gsitechnology.com)

Natural Language Processing?

As an undergraduate data scientist, I am often exposed to a range of new topics and ideas shaping the world of data science. Because of the nature of data science, many of these topics span different disciplines or even make up their own disciplines, such as the field of natural language processing, or NLP. Born out of a combination of linguistics, mathematics, and computer science, NLP is the exploration of drawing data, meaning, and understanding from the swaths of text data in our world. To the untrained eye, NLP may not seem like…
