Project Overview
Streaming services such as Spotify are increasingly harnessing AI and ML in their platforms. Examples include AI DJ (Beta), the Voyager library for nearest-neighbour search (NNS) to find related songs, a voice translation pilot for translating podcast episodes, and targeting for in-app user messaging (source: Spotify R&D | Engineering). When thinking about how to approach such advancements, Zero immediately sprang to mind: an ML model for forecasting the next trending or popular artist whose song will be a hit within a specified period on a music streaming platform 'Y'. The project used structured data (JSON) and initially began in Amazon SageMaker on the AWS cloud platform.
Workflow
Zero's complexity is determined by users' listening habits, their favourite genres and a wide range of criteria, such as:
- The artist's followers
- The artist's frequency of song release. Certain artists tend to release singles or albums frequently, while others put out an album only every five to ten years. The former will tend to trend within a shorter period and the latter within a longer one.
- Beats: BPM and bass
- Market. If an artist is available only in certain markets, this can influence the regions in which the artist trends.
I also had to look into a few specific observations:
- The artist is limited to a small market but has a high number of followers
- The artist's songs have good beats, yet the artist has few fans
- The lyrics or message of the artist's songs
For simplicity (small scale), I assumed a single genre: House. House is a style of music typically with a tempo between 120 and 130 BPM, four-on-the-floor beats and repetitive vocal effects. Bass is scaled between 0 and 1, with 0 being extremely shallow bass and 1 being extremely deep bass.
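To make the criteria above concrete, here is a minimal sketch of how one artist record could be loaded from JSON and checked against the House assumptions (tempo of 120 to 130 BPM, bass between 0 and 1). Only `markets` and `monthlyFollowers` are feature names from the project; the other field names, the `ArtistRecord` class and the sample values are hypothetical and purely illustrative.

```python
import json
from dataclasses import dataclass

# House tempo range and bass scale as described in the workflow text.
HOUSE_BPM_RANGE = (120.0, 130.0)
BASS_RANGE = (0.0, 1.0)


@dataclass
class ArtistRecord:
    # `markets` and `monthlyFollowers` come from the project's features;
    # every other field name here is a hypothetical placeholder.
    artist: str
    monthlyFollowers: int
    markets: int
    releasesPerYear: float
    bpm: float
    bass: float

    def problems(self) -> list:
        """Return human-readable issues; an empty list means the record looks valid."""
        issues = []
        if not HOUSE_BPM_RANGE[0] <= self.bpm <= HOUSE_BPM_RANGE[1]:
            issues.append(f"bpm {self.bpm} is outside the House range {HOUSE_BPM_RANGE}")
        if not BASS_RANGE[0] <= self.bass <= BASS_RANGE[1]:
            issues.append(f"bass {self.bass} is outside the scale {BASS_RANGE}")
        if self.monthlyFollowers < 0 or self.markets < 0:
            issues.append("followers and markets must be non-negative")
        return issues


def load_records(text: str) -> list:
    """Parse a JSON array of objects into ArtistRecord instances."""
    return [ArtistRecord(**item) for item in json.loads(text)]


# Made-up sample data, not taken from the project's dataset.
SAMPLE_JSON = """
[
  {"artist": "Artist A", "monthlyFollowers": 1200000, "markets": 80,
   "releasesPerYear": 4.0, "bpm": 124.0, "bass": 0.7},
  {"artist": "Artist B", "monthlyFollowers": 35000, "markets": 3,
   "releasesPerYear": 0.2, "bpm": 95.0, "bass": 1.4}
]
"""

records = load_records(SAMPLE_JSON)
for record in records:
    print(record.artist, record.problems())
```

Checking records this way before any modelling step keeps out-of-genre tracks (for example, a tempo well below 120 BPM) from silently entering a House-only dataset.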
Feature Details: Markets
This image displays the feature details for 'markets', including its type (numeric), prediction power, validity, missing data, outliers, and statistical measures like min, max, mean, median, and skew. A histogram on the right visualizes the frequency distribution of market values.

Feature Details: Monthly Followers
This image shows the feature details for 'monthlyFollowers', providing insights into its numeric type, prediction power, data validity, and statistical summary (min, max, mean, median, skew). The accompanying histogram illustrates the frequency distribution of monthly follower counts.

Anomalous Samples Detection
This section details how Data Wrangler identifies anomalous samples using the isolation forest algorithm. It explains that low anomaly scores indicate anomalous samples, while high scores are associated with non-anomalous samples. A table below lists examples of anomalous samples with their respective anomaly scores and feature values.
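The idea behind isolation forest can be sketched in plain Python. This is a simplified illustration of the technique, not Data Wrangler's implementation: points that random splits isolate quickly receive short average path lengths and are treated as anomalous. To match the convention described above (low score means anomalous), the sketch negates the score from the original isolation forest formulation; that sign choice, the function names and the sample rows are assumptions.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([r for r in rows if r[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[feature] >= split], depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, adjusted for unsplit points in that leaf."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return one score per row; lower means more anomalous (assumed sign convention)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    scores = []
    for row in rows:
        mean_depth = sum(path_length(row, tree) for tree in trees) / n_trees
        scores.append(-(2.0 ** (-mean_depth / norm)))  # negated original score
    return scores


# Made-up (markets, monthlyFollowers) rows; the last one is a deliberate outlier.
rows = [(50 + i % 40, 100_000 + 13_000 * i) for i in range(30)] + [(2, 9_000_000)]
scores = anomaly_scores(rows)
most_anomalous = rows[scores.index(min(scores))]
print("most anomalous row:", most_anomalous)
```

Because each split is chosen uniformly within a feature's observed range, the features do not need rescaling before scoring, which is convenient when follower counts and market counts differ by several orders of magnitude.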

Bias Metrics Overview for Monthly Followers
This image presents an overview of bias metrics for the 'monthlyFollowers' predicted column, with a threshold of 1,000,000. It displays values for 'Class Imbalance (CI)', 'Difference in Positive Proportions in Labels (DPL)', and 'Jensen-Shannon Divergence (JS)', providing a quick assessment of potential biases in the dataset.

Detailed Explanation: Class Imbalance (CI)
This image provides a comprehensive explanation of the 'Class Imbalance (CI)' metric. It covers the key idea, how to understand the metric with an example, its typical ranges, and important notes regarding how models tend to perform with imbalanced data, including potential impacts on training and test errors.
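The three metrics named in the bias overview can be computed by hand. The sketch below follows the definitions as I understand them from the SageMaker Clarify documentation: CI = (n_a - n_d) / (n_a + n_d), DPL = q_a - q_d, and JS as the Jensen-Shannon divergence between the label distributions of the two facets. The label is taken to be `monthlyFollowers` at or above the 1,000,000 threshold; treating the threshold as inclusive, the choice of facet, the function names and the sample values are all assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence (natural log), using the convention 0 * log(0) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bias_metrics(in_facet_a, followers, threshold=1_000_000):
    """Compute CI, DPL and JS for a binary facet and a thresholded label.

    in_facet_a[i] is True when sample i belongs to facet a, False for facet d.
    """
    labels = [count >= threshold for count in followers]  # inclusive threshold is assumed
    n_a = sum(1 for flag in in_facet_a if flag)
    n_d = len(in_facet_a) - n_a
    if n_a == 0 or n_d == 0:
        raise ValueError("both facets need at least one sample")

    # Proportion of positive labels within each facet.
    q_a = sum(1 for flag, y in zip(in_facet_a, labels) if flag and y) / n_a
    q_d = sum(1 for flag, y in zip(in_facet_a, labels) if not flag and y) / n_d

    ci = (n_a - n_d) / (n_a + n_d)  # in [-1, 1]; 0 means equally sized facets
    dpl = q_a - q_d                 # in [-1, 1]; 0 means equal positive rates

    # Jensen-Shannon divergence between the two facets' label distributions.
    p_a, p_d = (1 - q_a, q_a), (1 - q_d, q_d)
    mix = tuple((x + y) / 2 for x, y in zip(p_a, p_d))
    js = 0.5 * (kl_divergence(p_a, mix) + kl_divergence(p_d, mix))
    return {"CI": ci, "DPL": dpl, "JS": js}


# Made-up data: the first four artists form facet a, the last four facet d.
in_facet_a = [True] * 4 + [False] * 4
followers = [2_000_000, 1_500_000, 1_000_000, 400_000,
             3_000_000, 250_000, 90_000, 10_000]
metrics = bias_metrics(in_facet_a, followers)
print(metrics)
```

In this made-up sample the facets are the same size, so CI is 0, while the positive-label rates differ (3 of 4 versus 1 of 4), so DPL and JS are non-zero. Looking at all three together separates an imbalance in sample counts from an imbalance in outcomes.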

Features
- Data extraction and preparation using SageMaker Data Wrangler
- Analysis of user listening habits and genre preferences
- Prediction of trending artists based on various criteria
- Consideration of artist's followers and song release frequency
- Analysis of beats (BPM & Bass) and market availability
- Identification of anomalous samples using isolation forest algorithm
- Bias metric assessment for fairness in predictions
Technologies Used
- Amazon SageMaker's Data Wrangler
Project Details
Status
On Hold
Help Me Improve
Have suggestions, feedback, or a different perspective? I value your input! It helps me shape better work and improve as a developer.
Contact Me