Commit ee542d90 authored by Alessandro Melchiorre


parent e7b2d632
@@ -16,20 +16,20 @@ To run the notebook, please unzip the in the current folder.
 - precomputed_profiles_thr.pkl
 The dataset examined contains:
-- 1.475 users
-- 1.544.996 unique tracks
-- 34.738.390 listening events
+- 1.470 users
+- 1.544.646 unique tracks
+- 34.692.133 listening events
-N.B. Listening events and tracks are identified by Spotify URIs. The statistics reported in the paper, instead, identify these with a 4-entry tuple (track, artist, album, MusicBrainz id).
+N.B. Listening events and tracks in the files listed above are identified by Spotify URIs, in contrast to the 4-entry tuples (track, artist, album, MusicBrainz id) used in the paper.
 ## listening_histories.pkl
 Contains the listening histories of all the users in the dataset. For each user, the tracks in their listening history are listed in the order they were listened to.
-dict[user_label] = listening_history
+dict[user_label] -> listening_history
-- user_label is an integer from 0 to 1474
+- user_label is an integer from 0 to 1469
 - listening_history is a list of Spotify URIs
 (see code as an example on how to load the data)
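The loading code referred to in the README is not part of this diff. As a minimal sketch, assuming listening_histories.pkl unpickles to a plain dict mapping each user_label to a list of Spotify URIs (as the updated spec states), it could be read and summarized as below. The helper names `load_listening_histories`, `summarize` and `top_tracks` are hypothetical, and the toy URIs stand in for real data.

```python
import pickle
from collections import Counter


def load_listening_histories(path):
    """Unpickle listening_histories.pkl (dict[user_label] -> list of Spotify URIs)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(histories):
    """Return (users, unique tracks, listening events), counting tracks by URI."""
    n_users = len(histories)
    n_events = sum(len(history) for history in histories.values())
    n_tracks = len({uri for history in histories.values() for uri in history})
    return n_users, n_tracks, n_events


def top_tracks(history, k=3):
    """Return the k most frequently played URIs of one history as (uri, plays) pairs."""
    return Counter(history).most_common(k)


if __name__ == "__main__":
    import os
    import tempfile

    # Toy stand-in for the real file: two users, URIs in listening order.
    toy = {
        0: ["spotify:track:a", "spotify:track:b", "spotify:track:a"],
        1: ["spotify:track:c"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "listening_histories.pkl")
        with open(path, "wb") as f:
            pickle.dump(toy, f)
        histories = load_listening_histories(path)

    print(summarize(histories))      # (2, 3, 4)
    print(top_tracks(histories[0]))  # [('spotify:track:a', 2), ('spotify:track:b', 1)]
```

With the real file, `load_listening_histories("listening_histories.pkl")` replaces the toy round-trip. Because a history keeps one entry per listening event, repeated URIs are repeated plays, which is why `summarize` counts events by list length and tracks by distinct URIs.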
@@ -39,9 +39,9 @@ Contains the audio [features](
 Each row represents a track identified by spotify_uri.
-The columns are:
+Columns are:
 - spotify_uri: URI of the track, as recognized by Spotify
-- spotify_popularity: popularity value from 0 to 100 (it varies over time, so updating it is suggested)
+- spotify_popularity
 - acousticness
 - danceability
 - duration_ms
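The name and on-disk format of this audio-features file are cut off in the diff, so the sketch below assumes it has already been loaded into a pandas DataFrame with one row per track and a `spotify_uri` column, and that each URI appears only once. It shows one hypothetical way to relate the table to a listening history from listening_histories.pkl through the shared Spotify URIs; `user_feature_means` is an illustrative helper, not the procedure used to build the precomputed profiles.

```python
import pandas as pd


def user_feature_means(history, features, columns):
    """Average selected audio features over one user's listening events.

    history  -- list of Spotify URIs, one entry per listening event
    features -- DataFrame with one row per track and a 'spotify_uri' column
    columns  -- feature columns to average, e.g. ["danceability", "acousticness"]
    """
    # Index the feature table by URI (assumes each URI appears only once).
    table = features.set_index("spotify_uri")[columns]
    # reindex keeps repeated URIs, so a track played twice is counted twice;
    # URIs with no feature row become NaN, which mean() skips by default.
    return table.reindex(history).mean()


if __name__ == "__main__":
    # Toy feature table and history; spotify:track:x has no feature row.
    features = pd.DataFrame(
        {
            "spotify_uri": ["spotify:track:a", "spotify:track:b"],
            "danceability": [0.2, 0.8],
            "acousticness": [0.5, 0.1],
        }
    )
    history = ["spotify:track:a", "spotify:track:a", "spotify:track:b", "spotify:track:x"]
    print(user_feature_means(history, features, ["danceability", "acousticness"]))
```

Averaging over listening events rather than distinct tracks weights a user's profile toward what they replay; deduplicating `history` first would give a per-track average instead.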
@@ -70,4 +70,11 @@ Columns are:
 - n_les: the original number of listening events of the user fetched from (until 25 November 2019)
 - n_tracks: the original number of tracks, considering the conjunction of track name, artist name, album name, MusicBrainz id as key.
-(see code as an example on how to load the data)
\ No newline at end of file
+(see code as an example on how to load the data)
+## precomputed_profiles_thr.pkl
+Contains the precomputed user profiles over different thresholds. The threshold ranges from 0 (no threshold enforced) to 100 in increments of 10.
+dict[threshold] -> user_profiles
+- user_profiles is a pandas.DataFrame. Each row is a user, while each column is a feature-statistic pair (e.g. energy_skewness)
\ No newline at end of file
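As a minimal sketch of working with precomputed_profiles_thr.pkl, assuming it unpickles to a dict keyed by the thresholds 0, 10, ..., 100 whose values are DataFrames with columns named `<feature>_<statistic>`, the helpers below split those names into a two-level column index and line up one column across thresholds. The helper names are hypothetical, and the split assumes statistic names contain no underscore (feature names such as `duration_ms` may).

```python
import pickle

import pandas as pd


def load_profiles(path):
    """Unpickle precomputed_profiles_thr.pkl (dict[threshold] -> DataFrame)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def split_feature_stat(profile):
    """Return a copy with (feature, statistic) MultiIndex columns.

    Assumes columns are named '<feature>_<statistic>' and that the statistic
    name has no underscore, so 'duration_ms_mean' -> ('duration_ms', 'mean').
    """
    out = profile.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(name.rsplit("_", 1)) for name in profile.columns],
        names=["feature", "statistic"],
    )
    return out


def column_across_thresholds(profiles, column):
    """Collect one feature-statistic column for every threshold (users x thresholds)."""
    return pd.DataFrame({thr: profiles[thr][column] for thr in sorted(profiles)})


if __name__ == "__main__":
    # Toy stand-in: 2 users, thresholds 0 and 10 only (the real file has 0, 10, ..., 100).
    cols = ["energy_mean", "energy_skewness", "duration_ms_mean"]
    profiles = {
        0: pd.DataFrame([[0.6, -0.3, 210000.0], [0.4, 0.2, 190000.0]], columns=cols),
        10: pd.DataFrame([[0.7, -0.1, 205000.0], [0.5, 0.1, 185000.0]], columns=cols),
    }
    print(split_feature_stat(profiles[0])["energy"])
    print(column_across_thresholds(profiles, "energy_skewness"))
```

With the real file, `profiles = load_profiles("precomputed_profiles_thr.pkl")` gives the dict, and `profiles[0]` is the table with no threshold enforced. Unpickling DataFrames can fail if the installed pandas version differs substantially from the one that wrote the file.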