Commit 1b551f48 authored by alessandro

updated readme

parent 3f87af17
### PERSONALITY CORRELATES DATASET ####
# Personality Correlates of Music Audio Preferences for Modelling Music Listeners
> Alessandro B. Melchiorre (alessandro.melchiorre@jku.at), Markus Schedl (markus.schedl@jku.at) \
> Johannes Kepler University Linz (JKU) and Linz Institute of Technology (LIT), AI Lab, Austria
> The dataset is based on the MyPersonality dataset (https://sites.google.com/michalkosinski.com/mypersonality), Last.fm data (https://www.last.fm/), and Spotify (https://www.spotify.com/) data.
> The dataset is based on the [MyPersonality](https://sites.google.com/michalkosinski.com/mypersonality) dataset, [Last.fm](https://www.last.fm/) data, and [Spotify](https://www.spotify.com/) data.
The code for the correlations is: Personality Correlates of Music Audio Preferences for Modelling Music Listeners.ipynb
This zip contains 4 files in total:
- listening_histories.json
- refined_listening_histories.json
In order to run the notebook, it is necessary to unzip the data.zip in this folder.
**Data** contains the following files:
- listening_histories.pkl
- spotify_features.csv
- users_info.csv
- correlation_computer.ipynb
- precomputed_profiles_thr.pkl
1) listening_histories.json
dictionary with the following structure:
The dataset examined contains:
- 1,475 users
- 1,544,996 unique tracks
- 34,738,390 listening events
user_label -> listening_events
- user_label is an integer (or integer string) from 0 to 1474 (1475 keys in total)
- listening_events is a list of the spotify_uris listened to by the specific user
N.B. Listening events and tracks are identified by Spotify URIs. The statistics reported in the paper, instead, identify these with a 4-entry tuple (track, artist, album, MusicBrainz id).
N.B. In this dataset, each listening event is defined only in terms of the spotify_uri, instead of the track name, artist name, album name, and MusicBrainz id. As a result, statistics computed on this dataset will differ from the ones reported in the paper.
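The three counts listed above (users, unique tracks, listening events) can be recomputed from the user_label -> listening_events mapping. The following is a minimal sketch, assuming the mapping is already loaded as a Python dict; the function name `dataset_statistics` and the toy URIs are illustrative and are not taken from the dataset or the notebook.

```python
def dataset_statistics(histories):
    """Return (n_users, n_unique_tracks, n_listening_events) for a histories dict."""
    n_users = len(histories)
    n_events = sum(len(events) for events in histories.values())
    unique_tracks = set()
    for events in histories.values():
        unique_tracks.update(events)
    return n_users, len(unique_tracks), n_events


# Toy data with made-up URIs, only to show the expected input shape.
toy_histories = {
    0: ["spotify:track:aaa", "spotify:track:bbb", "spotify:track:aaa"],
    1: ["spotify:track:bbb"],
}
stats = dataset_statistics(toy_histories)
print(stats)
```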
2) refined_listening_histories.json
Same structure as (1), but it includes only users with >= 30 listening events.
It contains 1350 users.
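The ">= 30 listening events" filter can be sketched as a dict comprehension. This is only an illustration of the rule stated above, not the code used to produce the file; `refine_histories`, `min_events`, and the toy data are hypothetical names and values.

```python
def refine_histories(histories, min_events=30):
    """Keep only users whose history has at least `min_events` listening events."""
    return {
        user: events
        for user, events in histories.items()
        if len(events) >= min_events
    }


# Toy data: user "0" has exactly 30 events and is kept, user "1" has 29 and is dropped.
toy_histories = {
    "0": ["spotify:track:aaa"] * 30,
    "1": ["spotify:track:bbb"] * 29,
}
refined = refine_histories(toy_histories)
print(sorted(refined))
```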
## listening_histories.pkl
Contains the listening histories of all the users in the dataset. For each user, the tracks in their listening history appear in the order in which they were listened to.
3) spotify_features.csv
csv file
Each row holds information for one spotify_uri.
Columns are:
**Format:**
dict[user_label] = listening_history
where:
- user_label is an integer from 0 to 1474
- listening_history is a list of Spotify URIs
(see code as an example on how to load the data)
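A minimal loading sketch, assuming the file is a plain pickled Python dict with the format described above. The helper name `load_listening_histories` and the toy file are illustrative; the round trip below writes a small stand-in file so that the example runs without the real data.

```python
import os
import pickle
import tempfile


def load_listening_histories(path):
    """Load the pickled dict. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for listening_histories.pkl, using made-up URIs.
toy = {0: ["spotify:track:aaa", "spotify:track:bbb"], 1: ["spotify:track:ccc"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_listening_histories.pkl")
    with open(toy_path, "wb") as handle:
        pickle.dump(toy, handle)
    loaded = load_listening_histories(toy_path)

# The list order is the listening order, so index 0 is the earliest event.
first_track_of_user_0 = loaded[0][0]
```

With the real data, the same call would presumably be `load_listening_histories("listening_histories.pkl")` after unzipping data.zip.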
## spotify_features.csv
Contains the audio [features](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/) of all tracks in the dataset.
**Format:**
Each row represents a track identified by spotify_uri.
The columns are:
- spotify_uri: URI of the track, as recognized by Spotify
- spotify_popularity: popularity value from 0 to 100 (it varies over time, hence it is suggested to update it)
- spotify_popularity: popularity value from 0 to 100 (it varies, so it is suggested to update it)
- acousticness
- danceability
- duration_ms
......@@ -43,11 +55,11 @@ Columns are:
- tempo
- time_signature
- valence
(see https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/ for description of these features)
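The per-track rows can be indexed by spotify_uri for quick lookup. The sketch below uses only the standard library and assumes a comma-separated file with a header row in which every column other than spotify_uri is numeric and non-empty; the helper name `load_spotify_features` is hypothetical, and the toy CSV contains only a subset of the listed columns with made-up values.

```python
import csv
import io


def load_spotify_features(file_obj):
    """Index rows by spotify_uri, converting every other column to float."""
    features = {}
    for row in csv.DictReader(file_obj):
        uri = row.pop("spotify_uri")
        features[uri] = {name: float(value) for name, value in row.items()}
    return features


# Toy stand-in for spotify_features.csv.
toy_csv = io.StringIO(
    "spotify_uri,spotify_popularity,danceability,tempo,valence\n"
    "spotify:track:aaa,55,0.71,120.0,0.42\n"
    "spotify:track:bbb,12,0.33,88.5,0.90\n"
)
features = load_spotify_features(toy_csv)
print(features["spotify:track:aaa"]["tempo"])
```

For the real file, the function could be given an open file handle, for example one created with `open("spotify_features.csv", newline="")`.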
4) users_info.csv
csv file
Each row holds information for one user, defined by label.
(see code as an example on how to load the data)
## users_info.csv
Each row represents a user identified by label.
Columns are:
- label: id of the user
- ope: openness
......@@ -58,5 +70,4 @@ Columns are:
- n_les: the original number of listening events of the user fetched from Last.fm (until 25 November 2019)
- n_tracks: the original number of tracks, considering the conjunction of track name, artist name, album name, MusicBrainz id as key.
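The user table can be read in the same way. This sketch assumes a comma-separated file with a header row and an integer label column; `load_users_info` is a hypothetical helper, and the toy rows use only the columns named above (label, ope, n_les, n_tracks) with invented values. Values are left as strings so that no assumption is made about the scale of the personality scores.

```python
import csv
import io


def load_users_info(file_obj):
    """Return {label: row_dict}; the label becomes an int, other values stay strings."""
    return {int(row["label"]): row for row in csv.DictReader(file_obj)}


# Toy stand-in for users_info.csv.
toy_csv = io.StringIO(
    "label,ope,n_les,n_tracks\n"
    "0,4.25,1200,800\n"
    "1,3.50,45,40\n"
)
users = load_users_info(toy_csv)
openness_of_user_0 = float(users[0]["ope"])
```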
5) correlation_computer.ipynb
Jupyter notebook of the code used for computing the correlations
\ No newline at end of file
(see code as an example on how to load the data)
\ No newline at end of file
File added