### PERSONALITY CORRELATES DATASET ####


> The dataset is based on the MyPersonality dataset (https://sites.google.com/michalkosinski.com/mypersonality), Last.fm data (https://www.last.fm/), and Spotify (https://www.spotify.com/) data.


This zip contains 5 files in total:
- listening_histories.json
- refined_listening_histories.json
- spotify_features.csv
- users_info.csv
- correlation_computer.ipynb

1) listening_histories.json
dictionary with the following structure:

    user_label -> listening_events
- user_label is an integer (or integer string) from 0 to 1474 (1475 keys in total)
- listening_events is a list of the spotify_uris listened to by that user

N.B. In this dataset, each listening event is identified only by its spotify_uri, rather than by the combination of track name, artist name, album name, and MusicBrainz id. As a result, statistics computed on this dataset will differ from the ones reported in the paper.
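
A minimal sketch of reading this file in Python, assuming it sits in the working directory and that a URI appears in a user's list once per listening event. The helper names (`parse_histories`, `load_histories`, `history_stats`) are illustrative and not part of the dataset:

```python
import json


def parse_histories(raw):
    """Convert JSON object keys (always strings) to integer user labels."""
    return {int(label): list(events) for label, events in raw.items()}


def load_histories(path):
    """Read a user_label -> listening_events mapping from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return parse_histories(json.load(fh))


def history_stats(events):
    """Return (number of listening events, number of distinct spotify_uris)."""
    return len(events), len(set(events))


# Example usage (assumed file location):
# histories = load_histories("listening_histories.json")
# n_events, n_distinct = history_stats(histories[0])
```

Because JSON object keys are always strings, normalising them to integers makes it easier to join the histories with the `label` column of users_info.csv.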

2) refined_listening_histories.json
Same structure as (1), but it includes only users with >= 30 listening events.
It contains 1350 users.
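
The filter described above could be reproduced roughly as follows. This is a sketch, not the authors' code: it assumes the threshold is applied to the length of each user's list in file (1), and `refine_histories` is a hypothetical helper name:

```python
def refine_histories(histories, min_events=30):
    """Keep only users whose history has at least `min_events` listening events."""
    return {
        label: events
        for label, events in histories.items()
        if len(events) >= min_events
    }


# Example usage, with `histories` being the dictionary loaded from file (1):
# refined = refine_histories(histories)
# len(refined) should be 1350 if the assumption about the threshold holds
```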

3) spotify_features.csv
csv file
Each row holds information for one spotify_uri.
Columns are:
- spotify_uri: URI of the track, as recognized by Spotify
- spotify_popularity: popularity value from 0 to 100 (it varies over time, so refreshing it is suggested)
- acousticness
- danceability
- duration_ms
- energy
- instrumentalness
- key
- liveness
- loudness
- mode
- speechiness
- tempo
- time_signature
- valence 
(see https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/ for a description of these features)
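
A possible way to load this table with the standard library, sketched under the assumptions that the file has a header row and that every column other than `spotify_uri` holds a number. `read_features` is a hypothetical helper; it reads whatever column names the header contains, so it does not depend on their exact spelling:

```python
import csv
import io  # only needed when parsing from an in-memory string, e.g. io.StringIO


def read_features(fh):
    """Map each spotify_uri to a dict of its numeric features.

    `fh` is an open text file (or any iterable of CSV lines).
    Empty cells are skipped rather than converted.
    """
    features = {}
    for row in csv.DictReader(fh):
        uri = row.pop("spotify_uri")
        features[uri] = {
            name: float(value)
            for name, value in row.items()
            if value not in ("", None)
        }
    return features


# Example usage (assumed file location):
# with open("spotify_features.csv", newline="", encoding="utf-8") as fh:
#     features = read_features(fh)
```

Keying the result by `spotify_uri` allows a direct lookup for each entry in a user's listening history.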

4) users_info.csv
csv file
Each row holds information for one user, defined by label.
Columns are:
- label: id of the user
- ope: openness
- con: conscientiousness
- ext: extraversion
- agr: agreeableness
- neu: neuroticism
- n_les: the original number of the user's listening events fetched from Last.fm (up to 25 November 2019)
- n_tracks: the original number of tracks, using the combination of track name, artist name, album name, and MusicBrainz id as the key.
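
As an illustration of how these columns can be used, the sketch below reads the table and computes a Pearson correlation between two of its columns. It is not taken from the notebook, which may use a different coefficient or procedure; `read_users`, `pearson`, and `column_correlation` are hypothetical names, and all columns other than `label` are assumed to be numeric:

```python
import csv
import io  # only needed when parsing from an in-memory string, e.g. io.StringIO
import math


def read_users(fh):
    """Map each integer user label to a dict of its numeric columns."""
    users = {}
    for row in csv.DictReader(fh):
        label = int(row.pop("label"))
        users[label] = {name: float(value) for name, value in row.items()}
    return users


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def column_correlation(users, col_a, col_b):
    """Correlate two columns of the users table across all users."""
    labels = sorted(users)
    return pearson(
        [users[u][col_a] for u in labels],
        [users[u][col_b] for u in labels],
    )


# Example usage (assumed file location):
# with open("users_info.csv", newline="", encoding="utf-8") as fh:
#     users = read_users(fh)
# r = column_correlation(users, "ope", "n_les")
```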

5) correlation_computer.ipynb
Jupyter notebook with the code used to compute the correlations