week #9
This week I continued my D3 explorations. Last week I basically followed a tutorial and made some adjustments, so that went fairly smoothly. This week my biggest challenge was getting started: everything I could think of felt either not interesting enough or too complex to achieve (and I don't know enough about D3 yet to even know what's possible). I like dealing with topics that affect how people make up their minds, which is kind of broad, but I figured I could do something with the NYT API. So I signed up and started pulling all kinds of data.
The problem was that it was all textual data, while D3 mostly works with numerical data. I would need to count something, or compare things, ideally over a long(er) period of time for the data to be meaningful. I might get back to this at some point, but for now I decided to pivot because I'm still just learning the ropes of D3.
One of the first ideas I had when I wanted to learn D3 was looking at the frequency of keyboard characters in all kinds of things (from passwords to baby names to stock market acronyms). I decided to give it a try and looked for databases of compromised passwords that I could use for analysis. I found an overwhelming number of such databases. I started with a short list just to get going but ended up using a 100,000-line list. The idea was to show the keyboard's keys on the screen and color them differently to represent how often each character occurs in a given dataset. I started by making one simple greyscale keyboard where each key gets darker the more it's used in passwords. I then added a second keyboard, visualizing how often each character is used as the first character of a password. My main goal was to set it up in a way that is scalable and that would easily let me visualize different character occurrences.
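In essence, the counting and coloring boils down to something like this (a simplified sketch, not my exact code; `passwordText`, the `rect.key` selector, and the bound `char` property are just stand-ins for how my actual setup is structured):

```js
// Rough sketch: assumes the password list is loaded as one string with one
// password per line, and that each keyboard key is an SVG rect with class
// "key" whose bound datum has a `char` property.
const counts = {};
passwordText.split("\n").forEach(pw => {
  for (const ch of pw.toLowerCase()) {
    counts[ch] = (counts[ch] || 0) + 1;
  }
});

// The more often a character shows up, the darker its key gets.
const maxCount = d3.max(Object.values(counts));
const shade = d3.scaleLinear()
  .domain([0, maxCount])
  .range(["#ffffff", "#000000"]);

d3.selectAll("rect.key")
  .attr("fill", d => shade(counts[d.char] || 0));
```

Keeping the counting separate from the coloring is what should make it easy to swap in a different dataset (first letters only, baby names, and so on) without touching the keyboard itself.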
The next thing I would like to do is figure out a better way to scale the data. Most keys don't get used very often, but a few keys get used a lot, and visualizing this is quite difficult because the gap between the two extremes washes out most of the keys. Last week I used a quantized scale because that made sense with the data, but I'm not sure it makes sense here. A logarithmic scale didn't work either, though I'm not sure whether that's because I did something wrong or because it simply isn't a good fit. I'd like to read more and understand these things better so I can make informed decisions about visualizing the data in a meaningful way. That's where I'll start next week.
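For reference, these are the kinds of scales I want to compare when I pick this up (just a sketch; `maxCount` stands for the highest character count from the counting step, and the exact colors don't matter):

```js
// Quantized: buckets the counts into a fixed set of shades (what I used last week).
const quantized = d3.scaleQuantize()
  .domain([0, maxCount])
  .range(["#f7f7f7", "#cccccc", "#969696", "#636363", "#252525"]);

// Logarithmic: can't include 0 in its domain, which might be where I went wrong.
const logarithmic = d3.scaleLog()
  .domain([1, maxCount])
  .range(["#ffffff", "#000000"]);

// Square root: a power scale that's often suggested for spreading out skewed data.
const sqrt = d3.scaleSqrt()
  .domain([0, maxCount])
  .range(["#ffffff", "#000000"]);
```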
I'll try to have very open-ended questions that I can use to infer whether my goals are being met or not, like: