
Jasmine Nackash is a multidisciplinary designer and developer interested in creating unique and innovative experiences.

Week #02

Traceroute analysis

Quick note — I've found it more helpful to write my blog posts as I'm doing the work and not after. Kindly take this into consideration while reading :-)

Initial probing

I started by just running the traceroute command on a bunch of URLs I visit often to see what I get. I noticed that after a few hops it almost always starts outputting asterisks — in class it was mentioned that asterisks mean there's a problem, trouble reaching the next hop. I thought this would only happen if the address is actually down, but the addresses I tried were very much alive when accessed through the browser. After reading some more I learned that there could be a number of reasons for the traceroute command to output asterisks, namely:

  1. Many routers or network providers are configured to either ignore or deprioritize packets sent by the traceroute command so as to reduce network load.
  2. Security measures — a firewall may be configured to block the responses from being returned, too.
  3. There might be a rate limit or a timeout, or it might mean the destination has been reached? From my understanding, traceroute sends a small query it expects an answer to, with a progressively increasing TTL (time-to-live) value — this number is how many nodes the packet may pass through on its way to the destination. The first probe has a TTL of 1, so it returns almost immediately, and so on. The TTL increases with every hop until the destination is reached — at which point the destination server might respond differently or not at all, hence the asterisks(?)
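That per-hop behavior can be tweaked from the command line. A small sketch — the flags below are shared by the BSD/macOS and GNU versions of traceroute, and `example.com` stands in for whatever site you're probing:

```shell
# One probe per hop (-q 1), at most 20 hops (-m 20), 2-second timeout (-w 2).
# Fewer probes and a shorter timeout make the asterisk-heavy runs finish faster.
HOST=example.com
traceroute -q 1 -m 20 -w 2 "$HOST"

# Some routers that drop UDP probes will still answer ICMP echo probes
# (this variant needs sudo on macOS):
sudo traceroute -I "$HOST"
```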

I used the Traceroute Mapper tool to visualize some routes on a map:

It seemed like all routes always go through two nodes: one in Hicksville and one in Rhinebeck. I know Optimum (my ISP at home) has facilities in Hicksville, and I guess Rhinebeck is another central node in Optimum's network infrastructure, from which there might be a direct path to the destination? I also realized that the numbers correspond to the hops (the max number does match the number of entries in the printed route), but I'm not sure why I'm not seeing all the points in between.

Put a few routes into a spreadsheet to try and get a better look at what's going on.

I realized the first two hops are basically me (my router and inner network?) — they remained the same with every traceroute. The next few nodes are mostly the same for all routes and lead to my ISP. Some lead to Hicksville and some to Rhinebeck (some hops included multiple IPs, and I learned it's probably Load Balancing — "...a technique used to distribute network traffic across multiple routers or network interfaces to ensure that no single path becomes overloaded"). Anyway, the yellow cells are the first nodes per route that go somewhere "outside" — out of my ISP's network:

  • Github goes to a different ISP (Arelion) and eventually does reach some IP that is indicative of github (hence the green-colored cell).
  • Google goes to... Google
  • itp.nyu goes to Level 3 (another ISP, a pretty big one — mostly used by big players like universities and banks as we've read about in Networks of New York).
  • All others ended while still being in the Optimum / Cablevision network, which I assume means whatever's next has blocked the traceroute packets...

Browsing History

Then I was interested in getting my browsing history. First I had to get that file — I googled and asked ChatGPT and got general instructions. I'm not using Chrome, but by following a similar path I located the history file for my browser (Arc, which is Chromium-based too).
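For anyone retracing this: Chromium-based browsers keep history in an SQLite file named `History` inside the profile directory. The Arc path below is an assumption based on the usual Chromium layout on macOS — adjust for your own browser and profile:

```shell
# Assumed location of Arc's history database on macOS (Chromium layout)
HISTORY_SRC="$HOME/Library/Application Support/Arc/User Data/Default/History"

# Work on a copy: the live file is locked while the browser is running
cp "$HISTORY_SRC" /tmp/history-copy.db
```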

I tried opening that file, but it turned out to be quite heavy (108MB) and illegible for the most part. The file was created on January 27, 2023, so it has been collecting my history for quite a while (maybe in the future I could get this file from every browser I've used in the past 10 years and see trends and differences in my browsing habits over time?).

Opening the history file in VSCode resulted in this frightening thing. Fortunately someone made the effort to mention the format in text, so after some googling I was able to install and use the SQLite3 command-line tool, which should allow me to read the file.

I then looked up ways to do that with the SQLite library and managed to print my top 50 most visited sites, which... did not make much sense to me. Some of the entries — like Gmail, YouTube, and clients' websites — make total sense. But for a lot of others, like some spreadsheets for students to sign up for office hours with me, or some address on Google Maps — there's no way I visited these URLs so many times! This might be an Arc (my browser) thing: it may keep some favorites accessible at all times, which could explain why it loads certain URLs constantly even if I never requested to open them.
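The query behind that list is short, since Chromium's schema keeps a cumulative `visit_count` per URL in a `urls` table. A sketch, assuming you're working on a copy of the History file at `/tmp/history-copy.db`:

```shell
# Top 50 most visited URLs, by Chromium's own cumulative counter
sqlite3 /tmp/history-copy.db \
  "SELECT visit_count, url FROM urls ORDER BY visit_count DESC LIMIT 50;"
```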

This feels almost too private to share... And apparently my ITP admission doc is pretty high up there...

I was wondering what other information I could get and asked ChatGPT for help with the commands. It helped me get a list of visits per hour, from which I made a chart, because why not.

This also didn't feel quite right. After reading a little I found out that all timestamps are in UTC (+4 hours from NY, or -3 hours from TLV — the main locations I've been browsing from), and that to get accurate data I would need to do some conversions...
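The conversion is doable inside SQLite itself. Chromium records `visit_time` as microseconds since January 1, 1601 (UTC); subtracting 11644473600 seconds shifts that to the Unix epoch, and the `'localtime'` modifier then applies your machine's timezone. A sketch of the visits-per-hour query, again against a copy at `/tmp/history-copy.db`:

```shell
# Visits bucketed by local hour of day.
# visit_time/1000000 - 11644473600 converts Chromium's epoch to the Unix epoch.
sqlite3 /tmp/history-copy.db "
  SELECT strftime('%H', datetime(visit_time/1000000 - 11644473600, 'unixepoch', 'localtime')) AS hour,
         COUNT(*) AS visits
  FROM visits
  GROUP BY hour
  ORDER BY hour;"
```

Note this applies the timezone your machine is set to now, so browsing done from TLV would still be shifted to NY hours (or vice versa).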

I came across visit_duration, which shows how long a web page was open in my browser. This is not the time I was actually engaged with it, though. The whole history file covers 594 days — out of which Gmail was open for a cumulative 427 days (~72%), which is honestly kind of wild. I guess Gmail was open for pretty much the whole time I was awake during the last 594 days.
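The Gmail number falls out of summing `visit_duration` per site. A sketch, assuming (as with `visit_time`) that the durations are stored in microseconds, and joining `visits` back to `urls` to get readable addresses:

```shell
# Total time each site was open, in days (visit_duration is in microseconds)
sqlite3 /tmp/history-copy.db "
  SELECT ROUND(SUM(visits.visit_duration) / 1000000.0 / 86400, 1) AS days_open,
         urls.url
  FROM visits JOIN urls ON visits.url = urls.id
  GROUP BY urls.url
  ORDER BY days_open DESC
  LIMIT 10;"
```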

I wanted to make a map of "my network" by just layering screenshots from the Traceroute Mapper on top of each other, but the scale was too large to see anything useful, so I created a Google Maps list instead. As I was looking through the places on Maps, most of the locations seemed like commercial buildings, and I get why the companies are not listed on the map, but one location was just this house in Jersey... And I doubt the route to Facebook goes through a hidden server farm there — the whole area seems entirely residential...

Final thoughts (mostly questions)

  • Well, obviously I'm wondering how accurate all these lookup methods are. I mean, the IP addresses should be correct (not taking spoofing into account), but the locations don't always seem to make sense...
  • And what significance is there to knowing a web address's ISP and AS? Besides debugging when something stops working, I mean.
  • If I were to have my own server, that I host myself, I would still need to connect at some point to a bigger ISP, right? But at what point? Would traceroute on my website's address lead to me? Or to my ISP?
  • I thought it was interesting to see that 7 out of my top 10 visited websites are a google doc. And another one is Gmail. I wonder how similar or different others' lists are. I would never have guessed it would be this way (I would have definitely guessed Maps but it's not on the list!). It feels kind of dystopian thinking how much of my time is spent in front of Google's interfaces...

Seen in the wild