The Three Sexy Skills of Data Geeks
i’ve only worked with fairly small data sets thus far (nothing requiring regular expressions or multiple machines), but the post is pretty dead on.
i’d like to note that the second skill in particular, data “munging”, is no joke. when i did the sex charts, the craziest part was filtering the data over and over again. since it’s hard to compile a consistent table from many unrelated people describing their entire sexual histories, it took a lot of elbow grease, which meant doing it by hand with pen and paper.
it went something like this:
- receive a filled-out survey with a tabular list of experiences + associated data.
- sort them chronologically
- sort them by year > month
- sort them by overlaps within each individual month (when multiple people overlapped)
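the sorting steps above were done on paper, but here is a minimal python sketch of the same idea, assuming each survey row can be reduced to a partner label plus a start and end (year, month). the `Experience` class, its fields, and both functions are hypothetical illustrations, not the format the actual survey used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Experience:
    # Hypothetical shape of one survey row (not the real survey format).
    partner: str
    start: tuple[int, int]  # (year, month), inclusive
    end: tuple[int, int]    # (year, month), inclusive


def months_between(start, end):
    """Yield every (year, month) from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def sort_and_find_overlaps(rows):
    # Sort chronologically: tuples compare by year first, then month.
    ordered = sorted(rows, key=lambda r: (r.start, r.end))
    # For each month, collect everyone who was active during it.
    by_month = defaultdict(list)
    for r in ordered:
        for ym in months_between(r.start, r.end):
            by_month[ym].append(r.partner)
    # Keep only the months where more than one person overlapped.
    overlaps = {ym: names for ym, names in sorted(by_month.items()) if len(names) > 1}
    return ordered, overlaps
```

sorting on (year, month) tuples gives the "year > month" ordering for free, and expanding each row into the months it covers makes the per-month overlap check a simple count.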
(link courtesy of atmos)
Uhhh, munging is 50/50 for me. I’ve recently been working on a tool that lets you “scrape” Google & Amazon for metainfo on books, and getting clean data is the hardest part, as there are so many potential formats for everything. Finding the right hook tags in the page to get the end result is even more annoying. Sometimes there are no unique tags or strings and you have to base it off length, which, being variable, is a pain.
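The hook-tag-or-length approach described above can be sketched roughly like this, using only Python's standard `html.parser`. This is not the actual tool: `extract_field`, the `hook_class` parameter, and the `min_len`/`max_len` bounds are hypothetical, it works on an HTML string you already have rather than fetching Google or Amazon pages, and it only looks at the class of the element directly enclosing each piece of text.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class TextCollector(HTMLParser):
    """Collects (enclosing class attribute, text) pairs for each text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class attribute (or None) of each open element
        self.chunks = []  # (class or None, stripped text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((self.stack[-1] if self.stack else None, text))


def extract_field(html, hook_class, min_len, max_len):
    """Hypothetical helper: prefer a hook class, else fall back to text length."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Preferred path: text inside an element that carries the hook class.
    for cls, text in parser.chunks:
        if cls and hook_class in cls.split():
            return text
    # Fallback: no unique hook, so guess by the expected length range.
    for _, text in parser.chunks:
        if min_len <= len(text) <= max_len:
            return text
    return None
```

For example, `extract_field('<div><span class="title">Dune</span><span>Frank Herbert</span></div>', "title", 3, 40)` finds the hook and returns `"Dune"`, while a page without the hook falls through to the first text whose length lands in the range. As the comment says, that fallback is fragile precisely because lengths vary.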
breefield reblogged this from nikography and added:
nikography reblogged this from atmos and added:
atmos posted this