What I wish I had known

I recently wrote a blog post for the Britt Anderson Group about personal development in academia and what one could do with a psychology degree and coding skills.

I took two courses with Dr. Britt Anderson1, An Introduction to Methods in Computational Neuroscience and Human Neuroanatomy and Neuropathology, both of which are among the very best formal education experiences I’ve ever had2. I was also lucky enough to publish a paper with him. Very few people have had as much of an impact and influence on my work, personality, and intellect as Dr. Anderson has. I’m fortunate to have met him during my academic career, and was honored to write a post for his lab’s site. He was my mentor during graduate school without even knowing it.

The post is titled “What I Wish I Had Known: Advice from a Former Psychology Graduate Student” and you can read it in full here.

That said, I wanted to quote the first and second-to-last paragraphs here, because they fit Take no one’s word for it pretty nicely.

I wanted to write about advice I would give my past self while completing my psychology degrees, but before that, I want to begin with a caveat: each person’s journey is different, and the more you can bring yourself to not feel the pressure to follow any person’s particular advice or prescribed steps, the easier it might be to find your path to what comes next. That was hard for me at first, but you get better at it the more you try.

[…]

This post included a lot of dry thoughts and recommendations, so let me end on a different note: the best thing you can do for yourself, no matter your goals or interests, is to realize that you can learn anything and get really good at whatever you set your mind to. It’s never too late, and the lack of formal training is no deal breaker, and might in fact make things easier. The hardest part is to start, and once you do, the second hardest thing is to keep a schedule of learning and practice.

[…]

  1. MD. A real doctor. 

  2. In the first I implemented the Hodgkin-Huxley model of the neuron in C and Excel (yeah…), and in the second I got to handle and examine brains, skulls, and human cadavers donated to scientific research. That last one was a profound experience. 

Nvim-R for people who love R, vim, and keyboards

For love or money, I write R code almost every day.

Back when most of the R code I wrote was for personal projects or academic research, I worked in RStudio. That’s when I wrote R for love.1

Once I started writing more R code for money, I wanted to find alternatives to RStudio, and unfortunately, there aren’t any. RStudio is without competition on the GUI side of the market.

I don’t remember how I found Nvim-R, but I did and now I can’t imagine working without it. It has become my favorite and default way to write R, no matter the size or the complexity of the script. I use Nvim-R with neovim, but it works with vim too.

Nvim-R screenshot

Install it with your package manager of choice; I use Vundle:

Plugin 'jalvesaq/Nvim-R'
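
If you happen to use vim-plug rather than Vundle (an assumption about your setup on my part; any plugin manager that installs from GitHub will do), the equivalent declaration is:

Plug 'jalvesaq/Nvim-R'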

Once you have a .R file in the buffer, you can invoke Nvim-R with the default shortcut <LocalLeader>rf. Nvim-R will open a vim shell window that runs the R console. The configuration I ended up with – see screenshot above and vim configuration code below – puts the source editor in the top left, the object browser in the top right, and the console in the bottom. You can set it up so that hitting space bar in normal mode sends lines from the source editor to be executed in the console. Want to send many lines? This function also works with highlighted lines in visual mode.

Speaking of the object browser: Nvim-R’s object browser is similar to RStudio’s. It shows you the R objects in the environment and updates itself as objects are created and removed. Except I like Nvim-R’s object browser better. It assigns different colors to objects of different types, and it will, by default, expand the dataframes to list their columns underneath them (you can turn that off so that you expand the dataframes you want by hitting Enter on their entry).

I rarely read vim plugin documentation because the features and options I need are often general enough to be described in the READMEs, but if you’re considering using Nvim-R, I highly recommend reading its help/docs. They’re well-written and they’ll help you customize your development environment to your exact specifications: :help Nvim-R.

These are my settings, and they only give a taste of how customizable Nvim-R is:

" press -- to have Nvim-R insert the assignment operator: <-
let R_assign_map = "--"

" set a minimum source editor width
let R_min_editor_width = 80

" make sure the console is at the bottom by making it really wide
let R_rconsole_width = 1000

" show arguments for functions during omnicompletion
let R_show_args = 1

" Don't expand a dataframe to show columns by default
let R_objbr_opendf = 0

" Press the space bar to send lines and selection to R console
vmap <Space> <Plug>RDSendSelection
nmap <Space> <Plug>RDSendLine
  1. Academia doesn’t pay much. See this for more on the subject. 

Recently
Victrola Coffee Roasters, Seattle

Code

I made some good improvements to my journaling script, caplog. So far, like its half-sibling t, this command-line utility seems to be sticking, which makes me happy because I don’t have to worry about anyone acquiring it and forcing me to find another home for all my entries. It’s simple and it delights me.

Reading

Articles

Books

Academia's troubles

Coast

Someone asked me whether studying cognitive science changed my view of humanity.1

I had to think about it for a while, and my answer — which was only mildly surprising to me — was that instead of changing my views about humanity, it mostly changed my views about science and research. Specifically, that it made me a lot more skeptical whenever anyone claims that “science”, “research”, or studies have shown anything, say, about humanity.

I left academia for two reasons. The first is that academia2 is extremely competitive; the jobs are few, the applicants are many3, and I didn’t want it as much as I saw other people did. The second is that soon after I started graduate school I grew very disillusioned with the field, its practices, and its incentives. Eventually I became so cynical about a career that I hadn’t even started yet that I knew I should never try to start it in the first place.


Academia has a serious problem: it runs on a system of incentives that rewards bad scientists and pushes good scientists out. At the risk of weirding you out by commenting on my own writing, that is a remarkable statement. I just told you that academia rewards bad scientists.

Most of this is not news. Every once in a while the BBC or The Economist will talk about the replication crisis and bad incentives in science, or the file-drawer problem, or p-hacking. But I think the enormity of the problem escapes the majority of people.

Here’s what I think you should do. Find a slow weekend morning or afternoon, make yourself a pot of coffee, and spend an hour or two reading Retraction Watch and Andrew Gelman’s site. I used to subscribe to Retraction Watch’s RSS feed but ended up unsubscribing because it was too prolific and I couldn’t keep up. Things are so busy over there that they publish a weekly weekend reads post that you could not possibly finish reading in a weekend unless you had absolutely no other plans.


Here is the life cycle of an academic. There is very little variation in this cycle:

Undergraduate degree (do a thesis or something)
➝ Grad school, do a Master’s
➝ More grad school, do a PhD
➝ Almost definitely a Post Doctoral fellowship4

You remain in the Post Doc holding pattern until you find a job in a university or college. The dream is to land a tenure track position in a research-intensive institution. The reality is people are increasingly taking lesser and lesser positions because the demand for those dream tenure track jobs far outpaces the supply.

Landing a job, especially a good one (the tenure track and get-to-do-some-research-and-not-just-teach kind), is now determined by one factor and one factor only: publications.5 Publish or perish is not a joke, it is the law of the land. Most important is your past publication count and the likelihood that you will be as productive if not more so in the future. Those are also second and third most important. Fourth most important is the prestige of the journals where you get published. Are you publishing in Psych Science or in Frontiers? Makes some difference. I don’t know if anyone actually reads your papers or whether the quality of your writing, methodology, or, you know, science, factors heavily.

Here is a list of things that, again, with possibly very few exceptions, mean absolutely nothing for your prospects at getting a job:

  1. You champion open science.
  2. You write blog posts about experiments that haven’t worked out, or interesting statistical issues or practices.
  3. You contribute to open source projects, or statistical or data visualization packages.
  4. You are an active mentor and are generous with your time with students or peers.

If there isn’t a publication coming out of it, it doesn’t matter.

It’s not hard to see what a system like that does to the quality of the scientific process. Science is misunderstood by many. Science is not certain, experiments require care, results are not guaranteed, and theories are sooner or later wrong. It takes time to do something right, and you can still end up with a null result; and no amount of hard work could have made a positive finding more likely. But you want a job, so you will do everything you can to end up with a positive result anyway. You will choose topics that are in fashion, you will try to choose easy experiments that can be published regardless of the result6, and yes, you will operate under the constant pressure to make your t-test or ANOVA give you a p-value less than .05, and you might justify bending the rules of statistics to get it.

The better researchers, the ones who prioritize good theory and well-designed experiments and analyses, are at a disadvantage. They will not try to publish shoddy studies and will not squeeze a result where none exists. They will want to run the experiments that will help the field choose between competing theories and make progress, instead of running the experiment that will produce the 35th uninformative but curious interaction and publishing that. And for those ideals, they will pay.

I am not a very social person and I never created a big network of academics when I was a student, and yet I personally know several incredible researchers who ejected from academia as they saw their academic career prospects shrink and ended up in industry, where they are valued and get paid way more for their skills than in the field where they would prefer to be. What a tragedy.


In case you decide to not visit Andrew Gelman’s site – your loss, really – I’ve plucked out an example for you.

The absolute minimum background information you need to know is that psychology, especially social psychology, has been going through a replication crisis in which many popular and thought-to-be-bulletproof findings like ego depletion and power pose do not replicate.

The news and internet have not been kind to psychology during this tumultuous time, nor should they have been.

Susan Fiske, a social psychologist and past president of the Association for Psychological Science, wrote an article titled Mob Rule or Wisdom of Crowds? (PDF download), one of the most ill-advised opinion pieces I’ve ever seen from an academic. In it, she – sigh, there is no other way to say this – rants and rails against “online vigilantes” and “self-appointed data police” “volunteering critiques of such personal ferocity and relentless frequency that they resemble a denial-of-service attack that crashed a website by sheer volume of traffic”.

I don’t know what the “website” is supposed to be here, but the idea that academics are suffering a denial-of-service attack from online critics is laughable. I know of few other institutions that live in their own protected bubble the way the psychology department does. The whole article is shameful, and I am embarrassed on her behalf.

Even though this article was invited by the APS’s Observer, it seems the reaction was so negative that it was never published, and you might find it tricky to find a copy online. The link above gives you the PDF hosted on my own site, plus alternate link 1 and alternate link 2.

The Observer posted an unsurprisingly spineless comment on the issue, including this amazing final paragraph:

Those wishing to share their opinions on this particular matter are invited to submit comments in the space below. Alternatively, letters can be sent to apsobserver@psychologicalscience.org. We ask that your comments include your full name and affiliation.

Yes, please include your affiliation, lest we forget for a moment to judge the value and validity of your comment according to the authority of whether you’re a professor or just a normie.

In one of Andrew Gelman’s comments on the issue, aptly titled “What has happened down here is the winds have changed”, he writes:

In her article that was my excuse to write this long post, Fiske expresses concerns for the careers of her friends, careers that may have been damaged by public airing of their research mistakes. Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.

I couldn’t have said it better.


Back to the original question (which I will rephrase to maintain the flow of the story I’m telling here): how has studying cognitive psychology changed my view on things?

For one, I find myself in the uncomfortable position of seeming like a contrarian to two opposing groups: authoritarian conspiracy theorists who assume all scientists are malicious liars with an agenda7, and the intellectual, educated but non-scientific class that has confused the platonic concept of “science” with the practice of scientific inquiry, and therefore defends anything with the smell of science as unquestionable and unassailable. It’s much easier, in fact, to deal with the first group. It’s the second one that depresses me.

Call them what you like: the intellectuals, the elite, the educated class, whatever. I’m not necessarily a fan of any of those terms. They are relatively affluent, internet-savvy people who, in their attempt to fight back against anti-reason trends in the West (often socially conservative right-wing groups, although by no means all), have grossly overcorrected and now defend any output of human research as sacrosanct truth beyond reproach.

I won’t mince words: this is worship of authority. People confuse the day-to-day practice with the scientific method itself, and treat the results of that practice as though it were perfect and its output guaranteed by the sanctity of the lab coat and the scatterplot.

In another response to the Susan Fiske article titled “Weapons of math destruction”, NeuroAnaTody writes:

Science is moving forward so quickly that I don’t even think it’s necessary to point out ways in which the article is wrong. I will instead list some elements of the scientific revolution that trouble me, even though I consider myself a proud (if quite junior) member of the data police.

  1. Belief in published results. I have so little of it left.
  2. Belief in the role of empirical research. Getting to otherwise hidden truths was our thing, the critical point of departure from philosophy.
  3. Belief in the scientific method. I was taught there is such a thing. Now it seems every subfield would have been better off developing its own methods, fitted to its own questions and data.
  4. Belief in statistics. I was taught this is the way to impartial truths. Now I’m a p-value skeptic.
  5. Belief in the academic system. It incentivizes competition in prolifically creating polished narratives out of messy data.

Emphasis mine, because point 1 is the headline for me. Belief in published results, I have so little of it left. That is how studying cognitive psychology has changed my view on things. Whenever I hear of a study that “showed something”, and especially if it’s in the field of psychology, my assumption is that it’s spurious.

So what am I telling you? Science is permanently broken and we are left rudderless in a sea of claims and counter-claims?

No, that would be confusing the practice of science with the scientific process as it should be, the same mistake I think the study-worshipper makes. The scientific method is still the best way we have to approach the truth about the world, we just need to set up the incentives to encourage following it better.

In the meantime, I think you should be extremely skeptical of everything you hear, which is an uncomfortable position but, at this point, not an optional one. The “study” goes through many stages on its way from the moment it touched the truth and turned it into data to the moment it reaches your eyes. It has gone through an experimental design (done by a human), data collection (done by a different human), analysis (possibly done by a third different human), write-up (one or more humans), review (I will stop mentioning that things are done by humans), and interpretation by a journalist or reader.

In all of these stages, a human makes a judgement call to the best of their abilities, and like all other humans, they operate under pressures and incentives. Speaking of incentives, I am also telling you that academia is not the world of enlightened philosopher kings and queens operating outside the realm of dirty wants and desires the rest of us live in. Academics operate within a terrible, broken system of incentives, and you must keep that in mind whenever you’re consuming their research.

The other message I want to leave you with is that academia is broken and I don’t see it being fixed any time soon. It won’t be fixed until academics are evaluated based on more than their number of publications. It won’t be fixed until hiring committees stop looking at how many papers you’ve published and start looking at the quality of your contribution to knowledge. Yes, it’s much harder to decide whether you created knowledge and contributed to theory than it is to look at your impact factor, but that’s what has to happen. That’s it.

  1. Someone else pointed out that it would be difficult for me to answer that question, because I haven’t lived an alternative life in which I didn’t study cognitive science, so I can’t know what my view of humanity would be in that world. But the question is still valid because I can compare to before I started studying cognitive science, or reflect on how my views changed during the study. 

  2. My personal academic experience is in psychology, but the points I make in this post generalize to all disciplines as far as I know. 

  3. Many. The ratio is quite bad. You shouldn’t necessarily trust the numbers from any of those articles, but you can conclude that the picture is bleak even for those in the crème de la crème of their field, and hopeless for everyone else. 

  4. A Post Doc is not a student. They are an employee, often an independent researcher in a lab, and they get paid way too little. 

  5. There might be exceptions to this, but they are exceptional exceptions. 

  6. I have personally received this piece of advice, explicitly, more times than I can count. 

  7. I think scientists do have an agenda that we ought to acknowledge more. They are, after all, people. 

Recently: New World

Recently I changed jobs, cities, and countries. Everything is new.

Code

I write a lot more lines of code per day than I used to, and it’s never been less public. That’s one of the many new things I’m getting used to. I’m having a lot of fun, and I’m trying to figure out how to bring some of that back here.

Reading

Articles

Books

'Take no one's word for it' in 2016

“has it been a year already!?”

I hear James Stacey say into my ears as I stare at this draft wondering how to start.1

Unlike with most previous years, I did not have this experience with 2016. This has been a good year for me; I did a lot, and made plenty of progress personally and professionally. My subjective experience is not that it went by too fast, but that the passage of time feels just right.

The knock on new year’s resolutions is that they encourage you to wait until a seemingly arbitrary moment in time before you make a big change or do something to make your life better. Another knock is that this encourages you to attempt large changes instead of piecemeal changes, which increases the amount of discipline required for success, and therefore increases the chances of failure. Larger changes would happen less frequently, and that makes error-correction harder.

I think there is truth in there, but as with a lot of things people criticize today, the criticism loses a lot of nuance or selectivity and becomes absolute. You shouldn’t wait until new year’s to make your life better, but setting checkpoints for retrospectives and projections at regular intervals is useful. New year’s is arbitrary, but no more arbitrary than any other time or date if you don’t have better reasons for them. Just make sure you’re not using it as an excuse to procrastinate.

Personally I think an annual cycle is too infrequent for most stock-taking and revising stops. You can start a cycle on January 1st, but make it triannual or quarterly. Or choose your own date if January 1st is too problematic for you.

Last year I said I wanted to write more, and that’s the closest I’ve come to making a “resolution”. I like writing for what it helps me learn and get better at, including writing itself, and quantity should only increase when it’s a means, not an end.

I’m happy with how 2016 turned out for Take no one’s word for it, and am tickled pink to share the visualizations for the year.

  • Posts by month and year
  • Total posts by year
  • Words by month and year
  • Total words by year

Future

I’m moving to a new country and starting in a new research scientist role in 2017, and one way or another I think that will affect my writing here. What I hope will happen is that I’ll be able to write more about science and data as I learn more things faster in my new position.

I’m excited.

  1. I don’t usually listen to podcasts when I write, but I wanted to get myself into a certain mindset. 

tmux workspace scripts

tmux describes itself as a “terminal multiplexer”. Pleasantly, it goes on to explain what that means:

It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal. And do a lot more.

The way I would describe it is that tmux runs terminal sessions independently of the terminal window you’re viewing those sessions in. This means that you can do some work in a tmux session, close the terminal window, or “detach” from the tmux window, and later reattach to the tmux session and find your work, tmux windows, and tmux panes exactly as you left them.
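
As a quick sketch of that workflow, assuming tmux’s default C-b prefix and a made-up session name of “work”:

tmux new -s work       # start a new session named "work"
# ... do some work, then detach with C-b d (or just close the terminal window)
tmux attach -t work    # later, reattach and find everything as you left it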

tmux windows and panes1 are the other features I really appreciate about tmux, in addition to the great ability to detach and close terminal windows without killing the work or processes running in the tmux session. Each pane within each window is a separate shell session.

I’ve been using tmux for a few years (I think), but until recently my use had reached a plateau: I would manually start a tmux session, then create windows and manually split them into panes as I needed them for my work. When done, I would, inefficiently, enter a bunch of exit commands to close the panes one by one, until closing the last one killed the tmux session.
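
(An aside, and not part of my old routine: a single command tears down a whole session, windows, panes, and all. The session name below is just an example.)

tmux kill-session -t mysession    # kills the session named "mysession" and everything in it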

I was setting up a complicated workspace for simplestatistics when I thought to look into the possibility of writing a script that I could run to set up all the windows and panes I need. Unsurprisingly, it is possible, and great.

This is the finished simplestatistics tmux workspace script in its current form. You can find an up-to-date version of it here:

#!/usr/local/bin/fish

# detach from a tmux session if in one
tmux detach > /dev/null ^ /dev/null

# don't set up the workspace if there's already a simplestatistics session running
if tmux list-sessions -F "#{session_name}" | grep -q "simplestatistics";
	echo "simplestatistics session already running"
else
# okay no simplestatistics session is running

cd ~/projects/simplestatistics
tmux new -d -s simplestatistics

# window 0 - main
tmux rename-window main

# set up window 1 - documentation
# - index.rst
# - README.md
# - __init__.py
# fourth empty pane
tmux new-window -n documentation

tmux split-window -h -p 45
tmux select-pane -t 0
tmux split-window -v
tmux select-pane -t 0
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/" C-m
tmux send-keys "vim __init__.py" C-m

tmux select-pane -t 1
tmux send-keys "cd ~/projects/simplestatistics/" C-m
tmux send-keys "vim README.md" C-m

tmux select-pane -t 2
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/" C-m
tmux send-keys "vim index.rst" C-m
tmux split-window -v

# set up window 2 - changelogs
tmux new-window -n changelogs
tmux send-keys "cd ~/projects/simplestatistics/" C-m
tmux send-keys "vim changelog.txt" C-m

tmux split-window -h
tmux send-keys "cd ~/projects/simplestatistics/" C-m
tmux send-keys "vim HISTORY.rst" C-m

# back to window 0 - main
# 2 vertical panes: both will be used to edit main statistics functions
tmux select-window -t 0
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/statistics" C-m
tmux send-keys "ls" C-m
tmux split-window -h
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/statistics" C-m

tmux select-pane -t 0
tmux split-window -v
tmux send-keys "cd ~/projects/simplestatistics" C-m
tmux send-keys "bpython" C-m
tmux select-pane -t 0

tmux attach-session -t simplestatistics
end

If you attempt to start a session within a session, tmux warns you that sessions should be nested with care. Nesting sessions is not something I want to do anyway, but I do want the ability to start session Y and attach to it while in session X. So lines 3 ➝ 4 attempt to detach from a tmux session, sending normal and error output to /dev/null. If I’m attached, this detaches me before creating the session, and if I’m not, it fails silently.

Lines 6 ➝ 9 check to see if there’s already a running session named simplestatistics and stop execution with a message that reads "simplestatistics session already running" if it does find it.

Lines 12 ➝ 65 do the work of creating the workspace, which is made up of three windows.

window 1 - documentation

The second window (tmux windows are zero-indexed) contains the panes I use to edit and generate documentation for simplestatistics. The right pane is created with 45% of the window width.

Clockwise from top left:

  • __init__.py: to add the new function I’m working on.
  • index.rst: the main documentation page for Sphinx.
  • README.md
  • A shell for generating documentation.

window 2 - changelogs

Opens two versions of the changelogs in vim:

  • changelog.txt: a Markdown-based changelog for all reasonable persons and machines.
  • HISTORY.rst: a reStructuredText version for PyPI.

window 0 - main editing

The layout is a bit unusual. The top-left pane and the entire right pane are listings of the directory that contains the function files. I use the big right pane to work on the new function, and the left one for general shell work and references.

The bottom left pane runs bpython for interactive testing.

Closing notes

If you work in the terminal and don’t use tmux, consider using it. It’s so nice to have several workspaces that never die until you kill them. If you do use tmux and often end up with complicated workspaces, consider scripting them!

  1. The terminology here is confusing: windows are actually tabs, their names appear at the bottom of the window, and they contain panes arranged in different layouts. It would make more sense to rename windows ➝ tabs, and rename panes ➝ windows. 

Sanitizing dirty Medium links on Pinboard with R

I’ve been on a Pinboard API roll lately. In hindsight it’s not surprising since I use Pinboard so much. Today’s post is another one in which I use R and the Pinboard API to fix a wrong in the world.

Problem

Have you ever noticed those Medium post links? Here’s an example:

https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43#.ncvbvfg3r

See that #.ncvbvfg3r tacked on the end? I noticed it a while ago, and I’m not the only one. That appendage tracks referrals, and I can imagine it allows Medium to build quite the social graph. I don’t like it for two reasons:

  1. Hey buddy? Don’t track me.
  2. It makes it difficult to know if you’ve already bookmarked a post because it’s likely that if you come across the post again, its url is not the same as the one you already saved. When you try to save it to your Pinboard account, it won’t warn you that you already saved it in the past.

You can find a discussion about this on the Pinboard Google Group.

Maciej Cegłowski, creator of Pinboard, was reassuringly himself about the issue:

I think the best thing in this situation is for Medium to die.

Should that happen I will shed few tears. I don’t want Medium to die, but they need to get better. In the meantime, they exist and I have to fix things on my end.

(½) Solution

I wrote a script that downloads all my Pinboard links, and removes that hash appendage before saving them back to my Pinboard account.

This is half a solution because it only solves reason 1, the tracking. Each time I visit or share a sanitized link, a new appendage will be generated, breaking its connection to how I came across the link in the first place.

It doesn’t solve reason 2 – if I had already saved a link to my Pinboard account, and then come across it again and try to save it, having forgotten that I already did so in the past, Pinboard won’t match the urls since the one it has is sanitized. Unless Maciej decides to implement a Medium-specific feature to strip those tracking tokens, there’s not much I can do about that.

First, let’s load some libraries and get our Pinboard links.

library(httr)
library(magrittr)
library(jsonlite)
library(stringr)
library(dplyr)  # for select() and filter() below
library(purrr)  # for map_chr() below

# My API token is saved in an environment file
pinsecret <- Sys.getenv('pin_token')

# GET all my links in JSON
pins_all <- GET('https://api.pinboard.in/v1/posts/all',
                query = list(auth_token = pinsecret,
                             format = 'json'))

pins <- pins_all %>% content() %>% fromJSON()

I load my API token from my .Renviron file, use the GET() function from the httr package to send the GET request for all my links in JSON format, and then convert the returned data into a data frame using content() from httr and piping the output to the fromJSON() function from the jsonlite package.

Let’s examine the pins dataframe:

pins %>% 
    select(href, time) %>% 
    head() %>%  
    knitr::kable()

Which gives us:

| href | time |
|---|---|
| https://twitter.com/Samueltadros/status/800208013709688832 | 2016-11-20T14:23:11Z |
| http://gizmodo.com/authorities-just-shut-down-what-cd-the-best-music-torr-1789113647 | 2016-11-19T15:21:06Z |
| http://www.theverge.com/2016/11/17/13669832/what-cd-music-torrent-website-shut-down | 2016-11-19T15:18:33Z |
| http://www.rollingstone.com/music/news/torrent-site-whatcd-shuts-down-destroys-user-data-w451239 | 2016-11-19T15:16:16Z |
| https://twitter.com/whatcd/status/799751019294965760 | 2016-11-18T23:56:23Z |
| https://twitter.com/sheriferson/status/799761561149722624/photo/1 | 2016-11-18T23:49:49Z |

Let me break down that last command:

  • Start with pins dataframe.
  • Pipe that into select(), selecting the “href” and “time” columns.
  • Pipe the output into head(), which keeps the top (latest, in this case) six rows.
  • Pipe the output into the kable() function from the knitr package, which converts the dataframe into a Markdown table.

That last part is very handy.

Now that we have all our links, let’s select the Medium ones.

medium <- pins %>%
    filter(str_detect(href, 'medium.com'))

Again, let’s break it down:

  • Store into medium the output of…
  • Piping pins into the filter() function from the dplyr package, which uses str_detect() from the stringr package to search for “medium.com” in the “href” column.

Checking the medium dataframe shows…

| href | time |
|---|---|
| https://medium.com/something-learned/not-imposter-syndrome-621898bdabb2 | 2016-10-25T18:50:36Z |
| https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43#.ncvbvfg3r | 2016-10-11T06:15:48Z |
| https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.by7z0gq33 | 2016-10-02T01:07:24Z |
| https://medium.com/@schtoeffel/you-don-t-need-more-than-one-cursor-in-vim-2c44117d51db#.nmev5f200 | 2016-09-19T23:35:16Z |
| https://medium.com/@akelleh/a-technical-primer-on-causality-181db2575e41 | 2016-09-07T16:30:57Z |

Now, this looks like it worked, but I’m paranoid. It’s possible that the filtering caught links that have domains that end with “medium.com” but are not Medium links.

I want to be more careful, so I’ll use a function that I used before to extract the hostname from links.

get_hostname <- function(href) {
  tryCatch({
    parsed_url <- parse_url(href)
    if (!parsed_url$hostname %>% is.null()) {
      hostname <- parsed_url$hostname %>% 
        gsub('^www\\.', '', ., perl = T)
      return(hostname)  
    } else {
      return('unresolved')
    }
    
  }, error = function(e) {
    return('unresolved')
  })
}

pins$hostname <- map_chr(pins$href, .f = get_hostname)

medium <- pins %>%
    filter(hostname == 'medium.com')

This is a dataframe of Medium links that I am more confident about.1

Now! Let’s remove that gunk.

medium$cleanhref <- sub("#\\..{9}$", "", medium$href)

That’s all. A quick regex substitution to remove the trailing hash garbage.
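
As a quick sanity check, here’s the same substitution applied to one of the links from earlier in this post (run interactively; output shown as a comment):

sub("#\\..{9}$", "", "https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43#.ncvbvfg3r")
#> [1] "https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43"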

| Old links | Clean links |
|---|---|
| https://medium.com/something-learned/not-imposter-syndrome-621898bdabb2 | https://medium.com/something-learned/not-imposter-syndrome-621898bdabb2 |
| https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43#.ncvbvfg3r | https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43 |
| https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.by7z0gq33 | https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471 |
| https://medium.com/@joshuatauberer/civic-techs-act-iii-is-beginning-4df5d1720468 | https://medium.com/@joshuatauberer/civic-techs-act-iii-is-beginning-4df5d1720468 |
| https://medium.com/@schtoeffel/you-don-t-need-more-than-one-cursor-in-vim-2c44117d51db#.nmev5f200 | https://medium.com/@schtoeffel/you-don-t-need-more-than-one-cursor-in-vim-2c44117d51db |
| https://medium.com/@ESAJustinA/ant-to-advance-data-equality-in-america-join-us-were-hiring-developers-and-data-scientists-147f1bfedcb5#.mh8dpuqz9 | https://medium.com/@ESAJustinA/ant-to-advance-data-equality-in-america-join-us-were-hiring-developers-and-data-scientists-147f1bfedcb5 |

Now we need to put this data back into the I N T E R N E T.

As far as I can tell from reading the Pinboard API documentation2, there’s no way to update a bookmark in place with a new url. The best way to do this is to delete the old bookmarks and add the new ones with the tags, shared and to-read status, and date-time information of the old ones.

This is the dangerous part. I want to be as careful as possible. I want to store the HTTP responses for each deletion and addition, and, just so I don’t anger the rate-limiting gods, I will inject a 5-second delay between requests. 5 seconds is probably overkill, but this isn’t production code, it’s a personal thing and I don’t mind waiting.

medium$addition_response <- vector(length = nrow(medium))
medium$deletion_response <- vector(length = nrow(medium))

for (ii in 1:nrow(medium)) {
    deletion <- GET('https://api.pinboard.in/v1/posts/delete',
                    query = list(auth_token = pinsecret,
                                 url = medium$href[ii]))
    
    medium$deletion_response[ii] <- deletion$status_code
    
    addition <- GET('https://api.pinboard.in/v1/posts/add',
                    query = list(auth_token = pinsecret,
                                 url = medium$cleanhref[ii],
                                 description = medium$description[ii],
                                 extended = medium$extended[ii],
                                 tags = medium$tags[ii],
                                 dt = medium$time[ii],
                                 shared = medium$shared[ii],
                                 toread = medium$toread[ii]))
    
    medium$addition_response[ii] <- addition$status_code
    
    Sys.sleep(5)
}

A quick inspection of the deletion and addition response codes reveals nothing but sweet, sweet 200s. A quick inspection of the Medium links on my Pinboard account reveals clean, shiny, spring-scented urls.
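
That inspection is nothing fancy; a minimal sketch of what I mean, using base R (an aside, not part of the script above):

# tabulate the HTTP status codes returned by the delete and add requests;
# anything other than 200 would deserve a closer look
table(medium$deletion_response)
table(medium$addition_response)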

The full code is available as a gist here.

  1. The dataframe created using the hostname extraction function has the same number of rows as the one created with a simple grep of “medium.com”, which means it probably wouldn’t have been a problem to stick with the earlier solution. The second solution is still a lot better. 

  2. … which is a link that must be a record-holder for the number of times I’ve linked to it from this site. 
