Nvim-R for people who love R, vim, and keyboards

For love or money, I write R code almost every day.

Back when most of the R code I wrote was for personal projects or academic research, I worked in RStudio. That’s when I wrote R for love.¹

Once I started writing more R code for money, I wanted to find alternatives to RStudio, and unfortunately, there aren’t any. RStudio is without competition in the GUI side of the market.

I don’t remember how I found Nvim-R, but I did and now I can’t imagine working without it. It has become my favorite and default way to write R, no matter the size or the complexity of the script. I use Nvim-R with neovim, but it works with vim too.

Nvim-R screenshot

Install it with your package manager of choice; I use Vundle:

Plugin 'jalvesaq/Nvim-R'

Once you have a .R file in the buffer, you can invoke Nvim-R with the default shortcut <LocalLeader>rf. Nvim-R will open a vim shell window that runs the R console. The configuration I ended up with – see screenshot above and vim configuration code below – puts the source editor in the top left, the object browser in the top right, and the console at the bottom. You can set it up so that hitting the space bar in normal mode sends the current line from the source editor to be executed in the console. Want to send many lines? The same mapping also works with highlighted lines in visual mode.

Speaking of the object browser: Nvim-R’s object browser is similar to RStudio’s. It shows you the R objects in the environment and updates itself as objects are created and removed. Except I like Nvim-R’s object browser better. It assigns different colors to objects of different types, and it will, by default, expand the dataframes to list their columns underneath them (you can turn that off so that you expand the dataframes you want by hitting Enter on their entry).

I rarely read vim plugin documentation because the features and options I need are often general enough to be described in the READMEs, but if you’re considering using Nvim-R, I highly recommend reading its help/docs. They’re well-written and they’ll help you customize your development environment to your exact specifications: :help Nvim-R.

These are my settings, and they only give a taste of how customizable Nvim-R is:

" press -- to have Nvim-R insert the assignment operator: <-
let R_assign_map = "--"

" set a minimum source editor width
let R_min_editor_width = 80

" make sure the console is at the bottom by making it really wide
let R_rconsole_width = 1000

" show arguments for functions during omnicompletion
let R_show_args = 1

" Don't expand a dataframe to show columns by default
let R_objbr_opendf = 0

" Press the space bar to send lines and selection to R console
vmap <Space> <Plug>RDSendSelection
nmap <Space> <Plug>RDSendLine
  1. Academia doesn’t pay much. See this for more on the subject. 

Recently
Victrola Coffee Roasters, Seattle

Code

I made some good improvements to my journaling script, caplog. So far, like its half-sibling t, this command-line utility seems to be sticking, which makes me happy because I don’t have to worry about anyone acquiring it and forcing me to find another home for all my entries. It’s simple and it delights me.

Reading

Articles

Books

Academia's troubles

Coast

Someone asked me whether studying cognitive science changed my view of humanity.¹

I had to think about it for a while, and my answer, which was only mildly surprising to me, was that instead of changing my views about humanity, it mostly changed my views about science and research. Specifically, it made me a lot more skeptical whenever anyone claims that “science”, “research”, or studies have shown anything, say, about humanity.

I left academia for two reasons: first is that academia² is extremely competitive; the jobs are few, the applicants are many³, and I didn’t want it as much as I saw other people did. Second is that soon after I started graduate school I started to become very disillusioned with the field, its practices, and its incentives. Eventually I became so cynical about a career that I hadn’t even started yet that I knew I should never try to start it in the first place.


Academia has a serious problem wherein it runs on a system of incentives that rewards bad scientists and pushes good scientists out. At risk of weirding you out by commenting on my own writing, that is a remarkable statement. I just told you that academia rewards bad scientists.

Most of this is not news. Every once in a while the BBC or The Economist will talk about the replication crisis and bad incentives in science, or the file-drawer problem, or p-hacking. But I think the enormity of the problem escapes the majority of people.

Here’s what I think you should do. Find a slow weekend morning or afternoon, make yourself a pot of coffee, and spend an hour or two reading Retraction Watch and Andrew Gelman’s site. I used to subscribe to Retraction Watch’s RSS feed but ended up unsubscribing because it was too prolific and I couldn’t keep up. Things are so busy over there that they publish a weekly weekend reads post that you could not possibly finish reading in a weekend unless you had absolutely no other plans.


Here is the life cycle of an academic. There is very little variation in this cycle:

Undergraduate degree (do a thesis or something)
➝ Grad school, do a Master’s
➝ More grad school, do a PhD
➝ Almost definitely a Post Doctoral fellowship⁴

You remain in the Post Doc holding pattern until you find a job in a university or college. The dream is to land a tenure track position in a research-intensive institution. The reality is people are increasingly taking lesser and lesser positions because the demand for those dream tenure track jobs far outpaces the supply.

Landing a job, especially a good one (the tenure track and get-to-do-some-research-and-not-just-teach kind), is now determined by one factor and one factor only: publications.⁵ Publish or perish is not a joke, it is the law of the land. Most important is your past publication count and the likelihood that you will be as productive if not more so in the future. Those are also second and third most important. Fourth most important is the prestige of the journals where you get published. Are you publishing in Psych Science or in Frontiers? Makes some difference. I don’t know if anyone actually reads your papers or whether the quality of your writing, methodology, or, you know, science, factors heavily.

Here is a list of things that, again, with possibly very few exceptions, mean absolutely nothing for your prospects at getting a job:

  1. You champion open science.
  2. You write blog posts about experiments that haven’t worked out, or interesting statistical issues or practices.
  3. You contribute to open source projects, or statistical or data visualization packages.
  4. You are an active mentor and are generous with your time with students or peers.

If there isn’t a publication coming out of it, it doesn’t matter.

It’s not hard to see what a system like that does to the quality of the scientific process. Science is misunderstood by many. Science is not certain, experiments require care, results are not guaranteed, and theories are sooner or later wrong. It takes time to do something right, and you can still end up with a null result; and no amount of hard work could have made a positive finding more likely. But you want a job, so you will do everything you can to end up with a positive result anyway. You will choose topics that are in fashion, you will try to choose easy experiments that can be published regardless of the result⁶, and yes, you will operate under the constant pressure to make your t-test or ANOVA give you a p-value less than .05, and you might justify bending the rules of statistics to get it.

The better researchers, the ones who prioritize good theory and well-designed experiments and analyses, are at a disadvantage. They will not try to publish shoddy studies and will not squeeze a result where none exists. They will want to run the experiments that will help the field choose between competing theories and make progress instead of running the experiment that will produce the 35th uninformative but curious interaction and publish that instead. And for those ideals, they will pay.

I am not a very social person and I never created a big network of academics when I was a student, and yet I personally know several incredible researchers who ejected out of academia as they saw their academic career prospects shrink and ended up in industry, where they are valued and get paid way more for their skills than in the field where they would prefer to be. What a tragedy.


In case you decide to not visit Andrew Gelman’s site – your loss, really – I’ve plucked out an example for you.

The absolute minimum background information you need to know is that psychology, especially social psychology, has been going through a replication crisis in which many popular and thought-to-be-bulletproof findings like ego depletion and power pose do not replicate.

The news and internet have not been kind to psychology during this tumultuous time, nor should they have been.

Susan Fiske, a social psychologist and past president of the Association for Psychological Science, wrote an article titled Mob Rule or Wisdom of Crowds? (PDF download), one of the most ill-advised opinion pieces I’ve ever seen by an academic. In it, she – sigh, there is no other way to say this – rants and rails against “online vigilantes” and “self-appointed data police” “volunteering critiques of such personal ferocity and relentless frequency that they resemble a denial-of-service attack that crashed a website by sheer volume of traffic”.

I don’t know what the “website” is supposed to be here, but the idea that academics are suffering a denial of service attack from online critics is laughable. I know of few other institutions that live in their own protected bubble the way the psychology department does. The whole article is a shame and I am embarrassed on her behalf.

Even though this article was invited by the APS’s Observer, it seems the reaction was so negative that it was never published, and you might find it tricky to find a copy online. The link above gives you the PDF hosted on my own site, plus alternate link 1 and alternate link 2.

The Observer posted an unsurprisingly spineless comment on the issue, including this amazing final paragraph:

Those wishing to share their opinions on this particular matter are invited to submit comments in the space below. Alternatively, letters can be sent to apsobserver@psychologicalscience.org. We ask that your comments include your full name and affiliation.

Yes, please include your affiliation, lest we forget for a moment to judge the value and validity of your comment according to the authority of whether you’re a professor or just a normie.

In one of Andrew Gelman’s comments on the issue, aptly titled “What has happened down here is the winds have changed”, he writes:

In her article that was my excuse to write this long post, Fiske expresses concerns for the careers of her friends, careers that may have been damaged by public airing of their research mistakes. Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.

I couldn’t have said it better.


Back to the original question (which I will rephrase to maintain the flow of the story I’m telling here): how has studying cognitive psychology changed my view on things?

For one, I find myself in the uncomfortable position of seeming like a contrarian to two opposing groups: authoritarian conspiracy theorists who assume all scientists are malicious liars with an agenda⁷, and the intellectual, educated but non-scientific class that has confused the platonic concept of “science” with the practice of scientific inquiry, and therefore defends anything with the smell of science as unquestionable and unassailable. It’s much easier, in fact, to deal with the first group. It’s the second one that depresses me.

Call them what you like, the intellectuals, the elite, the educated class, whatever, I’m not necessarily a fan of any of those terms. They are relatively affluent, internet-savvy people, who, in their attempt to fight back against anti-reason trends in the West (often socially conservative right-wing groups, although by no means all), have grossly overcorrected and now defend any output of human research as sacrosanct truth beyond reproach.

I won’t mince words, this is worship of authority. People confuse the day to day practice with the scientific method itself, and treat the results of the practice as though it was perfect and the output guaranteed by the sanctity of the lab coat and the scatterplot.

In another response to the Susan Fiske article titled “Weapons of math destruction”, NeuroAnaTody writes:

Science is moving forward so quickly that I don’t even think it’s necessary to point out ways in which the article is wrong. I will instead list a some elements of the scientific revolution that trouble me, even though I consider myself a proud (if quite junior) member of the data police.

  1. Belief in published results. I have so little of it left.
  2. Belief in the role of empirical research. Getting to otherwise hidden truths was our thing, the critical point of departure from philosophy.
  3. Belief in the scientific method. I was taught there is such a thing. Now it seems every subfield would have been better off developing its own methods, fitted to its own questions and data.
  4. Belief in statistics. I was taught this is the way to impartial truths. Now I’m a p-value skeptic.
  5. Belief in the academic system. It incentivizes competition in prolifically creating polished narratives out of messy data.

Emphasis mine, because point 1 is the headline for me. Belief in published results, I have so little of it left. That is how studying cognitive psychology has changed my view on things. Whenever I hear of a study that “showed something”, and especially if it’s in the field of psychology, my assumption is that it’s spurious.

So what am I telling you? Science is permanently broken and we are left rudderless in a sea of claims and counter-claims?

No, that would be confusing the practice of science with the scientific process as it should be, the same mistake I think the study-worshipper makes. The scientific method is still the best way we have to approach the truth about the world, we just need to set up the incentives to encourage following it better.

In the meantime, I think you should be extremely skeptical of everything you hear, which is an uncomfortable position but is not optional at this point. The “study” goes through many stages on its way from having touched the truth and transformed into data, to your eyes. It has gone through an experimental design (done by a human), data collection (done by a different human), analysis (possibly done by a third different human), write-up (one or more humans), review (I will stop mentioning that things are done by humans), and interpretation by a journalist or reader.

In all of these stages, a human makes a judgement call to the best of their abilities, and like all other humans, they operate under pressures and incentives. Speaking of incentives, I am also telling you that academia is not the world of enlightened philosopher kings and queens operating outside the realm of dirty wants and desires the rest of us live in. Academics operate within a terrible, broken system of incentives, and you must keep that in mind whenever you’re consuming their research.

The other message I want to leave you with is that academia is broken and I don’t see it being fixed any time soon. It won’t be fixed until academics are evaluated based on more than their number of publications. It won’t be fixed until hiring committees stop looking at how many papers you’ve published and start looking at the quality of your contribution to knowledge. Yes, it’s much harder to decide whether you created knowledge and contributed to theory than it is to look at your impact factor, but that’s what has to happen. That’s it.

  1. Someone else pointed out that it would be difficult for me to answer that question because I didn’t know an alternative life where I didn’t study cognitive science and what my view of humanity would be in that world. But the question is still valid because I can compare to before I started studying cognitive science, or reflect on how my views changed during the study. 

  2. My personal academic experience is in psychology, but the points I make in this post generalize to all disciplines as far as I know. 

  3. Many. The ratio is quite bad. You shouldn’t necessarily trust the numbers from any of those articles, but you can conclude that the picture is bleak for anyone who is in the crème de la crème of their field, and hopeless for everyone else. 

  4. A Post Doc is not a student. They are an employee who is often an independent researcher in a lab and gets paid way too little. 

  5. There might be exceptions to this, but they are exceptional exceptions. 

  6. I have personally received this piece of advice, explicitly, more times than I can count. 

  7. I think scientists do have an agenda that we ought to acknowledge more. They are, after all, people. 

Recently: New World

Recently I changed jobs, cities, and countries. Everything is new.

Code

I write a lot more lines of code per day than I used to, and it’s never been less public. That’s one of the many new things I’m getting used to. I’m having a lot of fun, and I’m trying to figure out how to bring some of that back here.

Reading

Articles

Books

'Take no one's word for it' in 2016

“has it been a year already!?”

I hear James Stacey say into my ears as I stare at this draft wondering how to start.¹

Unlike with most previous years, I did not have this experience with 2016. This has been a good year for me; I did a lot, and made plenty of progress personally and professionally. My subjective experience is not that it went by too fast, but that the passage of time feels just right.

The knock on new year’s resolutions is that they encourage you to wait until a seemingly arbitrary moment in time before you make a big change or do something to make your life better. Another knock is that this encourages you to attempt large changes instead of piecemeal changes, which increases the amount of discipline required for success, and therefore increases the chances of failure. Larger changes would happen less frequently, and that makes error-correction harder.

I think there is truth in there, but as with a lot of things people criticize today, the criticism loses a lot of nuance or selectivity and becomes absolute. You shouldn’t wait until new year’s to make your life better, but setting checkpoints for retrospectives and projections at regular intervals is useful. New year’s is arbitrary, but no more arbitrary than any other time or date if you don’t have better reasons for them. Just make sure you’re not using it as an excuse to procrastinate.

Personally I think an annual cycle is too infrequent for most stock-taking and revising stops. You can start a cycle on January 1st, but make it triannual or quarterly. Or choose your own date if January 1st is too problematic for you.

Last year I said I wanted to write more, and that’s the closest I’ve come to making a “resolution”. I like writing for what it helps me learn and get better at, including writing itself, and quantity should only increase when it’s a means, not an end.

I’m happy with how 2016 turned out for Take no one’s word for it, and am tickled pink to share the visualizations for the year.

Posts by month and year

Total posts by year

Words by month and year

Total words by year

Future

I’m moving to a new country and starting in a new research scientist role in 2017, and one way or another I think that will affect my writing here. What I hope will happen is that I’ll be able to write more about science and data as I learn more things faster in my new position.

I’m excited.

  1. I don’t usually listen to podcasts when I write, but I wanted to get myself into a certain mindset. 

tmux workspace scripts

tmux describes itself as a “terminal multiplexer”. Pleasantly, it goes on to explain what that means:

It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal. And do a lot more.

The way I would describe it is that tmux runs terminal sessions independently of the terminal window you’re viewing those sessions in. This means that you can do some work in a tmux session, close the terminal window, or “detach” from the tmux window, and later reattach to the tmux session and find your work, tmux windows, and tmux panes exactly as you left them.

tmux windows and panes¹ are the other features I really appreciate about tmux, in addition to the great ability to detach and close terminal windows without killing the work or processes running in the tmux session. Each pane within each window is a separate shell session.

I’ve been using tmux for a few years (I think), but until recently, my use had reached a plateau: I would manually start a tmux session, and start creating windows and manually splitting them into panes as I need for my work. When done, I would, inefficiently, start entering a bunch of exit commands to close all the panes one by one, until closing the last one kills the tmux session.

I was setting up a complicated workspace for simplestatistics when I thought to look into the possibility of writing a script that I could run to set up all the windows and panes I need. Unsurprisingly, it is possible, and great.

This is the finished simplestatistics tmux workspace script in its current form. You can find an up-to-date version of it here:

#!/usr/local/bin/fish

# detach from a tmux session if in one
tmux detach > /dev/null 2> /dev/null

# don't set up the workspace if there's already a simplestatistics session running
if tmux list-sessions -F "#{session_name}" | grep -q "simplestatistics";
	echo "simplestatistics session already running"
else
# okay no simplestatistics session is running

cd ~/projects/simplestatistics
tmux new -d -s simplestatistics

# window 0 - main
tmux rename-window main

# set up window 1 - documentation
# - index.rst
# - README.md
# - __init__.py
# fourth empty pane
tmux new-window -n documentation

tmux split-window -h -p 45
tmux select-pane -t 0
tmux split-window -v
tmux select-pane -t 0
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/" C-m
tmux send-keys "vim __init__.py" C-m

tmux select-pane -t 1
tmux send-keys "cd ~/projects/simplestatistics/" C-m
tmux send-keys "vim README.md" C-m

tmux select-pane -t 2
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/" C-m
tmux send-keys "vim index.rst" C-m
tmux split-window -v

# set up window 2 - changelogs
tmux new-window -n changelogs
tmux send-keys "cd ~/projects/simplestatistics/" C-m
tmux send-keys "vim changelog.txt" C-m

tmux split-window -h
tmux send-keys "cd ~/projects/simplestatistics/" C-m
tmux send-keys "vim HISTORY.rst" C-m

# back to window 0 - main
# 2 vertical panes: both will be used to edit main statistics functions
tmux select-window -t 0
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/statistics" C-m
tmux send-keys "ls" C-m
tmux split-window -h
tmux send-keys "cd ~/projects/simplestatistics/simplestatistics/statistics" C-m

tmux select-pane -t 0
tmux split-window -v
tmux send-keys "cd ~/projects/simplestatistics" C-m
tmux send-keys "bpython" C-m
tmux select-pane -t 0

tmux attach-session -t simplestatistics
end

If you attempt to start a session within a session, tmux warns you that sessions should be nested with care. Nesting sessions is not something I want to do anyway, but I do want the ability to start session Y and attach to it while in session X. So lines 3 ➝ 4 attempt to detach from a tmux session and send both normal and error output to /dev/null. If I’m attached, it detaches me before creating the session, and if I’m not, it fails silently.

Lines 6 ➝ 9 check to see if there’s already a running session named simplestatistics and stop execution with a message that reads "simplestatistics session already running" if it does find it.

Lines 12 ➝ 65 do the work of creating the workspace, which is made up of three windows.

window 1 - documentation

The second window (tmux windows are zero-indexed) contains the panes I use to edit and generate documentation for simplestatistics. The right pane is created with 45% of the window width.

Clockwise from top left:

  • __init__.py To add the new function I’m working on.
  • index.rst The main documentation page for Sphinx.
  • A shell for generating documentation.
  • README.md

window 2 - changelogs

Opens two versions of the changelogs in vim:

  • changelog.txt A Markdown-based changelog for all reasonable persons and machines.
  • HISTORY.rst A reStructuredText version for PyPI.

window 0 - main editing

The layout is a bit unusual. The top left and entire right are listings of the directory that contains the function files. I use the big right pane to work on the new function, and the left one for general shell work and references.

The bottom left pane runs bpython for interactive testing.

Closing notes

If you work in the terminal and don’t use tmux, consider using it. It’s so nice to have several workspaces that never die until you kill them. If you do use tmux and often end up with complicated workspaces, consider scripting them!

  1. The terminology here is confusing: windows are actually tabs, their names appear at the bottom of the window, and they contain panes arranged in different layouts. It would make more sense to rename windows ➝ tabs, and rename panes ➝ windows. 

Sanitizing dirty Medium links on Pinboard with R

I’ve been on a Pinboard API roll lately. In hindsight it’s not surprising since I use Pinboard so much. Today’s post is another one in which I use R and the Pinboard API to fix a wrong in the world.

Problem

Have you ever noticed those Medium post links? Here’s an example:

https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43#.ncvbvfg3r

See that #.ncvbvfg3r tacked on the end? I noticed it a while ago, and I’m not the only one. That appendage tracks referrals, and I can imagine it allows Medium to build quite the social graph. I don’t like it for two reasons:

  1. Hey buddy? Don’t track me.
  2. It makes it difficult to know if you’ve already bookmarked a post because it’s likely that if you come across the post again, its url is not the same as the one you already saved. When you try to save it to your Pinboard account, it won’t warn you that you already saved it in the past.

You can find a discussion about this on the Pinboard Google Group.

Maciej Cegłowski, creator of Pinboard, was reassuringly himself about the issue:

I think the best thing in this situation is for Medium to die.

Should that happen I will shed few tears. I don’t want Medium to die, but they need to get better. In the meantime, they exist and I have to fix things on my end.

(½) Solution

I wrote a script that downloads all my Pinboard links, and removes that hash appendage before saving them back to my Pinboard account.

This is half a solution because it only solves reason 1, the tracking. Each time I visit or share a sanitized link, a new appendage will be generated, breaking its connection to how I came across the link in the first place.

It doesn’t solve reason 2 – if I had already saved a link to my Pinboard account, and then come across it again and try to save it, having forgotten that I already did so in the past, Pinboard won’t match the urls since the one it has is sanitized. Unless Maciej decides to implement a Medium-specific feature to strip those tracking tokens, there’s not much I can do about that.

First, let’s load some libraries and get our Pinboard links.

library(httr)
library(magrittr)
library(jsonlite)
library(stringr)
library(dplyr)  # select() and filter()
library(purrr)  # map_chr()

# My API token is saved in an environment file
pinsecret <- Sys.getenv('pin_token')

# GET all my links in JSON
pins_all <- GET('https://api.pinboard.in/v1/posts/all',
                query = list(auth_token = pinsecret,
                             format = 'json'))

pins <- pins_all %>% content() %>% fromJSON()

I load my API token from my .Renviron file, use the GET() function from the httr package to send the GET request for all my links in JSON format, and then convert the returned data into a data frame by using content() from httr and piping the output to the fromJSON() function from the jsonlite package.

Let’s examine the pins dataframe:

pins %>% 
    select(href, time) %>% 
    head() %>%  
    knitr::kable()

Which gives us:

| href | time |
| --- | --- |
| https://twitter.com/Samueltadros/status/800208013709688832 | 2016-11-20T14:23:11Z |
| http://gizmodo.com/authorities-just-shut-down-what-cd-the-best-music-torr-1789113647 | 2016-11-19T15:21:06Z |
| http://www.theverge.com/2016/11/17/13669832/what-cd-music-torrent-website-shut-down | 2016-11-19T15:18:33Z |
| http://www.rollingstone.com/music/news/torrent-site-whatcd-shuts-down-destroys-user-data-w451239 | 2016-11-19T15:16:16Z |
| https://twitter.com/whatcd/status/799751019294965760 | 2016-11-18T23:56:23Z |
| https://twitter.com/sheriferson/status/799761561149722624/photo/1 | 2016-11-18T23:49:49Z |

Let me break down that last command:

  • Start with pins dataframe.
  • Pipe that into select(), selecting the “href” and “time” columns.
  • Pipe the output into head() which selects the top (latest, in this case) 6 rows.
  • Pipe the output into kable() function from the knitr package, which converts the dataframe into a Markdown table.

That last part is very handy.
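If the pipe syntax is new to you, here is a minimal sketch of what that chain is doing. The toy data frame is made up for illustration, base R column selection stands in for select(), and the native |> pipe (which assumes R 4.1 or later) stands in for %>%.

```r
# Made-up data frame standing in for `pins`; every value is invented.
toy <- data.frame(
  href = paste0("https://example.com/post-", 1:10),
  time = sprintf("2016-11-%02dT00:00:00Z", 20:11),
  tags = "example",
  stringsAsFactors = FALSE
)

# Nested form: read from the inside out.
nested <- head(toy[, c("href", "time")])

# Piped form: read from left to right (native pipe, assumes R >= 4.1).
piped <- toy[, c("href", "time")] |> head()

print(identical(nested, piped))
print(nrow(piped))  # head() keeps the first six rows by default
```

Both forms give the same result; the piped one simply lets you read the steps in the order they happen.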

Now that we have all our links, let’s select the Medium ones.

medium <- pins %>%
    filter(str_detect(href, 'medium.com'))

Again, let’s break it down:

  • Store into medium the output of…
  • Piping pins into the filter() function from the dplyr package, which uses str_detect() from the stringr package to search for “medium.com” in the “href” column.

Checking the medium dataframe shows…

| href | time |
| --- | --- |
| https://medium.com/something-learned/not-imposter-syndrome-621898bdabb2 | 2016-10-25T18:50:36Z |
| https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43#.ncvbvfg3r | 2016-10-11T06:15:48Z |
| https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.by7z0gq33 | 2016-10-02T01:07:24Z |
| https://medium.com/@schtoeffel/you-don-t-need-more-than-one-cursor-in-vim-2c44117d51db#.nmev5f200 | 2016-09-19T23:35:16Z |
| https://medium.com/@akelleh/a-technical-primer-on-causality-181db2575e41 | 2016-09-07T16:30:57Z |

Now, this looks like it worked, but I’m paranoid. It’s possible that the filtering caught links that have domains that end with “medium.com” but are not Medium links.
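To make that worry concrete, here is a minimal base-R sketch. The URLs are invented for illustration, grepl() stands in for str_detect() (both treat a bare pattern as a regular expression), and the anchored pattern is just one possible stricter check, not the approach used in this post.

```r
# Hypothetical URLs, invented for this example.
urls <- c(
  "https://medium.com/@someone/a-post-123abc",
  "https://notmedium.com/some-page",
  "https://example.com/why-i-left-medium.com",
  "https://example.org/mediumXcom"
)

# A bare pattern is a regular expression that can match anywhere in the
# string, and the unescaped "." matches any single character.
loose <- grepl("medium.com", urls)

# Anchoring on the scheme and escaping the dot only accepts the real host.
strict <- grepl("^https?://(www\\.)?medium\\.com(/|$)", urls)

print(data.frame(url = urls, loose = loose, strict = strict))
```

The loose pattern accepts all four URLs, while the anchored one accepts only the first.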

I want to be more careful, so I’ll use a function that I used before to extract the hostname from links.

get_hostname <- function(href) {
  tryCatch({
    parsed_url <- parse_url(href)
    if (!parsed_url$hostname %>% is.null()) {
      hostname <- parsed_url$hostname %>% 
        gsub('^www\\.', '', ., perl = TRUE)
      return(hostname)  
    } else {
      return('unresolved')
    }
    
  }, error = function(e) {
    return('unresolved')
  })
}

pins$hostname <- map_chr(pins$href, .f = get_hostname)

medium <- pins %>%
    filter(hostname == 'medium.com')

This is a dataframe of Medium links that I am more confident about.1
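For anyone following along in Python rather than R, here is a minimal sketch of why comparing hostnames is stricter than searching for a substring. It relies on urllib.parse from the standard library. The helper name get_hostname simply mirrors the R function above, and the example links are made up for illustration, not taken from my bookmarks.

```python
from urllib.parse import urlsplit


def get_hostname(href):
    """Return the hostname without a leading 'www.', or 'unresolved'."""
    try:
        hostname = urlsplit(href).hostname
    except ValueError:
        return "unresolved"
    if not hostname:
        return "unresolved"
    if hostname.startswith("www."):
        hostname = hostname[len("www."):]
    return hostname


# Hypothetical links, not taken from the real bookmarks.
links = [
    "https://medium.com/@someone/a-post-123",
    "https://notmedium.com/a-post",
    "https://example.com/why-i-left-medium.com",
]

# Substring search keeps all three; the hostname comparison keeps only the first.
substring_matches = [link for link in links if "medium.com" in link]
hostname_matches = [link for link in links if get_hostname(link) == "medium.com"]
```

The substring search accepts any URL that mentions “medium.com” anywhere, while the hostname comparison only accepts links whose host is exactly medium.com (after dropping a leading “www.”).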

Now! Let’s remove that gunk.

medium$cleanhref <- sub("#\\..{9}$", "", medium$href)

That’s all. A quick regex substitution to remove the trailing hash garbage.
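The same substitution can be sketched in Python with the standard re module. The pattern is the one from the R call above; the function name clean_href is my own label for this sketch, and the two URLs come from the table of links shown earlier.

```python
import re

# Same pattern as the R call: "#.", then exactly nine characters, at the end.
TRAILING_HASH = re.compile(r"#\..{9}$")


def clean_href(href):
    """Strip a trailing '#.' fragment of nine characters, if present."""
    return TRAILING_HASH.sub("", href)


dirty = "https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.by7z0gq33"
untouched = "https://medium.com/something-learned/not-imposter-syndrome-621898bdabb2"

cleaned = clean_href(dirty)
same = clean_href(untouched)
```

Because the pattern demands exactly nine characters after “#.”, links without that fragment pass through unchanged, and so would fragments of any other length.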

Old links Clean links
https://medium.com/something-learned/not-imposter-syndrome-621898bdabb2 https://medium.com/something-learned/not-imposter-syndrome-621898bdabb2
https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43#.ncvbvfg3r https://medium.com/@timmywil/sign-your-commits-on-github-with-gpg-566f07762a43
https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.by7z0gq33 https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471
https://medium.com/@joshuatauberer/civic-techs-act-iii-is-beginning-4df5d1720468 https://medium.com/@joshuatauberer/civic-techs-act-iii-is-beginning-4df5d1720468
https://medium.com/@schtoeffel/you-don-t-need-more-than-one-cursor-in-vim-2c44117d51db#.nmev5f200 https://medium.com/@schtoeffel/you-don-t-need-more-than-one-cursor-in-vim-2c44117d51db
https://medium.com/@ESAJustinA/ant-to-advance-data-equality-in-america-join-us-were-hiring-developers-and-data-scientists-147f1bfedcb5#.mh8dpuqz9 https://medium.com/@ESAJustinA/ant-to-advance-data-equality-in-america-join-us-were-hiring-developers-and-data-scientists-147f1bfedcb5

Now we need to put this data back into the I N T E R N E T.

As far as I can tell reading the Pinboard API2, there’s no way to update a bookmark in-place with a new url. The best way to do this is to delete the old bookmarks and add the new ones with the tags, shared, to-read status, and date-time information of the old ones.

This is the dangerous part. I want to be as careful as possible. I want to store the HTTP responses for each deletion and addition, and just so I don’t anger the rate-limiting gods, I will inject a 5-second delay between requests. 5 seconds is probably overkill, but this isn’t production code; it’s a personal thing and I don’t mind waiting.

medium$addition_response <- vector(length = nrow(medium))
medium$deletion_response <- vector(length = nrow(medium))

for (ii in seq_len(nrow(medium))) {
    deletion <- GET('https://api.pinboard.in/v1/posts/delete',
                    query = list(auth_token = pinsecret,
                                 url = medium$href[ii]))
    
    medium$deletion_response[ii] <- deletion$status_code
    
    addition <- GET('https://api.pinboard.in/v1/posts/add',
                    query = list(auth_token = pinsecret,
                                 url = medium$cleanhref[ii],
                                 description = medium$description[ii],
                                 extended = medium$extended[ii],
                                 tags = medium$tags[ii],
                                 dt = medium$time[ii],
                                 shared = medium$shared[ii],
                                 toread = medium$toread[ii]))
    
    medium$addition_response[ii] <- addition$status_code
    
    Sys.sleep(5)
}

A quick inspection of the deletion and addition response codes reveals nothing but sweet, sweet 200s. A quick inspection of the Medium links on my Pinboard account reveals clean, shiny, spring-scented urls.
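The delete-then-add pattern from the loop above, with both status codes recorded and a pause between bookmarks, can be sketched in Python as follows. This is an illustration under assumptions, not the code I ran: replace_bookmarks and the send callable are hypothetical names, send stands in for whatever performs the real HTTP request, and the tags, shared, and toread fields are left out for brevity. Only the posts/delete and posts/add method names come from the loop above.

```python
import time


def replace_bookmarks(rows, send, delay=5.0, sleep=time.sleep):
    """Delete each old URL, re-add the cleaned one, and record both status codes.

    `send(method, params)` is a hypothetical callable that performs the HTTP
    request and returns its status code.
    """
    results = []
    for row in rows:
        deleted = send("posts/delete", {"url": row["href"]})
        added = send("posts/add", {
            "url": row["cleanhref"],
            "description": row["description"],
            "dt": row["time"],
        })
        results.append({"href": row["href"], "deleted": deleted, "added": added})
        sleep(delay)  # pause between bookmarks to stay clear of rate limits
    return results


# A stand-in for the real API so the sketch runs offline.
calls = []


def fake_send(method, params):
    calls.append((method, params["url"]))
    return 200


rows = [{
    "href": "https://medium.com/@x/post-abc#.123456789",
    "cleanhref": "https://medium.com/@x/post-abc",
    "description": "A post",
    "time": "2016-10-11T06:15:48Z",
}]
results = replace_bookmarks(rows, fake_send, delay=0, sleep=lambda seconds: None)
```

Keeping both status codes per row matters because a successful deletion followed by a failed addition would otherwise lose a bookmark silently.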

The full code is available as a gist.

  1. The dataframe created using the hostname extraction function has the same number of rows as the one created with a simple grep of “medium.com”, which means it probably wouldn’t have been a problem to stick with the earlier solution. The second solution is still a lot better. 

  2. … which is a link that must hold the record for the number of times I’ve linked to it from this site. 

Solving my read later problem

Attention conservation notice: A post on writing a small technical hack to improve what ideally I could do without needing a hack.

I do most of my learning by reading articles, guides, and blog posts online, and I manage this using Pinboard.1 All the links I’ve read or want to read in the future live there.

Problem

My read later list was growing a lot faster than I could go through it.

I rarely felt like going to my account to choose an article to read. When I did, I faced choice paralysis. I would scan the links and not feel like starting any of them. The problem was static friction.

One way I tried to solve this was to use Pinboard’s “random article” bookmarklet, which opened a randomly chosen unread link from your account. This worked to an extent, but I would sometimes land at an article that needed more time or attention than I had and I would click the bookmarklet again. Once you start making exceptions and spinning again, it becomes easy to do what is effectively scanning many articles before actually reading one.

I realized what I wanted was somewhere in between: I want to see some options that were randomly chosen.

Solution

My solution is punread which is built on top of BitBar.

BitBar (by Mat Ryer - @matryer) lets you put the output from any script/program in your Mac OS X Menu Bar.

Go to the link to see some screenshots and examples. The idea is that you write a script that produces an output and tell BitBar how often you want it run. There’s a lot of syntax available for you to control the output, how it looks, what happens when you click it, etc.

punread shows the number of unread bookmarks in my menu bar, and when I click on the number, I see 30 randomly chosen links. I can click on one, read it in the browser, and then mark it as read using another one of Pinboard’s bookmarklets.

punread is two files. The first is punread.30m.sh, which is the shell script BitBar wants to have:

#!/bin/bash
# <bitbar.title>punread</bitbar.title>
# <bitbar.version>v1.0</bitbar.version>
# <bitbar.author>Sherif Soliman</bitbar.author>
# <bitbar.author.github>sheriferson</bitbar.author.github>
# <bitbar.desc>Show pinboard unread count</bitbar.desc>
# <bitbar.dependencies>python</bitbar.dependencies>
# <bitbar.abouturl>https://github.com/sheriferson/punread</bitbar.abouturl>

links=$(/usr/local/bin/python3 /Users/sherif/projects/punread/punread.py)
echo "$links"

echo "---"
echo "📌 Random article | href=https://pinboard.in/random/?type=unread"

It doesn’t do much. It runs the second file, punread.py, and shows its output. It also tacks on a final menu item that will show me a random unread article in case I didn’t like any of the 30 already listed. I don’t think I’ve ever used that option.

The second file is punread.py, which does most of the work. It talks to the Pinboard API, saves some state, and returns the 30 links for BitBar to display.

import json
import os.path
import pickle
import random
import re
import requests
import sys
import time

# get the path to punread.py
pathToMe = os.path.realpath(__file__)
pathToMe = os.path.split(pathToMe)[0]

last_updated_path = os.path.join(pathToMe, 'lastupdated.timestamp')
unread_count_path = os.path.join(pathToMe, 'unread.count')
links_path = os.path.join(pathToMe, 'links')
api_token_path = os.path.join(pathToMe, 'api_token')
last_run_path = os.path.join(pathToMe, 'lastrun.timestamp')

backup_file = '/Users/sherif/persanalytics/data/unread_pinboard_counts.csv'

def print_random_unread_links(count, unread, n = 30):
    count = str(count) + ' | font=SourceSansPro-Regular color=cadetblue\n---\n'
    sys.stdout.buffer.write(count.encode('utf-8'))
    # sample from every index (starting at 0) and never ask for more links than exist
    random_unread_indexes = random.sample(range(len(unread)), min(n, len(unread)))
    for ii in random_unread_indexes:
        description = unread[ii]['description']
        # swap the ASCII pipe for FULLWIDTH VERTICAL LINE (U+FF5C) so BitBar doesn't parse it
        description = description.replace("|", "｜")
        link_entry = '📍 ' + description + " | href=" + unread[ii]['href'] + " font=SourceSansPro-Regular color=cadetblue\n"
        sys.stdout.buffer.write(link_entry.encode('utf-8'))

def log_counts(total_count, unread_count):
   """
   A function to write the time, total bookmark count, and unread bookmark count
   to a csv file.
   """
   now = int(time.time()) 
   row = str(now) + ',' + str(total_count) + ',' + str(unread_count) + '\n'

   with open(backup_file, 'a') as bfile:
       bfile.write(row)

# check if there's a lastrun.timestamp, and if it's there
# check if the script ran less than 5 mins ago
# if yes, quit
if os.path.isfile(last_run_path):
    last_run = pickle.load(open(last_run_path, 'rb'))
    if time.time() - last_run < 300:
        unread_count = pickle.load(open(unread_count_path, 'rb'))
        links = pickle.load(open(links_path, 'rb'))
        unread = [link for link in links if (link['toread'] == 'yes')]
        print_random_unread_links(unread_count, unread)
        sys.exit()
    else:
        pickle.dump(time.time(), open(last_run_path, 'wb'))
else:
    pickle.dump(time.time(), open(last_run_path, 'wb'))

with open(api_token_path, 'rb') as f:
    pintoken = f.read().strip()

par = {'auth_token': pintoken, 'format': 'json'}

if os.path.isfile(last_updated_path) and os.path.isfile(unread_count_path):
    last_updated = pickle.load(open(last_updated_path, 'rb'))
    unread_count = pickle.load(open(unread_count_path, 'rb'))
    links = pickle.load(open(links_path, 'rb'))
else:
    last_updated = ''
    unread_count = 0

last_updated_api_request = requests.get('https://api.pinboard.in/v1/posts/update',
        params = par)

last_updated_api = last_updated_api_request.json()['update_time']

if last_updated != last_updated_api:
    r = requests.get('https://api.pinboard.in/v1/posts/all',
            params = par)

    links = json.loads(r.text)

    unread = [link for link in links if (link['toread'] == 'yes')]
    total_count = len(links)
    unread_count = len(unread)

    pickle.dump(last_updated_api, open(last_updated_path, 'wb'))
    pickle.dump(unread_count, open(unread_count_path, 'wb'))
    pickle.dump(links, open(links_path, 'wb'))

    log_counts(total_count, unread_count)
    print_random_unread_links(unread_count, unread)
else:
    unread = [link for link in links if (link['toread'] == 'yes')]
    print_random_unread_links(unread_count, unread)

There are too many lines of code for me to walk through this step by step, but I’ll paint a general picture.

Some notes and things I had to keep in mind while writing the script:

  • The Pinboard API has rate limits. I can’t hit the posts/all method more than once every five minutes.
  • Pinboard recommends you use the API token to authenticate, rather than regular HTTP auth. I keep my API token in a file that I added to .gitignore so I don’t accidentally publish it somewhere.
  • I wanted to keep track of the total number of bookmarks and unread bookmarks over time (see below).
  • I wanted to minimize the number of times I used the posts/all method. The Pinboard API makes this easy: the posts/update returns the timestamp of the last update to any of your bookmarks. My script saves the last value returned by this method, and if the next time it runs it gets the same value, it never tries to use posts/all.
  • The thing I struggled with, by far, was string output. If you see some ‘squirrely’ things like sys.stdout.buffer.write(count.encode('utf-8')) and wonder why I don’t just print(), it’s because I ran into a lot of trouble with Python3’s string encoding and BitBar’s understanding or lack thereof of what I was giving it. It took me a long time to arrive at this solution.
  • You might also notice description = description.replace("|", "｜"). The pipe character is the one character I had to avoid in my output, as it has special meaning to Unix and BitBar. The code is replacing the classic pipe character “|” with what is officially called “FULLWIDTH VERTICAL LINE”.2 It maintains the appearance of pipes in article titles without tripping BitBar up.
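Two of the notes above are easier to see in isolation, so here is a minimal sketch of each. The function names should_fetch_all and bitbar_line are labels I made up for this illustration, and the title and URL are hypothetical. The first function is the timestamp comparison that decides whether posts/all is worth calling; the second builds a single menu line with the pipe swap applied.

```python
FULLWIDTH_PIPE = "\uff5c"  # FULLWIDTH VERTICAL LINE


def should_fetch_all(saved_update_time, api_update_time):
    """Only call posts/all when posts/update reports a new timestamp."""
    return saved_update_time != api_update_time


def bitbar_line(title, href):
    """Build one BitBar menu line, keeping literal pipes out of the title."""
    safe_title = title.replace("|", FULLWIDTH_PIPE)
    return safe_title + " | href=" + href


# Hypothetical title and URL, for illustration only.
line = bitbar_line("Vim | Tips and tricks", "https://example.com/vim")
```

After the swap, the only ASCII pipe left in the line is the one that separates the title from its parameters, which is the one BitBar should act on.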

Results

This was a fun project, and I think it achieved what I wanted from it. I’ve put a serious dent into the number of unread links since I started using punread.

I’m not a big fan of seeing a lot of metrics. I disable most red iOS notification bubbles. But the reason I do that is exactly why I think punread works for me: I haven’t trained myself to see and ignore a lot of numbers. I see punread’s unread count in the menu bar, and I stick to a plan of not letting it climb a lot over time.

This wouldn’t be a Take no one’s word for it post if it didn’t have a plot or two.

and a zeroed out y-axis for the fundamentalists

The rapid buildup of unread links led me to raise my threshold of what’s good or relevant enough for me to read, and I’ve been deleting any articles that failed to reach that threshold. We can see that in this plot which marks deletions with red points and corresponding labels.

I couldn’t get the text labels to work without it being a mess, so here’s a version without the labels.

and the useless zeroed y-axis version

I know the plots are not beautiful.

Each red point is a measurement that was lower than the one before it, with a total of 110 deleted articles. This way of measurement can miss some deletions if between time t and time t+1 I deleted an article and added a new article; in that instance the measurement would not register a change. I’m aware of at least one case of that happening. It doesn’t make a big difference, but it’s good to be aware of when your measurement has faults or blind spots.
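The counting rule and its blind spot can be sketched in a few lines of Python. The function name count_drops and the series of numbers are invented for illustration; they are not my logged data.

```python
def count_drops(counts):
    """Return (number of drops, total decrease) across consecutive measurements."""
    drops = 0
    total_decrease = 0
    for previous, current in zip(counts, counts[1:]):
        if current < previous:
            drops += 1
            total_decrease += previous - current
    return drops, total_decrease


# Made-up series: 100 -> 98 (two removed), 98 -> 98 (one removed and one
# added between measurements, so nothing registers), 98 -> 97 (one removed).
series = [100, 98, 98, 97]
drops, total_decrease = count_drops(series)
```

In this made-up series four articles were removed, but the measurements only register two drops with a total decrease of three, because the removal in the middle was cancelled out by an addition.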

I’m sure that in addition to punread helping me, I was also motivated by the idea of using software that I wrote for myself, and by wanting to see that number and line plot go down. Regardless of how the variables interact to produce the final result, I declare it a success.

  1. “a bookmarking website for introverted people in a hurry” 

  2. Unicode: U+FF5C, UTF-8: EF BD 9C 

Archive