I was working on a project yesterday where I needed to amortize a bunch of loans to calculate the total interest a borrower would pay if he or she made the minimum monthly payment for the full term of the loan. I couldn’t find any package in R that already contained the necessary math, so I looked around and found this post as well as this one. They both presented the R code to do the basic math involved in amortization, but each function was built to handle only one loan at a time. I had well over 100,000 loans to go through, and loops aren’t implemented very efficiently in R.
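For reference, the underlying math is just the standard fixed-rate annuity formula, and it vectorizes naturally in R, so no loop is needed. Here's a minimal sketch (the function and column names are my own, not from either of the posts linked above):

```r
# Total interest over the full term of a fixed-rate, fully amortizing
# loan, assuming the borrower makes only the minimum monthly payment.
total_interest <- function(principal, annual_rate, term_months) {
  r <- annual_rate / 12                           # monthly interest rate
  payment <- principal * r / (1 - (1 + r)^(-term_months))
  payment * term_months - principal               # total paid minus principal
}

# Because every operation above is vectorized, one call handles an
# entire portfolio of loans at once; no per-loan loop required.
loans <- data.frame(principal   = c(10000, 250000),
                    annual_rate = c(0.06, 0.045),
                    term_months = c(60, 360))
loans$interest <- with(loans, total_interest(principal, annual_rate, term_months))
```

With 100,000 rows in `loans`, the same single call works unchanged, which is the whole advantage over the one-loan-at-a-time functions.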
I have mixed feelings about my Ph.D. in anthropology. I often suspect I wasted a lot of time and money getting that degree. I studiously avoid using most of the theory I picked up in my formal education, I use methods that many (of course not all) anthropologists seem to view as quite un-anthropological, and anthropology as an academic discipline sometimes seems bent on making a poor name for itself among the general population.
Reading a news article about politicians mocking the idea that anthropological training is an employable skill, my initial reaction was “well, take out the snotty attitude and they may have some fair points.” Most of the reactions I saw among other anthropologists, however, seemed to mirror this response (from a LinkedIn thread about the article):
I wrote a couple days ago about importing Excel files into R. There are lots of ways to do this, but all the ways that use only R have drawbacks (as I outlined in my last post), and all the other ways require installing programs other than R. I’m not opposed to using programs other than R – it’s easy enough to weave, for example, Python and R code into each other. But I’d become curious about the possibility of solving this problem without the need for added programs, so I did some more searching. Turns out you can import an .xlsx document into pretty much anything that can parse XML, because that’s all an .xlsx document is: a zipped collection of XML files.
I’ve realized recently how the last few years have changed the way I think about my work. This post is an attempt to put that thinking into writing.
I left grad school feeling I wanted to do more “applied” work than what academia usually offers, but I still assumed that application was a matter of doing an analysis and then letting people who make decisions consume and implement the lessons of that analysis. I created a lot of those for-application sorts of analyses for the U.S. Department of the Army, but left feeling I wanted to be part of the decision making process rather than just producing fodder for it. My current employer gave me the opportunity to work interactively with decision makers to clarify their goals and adapt my analyses to their needs, and also to be somewhat involved in the implementation side of things. So my career has followed a path of closer and closer integration of my analytic work with the decisions and implementation that my work is supposed to facilitate. I think that process has helped me better define how I think about “application.”
I suppose most companies use the Microsoft Office suite of programs, and my office is no exception. It’s easy to import data from an API or a database into R, but importing data from an Excel workbook is a different story. There are a few R packages for reading Excel files, but I’ve had problems with all of them:
- `read.xlsx` (`gdata` package): pretty convenient to run in R, but requires Perl, which for some reason I have a hard time installing on my Windows machine…that might just be an issue with me, not the machine.
- `odbcConnectExcel2007` (`RODBC` package): from what I’ve seen on the listservs, this one has a hard time reading xlsx files because of a driver mismatch – you have to access the files through 32-bit R, which is annoying.
- `readWorksheetFromFile` (`XLConnect` package): uses Java, is easy to install, and has tons of functionality for writing in addition to reading, but I don’t really need the write functionality, and XLConnect is very slow, especially on large files.
So I set off in search of a faster way to pull information out of an Excel file. The gist below shows what I came up with. Excel already has Visual Basic capabilities built in. So I stole a little VB script from here and stuck it in a function that writes the script to a temporary file, calls the script from the command line, and then outputs the contents of the formerly-Excel file.
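The shape of that function, roughly: build the VBScript lines in R, write them to a temp file, run the file with `cscript`, and read back the CSV that Excel produces. This is my paraphrase of the approach rather than the gist itself, and the `cscript` step only works on a Windows machine with Excel installed:

```r
# Build the lines of a VBScript that asks Excel itself to save a
# workbook as CSV. (6 is Excel's xlCSV file-format constant.)
make_vbs <- function(xlsx, csv) {
  c('Set xl = CreateObject("Excel.Application")',
    'xl.DisplayAlerts = False',
    sprintf('Set wb = xl.Workbooks.Open("%s")',
            normalizePath(xlsx, mustWork = FALSE)),
    sprintf('wb.SaveAs "%s", 6', csv),
    'wb.Close False',
    'xl.Quit')
}

# Write the script to a temp file, run it, and read the resulting CSV.
xlsx_to_csv <- function(xlsx, csv = tempfile(fileext = ".csv")) {
  script <- tempfile(fileext = ".vbs")
  writeLines(make_vbs(xlsx, csv), script)
  system2("cscript", c("//Nologo", script))   # Windows only
  read.csv(csv, stringsAsFactors = FALSE)
}
```

Since the conversion is handled by Excel's own engine, there's no driver mismatch or Java overhead to worry about; the cost is that the function is tied to Windows.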
Go here to see the interactive maps and indices.
According to the World Tourism Organization, there were over 1 billion international tourists in 2012. Many, but not all, of those travellers began their trips by acquiring a tourist visa. So, to a significant degree, tourist visas determine where you can and cannot easily travel around the world. About a month ago, a friend and classmate of mine was planning to go on a Kennedy School trip to Israel and Palestine. She got her tickets and applied for her visa to Israel and was all set to go. But when she arrived at Logan Airport and tried to check in to her flight, she was told she wouldn’t be allowed on the flight. The flight was connecting through Canada, and she’s a citizen of India and didn’t have a tourist visa for Canada. Who knew you couldn’t land at a Canadian airport without a visa if you have an Indian passport!?
The privilege to travel internationally is an awesome one, and one that can be easy for citizens of wealthier countries to take for granted. So we decided to see if we could find some data to explore differences in requirements for tourist visas across countries. There’s an index of economic freedom, an index of corruption perceptions, and lots of others as well, so why not an index of tourist inequality?
You can see the interactive maps and indices here.
Stories of recent fraudulent science seem uncomfortably common. In many of those cases the scientists are blamed, and rightly so. Sometimes the criticism identifies more systemic problems: current scientific practice, scientific institutions like the NSF or universities, or academia in general. Blame is also often laid on pop science and the popular science writers who try to tell a counterintuitive and interesting story, or who are under pressure to write frequently and on deadline.
All of these are valid targets of criticism. But not enough attention is paid to the scientifically-inclined and interested public who are supposedly the victims of the fraudulent findings and stories. Given the incredible specialization of contemporary science, everyone is an untrained and naive reader in some, if not most, areas of science. But despite technical ignorance, the activity of reading about science can be done usefully and well, and doing it well means maintaining a general attitude of scientific doubt.
If the talk about a shortage of faculty positions is dispiriting, articles like this are energizing. Data science has emerged as a hopeful and interesting alternative to academic social science. But one of the biggest drawbacks has to be that many data science positions are shaped so exclusively by computer science, engineering, or some other area of science that isn’t primarily social. Those areas of work are great, integral and critical, but the result of the skew is that descriptions of “data science” can lose sight of the real human behavior and social phenomena behind the data being analyzed.
I finally had the chance to catch up on my reading this morning, and at the top of the list was this “We Aren’t the World” article. As Schaun pointed out in his last post, the basic narrative behind the piece (and a lot of the discussion around Henrich’s work) is that science is moving away from the view that humans have more or less universal cognitive faculties. This old view assumed everyone would respond similarly to basic stimuli. But then Henrich and others came along and showed that people respond differently to those stimuli. So now we know that cognition itself is shaped by “culture, environment, etc.”