Thinking. Really. Hard.

Blogging about my research and others', with the occasional tutorial or technology-inspired rant.

Everyone should have an elevator pitch about their research. When we start out, that pitch is very focused on a particular project, problem, or technique. As time passes, we form a more expansive research vision that can encompass new domains, related problems, and complementary techniques, but still ties everything together in a coherent philosophy, with a long-term goal. My pitch is: programming languages (PL) and software engineering (SE) form the methodological foundation for next-generation advancements in data-driven scientific inquiry. Just as scientific instruments made new experiments and discoveries possible, so too will new programming languages, systems, frameworks, and platforms.

Therefore, my approach is to work on the fundamental tooling that allows us to answer a variety of research questions. I’m guided by the following questions:[1]

Abstractions

Throughout my research career, I have been interested in developing domain-specific languages (DSLs) for tasks when appropriate. Some common DSLs that people may know are SQL, CSS, and even spreadsheets like Excel! What differentiates DSLs from so-called “general-purpose programming languages” is that their design is tailored to a specific task: they make certain things much easier to do than others. We refer to the set of things that are easy to do as the base set of abstractions, and the language restricts or enables how those abstractions may combine.
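To make "base abstractions" and "rules for combining them" concrete, here is a minimal sketch of a toy embedded DSL for filtering rows, written in Python. Every name in it (`Pred`, `field_equals`, `field_greater`, `select`) is hypothetical and invented for this illustration; it is not the API of SurveyMan or of any other real system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class Pred:
    """A predicate over rows: the only kind of program this toy DSL can express."""
    test: Callable[[Row], bool]
    text: str

    # Combination rules: predicates combine only via AND, OR, and NOT.
    def __and__(self, other: "Pred") -> "Pred":
        return Pred(lambda r: self.test(r) and other.test(r),
                    f"({self.text} AND {other.text})")

    def __or__(self, other: "Pred") -> "Pred":
        return Pred(lambda r: self.test(r) or other.test(r),
                    f"({self.text} OR {other.text})")

    def __invert__(self) -> "Pred":
        return Pred(lambda r: not self.test(r), f"(NOT {self.text})")


# Base abstractions: comparisons against a single named field.
def field_equals(name: str, value: Any) -> Pred:
    return Pred(lambda r: r.get(name) == value, f"{name} = {value!r}")


def field_greater(name: str, value: Any) -> Pred:
    return Pred(lambda r: name in r and r[name] > value, f"{name} > {value!r}")


def select(rows: List[Row], pred: Pred) -> List[Row]:
    """Keep the rows that satisfy the predicate."""
    return [r for r in rows if pred.test(r)]


# Illustrative rows only.
languages = [
    {"name": "SQL", "year": 1974},
    {"name": "CSS", "year": 1996},
    {"name": "Excel", "year": 1985},
]
query = field_greater("year", 1980) & ~field_equals("name", "CSS")
print(query.text)
print([r["name"] for r in select(languages, query)])
```

The restriction is the point of the design: because a query can only be built from field comparisons joined by AND, OR, and NOT, it is easy to print, inspect, and reason about, which would not be true of an arbitrary Python function.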

A critical feature of language design is recognizing whether a new language is germane to the problem in the first place! Not all problems or domains necessitate a new language. Instead, in my research, I work to understand the problem or domain first, and then identify and encode (typically in consultation with a domain expert) language abstractions and rules for combining those abstractions. Sometimes this process of formalization can reveal interesting properties about the domain not previously considered. Sometimes these properties are a byproduct of moving a previously manual or offline task to an encoded, automated one.

Identifying Abstractions: SurveyMan columns

For an example of this approach, see my SurveyMan work.

While the SurveyMan project has been on hold for several years, I am looking to work with students on some related work/research questions in both programming languages (specifically related to blocks languages) and computational social science (specifically related to crowdsourcing). I am particularly interested in students who would like to address security questions (related to adversarial behavior during data collection) and privacy questions (related to protecting participants’ information). Please reach out if you are interested!

Language Design and System Building

Once we have identified the core set of abstractions and how they combine, my work usually involves implementing them in an actual system! This step of the research often involves some iteration with the abstraction design process, and is coding-heavy. One of the concrete benefits to students who work with me is that they have the opportunity to hone their programming skills while also doing research.[2] For larger systems, the software architecture can become a research contribution, especially if that software architecture generalizes to related problems.

Generalizing Software Architecture for Explanation

This is an example of ongoing work on the software architecture for explanation systems for autonomous agents.

This ongoing work includes opportunities for students interested in probabilistic programming languages. Please reach out if you are interested!

Correctness

The advantage of encoding abstractions is that they facilitate proving properties about programs. We can use the base abstractions and the rules for how they combine to show inductively that properties of interest hold. Traditionally this has meant devising a type system, which categorizes pieces of programs into sets, and showing that the system is sound, according to some predefined notion of “soundness.” Students familiar with types in programming languages may recognize type safety as a kind of soundness, where the procedure that categorizes these pieces of programs never “gets stuck” and always arrives at the same answer.
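As a rough illustration of how typing rules follow the inductive structure of a language, here is a minimal sketch of a type checker and an evaluator for a toy expression language. The language itself (`Num`, `Bool`, `Add`, `If`) and the names `type_of`, `evaluate`, `IllTyped`, and `Stuck` are all assumptions made up for this example. Running it on a few programs only demonstrates the idea; an actual soundness result would be a proof by induction over the rules, not a test.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Bool:
    value: bool


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    other: "Expr"


Expr = Union[Num, Bool, Add, If]


class IllTyped(Exception):
    """Raised by the checker when no typing rule applies."""


class Stuck(Exception):
    """Raised by the evaluator when no evaluation rule applies."""


def type_of(e: Expr) -> str:
    """One case per syntactic form; compound forms recurse on their parts."""
    if isinstance(e, Num):
        return "int"
    if isinstance(e, Bool):
        return "bool"
    if isinstance(e, Add):
        if type_of(e.left) == "int" and type_of(e.right) == "int":
            return "int"
        raise IllTyped("Add needs two ints")
    if isinstance(e, If):
        if type_of(e.cond) != "bool":
            raise IllTyped("If condition must be a bool")
        then_type = type_of(e.then)
        if then_type != type_of(e.other):
            raise IllTyped("If branches must have the same type")
        return then_type
    raise IllTyped(f"unknown expression: {e!r}")


def evaluate(e: Expr) -> Union[int, bool]:
    if isinstance(e, (Num, Bool)):
        return e.value
    if isinstance(e, Add):
        left, right = evaluate(e.left), evaluate(e.right)
        # Python's bool is a subclass of int, so compare exact types.
        if type(left) is not int or type(right) is not int:
            raise Stuck("cannot add non-integers")
        return left + right
    if isinstance(e, If):
        cond = evaluate(e.cond)
        if type(cond) is not bool:
            raise Stuck("cannot branch on a non-boolean")
        return evaluate(e.then if cond else e.other)
    raise Stuck(f"unknown expression: {e!r}")


good = If(Bool(True), Add(Num(1), Num(2)), Num(0))
bad = Add(Num(1), Bool(True))
print(type_of(good), evaluate(good))
```

In this sketch, `good` is accepted by the checker and evaluates without getting stuck, while `bad` is rejected by `type_of` before it ever runs; calling `evaluate(bad)` directly would raise `Stuck`.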

There are many properties that can be proven about programs other than type safety. For example, static taint analysis uses information flow to prove that private data does not leak, while typestate provides the abstractions necessary to reason about interactions between a program and the persistent mutable state of the executing machine (e.g., a file is properly closed). Both of these examples are about properties not normally associated with type inference, but address the correctness of programs nonetheless.
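To give a flavor of the taint-analysis example, here is a deliberately simplified sketch that tracks taint through a straight-line sequence of assignments. The program representation and the names `tainted_variables` and `leaks` are hypothetical, and the analysis assumes there are no branches, loops, function calls, or implicit flows, all of which a real static taint analysis would have to handle.

```python
from typing import Iterable, List, Set, Tuple

# One assignment: (target variable, variables read on the right-hand side).
Assign = Tuple[str, Tuple[str, ...]]


def tainted_variables(program: List[Assign], sources: Iterable[str]) -> Set[str]:
    """Return the variables that may hold private data after the program runs."""
    tainted = set(sources)
    for target, reads in program:
        if any(v in tainted for v in reads):
            tainted.add(target)      # data flows from a tainted variable
        else:
            tainted.discard(target)  # overwritten with untainted data
    return tainted


def leaks(program: List[Assign], sources: Iterable[str],
          sinks: Iterable[str]) -> List[str]:
    """Return the sink variables that may receive private data."""
    return sorted(tainted_variables(program, sources) & set(sinks))


program = [
    ("name", ("ssn",)),        # name := f(ssn)
    ("greeting", ("name",)),   # greeting := g(name)
    ("count", ("total",)),     # count := h(total)
    ("name", ("default",)),    # name := default (overwrites tainted value)
]
print(leaks(program, sources={"ssn"}, sinks=["greeting", "count", "name"]))
```

Here `greeting` is reported because it was computed from `name` while `name` still depended on `ssn`; `name` itself is not reported at the end because it was later overwritten with untainted data.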

The PLAID Lab has expertise in proving properties of programs, in the domains of type safety, security, privacy, and fairness. My future collaborations will reflect the expertise present here at UVM.

People-Centric Evaluation

During the design phase, I work with domain experts to understand what tools potential users prefer. While this process of including users in the design loop has been informal, I look forward to working with collaborators and students who are interested in taking a more human-factors approach to the evaluation of new systems. I am particularly interested in better understanding the usability of the languages, tools, and systems I design.

Output of the PlanAlyzer System

For example, the output of my PlanAlyzer static analysis tool is either the above “pretty-printed” text or CSV files that can be loaded into a database. The above notation uses the conventions of experimentalists, but this output does not scale, and may not be understood by non-experts.

I am particularly interested in collaborating with folks who study, or are interested in studying, how to better present large volumes of information to users, especially when users must make decisions in the presence of these data.

Applications

Finally, since I am interested in designing better platforms for facilitating basic research, I enjoy working with folks who have interest or expertise in specific application areas. In a way, my collaborators could be seen as “clients” of my software, except that there is the goal of discovering novel problems at the intersection of software and their domains, rather than simply delivering solutions. This means that my collaborations are cooperative.

  1. This very question-first approach to research is deeply influenced by one of my PhD supervisors, Dr. David Jensen, who has always been an exemplar of how to view computing as science.

  2. Often students expect research to involve more programming; while it can, students may go for long periods of time without practicing. This can become a problem if, years into a PhD program, you realize you need to write software to do your research, but lack the experience or skills to do it yourself.