R in a big data pipeline
Download slides |
Yuki showed how to integrate R with luigi into a heterogeneous workflow of different applications. This is especially useful when R needs to be integrated with Hadoop/HDFS-based technologies, such as Spark and Hive. Luigi is not unlike Make, which Kirill presented at our last meeting in June. In a configuration file Yuki specified the various workflow steps and the dependencies between the jobs. Kicking off the luigi script starts the workflow, and the luigid server allows Yuki to monitor the various parts of the dependency graph visually. Thus, he can see the progress of his workflow in real time and quickly identify when and where a subprocess fails. As Yuki pointed out, this becomes critical in production systems, where failures need to be known and fixed quickly, unlike when one carries out an exploratory analysis in a development/research environment. See also Yuki's blog post for further details.
Shiny + Shinyjs
Download presentation files |
Paul presented shinyjs, a package written by Dean Attali. The name already suggests that the package provides additional JavaScript functionality. Indeed it does, but without the need to learn JavaScript, as those functions are wrapped into R. Paul showed us an example of a Shiny app that plotted a different graph depending on the user. Behind the scenes, his script would either hide or show those plots, conditional on the user. With only a few lines of R it allowed him to develop a user-specific application. To achieve this he created a login screen that checks for user name and password. In his example he had hard-coded the login credentials, instead of using a secure connection via a professional Shiny Server instance. However, this was sufficient for his purpose, where he tests how students react to different economic scenarios in a lab environment at university.
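The hide/show idea can be sketched in a few lines. This is only a minimal illustration and not Paul's actual app: the user names, the input and output ids and the plots are made up, and a simple drop-down stands in for his login screen. It assumes the shiny and shinyjs packages are installed.

```r
library(shiny)
library(shinyjs)

ui <- fluidPage(
  useShinyjs(),  # must be called once in the UI to enable shinyjs functions
  # Hypothetical stand-in for a login screen: pick the current user
  selectInput("user", "User", choices = c("alice", "bob")),
  plotOutput("hist_plot"),
  plotOutput("scatter_plot")
)

server <- function(input, output, session) {
  output$hist_plot    <- renderPlot(hist(faithful$eruptions))
  output$scatter_plot <- renderPlot(plot(faithful))

  # Show each plot only for "its" user and hide it otherwise
  observe({
    toggle("hist_plot",    condition = input$user == "alice")
    toggle("scatter_plot", condition = input$user == "bob")
  })
}

# Build the app object; start it in an interactive session with runApp(app)
app <- shinyApp(ui, server)
```

Here `toggle()` with a `condition` argument shows the element when the condition is true and hides it otherwise, so no JavaScript has to be written by hand.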
Experience vs. Data
Download slides |
I presented some Bayesian ideas for analysing risks with little data. I used the wonderful "Hit and run accident" example from Daniel Kahneman's book Thinking, Fast and Slow to explain Bayes' formula, introduced Bayesian belief networks for a claims analysis and discussed the challenge of predicting events when they haven't happened yet (also in Stan). Along the way I mentioned a few ideas on communicating risk, which I learned from David Spiegelhalter earlier this year.
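As a minimal sketch of Bayes' formula for this kind of problem, the snippet below works through the hit-and-run example. The numbers are the commonly quoted figures for the cab problem (15% of cabs are Blue, 85% are Green, and the witness who says "Blue" is right 80% of the time); they are an assumption here and are not taken from the slides. The function name `posterior` is made up for illustration.

```r
# Bayes' formula for a yes/no hypothesis H given evidence E:
#   P(H | E) = P(E | H) * P(H) / P(E)
posterior <- function(prior, p_e_given_h, p_e_given_not_h) {
  # Total probability of seeing the evidence at all
  evidence <- p_e_given_h * prior + p_e_given_not_h * (1 - prior)
  p_e_given_h * prior / evidence
}

# H: the cab was Blue; E: the witness says "Blue"
p_blue <- posterior(prior = 0.15,           # share of Blue cabs
                    p_e_given_h = 0.80,     # witness correct
                    p_e_given_not_h = 0.20) # witness mistaken
round(p_blue, 2)  # 0.41
```

Despite the fairly reliable witness, the low base rate of Blue cabs keeps the posterior probability below one half.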
Next Kölner R meeting
The next meeting will be scheduled in December. Details will be published on our Meetup site. Thanks again to Revolution Analytics/Microsoft for their sponsorship. Please get in touch if you would like to present at the next meeting.
0 Response to "Notes from the Kölner R meeting, 18 September 2015"