The Cycle of Deviant Behavior by Howard B. Kaplan and Glen C. Tolle, Jr., published by Springer, 2006

I came across this book on a pertinent topic. It is an excellent example of how to conduct a research study. It is a longitudinal prospective study, the gold standard for this type of research, rather than a retrospective one like so many studies. The book has a thorough literature review and well-documented handling of missing values, and it uses structural equation modeling. It also provides a template for the systematic logic of procedures for analyzing bivariate relationships.

**http://shop.oreilly.com/product/9780123970336.do**

Measuring Data Quality for Ongoing Improvement by Laura Sebastian-Coleman.

This is the kind of book I like finding and recommending. The book uses the DQAF, the Data Quality Assessment Framework, to bring quality control to data management. The first section thoroughly covers definitions. It then explains measurement and continues on to the how of data quality. It is a useful reference book, and I know I will use it for upcoming writing and talks.

Handbook of Partial Least Squares: Concepts, Methods and Applications. Edited by V. Esposito Vinzi, W. W. Chin, J. Henseler and H. Wang. Published by Springer.

This handbook collects 33 papers selected through three rounds of peer review. There is a lot of very good material in this book. I wish I had delved into it earlier.

Chapter 28, a paper on How to Write Up and Report PLS Analyses, discusses sample size and goes into detail about how and why you can use a smaller sample size with PLS (partial least squares). That alone is reason enough to read this book. Add the tables and examples, and there is enough material to keep me busy reading the next time I need to do a questionnaire.

Smoothing Spline ANOVA Models by Chong Gu, published by Springer

I am intrigued by this book. Splines were my favorite thing in graduate school, and I have built a lot of ANOVA models. It is fun to see what I once tried to do finally achieved.

Smoothing Spline ANOVA Models uses R as its programming language. It is great to see both in-depth proofs and R code in one book.

Chapter 3.3 shows how to draw Bayesian confidence intervals in R.

Chapter 3.10.1 discusses the difference between natural splines and B-splines, noting that B-splines have different boundary conditions.
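For reference, here is the standard textbook definition of the natural boundary condition, written in LaTeX; it is general background, not a quotation from the book. A natural cubic spline with knots $\xi_1 < \dots < \xi_K$ has zero second derivative at the boundary knots and is linear beyond them, while a cubic B-spline basis spans all cubic splines on the knots and imposes no such condition. The second display is the classical fact linking this to smoothing splines.

```latex
% Natural cubic spline with boundary knots \xi_1 < \dots < \xi_K
f''(\xi_1) = f''(\xi_K) = 0,
\qquad f \text{ is linear on } (-\infty, \xi_1] \text{ and on } [\xi_K, \infty).

% For \lambda > 0, the minimizer of the penalized least-squares criterion
\frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
  + \lambda \int \bigl(f''(x)\bigr)^2 \, dx
% is a natural cubic spline with knots at the distinct values of x_i.
```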

There is code for fitting cubic splines with a jump, something you run into with real data.
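The book does this with penalized smoothing splines in R. As a rough illustration of the jump idea only, here is a minimal Python sketch that swaps in a simpler technique: an unpenalized cubic regression spline in truncated-power form plus a step column, assuming the jump location is known in advance. The function names, knots, and data are hypothetical, not the book's code.

```python
import numpy as np


def jump_spline_basis(x, knots, jump_at):
    """Cubic regression-spline basis (truncated power form) plus a step at jump_at."""
    x = np.asarray(x, dtype=float)
    # Global cubic polynomial: 1, x, x^2, x^3.
    cols = [np.ones_like(x), x, x**2, x**3]
    # One truncated cubic (x - k)_+^3 per interior knot; the fit stays
    # continuous with continuous first and second derivatives at each knot.
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    # Indicator column: lets the fitted curve jump by a free amount at jump_at.
    cols.append((x >= jump_at).astype(float))
    return np.column_stack(cols)


def fit_jump_spline(x, y, knots, jump_at):
    """Least-squares coefficients; the last entry is the estimated jump size."""
    X = jump_spline_basis(x, knots, jump_at)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 200)
    # Smooth signal with a jump of size 2 at x = 5, plus noise.
    y = np.sin(x) + 2.0 * (x >= 5.0) + rng.normal(0, 0.1, x.size)
    coef = fit_jump_spline(x, y, knots=[2.5, 5.0, 7.5], jump_at=5.0)
    print(f"estimated jump: {coef[-1]:.2f}")
```

The step column is what separates this from an ordinary regression spline: without it, every basis function is continuous, so the fit would have to smear the jump across neighboring knots.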

Chapter 8.63, about hazard functions and the Weibull family, has code for cubic spline Weibull regression with censored and truncated data.
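To show what censoring and truncation do to the likelihood, here is a minimal Python sketch of the plain parametric Weibull log-likelihood with right censoring and left truncation (delayed entry). The book's models go further by estimating the hazard with a spline; this sketch covers only the parametric Weibull case, and its function and argument names are hypothetical.

```python
import numpy as np


def weibull_loglik(shape, scale, time, event, entry=None):
    """Weibull log-likelihood with right censoring and left truncation.

    Hazard h(t) = (k/lam) * (t/lam)**(k-1) and cumulative hazard
    H(t) = (t/lam)**k, with k = shape and lam = scale.

    time  : exit times (event or censoring), assumed > 0
    event : 1 if the event was observed at `time`, 0 if right-censored
    entry : left-truncation (delayed entry) times; None means entry at 0
    """
    t = np.asarray(time, dtype=float)
    d = np.asarray(event, dtype=float)
    u = np.zeros_like(t) if entry is None else np.asarray(entry, dtype=float)

    def cum_hazard(s):
        return (s / scale) ** shape

    log_hazard = np.log(shape / scale) + (shape - 1) * np.log(t / scale)
    # Observed events contribute log h(t); every subject contributes -H(t)
    # (survival to t); truncation conditions on survival to u, adding back H(u).
    return float(np.sum(d * log_hazard - cum_hazard(t) + cum_hazard(u)))
```

With shape 1 the model reduces to the exponential distribution, which gives an easy hand check: for rate 1 and uncensored times 1 and 2, the log-likelihood is -3.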

I am enjoying reading this book. The code works and the examples are easy to understand.

Linear Mixed-Effects Models Using R by Andrzej Galecki and Tomasz Burzykowski, published by Springer, covers a lot of material on linear models in depth.

The book has clear instructions on how to program in R.

Chapter 4 covers model reduction by comparing a null model with an alternative model, where the null model is nested within the alternative. Model reduction is a topic coders need to discuss. I have talked with many people who put everything into a regression model just because they could.
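A minimal Python sketch of the nested-model comparison, assuming ordinary Gaussian linear models fitted by maximum likelihood and an alternative that adds exactly one column to the null design matrix (so the test has 1 degree of freedom). This is not the book's R code, and the function names are hypothetical. Under these assumptions the likelihood ratio statistic reduces to n log(RSS_null / RSS_alt).

```python
import math

import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def lr_test_one_term(X_null, X_alt, y):
    """Likelihood ratio test for nested Gaussian linear models differing by one term.

    Assumes the columns of X_null are also columns of X_alt (not checked here).
    Returns (statistic, p-value) using the chi-square(1) reference distribution.
    """
    if X_alt.shape[1] != X_null.shape[1] + 1:
        raise ValueError("this sketch only handles one extra term (df = 1)")
    n = len(y)
    # 2 * (loglik_alt - loglik_null) with ML variance estimates RSS / n.
    stat = n * math.log(rss(X_null, y) / rss(X_alt, y))
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of chi-square(1): P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 100
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # x2 has no real effect here
    ones = np.ones(n)
    print(lr_test_one_term(np.column_stack([ones, x1]),
                           np.column_stack([ones, x1, x2]), y))
```

A large p-value says the extra term does not earn its place, which is exactly the argument against putting everything into a model just because you can.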

Section 5.2 gives the proper form for model formulas:

R expression ~ term.1 + term.2 + … + term.k

It is nice to see this spelled out so clearly.
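When each term is a single numeric covariate, the formula corresponds to the familiar linear model below; R adds the intercept by default unless the formula contains `- 1` or `+ 0`. Factors and interactions expand into several columns, so this simple form is an assumption for illustration.

```latex
% y ~ term.1 + term.2 + ... + term.k, each term a numeric covariate x_j
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i,
\qquad i = 1, \dots, n
```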

Chapter 8 shows how to use the nlme package.

Part Three covers LMs (linear models) that relax the assumptions of independence and homogeneity of variance. This is a topic I needed information on.
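One common way to write the relaxation, shown here as general background in LaTeX (the book's notation may differ), replaces the classical error covariance with a diagonal matrix of relative standard deviations, which allows heterogeneous variance, sandwiching a correlation matrix, which allows dependent errors.

```latex
% Classical linear model: independent errors with a common variance
\boldsymbol{\varepsilon} \sim N\bigl(\mathbf{0},\ \sigma^2 \mathbf{I}_n\bigr)

% Relaxed: \Lambda diagonal (variance function), C a correlation matrix
\boldsymbol{\varepsilon} \sim N\bigl(\mathbf{0},\ \sigma^2 \boldsymbol{\Lambda}\,\mathbf{C}\,\boldsymbol{\Lambda}\bigr)
```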

This is a good reference book.

Machine Learning for Hackers gets you started using R for machine learning. The book does a good job telling you how to install R and where to find help.

All the code and data for this book are at https://github.com/johnmyleswhite/ML_for_Hackers.git

Sadly, the code and data are not available as an R package.

There are lots of examples of how to explore data using ggplot2. Other packages covered include plyr, which the authors liken to MapReduce; the tm package for text mining; glmnet and its lambda (regularization) parameter; and the class package for the k-nearest neighbors algorithm.
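To make the k-nearest neighbors idea concrete, here is a small Python sketch of majority-vote classification with Euclidean distance. It is not the class package's `knn()`: that function breaks voting ties at random, while this hypothetical `knn_predict` breaks them deterministically in favor of the closest neighbor.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    # Indices of training points sorted by Euclidean distance to `point`.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], point))
    nearest = [train_y[i] for i in order[:k]]
    counts = Counter(nearest)
    best = max(counts.values())
    # Deterministic tie-break: among labels with the top count, return the one
    # that appears first in the distance-sorted neighbor list.
    return next(label for label in nearest if counts[label] == best)


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(X, y, (0.2, 0.2), k=3))
```

Sorting all training points is fine for a sketch; larger data sets usually call for a spatial index such as a k-d tree.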

There is also good information on how to work with APIs and JSON using RCurl and RJSONIO, and with graphs using igraph.

This book is written for hackers, people who already know how to code. The theory, along with more detail on specific techniques and R code, is found in other books. This book is a good starting point for machine learning and R.

I took a break from trying to figure out how to get the data that goes along with the books I am reading to read a Springer book.

Fisher, Neyman, and the Creation of Classical Statistics by Erich Lehmann.

The book was a nice break. I enjoyed reading about the human traits of the founders of modern classical statistics. The author put a lot of work into finding and citing the writings of Fisher and Neyman.

I learned that Ronald Aylmer Fisher was a wrangler, a Cambridge student placing in the top class of the mathematics examinations. I have been puzzled by the term data wrangler, thinking about rodeos and the West. The top-student meaning makes more sense, although a lasso might come in handy when fetching data.

It was fun to read about Neyman's "Silver Jubilee of My Dispute with Fisher." Twenty-five years of arguments. Wow, that is a conflict.

The book ends with a discussion of the irony of Bayesian inference.

This is a well done book that I recommend reading. I also think that it would make a great graphic novel.

The point of this post is to remind myself to keep a list of the packages I am using. When I upgraded R, it didn't keep all my packages. At first I was puzzled and surprised. Then I figured out that upgrading into a new folder was part of the problem.

I am going to solve the problem by starting up my other computer, the MacBook, and comparing packages. I try to keep my Windows and Mac R environments the same.

Next time I upgrade, I am going to write down a list of packages.
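One way to make the comparison mechanical, sketched under assumptions: on each machine, write the installed package names to a text file from R with `writeLines(rownames(installed.packages()), "packages.txt")`, then diff the two files. The Python helpers below and any file names you pass them are hypothetical; duplicate names (the same package in two libraries) collapse into one entry.

```python
def read_packages(path):
    """Read one package name per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def compare_packages(first, second):
    """Return (only in first, only in second) as sorted lists."""
    return sorted(first - second), sorted(second - first)
```

For example, `compare_packages(read_packages("windows_packages.txt"), read_packages("mac_packages.txt"))` lists what each machine is missing relative to the other.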