Thursday, October 2, 2014

R Training: Chapter 1. Introduction to R


Chapter 1. Introduction to R
Saturday, September 27, 2014
R is, at its heart, an elegant and beautiful language, well tailored for data analysis and statistics. --- Hadley Wickham
For an introduction to the R language, I recommend reading the first chapter of R in Action and the introduction to Advanced R.

R Installation

Install R
Follow the instructions below to install and upgrade R on Windows.
First, download and install R and RStudio. After installation, run the following code to set up a global library.
chooseCRANmirror() # Choose XMU
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
Old.R.RunMe()

Upgrade R
Once this is done, upgrading to a new version of R in the future takes only two steps:
  1. Download and install the new version of R.
  2. Open the new R and run the following code:
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()

RStudio settings
Open RStudio and go to Tools --> Global Options --> Code Editing/Appearance to adjust the editor settings. See Customizing RStudio for details.
Most used RStudio keyboard shortcuts:

Description                                  Keyboard (Windows)
Clear console                                Ctrl+L
Interrupt currently executing command        Esc
Run current line/selection                   Ctrl+Enter
Run current document                         Ctrl+Alt+R
Find and Replace                             Ctrl+F
Find in Files                                Ctrl+Shift+F
Comment/uncomment current line/selection     Ctrl+Shift+C
Check Spelling                               F7
Undo                                         Ctrl+Z
Redo                                         Ctrl+Shift+Z
Delete Line                                  Ctrl+D
Indent                                       Tab (at beginning of line)
Show help for function at cursor             F1
Show source code for function at cursor      F2
Attempt completion                           Tab or Ctrl+Space

See Keyboard Shortcuts for more details.

Getting help
Use Ctrl+Enter to run the selected code or the line your cursor is on. The output is shown in the Console window.
help.start()   # general help
help(plot)      # help about function plot
?plot          # same thing 
apropos("plot") # list all functions containing string plot
example(plot)   # show an example of function plot

# search for plot in help manuals and archived mailing lists
RSiteSearch("plot")

# get vignettes on using installed packages
vignette()      # show available vignettes
vignette("knitr-html") # show specific vignette
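When you remember only part of a function name, apropos() also accepts a regular expression, and args() shows a function's arguments without opening the full help page. The following is a minimal sketch; the variable name plot_functions is only illustrative.

```r
# List functions on the search path whose names start with "plot".
plot_functions <- apropos("^plot", mode = "function")
head(plot_functions)

# Show the argument list of a function without opening its help page.
args(plot)

# Search the installed documentation by keyword (same as ??histogram).
# help.search("histogram")
```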

Manage your workspace
Now create a folder on your computer to use as your workplace, for example E:\Project\WISE R Club\LearnR.
R gets confused if you use a path in your code like
c:\mydocuments\myfile.txt
This is because R sees "\" as an escape character.
Instead, you should use
c:\\mydocuments\\myfile.txt
c:/mydocuments/myfile.txt
Either will work.
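The equivalence of the two forms can be checked directly. This is a small sketch using the example path above; the variable names are illustrative, and file.path() is shown as a third option that builds the path from its parts.

```r
# Two ways to write the same Windows path in R code.
p_backslash <- "c:\\mydocuments\\myfile.txt"  # each \\ is one literal backslash
p_forward   <- "c:/mydocuments/myfile.txt"

# cat() shows the string as R stores it, with single backslashes.
cat(p_backslash, "\n")

# Converting backslashes to forward slashes gives the second form.
identical(gsub("\\", "/", p_backslash, fixed = TRUE), p_forward)

# file.path() builds a path from its parts using forward slashes.
file.path("c:", "mydocuments", "myfile.txt")
```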
getwd() # print the current working directory - cwd 
ls()    # list the objects in the current workspace

setwd("E:/Project/WISE R Club/LearnR") # note / instead of \ in windows 

# view and set options for the session
help(options) # learn about available options
options() # view current option settings
options(digits=3) # number of digits to print on output

# work with your previous commands
history() # display last 25 commands
history(max.show=Inf) # display all previous commands

# save your command history 
savehistory(file="myfile") # default is ".Rhistory" 

# recall your command history 
loadhistory(file="myfile") # default is ".Rhistory"

# save the workspace to the file .RData in the cwd 
save.image()

# save specific objects to a file
# if you don't specify the path, the cwd is assumed 
x <- 1:5       # example objects to save
y <- letters
save(x, y, file="myfile.RData") # list the objects to save, separated by commas
save(x, file="mydata.RData")

# load a workspace into the current session
# if you don't specify the path, the cwd is assumed 
load("mydata.RData")

q() # quit R. You will be prompted to save the workspace.
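To see that save() and load() really round-trip an object, here is a minimal sketch that writes to a temporary folder instead of the working directory. The names rdata_file and restored are illustrative; loading into a separate environment is a choice made here so that existing objects in the workspace are not overwritten.

```r
# Save an object to a temporary file, then load it into a fresh environment.
rdata_file <- file.path(tempdir(), "mydata.RData")
x <- 1:5
save(x, file = rdata_file)

restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)  # returns the object names

loaded_names              # names of the objects that were restored
identical(restored$x, x)  # the restored copy matches the original
```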

Script input/output
By default, R provides an interactive session with input from the keyboard and output to the screen. However, you can have input come from a script file and direct output to a variety of destinations.

Input
# source a script
source("myfile.R")
source("myfile.R", print.eval = TRUE)
source("myfile.R", echo = TRUE, print.eval = TRUE)
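The sketch below shows what source() gives back. It writes a two-line script to a temporary file (a stand-in for myfile.R) and runs it in its own environment; script_file, script_env and result are illustrative names.

```r
# Write a two-line script to a temporary file.
script_file <- tempfile(fileext = ".R")
writeLines(c("y <- 2 + 3", "y * 10"), script_file)

# Run it in its own environment so the workspace is left untouched.
script_env <- new.env()
result <- source(script_file, local = script_env)

result$value   # value of the last expression in the script
script_env$y   # variable created by the script
```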

Output
The sink() function redirects R output from the console to a file.
# direct output to a file 
sink("output_file", append=FALSE, split=FALSE)

# return output to the terminal 
sink()
The append option controls whether output is added to the file (TRUE) or overwrites it (FALSE). The split option determines whether output is sent to the screen as well as to the output file.
Here are some examples of the sink() function.
# output is directed to output_file.txt in the cwd; append=FALSE overwrites any existing file,
# and because split defaults to FALSE, nothing is echoed to the terminal
sink("output_file.txt", append=FALSE)
x <- 1:5
cat("x: \n")
x
cat("Mean: \n")
mean(x)
cat("Variance: \n")
var(x)

cat("\n")
source("myfile.R", echo = TRUE, print.eval = TRUE)

sink()
When redirecting output, use the cat( ) function to annotate the output.
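A self-contained way to try this is to sink to a temporary file and read the file back. This is only a sketch: out_file and captured are illustrative names, and print() is called explicitly because values are not auto-printed when the code is run through source() without print.eval.

```r
# Redirect output to a temporary file, overwriting it if it exists.
out_file <- tempfile(fileext = ".txt")
sink(out_file, append = FALSE, split = FALSE)

x <- 1:5
cat("Mean:\n")    # annotation line
print(mean(x))    # explicit print so the value is written in any context

sink()            # return output to the terminal

# Read back what was written to the file.
captured <- readLines(out_file)
print(captured)
```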

Packages
Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library.
.libPaths() # get library location 
library()   # see all packages installed 
search()    # see packages currently loaded
A complete list of contributed packages is available from CRAN.
You can install packages from Tools --> Install Packages, or by running code such as install.packages("ggplot2"). You can update packages from Tools --> Check for Package Updates, or with update.packages().
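In scripts it is common to install a package only when it is missing. The helper below is a hypothetical sketch (ensure_package is not part of R); it relies on requireNamespace(), which reports whether a package can be loaded without attaching it.

```r
# Hypothetical helper: install a package only when it is not yet available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is available after the attempt.
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so nothing is downloaded here.
available <- ensure_package("stats")
available
```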

Reusing results
One of the most useful design features of R is that the output of analyses can easily be saved and used as input to additional analyses. Please see the following examples.
lm(mpg~wt, data=mtcars)

fit <- lm(mpg~wt, data=mtcars)

str(fit) # view the contents/structure of "fit"

# plot residuals against fitted values (fitted values on the x-axis)
plot(fit$fitted.values, fit$residuals)

# produce diagnostic plots
plot(fit) 
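The saved model object can also be passed to accessor functions and to further analysis steps. A minimal sketch, in which new_cars holds made-up weights chosen only for illustration:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Extract pieces of the fitted model with accessor functions.
coefs <- coef(fit)                # intercept and slope
r2    <- summary(fit)$r.squared   # proportion of variance explained

# Use the model as input to a further step: predict mpg for new weights.
new_cars  <- data.frame(wt = c(2.5, 3.5))
predicted <- predict(fit, newdata = new_cars)

coefs
r2
predicted
```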

An example
Here is a closing example that puts the pieces together.
setwd("E:/Project/WISE R Club/LearnR")
install.packages("ggplot2")
library(ggplot2)
help(package = "ggplot2")
vignette(package = "ggplot2")
?qplot
str(diamonds)
example(qplot) # example of qplot function
qplot(color, price/carat, data = diamonds, geom="jitter", alpha = I(1/5))
Note: R is a case-sensitive language.
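Recent ggplot2 releases mark qplot() as deprecated, so the jitter plot above can also be written with ggplot(). This is a sketch that assumes ggplot2 is installed; the guard and the name p are illustrative.

```r
# Build the same jitter plot with ggplot() instead of qplot().
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(diamonds, aes(x = color, y = price / carat)) +
    geom_jitter(alpha = 1/5)   # semi-transparent points reduce overplotting
  print(p)
}
```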

References and resources

References

Resources
The following online resources are also very helpful for R language learning. I suggest you explore some of them by yourself.

Thursday, September 25, 2014

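Building an All-Purpose Built-in Mac Dictionary

For most people, the Dictionary app that ships with the Mac is little more than decoration. It is essentially an English-English dictionary with no Chinese definitions, so users tend to install another dictionary program such as Kingsoft PowerWord (金山词霸). In fact, the built-in Dictionary is remarkably powerful, provided you add some dictionary packages to it. You can add whatever you like: Oxford, Longman, English-Chinese, Chinese-English, French-Chinese, German-Chinese, Chinese-French, Chinese-German, Japanese-Chinese, Chinese-Japanese, and even the Kangxi Dictionary or the Xiangya medical dictionary. All you need is a suitable dictionary package.
[Download links updated; working as of July 2014]
Dictionary conversion tool (new): http://code.google.com/p/mac-dictionary-kit/downloads/list
Dictionary packages (new): http://abloz.com/huzheng/stardict-dic/zh_CN/
  1. Download DictUnifier 1.1.
  2. Download a dictionary package tarball.
  3. Open DictUnifier 1.1, click "choose", and select the dictionary package you want to convert into Dictionary, for example "stardict-langdao-ce-gb-2.4.2.tar.bz2". Then click "convert" and wait patiently. A large dictionary or an older machine may take a long time, so do not assume the program has hung.
  4. Once the package is installed, Dictionary opens automatically, and the name of the newly added dictionary appears after "All" (全部). Enjoy!
  5. Note: on some Mac OS versions you need to tick the dictionaries you want in Dictionary's Preferences (Command + comma).
For dictionaries you do not want:
  1. To hide one in the Dictionary interface, untick it in Dictionary's Preferences.
  2. To delete one completely, find the corresponding .dictionary file in the two folders below and delete it:
     a) Macintosh HD/Library/Dictionaries
     b) Macintosh HD/Users/<your user name>/Library/Dictionaries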

Thursday, September 18, 2014

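[Repost] A Roadmap for Learning R, from Beginner to Advanced

Note: This article is reposted from the web. The original author is reportedly Deng Yishuo (邓一硕), and the original link can no longer be found. It may be useful to anyone learning R from scratch, so I am posting it here; take from it whatever you need. Text following ## is my own explanation and additions. I have added download links for some of the books mentioned; they are for study and exchange only, not for commercial use. Please let me know if anything is inappropriate.
More and more people are becoming interested in R, and many want to master it quickly. However, because most universities do not yet offer R courses, many people do not know where to begin.
For beginners, the most common approach is this: whenever they get stuck, they run to a forum and shout out a question, then leave either happy or sad, and do not come back until the next problem appears. This is not the best way to learn, of course. The best way is to read books. There are now many books on R on the market, in both Chinese and English. So which one should a novice start with? And after getting started, how do you train yourself into an expert in a particular area? I believe many people have these questions. Those people are in luck, because I will draw on my own experience to lay out a roadmap of R books so that R users can avoid some detours.
This article is divided into six parts, covering the beginner level, the advanced beginner level, graphics and visualization, econometrics, time series analysis, and finance, among others.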
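1. Beginner level
Start with "An Introduction to R", the official introductory booklet. It has a Chinese edition translated by Ding Guohui (丁国徽) under the title 《R导论》. Next is "R for Beginners"; this booklet also has a Chinese edition, which should be called 《R入门》. Beyond these, you can read Liu Sizhe's (刘思喆) 《153分钟学会R》 ("Learn R in 153 Minutes"), which collects the 153 questions most frequently asked by R beginners. Why 153 minutes? Because the author originally wrote up 153 questions, and at one minute per question the whole thing takes 153 minutes. With this foundation in place, move on to the classic, more comprehensive introductory books, such as 《统计建模与R软件》 ("Statistical Modeling and R Software"). From abroad there are also "R Cookbook" and "R in Action" (## Turing Community has published a Chinese translation, 《R实战》); I have not read them, so I will not comment.
Finally, I recommend "R in a Nutshell". Yes, "R inside a nutshell"! Just kidding, of course: "in a nutshell" is an idiom meaning roughly "put simply".
(##: According to readers on Zhihu, the following books are also good: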
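2. Advanced beginner level
Once you have read the books above, you are ready for the advanced beginner stage. Two classics to read here are "Statistics with R" and "The R Book". They count as advanced because they go beyond the basics of R: they are organized around the common methods of data analysis and systematically cover linear regression, analysis of variance, multivariate statistics, R graphics, time series analysis, data mining, and more. After finishing them you will realize how much R can do, and how concisely it does it. At that point you are nearly there; what remains is probably whichever specialty you want to pursue. A rough guide follows.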
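3. Graphics and visualization
Aristotle said, "Above all the other senses, humans prefer sight." Graphics and visualization therefore attract a great deal of attention. So how do you learn plotting and data visualization in R? To put it more simply, how do you draw a histogram? How do you add a density curve to a histogram? I think you will have a fair idea after reading the books below.
For an introduction to plotting, read "R Graphics". I consider it a classic; it covers the graphics systems in R comprehensively. The book has a companion website, which you can find by searching Google. To go deeper, read "Lattice: Multivariate Data Visualization with R". Those are the fairly conventional options. There is also a more artistic and elegant one, the ggplot2 system; see "ggplot2: Elegant Graphics for Data Analysis". On the data mining side there is "Data Mining with Rattle and R", which mainly uses the Rattle software. I personally like Rattle! Of course, Rattle is not the best, and RWeka is great too! Then there are books on interactive graphics. The best-known interactive system is GGobi, which I have liked for more than two years now. The book on GGobi is "Interactive and Dynamic Graphics for Data Analysis With R and GGobi", but it too is only suitable as an introduction; for more complete material, go to the GGobi home page, which has all kinds of resources and package update information!
A special recommendation: for a plotting book in Chinese, there is Xie Yihui's (谢益辉) 《现代统计图形》 ("Modern Statistical Graphics").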
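The two questions posed at the start of this section, how to draw a histogram and how to add a density curve to it, take only a few lines of base R. The following is a minimal sketch on simulated data; the variable names and the seed are illustrative and do not come from any of the books above.

```r
# Simulate some data to plot.
set.seed(1)
x <- rnorm(200)

# Draw the histogram on the density scale so the curve is comparable.
h <- hist(x, freq = FALSE, main = "Histogram with density curve")

# Overlay a kernel density estimate on the existing plot.
lines(density(x), lwd = 2)
```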
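4. Econometrics
For econometrics, I first recommend a very thin booklet, "Econometrics in R", as an introduction. After that comes "Applied Econometrics with R". Its companion R package is AER, which you can install and use alongside the book to very good effect. A large part of econometrics concerns time series analysis, which is covered in the next section.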
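5. Time series analysis
Books on time series fall into two categories. The first is the general-purpose kind, typified by "Time Series Analysis and Its Applications: With R Examples". It introduces the classic methods of time series analysis together with the R code that implements them. The book has a Chinese edition. If you do not want to buy it, I suggest downloading it directly from the authors' home page; the English edition is actually easy to read. A large part of time series analysis concerns financial time series. Two books are popular here. The first is "Analysis of Financial Time Series", which originally used S-PLUS code, although the new edition is based mainly on R code. This book suits readers who already have a grounding in time series analysis and finance, because its explanations of time series theory and of financial concepts are not especially clear; the part on computing VaR with extreme value theory is rather hard to follow. The other, more interesting one is "Time Series FAQ" from Rmetrics, an introduction to financial time series that covers the basics yet is hard to understand. The corresponding Chinese edition is 《金融时间序列分析常见问题集》 ("Frequently Asked Questions on Financial Time Series Analysis"), though it has not been released yet. In economics, time series have a special case called cointegration. Many people follow this theory closely, and those interested can read "Analysis of Integrated and Cointegrated Time Series with R". Finally, a more advanced book deals with wavelet analysis: "Wavelet Methods in Statistics with R". One more point: books on time series clustering are still rare. It is uncharted territory, and the ambitious are welcome to explore it!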
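6. Finance
Finance is a broad field; taken in its widest sense, it includes insurance as well. Doing finance with R depends mostly on financial knowledge, and data analysis skills alone are of little use. I think these books are most useful to people who understand finance but not data analysis techniques. Those who know only data analysis and no finance will feel they are looking at flowers through fog, and some may even conclude that financial analysis is rather primitive. Classic books in this area include "Advanced Topics in Analysis of Economic and Financial Data Using R" and "Modeling Financial Time Series with S-PLUS". Pricing financial products often requires stochastic differential equations; "Simulation and Inference for Stochastic Differential Equations: With R Examples" covers this topic with examples and reasonably thorough content! Next comes risk measurement and management. Classics here include "Simulation Techniques in Financial Risk Management", "Modern Actuarial Risk Theory Using R", and "Quantitative Risk Management: Concepts, Techniques and Tools". For portfolio analysis and option pricing, see "Portfolio Optimization with R" and "Option Pricing and Estimation of Financial Models with R", respectively.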
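7. Data mining
There are not many books in this area, only "Data Mining with R: Learning with Case Studies". However, R already has more than enough data mining packages, and the help documentation in those packages is sufficient.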
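8. Notes
For copyright and related reasons, I cannot tell you that the books mentioned above can be downloaded for free from places such as Sina iAsk (新浪爱问). Still, I trust you can use your own ingenuity to work it out!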

A Brief History of Modern Growth Theory