19 Dec 2011

Blog Statistics with StatCounter & R

If you're interested in analysing your blog's statistics, this is easily done with a web service like StatCounter (free, registration only, quite extensive) together with R.
After embedding the StatCounter script in the HTML code of a webpage or blog, you can download the log files, read them into R with a few short lines of code (like below) and then inspect visitor activity..
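
A minimal sketch of such an inspection, assuming the log was exported as a CSV file. Only the column name IP.Address appears elsewhere on this page; the other column name, the file name and all values are made up for illustration, and a small inline data frame stands in for the downloaded file.

```r
# Stand-in for: log <- read.csv("StatCounter-Log.csv")  (hypothetical file name)
log <- data.frame(
  Date.and.Time = c("2011-12-17 09:12:01", "2011-12-17 18:40:55",
                    "2011-12-18 07:03:20", "2011-12-18 07:05:42"),
  IP.Address    = c("192.0.2.10", "192.0.2.25", "192.0.2.10", "192.0.2.10"),
  stringsAsFactors = FALSE
)

# Page loads per day (as.Date() keeps only the date part of the time stamp)
log$Day <- as.Date(log$Date.and.Time)
loads_per_day <- table(log$Day)

# Page loads per visitor (IP address), most active first
loads_per_ip <- sort(table(log$IP.Address), decreasing = TRUE)

loads_per_day
loads_per_ip
```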

17 Dec 2011

Function to Collect Geographic Coordinates for IP-Addresses

I added the function IPtoXY, which collects geographic coordinates for IP-addresses, to the BioBucket-Archives.
It uses a web-service at http://www.datasciencetoolkit.org// and works with the base R-packages.

# System time to collect coordinates of 100 IP-addresses:
> system.time(sapply(log$IP.Address[1:100], FUN = IPtoXY))
       user      system     elapsed
       0.05        0.02       33.10

15 Dec 2011

Conversion of Several Variables to Factors

..often needed when preparing data for analysis (and usually forgotten until I need it for the next time).
With the code below I convert a set of variables to factors. There may be slicker ways to do it (if you know one, let me know!)

> dat <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10])))
> str(dat)
'data.frame':   4 obs. of  10 variables:
 $ A: int  5 34 3 15
 $ B: int  28 25 17 24
 $ C: int  2 12 10 32
 $ D: int  16 27 29 14
 $ E: int  40 7 4 31
 $ F: int  22 30 6 18
 $ G: int  33 36 35 38
 $ H: int  19 21 37 8
 $ I: int  20 11 9 26
 $ J: int  39 13 1 23
> id <- which(names(dat)%in%c("A", "F", "I"))
> dat[, id] <- lapply(dat[, id], as.factor)
> str(dat[, id])
'data.frame':   4 obs. of  3 variables:
 $ A: Factor w/ 4 levels "3","5","15","34": 2 4 1 3
 $ F: Factor w/ 4 levels "6","18","22",..: 3 4 1 2
 $ I: Factor w/ 4 levels "9","11","20",..: 3 2 1 4
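
A possibly slicker variant, offered as a sketch only: index the columns by name directly, which makes the which() / %in% step unnecessary. The helper name vars is introduced here for illustration.

```r
dat <- data.frame(matrix(sample(1:40), 4, 10,
                         dimnames = list(1:4, LETTERS[1:10])))

vars <- c("A", "F", "I")                 # columns to convert
dat[vars] <- lapply(dat[vars], factor)   # list-style indexing keeps the data frame intact

str(dat[vars])
```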

12 Dec 2011

Default Convenience Functions in R (Rprofile.site)

I keep my blog-reference-functions, snippets, etc., at github and want to source them from there. This can be achieved with a function (source_https, customized for my purpose HERE). The original function was provided by the R-Blogger Tony Breyal - thanks Tony! As I will use this function quite frequently, I added its code to my Rprofile.site and am now able to source from github whenever I run code from the R-console. This is very handy and I thought it might be worth sharing..

Function for Adding Transparency to JPEG (Output = PNG)

..see the function-code HERE.

Animation Newbie Exercise with R-Package {animation}

Try this very simple & illustrative example for creating an animation with the animation package:

library(animation)  # needed for ani.options()

myfun <- function() {
    n <- ani.options("nmax")
    x <- sample(1:n)
    y <- sample(1:n)

    # draw one randomly placed cross per frame
    for (i in 1:n) {
        plot(x[i], y[i], cex = 3, col = 3, pch = 3, lwd = 2,
             ylim = c(0, 50),
             xlim = c(0, 50))
    }
}

par(mar = c(3, 3, 1, 0.5), mgp = c(1.5, 0.5, 0), tcl = -0.3)

7 Dec 2011

A Word Cloud with Spatial Meaning

..Some time ago I did a word cloud for representing a Google Scholar search result. Tal Galili pointed me at a post by Drew Conway that expanded on the topic of word clouds lacking spatial meaning. In fact the spatial ordering of words in a word cloud is arbitrary and meaningless..

As I am an ecologist, I soon came to the idea that text could be treated as a multivariate data set, with words treated as species and sentences as samples. So, presuming that it makes sense to put sentences and words in a cross-table as I would do with a species / samples matrix, it may also be sensible to analyze such a matrix with ordination methods for multivariate data, which are commonly used by ecologists. I chose NMDS ordination, as it is robust and quite easy to compute with R-package {vegan}.
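
The idea of treating sentences as samples and words as species can be sketched in base R. The sentences are invented, and classical MDS (cmdscale) on a Jaccard-type distance is used here only as a stand-in that needs no extra package; it is not the NMDS from {vegan}.

```r
sentences <- c("frogs live in ponds",
               "newts live in ponds",
               "birds live in trees",
               "bats sleep in trees")

# Long format: one row per word occurrence, tagged with its sentence number
words <- strsplit(sentences, " ")
long  <- data.frame(sentence = rep(seq_along(words), lengths(words)),
                    word     = unlist(words))

# Cross-table: sentences (samples) in rows, words (species) in columns
tab <- unclass(table(long$sentence, long$word))

# Distances between words based on shared occurrence in sentences,
# then a 2-d ordination so that words get a meaningful position
d  <- dist(t(tab), method = "binary")
xy <- cmdscale(d, k = 2)

plot(xy, type = "n", xlab = "Axis 1", ylab = "Axis 2")
text(xy, labels = rownames(xy))
```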

1 Dec 2011

Producing Google Map Embeds with R Package googleVis

(1) For producing HTML code for a Google Map with R-package googleVis do something like:

library(googleVis)

df <- data.frame(Address = c("Innsbruck", "Wattens"),
                 Tip = c("My Location 1", "My Location 2"))
mymap <- gvisMap(df, "Address", "Tip", options = list(showTip = TRUE, mapType = "normal",
                 enableScrollWheel = TRUE))
plot(mymap) # preview

(2) Then just copy-paste the HTML to your blog or website after customizing it for your purpose..

Line Slope Calculation in ArcGis 9.3 (using XTools)

You have a polyline, say a path, river, etc., and want to know the average slope of each line segment.

Use the z-values of the polyline's nodes and calculate the percent slope as
(z-difference / line-segment shape_length) * 100
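
The same arithmetic as a small sketch in R, using percent slope as rise over run times 100. The column names and numbers are invented; in practice they would come from the exported attribute table.

```r
# One row per line segment: planar length and z-values of its two end nodes
seg <- data.frame(length_m = c(120, 80, 200),
                  z_start  = c(500, 512, 520),
                  z_end    = c(512, 520, 510))

# Percent slope = rise / run * 100 (negative values mean downhill)
seg$slope_pct <- (seg$z_end - seg$z_start) / seg$length_m * 100
seg
```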

28 Nov 2011

Retrieve GBIF Species Occurrence Data with Function from dismo Package

..The dismo package is awesome: with some short lines of code you can read & map species distribution data from GBIF (the global biodiversity information facility) easily:

27 Nov 2011

..A Quick Geo-Trick for GoogleMaps in R (using dismo)

... I thought this geocoding-bit might be worth sharing (found HERE when searching the web for dismo-documentation).

25 Nov 2011

..Some More Regex Examples Added to Collection

Find the below examples added to my list of regex-examples HERE.

24 Nov 2011

A Function for Adding up Matrices with Different Dimensions

I couldn't find a function that sums matrices with different dimensions, so I coded one myself. It sums matrices and also copes with matrices whose dimensions differ.
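
One way such a function could work, as a sketch under my own assumptions and not the function from the post: every matrix is aligned at the top-left corner, missing cells count as zero, and the result has the largest row and column count found. The name add_matrices is hypothetical.

```r
add_matrices <- function(...) {
  mats <- list(...)
  nr <- max(vapply(mats, nrow, integer(1)))
  nc <- max(vapply(mats, ncol, integer(1)))
  out <- matrix(0, nr, nc)              # zero-filled result of maximal size
  for (m in mats) {
    r <- seq_len(nrow(m))
    k <- seq_len(ncol(m))
    # add each matrix into the matching top-left block
    out[r, k] <- out[r, k, drop = FALSE] + m
  }
  out
}

a <- matrix(1, 2, 2)
b <- matrix(1:9, 3, 3)
res <- add_matrices(a, b)
res
```

Aligning by row and column names instead of by position would be an alternative design if the matrices carry dimnames.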

16 Nov 2011

Using SyntaxHighlighter and R Brush in Blogger

If you're thinking it is time to give the code examples in your blog a more readable look, you may follow this path and use the SyntaxHighlighter.

First thing: check the SyntaxHighlighter Website for the basics.

14 Nov 2011

How to Download and Run Google Docs Script in the R Console

...There is not much to it:
upload a txt file with your script, share it with anyone who has the link, then simply run something like the code below.

PS: When using the code for your own purpose, remember to change "https" to "http" and to insert your individual document id.
PPS: You could use download.file() in this way for downloading any file from Google Docs..
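
The download-then-run pattern itself can be sketched as follows. To keep the sketch runnable offline, a temporary file addressed by a file:// URL (Unix-alike path assumed) stands in for the shared document; the real download link with your document id is not reproduced here.

```r
# Stand-in for the shared script; for real use, set `url` to your own link
script <- tempfile(fileext = ".txt")
writeLines("answer <- 6 * 7", script)
url <- paste0("file://", script)

# Download the script to a local file, then run it in the current session
dest <- tempfile(fileext = ".R")
download.file(url, destfile = dest, quiet = TRUE)
source(dest)

answer
```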

In Reply to Ben Bolker's Post "Google Scholar (still) sucks"

Replying to Ben Bolker's post Google Scholar (still) sucks:


thanks for illustrating the issue in your post!

The main purpose of my function GScholarScraper is to retrieve titles, simply because this is the best we can get from Google Scholar. Abstracts are truncated and thus shouldn't be used for meta-analysis. Titles are also truncated, as you said, and there is no way around it. However, this happens less often and is less severe than with abstracts, i.e.

The CSV is optional, the df with word frequencies and the word cloud are always returned - for any other output one can easily add some appropriate lines to the script.

My opinion:
My function is good for a quick summary and illustration of a query-result.

Tony's function is evidently better if you want to pull all fields of a given query (authors, titles, abstracts,..)

I wonder if people came across ROpenSci? I guess that might be very interesting in this context!

Last remark: Of course, a Google Scholar API would resolve all our problems in this regard..


10 Nov 2011

An Image Crossfader Function

A project spin-off: the jpgfader-function (it can be viewed in funny use HERE):

Dear Silvio!..

R-Code to produce this nice gif-animated greeting card can be viewed HERE.


Convert Date Field into Year in ArcGis

Objective: a date field [date] with format dd.mm.yyyy should be converted to format yyyy.

(1) Add an integer field [YEAR] to the attribute table.
(2) Field Calculator: YEAR =

9 Nov 2011

Add Transparency to JPEG - Yes, We Can!

...Just read in your JPEG and add an alpha channel manually, then assign values for transparency. Of course for printing you need to use a device that accepts alpha.

See how it's done HERE.

R-Function GScholarScraper to Webscrape Google Scholar Search Result

NOTE: You'll find the update HERE and HERE.

NOTE: The script is currently not working because the code of the Google Scholar site has changed...
I'll look into this as soon as I find some spare time!

NOTE: If you try to access Google Scholar programmatically, consider these words of caution:

Based on my previous post on Web Scraping I coded and uploaded the Function "GScholarScraper" HERE for testing!
The function will pull all (!) results, processing pages in chunks of 100 results/titles, and return a file with all titles, links, etc. It will also produce a word cloud using the words in the publication titles.
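
The word-frequency step behind such a word cloud can be sketched in base R. This is not the code of GScholarScraper: the titles are invented, nothing is scraped, and the cut-off for short words is an arbitrary choice.

```r
titles <- c("Amphibian diversity in alpine landscapes",
            "Landscape effects on amphibian richness",
            "Richness and diversity of alpine amphibians")

# Lower-case, split at anything that is not a letter, drop short words
words <- unlist(strsplit(tolower(titles), "[^a-z]+"))
words <- words[nchar(words) > 3]

# Frequency table as a data frame, most frequent words first
freq <- sort(table(words), decreasing = TRUE)
df <- data.frame(word = names(freq), freq = as.integer(freq),
                 stringsAsFactors = FALSE)
df
```

A data frame of this shape (word, frequency) is what word cloud functions typically take as input.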

Please try your own search strings and report errors, etc.!

Built and runs properly under:
R version 2.13.0 (2011-04-13) and R version R-2.13.2 (2011-09-30)

Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] stringr_0.5 tm_0.5-6 wordcloud_1.2 Rcpp_0.9.7

loaded via a namespace (and not attached):
[1] plyr_1.5.1 slam_0.1-23

PS: Errors reported lately (see comments) were resolved, the source code was updated..

5 Nov 2011

Next Level Web Scraping

The outcome presented above will not be very useful to most of you - still, this could be a good example for what possibly can be done via webscraping in R.

Background: TIRIS is the federal geo-statistical service of North-Tyrol, Austria. Among the many great things it provides are historical and recent aerial photographs. These photographs can be addressed via URLs. This is the basis of the script: the URLs are retrieved, some parameters are adjusted, and the images are downloaded from the customized addresses and animated by saveHTML from the animation package. The outcome (an HTML animation) enables you to view and skip through aerial photographs of any location in North-Tyrol, from the year 1940 to 2010, and see how the landscape, buildings, etc. have changed...

View the script HERE.

3 Nov 2011

Some Simple but Probably Useful Regex Examples with R-Package stringr...

I found that examples for the use of regex in R are rather rare. Thus, I will provide some examples from my own learning materials - mostly stolen from the help pages, with small but maybe illustrative adaptations. PS: I will extend this list of examples HERE occasionally..

1 Nov 2011

Webscraping Google Scholar & Show Result as Word Cloud Using R

NOTE: Please see the update HERE and HERE!

...When reading Scott Chamberlain's last post about web-scraping I felt it was time to pick up and complete an idea that I had been brooding over for some time:

When a scientist sets out on a new project, the first thing to do is to evaluate whether other people have already answered the very questions he is about to work on. For example, I was interested in whether any research had been done on amphibian diversity at regional/geographical scales in relation to environmental/landscape parameters. Usually I would go to Google Scholar and search something like - intitle:amphibians AND intitle:richness OR intitle:diversity AND environment OR landscape - and then browse through the results. But this is often tedious, and a way to do a quick visual examination would be of great benefit.

31 Oct 2011

Using IUCN-Data, ArcMap 9.3 and R to Map Species Diversity

..I'm overwhelmed by the ever-growing loads of data that are made available via the web. For example, IUCN collects and hosts spatial species data which is free for download. I'm itching to play with all this data... And, in the end, some valuable outcome may arise:

In the below examples I made a map for amphibian species richness - without much effort. I used a 5x5 km grid (constant areas, provided by Statistik Austria) and amphibian species ranges and intersected this data to yield species numbers per grid-cell. For aggregation of the tables from the polygons produced by the intersection I needed to switch from ArcMap to R because ESRI's "Summary Statistics" only cover MIN, MAX, MEAN, FIRST/LAST... and there is no easy way to apply custom functions. So, I just accessed the dbf that I wanted to aggregate with R and did the calculations there.. then reloaded the layer in ArcMap and was done..

24 Oct 2011

A Simple Example for the Use of Shapefiles in R

A simple example for drawing an occurrence-map (polygons with species' points) with the R-packages maptools and sp using shapefiles.
HERE is the example data.

23 Oct 2011

A Little Webscraping-Exercise...

In R it's quite easy to pull out anything from a webpage and I'll show a little exercise in doing so. Here I retrieve all blog addresses from R-bloggers by the function readLines() and some subsequent data processing.
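
A minimal sketch of the "subsequent data processing", assuming the addresses sit in href attributes. A few invented HTML lines stand in for what readLines() would return for the real page, so nothing is downloaded here.

```r
# Stand-in for the character vector returned by readLines(<URL of the page>)
html <- c('<ul class="blogroll">',
          '<li><a href="http://example.org/blog-a">Blog A</a></li>',
          '<li><a href="http://example.net/blog-b">Blog B</a></li>',
          '</ul>')

# Keep the first href="..." found on each line that has one
hits <- regmatches(html, regexpr('href="[^"]+"', html))

# Strip the attribute name and the quotes to leave the bare address
urls <- gsub('^href="|"$', "", hits)
urls
```

For lines holding several links, gregexpr() in place of regexpr() would return all matches per line.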

12 Oct 2011

Yet Another One.. Animation with saveHTML / saveVideo from Package ANIMATION

...some more playing with saveHTML, as.raster() and rasterImage(), producing a "flickering screen":

How to Link to Files at Google Docs for Direct Download

...Google doesn't tell you this, as far as I know - so, if you want to link to a file at Google Docs for direct download you can use the following address scheme:


You can copy your individual file id from within the "Share..."-dialogue. Here you also need to put the share settings to "public" or to "anyone with the link".

An HTML example for this link to a pdf-file:

10 Oct 2011

Plot Animation with Imported Images

...I really dig the animation package! ..so here's the outcome of my first encounters with saveHTML() - I produced an animation with pre-existing images by utilizing the functions readJPEG() and rasterImage() from the R-packages jpeg and ReadImages. Credit goes out to xingmowang (nzprimarysectortrade-blog) from whom I picked up the concept of putting images to the plot region of a graph produced with the animation-functions.

23 Sept 2011

Nice Species Distribution Maps with GBIF-Data in R

Here's an example of how to easily produce real nice distribution maps from GBIF-data in R with package maps...

ArcGis Style with Marker Symbols for Tetrao tetrix & T. urogallus

...lately I designed two nice symbol markers for Tetrao spp. in vector format - they are free to download here.

20 Sept 2011

Picture Marker Symbols in ArcGis with Vector Format

Ever wanted to add pictures as marker symbols in ArcGis? ..then you may have noticed that raster images look poor in an ArcGis map no matter how high the resolution of the input image is...

Use of Classification Trees to Investigate Traits of Invasive Species

Which traits make an alien species invasive?
Which traits allow an alien species to become established in a foreign flora?

Questions of this kind can be analysed with recursive partitioning and classification trees..
(the below example also includes some useful data manipulation techniques)...

16 Sept 2011

Match Words in MS-Word File with Words in another File and Apply New Format Using VBA

I present a macro that I wrote for re-formatting scientific species names (it is common to use italic fonts for that) in a Word file. Therefore I used a database of central European species names - this is compared with the words in my file and matches are re-formatted...

12 Sept 2011

Search Google Definition for Words in an Excel-File Using VBA

I have a glossary of words held in an excel-workbook. For getting instant definitions from Google I wrote a small macro which does this for me with one click.

This is how it is done:

29 Aug 2011

Comparing Two Distributions

Here I compare two distributions, flowering duration of indigenous and allochthonous plant species. The hypothesis is that alien plant species exhibit longer flowering periods than indigenous ones.
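
One possible way to run such a comparison, sketched with simulated data. The numbers are invented, and the Wilcoxon rank-sum test is my choice for illustration; the post itself may use a different test.

```r
set.seed(1)
# Simulated flowering durations in days (invented parameters)
indigenous <- rnorm(100, mean = 60, sd = 15)
alien      <- rnorm(100, mean = 75, sd = 15)

# One-sided test: do alien species tend to flower longer?
res <- wilcox.test(alien, indigenous, alternative = "greater")
res$p.value

# Visual comparison of the two empirical distribution functions
plot(ecdf(indigenous), main = "Flowering duration", xlab = "days")
plot(ecdf(alien), add = TRUE, col = 2)
legend("bottomright", legend = c("indigenous", "alien"), col = 1:2, lty = 1)
```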

11 Aug 2011

Test Difference between Two Proportions & Plot Confidence Intervals

..an illustrative example for testing proportions and presenting the results.

The data: number of indigenous and alien plant species with and without vegetative reproduction (N = 3399, Central European species, data courtesy of BiolFlor). Hypothesis: the proportion of species with vegetative reproduction differs between alien and indigenous plant species.

Result: the proportion of plants with vegetative reproduction is significantly lower for alien than for indigenous plant species. This is simply due to the large number of agricultural weeds and contaminants among the alien species - these species almost always reproduce by seeds.
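
A sketch of how such a test and plot can be set up with prop.test(). The counts are invented, chosen only so that the group sizes sum to N = 3399; they are not the BiolFlor figures.

```r
# Invented counts for illustration only
n   <- c(indigenous = 2600, alien = 799)   # species per group
veg <- c(indigenous = 1500, alien = 300)   # of these, with vegetative reproduction

# Two-sided test of equal proportions
test <- prop.test(veg, n)
test$p.value

# 95% confidence interval for each proportion separately
ci <- sapply(names(n), function(g) prop.test(veg[[g]], n[[g]])$conf.int)
rownames(ci) <- c("lower", "upper")
ci

# Proportions with their confidence intervals as error bars
p <- veg / n
plot(1:2, p, xlim = c(0.5, 2.5), ylim = c(0, 1), xaxt = "n", pch = 16,
     xlab = "", ylab = "proportion with veg. reproduction")
axis(1, at = 1:2, labels = names(n))
arrows(1:2, ci["lower", ], 1:2, ci["upper", ], angle = 90, code = 3, length = 0.1)
```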

8 Aug 2011

Two-Way PERMANOVA (adonis, vegan-Package) with Customized Contrasts

...say you have a multivariate dataset and a two-way factorial design - you do a PERMANOVA and the aov-table (adonis is using ANOVA or "sum"-contrasts) tells you there is an interaction - how to proceed when you want to go deeper into the analysis?
You could, although it is somewhat tedious, customize contrasts for the PERMANOVA and check for differences between certain level combinations.

15 Jun 2011

Fast Correction of Typos in MS Word with VBA

..I use a macro to quickly fix misspelled words. More precisely, the misspelled word is replaced by the first suggestion from the spelling checker. The macro is called by hitting "Ctrl+Shift+Q" just after a typo occurred.

14 Jun 2011

Multiple Comparisons for GLMMs using glmer() & glht()

...here's an example of how to apply multiple comparisons to a generalised linear mixed model (GLMM) using the function glmer from package lme4 & glht() from package multcomp. Also, I present a nice example for visualizing data from a nested sampling design with lattice-plots! 

1 Jun 2011

How to List all Functions of an R-Package

   [1] "-"                                  "-.Date"                            
   [3] "-.POSIXt"                           "!"                                 
   [5] "!.hexmode"                          "!.octmode"                         
   [7] "!="                                 "$"                                 
   [9] "$.DLLInfo"                          "$.package_version"                 
  [11] "$<-"                                "$<-.data.frame"                    
  [13] "%%"                                 "%*%"                               
  [15] "%/%"                                "%in%"                              
  [17] "%o%"                                "%x%"                               
  [19] "&"                                  "&&"                                
  [21] "&.hexmode"                          "&.octmode"                         
  [23] "("                                  "*"    

- : function (e1, e2)  
-.Date : function (e1, e2)  
-.POSIXt : function (e1, e2)  
! : function (x)  
!.hexmode : function (a)  
!.octmode : function (a)  
!= : function (e1, e2)  
$ : .Primitive("$") 
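
Listings of this kind can be produced with two base functions; a short sketch for the base package (any attached package works the same way via "package:<name>"):

```r
# Names of all objects exported by an attached package (here: base)
fns <- ls("package:base")
head(fns)

# Only the functions; printing this object shows each one with its arguments
sigs <- lsf.str("package:base")
sigs
```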

Drawing Grids in R

Here's an example of how to draw a grid in R and how to fill it.
I used the grid package and its functions to display species cover values in the squares of a recording frame...

30 May 2011

Legendre & Borcard: Nested ANOVA by Permutation

..here's a very illustrative R-Script example by Legendre & Borcard showing how a Nested ANOVA can be done by permutation.

25 May 2011

Species Recording Form

When recording species abundance data you often have endless rows of species names, and you get swollen eyes from searching for the right line for each data entry. Therefore I made myself a handy xls-file to collect species abundance data.
In it, a VBA macro is called with a keyboard shortcut: it matches the species short name you typed in the active cell and jumps to the line with that species; the search entry is deleted automatically and you are in the cell where you want to add data... (download).

Neighborhood-Statistics for Samplepoints Based on Raster-Data

There's no ArcGis-tool for calculating neighborhood-statistics for samplepoints based on raster-data. Say you have a raster with presence/absence data of a given feature and, for a set of samplepoints, you want to know the feature's occurrence rate (percentage of cells with presence of that feature) in each samplepoint's neighborhood: then you will find this model (saved in toolbox NESTPTS.tbx) useful.

23 May 2011

Summarize Data by Several Variables

Here's an example of how to conveniently summarize data with the cast function (package reshape). Along the way you see how this could be done "inconveniently" by hand, how a for-loop works and how a matrix is constructed and filled. In addition, this serves as an illustrative example of how flexible "indexing" in R works, as seen in the loop below! (download data) (this example is adapted from https://stat.ethz.ch/pipermail/r-sig-ecology/2011-May/002174.html)
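
The contrast between the two routes can be sketched with invented data. Base R's tapply() is used here as a stand-in for cast() so that no extra package is needed; column names and values are made up.

```r
dat <- data.frame(site  = c("a", "a", "b", "b", "b"),
                  year  = c(2010, 2011, 2010, 2010, 2011),
                  count = c(1, 2, 3, 4, 5))

# "By hand": build an empty site x year matrix and fill it cell by cell,
# using the dimnames both as loop values and as indices
by_hand <- matrix(NA, 2, 2, dimnames = list(c("a", "b"), c("2010", "2011")))
for (s in rownames(by_hand)) {
  for (y in colnames(by_hand)) {
    by_hand[s, y] <- sum(dat$count[dat$site == s & dat$year == y])
  }
}

# Convenient: one call that sums count by site and year
quick <- tapply(dat$count, list(dat$site, dat$year), sum)

by_hand
quick
```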

2 May 2011

Import dbf to R, Manipulate Strings with grep & sub Function

Here's a set of historical species presence records of a certain geographical region (data-link). I wanted to manipulate / simplify strings (species names) and get an overview of the data. 
...The tasks were to split genera and epithets, to exclude species containing specific strings and to get rid of unwanted text (author names). For graphical presentation of the species record history I did a plot with segments indicating the first and last year of a species record:
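
The string tasks can be sketched with grepl() and sub() on a few invented records; the names and the exclusion pattern are assumptions for illustration, not the data behind the data-link.

```r
spp <- c("Rana temporaria Linnaeus, 1758",
         "Bufo bufo (Linnaeus, 1758)",
         "Triturus sp.",
         "Hyla arborea L.")

# Exclude records that contain a specific string (here: undetermined "sp.")
spp <- spp[!grepl(" sp\\.", spp)]

# First word = genus, second word = epithet; everything after it
# (author names, years) is dropped
genus   <- sub("^([^ ]+) .*$", "\\1", spp)
epithet <- sub("^[^ ]+ ([^ ]+).*$", "\\1", spp)

data.frame(genus, epithet)
```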

29 Apr 2011

Visual Basic - IF THEN Usage in ArcGis Field Calculator

Recently I had to do some statistics on a numerical field in an attribute table, and for this purpose I had to classify the field according to a set of intervals. I learned that there is no tool for this, so I had to use the Field Calculator with the appropriate IF THEN VBA syntax:

Here is how it can be done (one of many ways):

28 Apr 2011

26 Apr 2011

Adonis (PERMANOVA) - Assumptions

Before you use PERMANOVA (R-vegan function adonis) you should read the user notes for the original program by the author (Marti J. Anderson) who first came up with this method. An important assumption for PERMANOVA is the same "multivariate spread" among groups, which is similar to variance homogeneity in univariate ANOVA.

I'll show why you may draw the wrong conclusions if this assumption is not met:

21 Apr 2011

Permutation Test with Stratified Data and Repeated Measurements

This is an example for a permutation test on stratified samples with repeated measurements. Samples are interdependent, firstly because they come from several sites and secondly because the sampling was repeated a second time. That is, samples from the same site are dependent, and sample t1 and sample t2, taken from the very same places, are dependent.

What I want to test is whether there is a difference between timepoint one (t1) and two (t2) or not. A hypothesis could be: the average difference t1-t2 is sign. larger than zero (a one-sided test). Another hypothesis could be: the average difference is sign. different from zero, either larger or smaller (a two-sided test).

If you deal with a distribution of your data that ordinary Linear Mixed Models (LMMs) or Generalized LMMs (GLMMs) can handle you should vote for this option - but sometimes you deal with awkward data and permutation tests may be the only thing to bail you out...
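
One common way to set up such a test, sketched with simulated data and not necessarily the scheme used in the post: under the null hypothesis the labels t1 and t2 are exchangeable within each plot, so the sign of each paired difference is flipped at random. Each plot stays within its own site, so the permutation respects the strata.

```r
set.seed(1)
# Simulated paired measurements on 20 plots (invented values)
t1 <- rnorm(20, mean = 10, sd = 2)
t2 <- t1 - 1 + rnorm(20, sd = 1)

d   <- t1 - t2          # paired differences
obs <- mean(d)          # observed test statistic

# Null distribution: randomly swap t1/t2 within each plot (= flip the sign of d)
n_perm <- 999
perm <- replicate(n_perm,
                  mean(d * sample(c(-1, 1), length(d), replace = TRUE)))

# One-sided (t1 - t2 > 0) and two-sided p-values, counting the observed value
p_one <- (sum(perm >= obs) + 1) / (n_perm + 1)
p_two <- (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)
c(one_sided = p_one, two_sided = p_two)
```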

20 Apr 2011

Bootstrap Confidence Intervals, Stratified Bootstrap

Here's a worked example for comparing group averages with bootstrap confidence intervals, allowing for different subsample sizes by using the strata argument of the bootstrap function.
The data (simulated) is set up analogous to a before-after impact experiment conducted on plots across 4 levels of a grouping factor ('stage'). Similarities were calculated for each composition before and after an impact and are averaged over the grouping factor. Our hypothesis was that the levels of the grouping factor would show significantly different average similarities - that is, a higher/lower impact on composition. As plots were aggregated in different sites within the 'stages', this dependency had to be allowed for by use of the "strata" argument in the boot call.

The conclusion from this simulated example would be that the average similarities at stages C and D are significantly different from stages A and B. That is, as the similarities are higher in C and D than in A and B, impact on composition is significantly lower in C and D.
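
A reduced sketch of a stratified bootstrap for one group mean with package boot. The data are simulated, the site labels and sizes are invented, and the worked example described here covers more than this single group.

```r
library(boot)

set.seed(42)
# Similarities of 30 plots from three sites of unequal size (invented)
dat <- data.frame(site = rep(c("s1", "s2", "s3"), times = c(10, 15, 5)),
                  sim  = runif(30, 0.3, 0.9))

# Statistic: mean similarity of the resampled rows
mean_fun <- function(d, i) mean(d$sim[i])

# Resampling is done within each site, so every bootstrap sample
# keeps the original number of plots per site
b <- boot(dat, mean_fun, R = 999, strata = factor(dat$site))

# Percentile confidence interval for the mean
ci <- boot.ci(b, type = "perc")
ci
```

Repeating this per stage and comparing the intervals gives the kind of group comparison described above.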

Custom Labels for Ordination Diagram

Here is how you do custom labels, hull, spider in a vegan ordination diagram:

19 Apr 2011

Lattice Plots - Usage of Panel Functions - Different Axes For Panel-Rows - Alternating Axis Titles

I present code for a stacked graph with common axes only for panels of the same row and with axis titles at different sides. This admittedly took me days (because I had not much of a clue how to use lattice), but eventually I did it and maybe someone can use this for his/her own purpose:

18 Apr 2011

Test Difference Between Diversity-Indices of Two Samples with Abundance Data

I adapted a permutation test from the PAST Software (Hammer & Harper, http://folk.uio.no/ohammer/past/diversity.html) that tests difference between diversity-indices of two samples with abundance data in R... See the example below:
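
A compact sketch of the general idea, under my reading of the approach and with invented abundances; it is not the adapted code from the post. All individuals of both samples are pooled, repeatedly dealt out at random into two samples of the original sizes, and the difference in Shannon index is recomputed each time.

```r
# Shannon index from a vector of abundances
shannon <- function(n) {
  p <- n[n > 0] / sum(n)
  -sum(p * log(p))
}

# Invented abundance data for two samples
a <- c(sp1 = 30, sp2 = 20, sp3 = 10, sp4 = 5)
b <- c(sp1 = 50, sp2 = 5,  sp3 = 3,  sp4 = 0)
obs <- abs(shannon(a) - shannon(b))

# Pool: one entry per individual, labelled with its species
pool <- c(rep(names(a), a), rep(names(b), b))

set.seed(1)
n_perm <- 999
perm <- replicate(n_perm, {
  idx <- sample(length(pool), sum(a))   # individuals assigned to sample 1
  abs(shannon(table(pool[idx])) - shannon(table(pool[-idx])))
})

# Share of permutations with a difference at least as large as observed
p <- (sum(perm >= obs) + 1) / (n_perm + 1)
p
```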

Multivariate Repeated Measurements with adonis():

Please check the updated code in the comment by Wallace Beiroz, 26 January 2015 at 13:37!

Lately I had to figure out how to do a repeated measures (or mixed effects) analysis on multivariate (species) data. Here I share code for a computation in R with the adonis function of the vegan package. Credit goes to Gavin Simpson for providing most of the important pieces of the code below on R-Help.