<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>English Blog on Daijiang Li</title>
    <link>https://blog.dlilab.com/en/</link>
    <description>Recent content in English Blog on Daijiang Li</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 31 Jan 2017 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://blog.dlilab.com/en/" rel="self" type="application/rss+xml" />
    
    <item>
      <title>macOS update issue</title>
      <link>https://blog.dlilab.com/en/2025/07/31/macos_update/</link>
      <pubDate>Thu, 31 Jul 2025 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2025/07/31/macos_update/</guid>
      <description>
        

&lt;p&gt;After updating macOS to Sequoia Version 15.6 (24G84), I noticed that I could not interact with Finder from any other application. For example, I could not export files from &lt;code&gt;Photos.app&lt;/code&gt;, upload files in &lt;code&gt;Google Chrome&lt;/code&gt; or &lt;code&gt;Safari&lt;/code&gt;, or exclude folders from &lt;code&gt;Time Machine&lt;/code&gt; backups. Whenever I clicked on the upload, export, or exclude dialog, the colorful wheel kept spinning, and after about 10 seconds nothing happened.&lt;/p&gt;

&lt;p&gt;I got really frustrated and was almost ready to downgrade the whole system to an earlier version. After some searching (AI answers did not help), I found this answer, which solved my issue: &lt;a href=&#34;https://apple.stackexchange.com/a/474043/&#34; target=&#34;_blank&#34;&gt;https://apple.stackexchange.com/a/474043/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Essentially, I just needed to remove the &lt;code&gt;FileProvider&lt;/code&gt; cache (the name makes sense now that I have run into this issue).&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;rm -r ~/Library/Application\ Support/FileProvider/*
&lt;/code&gt;&lt;/pre&gt;
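
&lt;p&gt;Here is a slightly more cautious variant. It assumes the same folder path as the linked answer. It lists what is in the folder first and moves the contents aside instead of deleting them, so they could be put back if something goes wrong. The backup folder &lt;code&gt;~/FileProvider-backup&lt;/code&gt; is only a placeholder name, and this variant is untested.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# list what is in the FileProvider folder first
ls ~/Library/Application\ Support/FileProvider/

# move the contents to a placeholder backup folder instead of deleting them
mkdir -p ~/FileProvider-backup
mv ~/Library/Application\ Support/FileProvider/* ~/FileProvider-backup/
&lt;/code&gt;&lt;/pre&gt;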

&lt;p&gt;Then restart the applications, and they can interact with the Finder file system again!!! This saved me a day.&lt;/p&gt;

&lt;p&gt;And, btw, we finally arrived in Madison, WI yesterday. I may write a short blog post about the whole process later.&lt;/p&gt;

&lt;h2 id=&#34;calendar-sync&#34;&gt;Calendar Sync&lt;/h2&gt;

&lt;p&gt;It had bothered me for a while that the Calendar app on my Mac did not sync everything on my work Outlook calendar. I made sure the Outlook Exchange account was logged in, but it still did not sync. Then I learned to restart the &lt;code&gt;exchangesyncd&lt;/code&gt; process: open Activity Monitor, find the &lt;code&gt;exchangesyncd&lt;/code&gt; process, and force quit it. It restarts automatically, and that resolved the sync issues!&lt;/p&gt;
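
&lt;p&gt;The same restart can probably be done from the terminal. This sketch assumes two things: the process is named exactly &lt;code&gt;exchangesyncd&lt;/code&gt;, as shown in Activity Monitor, and macOS relaunches it automatically as described above. Plain &lt;code&gt;killall&lt;/code&gt; sends SIGTERM, which is a normal quit. &lt;code&gt;killall -9&lt;/code&gt; sends SIGKILL, which is closer to Force Quit.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# show the process name and PID, if it is running
pgrep -l exchangesyncd

# quit it; add -9 to force quit instead
killall exchangesyncd

# a few seconds later it should be running again under a new PID
pgrep -l exchangesyncd
&lt;/code&gt;&lt;/pre&gt;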

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2025/07/31/macos_update/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>RStudio Server</title>
      <link>https://blog.dlilab.com/en/2024/08/14/rstudio-server/</link>
      <pubDate>Wed, 14 Aug 2024 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2024/08/14/rstudio-server/</guid>
      <description>
        &lt;p&gt;After moving the workstation from LSU to the University of Arizona, I could not access it after plugging it into the Ethernet. So I asked the university IT group (UITS) for help. They opened port &lt;code&gt;8787&lt;/code&gt; on the university network, and I could then access it via &lt;code&gt;https://ip_address:8787&lt;/code&gt; with the university VPN. Everything worked well.&lt;/p&gt;

&lt;p&gt;That lasted until we needed to move it to a nearby room, since it had initially been placed in another professor&amp;rsquo;s office. After moving it to my office, I could not access it anymore. We then tried putting it back in that professor&amp;rsquo;s office, still with no luck!! This was quite puzzling, as we had not changed anything.&lt;/p&gt;

&lt;p&gt;The UITS team came and confirmed that everything was fine on the university network side. So the problem had to be on the workstation. Sure enough, it was, and the following command solved it. Sort of weird, as we did not change anything!&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;sudo firewall-cmd --permanent --add-port 8787/tcp
&lt;/code&gt;&lt;/pre&gt;
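
&lt;p&gt;One detail may matter here. With &lt;code&gt;--permanent&lt;/code&gt;, &lt;code&gt;firewall-cmd&lt;/code&gt; writes the rule to the saved configuration only, and the running firewall picks it up after a reload (or a reboot). Rules are also attached to a zone. So one possible, unconfirmed explanation for why access broke even though nothing changed is that the network interface ended up in a different zone after the move. Here is a sketch to apply and check the rule, assuming firewalld with its default zone setup:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# save the rule and load it into the running firewall
sudo firewall-cmd --permanent --add-port=8787/tcp
sudo firewall-cmd --reload

# which zone(s) are the network interfaces in?
sudo firewall-cmd --get-active-zones

# ports open in the default zone (add --zone=NAME for another zone)
sudo firewall-cmd --list-ports
&lt;/code&gt;&lt;/pre&gt;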

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2024/08/14/rstudio-server/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Brief notes of the iDigBio workshop</title>
      <link>https://blog.dlilab.com/en/2024/06/10/idigbio-workshop/</link>
      <pubDate>Mon, 10 Jun 2024 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2024/06/10/idigbio-workshop/</guid>
      <description>
        

&lt;h2 id=&#34;advances-in-digital-media-workshop-series-yale-https-www-idigbio-org-wiki-index-php-advances-in-digital-media-workshop-series-yale&#34;&gt;&lt;a href=&#34;https://www.idigbio.org/wiki/index.php/Advances_in_Digital_Media_Workshop_Series:_Yale&#34; target=&#34;_blank&#34;&gt;Advances in Digital Media Workshop Series: Yale&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Here are just some of my very brief notes (pretty much just keywords).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LightningBug:

&lt;ul&gt;
&lt;li&gt;digitizing specimen labels using ML&lt;/li&gt;
&lt;li&gt;Meta&amp;rsquo;s segmentation tool Segment Anything Model (SAM) is good and faster than R-CNN&lt;/li&gt;
&lt;li&gt;200k images, 6.9k specimens&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Heritage Science

&lt;ul&gt;
&lt;li&gt;NSF Mid-scale research program&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.morphosource.org/&#34; target=&#34;_blank&#34;&gt;MorphoSource&lt;/a&gt;: 3D, 2D, AV media data repository

&lt;ul&gt;
&lt;li&gt;Maybe a good place to look for exemplary sites for PhenoBase&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Audiovisual Core&lt;/li&gt;
&lt;li&gt;Expanding LeafMachine2: new training data, models, and methods for processing herbarium specimens; Will Weaver, PhD Candidate, University of Michigan&lt;/li&gt;
&lt;li&gt;Detectron by Facebook to detect objects in images&lt;/li&gt;
&lt;li&gt;Imageomics?&lt;/li&gt;
&lt;li&gt;Phylogeny-guided neural network (phylo-NNs) Elhamod et al, KDD 2023&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://iiif.io/&#34; target=&#34;_blank&#34;&gt;IIIF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Phenotypic diversity

&lt;ul&gt;
&lt;li&gt;Phenological diversity&lt;/li&gt;
&lt;li&gt;Phenome space&lt;/li&gt;
&lt;li&gt;Segment Anything Model (SAM) + Grounding DINO&lt;/li&gt;
&lt;li&gt;t-SNE visualization for clustering data&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;More training data is not always better for ML models

&lt;ul&gt;
&lt;li&gt;If additional related but not present images are added&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Multimodal AI models

&lt;ul&gt;
&lt;li&gt;CLIP&lt;/li&gt;
&lt;li&gt;LMMs as effective rerankers&lt;/li&gt;
&lt;li&gt;INQUIRE: text-to-image search of iNaturalist images&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;2D to 3D reconstruction

&lt;ul&gt;
&lt;li&gt;Surface-to-volume ratio seems to be well preserved in sharks and snakes&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2024/06/10/idigbio-workshop/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Running R on HiperGator</title>
      <link>https://blog.dlilab.com/en/2024/05/06/hipergator-r/</link>
      <pubDate>Mon, 06 May 2024 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2024/05/06/hipergator-r/</guid>
      <description>
        

&lt;h2 id=&#34;the-problem&#34;&gt;The problem&lt;/h2&gt;

&lt;p&gt;How can I run R on HiperGator within my terminal? The interactive RStudio server works okay, but whenever you request a longer running time or more memory, you will wait much longer in the queue. I would prefer to just run &lt;code&gt;R CMD BATCH&lt;/code&gt; within my terminal.&lt;/p&gt;

&lt;h2 id=&#34;solution&#34;&gt;Solution&lt;/h2&gt;

&lt;p&gt;Update: just use &lt;code&gt;module load R&lt;/code&gt;; the steps below are no longer needed.&lt;/p&gt;

&lt;p&gt;It is probably documented somewhere by HiperGator. I just could not find it easily.&lt;/p&gt;

&lt;p&gt;Here are the steps I followed.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Log in to a HiperGator terminal and install &lt;a href=&#34;https://github.com/conda-forge/miniforge&#34; target=&#34;_blank&#34;&gt;&lt;code&gt;miniforge&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;mamba&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Exit the terminal and log back in&lt;/li&gt;
&lt;li&gt;In terminal, run &lt;code&gt;mamba create -n nameofmyenvi r-essentials r-base&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;add any additional packages you want to install, e.g., &lt;code&gt;r-torch&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;or install them later (after activating the environment) with &lt;code&gt;mamba install cuda-toolkit=11.8 pytorch&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mamba activate nameofmyenvi&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;In the terminal, type &lt;code&gt;R&lt;/code&gt;; R should now open

&lt;ul&gt;
&lt;li&gt;For long-running jobs, use &lt;code&gt;tmux&lt;/code&gt;: load it with &lt;code&gt;module load tmux&lt;/code&gt;, then run &lt;code&gt;tmux&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
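
&lt;p&gt;With the simpler &lt;code&gt;module load R&lt;/code&gt; route from the update above, a terminal session might look like the sketch below. The script name &lt;code&gt;myscript.R&lt;/code&gt; and the session name &lt;code&gt;rjob&lt;/code&gt; are placeholders. Clusters like HiperGator typically limit heavy computation on login nodes, so large jobs are better run in an allocated session or a scheduled job. Check the HiperGator documentation for the current policy.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;module load R

# optional: a tmux session keeps running after you disconnect
module load tmux
tmux new -s rjob

# inside tmux: run the script non-interactively; output goes to myscript.Rout
R CMD BATCH --no-save myscript.R myscript.Rout
&lt;/code&gt;&lt;/pre&gt;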

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2024/05/06/hipergator-r/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Problems with installing R package `arrow`</title>
      <link>https://blog.dlilab.com/en/2023/08/14/arrow-r/</link>
      <pubDate>Mon, 14 Aug 2023 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2023/08/14/arrow-r/</guid>
      <description>
        

&lt;h2 id=&#34;the-problem&#34;&gt;The problem&lt;/h2&gt;

&lt;p&gt;On the Linux server, I recently upgraded R to version 4.3.1. Today, when I tried to use the &lt;code&gt;arrow&lt;/code&gt; R package to read some large data files, I got the following error:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;Error: package or namespace load failed for ‘arrow’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object &#39;/home/dli/R/arrow/libs/arrow.so&#39;:
  libcrypto.so.1.1: cannot open shared object file: No such file or directory
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It seems that the file &lt;code&gt;libcrypto.so.1.1&lt;/code&gt; is missing (not sure why, as I have not changed the OS in the past couple of months).&lt;/p&gt;

&lt;h2 id=&#34;solution&#34;&gt;Solution&lt;/h2&gt;

&lt;p&gt;It seems that &lt;code&gt;libcrypto.so.1.1&lt;/code&gt; is provided by the &lt;code&gt;libssl1.1&lt;/code&gt; package. I browsed the available versions at &lt;a href=&#34;http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/?C=M;O=D&#34; target=&#34;_blank&#34;&gt;http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/?C=M;O=D&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I used the commands above to install the missing package. Problem solved. Sigh&amp;hellip; 🥲 🥲&lt;/p&gt;
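
&lt;p&gt;To check whether the library is actually visible before and after installing the package, the dynamic linker cache and &lt;code&gt;ldd&lt;/code&gt; can help. This sketch uses the &lt;code&gt;arrow.so&lt;/code&gt; path from the error message above; adjust it to your own library location. A &lt;code&gt;.deb&lt;/code&gt; installed by hand with &lt;code&gt;dpkg -i&lt;/code&gt; will generally not receive updates through &lt;code&gt;apt&lt;/code&gt;. Reinstalling &lt;code&gt;arrow&lt;/code&gt; against the system&amp;rsquo;s current OpenSSL may therefore be the better long-term fix.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# is libcrypto.so.1.1 in the dynamic linker cache?
ldconfig -p | grep libcrypto.so.1.1

# list the shared libraries arrow.so needs that cannot be found
ldd /home/dli/R/arrow/libs/arrow.so | grep &#39;not found&#39;
&lt;/code&gt;&lt;/pre&gt;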

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2023/08/14/arrow-r/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Git used wrong path of `gh`</title>
      <link>https://blog.dlilab.com/en/2023/06/26/git-gh/</link>
      <pubDate>Mon, 26 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2023/06/26/git-gh/</guid>
      <description>
        

&lt;h2 id=&#34;the-problem&#34;&gt;The problem&lt;/h2&gt;

&lt;p&gt;On the Linux server, I have installed &lt;code&gt;homebrew&lt;/code&gt; to manage software and installed &lt;code&gt;gh&lt;/code&gt; to manage GitHub authorization. It used to work well. Today, after not using it for a while, I tried to &lt;code&gt;git push&lt;/code&gt; commits to GitHub from there. However, it complained that it could not find the &lt;code&gt;gh&lt;/code&gt; binary.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;/home/linuxbrew/.linuxbrew/Cellar/gh/2.14.3/bin/gh auth git-credential get: 1: /home/linuxbrew/.linuxbrew/Cellar/gh/2.14.3/bin/gh: not found
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This seems to be a version issue: brew had updated &lt;code&gt;gh&lt;/code&gt; to a later version, yet &lt;code&gt;git push&lt;/code&gt; was somehow still using the old path.&lt;/p&gt;

&lt;h2 id=&#34;solution&#34;&gt;Solution&lt;/h2&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;gh auth setup-git
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The command above sets (or updates) &lt;code&gt;git&lt;/code&gt; to use the GitHub CLI &lt;code&gt;gh&lt;/code&gt; as the credential helper for all authenticated hosts. Problem solved.&lt;/p&gt;
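
&lt;p&gt;The stale &lt;code&gt;gh&lt;/code&gt; path was stored in git&amp;rsquo;s configuration, so it can be inspected directly. Here is a quick check, assuming the helper was written to the global git config:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# show every credential helper entry in the global git config
git config --global --get-regexp &#39;credential.*helper&#39;

# check which gh is on PATH and that it is logged in
which gh
gh auth status
&lt;/code&gt;&lt;/pre&gt;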

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2023/06/26/git-gh/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Tensorflow and R set up on server</title>
      <link>https://blog.dlilab.com/en/2022/11/04/tensorflow-r/</link>
      <pubDate>Fri, 04 Nov 2022 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2022/11/04/tensorflow-r/</guid>
      <description>
        &lt;p&gt;I was trying to set up TensorFlow and Keras on an Ubuntu server, and I wanted to interact with them through R. I came across some errors such as&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Error: Valid installation of TensorFlow not found.

ModuleNotFoundError: No module named &#39;_ctypes&#39;
&lt;/code&gt;&lt;/pre&gt;

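&lt;p&gt;After googling, here is the code I used to solve this issue, following the instructions &lt;a href=&#34;https://github.com/pyenv/pyenv/wiki#suggested-build-environment&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;. A missing &lt;code&gt;_ctypes&lt;/code&gt; module usually means Python was compiled without the &lt;code&gt;libffi&lt;/code&gt; development headers. That is why &lt;code&gt;libffi-dev&lt;/code&gt; is in the list below, and why Python needs to be rebuilt afterwards.&lt;/p&gt;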

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;sudo apt update

sudo apt install make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
libffi-dev liblzma-dev

# you may need to use apt-get
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, in R:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(reticulate)
path_to_python &amp;lt;- install_python(force = TRUE)
virtualenv_create(&amp;quot;r-reticulate&amp;quot;, python = path_to_python)
install.packages(&amp;quot;keras&amp;quot;)
install_keras(envname = &amp;quot;r-reticulate&amp;quot;)
tensorflow::tf_config()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It seems to work now.&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2022/11/04/tensorflow-r/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Library not found for `-lgfortran`</title>
      <link>https://blog.dlilab.com/en/2022/11/01/lgfortran/</link>
      <pubDate>Tue, 01 Nov 2022 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2022/11/01/lgfortran/</guid>
      <description>
        &lt;p&gt;After updating to macOS 13.0 (Ventura), I somehow got the following error when compiling an R package with C++ code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ld: warning: directory not found for option &#39;-L/usr/local/gfortran/lib&#39;    
ld: library not found for -lgfortran
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is probably because the Homebrew-installed &lt;code&gt;gfortran&lt;/code&gt; could no longer be found after the upgrade. I was in a rush and did not have time to figure this out. Instead, I just went to &lt;a href=&#34;https://github.com/fxcoudert/gfortran-for-macOS/releases&#34; target=&#34;_blank&#34;&gt;this webpage&lt;/a&gt;, downloaded the latest gfortran package, and installed it manually. After installation, I was able to compile the package again. Problem solved for now.&lt;/p&gt;
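
&lt;p&gt;For anyone who wants to dig further: the &lt;code&gt;-L/usr/local/gfortran/lib&lt;/code&gt; flag most likely comes from R&amp;rsquo;s own &lt;code&gt;FLIBS&lt;/code&gt; setting. That setting points at &lt;code&gt;/usr/local/gfortran&lt;/code&gt;, which is where the gfortran installer linked above puts its files. This would explain why installing it fixed the build. Here is a sketch for checking both sides:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# the Fortran linker flags R uses when compiling packages
R CMD config FLIBS

# which gfortran is on PATH, and where its libgfortran lives
which gfortran
gfortran -print-file-name=libgfortran.dylib
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the two disagree, another option is to override &lt;code&gt;FLIBS&lt;/code&gt; in &lt;code&gt;~/.R/Makevars&lt;/code&gt; so it points at the directory printed by the last command. That route is untested here.&lt;/p&gt;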

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2022/11/01/lgfortran/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Host R packages on r-universe</title>
      <link>https://blog.dlilab.com/en/2022/09/30/r-universe/</link>
      <pubDate>Fri, 30 Sep 2022 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2022/09/30/r-universe/</guid>
      <description>
        

&lt;h1 id=&#34;the-problem&#34;&gt;The problem&lt;/h1&gt;

&lt;p&gt;Here is the problem: I am developing an R package &lt;a href=&#34;https://github.com/daijiang/rtrees&#34; target=&#34;_blank&#34;&gt;&lt;code&gt;rtrees&lt;/code&gt;&lt;/a&gt;, which depends on a data package &lt;a href=&#34;https://github.com/daijiang/megatrees&#34; target=&#34;_blank&#34;&gt;&lt;code&gt;megatrees&lt;/code&gt;&lt;/a&gt; of around 100 MB. It is not possible to submit the data package to CRAN given its large size. In addition, CRAN does not allow packages with a &lt;code&gt;Remotes&lt;/code&gt; field (i.e., your package cannot depend on a package on GitHub). Therefore, I cannot submit &lt;code&gt;rtrees&lt;/code&gt; to CRAN.&lt;/p&gt;

&lt;h1 id=&#34;solution&#34;&gt;Solution&lt;/h1&gt;

&lt;p&gt;After searching around, I came across the &lt;a href=&#34;https://r-universe.dev/search/&#34; target=&#34;_blank&#34;&gt;R-universe program&lt;/a&gt; by rOpenSci. R-universe builds binary files for R packages and hosts them online. Basically, by following &lt;a href=&#34;https://ropensci.org/blog/2021/06/22/setup-runiverse/&#34; target=&#34;_blank&#34;&gt;its instructions&lt;/a&gt;, we can have our own personal CRAN-like repo hosting binaries for R packages without much trouble. Now, my data package is on &lt;a href=&#34;https://daijiang.r-universe.dev/ui#packages&#34; target=&#34;_blank&#34;&gt;my r-universe&lt;/a&gt;. And in the &lt;code&gt;DESCRIPTION&lt;/code&gt; file of &lt;code&gt;rtrees&lt;/code&gt;, I can replace &lt;code&gt;Remotes&lt;/code&gt; with the following line:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Additional_repositories: 
    https://daijiang.r-universe.dev
&lt;/code&gt;&lt;/pre&gt;

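&lt;p&gt;I think this should allow me to submit &lt;code&gt;rtrees&lt;/code&gt; to CRAN in the future. CRAN policy expects packages from such additional repositories to be listed under &lt;code&gt;Suggests&lt;/code&gt; (or &lt;code&gt;Enhances&lt;/code&gt;), so &lt;code&gt;megatrees&lt;/code&gt; would likely need to be an optional dependency. Since R-universe builds binaries (for Mac and Windows) for the R packages hosted there, large packages are now pretty fast to install.&lt;/p&gt;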
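
&lt;p&gt;For users, installing a package from an r-universe repo only requires listing that repo alongside CRAN. Here is a sketch from the shell; the CRAN mirror URL is just one possible choice:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# install megatrees from the r-universe repo, with CRAN for its dependencies
Rscript -e &#39;install.packages(&amp;quot;megatrees&amp;quot;, repos = c(&amp;quot;https://daijiang.r-universe.dev&amp;quot;, &amp;quot;https://cloud.r-project.org&amp;quot;))&#39;
&lt;/code&gt;&lt;/pre&gt;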

&lt;h1 id=&#34;shinny-app&#34;&gt;Shiny App&lt;/h1&gt;

&lt;p&gt;When I deployed the &lt;a href=&#34;https://djli.shinyapps.io/rtrees_shiny/&#34; target=&#34;_blank&#34;&gt;Shiny app of &lt;code&gt;rtrees&lt;/code&gt;&lt;/a&gt;, shinyapps.io did not recognize r-universe and returned an error. To deploy it, I had to reinstall the package from GitHub using &lt;code&gt;remotes::install_github()&lt;/code&gt;. This is because, when deploying a Shiny app, packages are installed the same way you installed them locally. If I installed the package from r-universe, deployment tries to get it from r-universe too; if I installed it from GitHub, deployment also installs it from GitHub.&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2022/09/30/r-universe/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Weird R issue caused by messed up BLAS/LAPACK libraries</title>
      <link>https://blog.dlilab.com/en/2022/08/22/blas-lapack/</link>
      <pubDate>Mon, 22 Aug 2022 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2022/08/22/blas-lapack/</guid>
      <description>
        &lt;p&gt;Today, the server showed some really bizarre behavior: when I ran the same simple linear regression in R multiple times, the results were totally different! How is this possible? There is no randomness in linear regression; it is deterministic!!&lt;/p&gt;

&lt;p&gt;I had no idea what was going on, so I posted it on &lt;a href=&#34;https://stackoverflow.com/questions/73451244/why-running-the-same-r-lm-code-gives-different-results/73452500#73452500&#34; target=&#34;_blank&#34;&gt;Stack Overflow&lt;/a&gt;. With some help from others, I thought the issue might come from the &lt;a href=&#34;https://csantill.github.io/RPerformanceWBLAS/&#34; target=&#34;_blank&#34;&gt;BLAS/LAPACK libraries&lt;/a&gt; on the server.&lt;/p&gt;

&lt;p&gt;At the time, I had the Intel MKL version on the computer.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So I switched to the &lt;code&gt;OpenBLAS&lt;/code&gt; library, as it &lt;a href=&#34;https://csantill.github.io/RPerformanceWBLAS/&#34; target=&#34;_blank&#34;&gt;has similar speed to MKL&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After restarting R, the problem is solved!! What a weird one.&lt;/p&gt;
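
&lt;p&gt;Note that &lt;code&gt;update-alternatives --config&lt;/code&gt; is interactive: it lists the installed libraries and asks for a selection number. The sketch below should let you inspect the current choice without changing anything, and check the fix from R. The test data are simulated, not the original data.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# show the current BLAS/LAPACK alternatives without changing them
update-alternatives --display libblas.so.3-x86_64-linux-gnu
update-alternatives --display liblapack.so.3-x86_64-linux-gnu

# which libraries does R report using?
Rscript -e &#39;sessionInfo()&#39; | grep -E &#39;BLAS|LAPACK&#39;

# fit the same model twice; this should print [1] TRUE if results are reproducible
Rscript -e &#39;set.seed(1); x = rnorm(1e4); y = 2 * x + rnorm(1e4); identical(coef(lm(y ~ x)), coef(lm(y ~ x)))&#39;
&lt;/code&gt;&lt;/pre&gt;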

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2022/08/22/blas-lapack/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Useful Vim commands</title>
      <link>https://blog.dlilab.com/en/2022/03/09/vim-command/</link>
      <pubDate>Wed, 09 Mar 2022 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2022/03/09/vim-command/</guid>
      <description>
        

&lt;h1 id=&#34;vim-commands&#34;&gt;Vim commands&lt;/h1&gt;

&lt;h2 id=&#34;normal-mode-to-insert-mode&#34;&gt;Normal mode to insert mode&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;i&lt;/code&gt;: insert text just before the cursor.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;I&lt;/code&gt;: insert text at the start of the line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;a&lt;/code&gt;: append text just after the cursor.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;A&lt;/code&gt;: append text at the end of the line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;o&lt;/code&gt;: open a new line below.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;O&lt;/code&gt;: open a new line above.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s&lt;/code&gt;: substitute the current character.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;S&lt;/code&gt;: substitute the current line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;r&lt;/code&gt;: replace the current character.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;R&lt;/code&gt;: enter Replace mode and overwrite characters continuously.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;move-cursor-around&#34;&gt;Move cursor around&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;0&lt;/code&gt;: move to the start of the line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;^&lt;/code&gt;: move to the first non-blank character of the line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$&lt;/code&gt;: move to the end of the line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-b&lt;/code&gt;: move back one screen.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-f&lt;/code&gt;: move forward one screen.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;H&lt;/code&gt;: jump as high as possible, i.e. the first line of the window.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;M&lt;/code&gt;: jump to the middle of the window.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;L&lt;/code&gt;: jump to the lowest line of the window.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;G&lt;/code&gt;: jump to the end of the file.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1G&lt;/code&gt; or &lt;code&gt;gg&lt;/code&gt;: jump to the start of the file.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;30G&lt;/code&gt;: jump to line 30.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;w&lt;/code&gt;: move to the start of the next word; &lt;code&gt;2w&lt;/code&gt; moves two words.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;e&lt;/code&gt;: move to the end of next word.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;b&lt;/code&gt;: move backward one word.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;(&lt;/code&gt;: move to previous sentence.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;)&lt;/code&gt;: move to next sentence.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;{&lt;/code&gt;: move to previous paragraph.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;}&lt;/code&gt;: move to next paragraph.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-o&lt;/code&gt;: jump backward to previous location.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-i&lt;/code&gt;: jump forward to next location.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ma&lt;/code&gt;: mark the current position. After moving elsewhere, use &lt;code&gt;&#39;a&lt;/code&gt; (single quote then a) to come back to the start of the marked line, or &lt;code&gt;`a&lt;/code&gt; (backtick then a) to jump to the exact marked position. Any letter a-z or A-Z works, e.g. &lt;code&gt;mb&lt;/code&gt; with &lt;code&gt;&#39;b&lt;/code&gt; or &lt;code&gt;`b&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;%&lt;/code&gt;: jump to corresponding item, e.g. from left brace to the right brace&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;visual-mode&#34;&gt;Visual mode&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;v&lt;/code&gt; then &lt;code&gt;ap&lt;/code&gt;: (a paragraph) select the paragraph the cursor is on.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;v&lt;/code&gt; then &lt;code&gt;aw&lt;/code&gt;: select the word the cursor is on.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;v&lt;/code&gt; then &lt;code&gt;a&amp;quot;&lt;/code&gt;: select the whole quoted word/sentence the cursor is within.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;v&lt;/code&gt; then &lt;code&gt;ab&lt;/code&gt;: select a block of text within parentheses (&lt;code&gt;aB&lt;/code&gt; for braces, &lt;code&gt;a[&lt;/code&gt; for brackets).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;~&lt;/code&gt;: switch cases of letters, i.e. upper to lower, lower to upper.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;V&lt;/code&gt;: visual mode with lines.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;edit-text&#34;&gt;Edit text&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;d&lt;/code&gt;: delete and put text into clipboard.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dd&lt;/code&gt;: delete the current line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dl&lt;/code&gt; or &lt;code&gt;x&lt;/code&gt;: delete the current letter where the cursor is.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dw&lt;/code&gt;: delete from the cursor to the start of the next word.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;d$&lt;/code&gt;: delete text after the cursor of the current line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;d0&lt;/code&gt;: delete text before the cursor of the current line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dh&lt;/code&gt;, &lt;code&gt;dj&lt;/code&gt;, &lt;code&gt;dk&lt;/code&gt;, &lt;code&gt;d2ap&lt;/code&gt;, &lt;code&gt;d2w&lt;/code&gt;, &lt;code&gt;d3l&lt;/code&gt;, &lt;code&gt;24h&lt;/code&gt;, &lt;code&gt;d5j&lt;/code&gt;, etc.: combine counts with operators and motions.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;y&lt;/code&gt;: yank or copy text&lt;/li&gt;
&lt;li&gt;&lt;code&gt;yy&lt;/code&gt;: yank the current line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;yap&lt;/code&gt;: yank the current paragraph&lt;/li&gt;
&lt;li&gt;&lt;code&gt;p&lt;/code&gt;: paste text after cursor position.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;P&lt;/code&gt;: paste text before cursor position.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;xp&lt;/code&gt;: cut then paste after cursor, so swap two characters.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dwwP&lt;/code&gt;: cut one word, move to next word, paste before that word, so swap two words&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.&lt;/code&gt;: repeat the last action. To repeat a series of actions, use &lt;code&gt;qa&lt;/code&gt; to start recording a macro, make the changes, then press &lt;code&gt;q&lt;/code&gt; to stop recording. On another line, use &lt;code&gt;@a&lt;/code&gt; to apply the same changes there. Other registers work too, e.g. &lt;code&gt;qb&lt;/code&gt;, &lt;code&gt;qc&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;u&lt;/code&gt;: undo the last change.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-r&lt;/code&gt;: redo the undo.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:earlier 5m&lt;/code&gt;: go back to the state of five minutes ago, i.e. a time machine.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:later 45s&lt;/code&gt;: forward in time&amp;hellip;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;5u&lt;/code&gt;: undo the last five changes (&lt;code&gt;:undo 5&lt;/code&gt; instead jumps to the state after change number 5).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:noh&lt;/code&gt;: no highlight after search.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:undolist&lt;/code&gt;: view the undo tree.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;search&#34;&gt;Search&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/word&lt;/code&gt;: to move to the first occurrence of word.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;n&lt;/code&gt;: go to next occurrence.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;N&lt;/code&gt;: go to previous occurrence.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/\&amp;lt;word\&amp;gt;&lt;/code&gt;: search word exactly.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/\d*&lt;/code&gt;: search for zero or more digits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;search-and-replace&#34;&gt;Search and replace&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;:s/search/replace/g&lt;/code&gt;: search and replace on the current line.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:%s/search/replace/g&lt;/code&gt;: search and replace on all lines.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:%s/search/replace/gc&lt;/code&gt;: same, but ask for confirmation at each match.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;multiple-files&#34;&gt;Multiple files&lt;/h2&gt;

&lt;h3 id=&#34;multiple-sections&#34;&gt;Multiple sections&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;:set foldmethod=indent&lt;/code&gt;, then on an indented line, use &lt;code&gt;zc&lt;/code&gt; to close the fold (compress), &lt;code&gt;zo&lt;/code&gt; to open the fold, or &lt;code&gt;za&lt;/code&gt; to toggle between closed and open.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;multiple-files-1&#34;&gt;Multiple files&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;:edit file1&lt;/code&gt;, &lt;code&gt;:e file2&lt;/code&gt;, then use &lt;code&gt;:b 1&lt;/code&gt; to go to file 1 and &lt;code&gt;:b 2&lt;/code&gt; to go to file 2. &lt;em&gt;b&lt;/em&gt; means buffer. Use &lt;code&gt;:ls&lt;/code&gt; to list all open buffers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;multiple-windows&#34;&gt;Multiple windows&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;:new&lt;/code&gt;: to open a new window.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-w h/j/k/l&lt;/code&gt; or &lt;code&gt;ctrl-w ctrl-w&lt;/code&gt;: to move among windows.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:sp&lt;/code&gt;: to split the current window, or use &lt;code&gt;ctrl-w s&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:vsp&lt;/code&gt;: to split the window vertically, or use &lt;code&gt;ctrl-w v&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-w r&lt;/code&gt;: to rotate positions of windows.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-w K&lt;/code&gt;: move current window to topmost position.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:resize +10&lt;/code&gt; or &lt;code&gt;:resize -10&lt;/code&gt;: make the window 10 lines taller or shorter (&lt;code&gt;:resize 10&lt;/code&gt; sets the height to 10 lines).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-w _&lt;/code&gt;: increase current window size as much as possible.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-w =&lt;/code&gt;: make all windows same size.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;multiple-tabs&#34;&gt;Multiple tabs&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;:tabnew&lt;/code&gt;: to open a new tab&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gt&lt;/code&gt;: go to next tab.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gT&lt;/code&gt;: go to previous tab.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;:tabmove&lt;/code&gt;: to reorder tabs, e.g. &lt;code&gt;:tabmove 0&lt;/code&gt; moves the current tab to the first position.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;others&#34;&gt;Others&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;:!&lt;/code&gt;: to run shell commands within vim.&lt;/li&gt;
&lt;/ul&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2022/03/09/vim-command/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Add Multiple Passport Photos on One Page using R</title>
      <link>https://blog.dlilab.com/en/2018/01/10/add-multiple-passport-photos-on-one-page-using-r/</link>
      <pubDate>Wed, 10 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2018/01/10/add-multiple-passport-photos-on-one-page-using-r/</guid>
      <description>
        &lt;p&gt;In this post, I document how to put multiple photos on one page to save paper.&lt;/p&gt;

&lt;p&gt;First, after taking the photos, I edited them with &lt;a href=&#34;https://www.gimp.org&#34; target=&#34;_blank&#34;&gt;GIMP&lt;/a&gt;: adjusted the light and color, and cropped them to the desired area. Then we need to &lt;a href=&#34;https://docs.gimp.org/en/gimp-image-scale.html&#34; target=&#34;_blank&#34;&gt;scale the cropped photo to the specified size&lt;/a&gt;. To do this in GIMP, I selected &lt;code&gt;Image/Scale Image&lt;/code&gt;, which lets us scale the photo to the required size and specify the resolution in different units (e.g. pixels/inch, pixels/mm). If you have photos of more than one kid, make sure that all photos have the same size and resolution; this makes later steps easier. It is also useful to check (and scale if necessary) the &lt;a href=&#34;https://docs.gimp.org/en/gimp-image-print-size.html&#34; target=&#34;_blank&#34;&gt;print size&lt;/a&gt;. After scaling the photo, &lt;a href=&#34;https://docs.gimp.org/en/gimp-export-dialog.html&#34; target=&#34;_blank&#34;&gt;export&lt;/a&gt; it as an external file. Since I have two kids, I ended up with two photos (same size and resolution) in my folder after these steps.&lt;/p&gt;

&lt;p&gt;Time to use R. Specifically, the &lt;a href=&#34;https://cran.r-project.org/web/packages/magick/index.html&#34; target=&#34;_blank&#34;&gt;&lt;code&gt;magick&lt;/code&gt;&lt;/a&gt; package did all the heavy lifting.&lt;/p&gt;

&lt;p&gt;First, read the photos into R.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(magick)
pic1 = image_read(&amp;quot;pic1.jpg&amp;quot;)
pic2 = image_read(&amp;quot;pic2.jpg&amp;quot;)
image_info(pic1) # width and height in pixels
image_info(pic2) # should match pic1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To put multiple photos together, we can use the &lt;code&gt;magick::image_append()&lt;/code&gt; function. This function, however, does not have an argument to specify the space between photos. Thus we need to create a blank image as a separator.&lt;/p&gt;

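&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# a white image, 100 pixels wide and as tall as the photos, used as a gap
sep = image_graph(width = 100, height = image_info(pic1)$height, 
                  bg = &amp;quot;white&amp;quot;)
plot(1, type = &amp;quot;n&amp;quot;, axes = FALSE, xlab = &amp;quot;&amp;quot;, ylab = &amp;quot;&amp;quot;) # draw nothing
dev.off() # close the device; sep now holds the blank image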
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Great, now we are ready to put them together.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;pic1s = image_append(c(pic1, sep, pic1, sep, pic1, sep, pic1))
pic2s = image_append(c(pic2, sep, pic2, sep, pic2, sep, pic2))
# stack both
both = image_append(c(pic1s, pic2s), stack = TRUE)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here I put four copies of each photo in a row. You can adjust the code above if you want a different number.&lt;/p&gt;
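
&lt;p&gt;For a different number of copies, a small helper can build each row. This is a hypothetical sketch rather than the code used above. &lt;code&gt;tile_photo()&lt;/code&gt; is an illustrative name, and the sketch assumes your &lt;code&gt;magick&lt;/code&gt; version provides &lt;code&gt;image_blank()&lt;/code&gt;. That function creates a plain white separator directly, instead of drawing an empty plot with &lt;code&gt;image_graph()&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(magick)

# Hypothetical helper: tile one photo n times in a row,
# with a white gap of `gap` pixels between the copies.
tile_photo = function(img, n = 4, gap = 100) {
  height = image_info(img)$height[1]
  # plain white separator as tall as the photo
  sep = image_blank(width = gap, height = height, color = 'white')
  # rep() gives sep, img, sep, img, ...; dropping the first element
  # leaves img, sep, img, ..., img (2 * n - 1 pieces)
  pieces = rep(list(sep, img), n)[-1]
  image_append(do.call(c, pieces))
}

# Usage with the photos read in earlier:
# row1 = tile_photo(pic1, n = 4)
# row2 = tile_photo(pic2, n = 4)
# both = image_append(c(row1, row2), stack = TRUE)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the separator is built from the photo height, the rows stay aligned as long as all photos share the same size.&lt;/p&gt;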

&lt;p&gt;Finally, save the image to the disk.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;image_write(both, path = &amp;quot;both.jpg&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Check it out! We now have multiple photos on one page. One additional (optional) step is to open the new &lt;code&gt;both.jpg&lt;/code&gt; in GIMP and &lt;a href=&#34;https://docs.gimp.org/en/gimp-image-resize.html&#34; target=&#34;_blank&#34;&gt;set the canvas size&lt;/a&gt;. I set it to 6 by 4 inches and exported it.&lt;/p&gt;

&lt;p&gt;That&amp;rsquo;s it. Super simple but very useful.&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2018/01/10/add-multiple-passport-photos-on-one-page-using-r/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Fetching phylogenies from Phylomatic with R</title>
      <link>https://blog.dlilab.com/en/2017/08/25/r-phylomatic/</link>
      <pubDate>Fri, 25 Aug 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/08/25/r-phylomatic/</guid>
      <description>
&lt;p&gt;It is usually a good idea to control for species&amp;rsquo; evolutionary history if we want robust results. This is because species are not independent of each other, so data from related species violate the independence assumption of most statistical models. Fortunately, with growing amounts of genetic data and better software, building phylogenies is getting easier and easier.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://phylodiversity.net/phylomatic/&#34; target=&#34;_blank&#34;&gt;Phylomatic&lt;/a&gt; is an easy way to fetch phylogenies for species, especially plants, online. Thanks to packages developed by &lt;a href=&#34;https://ropensci.org&#34; target=&#34;_blank&#34;&gt;rOpenSci&lt;/a&gt;, we can now use Phylomatic within R. One big advantage of this is reproducibility: we can regenerate the phylogeny whenever we want without clicking buttons on the website. In addition, because most ecologists use R for downstream analyses, fetching phylogenies within R makes the workflow more natural and easier to follow.&lt;/p&gt;

&lt;p&gt;The basic procedure for fetching phylogenies with Phylomatic using R will be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compile the species names we want to include in the phylogeny; and clean if necessary (&lt;code&gt;taxize&lt;/code&gt; package, &lt;code&gt;rotl::tnrs_match_names()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Clean and prepare species names in the format to be used with Phylomatic (&lt;code&gt;brranching::phylomatic_names()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Query Phylomatic and return the phylogeny (&lt;code&gt;brranching::phylomatic()&lt;/code&gt;; if you have hundreds of species, it is better to use Phylomatic locally with &lt;code&gt;brranching::phylomatic_local()&lt;/code&gt;)&lt;sup class=&#34;footnote-ref&#34; id=&#34;fnref:Another-option-t&#34;&gt;&lt;a rel=&#34;footnote&#34; href=&#34;#fn:Another-option-t&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It is possible to merge steps 2 and 3, but I prefer to keep them separate.&lt;/p&gt;

&lt;p&gt;I assume that you already have a list of species named &lt;code&gt;sp_list&lt;/code&gt;. Then we can use the &lt;code&gt;phylomatic()&lt;/code&gt; function from the &lt;code&gt;brranching&lt;/code&gt; package. If you do not have it installed, install it first with &lt;code&gt;install.packages(&amp;quot;brranching&amp;quot;)&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;sp_list = c() # fill in your species names, as &amp;quot;Genus species&amp;quot; strings
tree = brranching::phylomatic(sp_list)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you have only a few species, this will likely give you a phylogeny with all of them. In practice, however, you may well get a note like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;NOTE: 3 taxa not matched: NA/genus/species, ...
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this case, we can try to prepare the species names first with &lt;code&gt;brranching::phylomatic_names()&lt;/code&gt;. The default database is &lt;code&gt;ncbi&lt;/code&gt;, but it can be slow if you have hundreds of species. Instead, I suggest using &lt;code&gt;apg&lt;/code&gt; first because it is much faster (this is the default within &lt;code&gt;brranching::phylomatic()&lt;/code&gt;). Then filter out the species that have &lt;code&gt;NA&lt;/code&gt; as their family and try &lt;code&gt;ncbi&lt;/code&gt; or &lt;code&gt;itis&lt;/code&gt; for them (these are the three supported databases). If your species names are not clean, e.g. they contain synonyms, the R package &lt;code&gt;taxize&lt;/code&gt; is really handy. I also find &lt;code&gt;rotl::tnrs_match_names()&lt;/code&gt; useful for checking and resolving names: it matches them against the &lt;a href=&#34;https://tree.opentreeoflife.org/opentree/argus/opentree9.1@ott93302&#34; target=&#34;_blank&#34;&gt;Open Tree of Life&lt;/a&gt; taxonomy.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;sp_list_phylocom = brranching::phylomatic_names(sp_list, 
                                                format = &amp;quot;isubmit&amp;quot;, 
                                                db = &amp;quot;ncbi&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, let&amp;rsquo;s try to fetch the phylogeny again, with the updated species list.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;tree = brranching::phylomatic(sp_list_phylocom)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As mentioned earlier, it is possible to merge these two steps into one with &lt;code&gt;tree = brranching::phylomatic(sp_list, db = &amp;quot;ncbi&amp;quot;)&lt;/code&gt;, but I prefer to resolve species names first.&lt;/p&gt;

&lt;p&gt;The default backbone phylogeny is the APG III tree &lt;code&gt;R20120829&lt;/code&gt;. We can use the Zanne et al. (2014) phylogeny instead via the &lt;code&gt;storedtree&lt;/code&gt; argument.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;tree = brranching::phylomatic(sp_list_phylocom, 
                              storedtree = &amp;quot;zanne2014&amp;quot;)
plot(tree)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally, I have a reproducible example on &lt;a href=&#34;https://github.com/daijiang/New_Phytologist_Appendix/blob/master/1-data.R&#34; target=&#34;_blank&#34;&gt;GitHub&lt;/a&gt; that shows how to use the &lt;code&gt;brranching&lt;/code&gt; package to get a phylogeny for plants. Feel free to check it out (and the associated &lt;a href=&#34;http://onlinelibrary.wiley.com/doi/10.1111/nph.14397/abstract&#34; target=&#34;_blank&#34;&gt;paper&lt;/a&gt; if you are interested)!&lt;/p&gt;
&lt;div class=&#34;footnotes&#34;&gt;

&lt;hr /&gt;

&lt;ol&gt;
&lt;li id=&#34;fn:Another-option-t&#34;&gt;Another option to use Phylomatic locally is to download Phylocom, which can also be used within R using package &lt;a href=&#34;https://github.com/ropensci/phylocomr&#34; target=&#34;_blank&#34;&gt;&lt;code&gt;phylocomr&lt;/code&gt;&lt;/a&gt; &lt;a class=&#34;footnote-return&#34; href=&#34;#fnref:Another-option-t&#34;&gt;↩&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/08/25/r-phylomatic/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>List of functions from tidyverse that I do not use often</title>
      <link>https://blog.dlilab.com/en/2017/08/05/func-tidyverse/</link>
      <pubDate>Sat, 05 Aug 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/08/05/func-tidyverse/</guid>
      <description>
        &lt;p&gt;I do not use these functions often, but they can be really useful for some tasks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ggplot2&lt;/code&gt; package:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;coord_cartesian(xlim = , ylim = )&lt;/code&gt; to &lt;em&gt;zoom in&lt;/em&gt; on part of a figure. This is different from &lt;code&gt;xlim()&lt;/code&gt; or &lt;code&gt;scale_x_continuous(limits = )&lt;/code&gt;: the latter simply drop data points outside the limits, so smoothers and other statistics change.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cut_width()&lt;/code&gt;, &lt;code&gt;cut_interval()&lt;/code&gt;, &lt;code&gt;cut_number()&lt;/code&gt; to convert a continuous variable to groups.&lt;/li&gt;
&lt;li&gt;ggplot by default drops categories without any values; to keep them, use &lt;code&gt;... + geom_bar() + scale_x_discrete(drop = FALSE)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;reorder a factor according to a numeric variable: &lt;code&gt;ggplot(data, aes(num_var, forcats::fct_reorder(factor_var, num_var))) + geom_point()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;remove a legend: &lt;code&gt;... + guides(fill = &amp;quot;none&amp;quot;)&lt;/code&gt; or &lt;code&gt;... + guides(color = &amp;quot;none&amp;quot;)&lt;/code&gt; (older ggplot2 versions used &lt;code&gt;FALSE&lt;/code&gt;, which is now deprecated)&lt;/li&gt;
&lt;li&gt;change legend rows: &lt;code&gt;... + guides(fill = guide_legend(nrow = 1))&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;change legend title: &lt;code&gt;... + labs(fill = &amp;quot;title&amp;quot;)&lt;/code&gt; or &lt;code&gt;... + labs(color = &amp;quot;title&amp;quot;)&lt;/code&gt; or &lt;code&gt;... + scale_fill_xxx(name = &amp;quot;title&amp;quot;)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;change axis tick labels: e.g. &lt;code&gt;... + scale_x_log10(labels = scales::dollar, breaks = ...)&lt;/code&gt;, or &lt;code&gt;labels = scales::wrap_format(10)&lt;/code&gt; to wrap long labels. Package &lt;code&gt;scales&lt;/code&gt; can be useful.&lt;/li&gt;
&lt;li&gt;draw maps: &lt;code&gt;... + geom_polygon(aes(group = group)) + coord_map(projection = &amp;quot;albers&amp;quot;, lat0 = 39, lat1 = 45)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;when writing a function for plotting, &lt;code&gt;aes_string()&lt;/code&gt; can be useful (recent ggplot2 versions recommend &lt;code&gt;aes(.data[[var]])&lt;/code&gt; instead).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;scale_x_continuous(expand = c(.1, .1))&lt;/code&gt; to expand the plot area so labels are not cut off.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;scale_x_discrete(limits = rev(levels(grp)))&lt;/code&gt; to reverse the order of a factor.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;p + xlab(NULL)&lt;/code&gt; to remove the x-axis title and its space.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tidyr&lt;/code&gt; package:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;complete()&lt;/code&gt; completes a data frame with missing combinations of data, turning implicit missing values into explicit missing values.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fill()&lt;/code&gt; fills missing values using the previous entry (last observation carried forward); useful when repeated values are omitted.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;convert = TRUE&lt;/code&gt; within &lt;code&gt;gather()&lt;/code&gt; and &lt;code&gt;spread()&lt;/code&gt; converts the generated columns to the correct types.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;extract()&lt;/code&gt; with regular expressions to extract part of a column.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dplyr&lt;/code&gt; package

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;transmute()&lt;/code&gt; will only keep generated variables.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;count()&lt;/code&gt; counts the number of observations (per group, if grouping variables are given).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;left_join(x, y, by = c(&amp;quot;a&amp;quot; = &amp;quot;b&amp;quot;))&lt;/code&gt; when the key variable has different names in x and y.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bind_rows(list)&lt;/code&gt; = &lt;code&gt;plyr::ldply(list)&lt;/code&gt;: stack a list into a data frame (this does not always work, e.g. &lt;code&gt;bind_rows(list(1:2, 3:4))&lt;/code&gt; fails but &lt;code&gt;ldply()&lt;/code&gt; works)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;stringr&lt;/code&gt; package

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;str_subset(words, &amp;quot;x$&amp;quot;)&lt;/code&gt; = &lt;code&gt;words[str_detect(words, &amp;quot;x$&amp;quot;)]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;str_count()&lt;/code&gt; counts the number of non-overlapping matches in each string, whereas &lt;code&gt;str_detect()&lt;/code&gt; only reports whether there is a match. &lt;code&gt;str_count(&amp;quot;abababa&amp;quot;, &amp;quot;aba&amp;quot;)&lt;/code&gt; returns 2.&lt;/li&gt;
&lt;li&gt;When you use a pattern that’s a string, it’s automatically wrapped into a call to &lt;code&gt;regex()&lt;/code&gt;. Call &lt;code&gt;regex()&lt;/code&gt; explicitly to set options such as &lt;code&gt;ignore_case = TRUE&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;forcats&lt;/code&gt; package

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fct_reorder()&lt;/code&gt;, &lt;code&gt;fct_reorder2()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fct_infreq()&lt;/code&gt;, &lt;code&gt;fct_rev()&lt;/code&gt;, &lt;code&gt;fct_recode()&lt;/code&gt;, &lt;code&gt;fct_collapse()&lt;/code&gt;, &lt;code&gt;fct_lump()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;purrr&lt;/code&gt; package

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;map(input, fun)&lt;/code&gt;, similar to &lt;code&gt;lapply()&lt;/code&gt;. When the input is a data frame, it applies &lt;code&gt;fun&lt;/code&gt; to each column and returns a list. To return an atomic vector instead, use &lt;code&gt;map_dbl()&lt;/code&gt;, &lt;code&gt;map_lgl()&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;when the input is a list, it works like &lt;code&gt;plyr::llply()&lt;/code&gt;; e.g. we can use &lt;code&gt;split(mtcars, mtcars$cyl)&lt;/code&gt; to get a list from a data frame.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;split(mtcars, mtcars$cyl) %&amp;gt;% map(~lm(mpg ~ wt, data = .))&lt;/code&gt; fits a linear model to each element of the list. &lt;code&gt;~&lt;/code&gt; is a shortcut for an anonymous function, i.e. &lt;code&gt;split(mtcars, mtcars$cyl) %&amp;gt;% map(function(df) lm(mpg ~ wt, data = df))&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;with the list of models from the point above named &lt;code&gt;models&lt;/code&gt;, &lt;code&gt;models %&amp;gt;% map(summary) %&amp;gt;% map_dbl(~.$r.squared)&lt;/code&gt; extracts the R&lt;sup&gt;2&lt;/sup&gt; of each model. We can also extract by name: &lt;code&gt;models %&amp;gt;% map(summary) %&amp;gt;% map_dbl(&amp;quot;r.squared&amp;quot;)&lt;/code&gt;. Sometimes we can even extract by position, e.g. &lt;code&gt;map_dbl(list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9)), 2)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
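
&lt;p&gt;Here is a small worked version of the &lt;code&gt;purrr&lt;/code&gt; notes above. It is a sketch using the built-in &lt;code&gt;mtcars&lt;/code&gt; data and assumes &lt;code&gt;purrr&lt;/code&gt; is installed; the names &lt;code&gt;models&lt;/code&gt;, &lt;code&gt;r2&lt;/code&gt; and &lt;code&gt;r2_fn&lt;/code&gt; are only illustrative. It fits one regression per cylinder group. It then extracts each R&lt;sup&gt;2&lt;/sup&gt; twice: once by name, and once with an explicit anonymous function.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(purrr)

# split mtcars by cylinder number and fit one model of mpg on weight per group
models = map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .x))
# summarise each model, then extract the r.squared element by name
r2 = map_dbl(map(models, summary), 'r.squared')
# the same thing with an explicit anonymous function
r2_fn = map_dbl(models, function(m) summary(m)$r.squared)
r2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because &lt;code&gt;map()&lt;/code&gt; keeps the names of its input, &lt;code&gt;r2&lt;/code&gt; is a numeric vector named after the cylinder groups (&lt;code&gt;4&lt;/code&gt;, &lt;code&gt;6&lt;/code&gt;, &lt;code&gt;8&lt;/code&gt;).&lt;/p&gt;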

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/08/05/func-tidyverse/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>fopenmp option of clang error</title>
      <link>https://blog.dlilab.com/en/2017/06/21/fopenmp-option-of-clang-error/</link>
      <pubDate>Wed, 21 Jun 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/06/21/fopenmp-option-of-clang-error/</guid>
      <description>
&lt;p&gt;When I tried to source an Rcpp file, I got the following error under macOS:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;clang: error: unsupported option &#39;-fopenmp&#39;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After a little Googling, I found &lt;a href=&#34;http://thecoatlessprofessor.com/programming/openmp-in-r-on-os-x/&#34; target=&#34;_blank&#34;&gt;this post&lt;/a&gt;, which in the end solved my problem. Briefly, I did the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Installed &lt;code&gt;Xcode&lt;/code&gt; from the App Store, or just the command line tools (which may be enough) in Terminal with &lt;code&gt;xcode-select --install&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Installed &lt;code&gt;llvm&lt;/code&gt; via &lt;code&gt;brew install llvm&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Downloaded and installed the &lt;code&gt;gfortran&lt;/code&gt; binary installer from &lt;a href=&#34;https://gcc.gnu.org/wiki/GFortranBinaries#MacOS&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;. Note: You will need to download the OS X El Capitan &lt;strong&gt;gfortran 6.1&lt;/strong&gt; binaries regardless of whether or not you are on macOS Sierra, which presently only offers gfortran 6.3.&lt;/li&gt;
&lt;li&gt;Downloaded and extracted &lt;code&gt;clang4&lt;/code&gt; to &lt;code&gt;/usr/local/clang4&lt;/code&gt; (overwrite it if it already exists: &lt;code&gt;sudo cp -r ~/Downloads/usr/local/clang4 /usr/local/clang4&lt;/code&gt;). See &lt;a href=&#34;http://thecoatlessprofessor.com/programming/openmp-in-r-on-os-x/#gui-clang4&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt; for more information.&lt;/li&gt;
&lt;li&gt;In Terminal, wrote the compiler settings to &lt;code&gt;~/.R/Makevars&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;mkdir -p ~/.R  # create the folder if it does not exist yet
# note: this overwrites any existing ~/.R/Makevars
cat &amp;lt;&amp;lt; EOF &amp;gt; ~/.R/Makevars
# The following statements are required to use the clang4 binary
CC=/usr/local/clang4/bin/clang
CXX=/usr/local/clang4/bin/clang++
LDFLAGS=-L/usr/local/clang4/lib
# End clang4 inclusion statements
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, the error is fixed (for now)!&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/06/21/fopenmp-option-of-clang-error/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Blog posts with academic styles</title>
      <link>https://blog.dlilab.com/en/2017/05/20/academic-posts/</link>
      <pubDate>Sat, 20 May 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/05/20/academic-posts/</guid>
      <description>
&lt;p&gt;Goal: I want to write academic-style blog posts with citations and cross-references of tables and figures, while managing the figure paths myself. The default setting of blogdown handles citations and cross-references pretty well thanks to &lt;a href=&#34;https://yihui.name&#34;&gt;Yihui&lt;/a&gt;’s awesome work on the &lt;a href=&#34;https://github.com/rstudio/bookdown&#34;&gt;&lt;code&gt;bookdown&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://github.com/rstudio/blogdown&#34;&gt;&lt;code&gt;blogdown&lt;/code&gt;&lt;/a&gt; packages, but the figures are nested too deeply. I just want to put all figures under &lt;code&gt;static/figures&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;After a bit of digging, &lt;a href=&#34;https://github.com/rbind/daijiang/blob/master/R/build_one.R&#34;&gt;I managed to do this&lt;/a&gt;. The main trick is to add a &lt;code&gt;knitr&lt;/code&gt; setup chunk to the Rmd file and then render it with &lt;code&gt;blogdown::render_page()&lt;/code&gt;, based on &lt;a href=&#34;https://github.com/rbind/yihui/tree/master/R&#34;&gt;Yihui’s setup&lt;/a&gt;. If a post does not have any figures, it skips the first step and goes directly to &lt;code&gt;blogdown::render_page()&lt;/code&gt;. I did not look through all the functions available in the &lt;code&gt;blogdown&lt;/code&gt; package, and there is probably a better way to do this, but it gives me what I want for now.&lt;/p&gt;
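&lt;p&gt;The kind of setup chunk meant here can be sketched as follows. This is a minimal sketch, not a copy of the linked &lt;code&gt;build_one.R&lt;/code&gt;. It assumes the working directory is the Hugo site root, since Hugo serves files under &lt;code&gt;static/&lt;/code&gt; from the site root. The post folder name &lt;code&gt;2017-05-20-ms&lt;/code&gt; is only an example. The &lt;code&gt;knitr&lt;/code&gt; option &lt;code&gt;base.dir&lt;/code&gt; controls where plots are written, and &lt;code&gt;base.url&lt;/code&gt; is prepended to the image links.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# hypothetical post folder; one folder per post keeps figures apart
post_dir = '2017-05-20-ms'
knitr::opts_knit$set(
  base.dir = file.path(getwd(), 'static'),  # plots are written under static/
  base.url = '/'                            # and linked from the site root
)
knitr::opts_chunk$set(fig.path = paste0('figures/en/', post_dir, '/'))
# a chunk labelled `pressure` should now write
# static/figures/en/2017-05-20-ms/pressure-1.png
# and link it as /figures/en/2017-05-20-ms/pressure-1.png&lt;/code&gt;&lt;/pre&gt;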
&lt;hr /&gt;
&lt;div id=&#34;citations&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Citations&lt;/h2&gt;
&lt;p&gt;For citations, put the BibTeX file in the same folder as the post, and then add &lt;code&gt;bibliography: ref.bib&lt;/code&gt; to the YAML header. You can even define the citation style via &lt;code&gt;csl: url_of_csl_file&lt;/code&gt; in the YAML.&lt;a href=&#34;#fn1&#34; class=&#34;footnoteRef&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; Thousands of CSL files are available in the &lt;a href=&#34;https://github.com/citation-style-language/styles&#34;&gt;GitHub CSL repository&lt;/a&gt;. Find one you like and paste its URL in the YAML.&lt;/p&gt;
&lt;p&gt;Testing paragraph: Invasion of non-native species, one of the most widespread and harmful consequences of global change, is causing worldwide ecosystem degradation and economic loss &lt;span class=&#34;citation&#34;&gt;(Vilà &lt;em&gt;et al.&lt;/em&gt;, &lt;a href=&#34;#ref-vila2011ecological&#34;&gt;2011&lt;/a&gt;; Simberloff &lt;em&gt;et al.&lt;/em&gt;, &lt;a href=&#34;#ref-simberloff2013impacts&#34;&gt;2013&lt;/a&gt;)&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;math-equations&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Math equations&lt;/h2&gt;
&lt;p&gt;Here is an inline equation &lt;span class=&#34;math inline&#34;&gt;\(a^2 + b^2 = c^2\)&lt;/span&gt; and a display equation:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\]&lt;/span&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;r-code-chunk&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;R code chunk&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(cars)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;including-plots-and-cross-refer-it-back&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Including plots and cross-referencing them&lt;/h2&gt;
&lt;p&gt;You can also embed plots and cross-reference them with &lt;code&gt;\@ref(fig:figure-label)&lt;/code&gt;, for example Figure &lt;a href=&#34;#fig:pressure&#34;&gt;1&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(pressure)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:pressure&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://blog.dlilab.com/figures/en/2017-05-20-ms/pressure-1.png&#34; alt=&#34;Here is figure caption.&#34; width=&#34;576&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Here is figure caption.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;table-and-cross-reference&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Table and cross-reference&lt;/h2&gt;
&lt;p&gt;You can also print tables and cross-reference them with &lt;code&gt;\@ref(tab:table-label)&lt;/code&gt;, for example Table &lt;a href=&#34;#tab:pressure2&#34;&gt;1&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;knitr::kable(head(pressure), caption = &amp;quot;Table legend.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:pressure2&#34;&gt;Table 1: &lt;/span&gt;Table legend.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;right&#34;&gt;temperature&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;pressure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;20&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0012&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;40&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0060&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;60&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;80&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0900&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;100&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2700&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;That’s it.&lt;/p&gt;
&lt;p&gt;Any suggestions to improve this workflow? Comment below or send me a pull request. Thanks.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-simberloff2013impacts&#34;&gt;
&lt;p&gt;Simberloff, D., Martin, J.-L., Genovesi, P., Maris, V., Wardle, D.A., Aronson, J., Courchamp, F., Galil, B., García-Berthou, E., Pascal, M. &amp;amp; others (2013) Impacts of biological invasions: What’s what and the way forward. &lt;em&gt;Trends in ecology &amp;amp; evolution&lt;/em&gt;, &lt;strong&gt;28&lt;/strong&gt;, 58–66.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-vila2011ecological&#34;&gt;
&lt;p&gt;Vilà, M., Espinar, J.L., Hejda, M., Hulme, P.E., Jarošík, V., Maron, J.L., Pergl, J., Schaffner, U., Sun, Y. &amp;amp; Pyšek, P. (2011) Ecological impacts of invasive alien plants: A meta-analysis of their effects on species, communities and ecosystems. &lt;em&gt;Ecology letters&lt;/em&gt;, &lt;strong&gt;14&lt;/strong&gt;, 702–708.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;A local file in the same folder will work too.&lt;a href=&#34;#fnref1&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/05/20/academic-posts/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Reading notes: phylogenetic comparative models</title>
      <link>https://blog.dlilab.com/en/2017/05/13/reading-notes-phylogenetic-comparative-models/</link>
      <pubDate>Sat, 13 May 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/05/13/reading-notes-phylogenetic-comparative-models/</guid>
      <description>
        

&lt;blockquote&gt;
&lt;p&gt;Brief notes for &lt;em&gt;my own use&lt;/em&gt; from a short primer by &lt;a href=&#34;http://www.cell.com/current-biology/fulltext/S0960-9822(17)30348-2&#34; target=&#34;_blank&#34;&gt;Cornwell &amp;amp; Nakagawa, 2017, Current Biology&lt;/a&gt;. It is rather simple and basic, but it can be a good intro to, or reminder of, the big picture.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1 id=&#34;phylogenetic-comparative-methods&#34;&gt;Phylogenetic comparative methods&lt;/h1&gt;

&lt;p&gt;To explain the evolution of Earth&amp;rsquo;s diversity, phylogenetic comparative methods (PCMs) combine phylogenies with species traits. Building phylogenies (phylogenetics) is different from PCMs, though the two are not independent. PCMs are used to address questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how did the characteristics of organisms evolve through time?&lt;/li&gt;
&lt;li&gt;what factors influenced speciation and extinction?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because species are not independent of each other, ordinary linear regressions that treat each species as an independent data point are not appropriate. Felsenstein (1985), one of the first PCM papers, used phylogenetic independent contrasts to avoid this problem: instead of using species as data points, the method uses the evolutionary branching points (divergences) as replicates in the model.&lt;/p&gt;
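
&lt;p&gt;A minimal sketch of independent contrasts with the &lt;code&gt;ape&lt;/code&gt; package follows; it is not from the primer. Two traits are simulated independently under Brownian motion on a random 20-tip tree. &lt;code&gt;ape::pic()&lt;/code&gt; then returns one contrast per internal node, i.e. 19 divergences rather than 20 species. The contrasts are regressed through the origin. Because the traits were simulated independently, the slope is expected to be near zero.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(ape)

set.seed(2017)
tree = rtree(20)                    # random rooted, bifurcating tree with 20 tips
x = rTraitCont(tree, model = 'BM')  # trait x simulated under Brownian motion
y = rTraitCont(tree, model = 'BM')  # an independent trait y
# one contrast per internal node (divergence), not one value per species
pic_x = pic(x, tree)
pic_y = pic(y, tree)
# contrasts are regressed through the origin
fit = lm(pic_y ~ pic_x - 1)
summary(fit)
&lt;/code&gt;&lt;/pre&gt;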

&lt;h2 id=&#34;trait-evolution&#34;&gt;Trait evolution&lt;/h2&gt;

&lt;p&gt;We want to study the speed (tempo) and the manner (mode, e.g. slow and gradual, or fast with big jumps) of trait evolution. Common models of trait evolution are the Brownian motion and Ornstein–Uhlenbeck models.&lt;/p&gt;
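
&lt;p&gt;Here is a small base-R sketch of one lineage under each model, using a simple Euler discretisation; it is an illustration, not from the primer. Under Brownian motion the trait follows &lt;code&gt;dX = sigma dW&lt;/code&gt;, a random walk whose variance grows with time. The Ornstein–Uhlenbeck model adds a pull towards an optimum: &lt;code&gt;dX = alpha (theta - X) dt + sigma dW&lt;/code&gt;. The parameter values are arbitrary. Both series share the same random shocks, so the only difference between them is the &lt;code&gt;alpha&lt;/code&gt; term.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Euler simulation of one lineage under BM and OU (illustrative parameters)
simulate_bm_ou = function(n_steps = 1000, dt = 0.01, sigma = 1,
                          alpha = 2, theta = 0, x0 = 5) {
  bm = numeric(n_steps + 1)
  ou = numeric(n_steps + 1)
  bm[1] = x0
  ou[1] = x0
  for (i in seq_len(n_steps)) {
    shock = rnorm(1, mean = 0, sd = sigma * sqrt(dt))  # shared random shock
    bm[i + 1] = bm[i] + shock                          # pure random walk
    ou[i + 1] = ou[i] + alpha * (theta - ou[i]) * dt + shock  # pulled towards theta
  }
  data.frame(time = (0:n_steps) * dt, bm = bm, ou = ou)
}

set.seed(1)
sim = simulate_bm_ou()
matplot(sim$time, sim[, c('bm', 'ou')], type = 'l', lty = 1,
        xlab = 'time', ylab = 'trait value')
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Setting &lt;code&gt;alpha = 0&lt;/code&gt; removes the pull, and the OU series becomes identical to the BM series.&lt;/p&gt;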

&lt;p&gt;We also want to study evolutionary links among traits and between traits and environmental variables. Advanced methods include generalized linear mixed models and structural equation models that account for species evolutionary relationships.&lt;/p&gt;

&lt;h2 id=&#34;lineage-diversification&#34;&gt;Lineage diversification&lt;/h2&gt;

&lt;p&gt;Why are some lineages more speciose than others of similar age? Where and when on the phylogeny were there shifts in diversification rate? And why did those shifts occur?&lt;/p&gt;

&lt;h2 id=&#34;pcms-in-different-disciplines&#34;&gt;PCMs in different disciplines&lt;/h2&gt;

&lt;p&gt;Other disciplines, e.g. community ecology, linguistics, anthropology and paleobiology, also use PCMs; linguists, for instance, build phylogenies of languages.&lt;/p&gt;

&lt;h2 id=&#34;caveats-and-the-future-of-pcms&#34;&gt;Caveats and the future of PCMs&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Tree uncertainty. Species can be misplaced in a phylogenetic tree, ancestral nodes can be wrongly inferred, or, more subtly but more commonly, branch lengths can be incorrect.&lt;/li&gt;
&lt;li&gt;Trait uncertainty. Traits are measured with error. And for most PCMs, we use a single representative value per species, but it is hard to define what is representative. For example, what is the representative value for human height?&lt;/li&gt;
&lt;li&gt;Model uncertainty. When we investigate trait evolution, we assume a certain model of evolution, most often the Brownian motion model. However, a trait can evolve quite differently from such a simple model, and the tempo and mode may vary among the branches of the tree.&lt;/li&gt;
&lt;/ul&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/05/13/reading-notes-phylogenetic-comparative-models/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Notes of Data-driven Ecological Synthesis</title>
      <link>https://blog.dlilab.com/en/2017/05/09/notes-of-data-driven-ecological-synthesis/</link>
      <pubDate>Tue, 09 May 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/05/09/notes-of-data-driven-ecological-synthesis/</guid>
      <description>
        

&lt;p&gt;I went to the excellent data-driven ecological synthesis summer school at the &lt;a href=&#34;http://www.sbl.umontreal.ca/index.html&#34; target=&#34;_blank&#34;&gt;Station de Biologie des Laurentides (SBL)&lt;/a&gt; of the Université de Montréal, organized and taught by Timothée Poisot and Dominique Gravel. The station is one of the best research stations I have ever been to: great views, nice staff, and excellent food! The teachers are very approachable and knowledgeable. My classmates were very nice to each other and we had lots of fun together. For example:&lt;/p&gt;

&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Setted up a huge fire at the end of the day. &lt;a href=&#34;https://twitter.com/tpoi?ref_src=twsrc%5Etfw&#34;&gt;@tpoi&lt;/a&gt; &lt;a href=&#34;https://twitter.com/willvieira90?ref_src=twsrc%5Etfw&#34;&gt;@willvieira90&lt;/a&gt; &lt;a href=&#34;https://twitter.com/ernlarson?ref_src=twsrc%5Etfw&#34;&gt;@ernlarson&lt;/a&gt; &lt;a href=&#34;https://twitter.com/gwynmac?ref_src=twsrc%5Etfw&#34;&gt;@gwynmac&lt;/a&gt; &lt;a href=&#34;https://t.co/BaCnH5pUBp&#34;&gt;pic.twitter.com/BaCnH5pUBp&lt;/a&gt;&lt;/p&gt;&amp;mdash; Daijiang Li (@_djli) &lt;a href=&#34;https://twitter.com/_djli/status/860323102173134848?ref_src=twsrc%5Etfw&#34;&gt;May 5, 2017&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;



&lt;p&gt;and&lt;/p&gt;

&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;Nice hiking after a day of open data/ project discussion. &lt;a href=&#34;https://twitter.com/tpoi?ref_src=twsrc%5Etfw&#34;&gt;@tpoi&lt;/a&gt; &lt;a href=&#34;https://t.co/5ZELNTKtWA&#34;&gt;pic.twitter.com/5ZELNTKtWA&lt;/a&gt;&lt;/p&gt;&amp;mdash; Daijiang Li (@_djli) &lt;a href=&#34;https://twitter.com/_djli/status/859926804995465216?ref_src=twsrc%5Etfw&#34;&gt;May 4, 2017&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;



&lt;p&gt;Thanks all for a great week.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Here are my very brief notes from this one-week class.&lt;/p&gt;

&lt;h1 id=&#34;2017-05-01&#34;&gt;2017/05/01&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What are data? Observations of variables, each with a value and a unit.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;metadata: when, who, how, why, intellectual property?&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Data management plans (the NSF-funded DataONE data life cycle: &lt;a href=&#34;https://www.dataone.org/data-life-cycle&#34; target=&#34;_blank&#34;&gt;https://www.dataone.org/data-life-cycle&lt;/a&gt;) (discussed for about 50 mins)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;collect&lt;/li&gt;
&lt;li&gt;assure: quality control.&lt;/li&gt;
&lt;li&gt;describe: metadata.&lt;/li&gt;
&lt;li&gt;preserve: back up your data (ask your university&amp;rsquo;s computing center; figshare, etc.). Be careful with Dropbox if you have government data, etc. Plan for long-term archiving: who can have access?&lt;/li&gt;
&lt;li&gt;discover: identify the data you need, which were not necessarily collected by yourself.&lt;/li&gt;
&lt;li&gt;integrate: combine data collected at different temporal/spatial scales.&lt;/li&gt;
&lt;li&gt;analysis: overview of the data analyses to conduct.&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Exercise: in groups of 2-3 people, read a paper of the group&amp;rsquo;s choice and discuss 2-3 steps of the data life cycle. How did the authors handle them? What was good? What were the weaknesses? 20-30 mins.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Take data archiving/integration seriously when applying for funding and when reviewing grants.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004525&#34; target=&#34;_blank&#34;&gt;Ten Simple Rules for Creating a Good Data Management Plan&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005097&#34; target=&#34;_blank&#34;&gt;Ten Simple Rules for Digital Data Storage&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Spreadsheets: flat files&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if you type &lt;code&gt;SEP3&lt;/code&gt;, &lt;code&gt;sept03&lt;/code&gt;, or &lt;code&gt;sep03&lt;/code&gt;, Excel turns it into &lt;code&gt;3-Sep&lt;/code&gt; or &lt;code&gt;9/3/2017&lt;/code&gt;. Even if you save as a csv file at the end, they are all &lt;code&gt;3-sep&lt;/code&gt;, not what you typed in.&lt;/li&gt;
&lt;li&gt;tidy data: every column is a variable, every row is an observation.&lt;/li&gt;
&lt;li&gt;things NOT to do: no merged cells, no color coding, no blank cells (be explicit about missing data and any other issues that result in missing data), no multiple tables in a single sheet.&lt;/li&gt;
&lt;li&gt;dates: YYYY-MM-DD-HH-MM-SS-TZ, or split into date, time, and time zone.&lt;/li&gt;
&lt;li&gt;Location coordinates: be explicit about the format.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Template: use a template for data entry from the beginning of a project. When explaining the variables, be explicit about their possible values or recording rules, e.g. how to name a site, Latin names for species, the date format, etc.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Exercise: everyone creates a template for their own projects. 30 mins.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
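&lt;p&gt;The template, explicit-missing-data, and date rules above can also be checked in code. Below is a minimal base-R sketch that assumes a hypothetical template with four columns (&lt;code&gt;site&lt;/code&gt;, &lt;code&gt;species&lt;/code&gt;, &lt;code&gt;date&lt;/code&gt;, &lt;code&gt;count&lt;/code&gt;). The column names, the allowed site codes, and the &lt;code&gt;check_template()&lt;/code&gt; helper are all made up for illustration. The sketch also shows that ISO dates (YYYY-MM-DD) survive a round trip through a csv file when they are read back with an explicit column type.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical data-entry template: expected type (and allowed values) per column
template = list(
  site    = list(type = &amp;quot;character&amp;quot;, allowed = c(&amp;quot;SiteA&amp;quot;, &amp;quot;SiteB&amp;quot;)),
  species = list(type = &amp;quot;character&amp;quot;),  # Latin names
  date    = list(type = &amp;quot;Date&amp;quot;),       # ISO 8601: YYYY-MM-DD
  count   = list(type = &amp;quot;integer&amp;quot;)
)

# Stop with an informative message if a data frame breaks the template
check_template = function(df, template) {
  stopifnot(identical(names(df), names(template)))
  for (col in names(template)) {
    rule = template[[col]]
    if (!inherits(df[[col]], rule$type))
      stop(&amp;quot;column &amp;quot;, col, &amp;quot; is not of type &amp;quot;, rule$type)
    if (!is.null(rule$allowed) &amp;amp;&amp;amp; !all(df[[col]] %in% c(rule$allowed, NA)))
      stop(&amp;quot;column &amp;quot;, col, &amp;quot; has values outside the allowed set&amp;quot;)
  }
  invisible(TRUE)
}

obs = data.frame(
  site    = c(&amp;quot;SiteA&amp;quot;, &amp;quot;SiteB&amp;quot;, &amp;quot;SiteA&amp;quot;),
  species = c(&amp;quot;Acer saccharum&amp;quot;, &amp;quot;Betula papyrifera&amp;quot;, NA),  # explicit NA, not a blank cell
  date    = as.Date(c(&amp;quot;2017-05-01&amp;quot;, &amp;quot;2017-05-01&amp;quot;, &amp;quot;2017-05-02&amp;quot;)),
  count   = c(3L, 0L, NA),
  stringsAsFactors = FALSE
)
check_template(obs, template)

# Round trip through a csv file, reading it back with explicit column types
f = tempfile(fileext = &amp;quot;.csv&amp;quot;)
write.csv(obs, f, row.names = FALSE)
back = read.csv(f, colClasses = c(&amp;quot;character&amp;quot;, &amp;quot;character&amp;quot;, &amp;quot;Date&amp;quot;, &amp;quot;integer&amp;quot;))
check_template(back, template)
&lt;/code&gt;&lt;/pre&gt;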

&lt;h1 id=&#34;2017-05-02&#34;&gt;2017/05/02&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;OpenRefine (morning)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;explore different datasets: facets, transforming cells, filtering rows, scatter plots, e.g. &lt;code&gt;[value, cells[&amp;quot;mo&amp;quot;].value, cells[&amp;quot;dy&amp;quot;].value].join(&amp;quot;-&amp;quot;)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;import datasets from multiple URLs.

&lt;ul&gt;
&lt;li&gt;for JSON files, select &amp;ldquo;rows&amp;rdquo; instead of &amp;ldquo;records&amp;rdquo; to make life easier.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Jupyter notebook + R (afternoon)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a little bit of data manipulation.&lt;/li&gt;
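&lt;li&gt;parallel with &lt;code&gt;plyr&lt;/code&gt;: &lt;code&gt;library(plyr); library(doMC); registerDoMC(detectCores() - 1); ddply(..., .parallel = TRUE)&lt;/code&gt; (&lt;code&gt;doMC&lt;/code&gt; relies on forking, so it works on macOS/Linux but not on Windows)&lt;/li&gt;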
&lt;li&gt;Book recommendation: &lt;a href=&#34;https://www.amazon.ca/Pragmatic-Programmer-Journeyman-Master/dp/020161622X&#34; target=&#34;_blank&#34;&gt;The Pragmatic Programmer: From Journeyman to Master&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
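&lt;p&gt;For readers who prefer to do the same step in R, here is a base-R sketch of the column-joining expression from the OpenRefine notes. It assumes hypothetical integer columns &lt;code&gt;yr&lt;/code&gt;, &lt;code&gt;mo&lt;/code&gt;, and &lt;code&gt;dy&lt;/code&gt;, with &lt;code&gt;yr&lt;/code&gt; playing the role of &lt;code&gt;value&lt;/code&gt;. A plain &lt;code&gt;paste()&lt;/code&gt; mirrors the join. &lt;code&gt;sprintf()&lt;/code&gt; adds zero padding, so the result is a proper YYYY-MM-DD date.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical year/month/day columns
df = data.frame(yr = c(2017L, 2017L), mo = c(5L, 12L), dy = c(1L, 31L))

# Same as the join: gives &amp;quot;2017-5-1&amp;quot; and &amp;quot;2017-12-31&amp;quot;
df$date_raw = paste(df$yr, df$mo, df$dy, sep = &amp;quot;-&amp;quot;)

# Zero-padded ISO 8601 strings, parsed into R&#39;s Date class
df$date = as.Date(sprintf(&amp;quot;%04d-%02d-%02d&amp;quot;, df$yr, df$mo, df$dy))
format(df$date)  # &amp;quot;2017-05-01&amp;quot; &amp;quot;2017-12-31&amp;quot;
&lt;/code&gt;&lt;/pre&gt;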

&lt;h1 id=&#34;2017-05-03&#34;&gt;2017/05/03&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Morning

&lt;ul&gt;
&lt;li&gt;Group discussion about mandatory open data sharing (for and against, 2 groups, 45 mins)

&lt;ul&gt;
&lt;li&gt;debate.&lt;/li&gt;
&lt;li&gt;For: drive to a better science system (system &amp;gt; individuals)&lt;/li&gt;
&lt;li&gt;Against: unfair (to data collectors, compared with those who synthesize their data)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Datasets and APIs (requests via URLs, responses as JSON objects); the rOpenSci project/packages.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Afternoon

&lt;ul&gt;
&lt;li&gt;Discussion about possible projects until 3pm&lt;/li&gt;
&lt;li&gt;Dom gave a talk about what can be done with public data (&lt;em&gt;Beyond the checklist: the biogeography of ecological interaction networks&lt;/em&gt;)

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;biogeography&lt;/em&gt;: the spatial and temporal distribution of species and their abundances, including its causes and consequences.&lt;/li&gt;
&lt;li&gt;the dominant conceptual tool in biogeography: the niche.&lt;/li&gt;
&lt;li&gt;Is resource availability constant across gradients?&lt;/li&gt;
&lt;li&gt;Is predation pressure constant across gradients?&lt;/li&gt;
&lt;li&gt;How do interaction strength and population abundance covary?&lt;/li&gt;
&lt;li&gt;what about highly diverse communities?&lt;/li&gt;
&lt;li&gt;A community is more than a checklist&lt;/li&gt;
&lt;li&gt;how do we move from a regional meta web to a local web?&lt;/li&gt;
&lt;li&gt;revise &lt;em&gt;biogeography&lt;/em&gt; by including species interactions&lt;/li&gt;
&lt;li&gt;Gravel et al. 2011 Ecol. Lett.&lt;/li&gt;
&lt;li&gt;OBIS: marine occurrence data set.&lt;/li&gt;
&lt;li&gt;fishbase: fish characteristics.&lt;/li&gt;
&lt;li&gt;connectance is very high in global marine fish networks&lt;/li&gt;
&lt;li&gt;how do you control for data quality? &lt;em&gt;With huge datasets, the impact of errors may not be too problematic.&lt;/em&gt; More importantly, with a complex pipeline of scripts, be careful about possible programming errors: &lt;em&gt;defensive programming&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;be careful about the sensitivity of data analyses to data quality.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;a talk about designing databases.

&lt;ul&gt;
&lt;li&gt;be defensive in design: e.g. restrict the types of possible inputs (characters, small integers, etc.) for error control; API design (JavaScript); advantages of an API: security, portability, remote access.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
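&lt;p&gt;Connectance and defensive programming both came up in the talk. As a small illustration, here is a base-R sketch of one common definition of directed connectance, C = L / S&lt;sup&gt;2&lt;/sup&gt;, where L is the number of realized links among S species. The function is written defensively, so malformed input fails early instead of silently producing a wrong number. The function name and the toy three-species web are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Directed connectance C = L / S^2 of a binary adjacency matrix
# (S = number of species, L = number of realized links)
connectance = function(web) {
  # Defensive checks: stop on malformed input rather than return garbage
  stopifnot(is.matrix(web), nrow(web) == ncol(web), nrow(web) &amp;gt; 0)
  stopifnot(all(web %in% c(0, 1)))  # binary entries only; NA fails here too
  S = nrow(web)
  L = sum(web)
  L / S^2
}

# Hypothetical food chain: plant eaten by herbivore, herbivore eaten by predator
sp = c(&amp;quot;plant&amp;quot;, &amp;quot;herbivore&amp;quot;, &amp;quot;predator&amp;quot;)
web = matrix(0, 3, 3, dimnames = list(sp, sp))
web[&amp;quot;plant&amp;quot;, &amp;quot;herbivore&amp;quot;] = 1
web[&amp;quot;herbivore&amp;quot;, &amp;quot;predator&amp;quot;] = 1
connectance(web)  # 2 / 9, about 0.22
&lt;/code&gt;&lt;/pre&gt;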

&lt;h1 id=&#34;2017-05-04&#34;&gt;2017/05/04&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Morning&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Books suggested by Dom Gravel:

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;An Illustrated Guide to Theoretical Ecology&lt;/em&gt; by Ted J. Case&lt;/li&gt;
&lt;li&gt;&lt;em&gt;The Theoretical Biologist&amp;rsquo;s Toolbox: Quantitative Methods for Ecology and Evolutionary Biology&lt;/em&gt; by Marc Mangel&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Relational databases&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;advantages: efficiency, security, less redundancy, faster queries, and multiple users can work on the dataset at the same time&lt;/li&gt;

&lt;li&gt;&lt;p&gt;SQL: Structured Query Language&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- count parasites per host and average their measurement a
SELECT sphote AS host, sppar AS parasite, COUNT(sppar) AS number, AVG(a) AS a
FROM morphometry
WHERE sphote = &#39;Disa&#39;   -- filter on the column; strings in single quotes
GROUP BY sphote, sppar
HAVING COUNT(sppar) &amp;gt; 3
ORDER BY number DESC
LIMIT 4
&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://www.datacarpentry.org/sql-ecology-lesson/&#34; target=&#34;_blank&#34;&gt;SQL ecology data carpentry&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Afternoon&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brief about projects to work on. (4 projects, and I work on my own project)&lt;/li&gt;
&lt;li&gt;Work on projects&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
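&lt;p&gt;To see what each clause of the SQL query does, here is a rough base-R equivalent on a tiny, made-up &lt;code&gt;morphometry&lt;/code&gt; data frame. The values are hypothetical; only the column names come from the query.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Made-up data with the columns used in the query
morphometry = data.frame(
  sphote = c(rep(&amp;quot;Disa&amp;quot;, 9), &amp;quot;Other&amp;quot;),
  sppar  = c(rep(&amp;quot;p1&amp;quot;, 5), rep(&amp;quot;p2&amp;quot;, 4), &amp;quot;p1&amp;quot;),
  a      = c(1:5, 2, 4, 6, 8, 10),
  stringsAsFactors = FALSE
)

# WHERE sphote = &#39;Disa&#39;
sub = morphometry[morphometry$sphote == &amp;quot;Disa&amp;quot;, ]

# GROUP BY sphote, sppar with COUNT(sppar) and AVG(a)
counts = aggregate(a ~ sphote + sppar, data = sub, FUN = length)
means  = aggregate(a ~ sphote + sppar, data = sub, FUN = mean)
res = data.frame(host = counts$sphote, parasite = counts$sppar,
                 number = counts$a, a = means$a, stringsAsFactors = FALSE)

res = res[res$number &amp;gt; 3, ]      # HAVING COUNT(sppar) &amp;gt; 3
res = res[order(-res$number), ]  # ORDER BY number DESC
head(res, 4)                     # LIMIT 4
&lt;/code&gt;&lt;/pre&gt;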

&lt;h1 id=&#34;2017-05-05&#34;&gt;2017/05/05&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Morning

&lt;ul&gt;
&lt;li&gt;Git/GitHub: common terms, e.g. repository, stage, commit, branch, merge&lt;/li&gt;
&lt;li&gt;License: &lt;a href=&#34;https://choosealicense.com/appendix/&#34; target=&#34;_blank&#34;&gt;choose a license&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;emoji in GitHub commit messages, e.g. &lt;code&gt;📚 message here&lt;/code&gt;, &lt;code&gt;✨ message&lt;/code&gt;, 🐛 (&lt;a href=&#34;https://github.com/slashsBin/styleguide-git-commit-message&#34; target=&#34;_blank&#34;&gt;list of emoji&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Demonstration of collaboration via GitHub&lt;/li&gt;
&lt;li&gt;Code optimization

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://dirk.eddelbuettel.com/papers/useR2010hpcTutorialHandout.pdf&#34; target=&#34;_blank&#34;&gt;R high performance tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://adv-r.had.co.nz/Profiling.html#performance-profiling&#34; target=&#34;_blank&#34;&gt;Hadley&amp;rsquo;s optimising code chapter&lt;/a&gt;: went through the R code a little bit.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Afternoon

&lt;ul&gt;
&lt;li&gt;work on project.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
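&lt;p&gt;A common first lesson from such optimization material is to replace R-level loops with vectorized functions. Here is a minimal sketch that times both versions with &lt;code&gt;system.time()&lt;/code&gt;. It uses a toy sum-of-squares example of my own, not one from the course. Actual timings will vary by machine.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Loop version: one R-level operation per element
sum_sq_loop = function(x) {
  total = 0
  for (v in x) total = total + v^2
  total
}

# Vectorized version: x^2 and sum() run in compiled code
sum_sq_vec = function(x) sum(x^2)

x = runif(1e6)
system.time({ r1 = sum_sq_loop(x) })
system.time({ r2 = sum_sq_vec(x) })
all.equal(r1, r2)  # TRUE: same answer up to floating-point tolerance
&lt;/code&gt;&lt;/pre&gt;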

&lt;h1 id=&#34;2017-05-06&#34;&gt;2017/05/06&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Work on project the whole day.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&#34;2017-05-07&#34;&gt;2017/05/07&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Morning

&lt;ul&gt;
&lt;li&gt;Work on project; group presentations from 10:30am to 12pm.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Afternoon

&lt;ul&gt;
&lt;li&gt;Back to Montreal at 3:30pm.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/05/09/notes-of-data-driven-ecological-synthesis/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>R packages installation issues</title>
      <link>https://blog.dlilab.com/en/2017/04/29/r-packages-installation-issues/</link>
      <pubDate>Sat, 29 Apr 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/04/29/r-packages-installation-issues/</guid>
      <description>
        

&lt;p&gt;Some R packages that require installation from source are hard to install. Here, I just record some of the problems and solutions I have come across when installing R packages on macOS.&lt;/p&gt;

&lt;h2 id=&#34;rgdal-package&#34;&gt;&lt;code&gt;rgdal&lt;/code&gt; package&lt;/h2&gt;

&lt;p&gt;It is kind of annoying to install this package, but I found &lt;a href=&#34;http://stackoverflow.com/a/26836125/3120725&#34; target=&#34;_blank&#34;&gt;this answer&lt;/a&gt; helpful.&lt;/p&gt;

&lt;p&gt;Basically, in the terminal, install &lt;code&gt;GDAL&lt;/code&gt; first, which will take a while:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;brew install --with-postgresql gdal
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then in R:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;install.packages(&#39;rgdal&#39;, type = &amp;quot;source&amp;quot;, configure.args=c(&#39;--with-proj-include=/usr/local/include&#39;,&#39;--with-proj-lib=/usr/local/lib&#39;))
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;sf-package&#34;&gt;&lt;code&gt;sf&lt;/code&gt; package&lt;/h2&gt;

&lt;p&gt;According to its GitHub &lt;a href=&#34;https://github.com/edzer/sfr&#34; target=&#34;_blank&#34;&gt;readme file&lt;/a&gt;, we may be able to install a binary package for &lt;code&gt;sf&lt;/code&gt;. But this was not the case for me today, probably because &lt;code&gt;R 3.4.0&lt;/code&gt; was just released and a binary version was not yet available on CRAN. So I still needed to install it from source. Following its readme file, we need to run this in the terminal first (it takes a while, ~10 minutes):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;brew unlink gdal
brew tap osgeo/osgeo4mac &amp;amp;&amp;amp; brew tap --repair
brew install proj 
brew install geos 
brew install udunits
brew install gdal2 --with-armadillo --with-complete --with-libkml --with-unsupported
brew link --force gdal2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then we can go to R and install it normally.&lt;/p&gt;

&lt;h2 id=&#34;to-be-updated&#34;&gt;to be updated&lt;/h2&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/04/29/r-packages-installation-issues/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Clipping shape files in R</title>
      <link>https://blog.dlilab.com/en/2017/04/12/clipping-shape-files-in-r/</link>
      <pubDate>Wed, 12 Apr 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/04/12/clipping-shape-files-in-r/</guid>
      <description>
&lt;p&gt;Suppose we have two shapefiles: a larger one (e.g. a shapefile of the ecoregions of North America) and a smaller one (e.g. a shapefile of the lower 48 US states). How can we get a shapefile of the ecoregions for only the lower 48 states?&lt;/p&gt;

&lt;p&gt;After a little bit of searching &lt;sup class=&#34;footnote-ref&#34; id=&#34;fnref:mainly-this-post&#34;&gt;&lt;a rel=&#34;footnote&#34; href=&#34;#fn:mainly-this-post&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;, I came up with the following R function:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(rgeos)
library(sp)
clip_shp = function(small_shp, large_shp){
  # make sure both have the same projection
  large_shp = spTransform(large_shp, CRSobj = CRS(proj4string(small_shp)))
  cat(&amp;quot;About to get the intersections, will take a while...&amp;quot;, &amp;quot;\n&amp;quot;)
  clipped_shp = rgeos::gIntersection(small_shp, large_shp, byid = TRUE, drop_lower_td = TRUE)
  cat(&amp;quot;Intersection done&amp;quot;, &amp;quot;\n&amp;quot;)
  # row names of the result are &amp;quot;smallID largeID&amp;quot; (assumes small_shp IDs have 1-2 digits)
  x = as.character(row.names(clipped_shp))
  # keep the large_shp IDs: these are the rows of data to keep (can be duplicated)
  keep = gsub(pattern = &amp;quot;^[0-9]{1,2} (.*)$&amp;quot;, replacement = &amp;quot;\\1&amp;quot;, x)
  large_shp_data = as.data.frame(large_shp@data[keep,])
  row.names(clipped_shp) = row.names(large_shp_data)
  clipped_shp = spChFIDs(clipped_shp, row.names(large_shp_data))
  # combine and make SpatialPolygonsDataFrame back
  clipped_shp = SpatialPolygonsDataFrame(clipped_shp, large_shp_data)
  clipped_shp
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running the &lt;code&gt;clip_shp()&lt;/code&gt; function returns a &lt;code&gt;SpatialPolygonsDataFrame&lt;/code&gt; of the intersections between the two input files &lt;sup class=&#34;footnote-ref&#34; id=&#34;fnref:Of-course-you-ne&#34;&gt;&lt;a rel=&#34;footnote&#34; href=&#34;#fn:Of-course-you-ne&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Another problem is that such shapefiles are often too large to plot: &lt;code&gt;ggplot()&lt;/code&gt; may run forever with the data frame fortified from the shapefile. One solution is to first convert the shapefile into a data frame and then thin the data frame. Simply using &lt;code&gt;dplyr::sample_frac()&lt;/code&gt; won&amp;rsquo;t work, though. It drops vertices at random and returns the rows in random order, which scrambles the polygon outlines. Here is a function I wrote (though it is kind of slow):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# thin the vertices of one polygon group with shapefiles::dp();
# the larger tol is, the fewer rows the result will have
thin = function(x, tol = 0.01){
  id = unique(x$id)[1]
  x1 = x[, 1:2]                      # the long and lat columns
  names(x1) = c(&amp;quot;x&amp;quot;, &amp;quot;y&amp;quot;)    # dp() works on x and y coordinates
  x2 = shapefiles::dp(x1, tol)
  data.frame(long = x2$x, lat = x2$y, id = id)
}

library(ggplot2)
library(dplyr)
# convert shapefile to data frame
shp_df = fortify(shp, region = &amp;quot;NAME&amp;quot;) # change the region accordingly
# for each group, thin it
shp_df_thin = select(shp_df, long, lat, id, group) %&amp;gt;%
  group_by(group) %&amp;gt;%
  do(thin(., tol = 0.02))
&lt;/code&gt;&lt;/pre&gt;
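&lt;p&gt;&lt;code&gt;shapefiles::dp()&lt;/code&gt; implements Douglas-Peucker line simplification. That explains why thinning works where random sampling does not. The algorithm keeps the endpoints and, recursively, only those vertices that lie farther than &lt;code&gt;tol&lt;/code&gt; from the simplified line, so the outline stays in order. To show what &lt;code&gt;tol&lt;/code&gt; controls, here is a minimal recursive base-R sketch of the algorithm. It is my own illustration, not the &lt;code&gt;shapefiles&lt;/code&gt; implementation, and the name &lt;code&gt;dp_simplify()&lt;/code&gt; is hypothetical. It returns the indices of the vertices to keep.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Douglas-Peucker simplification of a path given by coordinates x, y.
# Returns the indices of the vertices to keep; larger tol keeps fewer vertices.
dp_simplify = function(x, y, tol) {
  n = length(x)
  if (n &amp;lt;= 2) return(seq_len(n))
  dx = x[n] - x[1]
  dy = y[n] - y[1]
  chord = sqrt(dx^2 + dy^2)
  # distance of every vertex to the chord joining the first and last vertex;
  # for a closed ring (first == last) use the distance to that point instead
  d = if (chord == 0) {
    sqrt((x - x[1])^2 + (y - y[1])^2)
  } else {
    abs(dy * (x - x[1]) - dx * (y - y[1])) / chord
  }
  i = which.max(d)
  if (d[i] &amp;lt;= tol) return(c(1, n))  # everything in between is close enough
  # otherwise keep vertex i and simplify both halves recursively
  left = dp_simplify(x[1:i], y[1:i], tol)
  right = dp_simplify(x[i:n], y[i:n], tol)
  c(left, right[-1] + i - 1)  # shift right-half indices; drop duplicate vertex i
}

x = c(0, 1, 2, 3, 4)
y = c(0, 0.05, 1, 0.05, 0)
dp_simplify(x, y, tol = 0.5)  # keeps vertices 1, 3, 5
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For a fortified polygon group &lt;code&gt;g&lt;/code&gt;, keeping the rows &lt;code&gt;g[dp_simplify(g$long, g$lat, tol), ]&lt;/code&gt; would thin it in the same spirit. The recursion is not optimized for polygons with very many vertices.&lt;/p&gt;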

&lt;p&gt;Then we can use the thinned data frame to plot quickly (and happily) with &lt;code&gt;ggplot()&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;ggplot(data = shp_df_thin) + 
  geom_polygon(aes(x = long, y = lat, group = group), 
               color = &amp;quot;black&amp;quot;, fill = &amp;quot;white&amp;quot;) +
  coord_map() 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Posted here in case it is helpful (to someone else or to my future self).&lt;/p&gt;
&lt;div class=&#34;footnotes&#34;&gt;

&lt;hr /&gt;

&lt;ol&gt;
&lt;li id=&#34;fn:mainly-this-post&#34;&gt;mainly this post: &lt;a href=&#34;https://philmikejones.wordpress.com/2015/09/01/clipping-polygons-in-r/&#34; target=&#34;_blank&#34;&gt;https://philmikejones.wordpress.com/2015/09/01/clipping-polygons-in-r/&lt;/a&gt; &lt;a class=&#34;footnote-return&#34; href=&#34;#fnref:mainly-this-post&#34;&gt;↩&lt;/a&gt;&lt;/li&gt;
&lt;li id=&#34;fn:Of-course-you-ne&#34;&gt;Of course, you need to read them into R first, e.g. &lt;code&gt;small_shp = rgdal::readOGR(&amp;quot;path/to/file&amp;quot;, layer = &amp;quot;file_name&amp;quot;)&lt;/code&gt; &lt;a class=&#34;footnote-return&#34; href=&#34;#fnref:Of-course-you-ne&#34;&gt;↩&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/04/12/clipping-shape-files-in-r/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Writing Academic Papers with Rmarkdown</title>
      <link>https://blog.dlilab.com/en/2017/04/05/writing-academic-papers-with-rmarkdown/</link>
      <pubDate>Wed, 05 Apr 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/04/05/writing-academic-papers-with-rmarkdown/</guid>
      <description>
        

&lt;blockquote&gt;
&lt;p&gt;TL;DR: Rmarkdown and bookdown are awesome; you should use them to write papers; and &lt;a href=&#34;https://github.com/daijiang/workflow_demo&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt; is a minimal example.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have been using LaTeX for most of the &lt;a href=&#34;https://blog.dlilab.com/resume&#34;&gt;papers&lt;/a&gt; I have published so far (admittedly not that many), even though &lt;em&gt;all&lt;/em&gt; of my co-authors use Microsoft Word. Why? There are several reasons.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When writing, we should focus only on the content and not on the typesetting, which we can take care of later. Word, on the other hand, shows you what you get as you write. This makes it hard for people (me, at least) to ignore the typesetting while writing.&lt;/li&gt;
&lt;li&gt;It is hard to update figures and pictures inserted into a manuscript in Word: you need to delete the old ones and insert new ones whenever your figures are updated. Of course, you could say: do not insert figures until submission. But wouldn&amp;rsquo;t it be easier to revise the manuscript when the figures are included in the main text? Using LaTeX, I can just put the paths of the figures there and not worry about replacing them in the main text.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Literate_programming&#34; target=&#34;_blank&#34;&gt;Literate programming:&lt;/a&gt; LaTeX (via Sweave or knitr) allows us to mix code with text in the same file, which increases reproducibility and decreases potential errors.&lt;/li&gt;
&lt;li&gt;Cross-references are easy in LaTeX (just &lt;code&gt;\label&lt;/code&gt; and &lt;code&gt;\ref&lt;/code&gt;). With Word, it is painful to get the same thing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;However, LaTeX has its own learning curve and quirks. And even though it is intended to let people focus on content, we usually spend lots of time fighting with things like floats. There is also the collaboration barrier between LaTeX users and Word users. When I finish a draft of a paper, I need to convert it to Word using pandoc so that my advisor can edit it. Done this way, however, figures and tables are usually messed up, and so are cross-references. Tables are left as raw LaTeX source code, and cross-references are replaced with their labels (e.g. &amp;ldquo;see Table tab-labels&amp;rdquo; instead of &amp;ldquo;see Table 1&amp;rdquo;). So every time, I need to write something like &amp;ldquo;please do not care about the typesetting&amp;rdquo; in the email to my advisor.&lt;/p&gt;

&lt;p&gt;Recently, I found that the conversion from Rmarkdown to Word is reasonably good, thanks to &lt;a href=&#34;https://bookdown.org/yihui/bookdown/&#34; target=&#34;_blank&#34;&gt;&lt;code&gt;bookdown&lt;/code&gt;&lt;/a&gt; by &lt;a href=&#34;https://yihui.name&#34; target=&#34;_blank&#34;&gt;Yihui&lt;/a&gt;. I just finished my first manuscript written 100% in Rmarkdown, and both my advisor and I are quite happy with it. Therefore, in this post, I am going to talk briefly about the process of writing academic papers with Rmarkdown.&lt;/p&gt;

&lt;h1 id=&#34;markdown-and-rmarkdown&#34;&gt;Markdown and Rmarkdown&lt;/h1&gt;

&lt;p&gt;First, you need to know a little bit about the &lt;a href=&#34;https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf&#34; target=&#34;_blank&#34;&gt;syntax&lt;/a&gt;. Don&amp;rsquo;t worry, I am sure you will get it in five minutes. If you use Rstudio, this can be found under the &amp;ldquo;Help&amp;rdquo; menu.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/U8RHPbm.png&#34; alt=&#34;rmarkdown references&#34; /&gt;&lt;/p&gt;

&lt;h1 id=&#34;packages-needed&#34;&gt;Packages needed&lt;/h1&gt;

&lt;p&gt;For this post, I have installed the following R packages: &lt;code&gt;bookdown&lt;/code&gt;, &lt;code&gt;rmarkdown&lt;/code&gt;, &lt;code&gt;tufte&lt;/code&gt;, and &lt;code&gt;knitr&lt;/code&gt;. If you do not want to produce pdf files, then you are ready to go. If you need pdf files, then you also need to install LaTeX. Under Windows and Linux, &lt;a href=&#34;https://www.tug.org/texlive/&#34; target=&#34;_blank&#34;&gt;TeX Live&lt;/a&gt; is good; under Mac, &lt;a href=&#34;http://www.tug.org/mactex/&#34; target=&#34;_blank&#34;&gt;MacTeX&lt;/a&gt; is available. If you do not mind the time it takes to download and install, I recommend installing the full version, which includes all LaTeX packages.&lt;/p&gt;

&lt;h1 id=&#34;writing-with-rmarkdown&#34;&gt;Writing with Rmarkdown&lt;/h1&gt;

&lt;p&gt;After installing all dependencies, we can open Rstudio (or any text editor) and start writing. I use Rstudio to start the file and to work with R code chunks. For the rest, however, I use Sublime Text or Atom, mainly because Rstudio lacks a distraction-free mode.&lt;/p&gt;

&lt;h2 id=&#34;yaml-head&#34;&gt;YAML header&lt;/h2&gt;

&lt;p&gt;Here is the YAML header I am using now:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;---
title: Your awesome title
author: &amp;quot;Author one and Author Two&amp;quot;
date: &#39;`r format(Sys.time(), &amp;quot;%d %B, %Y&amp;quot;)`&#39;
output:
  bookdown::tufte_html2:
    number_sections: no
    toc: yes
  bookdown::word_document2: null
  bookdown::pdf_document2:
    includes:
      before_body: doc_prefix.tex
      in_header: preamble.tex
    keep_tex: yes
    latex_engine: xelatex
    number_sections: no
    toc: no
bibliography: path/to/ref.bib
fontsize: 12pt
link-citations: yes
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/global-ecology-and-biogeography.csl
---
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A few notes here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I use &lt;code&gt;bookdown::pdf_document2&lt;/code&gt; and the other &lt;code&gt;bookdown::...document2&lt;/code&gt; formats, which make cross-references and other features possible.&lt;/li&gt;
&lt;li&gt;For pdf files, I include some tex files. &lt;code&gt;preamble.tex&lt;/code&gt; loads some packages I use, e.g. lineno to add line numbers. &lt;code&gt;doc_prefix.tex&lt;/code&gt; makes the text left-aligned only.&lt;/li&gt;
&lt;li&gt;References are stored in the bib file. You can use common reference management software to create a bib file. Or you can search Google Scholar, click &lt;code&gt;cite&lt;/code&gt; under the paper, choose the &lt;code&gt;bibtex&lt;/code&gt; format, then copy and paste the entry into a bib file.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;link-citations: yes&lt;/code&gt; allows you to click on a citation and jump to the corresponding entry in the reference list.&lt;/li&gt;
&lt;li&gt;CSL files are citation style files (e.g. for a specific journal) and can be a URL, as here, or a local file.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;r-chunks&#34;&gt;R chunks&lt;/h2&gt;

&lt;p&gt;In my &lt;a href=&#34;https://github.com/daijiang/workflow_demo&#34; target=&#34;_blank&#34;&gt;project setup&lt;/a&gt;, I have an &lt;code&gt;.Rproj&lt;/code&gt; file in the project folder, plus &lt;code&gt;R&lt;/code&gt;, &lt;code&gt;Doc&lt;/code&gt;, etc. folders. R scripts are located in the &lt;code&gt;R&lt;/code&gt; folder, while the Rmarkdown file for the manuscript is in the &lt;code&gt;Doc&lt;/code&gt; folder. By default, when you knit the Rmarkdown file with Rstudio, it uses the folder containing the Rmarkdown file as the working directory, even though the &lt;code&gt;.Rproj&lt;/code&gt; file is one level up. This is a little bit annoying because relative paths then differ between clicking the &lt;code&gt;knit&lt;/code&gt; button and running some of the chunks interactively within Rstudio.&lt;/p&gt;

&lt;p&gt;To tell knitr that we want to use the parent folder as the working directory, we can add this chunk at the beginning of the file:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;```{r knitr_options, echo=FALSE}
library(knitr)
opts_knit$set(root.dir = normalizePath(&amp;quot;../&amp;quot;))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, in a &lt;em&gt;separate&lt;/em&gt; R code chunk (the new root directory only takes effect from the next chunk), we can source R scripts with &lt;code&gt;source(&amp;quot;R/script.r&amp;quot;)&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&#34;citations&#34;&gt;Citations&lt;/h2&gt;

&lt;p&gt;After putting the sources you want to cite in a &lt;code&gt;.bib&lt;/code&gt; file, we can cite them in the main text. The idea is that each source has a unique key, and you cite it with that key. See the &lt;a href=&#34;http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html&#34; target=&#34;_blank&#34;&gt;rmarkdown website&lt;/a&gt; for details and examples.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Here is a statement [@key1; @key2].
@key3 did something.
Some examples [e.g. @key4; @key5; but see @key6]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/EbKvlDr.png&#34; alt=&#34;example of citations in markdown&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;cross-references&#34;&gt;Cross-references&lt;/h2&gt;

&lt;h3 id=&#34;tables&#34;&gt;Tables&lt;/h3&gt;

&lt;p&gt;Insert tables with the &lt;code&gt;knitr::kable&lt;/code&gt; function (&lt;code&gt;::&lt;/code&gt; indicates that the &lt;code&gt;kable&lt;/code&gt; function comes from the &lt;code&gt;knitr&lt;/code&gt; package). Then cross-reference the table with &lt;code&gt;see Table \@ref(tab:tableName)&lt;/code&gt;, which will return something like &lt;code&gt;see Table 1&lt;/code&gt;. The label after &lt;code&gt;tab:&lt;/code&gt; must match the chunk label, and the table needs a caption to be numbered. The table number depends on its order in the manuscript, so whenever you reorder your tables, you do not need to change their numbers by hand. The R code chunk for the table looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;```{r tableName, results=&#39;asis&#39;, echo=FALSE}
knitr::kable(mtcars[1:5, 1:5], booktabs = TRUE, caption = &amp;quot;Caption here.&amp;quot;)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The awesome &lt;a href=&#34;https://CRAN.R-project.org/package=kableExtra&#34; target=&#34;_blank&#34;&gt;R package &lt;code&gt;kableExtra&lt;/code&gt;&lt;/a&gt; can be used to customize tables. When rendering to a Word document, having called &lt;code&gt;library(kableExtra)&lt;/code&gt; earlier may &lt;a href=&#34;https://github.com/haozhu233/kableExtra/issues/477&#34; target=&#34;_blank&#34;&gt;mess up the tables&lt;/a&gt;. I added the following code to the setup chunk so that the package won&amp;rsquo;t be loaded when knitting to a Word document.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;if(knitr::is_latex_output() | knitr::is_html_output()){
  library(kableExtra)
} else {
  options(kableExtra.auto_format = FALSE) # for docx
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&#34;figures&#34;&gt;Figures&lt;/h3&gt;

&lt;p&gt;Cross-referencing figures is very similar to cross-referencing tables. Basically, you use &lt;code&gt;Figure \@ref(fig:figName)&lt;/code&gt; to refer to a figure. You put the label (&lt;code&gt;figName&lt;/code&gt; here) and the caption in the R code chunk:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;```{r figName, fig.width=7, fig.asp=1, fig.cap=&amp;quot;Your caption here.&amp;quot;}
plot(x, y)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;See more examples in the &lt;a href=&#34;https://github.com/daijiang/workflow_demo/blob/master/Doc/ms.Rmd&#34; target=&#34;_blank&#34;&gt;github file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For complex figure or table captions, we can use &lt;a href=&#34;https://bookdown.org/yihui/bookdown/markdown-extensions-by-bookdown.html#text-references&#34; target=&#34;_blank&#34;&gt;text references&lt;/a&gt;. But make sure that there is &lt;em&gt;no space at the end of the text&lt;/em&gt;! Otherwise, the captions won&amp;rsquo;t be replaced by the text references in the Word document.&lt;/p&gt;

&lt;p&gt;These pretty much cover the most common features of scientific writing: citations, cross-references, tables, and figures. You can check out the &lt;a href=&#34;https://bookdown.org/yihui/bookdown/&#34; target=&#34;_blank&#34;&gt;&lt;code&gt;bookdown&lt;/code&gt;&lt;/a&gt; website for details. Even though &lt;code&gt;bookdown&lt;/code&gt; is made for writing books, it is actually very good for writing papers too!&lt;/p&gt;

&lt;p&gt;How do you set up Rmarkdown for writing? What tips do you have? Issues? Comments are very welcome. Or even better, you can click the &lt;code&gt;pen on paper&lt;/code&gt; button at the top right of the post and edit it.
&lt;img src=&#34;https://i.imgur.com/INZSdHa.png?1&#34; alt=&#34;edit this page&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading and commenting!&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/04/05/writing-academic-papers-with-rmarkdown/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Updating website with Hugo and Blogdown</title>
      <link>https://blog.dlilab.com/en/2017/03/30/updating-website-with-hugo-and-blogdown/</link>
      <pubDate>Thu, 30 Mar 2017 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2017/03/30/updating-website-with-hugo-and-blogdown/</guid>
      <description>
        

&lt;p&gt;My personal website had been full of weeds because I had not updated it for a really long time. As I am putting together a package to apply for jobs, I finally got some time to update it. The previous version of my website was built with &lt;code&gt;Jekyll&lt;/code&gt;. However, it is a bit slow, and whenever I want to create a new post, I need to type the YAML header by hand. (Yes, you can set up a snippet, but&amp;hellip;). Finally, &lt;a href=&#34;https://yihui.name&#34; target=&#34;_blank&#34;&gt;Yihui&lt;/a&gt; wrote an awesome R package, &lt;a href=&#34;https://github.com/rstudio/blogdown&#34; target=&#34;_blank&#34;&gt;blogdown&lt;/a&gt;, to create personal websites with &lt;code&gt;Hugo&lt;/code&gt; and &lt;code&gt;RStudio&lt;/code&gt;, which makes it much easier to update a website and publish new blog posts. This post briefly records how I did it. When something is unclear, the best way is to look at the source code &lt;a href=&#34;https://github.com/rbind/yihui.name&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt; or &lt;a href=&#34;https://github.com/daijiang/website_hugo_source&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&#34;install-hugo-and-create-your-site&#34;&gt;Install Hugo and create your site&lt;/h2&gt;

&lt;p&gt;First, go to the &lt;code&gt;blogdown&lt;/code&gt; webpage and install it. For Mac users, you may need to install &lt;a href=&#34;https://brew.sh&#34; target=&#34;_blank&#34;&gt;homebrew&lt;/a&gt; first, which you definitely should.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;devtools::install_github(&#39;rstudio/blogdown&#39;)
blogdown::install_hugo() # install Hugo itself
blogdown::new_site()     # run this in an empty folder for the new site
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then go to the new website folder and turn it into a git repository with &lt;code&gt;git init&lt;/code&gt;. You should also add a &lt;code&gt;.gitignore&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Yihui&amp;rsquo;s modified &lt;code&gt;hugo-lithium-theme&lt;/code&gt; is simple and good, so I decided to use it. You can install it with&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;blogdown::install_theme(&#39;yihui/hugo-lithium-theme&#39;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I did not install it this way; instead, I used&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git submodule add git@github.com:yihui/hugo-lithium-theme.git themes/hugo-lithium-theme
&lt;/code&gt;&lt;/pre&gt;

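&lt;p&gt;This clones the theme into your &lt;code&gt;themes&lt;/code&gt; folder locally, but the theme files are not copied into your own repository when you push it to GitHub: git only records a reference to the theme repository (in the &lt;code&gt;.gitmodules&lt;/code&gt; file it creates) and the commit being used.&lt;/p&gt;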

&lt;h2 id=&#34;tweak-your-website&#34;&gt;Tweak your website&lt;/h2&gt;

&lt;p&gt;Now you can put your old posts and webpages into the &lt;code&gt;content&lt;/code&gt; folder. If you are familiar with CSS and html, then it should be straightforward.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;config.yaml&lt;/code&gt; is the first file to change, and it is self-explanatory;&lt;/li&gt;
&lt;li&gt;logo picture should go into &lt;code&gt;static/images/logo.png&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;the CNAME file also goes into &lt;code&gt;static&lt;/code&gt;; Hugo will then copy it into the &lt;code&gt;public&lt;/code&gt; folder, which holds the generated website;

&lt;ul&gt;
&lt;li&gt;anything within &lt;code&gt;static&lt;/code&gt; folder will be copied to &lt;code&gt;public&lt;/code&gt; as is: i.e. &lt;code&gt;static/images/fig1.jpg&lt;/code&gt; will be copied as &lt;code&gt;public/images/fig1.jpg&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;tweak files in the &lt;code&gt;layouts&lt;/code&gt; folder, e.g. &lt;code&gt;partials/footer.html&lt;/code&gt; to change the footer of your website;&lt;/li&gt;
&lt;li&gt;if you want to turn off comments on some pages, put &lt;code&gt;disable_comments: true&lt;/code&gt; in the YAML header;&lt;/li&gt;
&lt;li&gt;because my previous posts used slightly different syntax, I needed to update them one by one using some &lt;a href=&#34;https://github.com/daijiang/website_hugo_source/blob/master/R/clean_blogs.R&#34; target=&#34;_blank&#34;&gt;R code&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;If you use the RStudio addins to create a new post, set &lt;code&gt;options(blogdown.subdir = &amp;quot;content&amp;quot;)&lt;/code&gt; first. Then, if you select &lt;code&gt;en&lt;/code&gt; as the subdirectory and &lt;code&gt;title&lt;/code&gt; as the title, RStudio will create the file as &lt;code&gt;content/en/date-title.md&lt;/code&gt;; otherwise, it will create it as &lt;code&gt;content/post/en/date-title.md&lt;/code&gt;.

&lt;ul&gt;
&lt;li&gt;Now you can just open RStudio and start writing your blog posts!&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
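
&lt;p&gt;Rather than setting the &lt;code&gt;blogdown.subdir&lt;/code&gt; option by hand in every session, one option (assuming you open the site as an RStudio project, so that R starts in the website folder) is to put it in an &lt;code&gt;.Rprofile&lt;/code&gt; file at the root of the website folder, which R runs at startup when launched from that folder:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# .Rprofile at the root of the website folder (read when R starts in that folder)
options(blogdown.subdir = &amp;quot;content&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;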

&lt;h2 id=&#34;publish-your-website&#34;&gt;Publish your website&lt;/h2&gt;

&lt;h2 id=&#34;approach-1-push-website-into-github&#34;&gt;Approach 1: Push website into Github&lt;/h2&gt;

&lt;p&gt;Now you need to create two repositories on GitHub: one to host the Hugo source folder and one to host the generated website (i.e. the &lt;code&gt;public&lt;/code&gt; folder). Suppose the two repositories are &lt;code&gt;username.github.io&lt;/code&gt; (for the generated website) and &lt;code&gt;website&lt;/code&gt; (for the Hugo source). Within your website folder:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;rm -rf public # safe: Hugo regenerates this folder
git remote add origin https://github.com/username/website.git
git submodule add -f https://github.com/username/username.github.io.git public
# push website source code to github
git add .
git commit -m &amp;quot;Initial commit&amp;quot;
git push -u origin master
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now you can generate the website again, either from RStudio or from the terminal.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;hugo
cd public
git add .
git commit -m &amp;quot;Build website&amp;quot;
git push -u origin master
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;approach-2-use-netlify&#34;&gt;Approach 2: Use Netlify&lt;/h2&gt;

&lt;p&gt;The free plan of &lt;a href=&#34;https://www.netlify.com&#34; target=&#34;_blank&#34;&gt;Netlify&lt;/a&gt; meets all my requirements: building my website from source, HTTPS, and a custom domain. So I have deployed my website there. The good thing is that I no longer need to push the &lt;code&gt;public&lt;/code&gt; folder to GitHub. Whenever I change the source code of my website, Netlify automatically rebuilds it for me! How cool is that?&lt;/p&gt;

&lt;h2 id=&#34;issues&#34;&gt;Issues&lt;/h2&gt;

&lt;p&gt;Here are some issues I still have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I used to keep my Chinese and English blogs separate, with a different Disqus shortname for each. Now I have merged the two blogs into one folder, but I cannot merge their comments: I can only choose one shortname. Any solutions?&lt;/li&gt;
&lt;li&gt;&lt;del&gt;In this setup (submodule for the &lt;code&gt;public&lt;/code&gt; folder), whenever I rebuild the site, almost all webpages in the &lt;code&gt;public&lt;/code&gt; folder change and need to be committed and pushed to GitHub?? Why?&lt;/del&gt;

&lt;ul&gt;
&lt;li&gt;It turns out that Hugo will rebuild webpages that have been changed (e.g. lists of blog posts) but not all of them. So, this is not an issue anymore.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;to be updated.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;useful-links&#34;&gt;Useful links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://georgecushen.com/create-your-website-with-hugo/&#34; target=&#34;_blank&#34;&gt;Academia theme&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://nbari.com/post/hugo-hosting/&#34; target=&#34;_blank&#34;&gt;publish website&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2017/03/30/updating-website-with-hugo-and-blogdown/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Why no p-values in mixed models</title>
      <link>https://blog.dlilab.com/en/2015/06/22/why-no-p-values-in-mixed-models/</link>
      <pubDate>Mon, 22 Jun 2015 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2015/06/22/why-no-p-values-in-mixed-models/</guid>
      <description>
&lt;p&gt;For many traditional statistical modeling techniques, such as linear models fitted by ordinary least squares (e.g. t-tests, ANOVA), we can derive the exact distributions (e.g. the t-distribution) of certain statistics calculated from the data under the null hypothesis, and then use these distributions to test hypotheses about the parameters or to calculate confidence intervals. It is tempting to believe that every statistical technique should provide packaged results (e.g. p-values), but not all of them do. For example, you may have noticed that summaries of model objects fitted with &lt;code&gt;lmer&lt;/code&gt; list standard errors and t-statistics for the fixed effects, but no p-values. This is not without reason.&lt;/p&gt;

&lt;p&gt;Early mixed-effects model methods relied on many approximations based on analogies to fixed-effects ANOVA. For example, variance components were often estimated by calculating certain mean squares and equating each observed mean square to its expected mean square. This approach cannot handle multiple factors associated with random effects (such as subjects and items), nor unbalanced data. Fortunately, it is now possible to compute the maximum likelihood or REML estimates of the parameters in mixed-effects models (as the R package &lt;code&gt;lme4&lt;/code&gt; does), which lets us go further and handle unbalanced data, nested designs, crossed random effects, etc. However, the temptation persists to perform hypothesis tests using t- or F-distributions with some approximation of their degrees of freedom.&lt;/p&gt;

&lt;p&gt;An exact calculation may be possible for a comparatively simple model applied to an exactly balanced data set. In the real world, data are often unbalanced and models can be complicated. Under the null hypothesis, the test statistic may not follow a t- or F-distribution at all, and its actual distribution may not even be known (&lt;a href=&#34;https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q2/000904.html&#34; title=&#34;r mail list&#34; target=&#34;_blank&#34;&gt;1&lt;/a&gt;). The formulas for the degrees of freedom used in t-/F-based inference do not apply in such cases (or are even meaningless). In &lt;code&gt;lme4&lt;/code&gt;, the numerators of the F-statistics are calculated as in a linear model. The denominator is the penalized residual sum of squares divided by the REML degrees of freedom, n - p, where n is the number of observations and p is the column rank of the fixed-effects model matrix &lt;a href=&#34;https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html&#34; title=&#34;explained by Douglas Bates&#34; target=&#34;_blank&#34;&gt;(Douglas Bates)&lt;/a&gt;. All the F ratios use the &lt;strong&gt;same denominator&lt;/strong&gt;. There are many approximations in use for hypothesis tests in mixed models, each leading to a different p-value, but none of them is &amp;ldquo;correct&amp;rdquo;.&lt;/p&gt;
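
&lt;p&gt;As a rough illustration of the denominator described above, the REML degrees of freedom n - p depend only on the fixed-effects model matrix. The sketch below computes them in base R; the built-in &lt;code&gt;ChickWeight&lt;/code&gt; data and the formula &lt;code&gt;weight ~ Time + Diet&lt;/code&gt; are arbitrary choices for illustration, and no mixed model is actually fitted.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Fixed-effects model matrix for an illustrative formula (ChickWeight ships with R)
X = model.matrix(weight ~ Time + Diet, data = ChickWeight)
n = nrow(X)       # number of observations
p = qr(X)$rank    # column rank of the fixed-effects model matrix
reml_df = n - p   # one denominator df, shared by every F ratio
c(n = n, p = p, reml_df = reml_df)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Using the column rank rather than &lt;code&gt;ncol(X)&lt;/code&gt; matters when the model matrix is rank deficient, e.g. when two terms are aliased.&lt;/p&gt;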

&lt;p&gt;Links&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html&#34; target=&#34;_blank&#34;&gt;Douglas Bates&amp;rsquo; explanation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q2/000904.html&#34; target=&#34;_blank&#34;&gt;Another explanation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://glmm.wikidot.com/faq&#34; target=&#34;_blank&#34;&gt;r-sig-mixed-model-FAQ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2015/06/22/why-no-p-values-in-mixed-models/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Why ANOVA is not the choice for non-normal data</title>
      <link>https://blog.dlilab.com/en/2015/06/19/glmms-for-non-normal-data/</link>
      <pubDate>Fri, 19 Jun 2015 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2015/06/19/glmms-for-non-normal-data/</guid>
      <description>
        &lt;blockquote&gt;
&lt;p&gt;Reading notes of Stroup, Walter W., &amp;ldquo;Rethinking the Analysis of Non-Normal Data in Plant and Soil Science&amp;rdquo;, Agronomy Journal 107, 2 (2015), pp. 811.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Some history: Fisher and Mackenzie (1923) published the first ANOVA results. Nelder and Wedderburn (1972) introduced generalized linear models, a major departure in approaching non-normal data. Breslow and Clayton (1993) and Wolfinger and O&amp;rsquo;Connell (1993) integrated mixed model and generalized linear model theory and methods. The following two decades saw intense development of GLMM theory and methods.&lt;/p&gt;

&lt;p&gt;ANOVA rests on &lt;em&gt;three&lt;/em&gt; assumptions: independent observations (vs correlated observations), normally distributed data (vs non-normal data), and homogeneous variance (vs heterogeneous variance). However, non-normal data are common, e.g. counts (&lt;em&gt;Poisson&lt;/em&gt; or &lt;em&gt;Negative binomial&lt;/em&gt;), time to flowering (&lt;em&gt;Exponential&lt;/em&gt; or &lt;em&gt;Gamma&lt;/em&gt;), continuous proportions such as leaf area affected (&lt;em&gt;Beta&lt;/em&gt;), and the number of quadrats with a given outcome out of &lt;em&gt;n&lt;/em&gt; observed quadrats (&lt;em&gt;Binomial&lt;/em&gt;). For all of these distributions, the variance depends on the mean. Thus, if data are non-normal, their variances are unlikely to be homogeneous. Traditionally, analysts relied on the Central Limit Theorem, which assures that the sampling distribution of means will be approximately normal if the sample size is large enough, and used standard &lt;em&gt;variance-stabilizing&lt;/em&gt; transformations to deal with heterogeneous variances, e.g. &lt;code&gt;log(count + 1)&lt;/code&gt;, &lt;code&gt;sqrt(small_count + 3/8)&lt;/code&gt;, &lt;code&gt;count^(2/3)&lt;/code&gt;, &lt;code&gt;asin(sqrt(proportion))&lt;/code&gt;. GLMMs extend linear model theory to accommodate data that may be non-normal, have heterogeneous variance, and be correlated. From the GLMM point of view, ANOVA is antiquated or even obsolete.&lt;/p&gt;
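
&lt;p&gt;The mean-variance link for counts, and what a variance-stabilizing transformation is meant to fix, can be seen in a quick simulation. This is only a sketch: the Poisson means and the sample size are arbitrary, and &lt;code&gt;sqrt(x + 3/8)&lt;/code&gt; is one of the transformations listed above.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;set.seed(2015)
lambdas = c(2, 10, 50)                               # arbitrary Poisson means
sims = lapply(lambdas, function(l) rpois(10000, l))  # 10,000 counts per mean
raw_var = sapply(sims, var)                          # grows with the mean (about equal to it)
sqrt_var = sapply(sims, function(x) var(sqrt(x + 3/8)))  # roughly constant
round(data.frame(lambda = lambdas, raw_var, sqrt_var), 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The raw variances track the means, while the transformed variances all stay near 0.25. The transformation stabilizes the variance, but the analysis is then on the square-root scale rather than on the Poisson mean itself.&lt;/p&gt;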

&lt;p&gt;Stroup (2015) showed that, for count data, ANOVA on untransformed or log-/sqrt-transformed data and the GLMM all control Type I error adequately, but GLMMs have more power to detect treatment differences. For discrete proportion data, untransformed ANOVA yields estimates of the marginal &lt;code&gt;\(p_i\)&lt;/code&gt; but not the correct standard errors; the GLMM yields estimates of the conditional &lt;code&gt;\(p_i\)&lt;/code&gt; with correct standard errors; and arcsine-transformed ANOVA provides estimates of neither.&lt;/p&gt;

&lt;p&gt;Take a binomial example: the &lt;em&gt;i&lt;/em&gt;th treatment in the &lt;em&gt;j&lt;/em&gt;th block has &lt;code&gt;\(N_{ij}\)&lt;/code&gt; yes-no observations and probability &lt;code&gt;\(p_{ij}\)&lt;/code&gt; of a yes response on any given &lt;em&gt;ij&lt;/em&gt;th observation unit. Three distributions are relevant to the analysis of these experimental data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The distribution of block effects (random effects). Blocking is a design strategy to ensure that units within blocks are as similar as possible. Variability among blocks is expected, and we assume the blocks are representative of the blocks we could have used. Thus block effects are assumed to follow a normal distribution: &lt;code&gt;\(b_j\sim NI(0,\sigma_{B}^{2})\)&lt;/code&gt; (normally and independently distributed).&lt;/li&gt;
&lt;li&gt;The distribution at the unit level: the observations in the &lt;em&gt;ij&lt;/em&gt;th unit follow &lt;code&gt;\(Binomial(N_{ij},p_{ij})\)&lt;/code&gt;. This distribution is conditional on the random effects: &lt;code&gt;\(y_{ij}|b_j\sim Binomial(N_{ij},p_{ij})\)&lt;/code&gt;, i.e. the distribution of the observations, conditional on the observation being in the &lt;em&gt;j&lt;/em&gt;th block, is binomial (with &lt;code&gt;\(N_{ij}\)&lt;/code&gt; and &lt;code&gt;\(p_{ij}\)&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The actually observed distribution: the marginal distribution. When we say we have binomial data, we are referring to the distribution of the observations conditional on the &lt;em&gt;ij&lt;/em&gt;th unit. The distribution of the observed data, i.e. the marginal distribution, is most likely not binomial.&lt;/li&gt;
&lt;/ol&gt;
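
&lt;p&gt;The three levels can be made concrete with a small simulation. This is a minimal sketch with made-up values (&lt;code&gt;n_blocks&lt;/code&gt;, &lt;code&gt;N&lt;/code&gt;, &lt;code&gt;sigma_b&lt;/code&gt; and &lt;code&gt;eta&lt;/code&gt; are all hypothetical) for a single treatment, assuming a logit link between the block effect and the binomial probability.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;set.seed(1923)
n_blocks = 2000   # hypothetical number of blocks
N = 20            # hypothetical number of yes/no observations per unit
sigma_b = 1       # hypothetical block standard deviation
eta = 1           # hypothetical treatment effect on the logit scale
b = rnorm(n_blocks, mean = 0, sd = sigma_b)   # 1. block effects, b_j ~ N(0, sigma_b^2)
p = plogis(eta + b)                           # conditional probability in each block
y = rbinom(n_blocks, size = N, prob = p)      # 2. y_j | b_j ~ Binomial(N, p_j)
# 3. the marginal distribution of y is the only one we actually observe
p_bar = mean(y) / N
c(observed_var = var(y), binomial_var = N * p_bar * (1 - p_bar))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The observed variance comes out several times larger than a binomial distribution with the same mean would allow, so the marginal distribution is not binomial even though each block is binomial conditional on its block effect.&lt;/p&gt;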

&lt;p&gt;&lt;em&gt;We cannot observe the first two distributions directly. The only distribution we observe is the third one.&lt;/em&gt; This is not an issue if the first two are normal, because the third will then also be normal. For non-normal data, however, the marginal distribution of the observed data can be quite different, and our usual intuitions can betray and mislead us. The &lt;strong&gt;fundamental&lt;/strong&gt; problem in analyzing non-normal data is that what we want to estimate or test (in this example, the treatment effects on &lt;code&gt;\(p_{ij}\)&lt;/code&gt; of binomial data) involves parameters of distributions that we &lt;em&gt;cannot&lt;/em&gt; observe directly. In other words, the information we want is camouflaged in a complex observed marginal distribution. GLMMs can extract the information we want from the observations we have; ANOVA and regression cannot.&lt;/p&gt;

&lt;p&gt;The GLMM conditional estimate asks: &amp;ldquo;if I take an average member of the population, i.e. a member whose block effect is &lt;code&gt;\(b_j=0\)&lt;/code&gt;, what is the estimated binomial probability?&amp;rdquo; (think of a median). The marginal estimate asks: &amp;ldquo;if I average across all the members of the population, what is the mean proportion?&amp;rdquo; (think of a mean). Which one to use depends on your question.&lt;/p&gt;
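
&lt;p&gt;A quick numerical sketch of the difference, with hypothetical values for the linear predictor &lt;code&gt;eta&lt;/code&gt; and the block standard deviation &lt;code&gt;sigma_b&lt;/code&gt;, and assuming a logit link:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;set.seed(1993)
eta = 1          # hypothetical linear predictor (logit scale)
sigma_b = 1.5    # hypothetical block standard deviation
b = rnorm(1e5, mean = 0, sd = sigma_b)   # simulated population of block effects
p_blocks = plogis(eta + b)               # binomial probability in each simulated block
c(conditional = plogis(eta),             # p for a block with b_j = 0
  median      = median(p_blocks),        # approximately equal to the conditional value
  marginal    = mean(p_blocks))          # population-averaged p, pulled toward 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The conditional estimate matches the median of the block-level probabilities (the inverse logit is monotone and the median block effect is 0), while the marginal mean is shrunk toward 0.5; the larger &lt;code&gt;sigma_b&lt;/code&gt;, the larger the gap.&lt;/p&gt;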

&lt;p&gt;Stroup (2015) argues that, for binomial data, ANOVA with or without transformation should be considered unacceptable for publication. If the &lt;em&gt;marginal mean&lt;/em&gt; best addresses the research objectives, the correct approach requires an alternative formulation of the GLMM, namely generalized estimating equations (GEEs, Zeger et al. 1988). GEE replaces the random effects in the linear predictor with a working variance and correlation structure, and replaces the distribution with a quasi-likelihood. Assuming equal N for all experimental units, the beta GLMM is the preferred method if the marginal mean is the appropriate target. For unequal N, use the GEE.&lt;/p&gt;

&lt;p&gt;In sum, Stroup&amp;rsquo;s (2015) main take-home message: for non-normal data, ANOVA, with or without transformed data, won&amp;rsquo;t work. The loss of accuracy and power is too great. GLMMs and, in some cases, GEEs are the methods of choice.&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2015/06/19/glmms-for-non-normal-data/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Youtube view counts of Linear Algebra lectures</title>
      <link>https://blog.dlilab.com/en/2015/06/01/youtube_view_counts/</link>
      <pubDate>Mon, 01 Jun 2015 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2015/06/01/youtube_view_counts/</guid>
      <description>
&lt;p&gt;I am learning linear algebra these days by watching the excellent series of lectures by Prof. Gilbert Strang on &lt;a href=&#34;https://www.youtube.com/playlist?list=PLE7DDD91010BC51F8&#34; target=&#34;_blank&#34;&gt;Youtube&lt;/a&gt;. Along the way, I thought it would be interesting to look at the view counts of all the lectures. I expect the view counts to decline for later lectures.&lt;/p&gt;

&lt;p&gt;Alright, first load some R packages in order to get data from Youtube.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(plyr)    # load before dplyr so plyr does not mask dplyr functions
library(dplyr)
library(rvest)   # for web scraping
library(stringr) # string handling
library(ggplot2) # plotting
library(knitr)   # kable() for the table
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then I searched online to find the URL of the playlist for all the lectures. To find the correct CSS selectors, I followed &lt;a href=&#34;http://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html&#34; target=&#34;_blank&#34;&gt;this tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# the playlist first (read_html() replaces the deprecated html())
url = read_html(&amp;quot;https://www.youtube.com/playlist?list=PLE7DDD91010BC51F8&amp;quot;)
lectures = html_nodes(url, &amp;quot;.yt-uix-tile-link&amp;quot;) # selector from the 2015 page layout
# length(lectures) # 35 videos
# get lecture numbers from the titles, e.g. &amp;quot;Lec 24b ...&amp;quot; gives &amp;quot;24b&amp;quot;
lec_names = html_text(lectures) %&amp;gt;% 
  sapply(function(x) str_replace(x, &amp;quot;^.*Lec ([b0-9]*) .*&amp;quot;, &amp;quot;\\1&amp;quot;)) %&amp;gt;% 
  unname() %&amp;gt;% as.character()
lec_names[lec_names == &amp;quot;24b&amp;quot;] = 24.5 # treat lecture 24b as 24.5
lec_names = as.numeric(lec_names)
&lt;/code&gt;&lt;/pre&gt;
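
&lt;p&gt;The lecture-number parsing above can be checked on its own. A base-R sketch using &lt;code&gt;sub()&lt;/code&gt;, which applies the same regular expression as &lt;code&gt;str_replace()&lt;/code&gt; here, on two made-up titles (hypothetical, not the real playlist titles):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Two made-up titles following the pattern the regex expects (hypothetical)
titles = c(&amp;quot;Lec 1 | Linear Algebra (hypothetical title)&amp;quot;,
           &amp;quot;Lec 24b | Quiz 2 Review (hypothetical title)&amp;quot;)
lec = sub(&amp;quot;^.*Lec ([b0-9]*) .*&amp;quot;, &amp;quot;\\1&amp;quot;, titles)  # &amp;quot;1&amp;quot; and &amp;quot;24b&amp;quot;
lec[lec == &amp;quot;24b&amp;quot;] = &amp;quot;24.5&amp;quot;  # put lecture 24b between 24 and 25
as.numeric(lec)
&lt;/code&gt;&lt;/pre&gt;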

&lt;p&gt;Then, get urls for all lectures and extract their view counts.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# get url for all lectures
url_all = ldply(lectures, function(x){
  paste0(&amp;quot;https://www.youtube.com&amp;quot;, html_attr(x, name = &amp;quot;href&amp;quot;))
})

# for each lecture, get the view count
view_all = sapply(url_all$V1, function(x){
  print(x)
  xx = read_html(x)
  view_count = html_nodes(xx, &amp;quot;.watch-view-count&amp;quot;) %&amp;gt;% html_text() %&amp;gt;%
    gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;, .) %&amp;gt;% 
    as.numeric()
  lect_descrip = html_nodes(xx, &amp;quot;#eow-description&amp;quot;) %&amp;gt;% html_text() %&amp;gt;% 
    gsub(&amp;quot;^(.*)View the complete.*$&amp;quot;, &amp;quot;\\1&amp;quot;, .) %&amp;gt;% str_trim()
  print(lect_descrip)
  list(view_count, as.character(lect_descrip))
})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, combine lecture names with their view counts.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# combine lecture names with view count
view = unlist(view_all[(1:length(view_all)) %% 2 == 1])
# remove some notes that start with *.
descrip = unlist(view_all[(1:length(view_all)) %% 2 == 0])
descrip = sapply(descrip, function(x){
 if(str_detect(x, &amp;quot;\\*&amp;quot;)){
   str_replace(x, &amp;quot;^(.*)\\*+.*$&amp;quot;, &amp;quot;\\1&amp;quot;)
 } else{
   x
 }
})
dat = tibble(lec = lec_names, view = view, description = descrip)
kable(data.frame(lec = lec_names, view = format(view, big.mark = &amp;quot;,&amp;quot;), description = descrip), format = &amp;quot;html&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;table&gt;
 &lt;thead&gt;
  &lt;tr&gt;
   &lt;th style=&#34;text-align:left;&#34;&gt; view &lt;/th&gt;
   &lt;th style=&#34;text-align:left;&#34;&gt; description &lt;/th&gt;
  &lt;/tr&gt;
 &lt;/thead&gt;
&lt;tbody&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; 1,471,018 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 1: The Geometry of Linear Equations. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   421,456 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 2: Elimination with Matrices. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   359,628 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 3: Multiplication and Inverse Matrices. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   304,766 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 4: Factorization into A = LU &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   210,564 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 5: Transposes, Permutations, Spaces R^n. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   199,004 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 6: Column Space and Nullspace. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   151,769 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 7: Solving Ax = 0: Pivot Variables, Special Solutions. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   138,087 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 8: Solving Ax = b: Row Reduced Form R. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   151,718 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 9: Independence, Basis, and Dimension. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   132,971 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 10: The Four Fundamental Subspaces. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   104,452 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 11: Matrix Spaces; Rank 1; Small World Graphs. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    82,273 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 12: Graphs, Networks, Incidence Matrices. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    77,250 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 13: Quiz 1 Review. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   108,408 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 14: Orthogonal Vectors and Subspaces. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    99,687 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 15: Projections onto Subspaces. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    96,329 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 16: Projection Matrices and Least Squares. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    95,593 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 17: Orthogonal Matrices and Gram-Schmidt. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    90,094 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 18: Properties of Determinants. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    79,339 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 19: Determinant Formulas and Cofactors. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    85,189 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 20: Cramer&#39;s Rule, Inverse Matrix, and Volume. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   159,954 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 21: Eigenvalues and Eigenvectors. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;   109,883 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 22: Diagonalization and Powers of A. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    84,893 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 23: Differential Equations and exp(At). &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    84,173 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 24: Markov Matrices; Fourier Series.* &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    36,172 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 24b : Quiz 2 Review.* &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    59,755 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 25: Symmetric Matrices and Positive Definiteness.* &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    62,566 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 26: Complex Matrices; Fast Fourier Transform. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    58,041 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 27: Positive Definite Matrices and Minima. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    70,082 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 28: Similar Matrices and Jordan Form. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    85,714 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 29: Singular Value Decomposition. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    99,162 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 30: Linear Transformations and Their Matrices. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    61,037 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 31: Change of Basis; Image Compression. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    36,158 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 32: Quiz 3 Review. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    55,596 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 33: Left and Right Inverses; Pseudoinverse. &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt;    50,540 &lt;/td&gt;
   &lt;td style=&#34;text-align:left;&#34;&gt; Lecture 34: Final Course Review. &lt;/td&gt;
  &lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Finally, let&amp;rsquo;s plot it.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# plot
ggplot(dat, aes(x = lec, y = view)) +
  geom_point(color = &amp;quot;red&amp;quot;, size = 2) + 
  geom_line(color = &amp;quot;blue&amp;quot;) +
  labs(x = &amp;quot;Lectures&amp;quot;, y = &amp;quot;Youtube view count&amp;quot;,
       title = &amp;quot;Youtube view counts of Linear Algebra lectures taught by\nGilbert Strang, Spring 2005&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/DtGk7Rt.png&#34; alt=&#34;Imgur&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Wow, the first lecture has by far the most views: 1,471,030 (as of 2015-06-21 23:00 Central Time)! However, the second lecture has about one million fewer views than the first. It will be interesting to find out why lectures 21 and 22 have more views than their neighbors (I am getting there, at lecture 14 now! &amp;ndash; Eigenvalues!). The last lecture has about 50K views. Does this mean about 50K people finished all the lectures?&lt;/p&gt;
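
&lt;p&gt;The comparisons above can be checked against the table with a few values copied from it (lectures 1, 2, 20, 21 and 34). Keep in mind that view counts are not unique viewers, so the last ratio is only a rough proxy for how many people finished the course.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;first = 1471018   # Lecture 1
second = 421456   # Lecture 2
lec20 = 85189     # Lecture 20: Cramer&#39;s Rule, Inverse Matrix, and Volume
lec21 = 159954    # Lecture 21: Eigenvalues and Eigenvectors
last = 50540      # Lecture 34: Final Course Review
first - second                 # drop from lecture 1 to lecture 2: 1049562
round(100 * last / first, 1)   # lecture 34 views as a percentage of lecture 1: 3.4
round(lec21 / lec20, 2)        # lecture 21 relative to its neighbor lecture 20: 1.88
&lt;/code&gt;&lt;/pre&gt;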

&lt;p&gt;It clearly shows how hard it is to be persistent.&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2015/06/01/youtube_view_counts/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Some useful keyboard shortcuts for Atom editor</title>
      <link>https://blog.dlilab.com/en/2015/04/10/useful-atom-shortcuts/</link>
      <pubDate>Fri, 10 Apr 2015 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2015/04/10/useful-atom-shortcuts/</guid>
      <description>
        

&lt;p&gt;I am trying to switch to GitHub&amp;rsquo;s new editor &lt;a href=&#34;https://atom.io/&#34; target=&#34;_blank&#34;&gt;Atom&lt;/a&gt;. Here are some notes on things I have found useful.&lt;/p&gt;

&lt;h3 id=&#34;packages&#34;&gt;Packages&lt;/h3&gt;

&lt;p&gt;To see all installed packages, run &lt;code&gt;apm list&lt;/code&gt; in your terminal. I have used the following packages so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;atom-material-syntax&lt;/code&gt; # great syntax highlighting&lt;/li&gt;
&lt;li&gt;&lt;code&gt;atom-material-ui&lt;/code&gt; # great user interface&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autocomplete-bibtex&lt;/code&gt; # autocomplete citations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autocomplete-paths&lt;/code&gt; # autocomplete file paths&lt;/li&gt;
&lt;li&gt;&lt;code&gt;file-icons&lt;/code&gt; # show file icons in the tree view&lt;/li&gt;
&lt;li&gt;&lt;code&gt;git-time-machine&lt;/code&gt; # compare git revisions of a file&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ink&lt;/code&gt; # for the Julia language&lt;/li&gt;
&lt;li&gt;&lt;code&gt;julia-client&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;language-julia&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;language-latex&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;language-markdown&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;language-r&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;markdown-preview-plus&lt;/code&gt; # render math equations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;markdown-writer&lt;/code&gt; # make writing in markdown easier&lt;/li&gt;
&lt;li&gt;&lt;code&gt;minimap&lt;/code&gt; # show a minimap of your file&lt;/li&gt;
&lt;li&gt;&lt;code&gt;minimap-find-and-replace&lt;/code&gt; # show found items in the minimap&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pen-paper-coffee-syntax&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;project-manager&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terminal-panel&lt;/code&gt; # run a terminal within Atom&lt;/li&gt;
&lt;li&gt;&lt;code&gt;typewriter&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vim-mode&lt;/code&gt; # I like the vim way of moving the cursor&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wordcount&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Zen&lt;/code&gt; # distraction-free writing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To install all of them: &lt;code&gt;apm install atom-material-syntax atom-material-ui autocomplete-bibtex autocomplete-paths file-icons git-time-machine ink julia-client language-julia language-latex language-markdown language-r markdown-preview-plus markdown-writer minimap minimap-find-and-replace pen-paper-coffee-syntax project-manager terminal-panel typewriter vim-mode wordcount Zen&lt;/code&gt;&lt;/p&gt;

&lt;h3 id=&#34;shortcuts&#34;&gt;Shortcuts&lt;/h3&gt;

&lt;h4 id=&#34;multi-cursor&#34;&gt;Multi-cursor&lt;/h4&gt;

&lt;p&gt;I also like the &lt;strong&gt;multi-cursor&lt;/strong&gt; feature of &lt;em&gt;Sublime Text&lt;/em&gt;, which I consider a must-have. The relevant shortcuts in Atom:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ctrl-D&lt;/code&gt;: select a word, then hit &lt;code&gt;ctrl-D&lt;/code&gt; and Atom will also select the next occurrence of that word. Then you can either type directly (which replaces the selected words) or use the left or right arrow to append text.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl-leftclick&lt;/code&gt;: add a cursor wherever you click.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;shift-alt-down&lt;/code&gt; or &lt;code&gt;shift-alt-up&lt;/code&gt;: add cursors on multiple lines. Alternatively, select multiple lines first, then use &lt;code&gt;selection -- split into lines&lt;/code&gt; (on Mac the shortcut is &lt;code&gt;cmd-shift-L&lt;/code&gt;; sadly, Windows and Linux have no equivalent shortcut so far, whereas Sublime uses &lt;code&gt;ctrl-shift-L&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These cover most multi-cursor use cases, but I still miss the &lt;code&gt;shift-rightclick_and_drag&lt;/code&gt; (column selection) feature of &lt;em&gt;Sublime Text&lt;/em&gt;.&lt;/p&gt;

&lt;h4 id=&#34;spell-check&#34;&gt;Spell check&lt;/h4&gt;

&lt;p&gt;To enable spell check for LaTeX files, go to Settings, find the spell-check package, and add &lt;code&gt;text.tex.latex&lt;/code&gt; to the Grammars field.&lt;/p&gt;

&lt;h4 id=&#34;common-used&#34;&gt;Commonly used&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;shift + f11&lt;/code&gt;: full-screen, distraction-free mode from the Zen package.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl + \&lt;/code&gt;: toggle tree view.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl + /&lt;/code&gt;: toggle comment.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ctrl + shift + up/down&lt;/code&gt;: move line up/down.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;update soon&lt;/em&gt;&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2015/04/10/useful-atom-shortcuts/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Notes for Zoo 540 Theoretical ecology (Part I)</title>
      <link>https://blog.dlilab.com/en/2014/12/12/notes-for-theoretical-ecology-class-1/</link>
      <pubDate>Fri, 12 Dec 2014 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2014/12/12/notes-for-theoretical-ecology-class-1/</guid>
      <description>
        

&lt;blockquote&gt;
&lt;p&gt;Simulation is critical to understand what your methods are doing! Try to simulate your dataset before doing any statistical analysis.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&#34;grouse-data&#34;&gt;Grouse data&lt;/h2&gt;

&lt;p&gt;The data contain the presence/absence of four bird species on 117 routes. Each route has 8 stations distributed evenly along the border of a 1 mile by 1 mile square. The data also include environmental variables at each station, such as wind speed, temperature, and noise. The question is &amp;ldquo;what factors control species abundance and distribution?&amp;rdquo;&lt;/p&gt;

&lt;h4 id=&#34;simulation&#34;&gt;Simulation&lt;/h4&gt;

&lt;p&gt;It is always a good idea to simulate your dataset before doing statistical analysis. Here we use the species &lt;code&gt;WITU&lt;/code&gt; (wild turkey) as an example (code from Tony Ives). The key point is how to simulate from a compound distribution.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;d  # the dataset in long table form: each row is an observation
w  # aggregated at each route, using `FUN = mean`.
# I decided I wanted to generate data that had the appropriate
# variability in counts per ROUTE. This variability can be seen in the
# following histogram.
hist(w$WITU)

# As a first attempt, assume that each observation at each STATION is
# random and independent of all other stations, including those
# stations in the same ROUTE. The mean number of observation
# (presences) across all STATIONs is
mWITU &amp;lt;- mean(d$WITU)

# Therefore, I produced a data set that has the same structure as d in
# which WITU is selected from a binomial distribution with probability
# = mWITU and size = 1 (size is the number of trials).

sim.d &amp;lt;- subset(d, select = ROUTE:Y_NAD83)
sim.d$WITU &amp;lt;- rbinom(n = dim(d)[1], size = 1, prob = mWITU)

# Now I treat sim.d just like d to get the histogram I&#39;m interested in
sim.w &amp;lt;- data.frame(aggregate(cbind(sim.d$WITU, sim.d$X_NAD83, sim.d$Y_NAD83), 
    by = list(sim.d$ROUTE), FUN = &amp;quot;mean&amp;quot;))
names(sim.w) &amp;lt;- c(&amp;quot;ROUTE&amp;quot;, &amp;quot;WITU&amp;quot;, &amp;quot;X_NAD83&amp;quot;, &amp;quot;Y_NAD83&amp;quot;)

# Finally, I compare the distributions. Run this code (starting with
# the subset() function above) several times to convince yourself these
# distributions are different.
op = par(mfrow = c(2, 1))
hist(w$WITU)
hist(sim.w$WITU)

## Because there is more variation in the data than in the first
## simulation, I decided to assume that ROUTEs had different
## probabilities of WITU being observed in STATIONs. Specifically, for
## each ROUTE, I assumed that the probability of a WITU being observed
## at a station was prob, and that prob is distributed according to an
## exponential distribution among ROUTEs.  This is an example of a
## compound distribution: the probability from a binomial distribution is
## itself described by an exponential distribution.

sim.d &amp;lt;- subset(d, select = ROUTE:Y_NAD83)
# create the columns that the loop below fills in route by route
sim.d$WITU &amp;lt;- NA
sim.d$route.mean &amp;lt;- NA

# This uses a for() loop that loops through the levels of sim.d$ROUTE.
for (route in levels(sim.d$ROUTE)) {
    n &amp;lt;- sum(sim.d$ROUTE == route)
    # an exponential draw can exceed 1, so cap it to keep a valid probability
    prob &amp;lt;- min(rexp(n = 1, rate = 1/mWITU), 1)
    sim.d$WITU[sim.d$ROUTE == route] &amp;lt;- rbinom(n = n, size = 1, prob = prob)
    sim.d$route.mean[sim.d$ROUTE == route] &amp;lt;- prob
}

# Again, I generate sim.w like w, although I&#39;ve also added a column for
# the value of prob from each ROUTE and called it route.mean.
sim.w &amp;lt;- data.frame(aggregate(cbind(sim.d$WITU, sim.d$X_NAD83, sim.d$Y_NAD83, 
    sim.d$route.mean), by = list(sim.d$ROUTE), FUN = &amp;quot;mean&amp;quot;))
names(sim.w) &amp;lt;- c(&amp;quot;ROUTE&amp;quot;, &amp;quot;WITU&amp;quot;, &amp;quot;X_NAD83&amp;quot;, &amp;quot;Y_NAD83&amp;quot;, &amp;quot;route.mean&amp;quot;)

# Run this a few times to convince yourself that the simulations do a
# pretty good job reproducing the data
op = par(mfrow = c(3, 1))
hist(w$WITU)
hist(sim.w$WITU)
hist(sim.w$route.mean)

# Alternatively, draw each ROUTE&#39;s probability from a beta distribution
# first --&amp;gt; beta-binomial distribution (shown here for RUGR). This
# choice of shape2 makes the mean of the beta distribution equal mRUGR.
mRUGR &amp;lt;- mean(d$RUGR)
shape1 &amp;lt;- 1
shape2 &amp;lt;- (1 - mRUGR) * shape1/mRUGR

sim.d &amp;lt;- subset(d, select = ROUTE:DATE)
for (route in levels(sim.d$ROUTE)) {
    n &amp;lt;- sum(sim.d$ROUTE == route)
    prob.route &amp;lt;- rbeta(n = 1, shape1 = shape1, shape2 = shape2)
    sim.d$RUGR[sim.d$ROUTE == route] &amp;lt;- rbinom(n = n, size = 1, prob = prob.route)
}

# For the betabinomial distribution, we can also first estimate prob and
# theta by maximum likelihood (MLE), then simulate the data

# Probability distribution function for a betabinomial distribution
# modified from the library &#39;emdbook&#39;
dbetabinom &amp;lt;- function(y, prob, size, theta, shape1, shape2, log = FALSE) {
    if (missing(prob) &amp;amp;&amp;amp; !missing(shape1) &amp;amp;&amp;amp; !missing(shape2)) {
        prob &amp;lt;- shape1/(shape1 + shape2)
        theta &amp;lt;- shape1 + shape2
    }
    v &amp;lt;- lfactorial(size) - lfactorial(y) - lfactorial(size - y) - lbeta(theta * 
        (1 - prob), theta * prob) + lbeta(size - y + theta * (1 - prob), 
        y + theta * prob)
    # non-integer counts get zero probability
    nonint &amp;lt;- (y%%1) != 0
    if (any(nonint)) {
        warning(&amp;quot;non-integer x detected; returning zero probability&amp;quot;)
        v[nonint] &amp;lt;- -Inf
    }
    if (log) 
        v else exp(v)
}

# Log-likelihood function for the betabinomial given data Y (vector of
# successes) and Size (vector of number of trials) in terms of
# parameters prob and theta
dbetabinom_LLF &amp;lt;- function(parameters, Y, Size) {
    prob &amp;lt;- parameters[1]
    theta &amp;lt;- parameters[2]
    -sum(dbetabinom(y = Y, size = Size, prob = prob, theta = theta, log = TRUE))
}

LLestimates &amp;lt;- optim(fn = dbetabinom_LLF, par = c(prob = 0.2, theta = 0.5), 
    Y = w$OBS, Size = w$STATIONS, method = &amp;quot;BFGS&amp;quot;)
# w$OBS : how many birds observed in all stations from one route
# w$STATIONS: 8 stations / route.
LLestimates
# $par prob 0.177216

# update parameters
shape1 &amp;lt;- LLestimates$par[1] * LLestimates$par[2]
shape2 &amp;lt;- (1 - LLestimates$par[1]) * LLestimates$par[2]

# This uses a for() loop that loops through the levels of sim.d$ROUTE.
sim.d &amp;lt;- subset(d, select = ROUTE:DATE)
for (route in levels(sim.d$ROUTE)) {
    n &amp;lt;- sum(sim.d$ROUTE == route)
    prob.route &amp;lt;- rbeta(n = 1, shape1 = shape1, shape2 = shape2)
    sim.d$RUGR[sim.d$ROUTE == route] &amp;lt;- rbinom(n = n, size = 1, prob = prob.route)
}

## We can also include environmental variables.

# Log-likelihood function for the betabinomial given data Y (vector of
# successes), Size (vector of number of trials), and independent
# variable X (WINDSPEEDSQR) in terms of parameters theta, b0, and b1
dbetabinom_LLF &amp;lt;- function(parameters, Y, Size, X) {
    theta &amp;lt;- parameters[1]
    b0 &amp;lt;- parameters[2]
    b1 &amp;lt;- parameters[3]
    
    # inverse logit function
    prob &amp;lt;- 1/(1 + exp(-b1 * (X - b0)))
    -sum(dbetabinom(y = Y, size = Size, prob = prob, theta = theta, log = TRUE))
}

LLe &amp;lt;- optim(fn = dbetabinom_LLF, par = c(theta = 0.5, b0 = 1, b1 = -0.5), 
    Y = w$OBS, Size = w$STATIONS, X = w$WINDSPEEDSQR, method = &amp;quot;BFGS&amp;quot;)
LLe
# $par
#      theta         b0         b1
#  5.6803877 -3.7789926 -0.3007611

## Simulating data for RUGR

# Set up parameters
theta &amp;lt;- LLe$par[1]
b0 &amp;lt;- LLe$par[2]
b1 &amp;lt;- LLe$par[3]

# This uses a for() loop that loops through the levels of sim.d$ROUTE.
sim.d &amp;lt;- subset(d, select = ROUTE:DATE)
for (route in levels(sim.d$ROUTE)) {
    p &amp;lt;- 1/(1 + exp(-b1 * (w$WINDSPEEDSQR[w$ROUTE == route] - b0)))
    shape1 &amp;lt;- p * theta
    shape2 &amp;lt;- (1 - p) * theta
    n &amp;lt;- sum(sim.d$ROUTE == route)
    prob.route &amp;lt;- rbeta(n = 1, shape1 = shape1, shape2 = shape2)
    sim.d$RUGR[sim.d$ROUTE == route] &amp;lt;- rbinom(n = n, size = 1, prob = prob.route)
}

# Compute the statistical significance of H0: b1 = 0 (effect of WINDSPEEDSQR)

# Log-likelihood function for the betabinomial given data Y (vector of
# successes), Size (vector of number of trials), and independent
# variable X (WINDSPEEDSQR) with b1 = 0 in terms of parameters theta and
# b0. It gets its own name so the full model above is not overwritten.
dbetabinom_LLF0 &amp;lt;- function(parameters, Y, Size, X) {
    theta &amp;lt;- parameters[1]
    b0 &amp;lt;- parameters[2]
    
    # inverse logit function
    prob &amp;lt;- 1/(1 + exp(b0))
    -sum(dbetabinom(y = Y, size = Size, prob = prob, theta = theta, log = TRUE))
}

LLe0 &amp;lt;- optim(fn = dbetabinom_LLF0, par = c(theta = 0.5, b0 = 1), Y = w$OBS, 
    Size = w$STATIONS, X = w$WINDSPEEDSQR, method = &amp;quot;BFGS&amp;quot;)
LLe0

c(LLe$value, LLe0$value)
# [1] 179.9093 181.9467 (negative log-likelihoods)
pchisq(2 * (LLe0$value - LLe$value), df = 1, lower.tail = FALSE)
# 0.04352663
&lt;/code&gt;&lt;/pre&gt;
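
&lt;p&gt;To see why the compound (beta-binomial) distribution reproduces the extra between-route variation, here is a minimal self-contained sketch (not part of the class code) that compares a plain binomial with a beta-binomial of the same mean. The values of &lt;code&gt;p_mean&lt;/code&gt;, &lt;code&gt;theta&lt;/code&gt;, and the number of routes are made-up assumptions; only the 8 stations per route come from the grouse data. For a beta-binomial with mean &lt;code&gt;p&lt;/code&gt;, precision &lt;code&gt;theta = shape1 + shape2&lt;/code&gt;, and &lt;code&gt;size&lt;/code&gt; trials, the variance is &lt;code&gt;size * p * (1 - p) * (1 + (size - 1)/(theta + 1))&lt;/code&gt;, which is larger than the binomial variance &lt;code&gt;size * p * (1 - p)&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Binomial vs. beta-binomial (compound) simulation with the same mean.
# All parameter values below are hypothetical.
set.seed(1)
n_routes &amp;lt;- 10000  # many simulated routes so the variances are stable
stations &amp;lt;- 8      # 8 stations per route, as in the grouse data
p_mean &amp;lt;- 0.2      # assumed mean probability of observing the species
theta &amp;lt;- 2         # assumed beta precision (shape1 + shape2)
shape1 &amp;lt;- p_mean * theta
shape2 &amp;lt;- (1 - p_mean) * theta

# (a) every route shares the same probability p_mean
obs_binom &amp;lt;- rbinom(n_routes, size = stations, prob = p_mean)

# (b) each route first draws its own probability from a beta distribution
p_route &amp;lt;- rbeta(n_routes, shape1 = shape1, shape2 = shape2)
obs_bb &amp;lt;- rbinom(n_routes, size = stations, prob = p_route)

# theoretical variances of the number of observations per route
var_binom &amp;lt;- stations * p_mean * (1 - p_mean)
var_bb &amp;lt;- var_binom * (1 + (stations - 1)/(theta + 1))

rbind(simulated = c(binomial = var(obs_binom), betabinomial = var(obs_bb)),
      theory = c(binomial = var_binom, betabinomial = var_bb))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With these settings the theoretical variances are 1.28 (binomial) and about 4.27 (beta-binomial), and the simulated variances should be close to them: letting each route draw its own probability roughly triples the spread of the counts per route.&lt;/p&gt;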

&lt;h2 id=&#34;maximum-likelihood&#34;&gt;Maximum likelihood&lt;/h2&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Likelihood function for a Bernoulli process: generate data
n &amp;lt;- 10
p &amp;lt;- 0.8
set.seed(123)
xi &amp;lt;- rbinom(n = n, size = 1, prob = p)

L &amp;lt;- function(pp) apply(X = array(pp), MARGIN = 1, FUN = function(ppp) prod(xi * 
    ppp + (1 - xi) * (1 - ppp)))
LL &amp;lt;- function(pp) apply(X = array(pp), MARGIN = 1, FUN = function(ppp) sum(log(xi * 
    ppp + (1 - xi) * (1 - ppp))))

par(mfrow = c(1, 1), lwd = 2, bty = &amp;quot;l&amp;quot;, las = 1, cex = 1.5)
curve(L, from = 0, to = 1, main = paste(&amp;quot;p = &amp;quot;, p, &amp;quot;mean(x) = &amp;quot;, mean(xi), 
    &amp;quot; n = &amp;quot;, n))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/aj99JQI.png&#34; alt=&#34;plot of chunk unnamed-chunk-2&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Here the likelihood peaks at 0.7, which equals &lt;code&gt;mean(x)&lt;/code&gt; rather than the true probability &lt;code&gt;p = 0.8&lt;/code&gt;. This is because the maximum likelihood estimate is computed from the observed data, not from the true underlying probability, which we never know.&lt;/p&gt;
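
&lt;p&gt;For Bernoulli data the log-likelihood is &lt;code&gt;sum(x * log(p) + (1 - x) * log(1 - p))&lt;/code&gt;; setting its derivative &lt;code&gt;sum(x)/p - sum(1 - x)/(1 - p)&lt;/code&gt; to zero gives &lt;code&gt;p_hat = mean(x)&lt;/code&gt;, which is why the peak sits at the sample mean. A quick numerical check, as a sketch that re-creates the same simulated data and uses base R&amp;rsquo;s one-dimensional optimizer &lt;code&gt;optimize()&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Re-create the data from the example above
n &amp;lt;- 10
p &amp;lt;- 0.8
set.seed(123)
xi &amp;lt;- rbinom(n = n, size = 1, prob = p)

# Bernoulli log-likelihood (a binomial with size = 1)
loglik &amp;lt;- function(pp) sum(dbinom(xi, size = 1, prob = pp, log = TRUE))

# Maximize over (0, 1); optimize() returns $maximum and $objective
fit &amp;lt;- optimize(loglik, interval = c(0, 1), maximum = TRUE)
c(mle = fit$maximum, sample_mean = mean(xi))
&lt;/code&gt;&lt;/pre&gt;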

&lt;h2 id=&#34;confident-interval&#34;&gt;Confidence interval&lt;/h2&gt;

&lt;p&gt;How do you calculate a confidence interval? In introductory statistics classes we are told to use &lt;code&gt;mean +- 1.96*SE&lt;/code&gt;. But this is only a special case of the general approach, one that relies on the sampling distribution being (approximately) normal and therefore symmetric.&lt;/p&gt;

&lt;p&gt;The general approach works like this, using the binomial distribution as an example. The observed proportion of successes is &lt;code&gt;p_hat = x/n&lt;/code&gt;. Propose a candidate &lt;code&gt;prob&lt;/code&gt; value, say 0.3, and work out the distribution of estimates you would get from &lt;code&gt;n&lt;/code&gt; draws of a binomial distribution with &amp;ldquo;true&amp;rdquo; probability &lt;code&gt;prob = 0.3&lt;/code&gt;. Then calculate the probability, under that distribution, of getting an estimate as extreme as &lt;code&gt;p_hat&lt;/code&gt;. If this tail probability is less than 0.025, the proposed &lt;code&gt;prob&lt;/code&gt; is outside the 95% confidence interval for the true probability of our actual data. Repeating this for many candidate values traces out the interval. The code makes this concrete:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Confidence intervals for a Bernoulli process: generate data
n &amp;lt;- 100
p &amp;lt;- 0.5
xi &amp;lt;- rbinom(n = n, size = 1, prob = p)

# Compute estimate
p_hat &amp;lt;- mean(xi)

# Plot estimator
op = par(mfrow = c(1, 1), lwd = 2, bty = &amp;quot;l&amp;quot;, las = 1, cex = 1.5)

lower_cum &amp;lt;- function(p_est, pp, n) pbinom(q = p_est * n - 1, size = n, 
    prob = pp)
upper_cum &amp;lt;- function(p_est, pp, n) 1 - pbinom(q = p_est * n, size = n, 
    prob = pp)

pp &amp;lt;- 0.3
W &amp;lt;- function(pp, n) cbind((0:n)/n, dbinom(x = 0:n, size = n, prob = pp))
plot(W(pp, n), type = &amp;quot;h&amp;quot;, main = paste(&amp;quot;p=&amp;quot;, pp, &amp;quot;p_hat=&amp;quot;, p_hat, &amp;quot;lower=&amp;quot;, 
    0.001 * round(1000 * lower_cum(p_hat, pp, n)), &amp;quot;upper=&amp;quot;, 0.001 * round(1000 * 
        upper_cum(p_hat, pp, n))), xlab = &amp;quot;estimate&amp;quot;, ylab = &amp;quot;probability&amp;quot;)
points(p_hat, 0, col = &amp;quot;red&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/OxiK42C.png&#34; alt=&#34;plot of chunk unnamed-chunk-3&#34; /&gt;&lt;/p&gt;

&lt;p&gt;In this case, 0.3 is outside the 95% CI for the true probability.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Confidence intervals for a Bernoulli process: generate data
n &amp;lt;- 100
p &amp;lt;- 0.5
xi &amp;lt;- rbinom(n = n, size = 1, prob = p)

# Compute estimate
p_hat &amp;lt;- mean(xi)

# Plot estimator
op = par(mfrow = c(1, 1), lwd = 2, bty = &amp;quot;l&amp;quot;, las = 1, cex = 1.5)

lower_cum &amp;lt;- function(p_est, pp, n) pbinom(q = p_est * n - 1, size = n, 
    prob = pp)
upper_cum &amp;lt;- function(p_est, pp, n) 1 - pbinom(q = p_est * n, size = n, 
    prob = pp)

pp &amp;lt;- 0.6
W &amp;lt;- function(pp, n) cbind((0:n)/n, dbinom(x = 0:n, size = n, prob = pp))
plot(W(pp, n), type = &amp;quot;h&amp;quot;, main = paste(&amp;quot;p=&amp;quot;, pp, &amp;quot;p_hat=&amp;quot;, p_hat, &amp;quot;lower=&amp;quot;, 
    0.001 * round(1000 * lower_cum(p_hat, pp, n)), &amp;quot;upper=&amp;quot;, 0.001 * round(1000 * 
        upper_cum(p_hat, pp, n))), xlab = &amp;quot;estimate&amp;quot;, ylab = &amp;quot;probability&amp;quot;)
points(p_hat, 0, col = &amp;quot;red&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/eIrYaIF.png&#34; alt=&#34;plot of chunk unnamed-chunk-4&#34; /&gt;&lt;/p&gt;

&lt;p&gt;In this case, 0.6 is within the 95% CI. Repeating this procedure over a range of candidate values gives the 95% CI for the true probability; the two bounds can be found numerically.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# numerically find confidence intervals
alpha &amp;lt;- 0.05

toMin_lower &amp;lt;- function(pp) (lower_cum(p_hat, pp, n) - alpha/2)^2
toMin_upper &amp;lt;- function(pp) (upper_cum(p_hat, pp, n) - alpha/2)^2

upper_alpha &amp;lt;- optim(p_hat, toMin_lower)$par
lower_alpha &amp;lt;- optim(p_hat, toMin_upper)$par

par(mfrow = c(1, 2))
pp &amp;lt;- upper_alpha
plot(W(pp, n), type = &amp;quot;h&amp;quot;, main = paste(&amp;quot;p=&amp;quot;, 0.001 * round(1000 * pp), 
    &amp;quot;lower=&amp;quot;, 0.001 * round(1000 * lower_cum(p_hat, pp, n)), &amp;quot;upper=&amp;quot;, 
    0.001 * round(1000 * upper_cum(p_hat, pp, n))), xlab = &amp;quot;estimate&amp;quot;, 
    ylab = &amp;quot;probability&amp;quot;)
points(p_hat, 0, col = &amp;quot;red&amp;quot;)

pp &amp;lt;- lower_alpha
plot(W(pp, n), type = &amp;quot;h&amp;quot;, main = paste(&amp;quot;p=&amp;quot;, 0.001 * round(1000 * pp), 
    &amp;quot;lower=&amp;quot;, 0.001 * round(1000 * lower_cum(p_hat, pp, n)), &amp;quot;upper=&amp;quot;, 
    0.001 * round(1000 * upper_cum(p_hat, pp, n))), xlab = &amp;quot;estimate&amp;quot;, 
    ylab = &amp;quot;probability&amp;quot;)
points(p_hat, 0, col = &amp;quot;red&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/s0e1Lwi.png&#34; alt=&#34;plot of chunk unnamed-chunk-5&#34; /&gt;&lt;/p&gt;
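
&lt;p&gt;The same bounds can also be found with a root finder instead of minimizing squared differences with &lt;code&gt;optim()&lt;/code&gt; (which warns that one-dimensional Nelder-Mead is unreliable). The sketch below uses &lt;code&gt;uniroot()&lt;/code&gt; with made-up data (37 successes out of 100, an assumption for illustration) and the same tails as &lt;code&gt;lower_cum&lt;/code&gt; and &lt;code&gt;upper_cum&lt;/code&gt;. For comparison, &lt;code&gt;binom.test()&lt;/code&gt; returns the Clopper-Pearson interval, which solves &lt;code&gt;P(X &amp;lt;= x) = alpha/2&lt;/code&gt; and &lt;code&gt;P(X &amp;gt;= x) = alpha/2&lt;/code&gt;; because the tails used here exclude &lt;code&gt;x&lt;/code&gt; itself, this interval is slightly narrower.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical data: x successes out of n Bernoulli trials
n &amp;lt;- 100
x &amp;lt;- 37
alpha &amp;lt;- 0.05

# Upper bound: P(X &amp;lt;= x - 1 | p) = alpha/2 (same tail as lower_cum above)
upper &amp;lt;- uniroot(function(pp) pbinom(x - 1, size = n, prob = pp) - alpha/2,
                 interval = c(x/n, 1), tol = 1e-10)$root
# Lower bound: P(X &amp;gt;= x + 1 | p) = alpha/2 (same tail as upper_cum above)
lower &amp;lt;- uniroot(function(pp) 1 - pbinom(x, size = n, prob = pp) - alpha/2,
                 interval = c(0, x/n), tol = 1e-10)$root

c(lower = lower, upper = upper)
# Clopper-Pearson interval for comparison (it includes x in both tails)
binom.test(x, n)$conf.int
&lt;/code&gt;&lt;/pre&gt;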

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Test confidence intervals
n &amp;lt;- 500
p_true &amp;lt;- 0.7
nexpts &amp;lt;- 1000
countOutside &amp;lt;- array(0, c(nexpts, 6))
for (expt in 1:nexpts) {
    p_hat &amp;lt;- (1/n) * sum(rbinom(n = n, size = 1, prob = p_true))
    if (p_hat == 0) {
        lowerbound &amp;lt;- 0
        lowerconverge &amp;lt;- 0
    } else {
        lower_alpha &amp;lt;- optim(p_hat, toMin_upper)
        lowerbound &amp;lt;- lower_alpha$par
        lowerconverge &amp;lt;- lower_alpha$value &amp;gt; 10^-4
    }
    if (p_hat == 1) {
        upperbound &amp;lt;- 1
        upperconverge &amp;lt;- 0
    } else {
        upper_alpha &amp;lt;- optim(p_hat, toMin_lower)
        upperbound &amp;lt;- upper_alpha$par
        upperconverge &amp;lt;- upper_alpha$value &amp;gt; 10^-4
    }
    
    countOutside[expt, ] &amp;lt;- c(p_true &amp;lt;= lowerbound, p_true &amp;gt;= upperbound, 
        lowerbound, upperbound, lowerconverge, upperconverge)
}
c(mean(countOutside[countOutside[, 5] == 0, 1]), mean(countOutside[countOutside[, 
    6] == 0, 2]))
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## [1] 0.022 0.030
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;colMeans(countOutside)
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## [1] 0.0220 0.0300 0.6597 0.7378 0.0000 0.0000
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;head(countOutside, n = 10)
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;##       [,1] [,2]   [,3]   [,4] [,5] [,6]
##  [1,]    0    0 0.6474 0.7265    0    0
##  [2,]    0    0 0.6743 0.7513    0    0
##  [3,]    0    0 0.6515 0.7303    0    0
##  [4,]    0    0 0.6619 0.7399    0    0
##  [5,]    0    0 0.6722 0.7494    0    0
##  [6,]    0    0 0.6371 0.7169    0    0
##  [7,]    0    0 0.6722 0.7494    0    0
##  [8,]    0    0 0.6392 0.7188    0    0
##  [9,]    0    0 0.6805 0.7571    0    0
## [10,]    0    0 0.6474 0.7265    0    0
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;analysis-of-the-grouse-data&#34;&gt;Analysis of the grouse data&lt;/h2&gt;

&lt;p&gt;Goal: Estimating the effect of &lt;code&gt;WINDSPEEDSQR&lt;/code&gt; on observations of &lt;code&gt;RUGR&lt;/code&gt;. There are many ways to do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a likelihood ratio test&lt;/li&gt;
&lt;li&gt;linear regression with data transformation&lt;/li&gt;
&lt;li&gt;LMM&lt;/li&gt;
&lt;li&gt;GLM&lt;/li&gt;
&lt;li&gt;GLMM&lt;/li&gt;
&lt;li&gt;a parametric bootstrap test&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Always use &lt;code&gt;quasibinomial&lt;/code&gt; or &lt;code&gt;quasipoisson&lt;/code&gt; for GLMs, so that variation beyond what the binomial or Poisson distribution allows (overdispersion) is accounted for. In a GLMM, an observation-level random effect &lt;code&gt;(1|id)&lt;/code&gt; lets the variation be larger than the distribution allows, similar to &lt;code&gt;quasibinomial&lt;/code&gt; or &lt;code&gt;quasipoisson&lt;/code&gt;; like the residuals in a linear regression, it absorbs all remaining unexplained variation.&lt;/p&gt;
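
&lt;p&gt;A minimal sketch of what &lt;code&gt;quasibinomial&lt;/code&gt; changes, using simulated overdispersed route counts (the covariate &lt;code&gt;x&lt;/code&gt;, the beta parameters, and the seed are made up; only the 117 routes with 8 stations each mirror the grouse data). The coefficient estimates are identical to those of the &lt;code&gt;binomial&lt;/code&gt; fit; only the standard errors are scaled by the square root of the estimated dispersion parameter.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Simulated overdispersed counts: route-level probabilities vary (beta),
# so the counts are beta-binomial rather than binomial. Hypothetical values.
set.seed(1)
n_routes &amp;lt;- 117
stations &amp;lt;- 8
x &amp;lt;- runif(n_routes, 0, 5)  # hypothetical covariate with no real effect
p_route &amp;lt;- rbeta(n_routes, shape1 = 2, shape2 = 6)
obs &amp;lt;- rbinom(n_routes, size = stations, prob = p_route)

fit_bin &amp;lt;- glm(cbind(obs, stations - obs) ~ x, family = binomial)
fit_qb &amp;lt;- glm(cbind(obs, stations - obs) ~ x, family = quasibinomial)

# estimated dispersion (expected to be &amp;gt; 1 for overdispersed data)
phi &amp;lt;- summary(fit_qb)$dispersion
se_bin &amp;lt;- coef(summary(fit_bin))[, &amp;quot;Std. Error&amp;quot;]
se_qb &amp;lt;- coef(summary(fit_qb))[, &amp;quot;Std. Error&amp;quot;]

cbind(binomial = se_bin, quasibinomial = se_qb, ratio = se_qb/se_bin, sqrt_phi = sqrt(phi))
&lt;/code&gt;&lt;/pre&gt;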

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;## (i) a likelihood ratio test

# Probability distribution function for a betabinomial distribution,
# modified from the library &#39;emdbook&#39;
dbetabinom &amp;lt;- function(y, prob, size, theta, shape1, shape2, log = FALSE) {
    if (missing(prob) &amp;amp;&amp;amp; !missing(shape1) &amp;amp;&amp;amp; !missing(shape2)) {
        prob &amp;lt;- shape1/(shape1 + shape2)
        theta &amp;lt;- shape1 + shape2
    }
    # operators stay at the line ends so R reads this as one expression
    v &amp;lt;- lfactorial(size) - lfactorial(y) - lfactorial(size - y) -
        lbeta(theta * (1 - prob), theta * prob) +
        lbeta(size - y + theta * (1 - prob), y + theta * prob)
    # non-integer counts get zero probability
    nonint &amp;lt;- (y%%1) != 0
    if (any(nonint)) {
        warning(&amp;quot;non-integer x detected; returning zero probability&amp;quot;)
        v[nonint] &amp;lt;- -Inf
    }
    if (log) 
        v else exp(v)
}

# Log-likelihood function for the betabinomial given data Y (vector of
# successes), Size (vector of number of trials), and independent
# variable X (WINDSPEEDSQR) in terms of parameters theta, b0, and b1
dbetabinom_LLF &amp;lt;- function(parameters, Y, Size, X) {
    theta &amp;lt;- parameters[1]
    b0 &amp;lt;- parameters[2]
    b1 &amp;lt;- parameters[3]
    
    # inverse logit function
    prob &amp;lt;- 1/(1 + exp(-b1 * (X - b0)))
    -sum(dbetabinom(y = Y, size = Size, prob = prob, theta = theta, log = TRUE))
}

LLe &amp;lt;- optim(fn = dbetabinom_LLF, par = c(theta = 0.5, b0 = 1, b1 = 0.5), 
    Y = w$OBS, Size = w$STATIONS, X = w$WINDSPEEDSQR, method = &amp;quot;BFGS&amp;quot;)

# Compute the statistical significance of H0: b1 = 0 (effect of WINDSPEEDSQR)

# Log-likelihood function for the betabinomial given data Y (vector of
# successes), Size (vector of number of trials), and independent
# variable X (WINDSPEEDSQR) with b1 = 0 in terms of parameters theta and
# b0
dbetabinom_LLF0 &amp;lt;- function(parameters, Y, Size, X) {
    theta &amp;lt;- parameters[1]
    b0 &amp;lt;- parameters[2]
    
    # inverse logit function
    prob &amp;lt;- 1/(1 + exp(b0))
    -sum(dbetabinom(y = Y, size = Size, prob = prob, theta = theta, log = TRUE))
}

LLe0 &amp;lt;- optim(fn = dbetabinom_LLF0, par = c(theta = 0.5, b0 = 1), Y = w$OBS, 
    Size = w$STATIONS, X = w$WINDSPEEDSQR, method = &amp;quot;BFGS&amp;quot;)
LLe0

c(LLe$value, LLe0$value)
pchisq(2 * (LLe0$value - LLe$value), df = 1, lower.tail = FALSE)


## (ii) LMM for the presence of RUGR at stations (ignoring the binary
## nature of the data)
library(lme4)
library(dplyr)
# Make a variable in d for the mean WINDSPEEDSQR of each route (to give a
# fair comparison between methods at the station vs. route levels)
d &amp;lt;- d %&amp;gt;% group_by(ROUTE) %&amp;gt;% mutate(meanWINDSPEEDSQR = mean(WINDSPEEDSQR)) %&amp;gt;% 
    ungroup() %&amp;gt;% as.data.frame()

lmer(RUGR ~ WINDSPEEDSQR + (1 | ROUTE), data = d)
lmer(RUGR ~ meanWINDSPEEDSQR + (1 | ROUTE), data = d)
# To get p-values, you can use Anova in library(car)
library(car)
Anova(lmer(RUGR ~ WINDSPEEDSQR + (1 | ROUTE), data = d))
Anova(lmer(RUGR ~ meanWINDSPEEDSQR + (1 | ROUTE), data = d))

## (iii) linear regression for the number of observations per route
## (arcsine square-root transformed)
w$tOBS &amp;lt;- asin(sqrt(w$OBS/w$STATIONS))
summary(lm(tOBS ~ WINDSPEEDSQR, data = w))

## (iv) GLM for the presence of RUGR at stations
summary(glm(RUGR ~ WINDSPEEDSQR, family = &amp;quot;quasibinomial&amp;quot;, data = d))
summary(glm(RUGR ~ meanWINDSPEEDSQR, family = &amp;quot;quasibinomial&amp;quot;, data = d))
# for quasi families, summary() reports two-tailed p-values from a t distribution

## (v) GLM for the number of observations per route
summary(glm(cbind(OBS, STATIONS - OBS) ~ WINDSPEEDSQR, family = &amp;quot;binomial&amp;quot;, 
    data = w))
summary(glm(cbind(OBS, STATIONS - OBS) ~ WINDSPEEDSQR, family = &amp;quot;quasibinomial&amp;quot;, 
    data = w))

## (vi) GLMM for the presence of RUGR at stations
glmer(RUGR ~ meanWINDSPEEDSQR + (1 | ROUTE), family = &amp;quot;binomial&amp;quot;, data = d)
Anova(glmer(RUGR ~ meanWINDSPEEDSQR + (1 | ROUTE), family = &amp;quot;binomial&amp;quot;, 
    data = d))

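# Observation-level random effect. Note that with binary (0/1) responses
# such as RUGR at stations, overdispersion cannot be identified, so this
# term is more useful for aggregated counts such as OBS per route.
d$id &amp;lt;- as.factor(1:dim(d)[1])
glmer(RUGR ~ meanWINDSPEEDSQR + (1 | ROUTE) + (1 | id), family = &amp;quot;binomial&amp;quot;, 
    data = d)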

## (vii) GLMM for the number of observations per route
glmer(cbind(OBS, STATIONS - OBS) ~ WINDSPEEDSQR + (1 | ROUTE), family = &amp;quot;binomial&amp;quot;, 
    data = w)

## (viii) a parametric bootstrap test assuming the distribution of
## observations per route is betabinomial
library(emdbook)

# Estimated (&#39;true&#39;) value of b1
b1_true &amp;lt;- LLe$par[3]
# Estimated values of b0 and theta under the H0: no effect of windspeed
theta_true0 &amp;lt;- LLe0$par[1]
b0_true0 &amp;lt;- LLe0$par[2]
# Bootstrap simulation under H0
nreps &amp;lt;- 2000
est_b1 &amp;lt;- array(0, c(nreps, 1))
for (rep in 1:nreps) {
    sim.w &amp;lt;- w
    for (route in levels(w$ROUTE)) {
        p &amp;lt;- 1/(1 + exp(b0_true0))
        sim.w$OBS[w$ROUTE == route] &amp;lt;- rbetabinom(n = 1, size = w$STATIONS[w$ROUTE == 
            route], prob = p, theta = theta_true0)
    }
    sim.LLe &amp;lt;- optim(fn = dbetabinom_LLF, par = c(theta = 0.5, b0 = 1, 
        b1 = 0.5), Y = sim.w$OBS, Size = sim.w$STATIONS, X = sim.w$WINDSPEEDSQR, 
        method = &amp;quot;BFGS&amp;quot;)
    est_b1[rep] &amp;lt;- sim.LLe$par[3]
}


# Histogram of bootstrap distribution of the estimator of b1
hist(est_b1)
abline(v = b1_true, col = &amp;quot;red&amp;quot;)
lines(c(b1_true, b1_true), c(0, nreps), col = &amp;quot;red&amp;quot;)

# P-values
pvalue.onetailed &amp;lt;- mean(est_b1 &amp;lt; b1_true)
pvalue.onetailed
pvalue.twotailed &amp;lt;- 2 * pvalue.onetailed
pvalue.twotailed
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;hemlock-data&#34;&gt;Hemlock data&lt;/h2&gt;

&lt;p&gt;For a grouping variable, if the data in each group span only a small range of values (i.e. the data are clustered by group), say group 1 has values from 10 to 20, group 2 from 20 to 30, etc., then it is not a good idea to analyze the data at the group level. Instead, we should combine all groups and analyze them together. On the other hand, if each group covers a wide range of values, it should be fine to analyze at the group level.&lt;/p&gt;
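
&lt;p&gt;A small simulated illustration of this point (the five groups, their ranges of &lt;code&gt;x&lt;/code&gt;, and the true slope of 0.5 are all hypothetical): when each group covers only a narrow slice of &lt;code&gt;x&lt;/code&gt;, the slope estimated within any single group is very uncertain, while pooling the groups spans the full range and estimates the slope much more precisely.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical data: five groups, each covering a narrow range of x
# (group 1: 10-20, group 2: 20-30, ...), with the same true slope of 0.5
set.seed(2)
group &amp;lt;- factor(rep(1:5, each = 20))
x &amp;lt;- runif(100, min = 10 * as.integer(group), max = 10 * as.integer(group) + 10)
y &amp;lt;- 2 + 0.5 * x + rnorm(100, sd = 3)

# slope estimate and its standard error within each group
by_group &amp;lt;- t(sapply(split(data.frame(x, y), group), function(g) {
    coef(summary(lm(y ~ x, data = g)))[&amp;quot;x&amp;quot;, 1:2]
}))

# slope estimate and its standard error with all groups pooled
pooled &amp;lt;- coef(summary(lm(y ~ x)))[&amp;quot;x&amp;quot;, 1:2]

by_group
pooled
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The within-group standard errors come out much larger than the pooled one, even though every group shares the same true slope.&lt;/p&gt;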

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2014/12/12/notes-for-theoretical-ecology-class-1/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>A simple R function to compress pictures</title>
      <link>https://blog.dlilab.com/en/2014/11/30/compress-pic-by-r/</link>
      <pubDate>Sun, 30 Nov 2014 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2014/11/30/compress-pic-by-r/</guid>
      <description>
        &lt;blockquote&gt;
&lt;p&gt;I do not know much about picture compression. There must be better ways or packages to do this; this small project is just for fun.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;First, here is a function to blur a picture. It replaces each pixel with the mean of the (2k+1) by (2k+1) window centered on it. The matrix is padded with zeros, so pixels near the edges become darker.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(png)
library(parallel)

filter.img = function(mat, k = 1) {
    pad.mat &amp;lt;- matrix(0, dim(mat)[1] + 2 * k, dim(mat)[2] + 2 * k)
    pad.mat[(k + 1):(dim(mat)[1] + k), (k + 1):(dim(mat)[2] + k)] = mat
    pad.mat2 = matrix(0, dim(pad.mat)[1], dim(pad.mat)[2])
    for (i in (k + 1):(dim(mat)[1] + k)) {
        for (j in (k + 1):(dim(mat)[2] + k)) {
            pad.mat2[i, j] = mean(pad.mat[(i - k):(i + k), (j - k):(j + k)])
        }
    }
    pad.mat2[(k + 1):(dim(pad.mat2)[1] - k), (k + 1):(dim(pad.mat2)[2] - k)]
}
&lt;/code&gt;&lt;/pre&gt;
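
&lt;p&gt;The double loop in &lt;code&gt;filter.img()&lt;/code&gt; gets slow for large pictures. A possible vectorized alternative (my own sketch, not a package function; the name &lt;code&gt;box_blur&lt;/code&gt; is hypothetical) uses a summed-area table: after zero padding exactly as above, cumulative sums over rows and columns give every window sum from four table entries at once, so the result should match &lt;code&gt;filter.img()&lt;/code&gt; for the same &lt;code&gt;k&lt;/code&gt; up to floating-point rounding.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;box_blur &amp;lt;- function(mat, k = 1) {
    nr &amp;lt;- nrow(mat)
    nc &amp;lt;- ncol(mat)
    w &amp;lt;- 2 * k + 1  # window width
    # zero padding, as in filter.img()
    pad &amp;lt;- matrix(0, nr + 2 * k, nc + 2 * k)
    pad[(k + 1):(nr + k), (k + 1):(nc + k)] &amp;lt;- mat
    # summed-area table: S[a, b] = sum(pad[1:(a - 1), 1:(b - 1)])
    S &amp;lt;- matrix(0, nrow(pad) + 1, ncol(pad) + 1)
    S[-1, -1] &amp;lt;- t(apply(apply(pad, 2, cumsum), 1, cumsum))
    lo_r &amp;lt;- 1:nr
    hi_r &amp;lt;- lo_r + w
    lo_c &amp;lt;- 1:nc
    hi_c &amp;lt;- lo_c + w
    # window sums for every pixel at once, then divide by the window size
    sums &amp;lt;- S[hi_r, hi_c] - S[lo_r, hi_c] - S[hi_r, lo_c] + S[lo_r, lo_c]
    sums/w^2
}
&lt;/code&gt;&lt;/pre&gt;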

&lt;p&gt;Next, let&amp;rsquo;s read the picture below and separate its red, green, and blue arrays.
&lt;img src=&#34;https://i.imgur.com/HclZzde.png&#34; alt=&#34;v&#34; /&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# read picture and get red, green, blue arrays
str(vg &amp;lt;- readPNG(&amp;quot;Van_Gogh_Wheatfield_with_Crows.png&amp;quot;))
red.vg &amp;lt;- vg[, , 1]
green.vg &amp;lt;- vg[, , 2]
blue.vg &amp;lt;- vg[, , 3]
filter.vg = list(red.vg, green.vg, blue.vg)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then here is the function that will do the compression on each array and combine together.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# blur the red, green, and blue arrays, then combine them into one image
final.png = function(lst = filter.vg, k = 1) {
    # one core per channel; mclapply() only supports mc.cores = 1 on Windows
    out.filter.vg = mclapply(lst, function(x) filter.img(x, k = k), mc.cores = 3)
    # stack the three blurred matrices into a rows x columns x 3 array
    out.array = array(unlist(out.filter.vg), dim = c(nrow(lst[[1]]), ncol(lst[[1]]), 3))
    writePNG(out.array, target = paste0(&amp;quot;dli55_&amp;quot;, k, &amp;quot;.png&amp;quot;))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;OK, let&amp;rsquo;s try different extents of compression (&lt;code&gt;k&lt;/code&gt; = 1, 3, and 5).&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;final.png(k = 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/poS9Cza.png&#34; alt=&#34;k1&#34; /&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;final.png(k = 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/nvO5vwx.png&#34; alt=&#34;k3&#34; /&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;final.png(k = 5)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/3Leq1uy.png&#34; alt=&#34;k5&#34; /&gt;&lt;/p&gt;
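
&lt;p&gt;Strictly speaking, the blur keeps the picture at its original size, so it does not reduce the number of pixels, and the PNG file is not guaranteed to get smaller. A minimal sketch of a variant that does shrink the picture (my own assumption about one way to do it, not part of the project above) averages each non-overlapping &lt;code&gt;b × b&lt;/code&gt; block into a single pixel. The function name &lt;code&gt;block.mean()&lt;/code&gt; is hypothetical, and for simplicity it drops any rows and columns that do not fill a whole block.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical block-mean downsampling: each non-overlapping b x b block
# becomes one pixel, so the output has floor(nrow/b) x floor(ncol/b) cells.
block.mean = function(mat, b = 2) {
    nr = nrow(mat) %/% b
    nc = ncol(mat) %/% b
    out = matrix(0, nr, nc)
    for (i in seq_len(nr)) {
        for (j in seq_len(nc)) {
            rows = ((i - 1) * b + 1):(i * b)
            cols = ((j - 1) * b + 1):(j * b)
            out[i, j] = mean(mat[rows, cols])
        }
    }
    out
}

# toy check: a 4 x 4 matrix shrinks to 2 x 2
block.mean(matrix(1:16, 4, 4), b = 2)
# first column 3.5, 5.5; second column 11.5, 13.5
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To use it on the picture, one could apply it to each of &lt;code&gt;red.vg&lt;/code&gt;, &lt;code&gt;green.vg&lt;/code&gt;, and &lt;code&gt;blue.vg&lt;/code&gt; and stack the results with &lt;code&gt;array()&lt;/code&gt; as in &lt;code&gt;final.png()&lt;/code&gt;, using the smaller dimensions of the block-averaged matrices.&lt;/p&gt;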

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2014/11/30/compress-pic-by-r/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>Maximum likelihood estimation of normal distribution</title>
      <link>https://blog.dlilab.com/en/2014/10/08/mle-normal-distribution/</link>
      <pubDate>Wed, 08 Oct 2014 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2014/10/08/mle-normal-distribution/</guid>
      <description>
        &lt;p&gt;The probability density function of the normal distribution is:
&lt;code&gt;\[
f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
\]&lt;/code&gt;&lt;/p&gt;

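&lt;p&gt;Suppose we have the following &lt;em&gt;n i.i.d.&lt;/em&gt; observations: &lt;code&gt;\(x_{1},x_{2},\dots,x_{n}\)&lt;/code&gt;.
Because they are independent, the probability density of observing
these data (i.e. the likelihood of &lt;code&gt;\(\mu\)&lt;/code&gt; and &lt;code&gt;\(\sigma\)&lt;/code&gt;) is the product of the individual densities: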
&lt;code&gt;\[
f(x_{1},x_{2},\dots,x_{n}|\sigma,\mu)=\prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}}=(\frac{1}{\sigma\sqrt{2\pi}})^{n}e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}}
\]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;\[\begin{array}{cl}
\log(f(x_{1},x_{2},\dots,x_{n}|\sigma,\mu)) &amp;amp; =\log((\frac{1}{\sigma\sqrt{2\pi}})^{n}e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}})\\
 &amp;amp; =n\log\frac{1}{\sigma\sqrt{2\pi}}-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\\
 &amp;amp; =-\frac{n}{2}\log(2\pi)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}
\end{array}\]&lt;/code&gt;&lt;/p&gt;
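
&lt;p&gt;As a quick numerical check of the last line, the closed-form log-likelihood can be compared with R&amp;rsquo;s built-in normal log-density, &lt;code&gt;dnorm(..., log = TRUE)&lt;/code&gt;, summed over the observations. The simulated data and the helper name &lt;code&gt;loglik.norm()&lt;/code&gt; are assumptions for illustration only.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;set.seed(1)
x = rnorm(50, mean = 2, sd = 3)  # simulated data, for illustration only

# log-likelihood from the last line of the derivation (hypothetical helper)
loglik.norm = function(mu, sigma, x) {
    n = length(x)
    -n / 2 * log(2 * pi) - n * log(sigma) - sum((x - mu)^2) / (2 * sigma^2)
}

# should agree with the sum of R's log-densities at any (mu, sigma)
all.equal(loglik.norm(1, 2, x), sum(dnorm(x, mean = 1, sd = 2, log = TRUE)))
# TRUE
&lt;/code&gt;&lt;/pre&gt;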

&lt;p&gt;Let&amp;rsquo;s call &lt;code&gt;\(\log(f(x_{1},x_{2},\dots,x_{n}|\sigma,\mu))\)&lt;/code&gt; &lt;code&gt;\(\mathcal{L}\)&lt;/code&gt;.
Only the last term depends on &lt;code&gt;\(\mu\)&lt;/code&gt;, so setting the derivative with respect to &lt;code&gt;\(\mu\)&lt;/code&gt; to zero gives
&lt;code&gt;\[
\frac{d\mathcal{L}}{d\mu}=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)=0
\]&lt;/code&gt;
Evaluating at the estimate &lt;code&gt;\(\hat{\mu}\)&lt;/code&gt;, we get
&lt;code&gt;\[
\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\hat{\mu})=0
\]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Because &lt;code&gt;\(\sigma^{2}\)&lt;/code&gt; must be larger than zero, the sum itself must be zero, i.e. &lt;code&gt;\(\sum_{i=1}^{n}x_{i}-n\hat{\mu}=0\)&lt;/code&gt;, so
&lt;code&gt;\[
\hat{\mu}=\frac{\sum_{i=1}^{n}x_{i}}{n}
\]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Similarly, let
&lt;code&gt;\[
\frac{d\mathcal{L}}{d\sigma}=-\frac{n}{\sigma}+\sum_{i=1}^{n}(x_{i}-\mu)^{2}\sigma^{-3}=0
\]&lt;/code&gt;&lt;/p&gt;

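&lt;p&gt;Multiplying both sides by &lt;code&gt;\(\sigma^{3}/n\)&lt;/code&gt; and replacing &lt;code&gt;\(\mu\)&lt;/code&gt; with &lt;code&gt;\(\hat{\mu}\)&lt;/code&gt; gives an expression for &lt;code&gt;\(\sigma^{2}\)&lt;/code&gt;, so it is more convenient to write the maximum likelihood estimator of &lt;code&gt;\(\sigma^{2}\)&lt;/code&gt; instead of &lt;code&gt;\(\sigma\)&lt;/code&gt; (by the invariance property of maximum likelihood estimators, &lt;code&gt;\(\hat{\sigma}^{2}\)&lt;/code&gt; is simply the square of &lt;code&gt;\(\hat{\sigma}\)&lt;/code&gt;). Thus&lt;/p&gt;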

&lt;p&gt;&lt;code&gt;\[
\hat{\sigma}^{2}=\frac{\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}}{n}
\]&lt;/code&gt;&lt;/p&gt;
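
&lt;p&gt;The closed-form estimators can be checked by maximizing the log-likelihood numerically. A minimal sketch using &lt;code&gt;optim()&lt;/code&gt; on simulated data; the data, the starting values, and the &lt;code&gt;log(sigma)&lt;/code&gt; parameterization (used only to keep &lt;code&gt;\(\sigma\)&lt;/code&gt; positive) are my own choices.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;set.seed(42)
x = rnorm(100, mean = 5, sd = 2)  # simulated data, for illustration only

# negative log-likelihood with par = c(mu, log(sigma)), so sigma stays positive
negll = function(par) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
fit = optim(c(0, 0), negll)  # default Nelder-Mead minimizer

# numerical MLEs next to the closed-form ones (divisor n, not n - 1)
c(mu.num = fit$par[1], mu.hat = mean(x))
c(sigma2.num = exp(2 * fit$par[2]), sigma2.hat = mean((x - mean(x))^2))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The two numbers in each pair should agree to a few decimal places, whereas &lt;code&gt;var(x)&lt;/code&gt; would not match the second pair, because it divides by &lt;code&gt;n - 1&lt;/code&gt;.&lt;/p&gt;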

&lt;p&gt;But this MLE of &lt;code&gt;\(\sigma^{2}\)&lt;/code&gt; is biased. A point estimator &lt;code&gt;\(\hat{\theta}\)&lt;/code&gt; is said to be an unbiased estimator
of &lt;code&gt;\(\theta\)&lt;/code&gt; if &lt;code&gt;\(E(\hat{\theta})=\theta\)&lt;/code&gt; for every possible value
of &lt;code&gt;\(\theta\)&lt;/code&gt;. If &lt;code&gt;\(\hat{\theta}\)&lt;/code&gt; is not unbiased, the difference &lt;code&gt;\(E(\hat{\theta})-\theta\)&lt;/code&gt; is
called the bias of &lt;code&gt;\(\hat{\theta}\)&lt;/code&gt;.&lt;/p&gt;

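&lt;p&gt;We know that
&lt;code&gt;\[
\sigma^{2}=Var(X)=E(X^{2})-(E(X))^{2}\Rightarrow E(X^{2})=Var(X)+(E(X))^{2}
\]&lt;/code&gt;
Applied to each &lt;code&gt;\(x_{i}\)&lt;/code&gt;, this gives &lt;code&gt;\(E(x_{i}^{2})=\sigma^{2}+\mu^{2}\)&lt;/code&gt;; applied to &lt;code&gt;\(\sum x_{i}\)&lt;/code&gt;, whose mean is &lt;code&gt;\(n\mu\)&lt;/code&gt; and whose variance is &lt;code&gt;\(n\sigma^{2}\)&lt;/code&gt; (by independence), it gives &lt;code&gt;\(E\left[(\sum x_{i})^{2}\right]=n\sigma^{2}+(n\mu)^{2}\)&lt;/code&gt;.&lt;/p&gt;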

&lt;p&gt;Then
&lt;code&gt;\[
\begin{array}{cl}
E(\hat{\sigma}^{2}) &amp;amp; =\frac{1}{n}E(\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2})\\
 &amp;amp; =\frac{1}{n}E(\sum x_{i}^{2}-n\hat{\mu}^{2})\\
 &amp;amp; =\frac{1}{n}E(\sum x_{i}^{2}-\frac{(\sum x_{i})^{2}}{n})\\
 &amp;amp; =\frac{1}{n}\left\{ \sum E(x_{i}^{2})-\frac{1}{n}E\left[(\sum x_{i})^{2}\right]\right\} \\
 &amp;amp; =\frac{1}{n}\left\{ \sum(\sigma^{2}+\mu^{2})-\frac{1}{n}\left[n\sigma^{2}+(n\mu)^{2}\right]\right\} \\
 &amp;amp; =\frac{1}{n}\left\{ n\sigma^{2}+n\mu^{2}-\sigma^{2}-n\mu^{2}\right\} \\
 &amp;amp; =\frac{n-1}{n}\sigma^{2}\\
 &amp;amp; \neq\sigma^{2}
\end{array}
\]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The bias is &lt;code&gt;\(E(\hat{\sigma}^{2})-\sigma^{2}=-\frac{\sigma^{2}}{n}\)&lt;/code&gt;. An unbiased estimator of
&lt;code&gt;\(\sigma^{2}\)&lt;/code&gt; is the sample variance &lt;code&gt;\(s^{2}=\frac{\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}}{n-1}\)&lt;/code&gt;.
But the fact that &lt;code&gt;\(s^{2}\)&lt;/code&gt; is unbiased does not imply that &lt;code&gt;\(s\)&lt;/code&gt; is
unbiased for estimating &lt;code&gt;\(\sigma\)&lt;/code&gt;: the expected value of the square
root is not the square root of the expected value (in fact &lt;code&gt;\(E(s)\)&lt;/code&gt; is slightly smaller than &lt;code&gt;\(\sigma\)&lt;/code&gt;). Fortunately, the
bias of &lt;code&gt;\(s\)&lt;/code&gt; is small unless the sample size is very small, so
there are good reasons to use &lt;code&gt;\(s\)&lt;/code&gt; as an estimator of &lt;code&gt;\(\sigma\)&lt;/code&gt;.&lt;/p&gt;
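
&lt;p&gt;A small simulation makes these biases visible. The sketch below averages &lt;code&gt;\(\hat{\sigma}^{2}\)&lt;/code&gt; and &lt;code&gt;\(s^{2}\)&lt;/code&gt; over many simulated samples; the sample size, true &lt;code&gt;\(\sigma\)&lt;/code&gt;, number of replicates, and seed are arbitrary choices for illustration. The first average should land near &lt;code&gt;\(\frac{n-1}{n}\sigma^{2}\)&lt;/code&gt;, the second near &lt;code&gt;\(\sigma^{2}\)&lt;/code&gt;, and the average of &lt;code&gt;\(s\)&lt;/code&gt; a little below &lt;code&gt;\(\sigma\)&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;set.seed(123)
n = 5           # small sample size, so the bias is easy to see
sigma = 2       # true standard deviation
reps = 20000    # number of simulated samples

# each column holds the two variance estimates for one simulated sample
sims = replicate(reps, {
    x = rnorm(n, mean = 0, sd = sigma)
    ss = sum((x - mean(x))^2)
    c(mle = ss / n, s2 = ss / (n - 1))
})

rowMeans(sims)          # mle: roughly (n - 1) / n * sigma^2 = 3.2; s2: roughly 4
mean(sqrt(sims[2, ]))   # average of s (row 2): a little below sigma = 2
&lt;/code&gt;&lt;/pre&gt;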

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2014/10/08/mle-normal-distribution/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
    <item>
      <title>ESA 2014 -- day 2</title>
      <link>https://blog.dlilab.com/en/2014/08/12/esa-2014-day-2/</link>
      <pubDate>Tue, 12 Aug 2014 00:00:00 +0000</pubDate>
      
      <guid>https://blog.dlilab.com/en/2014/08/12/esa-2014-day-2/</guid>
      <description>
        &lt;p&gt;ESA is in full swing now on its second day.&lt;/p&gt;

&lt;p&gt;I went to two Ignite sessions about tools and tips for working with ecological data, though I did not stay for the whole of either session. &lt;a href=&#34;https://www.dataone.org/organization/executive-team/amber-budden&#34; target=&#34;_blank&#34;&gt;Amber Budden&lt;/a&gt; introduced and walked through &lt;a href=&#34;https://www.dataone.org/&#34; target=&#34;_blank&#34;&gt;DataOne&lt;/a&gt;, a public data repository. &lt;a href=&#34;http://carlystrasser.net/&#34; target=&#34;_blank&#34;&gt;Carly Strasser&lt;/a&gt; then gave us four tips for producing high-quality data. You can download her slides &lt;a href=&#34;http://www.slideshare.net/carlystrasser&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;. There were also some other cool talks that I missed, especially the one by &lt;a href=&#34;https://twitter.com/_inundata&#34; target=&#34;_blank&#34;&gt;Karthik Ram&lt;/a&gt; about the rOpenSci project.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://www.cts.cuni.cz/~storch/&#34; target=&#34;_blank&#34;&gt;David Storch&lt;/a&gt; gave a fantastic talk about the relationship between species richness and the number of individuals. Under the energy hypothesis, more available energy leads to higher NPP, thus more individual stems, and in the end more species. However, David &lt;em&gt;et al.&lt;/em&gt; found that variation in the number of individuals does not explain variation in species richness, though the number of individuals may contribute to regulating species richness. This probably depends on spatial scale and clade.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://www.uib.no/persons/Vigdis.Vandvik&#34; target=&#34;_blank&#34;&gt;Vigdis Vandvik&lt;/a&gt;&amp;rsquo;s lab did some really cool plant transplant experiments. In alpine grasslands of Norway, they transplanted turfs to warmer and wetter locations, and also transplanted turfs within the same site to serve as controls. Then they looked at species composition and community abundance-weighted trait values for the transplanted turfs, the controls, and their neighboring quadrats. They found that precipitation does not matter much in their system, even though it varied from 600 to 3000 mm; instead, temperature explained most of the variation. Interestingly, the changes in species composition and in two traits (leaf area and seed mass) did not differ from what would be expected by chance, whereas SLA and maximum height did differ significantly from random expectation. The larger the initial dissimilarity between a transplanted turf and the site it was moved to, the larger the change in the functional traits of plants in that turf. So, I guess that filtering effects are really strong here!&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://www.cnr.usu.edu/htm/facstaff/memberID=749&#34; target=&#34;_blank&#34;&gt;Peter Adler&lt;/a&gt; analyzed the coexistence of five perennial plants using Chesson&amp;rsquo;s (2000) framework and long-term demographic data. They found that stabilizing niche differences among these five species were large but fitness differences were small. They also found that the recruitment niche is the most important factor in their model, and that intraspecific competition in these systems is stronger than interspecific competition.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;http://web.stanford.edu/~fukamit/index.html&#34; target=&#34;_blank&#34;&gt;Tadashi Fukami&lt;/a&gt; found priority effects between yeast and bacteria in flower nectar. Pollinators dislike bacteria but like yeast in nectar. Yeast have negative effects on bacteria, which in turn have negative effects on plant-pollinator relationships (i.e. more bacteria, fewer pollinator visits). Thus, yeast have indirect positive effects on plant-pollinator relationships! To understand how nectar microbes affect plant-pollinator relationships, we need to study microbial community assembly! Neat story. I had not thought much about community assembly at this scale, but it is much easier to conduct manipulation experiments at this scale!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It seems that you sorted out all things and you do not need my help.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sentence really made my day, thanks, &lt;a href=&#34;http://www.uvm.edu/~ngotelli/homepage.html&#34; target=&#34;_blank&#34;&gt;Nick&lt;/a&gt;.&lt;/p&gt;

        
        &lt;script&gt;window.location.href=&#39;https://blog.dlilab.com/en/2014/08/12/esa-2014-day-2/&#39;;&lt;/script&gt;
        
      </description>
    </item>
    
  </channel>
</rss>
