Yunpeng's BlogLife, coding and everything2019-07-13T09:42:23.413Zhttps://yunpengn.github.io/blog/Niu YunpengHexoRe-understand JavaScripthttps://yunpengn.github.io/blog/2019/06/18/re-understand-javascript/2019-06-18T05:56:39.000Z2019-07-13T09:42:23.413Z
<p>In this post, we will be looking at a few interesting (but potentially challenging) JavaScript questions.</p><p>Most of them actually test the so-called “down-side” of JavaScript. You certainly should not write such code in a real-world codebase. As I have repeated many times, code should be clear, precise and concise, in that order of importance. Nevertheless, they are indeed good questions to test your competency in JavaScript.</p><h2 id="9-or-10"><a href="#9-or-10" class="headerlink" title="9 or 10?"></a>9 or 10?</h2><p>You are given a function called <code>magic_length</code>, which is defined as follows:</p><figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">magic_length</span>(<span class="params">input</span>) </span>{</span><br><span class="line"> <span class="keyword">return</span> input.length == <span class="number">10</span> && input == <span class="string">",,,,,,,,,"</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Please give one possible value of <code>input</code> such that <code>magic_length(input)</code> will return <code>true</code>. <em>Notice: <code>input</code> should be of a basic data type provided by the built-in libraries.</em></p>
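<p>The sketch below shows one candidate answer. It is not necessarily the answer the author has in mind, and it assumes the string literal in the comparison contains exactly nine commas. The idea is that loose equality converts an array to a string by joining its elements with commas, and empty slots contribute empty strings, so ten slots produce nine separators.</p>

```javascript
// The function from the question, reproduced so the sketch is self-contained.
function magic_length(input) {
  return input.length == 10 && input == ",,,,,,,,,";
}

// Candidate answer (an assumption): a sparse array with 10 empty slots.
const candidate = new Array(10);

// String conversion joins the 10 empty slots with commas,
// which yields a string of 9 commas and nothing else.
const asString = String(candidate);

console.log(candidate.length);        // length of the array
console.log(asString.length);         // length of its string form
console.log(magic_length(candidate)); // result of the check from the question
```

<p>The array reports a length of 10, while its string form has only 9 characters, which is where the "9 or 10?" in the heading plausibly comes from.</p>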
Introducing your Guide to be AWS Certifiedhttps://yunpengn.github.io/blog/2019/06/15/aws-certificate-guide/2019-06-15T12:34:12.000Z2019-07-13T09:42:23.412Z
<p>Hi guys, it has been a long time since my last post. In the past few months, I have been preparing for the <a href="https://aws.amazon.com/certification/certified-solutions-architect-associate/" target="_blank" rel="noopener">AWS Certified Solutions Architect – Associate</a> examination, which is part of the series of <a href="https://aws.amazon.com/certification/" target="_blank" rel="noopener">AWS Certificate Examinations</a>.</p><p>This examination (and its siblings) focuses on some cloud computing concepts, as well as a lot of details specific to the services provided by AWS. To prepare for this examination, there are a few important learning resources:</p><ul><li><a href="https://d1.awsstatic.com/training-and-certification/docs-sa-assoc/AWS_Certified_Solutions_Architect_Associate_Feb_2018_%20Exam_Guide_v1.5.2.pdf" target="_blank" rel="noopener">Official Examination Guide</a> and <a href="https://d1.awsstatic.com/training-and-certification/docs/AWS_Certified_Solutions_Architect_Associate_Sample_Questions.pdf" target="_blank" rel="noopener">sample questions</a>;</li><li>AWS whitepapers, which are collected at <a href="https://aws.amazon.com/whitepapers/" target="_blank" rel="noopener">https://aws.amazon.com/whitepapers/</a>;</li><li><a href="https://www.aws.training/LearningLibrary" target="_blank" rel="noopener">AWS Training Learn Library</a>, which currently offers a free subscription;</li><li>Some online courses, of which <a href="https://acloud.guru/" target="_blank" rel="noopener">A Cloud Guru</a> is one of the most popular providers.</li></ul>
Consistency between Redis Cache and SQL Databasehttps://yunpengn.github.io/blog/2019/05/04/consistent-redis-sql/2019-05-04T09:02:18.000Z2019-07-13T09:42:23.412Z
<p>Nowadays, Redis has become one of the most popular cache solutions in the Internet industry. Although relational database systems (SQL) bring many awesome properties such as ACID, maintaining these properties causes database performance to degrade under high load.</p><p>In order to fix this problem, many companies & websites have decided to add a cache layer between the application layer (i.e., the backend code which handles the business logic) and the storage layer (i.e., the SQL database). This cache layer is usually implemented using an in-memory cache. This is because, as stated in many textbooks, the performance bottleneck of traditional SQL databases is usually I/O to secondary storage (i.e., the hard disk). As the price of main memory (RAM) has gone down in the past decade, it is now feasible to store (at least part of) the data in main memory to improve performance. One popular choice is Redis.</p><img src="/blog/images/ram_cost.png" width="370" title="The cost of RAM in the past decades">
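<p>To make the idea of a cache layer concrete, here is a minimal sketch of the read path (often called cache-aside). It is only an illustration under stated assumptions: plain <code>Map</code> objects stand in for both Redis and the SQL database, and the names <code>getWithCache</code> and <code>readFromDatabase</code> are hypothetical. It does not address how writes keep the two stores consistent.</p>

```javascript
// Hypothetical stand-ins: one Map plays the role of Redis, another plays the SQL table.
const cache = new Map();
const database = new Map([["user:1", { id: 1, name: "Alice" }]]);
let databaseReads = 0;

// Simulated read from the storage layer; counts how often the "database" is hit.
function readFromDatabase(key) {
  databaseReads += 1;
  return database.has(key) ? database.get(key) : null;
}

// Read path: try the cache first, fall back to the database on a miss,
// then populate the cache so that the next read is served from memory.
function getWithCache(key) {
  if (cache.has(key)) {
    return cache.get(key);
  }
  const value = readFromDatabase(key);
  if (value !== null) {
    cache.set(key, value);
  }
  return value;
}

const first = getWithCache("user:1");  // cache miss: goes to the database
const second = getWithCache("user:1"); // cache hit: no database access
console.log(first.name, second.name, databaseReads);
```

<p>After the first read, repeated reads of the same key no longer touch the storage layer, which is the effect the cache layer is meant to achieve.</p>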
How Query Optimizer Works in RDBMShttps://yunpengn.github.io/blog/2019/02/07/how-query-optimizer-works/2019-02-06T16:09:39.000Z2019-07-13T09:42:23.413Z
<p>In a previous <a href="/blog/2019/01/05/relational-operators/" title="post">post</a>, we discussed how the various relational operators are implemented in relational database systems. If you have read that post, you probably still remember that there are a few alternative implementations for every operator. Thus, how should an RDBMS determine which algorithm (or implementation) to use?</p><p>Obviously, to optimize the performance of any query, the RDBMS has to select the correct algorithm based on the query. It would not be desirable to always use the same algorithm. Also, SQL is a declarative language <em>(i.e., as programmers we only declare what we want, not how the system should accomplish the task)</em>. Therefore, it would be an anti-pattern if the user of the database system needed to specify which algorithm to use when writing the query. Instead, the user should be able to treat the entire system as a black box. The end-user should not care about which algorithm is picked, but should be able to expect that the query is executed efficiently.</p>
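<p>As a rough illustration of what "selecting an algorithm based on the query" can look like, the sketch below picks the cheaper of two access paths for a selection. The cost formulas are deliberately simplified assumptions (costs counted in page reads, one page fetch per matching row for an unclustered index), and the function names are hypothetical; real optimizers use far richer statistics.</p>

```javascript
// Toy cost model (assumed for illustration): cost is counted in page reads.
function estimatePlans({ totalPages, matchingRows, indexHeight }) {
  return [
    // A full scan reads every page once, regardless of the predicate.
    { name: "full table scan", cost: totalPages },
    // An unclustered index lookup walks the index, then fetches one page per matching row.
    { name: "index scan", cost: indexHeight + matchingRows },
  ];
}

// The "optimizer": pick the plan with the lowest estimated cost.
function choosePlan(stats) {
  return estimatePlans(stats).reduce((best, plan) => (plan.cost < best.cost ? plan : best));
}

// A highly selective predicate: only a handful of rows match.
const selective = choosePlan({ totalPages: 1000, matchingRows: 5, indexHeight: 3 });
// A broad predicate: many rows match, so per-row page fetches dominate.
const broad = choosePlan({ totalPages: 1000, matchingRows: 50000, indexHeight: 3 });

console.log(selective.name, selective.cost);
console.log(broad.name, broad.cost);
```

<p>The user writes the same declarative query in both cases; only the statistics differ, and with them the chosen plan.</p>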
Understanding How is Data Stored in RDBMShttps://yunpengn.github.io/blog/2019/01/20/understanding-database-storage/2019-01-20T13:16:19.000Z2019-07-13T09:42:23.414Z
<p>We all know that a DBMS <em>(database management system)</em> is used to store (a massive amount of) data. However, have you ever wondered how data is stored in a DBMS? In this post, we will focus on data storage in RDBMS, i.e., traditional relational database systems.</p><h2 id="Physical-Storage"><a href="#Physical-Storage" class="headerlink" title="Physical Storage"></a>Physical Storage</h2><p>Data can be stored in many different kinds of media or devices, from the fastest but costly registers to the slow but cheap hard drives, or even magnetic tapes. Nowadays, <a href="https://en.wikipedia.org/wiki/Infrastructure_as_a_service" target="_blank" rel="noopener">IaaS</a> providers such as <a href="https://aws.amazon.com" target="_blank" rel="noopener">AWS</a> even provide services such as <a href="https://aws.amazon.com/glacier/" target="_blank" rel="noopener">S3 Glacier</a> as a low-cost archiving storage solution. The diagram below shows the memory hierarchy of common devices.</p><img src="/blog/images/memory_hierarchy.jpg" width="500" title="Memory hierarchy">
Evaluation & Implementation of Relational Operatorshttps://yunpengn.github.io/blog/2019/01/05/relational-operators/2019-01-05T14:10:55.000Z2019-07-13T09:42:23.414Z
<p>This post talks about some basic implementations of relational operators in traditional RDBMS (relational database management systems). It is based on Chapter 14 of the <a href="http://pages.cs.wisc.edu/~dbbook/" target="_blank" rel="noopener">textbook</a> by <a href="http://pages.cs.wisc.edu/~raghu" target="_blank" rel="noopener">Raghu Ramakrishnan</a> and <a href="http://www.cs.cornell.edu/johannes" target="_blank" rel="noopener">Johannes Gehrke</a>.</p><p>Below we will talk about the classical evaluation & implementation of relational operators one-by-one, namely:</p><ul><li><a href="#Selection">Selection</a></li><li><a href="#Projection">Projection</a></li><li><a href="#Join">Join</a>, cross product</li><li><a href="#Set-Operations">Set operations</a> (intersection, union, difference)</li><li><a href="#Aggregation">Grouping & aggregation</a></li></ul>
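<p>For readers who want a feel for the first two operators before diving in, here is a minimal in-memory sketch. It assumes a relation is just an array of row objects with made-up sample data, so it ignores pages, buffers and indexes, which are what the textbook algorithms are really about. The function names are hypothetical.</p>

```javascript
// A tiny in-memory "relation": an array of rows (hypothetical sample data).
const employees = [
  { id: 1, name: "Ann", dept: "DB", salary: 5000 },
  { id: 2, name: "Bob", dept: "OS", salary: 4000 },
  { id: 3, name: "Cat", dept: "DB", salary: 6000 },
];

// Selection: keep only the rows satisfying the predicate (a full scan of the relation).
function select(relation, predicate) {
  return relation.filter(predicate);
}

// Projection: keep only the listed columns and eliminate duplicate rows.
function project(relation, columns) {
  const seen = new Set();
  const result = [];
  for (const row of relation) {
    const projected = {};
    for (const column of columns) {
      projected[column] = row[column];
    }
    // Serialize the projected row to detect duplicates (hash-based elimination).
    const key = JSON.stringify(projected);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(projected);
    }
  }
  return result;
}

const dbRows = select(employees, (row) => row.dept === "DB");
const depts = project(employees, ["dept"]);
console.log(dbRows.map((row) => row.name), depts);
```

<p>Note that projection with duplicate elimination needs extra bookkeeping (here a <code>Set</code>), which hints at why it is more expensive than it first appears.</p>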
Literature Review on Join Reorderabilityhttps://yunpengn.github.io/blog/2018/12/22/literature-review-join-reorder/2018-12-22T08:24:11.000Z2019-07-13T09:42:23.413Z
<p>Recently, I was looking at some research papers on join reorderability. To start with, let’s understand what we mean by <em>“join reorderability”</em> and why it is important.</p><h2 id="Background-Knowledge"><a href="#Background-Knowledge" class="headerlink" title="Background Knowledge"></a>Background Knowledge</h2><p>Here, we are looking at a query optimization problem, specifically join optimization. As mentioned by <a href="http://www.benjaminnevarez.com/2010/06/optimizing-join-orders/" target="_blank" rel="noopener">Benjamin Nevarez</a>, there are two factors in join optimization: <strong>selection of a join order</strong> and <strong>choice of a join algorithm</strong>.</p><p>As stated in Tan Kian Lee’s <a href="https://www.comp.nus.edu.sg/~tankl/cs3223/slides/opr.pdf" target="_blank" rel="noopener">lecture notes</a>, common join algorithms include iteration-based nested loop join <em>(tuple-based, page-based, block-based)</em>, sort-based merge join and partition-based hash join. We should consider a few factors when deciding which algorithm to use: 1) the type of the join predicate (equality predicate vs. non-equality predicate); 2) the sizes of the left vs. right join operands; 3) the available buffer space & access methods.</p><p>For a query attempting to join <code>n</code> tables together, we need <code>n - 1</code> individual joins. Apart from the join algorithm applied to each join, we have to decide in which order these <code>n</code> tables should be joined. We could represent such join queries on multiple tables as a tree. The tree could have different shapes, such as left-deep tree, right-deep tree and bushy tree. The 3 types of trees are compared below on an example of joining 4 tables together.</p><img src="/blog/images/join_order_tree.jpg" width="450" title="3 types of join trees">
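<p>The tree shapes can also be sketched in code. The snippet below is an illustration only: it represents a table as a string and a join as an object with <code>left</code> and <code>right</code> children (both representations and all function names are assumptions made for this sketch), builds a left-deep and a bushy tree over 4 tables, and confirms that each uses <code>n - 1</code> joins.</p>

```javascript
// Left-deep tree: every join's right operand is a base table.
function leftDeepTree(tables) {
  return tables.slice(1).reduce((left, right) => ({ left, right }), tables[0]);
}

// Bushy tree: both operands of a join may themselves be join results.
function bushyTree(tables) {
  if (tables.length === 1) {
    return tables[0];
  }
  const mid = Math.ceil(tables.length / 2);
  return { left: bushyTree(tables.slice(0, mid)), right: bushyTree(tables.slice(mid)) };
}

// Count the join nodes in a tree (leaves are plain table names).
function countJoins(tree) {
  return typeof tree === "string" ? 0 : 1 + countJoins(tree.left) + countJoins(tree.right);
}

// Render a tree as a parenthesized expression.
function render(tree) {
  return typeof tree === "string" ? tree : `(${render(tree.left)} JOIN ${render(tree.right)})`;
}

const tables = ["A", "B", "C", "D"];
const leftDeep = leftDeepTree(tables);
const bushy = bushyTree(tables);

console.log(render(leftDeep), countJoins(leftDeep));
console.log(render(bushy), countJoins(bushy));
```

<p>Both shapes contain the same number of joins; what differs is which intermediate results exist, and that is what makes the choice of order matter for cost.</p>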
Redis Cluster & Common Partition Techniques in Distributed Cachehttps://yunpengn.github.io/blog/2018/07/27/redis-cluster-partition/2018-07-27T05:09:53.000Z2019-07-13T09:42:23.413Z
<p>In this post, I will discuss a few common partition techniques in distributed cache. In particular, I will elaborate on my understanding of the use of <a href="https://redis.io/topics/cluster-tutorial" target="_blank" rel="noopener">Redis Cluster</a>.</p><p>Please understand that at the time of writing, the latest version of Redis is <a href="http://download.redis.io/releases/redis-4.0.10.tar.gz" target="_blank" rel="noopener">4.0.10</a>. Many articles on the same topic differ from this post. This is mainly because those articles are probably outdated. In particular, they may refer to the Redis Cluster implementation in Redis 3. Redis Cluster has been improved a lot since Redis 4.</p><p><em>(This article is based on part of my project report. You may want to take a look at the full report <a href="https://dl.comp.nus.edu.sg/handle/1900.100/7123" target="_blank" rel="noopener">here</a>. You may need a valid account to gain access to NUS SoC Digital Library.)</em></p><h2 id="Common-Partition-Techniques"><a href="#Common-Partition-Techniques" class="headerlink" title="Common Partition Techniques"></a>Common Partition Techniques</h2><p>Here, we refer to <strong>horizontal partitioning</strong>, which is also known as <strong>data sharding</strong>. Traditionally, there are 3 approaches to achieve data partitioning, namely, server-side partitioning, cluster proxy, and client-side partitioning.</p>
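<p>As a minimal sketch of the last approach, client-side partitioning, the client itself decides which node owns a key. The node addresses and function names below are hypothetical, and the hash is a simple stand-in chosen for brevity; Redis Cluster itself maps keys to 16384 hash slots using CRC16, which this sketch does not attempt to reproduce.</p>

```javascript
// Hypothetical node addresses; in practice these would be real cache servers.
const nodes = ["cache-a:6379", "cache-b:6379", "cache-c:6379"];

// Simple deterministic string hash (a stand-in, not the CRC16 used by Redis Cluster).
function simpleHash(key) {
  let hash = 0;
  for (const char of key) {
    // Keep the running value an unsigned 32-bit integer.
    hash = (hash * 31 + char.codePointAt(0)) >>> 0;
  }
  return hash;
}

// Client-side partitioning: the client maps each key to exactly one node.
function pickNode(key) {
  return nodes[simpleHash(key) % nodes.length];
}

const assignments = ["user:1", "user:2", "order:42"].map((key) => [key, pickNode(key)]);
console.log(assignments);
```

<p>A known weakness of plain modulo partitioning like this is that changing the number of nodes remaps most keys, which is one motivation for slot-based or consistent-hashing schemes.</p>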
To Select the Correct Technical Stack for Webhttps://yunpengn.github.io/blog/2018/04/29/select-web-tech-stack/2018-04-29T14:04:05.000Z2019-07-13T09:42:23.414Z
<p>When I planned to upgrade the <a href="https://github.com/yunpengn/CS1101S-DG-Website" target="_blank" rel="noopener">CS1101S DG Website</a> project, selection of the technical stack became a big headache. The current decision is</p><ul><li>Backend: <a href="https://spring.io/projects/spring-boot" target="_blank" rel="noopener">Spring Boot 2.x</a></li><li>Frontend: <a href="https://vuejs.org" target="_blank" rel="noopener">Vue.js 2.x</a> + <a href="http://getbootstrap.com" target="_blank" rel="noopener">Bootstrap 4.x</a> (integrated with Vue.js using <a href="https://bootstrap-vue.js.org" target="_blank" rel="noopener">Bootstrap Vue</a>)</li></ul><p>In this post, I would like to present the decision-making process.</p><h2 id="What-are-the-possible-languages-frameworks"><a href="#What-are-the-possible-languages-frameworks" class="headerlink" title="What are the possible languages, frameworks?"></a>What are the possible languages, frameworks?</h2><p>Certainly, there are many different choices. Let’s compare them as follows. Selecting a backend framework essentially means selecting a server-side programming language.</p><ul><li><strong>Java <em>(current choice)</em></strong>: good for scalability and maintainability, used in many enterprise applications. As a relatively <em>old</em> language, its robustness is beyond doubt.</li><li><strong>PHP</strong>: also a traditional choice. However, its performance is not as good as Java's (Java is compiled to bytecode and then JIT-compiled by the JVM, whereas PHP is parsed into opcodes that are executed by the <a href="http://www.zend.com/en/resources/php-7" target="_blank" rel="noopener">Zend Engine</a>).</li><li><strong>Ruby</strong>: a dynamically typed language, which became famous due to Ruby on Rails. You can write less code to achieve more functionality. However, its performance is even worse and its development environment is also not trivial to set up.</li><li><strong>Node.js</strong>: a newer technology than others. 
It provides a unified language for both frontend and backend development. It is fast since it leverages the JavaScript event loop to provide non-blocking I/O.</li><li><strong>Python</strong>: clear and compact syntax that is helpful to developers. Similar to Ruby, it has potential performance issues.</li></ul>
Blogging with Hexo.jshttps://yunpengn.github.io/blog/2018/04/11/blog-with-hexo/2018-04-11T05:00:17.000Z2019-07-13T09:42:23.412Z
<p>As you may already know, this blog is built using <a href="https://hexo.io" target="_blank" rel="noopener">Hexo.js</a> with the theme <a href="https://github.com/theme-next/hexo-theme-next" target="_blank" rel="noopener">Next</a>. In this post, I will discuss the reasons why I selected this static site generator and this theme.</p><h2 id="Why-do-I-select-Hexo-js"><a href="#Why-do-I-select-Hexo-js" class="headerlink" title="Why do I select Hexo.js?"></a>Why do I select Hexo.js?</h2><ul><li>I want a blog website that only consists of static webpages. Thus, I cannot use any content management system (CMS) with dynamic pages, like <a href="https://wordpress.org/" target="_blank" rel="noopener">WordPress</a> and <a href="https://www.drupal.org/" target="_blank" rel="noopener">Drupal</a>.<ul><li>This provides me with more options to host it. For instance, <a href="https://pages.github.com/" target="_blank" rel="noopener">GitHub Pages</a> only supports static webpages.</li><li>Static webpages are generally faster. They do not need any server-side pre-rendering.</li></ul></li><li>It may be a waste of time to write raw HTML, CSS & JavaScript code for every page of the blog. Much of the code can be reused. Thus, I need a framework to help me generate the static webpages.</li><li>I want to develop in both Windows and Linux-based environments. This means some programming languages like Ruby may be troublesome. Thus, I will not choose engines like <a href="https://jekyllrb.com/" target="_blank" rel="noopener">Jekyll</a>.</li><li>The body of my blog posts should not be in plain text. I need basic styling of the text. 
Also, I may sometimes insert code snippets into technical posts.<ul><li>Therefore, the framework had better support <a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown</a> and/or <a href="http://www.methods.co.nz/asciidoc/" target="_blank" rel="noopener">AsciiDoc</a>.</li><li>I know how to use <a href="https://www.latex-project.org" target="_blank" rel="noopener">LaTeX</a>. My slides for my CS1101S classes are all typeset in LaTeX with the <a href="https://ctan.org/pkg/beamer" target="_blank" rel="noopener">Beamer</a> package. However, although LaTeX is very powerful, I have to say its syntax is way too complex.<ul><li>In fact, the <a href="https://github.com/theme-next/hexo-theme-next" target="_blank" rel="noopener">Next</a> theme also supports math equation rendering by either <a href="https://www.mathjax.org" target="_blank" rel="noopener">MathJax</a> or <a href="https://katex.org" target="_blank" rel="noopener">KaTeX</a>.</li></ul></li></ul></li></ul><p>Given all the factors mentioned above, I choose <a href="https://hexo.io/" target="_blank" rel="noopener">Hexo.js</a> in the end.</p>
Hello Worldhttps://yunpengn.github.io/blog/2017/10/13/hello-world/2017-10-13T04:40:58.000Z2019-07-13T09:42:23.412Z
<p>Welcome to <a href="https://hexo.io/" target="_blank" rel="noopener">Hexo</a>! This is your very first post. Check <a href="https://hexo.io/docs/" target="_blank" rel="noopener">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="https://hexo.io/docs/troubleshooting.html" target="_blank" rel="noopener">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues" target="_blank" rel="noopener">GitHub</a>.</p><h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo new <span class="string">"My New Post"</span></span><br></pre></td></tr></table></figure><p>More info: <a href="https://hexo.io/docs/writing.html" target="_blank" rel="noopener">Writing</a></p>