Anda

Notes of study, work and life


  • Home

  • Tags28

  • Categories2

  • Archives29

  • Search

O2O: logback tutorial

Posted on 2019-11-10 | In CS
Symbols count in article: 659 | Reading time ≈ 1 mins.

Logback

Description

Logback is a log module.
Logback-2019-8-1-12-20-49.png

Logback-2019-8-1-12-21-24.png

logger, appender, layout are tags of logback

Configuration

  1. Create a logback.xml in resources package

Logback-2019-11-10-12-31-22.png

  1. Defines some properties of the logback configuration.

Logback-2019-11-10-12-52-52.png

  1. Defines the appenders.

Logback-2019-11-10-13-1-13.png

Logback-2019-11-10-14-3-39.png

Info, Error is the same as Debug.
Logback-2019-11-10-14-4-14.png

Logback-2019-11-10-14-7-21.png

  1. Defines the logger

Logback-2019-11-10-14-19-22.png

Logback-2019-11-10-14-19-41.png

Debug: We should change all MaxHistory to maxHistory in the appenders.
Debug: We should change all append-ref to appender-ref in the appenders.

Practice

  1. Initialize a Logger instance in the AreaController.

2_Logback-2019-11-10-14-28-29.png

  1. Add some test info statements in the function.

2_Logback-2019-11-10-14-48-50.png

  1. Check the console and the file.
    2_Logback-2019-11-10-14-49-12.png

Here we can search catalina.base to get to the directory of our log files.
2_Logback-2019-11-10-14-52-23.png

2_Logback-2019-11-10-14-51-30.png

O2O: Project Structure and Configuration

Posted on 2019-08-01 | Edited on 2019-11-10 | In CS
Symbols count in article: 13k | Reading time ≈ 12 mins.

Notes of O2O projcet

Tech stack

  1. SSM: Spring, SpringMVC, MyBatis

    Project Structure

    General Structure

    Front End

    Notes-2019-7-30-21-48-14.png

Shop

Notes-2019-7-30-21-48-32.png

Notes-2019-7-30-21-49-14.png

Administrator

Notes-2019-7-30-21-49-0.png

Product

Notes-2019-7-30-21-50-17.png

Concrete Class

Notes-2019-7-29-17-21-23.png

Area

Notes-2019-7-29-17-22-14.png
1. Build the class in Java
Notes-2019-7-29-17-50-54.png

1. Build the table in database
Notes-2019-7-29-17-52-58.png

Notes-2019-7-29-19-47-45.png

Q: Why do we use DateTime rather than TimeStamp in mySQL?
A: Because DateTime has a larger range of 0000 -> 9999 year, while TimeStamp can only varies between 1970 and 2037. However, TimeStamp can automatically adjust to the current time zone.

Q: What’s the difference between InnoDB and MYISAM?
A: MYISAM has a table lock that doesn’t allow multiple threads modifying the table while current thread working on it. InnoDB is locked by row. Also, MYISAM has a better read performance, while InnoDB has a better write performance.

PersonInfo

Notes-2019-7-29-19-48-53.png

Notes-2019-7-29-19-51-51.png

Notes-2019-7-30-16-10-28.png

Notes-2019-7-30-16-10-50.png

WechatAuth and LocalAuth

Notes-2019-7-30-16-12-50.png

Notes-2019-7-30-16-14-45.png

Notes-2019-7-30-16-16-31.png

Notes-2019-7-30-16-23-45.png

Notes-2019-7-30-16-57-58.png

Notes-2019-7-30-16-58-44.png

Here we should also change open_id in tb_wechat_auth into a unique key.

Q: Why do we want to set username to be unique?
A: Because we want to improve the performance of lookup.

HeadLine

Notes-2019-7-30-17-8-50.png

Notes-2019-7-30-17-12-3.png
Notes-2019-7-30-17-15-35.png

ShopCategory

Notes-2019-7-30-17-19-17.png

Notes-2019-7-30-20-57-2.png

parentId is like a recurrence relation here. if the parent is null, that means the category is the root. Any child node can traverl all the way back to root.

Shop

Notes-2019-7-30-21-5-5.png

Notes-2019-7-30-21-11-59.png

Notes-2019-7-30-21-12-20.png

ProductCategory

Notes-2019-7-30-21-15-54.png

Notes-2019-7-30-21-16-44.png

ProductImg

Notes-2019-7-30-21-22-37.png

Notes-2019-7-30-21-23-26.png

Product

Notes-2019-7-30-21-27-50.png

Notes-2019-7-30-21-33-45.png

Notes-2019-7-30-21-34-3.png

Notes-2019-7-30-21-34-24.png

Configurations

Packages of the projects

Notes-2019-7-31-8-43-55.png

Dependencies

  1. Junit

    Has to be higher than 4.12, otherwise there’s error.

  2. Logback
    Notes-2019-7-31-8-47-55.png

Here we want to remove \test\, since we also want it to appear in the actual scope.

  1. spring

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    <properties>
    <spring.version>4.3.7.RELEASE</spring.version>
    </properties>

    <dependencies>
    <!-- Spring -->
    <!-- 1)包含Spring 框架基本的核心工具类。Spring 其它组件要都要使用到这个包里的类,是其它组件的基本核心 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>${spring.version}</version>
    </dependency>
    <!-- 2)这个jar 文件是所有应用都要用到的,它包含访问配置文件、创建和管理bean 以及进行Inversion of Control
    / Dependency Injection(IoC/DI)操作相关的所有类。如果应用只需基本的IoC/DI 支持,引入spring-core.jar
    及spring-beans.jar 文件就可以了。 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-beans</artifactId>
    <version>${spring.version}</version>
    </dependency>
    <!-- 3)这个jar 文件为Spring 核心提供了大量扩展。可以找到使用Spring ApplicationContext特性时所需的全部类,JDNI
    所需的全部类,instrumentation组件以及校验Validation 方面的相关类。 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-context</artifactId>
    <version>${spring.version}</version>
    </dependency>
    <!-- 4) 这个jar 文件包含对Spring 对JDBC 数据访问进行封装的所有类。 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-jdbc</artifactId>
    <version>${spring.version}</version>
    </dependency>
    <!-- 5) 为JDBC、Hibernate、JDO、JPA等提供的一致的声明式和编程式事务管理。 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-tx</artifactId>
    <version>${spring.version}</version>
    </dependency>
    <!-- 6)Spring web 包含Web应用开发时,用到Spring框架时所需的核心类,包括自动载入WebApplicationContext特性的类、Struts与JSF集成类、文件上传的支持类、Filter类和大量工具辅助类。 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-web</artifactId>
    <version>${spring.version}</version>
    </dependency>
    <!-- 7)包含SpringMVC框架相关的所有类。 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
    <version>${spring.version}</version>
    </dependency>
    <!-- 8)Spring test 对JUNIT等测试框架的简单封装 -->
    <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-test</artifactId>
    <version>${spring.version}</version>
    <scope>test</scope>
    </dependencies>
  2. Servlet
    Servlet serevice dependency

    1
    2
    3
    4
    5
    6
    <!-- Servlet web -->
    <dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>4.0.1</version>
    </dependency>
  3. json support for front-end

    1
    2
    3
    4
    5
    6
    <!-- json解析 -->
    <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.9</version>
    </dependency>
  4. spring-core dependency

    1
    2
    3
    4
    5
    6
    <!-- Map工具类 对标准java Collection的扩展 spring-core.jar需commons-collections.jar -->
    <dependency>
    <groupId>commons-collections</groupId>
    <artifactId>commons-collections</artifactId>
    <version>3.2.2</version>
    </dependency>
  5. mybatis dependency

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    <!-- DAO: MyBatis -->
    <dependency>
    <groupId>org.mybatis</groupId>
    <artifactId>mybatis</artifactId>
    <version>3.5.1</version>
    </dependency>
    <dependency>
    <groupId>org.mybatis</groupId>
    <artifactId>mybatis-spring</artifactId>
    <version>2.0.1</version>
    </dependency>
  6. mysql
    mysql-connector-java supports sql api
    com.mchange supports connection pool

1
2
3
4
5
6
7
8
9
10
11
12
<!-- 数据库 -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.mchange/c3p0 -->
<dependency>
<groupId>com.mchange</groupId>
<artifactId>c3p0</artifactId>
<version>0.9.5.4</version>
</dependency>
1
2
3
4
5
6
7
8
9
10
什么是连接池
数据库连接池负责分配、管理和释放数据库连接,它允许应用程序重复使用一个现有的数据库连接,而不是再重新建立一个。

为什么要使用连接池
 数据库连接是一种关键的有限的昂贵的资源,这一点在多用户的网页应用程序中体现得尤为突出。  一个数据库连接对象均对应一个物理数据库连接,每次操作都打开一个物理连接,使用完都关闭连接,这样造成系统的 性能低下。 数据库连接池的解决方案是在应用程序启动时建立足够的数据库连接,并讲这些连接组成一个连接池(简单说:在一个“池”里放了好多半成品的数据库联接对象),由应用程序动态地对池中的连接进行申请、使用和释放。对于多于连接池中连接数的并发请求,应该在请求队列中排队等待。并且应用程序可以根据池中连接的使用率,动态增加或减少池中的连接数。 连接池技术尽可能多地重用了消耗内存地资源,大大节省了内存,提高了服务器地服务效率,能够支持更多的客户服务。通过使用连接池,将大大提高程序运行效率,同时,我们可以通过其自身的管理机制来监视数据库连接的数量、使用情况等。 
---------------------
作者:CrankZ
来源:CSDN
原文:https://blog.csdn.net/crankz/article/details/82874158
版权声明:本文为博主原创文章,转载请附上博文链接!

Configurations

  1. jdbc
    File name is jdbc.properties, save it under src/main/resources

    1
    2
    3
    4
    jdbc.driver = com.mysql.jdbc.Driver
    jdbc.url = jdbc:mysql://localhost:3306/o2o?useUnicode=true&characterEncoding=utf8
    jdbc.username = root
    #jdbc.password = password
  2. mybatis
    File name is mybatis-config.xml, save it under src/main/resources

    1
    2
    3
    4
    5
    6
    7
    8
    <!-- 使用jdbc的getGeneratedKeys获取数据库自增主键值 -->
    <setting name="useGeneratedKeys" value="true" />

    <!-- 使用列标签替换列别名 默认:true -->
    <setting name="useColumnLabel" value="true" />

    <!-- 开启驼峰命名转换:Table{create_time} -> Entity{createTime} -->
    <setting name="mapUnderscoreToCamelCase" value="true" />
  3. spring
    spring-dao.xml

    1
    2
    3
    4
    5
    <!-- 配置整合mybatis过程 -->
    <!-- 1.配置数据库相关参数properties的属性:${url} -->
    <!-- 2.配置连接池属性 -->
    <!-- 3.配置SqlSessionFactory对象 -->
    <!-- 4.配置扫描Dao接口包,动态实现Dao接口,注入到spring容器中 -->

spring-service.xml

1
2
3
4
<!-- 扫描service包下所有使用注解的类型 -->
<!-- 配置事务管理器 -->
<!-- 注入数据库连接池 -->
<!-- 配置基于注解的声明式事务 -->

spring-web.xml

1
2
3
4
5
<!-- 配置SpringMVC -->
<!-- 1.开启SpringMVC注解模式 -->
<!-- 2.静态资源默认servlet配置 (1)加入对静态资源的处理:js,gif,png (2)允许使用"/"做整体映射 -->
<!-- 3.定义视图解析器 -->
<!-- 4.扫描web相关的bean -->

  1. web.xml
    Integrate the configuration of Spring

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
      <servlet>
    <servlet-name>spring-dispatcher</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <init-param>
    <param-name>contextConfigLocation</param-name>
    <param-value>classpath:spring/spring-*.xml</param-value>
    </init-param>
    </servlet>
    <servlet-mapping>
    <servlet-name>spring-dispatcher</servlet-name>
    <!-- 默认匹配所有的请求 -->
    <url-pattern>/</url-pattern>
    </servlet-mapping>
  2. Review
    Notes-2019-7-31-10-7-2.png

Test for DAO

Take Area class as an example, we try to implement its DAO.

Q: What is DAO?
A: DAO是Data Access Object数据访问接口,数据访问:顾名思义就是与数据库打交道。In computer software, a data access object (DAO) is an object that provides an abstract interface to some type of database or other persistence mechanism. By mapping application calls to the persistence layer, the DAO provides some specific data operations without exposing details of the database. This isolation supports the single responsibility principle. It separates what data access the application needs, in terms of domain-specific objects and data types (the public interface of the DAO), from how these needs can be satisfied with a specific DBMS, database schema, etc. (the implementation of the DAO).

We first implement the interface of user’s operation(such as insertion, deletion, update, query), and then map the DAO in the myBatis (mapper package) to the entity class. There we define the type of the operation and return type, then we can write SQLs to execute the actual queries.

Step:

  1. create interface AreaDao in o2o.dao
  2. create config file AreaDao.xml in resources.mapper

    Here we don’t need to implement the actual AreaDao class but simply provide an interface is enough. MyBatis will do the job for us. We just need to map it to MyBatis.

  3. Specify the namespace, id, and resultType
    Notes-2019-7-31-16-18-3.png
  4. Write the test for the dao.

First, create a BaseTest class in test/…/o2o/, which is used to initialize Spring and Junit configuration

Notes-2019-7-31-21-9-26.png

@RunWith and @ContextConfiguratiion are often used together. First one indicate what test runner we want to use for junit. @ContextConfiguration indicates the Config file of Spring DAO.

Then create a concrete test class AreaDaoTest in ~/o2o.dao extending the BaseTest.

Notes-2019-7-31-21-12-11.png

@Autowired is used to assemble the DAO class by Spring. Every time we declare an interface with @Autowired, Spring will automatically use its implementation class.

  1. Debug.
  • in jdbc.properties, password line cannot be omitted
  • jdbc.driver is deprecated, shoule be changed to com.mysql.cj.jdbc.Driver
  • Add &serverTimezone=UTC to jdbc url
  • Make sure all paths are configured correctly.

Test for Service

  1. Create AreaService interface in service package, which has a getAreaList API.
  2. Create AreaServiceImpl class in service.impl package, create a DAO and implement the API. Just simply return the DAO’s queryArea() API.

Notes-2019-7-31-21-34-33.png

  1. Create serevice.AreaServiceTest in test package, and add config file to the BaseTest.

    1
    @ContextConfiguration({"classpath:spring/spring-dao.xml", "classpath:spring/spring-service.xml"})
  2. Same as testDao, test the getAreaList() API we defined in interface AreaService. Here we changed a way to test it. Notice that since we configure the mapper with AreaDao.xml, we know that the result return from the database is exactly the field we defined in entity class Area.

Notes-2019-7-31-21-46-58.png

Test for Controller

  1. Since we use a super admin to control the Area, we create a superadmin package to manage related behaviour. Create web.superadmin.AreaController.
  2. Implement this class.

@Controller tag means this is an implementation class of Controller(just like @Service). Here we don’t have a Controller interface.

@RequestMapping means when we have a request calling superadmin, we direct it to our current class AreaController. Same as the following. Notice that we use lower case here for url.

@ResponseBody is used to write the data we have in controller to the body of response. Could be html, json or xml. In this case, we return json.

The reason why we use a Map<String, Object> here is because this data structure is like a json. We can put whatever we want as the value of a certain String, including error message.

Notes-2019-7-31-22-3-49.png

We can test it at http://localhost:8080/superadmin/listarea

and get something like

1
{"total":2,"rows":[{"areaId":2,"areaName":"Chicago","priority":2,"createTime":null,"lastEditTime":null},{"areaId":1,"areaName":"Seattle","priority":1,"createTime":null,"lastEditTime":null}]}

Debug: There’s a serious bug about bean multipartResolver. Annotate it will be fine.

Review

  1. Create the interface in DAO, and create certain APIs
  2. Create config file in mapper for this DAO, and Use SQL to execute operations on Database.
  3. Call DAO API in Service
  4. Call Service API in Controller and return it to front end(write to response).

Leetcode Summary

Posted on 2019-07-26 | Edited on 2019-09-25 | In CS
Symbols count in article: 18k | Reading time ≈ 17 mins.

Algorithm Review

Greedy

Name difficulty Solution Link
316. Remove Duplicate Letters Hard Find the current smallest achievable letter and continue searching the rest https://leetcode.com/problems/remove-duplicate-letters/description/
968. Binary Tree Cameras Hard Start from leaves, the parent of leaf always take less cameras to cover than the leaf itself carries a camera. Thus we derive from bottom to top and get the result. https://leetcode.com/problems/binary-tree-cameras/description/

Monotonic Stack

Name difficulty Solution Link
316. Remove Duplicate Letters Hard Use a increasing stack to keep the str in order and use visited to keep the uniqueness of the letters https://leetcode.com/problems/remove-duplicate-letters/description/
402. Remove K Digits Medium Keep the chars in increasing order and count the poped element. If we didn’t delete enough element we delete from the end https://leetcode.com/problems/remove-k-digits/description/
739. Daily Temperatures Medium classic Monostack Practice https://leetcode.com/problems/daily-temperatures/description/
901. Online Stock Span Medium Keep a decreasing stack, and keep track the price and its appearace. If we find something to insert, just update it with the previous appearance. https://leetcode.com/problems/online-stock-span/description/
962. Maximum Width Ramp Medium Maintain a decreasing stack of the array(the first one). Then search from the tail and try to enlarge the interval. https://leetcode.com/problems/maximum-width-ramp/description/
907. Sum of Subarray Minimums Medium Maintain nextMin[] and prevMin[], then we scan from left to right, update res by A[i] (i-prevMin[i])(nextMin[i] - i). Note that the default of next should len(A), the other is -1 https://leetcode.com/problems/sum-of-subarray-minimums/description/
1124. Longest Well-Performing Interval Medium Same as maximum width ramp. Transform into subarray sum > k problem. https://leetcode.com/problems/longest-well-performing-interval/description/
84. Largest Rectangle in Histogram Hard Use single monotonic stack, or 2 mono stack expanding each item. https://leetcode.com/problems/largest-rectangle-in-histogram/description/

Binary Search

Name difficulty Solution Link
658. Find K Closest Elements Medium Use binary search to find a size k interval. Compare the distance from left bound to the x and right bound to x https://leetcode.com/problems/find-k-closest-elements/description/
392. Is Subsequence Easy Practice bin search. https://leetcode.com/problems/is-subsequence/description/

Prefix Sum

Name difficulty Solution Link
548. Split Array with Equal Sum Mediumn find all possible positions of j, and find all possiblt positions of i and k. Use the prefixSum to check if we have seen the same interval sum. https://leetcode.com/problems/split-array-with-equal-sum/description/

Divide & Conquer

Name difficulty Solution Link
312. Burst Balloons Hard dp(left, right) = nums[left] nums[mid] nums[right] + dp(left, mid) + dp(mid, right), left < mid < right https://leetcode.com/problems/burst-balloons/description/
53. Maximum Subarray Easy find the mid pos, solve its left and right subproblem, get the current result from mid to both ends, then merge the result. https://leetcode.com/problems/maximum-subarray/description/

DP

Name difficulty Solution Link
312. Burst Balloons Hard dp(left, right) = nums[left] nums[mid] nums[right] + dp(left, mid) + dp(mid, right), left < mid < right. Find all possible length first, then search from left to right, from short to long of all subarrays(Bottom up) https://leetcode.com/problems/burst-balloons/description/
53. Maximum Subarray Easy dp[i] = dp[i-1] > 0?:dp[i-1] + nums[i]: nums[i] https://leetcode.com/problems/maximum-subarray/description/
801. Minimum Swaps To Make Sequences Increasing Medium if is already valid, swap[i] = swap[i-1] + 1, keep[i] = keep[i-1. It we can swap, keep[i] = min(keep[i], swap[i-1]), swap[i] = min(swap[i], keep[i-1] + 1 https://leetcode.com/problems/minimum-swaps-to-make-sequences-increasing/description/
750. Number Of Corner Rectangles Medium dp[i][j] means the number of matches between colomn i and j. Loop all items and scan the right of it, update res = dp[i][k]++, when grid[i][j] == 1 and grid[i][k] == 1. https://leetcode.com/problems/number-of-corner-rectangles/description/
1140. Stone Game II Medium dp(ind, m) = max points of cur player given ind and m. We search next player’s possible points and pick the smallest. then return all points- opponent’s points. can use a postsum array to optimize https://leetcode.com/problems/stone-game-ii/description/
221. Maximal Square Medium dp[i][j] means the length of the square whose bottom-right corner is current pos. dp[i][j] = min(dp[i,j-1], dp[i-1, j], dp[i-1, j-1])+1 if grid[i][j] != 0, else 0. Can further optimize the space to On https://leetcode.com/problems/maximal-square/description/
688. Knight Probability in Chessboard Medium dp[k][i][j] = sum(dp[k-1][i-dx][j-dy]) where dx dy in 8 directions https://leetcode.com/problems/knight-probability-in-chessboard/description/
879. Profitable Schemes Hard dp[i, j] = given profit i and j people, the total number of profitable schemes.dp[i+p, j+g] += dp[i][j], the target we want is dp[P, _] and dp[P++, _]. Thus we combine all the answers to dp[P, _]. https://leetcode.com/problems/profitable-schemes/description/
1155. Number of Dice Rolls With Target Sum Medium Similar to Coin Change. dp[i][j] += dp[i-1][j-k] for k in [1…f]. Note that dp[0][0] should be 1. https://leetcode.com/problems/number-of-dice-rolls-with-target-sum/description/
265. Paint House II Hard Simple DP. We can optimize it to Onk and O1 space by finding the minimum 2 items of each level while traversing. https://leetcode.com/problems/paint-house-ii/description/

Design Datastructre

Name difficulty Solution Link
641. Design Circular Deque Medium Use a doubly linkedlist https://leetcode.com/problems/design-circular-deque/description/
146. LRU Cache Medium HashMap + Double Linkedlist https://leetcode.com/problems/lru-cache/description/
460. LFU Cache Hard HashMap + LinkedHashSet. Use a map to store key to val, a map to store key to freq, a map to store freq to list of keys. Linked hash set can keep the keys in order and have a O1 insert and delete https://leetcode.com/problems/lfu-cache/description/
381. Insert Delete GetRandom O(1) - Duplicates allowed Hard TODO: implement it with LinkedHashSet. Use HashMap + array. Hashmap stores the value and its index in the array. Array stores the value and its relative index in the hashmap https://leetcode.com/problems/insert-delete-getrandom-o1-duplicates-allowed/description/

Sliding Window

Name difficulty Solution Link
713. Subarray Product Less Than K Medium Use a sliding window which product is less than k, and add the size of window to res https://leetcode.com/problems/subarray-product-less-than-k/description/
1151. Minimum Swaps to Group All 1’s Together Medium maintain a window which size is the same as total number of 1s in the array, in which have as many 1s as possible https://leetcode.com/contest/biweekly-contest-6/problems/minimum-swaps-to-group-all-1s-together/
683. K Empty Slots Hard First record which bulb is turned on on each day, then try to find an subarray such that we have k+2 items in the subarray, and the items on both ends should be smaller than that in the middle. Then we try to find this window from left to right. https://leetcode.com/problems/k-empty-slots/description/

Tree

Name difficulty Solution Link
663. Equal Tree Partition Medium Store the sum of every subtree and check if totalSum/2 in these subtrees https://leetcode.com/problems/equal-tree-partition/description/
662. Maximum Width of Binary Tree Medium Change the val of the nodes to the corresponding index in a full binary tree, then do a level order traversal, track the largest distance https://leetcode.com/problems/maximum-width-of-binary-tree/description/
652. Find Duplicate Subtrees Medium Encode the tree and track the appearance of all encoded strings https://leetcode.com/problems/find-duplicate-subtrees/description/
549. Binary Tree Longest Consecutive Sequence II Medium helper gets a node and return the longest inc num and decnum. For each node, compare to its left and right, find current lonest inc num and dec num, then update the res with curInc + curDec - 1 https://leetcode.com/problems/binary-tree-longest-consecutive-sequence-ii/description/
515. Find Largest Value in Each Tree Row Medium Recursion: check the depth and the size of the res to see if we find the first node in the level. Non-recursion: simple level order traversal https://leetcode.com/problems/find-largest-value-in-each-tree-row/description/
968. Binary Tree Cameras Hard 0 for camera, 1 for need to be covered, 2 for doesn’t need to be covered, do a DFS on the root. Finally check if the root itself needs to be covered by a virtual parent. https://leetcode.com/problems/binary-tree-cameras/description/
545. Boundary of Binary Tree Medium First add the root -> add the left bound -> add the leaves of left subtree -> add the leaves of right subtree -> add the right bound reversely(postorder) https://leetcode.com/problems/boundary-of-binary-tree/description/
222. Count Complete Tree Nodes Medium A regular tree question. Use the full binary tree characteristic. Check the right and left height and one of them must be a full binary tree, and use the formula will do. https://leetcode.com/problems/count-complete-tree-nodes/description/
1161. Maximum Level Sum of a Binary Tree Medium level order/DFS level sum https://leetcode.com/contest/weekly-contest-150/problems/maximum-level-sum-of-a-binary-tree/

Graph

Name difficulty Solution Link
1135. Connecting Cities With Minimum Cost Medium TODO https://leetcode.com/contest/biweekly-contest-5/problems/connecting-cities-with-minimum-cost/
417. Pacific Atlantic Water Flow Medium BFS/DFS from pacific and atlantic backwards. Search those points can reach both pacific and atlantic. https://leetcode.com/problems/pacific-atlantic-water-flow/description/
505. The Maze II Hard BFS and record the min distance to each reachable point, return the distance to targete point https://leetcode.com/problems/the-maze-ii/description/
834. Sum of Distances in Tree Hard First use a dfs(preorder) to build the count[] counting the number of nodes of each subtree. Then we use another dfs(postorder) to build the dist[]. dist[child] = dist[parent] - count[child] + N - count[child]. This means, from parent to child, we have counts[child] of nodes get 1 distance nearer, and N - counts[child] of nodes get 1 distance further. https://leetcode.com/contest/weekly-contest-84/problems/sum-of-distances-in-tree/
1168. Optimize Water Distribution in a Village Hard The key is to build a MST among all pipes and a dummy node that connects all other nodes with cost of building wells. Practice for Kruskal and Prim https://leetcode.com/contest/biweekly-contest-7/problems/optimize-water-distribution-in-a-village/

UnionFind

Name difficulty Solution Link
1135. Connecting Cities With Minimum Cost Medium TODO https://leetcode.com/contest/biweekly-contest-5/problems/connecting-cities-with-minimum-cost/

SubMatrix

For this type of question, always try to reduce the time complexity to n^3. Try to fix an edge/point and search other points.

Name difficulty Solution Link
1139. Largest 1-Bordered Square Medium Search all possible r for the square(big->small) and all possible root point, check if all 4 edges are 1s. Optimize it with helper matrix that tells how many consecutive 1s to the left/top of the current pos. https://leetcode.com/problems/largest-1-bordered-square/description/
764. Largest Plus Sign Medium track the minimum distance that a pos can strech. then return the max pos. https://leetcode.com/problems/largest-plus-sign/description/

String Parsing

Name difficulty Solution Link
640. Solve the Equation Medium Split the string with left and right. and then count the res of ax+b = cx+d. regex “(?=[+-])” means we want to match + or -a, but not including the sign into the split. https://leetcode.com/problems/solve-the-equation/description/
385. Mini Parser Medium Use stack to record each level of [], always operate on the latest level. Note that we need a basic level(stack = [NestedInteger()]) for return value here. https://leetcode.com/problems/mini-parser/description/
224. Basic Calculator Hard Store the result by level. Here we also need to store the sign of each level by pushing num first then sign. So when we push, we get the sign first, apply on the cur level, and add to previous level. https://leetcode.com/problems/basic-calculator/description/
726. Number of Atoms Hard Same as calculator. Store each level in the stack and merge the result to last level when we have ‘)’. https://leetcode.com/problems/number-of-atoms/description/
68. Text Justification Hard Similar to round robin, Calculate the num of space first, then use modular to pad space to each word one by one until finish. https://leetcode.com/problems/text-justification/description/
1106. Parsing A Boolean Expression Hard Use 2 stacks to store operands and operators https://leetcode.com/problems/parsing-a-boolean-expression/description/
150. Evaluate Reverse Polish Notation Medium Need to pay attention on the floor division problem and take care of the sign. https://leetcode.com/problems/evaluate-reverse-polish-notation/description/
394. Decode String Medium String parsing practice. https://leetcode.com/problems/decode-string/description/

HashMap

Name difficulty Solution Link
525. Contiguous Array Medium replace the 0 to -1, then we do the same thing as max subarray with sum k(k=0). We store the sum with the index, and if curSum already in the map, it means current interval is 0-1 balanced, we update the res. Otherwise, we update the map. https://leetcode.com/problems/contiguous-array/description/
554. Brick Wall Medium count the pos of all brick edges in map, and return the num of walls - edge with most appearance https://leetcode.com/problems/brick-wall/description/

Substring/Subsequence/SubArray

Name difficulty Solution Link
395. Longest Substring with At Least K Repeating Characters Medium Searching from substring containing only 1 unique letter to 26 of them. Maintain a window using 2 pointers with certain unique letters, and check if this window is a valid window. The reason why window works here is we are dealing with substring, which is continuous. https://leetcode.com/problems/longest-substring-with-at-least-k-repeating-characters/description/
862. Shortest Subarray with Sum at Least K Hard Same idea. Use deque as a sliding window and keep the sum >= k. Use a prefix array and increasing monotonic stack to help control the sum calculation and size control. https://leetcode.com/problems/shortest-subarray-with-sum-at-least-k/

Trie

Name difficulty Solution Link
212. Word Search II Hard Classic Trie problem. Build Trie with all words and search each grid using backtrack. https://leetcode.com/problems/word-search-ii/description/
1166. Design File System Medium Practice for Trie Tree https://leetcode.com/contest/biweekly-contest-7/problems/design-file-system/

Math

Name difficulty Solution Link
204. Count Primes Easy Use Sieve of Eratosthenes, loop all n and try to get rid of all composites of every prime https://leetcode.com/problems/count-primes/description/
296. Best Meeting Point Hard Find the median of all houses.`Proof: totalX = x1 - m + x2-m + … + xn-m . `Derivative of totalX = -1 + 1 + -1 + … Since we want the derivative to be 0, thus, the number of -1 and 1 should equal. Thus we should have half of the xn smaller than m and half bigger, which is median. Same idea with the totalY. https://leetcode.com/problems/best-meeting-point/description/
264. Ugly Number II Medium Generate 1*2, 1*3, 1*5... by multiplying 2, 3, 5 to current minimum ugly number. We set up 3 pointers that can generate next *2, *3, *5ugly number, and keep track of them until we generate n number. https://leetcode.com/problems/ugly-number-ii/description/d

Stack

Name difficulty Solution Link
946. Validate Stack Sequences Medium simulate the process of stack operation. First push, then check if we shoud pop anything at this moment. Then check the final result https://leetcode.com/problems/validate-stack-sequences/description/
1003. Check If Word Is Valid After Substitutions Medium Reconstruct this string. the first c we met has to be the last abc we insert, thus previous 2 chars have to be a and b. Check this continously until the stack is empty https://leetcode.com/problems/check-if-word-is-valid-after-substitutions/description/

Segmant Tree

Name difficulty Solution Link
1157. Online Majority Element In Subarray Hard TODO https://leetcode.com/problems/online-majority-element-in-subarray/discuss/356227/C++-Codes-of-different-approaches-(Random-Pick-Trade-off-Segment-Tree-Bucket)

Random

Name difficulty Solution Link
1157. Online Majority Element In Subarray Hard Every time i pick a random element between bounds, And check to see if it’s a majority item by binary search its left and right pos. right_pos - left_pos is the number of this element, and we check it with threshold. https://leetcode.com/problems/online-majority-element-in-subarray/discuss/356227/C++-Codes-of-different-approaches-(Random-Pick-Trade-off-Segment-Tree-Bucket)

Backtrack

Name difficulty Solution Link
291. Word Pattern II Hard For each pattern char, search the following source to find a match, search deeper and backtrack. https://leetcode.com/problems/word-pattern-ii/description/

Bucket Sort

Name difficulty Solution Link
274. H-Index Medium Use a bucket to store the number of citations of each paper. If exceeds, merge the result to nth paper. Then check backwards, try to find a point where the total count > ith paper. That point is that at this point, we have this amount of citations (total count), that have at least this many citations(ith paper) https://leetcode.com/problems/h-index/description/

Notes of Algorithm

Posted on 2019-05-07 | Edited on 2019-08-12 | In CS
Symbols count in article: 69k | Reading time ≈ 1:03

Class 1 05/06/2019

Fibonacci numbers

Fibonacci numbers (Exponential Algorithm)

1
2
3
4
5
6
0, 1, 1, 2, 3, 5, 8, 13, 21
a0, a1, a3, a3, a4, a5, a6, a7

an = an-1 + an-2
a0 = 0
a1 = 1

Code

1
2
3
4
5
6
fib1(int n) {
if (n == 0 || n == 1) {
return n;
}
return fib1(n-1) + fib1(n-2);
}

Time Complexity

Answer O(2^n)

Less intuitive, not like

1
2
for i in range(10):
a[i] = a[i-1] + 2

Recursion Tree

$ T(n) = \text {runtime of fib1(n)}$

1
2
3
4
5
6
7
              T(n)
/ \
T(n-1) T(n-2)
/ \ / \
T(n-2) T(n-3) T(n-3) T(n-4)
......
T(1) T(0) T(1) T(0) ...

When hit 0 or 1, the recursion ends.

1
2
3
4
5
6
7
8
9
10
11
            /\
/ \
/____\
/ /
/ _ /
/ _/
/_/
left side = n
right side = n/2

T(n) = num of nodes in this recursion tree
1
2
3
4
5
in first half of this tree
/\
/ \
/____\
num of nodes = 2^k-1 = 2 ^ (n/2) - 1

Thus we have

$ 2 ^ {n/2} <= T(n) <= 2 ^ n $

Therefore

$ 2 ^ {n/2} = (\sqrt {2}) ^ n ~= 1.4n $

Clearly, this is a bad time complexity algorithm.

1
2
3
4
5
6
7
8
9
                  T(n)
/ \
T(n-1) T(n-2)
/ \ / \
T(n-2) T(n-3) T(n-3) T(n-4)

There are a lot of repetition in the process of calculation.

e.g. T(n-2) T(n-3)

Linear algorithm

Code

1
2
3
4
5
6
7
8
fib2(int n) {
int[] A = new int[n];
A[0] = 0; A[1] = 1;
for (int i = 2; i <= n; i++) {
A[i] = A[i-1] + A[i-2];
}
return A[n];
}

Time Complexity

Linear time: O(n)

Constant space algorithm

Code

1
2
3
4
5
6
7
8
fib2(int n) {
int[] A = new int[2];
A[0] = 0; A[1] = 1;
for (int i = 2; i <= n; i++) {
A[i%2] = A[(i-1)%2] + A[(i-2)%2];
}
return A[n%2];
}

Analysis

1
2
3
i varies between 0 and 1
A[1] = A[0] + A[1]
A[0] = A[1] + A[0]

Matrix solution

Idea

$$
A = \left[
\begin{matrix}
1 & 1 \
1 & 0
\end{matrix}
\right]
$$

$$
A^2 = \left[
\begin{matrix}
1 & 1 \
1 & 0
\end{matrix}
\right] *
\left[
\begin{matrix}
1 & 1 \
1 & 0
\end{matrix}
\right] =
\left[
\begin{matrix}
2 & 1 \
1 & 1
\end{matrix}
\right]
$$

1
2
3
4
5
6
7
8
9
10
11
12
13
A = | 1  1 | = | a2  a1 |
| 1 0 | | a1 a0 |

A^2 = | 1 1 | | 1 1 | | 2 1 | | a3 a2 |
| 1 0 | * | 1 0 | = | 1 1 | = | a2 a1 |

A^3 = | 2 1 | | 1 1 | | 3 2 | | a4 a5 |
| 1 1 | * | 1 0 | = | 2 1 | = | a3 a2 |

.....

A^k = | a(k+1) ak |
| ak ak-1 |

Time Complexity

Basic: O(n)

Optimized: O(logn)

Strategy

Since we are dealing with A^n, the optimized time complexity of pow(x, n) is O(logn)

1
2
3
4
A^n =   |A(n/2)^2         | n is even
|A((n-1)/2)^2 * A | n is odd

A^64 = (A^32)^2 = ((A^16))^2^2 = A^(2^6)

Asymptotic Notation

Comparison of each notation

Compare Function Growth Compare Numbers f(n) = 5n g(n) = n^2
O <= f(n) = O(g(n))
o < f(n) = o(g(n)
Θ = for f(n) = 100n and g(n) = n + 100
Ω >= f(n) = Ω(g(n))
ω > f(n) = ω(g(n))

Mathematical Derivation

f(n) = O(g(n))

1
2
iff exists constants C and n0
s.t. when n > n0, f(n) <= C * g(n)

1
2
3
lim(n->inf) f(n)/g(n) =     0           | fn = O/o(g(n))
constant | fn = O/Θ/Ω(g(n))
inf | fn = Ω/ω(g(n))

Practice:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
lim(n->inf) (100n)/(n+100) = 100 
fn = O/Θ/Ω(g(n))

f(n) = nlgn g(n) = n^2
f(n) = O/o(g(n))

f(n) = 2^n g(n) = 3^n
f(n) = O/o(g(n))
lim(n->inf) f(n)/g(n) = lim(n->inf) (2/3)^n = 0

f(n) = n+3 g(n) = 2 * sqrt(n)
f(n) = Ω/ω(g(n))


f(n) = log2(n) g(n) = log3(n)
f(n) = Ω/O/Θ(g(n))
lim(n->inf) f(n)/g(n) = lim(n->inf) log2n/log3n = log3/log2

f(n) = (logn)^2 g(n) = log(n^2) = 2logn
f(n) = Ω/ω(g(n))

Master’s Theorem

Idea of merge sort

T(n) = 2T(n/2) + n

1
2
3
4
5
6
7
             T(n) n                         | n
/ \
T(n/2) n/2 T(n/2) n/2 | n
/ \ / \
T(n/4) T(n/4) T(n/4) T(n/4) | n
......
T(1) T(1) T(1) T(1) ... | n

Time Complexity of merge sort

T(n) = height T(level) = logn n = nlogn

Master’s Theorem

1
2
3
4
5
6
7
T(n) = a * T(n/b) + n^c, where a, b, c are constants

Examples:
T(n) = 2 * T(n/2) + n
T(n) = 8 * T(n/2) + n^2
T(n) = T(n/2) + 1
T(n) = T(n/3) + T(2n/3) + n (Not master's theorem)

Now we have to compare logb(a) and c

  • logb(a) > c, T(n) = Θ(n^(logb(a)))
  • logb(a) = c, T(n) = Θ((n^c) * logn)
  • logb(a) < c, T(n) = Θ(n^c)

Practice

1
2
3
4
5
6
T(n) = T(n/2) + 1   T(n) = Θ(lgn)
T(n) = 8T(n/2) + n^2 T(n) = Θ(n^3)
T(n) = 7T(n/2) + n^2 T(n) = Θ(n^(log2(7)))
T(n) = 4T(n/2) + n^2 T(n) = Θ(n^2 * (logn))
T(n) = T(n/2) + n T(n) = Θ(n)
T(n) = T(n-1) + n (Not master's theorem)

Draw the Recursion Tree of the last one

1
2
3
4
5
6
7
8
9
10
11
12
13
Recursion Tree of T(n) = T(n-1) + n 

T(n) n
T(n-1) n-1
T(n-2) n-2
...
T(1) 1




T(n) = 1 + 2 + 3 + ... + n = n(n+1)/2
Θ(n) = n^2

Other useful Summation Formula

1
2
3
4
5
1 + 2 + 3 + .. + n = n(n+1)/2

q^0 + q^1 + q^2 + ... q^n = (q^(n+1) -1)/(q-1)

1 + 1/2 + 1/3 + ... + 1/n = Θ(ln(n))

Normal case

T(n) = T(n/3) + T(2n/3) + n

1
2
3
4
5
6
7
             T(n) n                         | n
/ \
T(n/3) n/3 T(2n/3) 2n/3 | n
/ \ / \
T(n/9) T(2n/9) T(2n/9) T(4n/9) | n
......
T(1) T(1) T(1) T(1) ... | n
1
2
3
4
5
6
7
8
9
10
11
12
13
14
            /\
/ \
/____\
\ \
\ \
\ \
\_\

left side = log3n
right side = log3/2(n)

T(n) = num of nodes in this recursion tree

n * log3(n) <= T(n) <= n * log3/2(n)

Time Complexity of Normal recursion tree

T(n) = Θ(nlgn)

Some examples

1
2
3
T(n) = T(n/4) + T(3n/4) + n             T(n) = Θ(nlgn)
T(n) = T(n/1000) + T(999n/1000) + n T(n) = Θ(nlgn)
T(n) = T(n-1) + T(1) + n T(n) = Θ(n^2)

Quick Sort

Partition

1
2
3
4
5
6
7
8
9
10
Given array like

|1 |9 |2 |8 |3 |8 |7 |5 |6 |


Choose a pivot, put all num < pivot in left, others in right

| nums <= pivot |pivot| nums > pivot |

The process is partition(A[], i, j)

Code

1
2
3
4
5
6
QuickSort(int[] A, int i, int j) {
if (i >= j) {return;}
int pivot_pos = Partition(A, i, j);
QuickSort(A, i, pivot_pos-1);
QuickSort(A, pivot_pos+1, j);
}

Time Complexity

1
2
3
4
5
6
T(n) = 2T(n/2) + n = Θ(nlogn)
T(n) = T(n/3) + T(2n/3) + n = Θ(nlogn)
T(n) = T(n/1000) + T(999n/1000) + n = Θ(nlogn)

Worst Case
T(n) = T(n-1) + n = Θ(n^2)

Average T(n) = Θ(nlogn)

Worst T(n) = Θ(n^2)

Code of Partition

1
2
3
4
5
6
pivot = A[j]
| i | ... | p | p+1 | ... | cur | ... | j |

i - p: <= pivot
p - p+1: > pivot
cur: current pointer
  • A[cur] > pivot

    1
    cur += 1
  • A[cur] <= pivot

    1
    2
    3
    swap(A[cur], A[p+1]);
    p += 1;
    cur += 1
1
2
3
4
5
6
7
8
9
10
11
12
Partition(int[] A, int i, int j) {
int pivot = A[j];
int p = i - 1;
for (int cur = i; cur <= j-1; cur++) {
if (A[cur] <= pivot) {
swap(A[cur], A[p+1]);
p += 1;
}
}
swap(A[j], A[p+1]);
return p+1;
}

Class 2 05/13/2019

Review

Review of quicksort

Stable Sort

1
2
3
4
5
6
7
| ... | 5(1) | ...  |  ...  | 5(2) |  ...  |

After sort

| ... | 5(1)| 5(2) | ... |

If two 5 are still in their relative positions, then this is a stable sort.

Runtime of Quick Sort

Runtime:

Average: $O(nlogn)$
Worst: $O(n^2)$

The worst case is $T(n) = T(n-1) + n$

Pivot Selection

  • Median of Three
1
| i | ... | (i+j)/2 | ... |  j  |

Heap Sort

Types of Heap

  1. Max Heap
  2. Min Heap

Operations (of Heap)

  • Insert(E) $O(logn)$
  • getMax() $O(1)$
  • deleteMax() $O(logn)$

Operations (of Sorted Array)

  • Insert(E) $O(n)$
  • getMax() $O(1)$
  • deleteMax() $O(1)$

Comparison of array and heap

The worst time of heap operation is logn.

However, A sorted array takes On time to insert, which is the bottleneck of the overall time complexity.

Logical View and Physical View

Logical View

1
2
3
4
5
6
7
8
9

7(1)
/ \
3(2) 6(3)
/ \ /
1(4) 2(5) 5(6)

Almost-Full Binary Tree
Parent must be >= both children

Physical View

1
2
3

1 2 3 4 5 6
Array | 7 | 3 | 6 | 1 | 2 | 5 |

helper functions of heap

i is the index

parent(i) = i / 2
leftChild(i) = i 2
rightChild = i
2+1

getMax() O(1)

getMax() - return A[1]

insert() O(logn)

Case 1

1
2
3
4
5
6
7
           7(1)
/ \
3(2) 6(3)
/ \ / \
1(4) 2(5) 5(6) 3(7)

No need to adjust

Case 2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
           7(1)
/ \
3(2) 6(3)
/ \ / \
1(4) 2(5) 5(6) 8(7)

Swap 8 with 6

7(1)
/ \
3(2) 8(3)
/ \ / \
1(4) 2(5) 5(6) 6(7)

Swap 7 with 8

8(1)
/ \
3(2) 7(3)
/ \ / \
1(4) 2(5) 5(6) 6(7)

Complete inserting.

1 2 3 4 5 6 7
Array(after insert) | 8 | 3 | 7 | 1 | 2 | 5 | 6|

Code of insert()

1
2
3
4
5
6
7
8
9
10
11
12
13
void insert(int e) {
// increase the size n of the heap
n += 1;
// insert the item to heap A
A[n] = e;
// Adjust from index i=n
int i = n;
// if cur node has a parent and the parent is smaller
while (parent(i) > 0 && A[parent(i)] < A[i]) {
swap(A[parent(i)], A[i]);
i = parent(i);
}
}

deleteMax()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
           8(1)
/ \
3(2) 7(3)
/ \ / \
1(4) 2(5) 5(6) 6(7)

Swap 6 with 8, and delete 8

6(1)
/ \
3(2) 7(3)
/ \ /
1(4) 2(5) 5(6)

1 2 3 4 5 6
Array(after swap) | 6 | 3 | 7 | 1 | 2 | 5 |

Swap 6 with 7

7(1)
/ \
3(2) 6(3)
/ \ /
1(4) 2(5) 5(6)

1 2 3 4 5 6
Array(after swap) | 7 | 3 | 6 | 1 | 2 | 5 |

Heap is balanced.
deleteMax() completed.

heapify()

1
2
3
4
5
6
7
8
9
10
           6(1)
/ \
3(2) 7(3)
/ \ /
1(4) 2(5) 5(6)

The subTree of root 3 is a MaxHeap
The subTree of root 7 is a MaxHeap
However, the tree of root 6 is not a MaxHeap.
Now we need the heapify().

Code of heapify()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
void heapify(int[] A, i) {
int largest = i;
// Find the largest of cur root(i) and its children
// if left child exists and left child is bigger than cur root
if (leftChild(i) <= n && A[leftChild(i)] > A[largest]) {
largest = leftChild(i);
}
// if left child exists and left child is bigger than cur root
if (rightChild(i) <= n && A[rightChild(i)] > A[largest]) {
largest = rightChild(i);
}

// check largest
if (largest == i) return;

// let the cur largest node be the root
swap(A[i], A[largest]);
// continue heapifying down the swapped node recursively
heapify(A, largest);
}

Code of deleteMax()

1
2
3
4
5
6
7
8
void deleteMax() {
// swap the cur root with the last node
swap(A[1], A[n]);
// delete the last node
n -= 1;
// heapify the whole heap
heapify(A, 1);
}

Heap Sort

Max Heap

1
2
3
4
5
6
7
8
9
          1   2   3   4   5   6  
Array | 7 | 3 | 6 | 1 | 2 | 5 |

Pretend we are deleting the max val.(deleteMax())

1 2 3 4 5 || 6
Array(after deleted) | 5 | 3 | 6 | 1 | 2 || 7(max) |
Now we get the biggest node in the tail of the array.
continue sort the rest.

Code of Heap Sort

1
2
3
4
5
void heapSort(int[] A) {
for(int i = n; i >= 1; i--) {
deleteHeap(A, 1, i);
}
}

makeHeap()

Solution 1 insert n times O(nlogn)

Solution 2

1
2
3
4
5
6
7
8
           6(1)
/ \
3(2) 7(3)
/ \ /
1(4) 2(5) 5(6)

n = 3 is the last non-trivil node
we heapify the 7, 3, 6 nodes, where we ensure that all children are valid heap.

Code of makeHeap()

1
2
3
4
5
// On time 
// n/2 means the last non-trivil node in the heap
for (int i = n/2; i >=1; i--) {
heapify(A, i);
}

Example of heap sort

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
First, make this into a heap.

9(1)
/ \
2(2) 6(3)
/ \ /
1(4) 4(5) 3(6)

1 2 3 4 5 6
Array | 9 | 2 | 6 | 1 | 4 | 3 |

call makeHeap() on 6

call makeHeap() on 2

9(1)
/ \
4(2) 6(3)
/ \ /
1(4) 2(5) 3(6)

1 2 3 4 5 6
Array | 9 | 4 | 6 | 1 | 2 | 3 |

call deleteHeap() on 9

3(1)
/ \
4(2) 6(3)
/ \
1(4) 2(5)

1 2 3 4 5 6
Array | 3 | 4 | 6 | 1 | 2 || 9 |

call heapify() on 3

6(1)
/ \
4(2) 3(3)
/ \
1(4) 2(5)

1 2 3 4 5 6
Array | 6 | 4 | 3 | 1 | 2 || 9 |

call deleteHeap() on 6

2(1)
/ \
4(2) 3(3)
/
1(4)

1 2 3 4 5 6
Array | 2 | 4 | 3 | 1 || 6 | 9 |

call heapify() on 2

4(1)
/ \
2(2) 3(3)
/
1(4)

1 2 3 4 5 6
Array | 4 | 2 | 3 | 1 || 6 | 9 |

call deleteHeap() on 4

1(1)
/ \
2(2) 3(3)
/
4(4)

1 2 3 4 5 6
Array | 1 | 2 | 3 || 4 | 6 | 9 |
...
The array is sorted.

Sorting Algorithms

Types of sorting algorithms

Algorithm Time Complexity Notes
Quick Sort $O(nlogn)$
Merge Sort $O(nlogn)$
Heap Sort $O(nlogn)$
Bubble Sort $O(n^2)$
Insertion SOrt $O(n^2)$
Selection SOrt $O(n^2)$
Counting SOrt $O(n)$ Non-comparison based
Radix SOrt $O(n)$ Non-comparison based

Theorem of comparison based sorting algorithm

Comparison based sorting algorithm always run in $Ω(nlogn)$.

Counting Sort

1
2
3
4
5
6
7
8
9
10
11
12
Assume we have array A
The input nums should satisfy 0 <= A[i] <= k

input A | 2 | 1 | 3 | 5 | 4 | 2 | 1 | 0 | 3 |

0 1 2 3 4 5
bucket B | 1 | 2 | 2 | 2 | 1 | 1 |

0 1 2 3 4 5
bucket B' | 1 | 3 | 5 | 7 | 8 | 9 |

output C | 0 | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 5 |

Code of Counting Sort

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
//Step of counting sort
// fill the num in the buckets
for (int i = 0; i < n; i++) {
B[A[i]] += 1
}

// indicates where the curr digit ends in the ouput array
for (int i = 1; i <= k; i++) {
B[i] += B[i-1];
}

// fill the digit into the array
for (int i = n-1; i >= 0; i--) {
C[B[A[i]]] = A[i];
B[A[i]] -= 1;
}

return C;

problems of Counting Sort

Q: Why 0-based?
A: To represent the case where the input has 0.

Q: what happened if we have negative input?
A: Map the input to the form we want. e.g. [-10, 10] -> [0, 20]

Q: Is this algorithm stable?
A: In this case, yes.

Analysis of Counting Sort

Run time: $O(n+k)$
Space: $O(k)$

Radix Sort

Introduction

Also known as digit sort.

Only for positive base 10 integers.

Example

1
2
3
4
5
6
7
8
9
10
Use the counting sort to sort the colomn

100 100 100 4
23 41 4 13
123 512 512 23
512 23 013 41
4 => 123 => 23 => 100
13 13 123 123
41 4 41 512
999 999 999 999

Analysis

d is the number of digits, k is the range of the inputs(e.g. 32 bit integer)
Run time: $O((n+k)d)$
Space:

e.g.

1
2
3
4
32-bit int
| 8-bit | 8-bit | 8-bit | 8-bit |
d = 4 times of counting sort
k = 2^8 = 256

Class 3 05/20/2019

Questions about Order Statistics

Given a collection of integers, get the min/max value, median, 25% element, rth smallest element, etc.

Question: How do we find the rth smallest element?

Solution1 Sort the array

  1. sort array
  2. return A[k]

The run time should be O(nlgn)

Solution2 Based Partition

Idea

1
2
3
4
5
6
7
8
9
10
11
12
Definition 2

| 1 | ... | start | ... | end | ... | 10 |

A | 1 | 2 | 3 | 6 | 7 | 5 | 4 |

pos = partition(A, 1, 7, 5)
pos = 3

if we get the 3 as the pivot position, which means we still have to find 2 more elements

we continue searching on kth(A, 4, 7, 2)

Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// definition 1
int findKth(int[] A, int k) {
int pos = partition(A, 1, n);
if (pos == k) return A[k];
if (pos > k) return findKth(A, 1, pos);
if (pos < k) return findKth(A, pos+1, n);
}

// definition 2
// global index
// rank of number with A[begin...end]
int findKth(int[]A, int begin, int end, int k) {

}

Analysis

$ T(n) = T(n/2) + n = O(n)$

1
2
3
4
Example 
T(n) = T(2n/3) + n = O(n)
T(n) = T(99n/100) + n = O(n)
T(n) = T(n-1) + n = O(n^2)

Find Top K

Solution 1 Sort the array

  1. Sort the array
  2. return A[1:k]

Time: $ O(nlogn) $

Solution 2 findKth algorithm

  1. partition
  2. return A[1:k]

Time: $ O(n) $

Solution 3 Min Heap

  1. make a min heap
  2. delete k times

Time: $O(n + klogn)$ (depends on which one’s bigger)

Solution 4 Max Heap

  1. m make a max heap A[1:k]
1
2
3
4
5
for (i = k+1, ... , n)
if(A[i] < getMax()) {
deleteMax();
insert(A[i]);
}

Time: $ O(k + (n-k)*logk)$
When n is large, $ O(nlogk) $
When k is large, $ O(k) $

Big Integer Multiplication

for integers of 32-bit(unsigned), we have a range of $ (0, 2^{32}-1) $, which are 10 digits

for integers of 64-bit(unsigned), we have a range of $ (0, 2^{64}-1) $, which are 10 digits

RSA

An encryption algorithm using large integer multiplication.

Idea

Input: 2 integers of n-bits
Example:

1
2
3
    1110 = 14
* 0101 = 5
1000110 = 70

Time : $ O(n^2) $

Divide and Conquer

a can be split to a1 | a2

for example,
a = 1110 = $ 11 * 2^2 + 10 $ (Shift left by 2 bit)

Therefore,
$ a = a_12^{n/2} + a_2 $
$ b = b_1
2^{n/2} + b_2 $
$ a b = a_1b_12^{n} + (a_1b_2+a_2b_1)2^{n/2} + a_2b_2 $

We divide the 1 problem to 4 child problem.

T(n) = 4T(n/2) + n(sum 2 n-bit integers) = O(n^2)

Improve the running time

$ p_1 = a_1b_1$
$ p_2 = a_2b_2$
$ p_3 = (a_1+a_2)(b_1+b_2)$
$ a b = p_12^{n} + (p_3-p_1-p_2)2^{n/2} + p_2 $

We can save the Time to

T(n) = 3T(n/2) + n
$ T(n) = O(n^{log_2{3}}) $

Extra cases

What if 2 integers are not with the same digits?
Pad the 0’s to the nearest exponential of 2.

1
2
3
For example
110 -> 0110
10010 -> 00010010

Square Matrix Multiplication

Idea

input: 2 n*n matrices

$
X = \left[
\begin{matrix}
A & B \
C & D
\end{matrix}
\right]_{n*n}
$

$Y = \left[
\begin{matrix}
E & F \
G & H
\end{matrix}
\right]_{n*n}
$

$ X*Y = O(n^3) $

Here we use the block of matrix to Divide & Conquer

$
X = \left[
\begin{matrix}
1 & 1 \
1 & 0
\end{matrix}
\right]_{n*n} = \left[
\begin{matrix}
A & B \
C & D
\end{matrix}
\right]
$

$Y = \left[
\begin{matrix}
1 & 1 \
1 & 0
\end{matrix}
\right]_{n*n} = \left[
\begin{matrix}
E & F \
G & H
\end{matrix}
\right]
$

$ XY = \left[
\begin{matrix}
{AE+BG} & {AF+BH} \
{CE+DG} & {CF+DH}
\end{matrix}
\right]_{n
n}
$

Therefore, $ T(n) = 8T(\frac{n}{2}) + n^2 = O(n^3)$

Optimization

$ XY = \left[
\begin{matrix}
{AE+BG} & {AF+BH} \
{CE+DG} & {CF+DH}
\end{matrix}
\right]_{n
n} = \left[
\begin{matrix}
{Q} & {R} \
{S} & {T}
\end{matrix}
\right]
$

$Assume:$
$P_1 = A(F-H)$
$P_2 = H(A+B)$
$P_3 = E(C+D)$
$P_4 = D(G-E)$
$P_5 = (A+D)(E+H)$
$P_6 = (B-D)(G+H)$
$P_7 = (A-C)(E+F)$
$Q = P_4 + P_5 - P_2 + P_6$
$R = P_1 + P_2$
$S = P_3 + P_4$
$T = P_1 + P_5 - P_3 - P_7$

$ X*Y = \left[
\begin{matrix}
{AE+BG} & {AF+BH} \
{CE+DG} & {CF+DH}
\end{matrix}
\right] = \left[
\begin{matrix}
{P_4 + P_5 - P_2 + P_6} & {P_1 + P_2} \
{P_3 + P_4} & {P_1 + P_5 - P_3 - P_7}
\end{matrix}
\right]
$

Therefore, $ T(n) = 7T(\frac{n}{2}) + n^2 = O(n^3)$, Successfully optimized.

Counting Inversions

Idea

1
2
3
4
5
6
Inversion: if i < j && A[i] > A[j], then (i, j) is an inversion.

If my favorite list is [1, 2, 3], num of inversion is 0
If list is [2, 1, 3], num of inversion is 1
If list is [2, 3, 1], num of inversion is 2
If list is [3, 2, 1], num of inversion is 3, we have a completely different taste

Max num of inversions of a n list is $ C^{2}_{n} = O(n^2) $

Code

1
2
3
4
5
6
7
8
9
// Brute force
int count = 0;
for (int i = 0; i <=n; i++) {
for(int j = i+1; j <=n; j++) {
if (A[i] > A[j]) {
count++;
}
}
}

Better Implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

1 n/2 n/2+1 n
A | x | y |

num of cross inversions + x + y


begin mid mid+1 end
A | 1 3 5 | 0 2 6 |
p q

given A[p] = 1, A[q] = 0, A[p] > A[q], we have inv += 3, q++
A[p] < A[q], p++
A[q] < A[p], inv += 2, q++
A[p] < A[q], p++
A[p] < A[q], p++, end

Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
int CountInv(int[] A, int begin, int end) {
if (begin >= end) return 0;
int mid = (mid + end) / 2;
int x = CountInv(A, begin, mid);
int y = CountInv(A, mid+1, end);
int z = CrossInv(A, begin, mid, end);
return x + y + z;
}

// o nlogn time
int CrossInv(int[] A, int begin, int mid, int end) {
sort(A, begin, mid);
sort(A, mid+1, end);
int p = begin, q = mid + 1, count = 0;
while (p <= mid && q <= end) {
if (A[p] < A[q]) p++;
else {
count += mid - p+1;
q++;
}
}
return count;
}

Analysis

$ T(n) = 2T(\frac{n}{2}) + nlogn = O(nlognlogn)$

Optimization

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
int CountInv(int[] A, int begin, int end) {
if (begin >= end) return 0;
int mid = (mid + end) / 2;
int x = CountInv(A, begin, mid); // sort the left
int y = CountInv(A, mid+1, end); // sort the right
int z = CrossInv(A, begin, mid, end); // count the cross
//merge left and right
merge(A, begin, mid, end);
return x + y + z;
}

// on time
int CrossInv(int[] A, int begin, int mid, int end) {
int p = begin, q = mid + 1, count = 0;
while (p <= mid && q <= end) {
if (A[p] < A[q]) p++;
else {
count += mid - p+1;
q++;
}
}
return count;
}

$ T(n) = 2T(\frac{n}{2}) + n = O(nlogn)$

Example

1
2
3
4
5
6
7
8
9
        | 7 | 2 | 3 | 4 | 1 | 6 | 5 | 8 |
| 2 | 3 | 4 | 7 | 1 | 6 | 5 | 8 |
/ \
3
| 2 | 7 | 3 | 4 | | 1 | 6 | 5 | 8 |

/ \ / \
1 0
| 7 | 2 | | 3 | 4 | | 1 | 6 | | 5 | 8 |

Class 4 06/03/2019

Binary Search Tree and Hashtable

Difference between two data structures

Hashing BST
O(1) h insert(key,value)
O(1) h lookup(key)
O(1) h delete(key)
amartized running time O(logn) basically
support traversing by ordered keys

Binary Search Tree

Definition

1
2
3
4
5
class Node {
int key;
Node left;
Node right;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
     5
/ \
3 8
\
4
This is a BST

5
/ \
3 8
\
6
This is not a BST
  1. Left subtree is BST
  2. Right subtree is BST
  3. root.key > all keys in left subtree
    and root.key < all keys in right subtree

Tree Traversals

  1. Level-order
  2. Pre-order
  3. Post-order
  4. In-order

Level-order Traversal

1
2
3
4
5
6
7
8
9
10
void LevelOrder(Node root) {
Queue<Node> q = new LinkedList<>();
q.offer(root);
while (!q.isEmpty()) {
Node cur = q.poll();
// processing on cur; print cur.key;
if (cur.left != null) q.offer(cur.left);
if (cur.right != null) q.offer(cur.right);
}
}
Right Side View Question
1
2
3
4
5
6
     5
/ \
3 8
\
4
we get the right side view as [5, 8, 4]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// we can just modify the original code by
cur, curLevel = q.dequeue();
res[cur_level] = cur.key


class Solution {
// level order traversal
// on time on space
public List<Integer> rightSideView(TreeNode root) {
List<Integer> res = new LinkedList<>();
if (root == null) return res;
LinkedList<TreeNode> queue = new LinkedList<>();
queue.offerLast(root);
int size = 1;
while (!queue.isEmpty()) {
int temp=0;
for (int i = 0; i < size; i++) {
TreeNode cur = queue.pollFirst();
if (cur.left != null) queue.offerLast(cur.left);
if (cur.right != null) queue.offerLast(cur.right);
temp = cur.val;
}
res.add(temp);
size = queue.size();
}
return res;
}
}

Pre-order Traversal

1
2
3
4
5
6
void PreOrder(Node cur) {
if (cur == null) return;
Process(cur);
PreOrder(cur.left);
PreOrder(cur.right);
}

In-order Traversal

1
2
3
4
5
6
     5
/ \
3 8
\
4
we get the inorder node list as [3, 4, 5, 8], which is sorted.

Represent duplicate keys in BST

Requirement

Has to support root >= all left and root < all right.

  1. MultiSet
  2. MultiMap
1
2
3
4
5
6
7
     5
/ \
5 8
/
3
\
4

Implementation

Lookup

1
2
3
4
5
6
boolean Lookup(Node cur, int key) {
if (cur == null) return false;
if (cur.key == key) return true;
if (cur.key < key) return Lookup(cur.right, key);
return Lookup(cur.left, key);
}

Insert

1
2
3
4
5
6
7
8
9
10
Node insert(Node cur, int key, String val) {
if (cur == null) return new Node(key, val);
if (key == cur.key) cur.val = val;
else if (key < cur.key) {
cur.left = insert(cur.left, key, val);
} else {
cur.right = insert(cur.right, key, val);
}
return cur;
}

Delete

1
2
3
4
5
6
7
8
9
10
11
12
13
          5
/ \
3 6
/ \
2 4

Delete 5(2 children)

4
/ \
3 6
/
2

Case 1: No children
update parent’s reference

Case 2: Single child
Parent pointing to child

Case 2: two children

  1. find predecessor p
  2. swap
  3. delete recursively
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
# Definition for a binary tree node.
# class TreeNode:
# def __init__(self, x):
# self.val = x
# self.left = None
# self.right = None

class Solution:
# O logn time O logn space
# Recursion
# recursively find the root to delete
# 3 edge cases
# if no children, then simply return None
# if 1 left child, then find the predecessor, swap, and delete the predecessor
# if 1 right child, then find the successor, swap, and delete the successor
# in common case, a node has both children, in this case we can find the predecessor as newNode or successor as new Node
def deleteNode(self, root: TreeNode, key: int) -> TreeNode:
if not root:
return None
if root.val > key:
root.left = self.deleteNode(root.left, key)
elif root.val < key:
root.right = self.deleteNode(root.right, key)
else:
if not root.left and not root.right:
root = None
elif root.left:
root.val = self.getPrev(root)
root.left = self.deleteNode(root.left, root.val)
else:
root.val = self.getNext(root)
root.right = self.deleteNode(root.right, root.val)
return root
def getPrev(self, root):
temp = root.left
while temp.right:
temp = temp.right
return temp.val
def getNext(self, root):
temp = root.right
while temp.left:
temp = temp.left
return temp.val

Class 5 06/10/2019

Recap of BST

Predecessor and Successor

Logical view

1
2
3
4
5
6
7
8
9
          5
/ \
3 8
/ \ / \
2 4 6 10
/ \
1 7
pre(8) = 7
pre(6) = 5

Predecessor

  1. case1: rightmost node of the left subtree
  2. case2: find smaller node on the ancestor path

Re-Definition of node

1
2
3
4
5
6
class Node {
int key;
Node left;
Node right;
Node parent;
}

Find Kth key in BST

Idea of In-order traversal

We can find the kth key by stopping at kth step during in-order traversal.

Our goal is to control the time in O(h).

Re-definition of Node

1
2
3
4
5
6
class Node {
int key;
Node left;
Node right;
Node parent;
}

Logical View

1
2
3
4
5
      5(5)
/ \
3(3) 10(1)
/ \
2(1) 4(1)

Code

1
2
3
4
5
6
int kth(Node root, int k) {
rank = root.left.size + 1;
if (rank == k) return root;
if (rank < k) return kth(root.right, k - rank);
return kth(root.left, k);
}

Example

1
2
3
4
5
6
7
8
9
          5(5)
/ \
3(3) 10(1)
/ \
2(1) 4(1)

to find kth(4), rank = 4 = k, we just return 5

to find kth(3), rank = 2 < k, we call the func on 4(1), rank = 1 = k, return 4

Hashing/ HashTable

Pre-hash and Hash

Difference

Pre-Hash Hash
int/float/string/user-define -> int arbitrary large int -> reasonable int
Can only return small integers

Example

HashTable<String , T> means we have String as key, T as value.

1
2
3
4
class Pair {
int x;
int y;
}

HashTable<Pair, T> means we represent a coordinate as key.

In java, we overwrite the HashCode() as the pre-hash function.

For a String , a practical pre-hash function would be calculating the ascii code.

ASCII is a 8 bit representation of a char data type.

e.g.

For “hello” and “hell”, we have

1
2
Hashcode("hello") = 'h' * 41^3 + 'e' * 41^2
+ 'l' * 41^1 + 'l'

p.s. the hashcode of single letter depends on the encoding way. If simple english, we just use its ascii. If unicode, we use another way.

e.g. To represent Pair, we can use x^y as its hash function.

SubSet : We can split the string and calculate each part of it, while we have a bigger chance to collision and worse time consumption.
Caching : For extra long String, we can only call the hash function once, and store the pre-hash value in the memory.

HashTable

Idea

key = Integer[in large space]
Suppose key ranges from (0, m-1), then we have to map the integer from 0 to m-1
insert(key)
hash: int -> int [0, m)

How do we choose m?

m changes dynamically

Load Factor $ \alpha = n/m $
n = number of k-v pairs
m = size of current hashtable
Here, we increase m exponentially.
e.g. m = 1024, next m would be 2048

Question: Why don’t we increase m by 100 each time?
Answer: Suppose we have $ \alpha = 1$, then every time we hit the bottle neck, we have to find a continuous memory to store all the stuff, which is a expensive On operation.

Question: what happens if we do insert n times with m increasing by 100?
Answer:

1
2
3
4
5
6
7
8
9
10
11
12
13
C, C, C, ..., C, 100C
C, C, ..., 200C
C, C, ..., 300C
C, C, ..., n/100C

Overall: 100 + 200 + 300 + .. n/100 ~= O(n^2) time, which is super expensive.

C, C, C, ..., C, n/lognC
C, C, ..., n/8C
C, C, ..., n/4C
C, C, ..., n/2C

Overall: n/logn + n/8 + n/4 + .. n/2 ~= O(n) time, which is much better.

Question: what happens if we have to adjust the size?
Answer: Suppose $ \alpha < 0.2$, we would like to shrink the size by half. If $ \alpha > 0.8$, we would like to double the size. The threshold shold be decided wisely.

what should be the hash function?

hash function: (large space)int -> int [0, m)

Division Method

$ H(x) = x \% m $

If m is a power of 2, then this function is bad. e.g. If m = 1024, if pre-hash only consists of even numbers, then half of the space is wasted. We should pick prime number as m. (disadvantage)

Multiplication Method

$ H(x) = a * x \% 2^{32} >> (32 - r) $
$ m = 2^r $
a is constant chosen randomly.

Explanation: Since m = 2^r, we only need r bit to represent the hashcode. Suppose x is 32 bit, and a is 32 bit, then a*x is 64 bit. Then we mod it by 2^32, where we throw away half of the number. Next we right shift till we get only r digits, so that the hashtable con contain the num.

1
2
3
x = | 1 | 1 | 0 | ... | 1 | 0 | 1| (32)
a = | 0 | 0 | 1 | ... | 1 | 0 | 1| (32)
a is random and every 1 means we duplicate a x and left shift by x k digits. In this way the middle part is more random than right part, because e use more information of thee original x in the middle(than the edge).

Collision

Birthday Paradox

Suppose we randomly select x people, what’s the probability that two people have same birthday?

To get a 50% of prob, we should select 23 people.

This example showws the impact of collision.

There are 2 ways to deal with collision, Chaining, Probing.

Chaining

$ h(x_{1}) = k $
$ h(x_{2}) = k $
Then we have key k -> $x_1$ => $x_2$

run time O(1 + $ \alpha $)
$ \alpha $ is load factor n/m
worst case insert: len n/m logn

Probing

Here we are talking about Linear Probing.

$ h(x_{1}) = k $
$ h(x_{2}) = k $
Then we have key k ($x_1$) -> n slots taken -> (k+n) $x_2$

Delete: When wew delete an element, all the following elements have to be re-hashed.

Suppose $ \alpha = n/m = 0.1$
90 % of time is empty
Therefore we need $1 / (1-\alpha)$ space to get a slot.

time complexity: O n

Class 6 06/17/2019

Dynamic Programming

Steps of DP problem

  1. Situations of DP

    • Optimization : max/min problem
    • Counting: binomial
    • Feasibility: if the target is possible to get
  2. Design recursion

  3. Naive recursive implementation yields bad running time(Usually exponential)

    • e.g. Recompute the same thing. Fibonacci.
  4. Caching

    • bottom -> up( small -> large)
    • top -> down with memoization

Longest Common Subsequences

Description

1
2
3
4
5
6
input: A[1 ... n]
B[1 ... n]
e.g.
A = [1, 5, 2, 3, 4]
sub can be [1] [1, 3] [1, 2, 4]
not [3, 2, 4]

Solution

1
2
3
4
5
6
7
8
9
10
11
12
define f(i, j) = length of the LCS given A[1 ... i] & B[1 ... j]

A | 1 | ... | i |
B | 1 | ... | j |

case 1: A[i] = B[j]
f(i, j) = f(i-1, j-1) + 1

case 2: A[i] != B[j]
f(i, j) = max(f(i-1, j), f(i, j-1))

base case: f(0, j) = f(i, 0) = 0

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
// O 2^n time O m+n space
int LCS_slow(int[] A, int[] B, int i, int j) {
if (i == 0 || j == 0) return 0;
if (A[i] == B[j]) return 1 + LCS_slow(A, B, i-1, j-1);
return Math.max(LCS_slow(A, B, i, j-1), LCS_slow(A, B, i-1, j));
}

// 2-d array dp(bottom up)
// O mn time O mn space
int LCS(int[] A, int[] B, int i, int j) {
for (int i = 0 ... n) dp[i, 0] = 0;
for (int j = 0 ... m) dp[0, j] = 0;
for (int i = 1 ... n) {
for (int j = 1 ... m) {
if (A[i] == B[j]) dp[i][j] = dp[i-1][j-1] + 1;
else dp[i][j] = max(dp[i-1, j], dp[i, j-1]);
}
}
return dp[n, m]
}

// optimized recursive solution (top down)
int LCS(int[] A, int[] B, int i, int j, int[][] dp) {
if (i == 0 || j == 0) return 0;
if (dp[i][j] != -1) return dp[i][j];
if (A[i] == B[j]) {
dp[i][j] = LCS(A, B, i-1, j-1, dp) + 1;
} else {
dp[i][j] = Math.max(LCS(A, B, i-1, j, dp), LCS(A, B, i, j-1, dp)); }

return dp[i][j];
}

// get the result set out of 2-d ap array
// O m+n time
int constructSolution(int[] A, int[] B, int[][] dp) {
List<Integer> l = new ArrayList<>();
int i = A.length;
int j = B.length;
while (i > 0 && j > 0) {
if (A[i] == B[j]) {
l.add(A[i]);
i--;
j--;
} else if (dp[i-1][j] < dp[i][j-1]) {
j--;
} else i--;
}
return l.reverse();
}

// optimized space solution
// O min(m, n) space O mn time
// dp row = 2, col = n
int LCS(int[] A, int[] B, int i, int j) {
for (int i = 1 ... n) {
for (int j = 1 ... m) {
if (A[i] == B[j]) dp[i%2][j] = dp[(i-1)%2][j-1] + 1;
else dp[i%2][j] = max(dp[(i-1)%2, j], dp[i%2, j-1]);
}
}
return dp[n, m]
}

Code(Bottom Up)

1
2
3
4
5
6
7
8
9
10
11
def func(A, B):
m = len(A)
n = len(B)
dp = [[0 for _ in range(n+1)] for _ in range(m+1)]
for i in range(1, m+1):
for j in range(1, n+1):
if A[i-1] == B[j-1]:
dp[i][j] = dp[i-1][j-1]+1
else:
dp[i][j] = max(dp[i][j-1], dp[i-1][j])
return dp[m][n]
1
2
3
4
5
6
7
8
9
10
11
12
int lcs(int[] A, int[] B) {
int m = A.length;
int n = B.length;
int[][] dp = new int[m+1][n+1];
for (int i = 1; i < m+1; i++) {
for (int j = 1; j < n+1; j++) {
if (A[i-1] == B[j-1]) dp[i][j] = dp[i-1][j-1] + 1;
else dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
return dp[m][n];
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
List<Integer> constructSolution(int[] A, int[] B, int[][] dp) {
List<Integer> l = new ArrayList<>();
int i = A.length;
int j = B.length;
while (i > 0 && j > 0) {
if (A[i] == B[j]) {
l.add(A[i]);
i--;
j--;
} else if (dp[i-1][j] < dp[i][j-1]) {
j--;
} else i--;
}
Collections.reverse(l)
return l;
}

0-1 Knapsack Problem

Description

1
2
3
4
5
6
7
8
9
10
Assume we have n items
value = [...]
weight = [...]
capcity = C
We need to get the max value.
f(i, j) = item[0-i], capacity j
f(i, j) = max(v[i] + f(i-1, j-weight[i]), f(i-1, j))

base case
f(0, j) = f(i, 0) = 0

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
// O N*C time O n*c space -> O C space(optimized)
int knapsack(int[] values, int[] weights, int C) {
for (int i = 0; i < values.length; i++) {
for (int j = 0; j <= C; j++) {
if (weights[i] <= j) {
dp[i][j] = Math.max(dp[i-1][j], values[i] + dp[i-1][j-weights[i]];)
} else {
dp[i][j] = dp[i-1][j];
}
}
}
return dp[n][j];
}

Class 7 06/24/2019

Unbounded Knapsack

Description

Input: n types of items, for each type there are inf copies. For each item of type i, we have values[i] and weights[i]. We also have capacity of knapsack.

Naive approach

f(i) maximum value we can get
f(i) = max value given capacity i

1
2
3
4
5
6
7
8
9
a1 = V[1] + f(i-w[1])
a2 = V[2] + f(i-w[2])
...
a3 = V[3] + f(i-w[3])
f(i) = max(v[1], v[2], ... v[n])

base case:
f(0) = 0
T(n) -> n sub problems each time -> O(n^C)

Pseudocode

1
2
3
4
5
6
7
8
9
10
// O Cn time O C space
int knapsack(int[] v, int[] w, int C) {
int[] dp = new int[C+1];
for (int i = 1; i <= C; i++) {
for (int j = 0; j < v.length; j++) {
if (w[j] <= i) dp[i] = Math.max(dp[i], v[j] + dp[i-w[j]]);
}
}
return dp[C];
}

Coin Change

Description

1
2
3
Assume we have coins like 
d1 = 1, d2 = 5, d3 = 10, d4 = 25
If we have to get total = 3.37$, what's the minimum number of coins do we have to pick?

Naive approach

1
2
3
4
5
6
f(i) = min(
f(i - d[k]) + 1,
f(i - d[k-1]) + 1,
...
f(i - d[1]) + 1,
)

PseudoCode

1
2
3
4
5
6
7
8
9
10
11
12
int coinChange(int[] d, int n) {
int[] dp = new int[n+1];
for (int i = 1; i <= n; i++) {
dp[i] = i;
for (int j = 0; j < d.length; j++) {
if (d[i] <= i) {
dp[i] = Math.min(dp[i], dp[i-d[k]] + 1);
}
}
}
return dp[n];
}

Knapsack with multiple constraints

Description

1
2
3
4
5
6
Now the items have a volume constraint
n items v[i], w[i], u[i]
W - weight capacity
U - volume capacity
Only 1 copy of each item
What's the maximum value we can get?

Recursion

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
i = # of items of first i items
j = remaining weight capacity
k remaining volume capacity
f(i, j, k) = max value with constraints
f(i, j, k) = max(
f(i-1, j, k),
f(i-1, j-w[i], k-u[i]) + v[i]
)

base case
f(0, j, k) = f(i, 0, k) = f(i, j, 0) = 0

_________
/ /|
/_______/ |
| | |
| | |
| | /
|________|/

3 surfaces are the base case

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// O(nWU) time O(nWU) space -> O(WU) space
int knapsack(int n, int[] v, int[] w, int[] u, int W, int U) {
int[][][] dp = int[n+1][W+1][U+1];
for (int i = 1; i <= N; i++) {
for (int j = 1; j <= W; j++) {
for (int k = 1; k <= U; k++) {
dp[i][j][k] = dp[i-1][j][k];
if (w[i] <= j && u[i] <= k) {
dp[i][k][k] = Math.max(dp[i][j][k], v[i] + dp[i-1][j-w[i]][k-u[i]]);
}
}
}
}
return dp[n][W][U];
}

Palindrome

Description

1
2
3
4
ana
aaaa
banana -> b + anana
find the minimum palindromic subsequences for the word

Recursion

1
2
3
4
5
6
7
8
9
10
11
12
f(i) := min # of palindromes given A[1....i]
f("banana") = min(
f("banan") + 1,
f("b") + 1,
f("ban") + 1
)
f(i) = min(
f(i-1) + 1, if A[i] is palindrome
f(i-2) + 1, if A[i-1:i] is palindrome
f(i-3) + 1, ...
f(0) + 1, if A is a palindrome
)

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// O n^3 time O n space -> O n^3 time to O n^2 time optimizing isPalindrome, O n space to O n^2
int MinPalindrome(int[] A) {
int[] dp = new int[A.length+1];
for (int i = 1; i <= A.length; i++) {
dp[i] = i;
for (int j = 1; j < i; j++) {
if (isPalindrome(A.substring(j-1, i))) {
dp[i] = Math.min(dp[j-1]+1, dp[i]);
}
}
}
return dp[A.length];
}

/**
base case g(i, i) = true
g(i, i+1) = A[i] == A[i+1]
g(i, j) = A[i] == A[j] && g(i+1,j-1)
*/

boolean isPalindrome(int[] A, int i, int j) {
boolean[][] p = new int[A.length][A.length];
for (int i = 0; i < A.length; i++) p[i][i] = true;
for (int i = 0; i < A.length-1; i++) p[i][i+1] = A[i] == A[i+1];
for (int col = 2; col < A.length; col++) {
for (int row = 0; row <= col-2; row++) {
p[row][col] = A[row] == A[col] && p[row+1][col-1];
}
}
return p[i][j]
}

Class 8 07/08/2019

Matrix Multiplication

Description

1
2
3
4
5
6
7
8
9
10
A1 * A2 * ... * An


for A * B, the time would be O(pqr)

For A1 * A2 * A3, Assume A1(2 * 100), A2(100 * 5), A3(5 * 3)
We can do (A1 * A2) * A3, overall time is 2 * 100 * 5 + 2 * 5 * 3 = 103
Or we can do A1 * (A2 * A3), overall time is 100 * 5 * 3 + 2 * 100 * 3 = 2100

What's the minimum cost after re-arranging the order of calculation?

Solution

Explanation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
f(i, j) = min cost to multiply Ai * ..  * Aj

Assume i < k < j,

f(i, j) = f(i, k) + f(k+1, j) + m0 * nk * mj

For A1 * A2 * A3 * A4,

Option1: A1 * (A2*A3*A4) = f(2, 4) + p0p1p4

Option2: (A1*A2) * (A3*A4) = f(1, 2) + f(3, 4) + p0*p2*p4

Option2: (A1*A2*A3) * A4 = f(1, 3) + p0 * p3 * p4

Thus, f(i, j) = min(option1, option2, option3)

f(i, j) = min(f(i, k) + f(k+1, j) + pi-1*pk*pj, i <= k < j)

We can use dp[i][j] to store all the results, and return dp[0][n]

base case dp[i][i]

0 1 2 3 4
0 0
1 0 y x
2 0 v
3 0
4 0

x = 0 + y + v + 0
We search the left and the bottom of dp[i][j]

So we update dp[i][j] from left to right, from bottom to up

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// O n^3 time O n^2 space
int MatrixMultiplication(int[] p) {
int n = p.length;
int[][] dp = new int[n+1][n+1];
for (int j = 2; j <= n; j++) {
for (int i = j-1; i >= 1; i--) {
dp[i][j] = Integer.MAX_VALUE;
for (int k = i; k <= j-1; k++) {
dp[i][j] = Math.min(dp[i][j], dp[i][k] + dp[k+1][j] + p[i-1]*p[k]*p[j] )
}
}
}
return dp[1][n];
}

Sell products problem

Description

Assume we have a street of n houses, we have 3 colors of them, red, green and yellow. The profit of coloring them should be R[], G[], Y[]. We want to have maximum profit while coloring all n houses. No adjacent houses can be painted the same color.

Solution

Explanation

We assume g(i) = best way to color 1 … i, with ith house colored as green
r(i) = paint first ith house and last house is red
y(i) = same as above

g(i) = G[i] + max(r(i-1), y(i-1))
r(i) = R[i] + max(g(i-1), y(i-1))
y(i) = Y[i] + max(r(i-1), g(i-1))
r[1] = R[1]
g[1] = G[1]
y[1] = Y[1]

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
int maxProfit(int[] G, int[] R, int[] Y) {
int n = G.length;
int[] r = new int[n];
int[] g = new int[n];
int[] y = new int[n];
r[0] = R[0];
g[0] = G[0];
y[0] = Y[0];
for (int i = 1; i < n; i++) {
r[i] = R[i] + Math.max(g[i-1], y[i-1]);
g[i] = G[i] + Math.max(r[i-1], y[i-1]);
y[i] = Y[i] + Math.max(g[i-1], r[i-1]);
}
return Math.max(r[n-1], g[n-1], y[n-1]
);
}

Class 9 07/10/2019

Matrix problem

Description

Given n * n grid filled with positive integers, we want to find longest increasing path.

1
2
3
3 2 1
7 4 2
5 3 6

Solution 1((Bottom up))

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
3 5 1
2 1 6
2 4 7

The intuition would be:
f(i, j) = max(
f(i-1, j) + 1 if A[i, j] > A[i-1, j],
f(i+1, j) + 1 if A[i, j] > A[i+1, j],
f(i, j-1) + 1 if A[i, j] > A[i, j-1],
f(i, j+1) + 1 if A[i, j] > A[i, j+1],
1
)

Sort the nums and its coordination.
1,1,2,2,3,4,5,6,7
(1,3), (2,2), (2,1)...

for dp matrix

_ _ 1
_ 1 _ 1
_ _ _

_ _ 1
2 1 _ 2
_ _ _

_ _ 1
2 1 _ 2
2 _ _

3 _ 1
2 1 _ 3
2 _ _

3 _ 1
2 1 _ 4
2 4 _

...

Time Complexity of bottom up approach is
O(n^2logn + n^2) = O(n^2logn)

Solution 2(Top down)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Top down approach, O(mn) time
int longestIncPath(int[][] A) {
int m = A.length, n = A[0].length;
int res = 0;
int[][] dp = new int[m][n];
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
compute(A, dp, i, j);
res = Math.max(res, dp[i][j]);
}
}
return res;
}

void compute(int[][] A, int[][] dp, int i, int j) {
int m = A.length, n = A[0].length;
if (dp[i][j] != -1) return dp[i][[j]];
dp[i][j] = 1;
if (i >= 1 and A[i][j] > A[i-1][j]) {
dp[i][j] = Math.max(dp[i][j], compute(A, dp, i-1, j)+1);
}
if (i < m - 1 and A[i][j] > A[i+1][j]) {
dp[i][j] = Math.max(dp[i][j], compute(A, dp, i+1, j)+1);
}
if (j >= 1 and A[i][j] > A[i][j-1]) {
dp[i][j] = Math.max(dp[i][j], compute(A, dp, i, j-1)+1);
}
if (i < n-1 and A[i][j] > A[i][j+1]) {
dp[i][j] = Math.max(dp[i][j], compute(A, dp, i, j+1)+1);
}
return dp[i][j];
}

Solution 3 (Graph & Toposort)

We can do a topological sort on this graph, in which small vertices point to bigger vertices. Then we start doing topo sort with indegree[0] nodes and search longest path. The maximum level of searching is the longest length of this increasing path.

1
2
3
4
5
6
1 -> 3 -> 2
| | |
7 <- 5 -> 8
| | |
2 <- 1 -> 4
Directed Graph

Longest Increasing Subsequence

Description

Given an array 1, 2, 4, 3, find the longest increasing subsequence, 1, 2, 3

Solution

Explanation

1
2
3
4
5
6
7
8
f(i) = length of LIS given A[1:i]
dp[i] = max(
1,
f(1) + 1, if A[1] < A[i],
f(2) + 1, if A[2] < A[i],
...
f(i-1) + 1, if A[i-1] < A[i],
)

Pseudocode

1
2
3
4
5
6
7
8
9
def func(A):
n = len(A)
dp = [0 for _ in range(n+1)]
for i in range(1, n+1):
dp[i] = 1
for j in range(1, i):
if A[i-1] > A[j-1]:
dp[i] = max(dp[i], dp[j] + 1)
return dp[n]

Edit Distance

Description

Given str1 and str2, we can do insert, delete and replace, find the shortest operations we need to turn str1 to str2

1
2
3
"abc" -> "abch"
"abc" -> "ab"
"abc" -> "abh"

Solution

Explanation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
f(i, j) = edit distance if given A[1:i] and B[1:j]


123456
A apple
B banana
for apple -> banan, then we just need to insert an "a" to A, to get B. Thus the result is f(5, 6) = f(5, 5) + 1.
For Insert,
f(i, j) = f(i, j-1) + 1

for appl -> banana, then it's the same as apple -> bananae, thus we need to remove the extra e.

Thus for removal, =
f(i, j) = f(i-1, j) + 1

for appl -> banan, we just replace e to a, will solve the problem.
For replace, f(i, j) = f(i-1, j-1) + 1


f(i, j) = max(
f(i-1, j) + 1, remove
f(i, j-1) + 1, insert
f(i-1, j-1) + 1, replace if A[i] != A[j],
f(i-1, j-1), doing nothing if A[i] == A[j]
)

Base case: f(0, j) = j, f(i, 0) = i

dp 0 1 2 3 ... n
0 0 1 2 3 ... n
1 1
2 2
...
m m

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
int editDistance(String A, String B) {
int m = A.length(), n = B.length();
int[][] dp = new int[m+1][n+1];
for (int j = 0; j <= n; j++) dp[0][j] = j;
for (int i = 0; i <= m; i++) dp[i][0] = i;
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
dp[i][j] = Math.min(dp[i-1][j], dp[i][j-1]);
if (A.charAt(i-1) == B.charAt(j-1)) dp[i][j] = Math.min(dp[i][j], dp[i-1][j-1]);
else dp[i][j] = Math.min(dp[i][j], dp[i-1][j-1]+1);
}
}
return dp[m][n]
}

Class 10 07/15/2019

Graph Introduction

  1. Definition: G = (V, E)

  2. Type: {directed graph, undirected graph}

  3. Notation: |V| = n, |E| = m

  4. DataStructure: {Adjacent List, Adjacent Matrix}

  5. Storage: O(m+n) for Adj List, O(n^2) for Adj Matrix.

  6. Time complexity

Operation Adjacency List Adjacency Matrix
Existence of Edge O(d) O(1)
Traverse the node O(d) O(n)

BFS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
BFS(G, S) {
q.enqueue(S);
visited[S] = T;
while(!q.isEmpty()) {
u = q.dequeue();
// processing on u
for (v in G.neighbors(u)) {
if (!visited[v]) {
q.enqueue(v);
visited[v] = true;
}
}
}
}

// Track the distance from S to nodes
BFS2(G, S) {
q.enqueue(S);
visited[S] = T;
dist[S] = 0;
parent[S] = -1;
while(!q.isEmpty()) {
u = q.dequeue();
// processing on u
for (v in G.neighbors(u)) {
if (!visited[v]) {
q.enqueue(v);
dist[v] = dist[u] + 1;
parent[v] = u;
visited[v] = true;
}
}
}
}


// Traverse all nodes
BFS(G, S, visited){}
BFS_ALL(G) {
visited[1...n] = false;
for (s = 1, 2, ... n) {
if (visited[S] == false) {
BFS(G, S, visited);
}
}
}

Analysis

Time: O(m+n) if we take initialization into account.
We would like to use Adjacency List here.

Word Ladder

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
BFS(graph[], start, end, dic) {
q.enqueue(start);
visited.insert(start);
dist[start] = 0;
while(!q.isEmpty()) {
u = q.dequeue();
if (u == end) return dist[u];
for (v in graph[u]) {
if (!visited.contains(v) && dic.lookup(v)) {
visited.insert(v);
q.enqueue(v);
dist[v] = dist[u] + 1;
}
}
}
return -1;
}

puzzle - 8

Description

1
2
3
4
5
6
7
8
9
10
11
12
13
14
3 _ 5
6 1 4 A
2 7 8

1 2 3
4 5 6 B
7 8 _
After a sequence of moves, We have to turn A to B

It's a graph problem (undirected)

3 _ 5 _ 3 5 3 5 _ 3 1 5
6 1 4 -> 6 1 4 or 6 1 4 or 6 _ 4
2 7 8 2 7 8 2 7 8 2 7 8

Explanation

1
2
3
4
5
6
As you can see, each state is a node.\

1. We can encode each state as a string,
f(A) = "3X5614278"

2. We can use a 3X3 matrix, but need to implement a hash function. e.g. hashcode = 3 * 41^8 + 5 * 41^6 + ...

Pseudocode

1
2
3
4
5
// helper function
getNumber(String) -> List(String)
Puzzle(graph[Vertex -> List(Vertex)], Vertex start, Vertex end) {
// Same BFS
}

Connected Components(Undirected Graph)

Description

1
2
3
4
5
6
7
8
9
10
11
12
1 - 3
\ /
2

4 - 6
\
5

7

Question 1: number of connected components in graph
Question 2: given an arbitrary node, find out which components it belongs to.

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
/**
Same as BFS_ALL
Use a CC array to store the components information with a pointer.
**/
BFS(G) {
visited[1...n] = false;
for (S = 1...n) {
if (!visited[s]) {
BFS(G, S, visited, cc, cur_cc);
cur_cc++;
}
}
}

BFS(G, S, visited, cc, cur) {
q.enqueue(s);
visited[s] = true;
cc[s] = cur;
while (!q.isEmpty()) {
u = q.dequeue();
for (v in G.neighbor(u)) {
if (!visited[v]) {
visited[v] = true;
q.enqueue(v);
cc[v] = cur;
}
}
}
}

Counting islands

Description

1
2
3
4
5
1 1 0 0 
1 0 1 1
0 0 1 1
1 1 0 0
How many islands do we have?

DFS

1
2
3
4
5
6
7
8
9
10
DFS(G, u, visited[]) {
visited[u] = true;
// processing with u;
for (v in G.neighbors(u)) {
if (!visited[v]) {
DFS(G, v, visited);

}
}
}

Connected Components

1
2
3
4
5
6
7
8
9
10
11
DFS_ALL(G) {
visited[1 ... n] = false;
cc[1 ... n] = 0;
cur = 1;
for (S = 1 ... n) {
if (!visited[s]) {
DFS(G, v, visited, cc, cur);
cur++;
}
}
}

Cycle Detection

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// O 2^m time 
// backtrack
// we visite every possible path starting from the source node.
DFS(G, u) {
visited[u] = true;
for (v in G.neighbors(u)) {
if (!visited[v]) {
DFS(G, v, visited);
} else {
return false;
}
}
visited[u] = false;
return true;
}


//better solution

/**
white: new vertex
grey: on recursion call stack
black: visited but not on call stack
**/

boolean DFS(G, u, color[]) {
color[u] = grey;
for (v in G.neighbors(u)) {
if (color[v] == white) {
DFS(G, v, color[]);
} else if(color[v] == grey) {
return false;
}
}
color[u] = black;
return true
}

Class 11 07/22/2019

Review

Time Complexity of some algorithms

Algorithm Time Complexity
BFS O(m+n)
Connected Components O(m+n)
DFS O(m+n)
Cycle Detection O(m+n)

DAG

  1. DAG = Directed Acyclic Graph
  2. Topological Sort

Topological Sort

DFS Solution

Idea

1
2
3
4
5
6
7
8
9
G = [[1, 3], [2, 5], [2, 6], [4, 2], [4, 3], [3, 6]]

After DFS, we have
1 2 3 4 5 6
timer = [4, 3, 1, 2, 6, 5]

After reverse,
we have timer = [5, 6, 2, 1, 3, 4]
which is a valid Topological sort.

Step

  1. Use DFS_ALL() find finish time.

  2. Sort vertices by finish time in decreasing order.

  3. Doesn’t work for cyclic graph. Will stuck in the recursion forever.

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// We mark each node in the order we visit them
// if we sort the vertices by decreasing order of finishing time, then I have a valid topological sort
DFS_ALL(G) {
visited[1...n];
finish[1...n];
timer = 1;
for (u = 1 ... n) {
if (visited[u] == false) {
DFS(G, u, visited, finish, timer);
}
}

}

DFS(G, u, visited, finish, timer) {
visited[u] = true;
for (v in G.neighbor(u)) {
if (visited[v] == false) {
DFS(G, v, visited, finish, timer);
}
}
finish[u] = timer;
timer += 1;
}

// updated version using array as finish time recorder
DFS(G, u, visited, finish) {
visited[u] = true;
for (v in G.neighbor(u)) {
if (visited[v] == false) {
DFS(G, v, visited, finish);
}
}
finish.append(u);
}

Non DFS Solution

Idea

Record the indegree of the vertices and keep searching and updating the indegree map.

The running time is O(m+n)

Pseudocode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
TopoSort(G) {
for (u = 1 ... n) {
for (v in G[u]){
in[v]++;
}
}
for (u = 1... n) {
if (in[u] == 0) q.enqueue(u);
}
while (!q.isEmpty()) {
u = q.dequeue();
for (v in G[u]) {
in[v]--;
if (in[v] == 0) {
q.enqueue(v);
}
}
}
}

Boggle

Description

1
2
3
4
5
6
7
8
9
10
11
Given a 4X4 board

HEAT
ELOI
UYXB
PPAS

Goal:
- find all words
- can start at any position
- cannot reuse same pos in the word, but we can reuse them in different words

PseudoCode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// 4^16 Time Complexity. Terrible.
Boggle(B[]) {
for (int i = 1 ... 4) {
for (int j = 1 ... 4) {
DFS(B[], visited[], i, j, "", solution);
}
}
return solution;
}

DFS(B[], visited[], i, j, String cur, solution) {
visited[i, j] = true;
potential_word = cur + B[i, j];
if (dict.contains(potential_word)) {
solution.add(potential_word);
}
if (i > 1 && !visited[i-1, j]) {
DFS(B[], visited, i-1, j, potential_word, solution);
}
if (i < 4 && !visited[i+1, j]) {
DFS(B[], visited, i+1, j, potential_word, solution);
}
if (j > 1 && !visited[i, j-1]) {
DFS();
}
if (j < 4 && !visited[i, j-1]) {
DFS();
}
visited[i, j] = false;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
def getWords(root):
temp = root
import collections
q = collections.deque()
q.append(temp)
res = []
while q:
cur = q.popleft()
if cur.isWord:
res.append(cur)
for i in range(26):
if cur.children[i]:
q.append(cur.children[i])
return res

class Trie:
def __init__(self):
self.children = [None for _ in range(26)]
self.word = ""
self.isWord = False

def boggle(board, dic):
# Build Trie
root = Trie()
for word in dic:
buildTrie(root, word)
m, n = len(board), len(board[0])
res = []
# DFS
for i in range(m):
for j in range(n):
dfs(i, j, board, root, res, set())
return res

def buildTrie(root, word):
temp = root
# Add all letters to the Trie as nodes
for letter in word:
ind = ord(letter) - ord("a")
if not temp.children[ind]:
temp.children[ind] = Trie()
# Update the child and temp
temp.children[ind].word = temp.word + letter
temp = temp.children[ind]
# Last letter node is a valid word
temp.isWord = True

def dfs(i, j, board, root, res, visited):
# Check if cur Trie Node exists and has been visited or not
ind = ord(board[i][j]) - ord("a")
if not root.children[ind] or (i, j) in visited:
return
# update cur Trie Node
root = root.children[ind]
# Memoization
visited.add((i, j))
# Check if we find a result
if root.isWord:
res.append(root.word)
# DFS
m, n = len(board), len(board[0])
for dx, dy in [[0, 1], [0, -1], [1, 0], [-1, 0]]:
tempX, tempY = i + dx, j + dy
if 0 <= tempX < m and 0 <= tempY < n:
dfs(tempX, tempY, board, root, res, visited)
# Backtrack
visited.remove((i, j))


board = [
["a", "a", "c", "d"],
["e", "l", "g", "h"],
["i", "j", "k", "l"],
["m", "l", "a", "a"],
]
dic = ["abc", "aal", "gkjlaaeimlaa", "lmijlgcaae"]
print(boggle(board, dic))

SCC

Description

Every node in this group have access to any other nodes.

d c
Strongly connected components u->v and v->u
Weakly connected components u->v or v->u

Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
G = [[1, 3], [3, 2], [2, 1], [2, 4], [4, 5], [5, 4], [4, 6], [6, 7], [7, 8], [6, 8], [7, 9], [8, 9], [9, 6]]

1. sort by finish time(topological sort)

1 2 3 4 5 6 7 8 9
9 7 8 6 1 5 4 3 2


2. reverse edge direction and do the topo_sort
G = [[3, 1], [2, 3], [1, 2], [4, 2], [5, 4], [4, 5], [6, 4], [7, 6], [8, 7], [8, 6], [9, 7], [9, 8], [6, 9]]


Sort the sequence by decreasing order, and start BFS to this sequence.
9 7 8 6 1 5 4 3 2

Actually, we can start at any node we want when doing the topological sort.
We start from 1, and get 1, 3, 2
then start from 4, get 4, 5
then 6, get 6, 9, 8, 7
that's the scc groups we want

Steps

  1. Do topological sort on G(DFS)
  2. BFS in GT using sequence produced by step 1

SSSP

Description

SSSP = single source shortest path
given Graph, we need to find every shortest path from the source S to other nodes.

Pseudocode

negative weighted edges DAG not DAG Time
N SP on DAG, O(m+n) Dijkstra’s algo Omlogn(sparse) or O n^2(dense)
Y Same as above Bellmen Ford O(mn)

Class 12 07/29/2019

Single Source Shortest Path

Description

Given G and s, return a dist[u] which stores the shortest disatnce from s to u.

Shortest Path on DAG

Description

1
2
3
4
5
6
7
input: G(V, E), S
step:
0. dist[S] = 0, dist[others] = inf
1. topological sort
2. relax the edge(u, v)

Time Complexity O(m+n)

PseudoCode

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// helper function, check if v is the shortest path given u.
relax(u, v) {
// Cuv = Cost from u to v
if (dist[u] + Cuv < dist[v]) {
dist[v] = dist[u] + Cuv
}
}
getShortest(G) {
topological_sort = topo_sort(G);
dist = [inf ... 0...inf]; // Set all to inf except s
for (u in topological_sort) {
for (v in G.neighbor(u)) {
relax(u, v);
}
}
return dist;
}

Example

Notes-2019-7-29-20-11-5.png

1
2
3
4
5
6
7
8
9
topo = [3, 2, 1, 4, 5]
dist = [inf, inf, 0, inf, inf]
-1, 2 , 0, inf, inf 3
-1, 2, 0, 3 , inf 2
-1, 2, 0, 3 , -3 1
-1, 2, 0, 3 , -3 4
-1, 2, 0, 3 , -3 5
Q: Why is Topological sort working?
dist[u] = min(dist[x] + Cxu, dist[y] + Cyu), Assume x and y is before u. We need to finish all prerequisites before we try to update u, which is sorted by topological sort.

Dijkstra’s algorithm

Description

1
2
3
4
5
6
7
Get the dist[], N is set of the vertices we vistited.
n is the total number of vertices.
while (|N| != n) {
1. pick vertex u in V but not in N with the smallest dist[] value
2. N.add(u)
3. for all neighbors of u, relax them.
}

Example

Notes-2019-7-29-20-33-44.png

1
2
3
4
5
6
7
8
9
S = 2
N = {}
dist = [inf, 0, inf, inf, inf]

1. u = 2, N = {2}, relax(2, 1), relax(2, 3), dist = [inf, 0, 4, 1, inf]
2. u = 4, N = {2, 4}, relax(4, 3), dist = [inf, 0, 3, 1, inf]
3. u = 3, N = {2, 4, 3}, relax(3, 1), dist = [5, 0, 3, 1, inf]
4. u = 1, N = {2, 4, 3, 1}, relax(1, 2), relax(1, 5), dist = [5, 0, 3, 1, 6]
4. u = 5, N = {2, 4, 3, 1, 5}, relax(5, 4), dist = [5, 0, 3, 1, 6]

Analysis

1
2
3
4
5
6
7
8
9
10
11
1. pick a smallest distance node, O(n)
2. Add the node to my set, set[node] = true, O(1)
3. relax u's neighbors, O(du), Dout < n.
4. Overall is n * n = n^2

Using min heap
1. pick u, O(1)
2. delete u, O(logn) and add to the set, O(1)
3. relax it's neieghbors, O(Du * logn).
## Here we just need to bubble up the changed node
4. Overall is O(mlogn) = d1logn + d2logn + ... + dnlogn = mlogn

Bellman Ford

Description

Relax all edges n-1 times.

Example

Notes-2019-7-29-21-18-38.png

1
2
3
4
5
6
7
S = 1
dist = [0, inf, inf, inf]
i = 1, we get 2, 3, 4, 1, dist = [0, 3, inf, inf]
i = 2, we get 2, 3, 4, 1, dist = [0, 3, -7, 0]
i = 3, dist = [0, 3, -7, 0]

Since we can at least update one true shortest distance at each loop, then we need at most n-1 times to update all true shorteset distances. After that, the dist array shouldn't change. If it changes, it means there's a cycle with negative weights.

PseudoCOde

1
2
3
4
5
6
7
for (i = 1... n-1) {// we do the algo n-1 times
for (u = 1...n) {
for (V in G.neighbors(u)) {
relax(u, v);
}
}
}

Class 13 08/05/2019

Review

  1. Shortest path on DAG, O(m+n)
  2. Dijkstra’s Algorithm, O(mlogn) for pq, O(n^2) for matrix
  3. Bellman Ford Algorithm, O(mn)

  4. How do we track the actual paths? we need to modify the relax function.

1
2
3
4
5
6
relax(u, v) {
if (dist[u] + cur < dist[v]) {
dist[v] = u;
parent[v] = u;
}
}

All Pair Shortest Path

Description

Goal: Build dist[u, v] = shortest distance from u to v

negative weighted edges DAG not DAG Time
N O(mn) = On^2 ~ On^3 Dijkstra’s algo Onmlogn = O n2logn ~ n3logn or O n^3
Y m = n(n-1) at most Bellman Ford O(mn^2) = On3 ~On4

To improve the time complexity of Bellman Ford Algorithm interms of number of vertices, we can use Floyd-Warshall Algorithm.

Floyd - rWarshall Algorithm

Description

We define f_k(u, v) = shortest distance from u to v with only vertices{1, 2, … k} on shortest path

Notes-2019-8-5-18-26-14.png

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Example:
f_1(u, v) = 20
f_2(u, v) = 7

f_k(u, v) = cur, if (u, v) in E
inf, otherwise

base case:
f_0(u, v) = inf, if such edge doesn't exist
weight(u, v), otherwise
f(u, u) = 0

Recurrence:
f_1(u, v) = min(f_0(u, v), f_0(u, 1) + f_0(1, v))
f_k+1(u, v) = min(f_k(u, v), f_k(u, k+1) + f_k(k+1, v))

f_n(u, v) = true shortest distance {1, 2, 3, ..., n}

PseudoCode

1
2
3
4
5
6
7
8
9
10
FW(G) {
dist[,] = adj matrix
for (k = 1, 2, ..., n) {
for ( u = 1 ... n) {
for (v = 1 ... n) {
dist[u, v] = min(dist[u, v], dist[u, k] + dist[k, v]);
}

}
}

Example

Notes-2019-8-5-18-45-38.png

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
Adj matrix

0, 1, -2, inf
inf, 0, 3, inf
inf, inf, 0, -1
inf, 0, inf, 0

k = 1
dist[1, 1] = inf
dist[1, 2] = min(1, dist[1, 1] + dist[1, 2]) = 1
dist[1, 3] = min(-2, dist[1, 1] + dist[1, 3]) = -2
dist[1, 4] = min(inf, dist[1, 1] + dist[1, 4]) = inf

dist[2, 3] = min(dist[2, 3], dist[2, 1] + dist[1, 3]) = min(3, inf + (-2)) = 3

The first iteration basically doesn't change
k = 2
dist[1, 1] = min(inf, dist[1, 2] + dist[2, 1]) = inf
dist[1, 3] = min(-2, dist[1, 2] + dist[2, 3]) = min(1, 4) = 1

dist[4, 3] = min(inf, dist[4, 2] + dist[2, 3]) = 3


0, 1, -2, inf
inf, 0, 3, inf
inf, inf, 0, -1
inf, 0, 3, 0

k = 3
dist[1, 4] = min(inf, -2 + -1) = -3
dist[2, 1] = min(inf, 0 + inf) = inf
dist[2, 3] = min(3, 3 + 0) = 3
dist[2, 4] = min(inf, 3 + -1) = 2

...

Negative weighted cycle is not applicable!

Minimum Spanning Tree

Description

Notes-2019-8-5-19-12-17.png

1
2
3
Input: Connected undirected graph
we want to find a weighted undicted graph that has the smallest cost.
We have Prim's Algorithm and Kruskal's Algorithm to solve this.

Prim’s Algorithm

Description

Notes-2019-8-5-19-24-45.png

1
2
3
4
5
6
7
8
9
10
11
12
N = {}
dist[u] = distance from set N to vertex U
N is a set of vertices, we pick the shortest distacne from amongst all possible edges from U to this set.

Example,
N = {}, dist = inf, inf, inf, inf
1. N = {3}, dist = 2(3, 1), 1(3, 2), 0, inf
2. N = {3, 2}, dist = 2(3, 1), _, _, 2(4, 2)
3. N = {3, 2, 4}, dist = 2(3, 1), _, _, _
4. N = {3, 2, 4, 1}, dist = _, _, _, _

Final dist = [2, 1, 0, 2]

Final Spanning Tree
Notes-2019-8-5-19-31-6.png

PseudoCode

1
2
3
4
5
6
7
8
9
10
11
12
Prims(G) {
N = {}, T = {};
while (N != V) {
1. pick vertex u in (V/N) with the smallest distance value
2. N = N U {u}, T = {(u, parent(u))} U T
3. for (v in G.neighbor(u)) {
if (Cost(u, v) < dist[v]) {
dist[v] = Cost(u, v), T[v] = u;
}
}
}
}

Example

Notes-2019-8-5-19-24-45.png

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
N = {3}
dist = [2, 1, inf, inf]
parent = [3, 3, -1, -1]

N = {3, 2}
dist = [2, 1, inf, 2]
parent = [3, 3, -1, 2]

N = {3, 2, 4}
dist = [2, 1, inf, 2]
parent = [3, 3, -1, 2]

N = {3, 2, 4, 1}
dist = [2, 1, inf, 2]
parent = [3, 3, -1, 2]

Our Tree is (1, 3), (2, 3), (4, 2)

Actually here we don't need T here, and we can use a priority Queue to pick the smallest distance not in N.

Kruskal’s Algorithm

Description

1
2
3
4
5
6
7
8
- Sort edges by their weights
- T = {}
for (e in sorted order) {
if (T U {e} has no cycle) {
T = T U {e}
}
}
To detect cycle, we can either use disjoint sets.

Disjoint Sets

1
2
- We have n items {1, 2, ..., n}
- group them into disjoint sets {1, 2}, {4, 5, 6}
1
2
3
4
5
6
7
// mlogn time 
for (e = (u, v) in sorted order) {
if (Find(u) != Find(v)) {
T = T U {(u, v)}; // union
Union(u, v);
}
}

Union Find

Logical View

Notes-2019-8-5-20-32-33.png

1
2
1 2 3 4 5 6 7
2 2 2 4 4 5 7
PseudoCode
1
2
3
4
5
6
7
8
9
10
Find(u) {
if (parent[u] == u) {
return u;
}
return Find(parent[u]);
}

Union(u, v) {
parent[u] = v;
}
Union By Rank

Notes-2019-8-5-20-46-13.png

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/**
1 2 3 4 5 6 7
1 2 1 3 2 1 1
Find O logn Union O 1

num of item of a root node >= 2 ^(rank - 1)
proof:
if rank = 1, then 1 >= 2^1 - 1
assume rank = k, node >= 2^(k-1)
when rank = k+1, assume target = 2^k, we have node + node >= 2^(k-1) * 2 = 2^k >= target. Thus the time complexity is O logn
**/
// we need to find first
u = find(u);
v = find(v);
if (rank[u] < rank[v]) {
parent[u] = v;
} else if (rank[u] > rank[v]) {
parent[v] = u;
} else {
parent[u] = v;
rank[v]++;
}

Path Compression

We dont want a long chain here, we want less levels of nodes. So we modify the parent of the node to be the root each time we find.

Assume for a sequence of m U&F operations.
overall running time , O(mlog*n).

eg.
Notes-2019-8-5-21-5-1.png
log 2 = 1
log
4 = 2
log 16 = 3
log
65536 = 4
log* 2^65536 = 5

1
2
3
4
5
6
7
Find(u) {
if (parent[u] == u) {
return u;
}
parent[u] = Find(parent[u]);
return parent[u];
}

Google Bigtable

Posted on 2019-05-03 | In CS
Symbols count in article: 6.2k | Reading time ≈ 6 mins.

Please comment one line (fmt.Fprintf(w, “Search received: %s %s %s”, lat, lon, ran)) as IOS does not like to get string. Make sure the response of \post is empty

Bigtable

BigTable is one of the first NoSQL solutions in the world. It’s the same database that powers many core Google services, including Search, Analytics, Maps, and Gmail.

Apache HBase is the open source version of BigTable.

By definition, A Bigtable is a sparse, distributed, persistent multidimensional sorted map. (In my word, it’s a key­value store.) The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

Main features that you must know about BigTable

  • Map
  • Persistent: BigTable vs. Memcache
  • Distributed (GFS/HDFS, etc.): data is replicated across a number of participating nodes vs. MySQL
  • Multidimensional: row, column (column family), timestamp. You may scan based on their ranges.
  • Sparse: A given row can have any number of columns in each column family, or none at
    all. The other type of sparseness is row­based gaps, which merely means that there may be gaps between keys.

BigTable vs. ElasticSearch vs. Datastore vs….

  • BigTable is persistent storage (ES is not persistent, may lose data)
  • ElasticSearch is search engine with complicated query support and better read
    performance
  • BigQuery is for offline analysis not for serving user traffic (scale is small)
  • MongoDB is NoSQL.
  • BigTable vs. ElasticSearch vs. MongoDB vs. BigQuery vs. Dataflow
    • BigTable = NoSQL + Cloud
    • MongoDB = NoSQL
    • ElasticSearch = NoSQL + Query Optimization
    • BigQuery = MySQL­like + Cloud
    • Dataflow = MapReduce + Cloud

Interview question:

Why you need BigTable (or BigQuery) while you already have ElasticSearch (BigTable/BigQuery)

More readings

  • http://chouqin.github.io/blog/2013/10/24/bigtable/
  • http://blog.csdn.net/opennaive/article/details/7532589
  • https://www.quora.com/What­problems­did­Spanner­solve­that­BigTable­fall­short­of

Bigtable has a concept of column family and column. Column family is a set of columns.

Question: Why column family?
Answer: allow better management of data such as versioning strategy.

6. Bigtable-2019-5-3-10-50-24.png

Create BigTable in Google Cloud

Click this link https://cloud.google.com/bigtable/docs/quickstart­cbt and then click ‘ENABLE THE APIS’

To Enable an API.

Open ‘BigTable’ in Google Cloud,

Create an instance called ‘around­post’. Change instance type to development.

Set the type to ssd and instance type to development

Enter gcloud in your terminal

Install cbt in your cloud terminal (not your local terminal). When asked ‘Do you want to continue?’ Enter y. Proceed without waiting it to finish.

1
2
3
4
sudo gcloud components update sudo gcloud components install cbt
```

To ensure cbt is installed, enter `which cbt` in your cloud terminal If you don’t see it, you cbt is not installed, please do the install again.

/google/google-cloud-sdk/bin/cbt

1
2
3


In your cloud terminal, enter

echo project = YOUR_PROJECT_ID > ~/.cbtrc echo instance = around­post >> ~/.cbtrc
cbt createtable post
cbt createfamily post post
cbt createfamily post location

1
2
3
4
5
6
7
8

If you see such results, the bigtable column and column family are both created (This step is a MUST for next lessons). Please check it carefully.

![6. Bigtable-2019-5-3-10-58-1.png](https://raw.githubusercontent.com/Luorinz/images/master/6.%20Bigtable-2019-5-3-10-58-1.png)


## Write data into BigTable
### Open your local terminal and cloud terminal, enter

go get cloud.google.com/go/bigtable

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

Instructions on writing data to BigTable
+ https://cloud.google.com/bigtable/docs/go/reference


### Update your code in handlerPost and saveToBigTable. Replace project name with your project.
```Go
func saveToBigTable(p *Post, id string) {
ctx := context.Background()
// you must update project name here
bt_client, err := bigtable.NewClient(ctx, PROJECT_ID, BT_INSTANCE)
if err != nil {
panic(err)
return
}

tbl := bt_client.Open("post")
mut := bigtable.NewMutation()
t := bigtable.Now()

mut.Set("post", "user", t, []byte(p.User))
mut.Set("post", "message", t, []byte(p.Message))
mut.Set("location", "lat", t, []byte(strconv.FormatFloat(p.Location.Lat, 'f', -1, 64)))
mut.Set("location", "lon", t, []byte(strconv.FormatFloat(p.Location.Lon, 'f', -1, 64)))

err = tbl.Apply(ctx, id, mut)
if err != nil {
panic(err)
return
}
fmt.Printf("Post is saved to Big Table : %s\n", p.Message)

}

Based on what Google suggest, how to save our Post into BT?

1
2
3
4
5
6
7
8
tbl := client.Open("mytable")
mut := bigtable.NewMutation()

mut.Set("links", "maps.google.com", bigtable.Now(), []byte("1"))

mut.Set("links", "golang.org", bigtable.Now(), []byte("1"))

err := tbl.Apply(ctx, "com.google.cloud", mut)

Uncomment the two lines in const and replace the PROJECT_ID with your own project id.

1
2
3
4
5
6
7
8
const (
INDEX = "around"
TYPE = "post"
DISTANCE = "200km"
// Needs to update with your real project id PROJECT_ID = "xxxx"
BT_INSTANCE = "around­post"
// Needs to update this URL if you deploy it to cloud. ES_URL = "http://AWS_ADDRESS:9200"
)

Add two imports

1
2
3
4
5
6
7
8
9
10
11
package main

import (
elastic "gopkg.in/olivere/elastic.v3" "fmt"
"net/http"
"encoding/json"
"log"
"strconv"
"reflect"
"context" "cloud.google.com/go/bigtable" "github.com/pborman/uuid"
)

Try to build it locally to see if there are any errors.

1
go build *.go

Update your codes in cloud terminal (Either copy or github) Offer share

After you copy or update codes into cloud terminal, do a deployment again.

Open Postman, test post and search

Read data from BigTable

In the cloud terminal, enter

1
cbt read post

You should be able to find all the entries stored in BT.

How to debug? Check Logging­>logs.

Note that: if you close your cloud terminal, you may or may not need to install cbt again as cloud terminal is actually a vm.

Things you need to know

  1. setup bigtable
  2. BigTable vs. ElasticSearch vs. MongoDB vs. BigQuery vs. Dataflow
    • BigTable = NoSQL + Cloud
    • MongoDB = NoSQL
    • ElasticSearch = NoSQL + Query Optimization
    • BigQuery = MySQL + Cloud
    • Dataflow = MapReduce + Cloud

6. Bigtable-2019-5-3-11-6-46.png

Authentication

Posted on 2019-05-02 | Edited on 2019-05-01 | In CS
Symbols count in article: 15k | Reading time ≈ 13 mins.

Credential based Authentication

Login each time when you need to operate something

Cookie Based Authentication

  • The traditional approach of authentication
  • Stateful. This means that an authentication record or session must be kept both server and client-side. The server needs to keep track of active sessions in a database (memory/local store), while on the front-end a cookie is created that holds a session identifier, thus the name cookie based authentication.

5. Authentication-2019-5-1-17-29-44.png

  1. User enters their login credentials
  2. Server verifies the credentials are correct and creates a session which is then stored in a database
  3. A cookie with the session ID is placed in the users browser
  4. On subsequent requests, the session ID is verified against the database and if valid the request processed
  5. Once a user logs out of the app, the session is destroyed both client and server side

Question: what’s the problem of this method?

Imagine when you need to replace your ID?

  1. Go to government, apply for a temporary ID.
  2. It may take them 1 months to send you a new ID.
  3. During this period, you may use your temporary ID for some purposes.

Token-Based Authentication

  • More and more popular (Yelp API, etc.)
  • Usually JSON Web Tokens (JWTs).

How it works

  • User enters their login credentials (=username + password)
  • Server verifies the credentials are correct and returns an encrypted and signed token with a private key.
    • Text: (JSON = {username: “abcd”, uuid=”1.2.3.4”}, private key) => token
    • Code: abccdeedddee
    • Decode: JSON = {username: “abcd”, uuid=”1.2.3.4”}
    • Symmetric vs Asymmetric encryption.
    • This token is stored client-side, most commonly in local storage - but can be stored in session storage or a cookie as well
  • Subsequent requests to the server include this token as an additional Authorization header or through one of the other methods mentioned above
  • The server decodes the JWT and if the token is valid processes the request
  • Once a user logs out, the token is destroyed client-side, no interaction with the server is necessary

Encryption using asymmetric keys.

5. Authentication-2019-5-1-17-30-10.png

Sign using asymmetric keys:

5. Authentication-2019-5-1-17-30-17.png

More reading (symmetric vs asymmetric encryption):
https://www.ssl2buy.com/wiki/symmetric-vs-asymmetric-encryption-what-are-differences

Advantages of Token-Based Authentication

Stateless, Scalable and Decoupled

  • Stateless: The back-end does not need to keep a record of tokens.
  • Self-contained, containing all the data required to check its validity..
  • No DB look up is needed.

Store Data in the JWT

  • With a cookie based approach, you simply store the session id in a cookie.
  • JWT’s on the other hand allow you to store any type of metadata, as long as it’s valid JSON. (username in our project, for example)

Mobile Friendly

  • Native mobile platforms and cookies do not mix well.
  • In our project, the same Go backend will serve traffic to both iOS and browser (React JS)

    Disadvantages of Token-Based Authentication

  • Usually the size of token is larger than a session id.

Auth0 and Golang

Source of codes: https://auth0.com/blog/authentication-in-golang/

JWT = JSON Web Toolkit

use jwt to protect post and search endpoints (reject if without auth token)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
package main

import (
// new libs
"github.com/auth0/go-jwt-middleware"
"github.com/dgrijalva/jwt-go"
"github.com/gorilla/mux"
)

// Other codes

var mySigningKey = []byte("secret")

func main() {
// Create a client
client, err := elastic.NewClient(elastic.SetURL(ES_URL), elastic.SetSniff(false))
….
fmt.Println("Started service successfully")
// Here we are instantiating the gorilla/mux router
r := mux.NewRouter()

var jwtMiddleware = jwtmiddleware.New(jwtmiddleware.Options{
ValidationKeyGetter: func(token *jwt.Token) (interface{}, error) {
return mySigningKey, nil
},
SigningMethod: jwt.SigningMethodHS256,
})

r.Handle("/post", jwtMiddleware.Handler(http.HandlerFunc(handlerPost))).Methods("POST")
r.Handle("/search", jwtMiddleware.Handler(http.HandlerFunc(handlerSearch))).Methods("GET")
r.Handle("/login", http.HandlerFunc(loginHandler)).Methods("POST")
r.Handle("/signup", http.HandlerFunc(signupHandler)).Methods("POST")

http.Handle("/", r)
log.Fatal(http.ListenAndServe(":8080", nil))
}

Let’s explain these codes line by line

1
r := mux.NewRouter()

Create a new router on top of the existing http router as we need to check auth.

1
2
3
4
5
6
var jwtMiddleware = jwtmiddleware.New(jwtmiddleware.Options{
ValidationKeyGetter: func(token *jwt.Token) (interface{}, error) {
return mySigningKey, nil
},
SigningMethod: jwt.SigningMethodHS256,
})

Create a new JWT middleware with a Option that uses the key ‘mySigningKey’ such that we know this token is from our server. The signing method is the default HS256 algorithm such that data is encrypted.

1
r.Handle("/post", jwtMiddleware.Handler(http.HandlerFunc(handlerPost))).Methods("POST")

It means we use jwt middleware to manage these endpoints and if they don’t have valid token, we will reject them.

First handle the jwt, then chandle the normal http request

Question: if we reject it, what HttpResponse code we should return?
https://golang.org/src/net/http/status.go

Install new packages

1
2
3
go get "github.com/auth0/go-jwt-middleware"
go get "github.com/dgrijalva/jwt-go"
go get "github.com/gorilla/mux"

Under the same folder, add a new file called user.go

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
package main

import (
elastic "gopkg.in/olivere/elastic.v3"

"encoding/json"
"fmt"
"net/http"
"reflect"
"regexp"
"time"

"github.com/dgrijalva/jwt-go"
)

const (
TYPE_USER = "user"
)

var (
usernamePattern = regexp.MustCompile(`^[a-z0-9_]+$`).MatchString
)

type User struct {
Username string `json:"username"`
Password string `json:"password"`
Age int `json:"age"`
Gender string `json:"gender"`
}

Implement checkUser method in user.go.

What does this method do?

We need to check whether a pair of username and password is stored in ES.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// checkUser checks whether user is valid
func checkUser(username, password string) bool {
es_client, err := elastic.NewClient(elastic.SetURL(ES_URL), elastic.SetSniff(false))
if err != nil {
fmt.Printf("ES is not setup %v\n", err)
return false
}

// Search with a term query
termQuery := elastic.NewTermQuery("username", username)
queryResult, err := es_client.Search().
Index(INDEX).
Query(termQuery).
Pretty(true).
Do()
if err != nil {
fmt.Printf("ES query failed %v\n", err)
return false
}

var tyu User
for _, item := range queryResult.Each(reflect.TypeOf(tyu)) {
u := item.(User)
return u.Password == password && u.Username == username
}
// If no user exist, return false.
return false
}

Implement addUser method.

Student question

  1. After we send the term query, how do we know whether this user has existed?
  2. How to insert this user into ES?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// Add a new user. Return true if successfully.
func addUser(user User) bool {
es_client, err := elastic.NewClient(elastic.SetURL(ES_URL), elastic.SetSniff(false))
if err != nil {
fmt.Printf("ES is not setup %v\n", err)
return false
}

termQuery := elastic.NewTermQuery("username", user.Username)
queryResult, err := es_client.Search().
Index(INDEX).
Query(termQuery).
Pretty(true).
Do()
if err != nil {
fmt.Printf("ES query failed %v\n", err)
return false
}

if queryResult.TotalHits() > 0 {
fmt.Printf("User %s already exists, cannot create duplicate user.\n", user.Username)
return false
}

_, err = es_client.Index().
Index(INDEX).
Type(TYPE_USER).
Id(user.Username).
BodyJson(user).
Refresh(true).
Do()
if err != nil {
fmt.Printf("ES save user failed %v\n", err)
return false
}

return true
}

Implement signupHandler method.

Student question: finish this method to support

  1. Decode a user from request (POST)
  2. Check whether username and password are empty, if any of them is empty, call http.Error(w, "Empty password or username", http.StatusInternalServerError)
  3. Otherwise, call addUser, if true, return a message “User added successfully”
  4. If else, call http.Error(w, “Failed to add a new user”, http.StatusInternalServerError)
  5. Set header to be w.Header().Set(“Content-Type”, “text/plain”) w.Header().Set(“Access-Control-Allow-Origin”, “*”)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// If signup is successful, a new session is created.
func signupHandler(w http.ResponseWriter, r *http.Request) {
fmt.Println("Received one signup request")

decoder := json.NewDecoder(r.Body)
var u User
if err := decoder.Decode(&u); err != nil {
panic(err)
return
}

if u.Username != "" && u.Password != "" && usernamePattern(u.Username) {
if addUser(u) {
fmt.Println("User added successfully.")
w.Write([]byte("User added successfully."))
} else {
fmt.Println("Failed to add a new user.")
http.Error(w, "Failed to add a new user", http.StatusInternalServerError)
}
} else {
fmt.Println("Empty password or username.")
http.Error(w, "Empty password or username", http.StatusInternalServerError)
}

w.Header().Set("Content-Type", "text/plain")
w.Header().Set("Access-Control-Allow-Origin", "*")
}

Implement loginHandler

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// If login is successful, a new token is created.
func loginHandler(w http.ResponseWriter, r *http.Request) {
fmt.Println("Received one login request")

decoder := json.NewDecoder(r.Body)
var u User
if err := decoder.Decode(&u); err != nil {
panic(err)
return
}

if checkUser(u.Username, u.Password) {
token := jwt.New(jwt.SigningMethodHS256)
claims := token.Claims.(jwt.MapClaims)
/* Set token claims */
claims["username"] = u.Username
claims["exp"] = time.Now().Add(time.Hour * 24).Unix()

/* Sign the token with our secret */
tokenString, _ := token.SignedString(mySigningKey)

/* Finally, write the token to the browser window */
w.Write([]byte(tokenString))
} else {
fmt.Println("Invalid password or username.")
http.Error(w, "Invalid password or username", http.StatusForbidden)
}

w.Header().Set("Content-Type", "text/plain")
w.Header().Set("Access-Control-Allow-Origin", "*")
}

Let’s explain the codes

1
2
3
4
5
6
decoder := json.NewDecoder(r.Body)
var u User
if err := decoder.Decode(&u); err != nil {
panic(err)
return
}

Decode a user from request’s body

1
if checkUser(u.Username, u.Password) {

Make sure user credential is correct.

1
token := jwt.New(jwt.SigningMethodHS256)

Create a new token object to store.

1
claims := token.Claims.(jwt.MapClaims)

Convert it into a map for lookup

1
2
claims["username"] = u.Username
claims["exp"] = time.Now().Add(time.Hour * 24).Unix()

Store username and expiration into it.

1
tokenString, _ := token.SignedString(mySigningKey)

Sign (Encrypt) and token such that only server knows it.

1
w.Write([]byte(tokenString))

Write it into response

Update handlerPost in main.go to populate username

Question: why use the username in context? Why not ask user to send it as param?

Answer:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
func handlerPost(w http.ResponseWriter, r *http.Request) {
// other codes
user := r.Context().Value("user")
claims := user.(*jwt.Token).Claims
username := claims.(jwt.MapClaims)["username"]

// 32 << 20 is the maxMemory param for ParseMultipartForm
// After you call ParseMultipartForm, the file will be saved in the server memory with maxMemory size.
// If the file size is larger than maxMemory, the rest of the data will be saved in a system temporary file.
r.ParseMultipartForm(32 << 20)

// Parse from form data.
fmt.Printf("Received one post request %s\n", r.FormValue("message"))
lat, _ := strconv.ParseFloat(r.FormValue("lat"), 64)
lon, _ := strconv.ParseFloat(r.FormValue("lon"), 64)
p := &Post{
User: username.(string),
Message: r.FormValue("message"),
Location: Location{
Lat: lat,
Lon: lon,
},
}

...
}

Test

Local test

In the same folder where you have main.go and user.go

1
go run *.go

In windows, execute go run main.go users.go

Open Postman and enter ‘http://localhost:8080/signup’ in the url, in the Body add a new json object as

1
2
3
4
5
6
{
"username":"jack",
"password":"ABCD",
"age":16,
"gender":"female"
}

Click Send and you should see a message like ‘User added successfully’.

Change the url to be ‘http://localhost:8080/login’, and in the body use the same json object.

It will return a token as response copy it

Post

change the url to ‘http://localhost:8080/post’ and then in the header add a new key value with a key as ‘Authorization’ and a value as ‘Bearer YOUR_TOKEN’. In the body, still add related input params and an image file.

bearer is also fine

5. Authentication-2019-5-1-15-39-33.png

Click Send and make it works.

在这里elastic search默认只显示10条结果,需要在search handler那里修改一下结果数量。

5. Authentication-2019-5-1-17-21-39.png

Search

change the url to ‘http://localhost:8080/search?lat=37.5&lon=-120.5&range=200’ and the method to be GET. Then in the Headers, similarly, add a new Authorization key value pair if not there.

Click send and you should get the same results as last time.
You can also verify the signed content on:
https://jwt.io/

Remote test (Homework)

Deploy a new instance in GCloud and then test it again

REMEMBER: Install new packages on Google Cloud Shell!
Note: some new package may need to be installed on Google cloud shell:

1
2
3
go get github.com/googleapis/gax-go
go get google.golang.org/api/googleapi
go get go.opencensus.io/trace

Homework

  1. Try to send some random string and see what’s the response
  2. Why we don’t protect the two endpoints of ‘login’ and ‘signup’?
  3. How to protect credentials from man-in-the-middle attack?

Google Cloud Storage

Posted on 2019-04-30 | In CS
Symbols count in article: 13k | Reading time ≈ 12 mins.

Google Cloud Storage (GCS)

Question: we have so many storage, why we still need GCS (Google Cloud Storage)?

Answer: to store unstructured data (file, image, video, etc.) and implement CDN. BigTable, BigQuery, ElasticSearch are all for structured data.

Definition: A Powerful, Simple and Cost Effective Object Storage Service.
Think it as Google Drive, Dropbox, Amazon S3.

Features

Durable

  • Google Cloud Storage is designed for 99.999999999% durability. It stores data redundantly, with automatic checksums to ensure data integrity. With Multi-Regional storage, your data is maintained in geographically distinct locations.

    Available

  • All storage classes offer very high availability around the world.

    Scalable

    Inexpensive

  • Good to store unstructured data like image, file, video, etc.

Content Delivery Network (CDN)

A content delivery network or content distribution network (CDN) is a geographically distributed network of proxy servers and their data centers.

CDN, offers an efficient, cost-effective way of reducing both network I/O costs and content delivery latency for regularly accessed website assets.

A CDN can be understood as group of geographically distributed caches, with each cache locaœted in one of several global points of presence.

Traditional server vs. CDN
4. Google Cloud Storage-2019-4-29-11-31-29.png
Companies like Akamai, Cloudflare, etc.

GCS Bucket

First of all, we will need a gcs bucket. Why we need this bucket? It’s like a folder which we can have files (images/videos posted from user) in it. Bucket == Folder.

Open your console.cloud.google.com and choose Storage -> Browser.

Then click ‘CREATE BUCKET’ to create a new bucket.

Pick a name for it, starts with post-images-, and the suffix is your project ID to avoid any conflict with other users. If used by others, add a random number. For example, I use post-images-75015 while yours will be different. Please remember this bucket name and we will use it later.

When asked about ‘Default storage class’, Regional is ok as our servers (GAE flex) are in us-central-, so we may put this bucket in us-central- too. Click ‘save’ to save your changes.

Multi part Form

A HTTP multipart request is a HTTP request that HTTP clients construct to send files and data over to a HTTP Server.

It is commonly used by browsers and HTTP clients to upload files to the server.

The content type “application/x-www-form-urlencoded“ is inefficient for sending large quantities of binary data or text containing non-ASCII characters.

The content type “multipart/form-data“ should be used for submitting forms that contain files, non-ASCII data, and binary data.

How to send multipart request in Postman?

4. Google Cloud Storage-2019-4-30-9-46-47.png

How does it look like?

4. Google Cloud Storage-2019-4-30-9-47-2.png

How to parse multipart form in Go?

1
func (r *Request) ParseMultipartForm(maxMemory int64) error

https://golang.org/pkg/net/http/#Request.ParseMultipartForm

https://github.com/golang-samples/http/blob/master/fileupload/main.go

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41


package main

import (
"fmt"
"io/ioutil"
"log"
"net/http"
"os"
)

// 1MB
const MAX_MEMORY = 1 * 1024 * 1024

func upload(w http.ResponseWriter, r *http.Request) {
if err := r.ParseMultipartForm(MAX_MEMORY); err != nil {
log.Println(err)
http.Error(w, err.Error(), http.StatusForbidden)
}

for key, value := range r.MultipartForm.Value {
fmt.Fprintf(w, "%s:%s ", key, value)
log.Printf("%s:%s", key, value)
}

for _, fileHeaders := range r.MultipartForm.File {
for _, fileHeader := range fileHeaders {
file, _ := fileHeader.Open()
path := fmt.Sprintf("files/%s", fileHeader.Filename)
buf, _ := ioutil.ReadAll(file)
ioutil.WriteFile(path, buf, os.ModePerm)
}
}
}

func main() {
http.HandleFunc("/upload", upload)
http.Handle("/", http.FileServer(http.Dir("static")))
log.Fatal(http.ListenAndServe(":8080", nil))
}

Code changes

Question: Why we need GCS in this project?
Answer: to save the image uploaded from user and serve it.

Update handlerPost to support save image into GCS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
import (
"context"
"cloud.google.com/go/storage"
// other libs
)
const (
INDEX = "around"
TYPE = "post"
DISTANCE = "200km"
PROJECT_ID = "around-75015"
BT_INSTANCE = "around-post"
// other configs

// Needs to update this bucket based on your gcs bucket name.
BUCKET_NAME = "post-images-75015"
)

type Post struct {
// `json:"user"` is for the json parsing of this User field. Otherwise, by default it's 'User'.
User string `json:"user"`
Message string `json:"message"`
Location Location `json:"location"`
Url string `json:"url"`
}

...

func handlerPost(w http.ResponseWriter, r *http.Request) {
// Other codes
w.Header().Set("Content-Type", "application/json")
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Headers", "Content-Type,Authorization")


// 32 << 20 is the maxMemory param for ParseMultipartForm, equals to 32MB (1MB = 1024 * 1024 bytes = 2^20 bytes)
// After you call ParseMultipartForm, the file will be saved in the server memory with maxMemory size.
// If the file size is larger than maxMemory, the rest of the data will be saved in a system temporary file.
r.ParseMultipartForm(32 << 20)

// Parse from form data.
fmt.Printf("Received one post request %s\n", r.FormValue("message"))
lat, _ := strconv.ParseFloat(r.FormValue("lat"), 64)
lon, _ := strconv.ParseFloat(r.FormValue("lon"), 64)
p := &Post{
User: "1111",
Message: r.FormValue("message"),
Location: Location{
Lat: lat,
Lon: lon,
},
}

id := uuid.New()

file, _, err := r.FormFile("image")
if err != nil {
http.Error(w, "Image is not available", http.StatusInternalServerError)
fmt.Printf("Image is not available %v.\n", err)
return
}
defer file.Close()

ctx := context.Background()

// replace it with your real bucket name.
_, attrs, err := saveToGCS(ctx, file, BUCKET_NAME, id)
if err != nil {
http.Error(w, "GCS is not setup", http.StatusInternalServerError)
fmt.Printf("GCS is not setup %v\n", err)
return
}

// Update the media link after saving to GCS.
p.Url = attrs.MediaLink

// Save to ES.
saveToES(p, id)

// Save to BigTable.
//saveToBigTable(p, id)

}

func saveToGCS(ctx context.Context, r io.Reader, bucketName, name string) (*storage.ObjectHandle, *storage.ObjectAttrs, error) {
// Student questions
}

Implement saveToGCS.

Google has provided a good example of writing objects to GCS. https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-go

Google example of open a client connection to GCS

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
package main

import (
"fmt"
"log"

// Imports the Google Cloud Storage client package.
"cloud.google.com/go/storage"
"golang.org/x/net/context"
)

func main() {
ctx := context.Background()

// Sets your Google Cloud Platform project ID.
projectID := "YOUR_PROJECT_ID"

// Creates a client.
client, err := storage.NewClient(ctx)
if err != nil {
log.Fatalf("Failed to create client: %v", err)
}

// Sets the name for the new bucket.
bucketName := "my-new-bucket"

// Creates a Bucket instance.
bucket := client.Bucket(bucketName)

// Creates the new bucket.
if err := bucket.Create(ctx, projectID, nil); err != nil {
log.Fatalf("Failed to create bucket: %v", err)
}

fmt.Printf("Bucket %v created.\n", bucketName)
}

Google example of writing an object to GCS (copied from https://github.com/GoogleCloudPlatform/golang-samples/blob/master/storage/objects/main.go
)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
func write(client *storage.Client, bucket, object string) error {
ctx := context.Background()
// [START upload_file]
f, err := os.Open("notes.txt")
if err != nil {
return err
}
defer f.Close()

wc := client.Bucket(bucket).Object(object).NewWriter(ctx)
if _, err = io.Copy(wc, f); err != nil {
return err
}
if err := wc.Close(); err != nil {
return err
}
// [END upload_file]
return nil
}

Answer:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Save an image to GCS.
func saveToGCS(ctx context.Context, r io.Reader, bucketName, name string) (*storage.ObjectHandle, *storage.ObjectAttrs, error) {
client, err := storage.NewClient(ctx)
if err != nil {
return nil, nil, err
}
defer client.Close()

bucket := client.Bucket(bucketName)
// Next check if the bucket exists
if _, err = bucket.Attrs(ctx); err != nil {
return nil, nil, err
}

obj := bucket.Object(name)
w := obj.NewWriter(ctx)
if _, err := io.Copy(w, r); err != nil {
return nil, nil, err
}
if err := w.Close(); err != nil {
return nil, nil, err
}


if err := obj.ACL().Set(ctx, storage.AllUsers, storage.RoleReader); err != nil {
return nil, nil, err
}

attrs, err := obj.Attrs(ctx)
fmt.Printf("Post is saved to GCS: %s\n", attrs.MediaLink)
return obj, attrs, err
}

Remember to install package:

go get -u cloud.google.com/go/storage

这里遇到了迷之bug error: elastic: Error 400 (Bad Request): failed to parse [type=mapper_parsing_exception] 问题出在postman上面, 重新开个post或者试试把url的类型改成plain text

Test

Local test (on your own computer)

cd to the folder where you have main.go

1
go run main.go

Open your Postman, change the method to POST, in the url enter ‘http://localhost:8080/post’

4. Google Cloud Storage-2019-4-30-14-5-10.png

In the Body part, choose form-data, and then enter lat, lon, message and image. For image, the type is ‘file’ such that you may upload an image from your local storage.

Verify the search API is working

Open Postman, change the method to GET, in the url change it to ‘http://localhost:8080/search?lat=37.5&lon=-120.5&range=200’

Copy the url into your browser and download the image to verify that it’s the same image that you’ve downloaded.

If you have any auth issue, try

1
gcloud auth application-default login

If you have permission problem, try

1
sudo chown -R $(whoami):staff ~/.config/gcloud/

Integration test (TBD)

Remote test (homework)

Commit your changes to github and then open a new cloud terminal.

Check out from github

1
git pull origin master

cd to your folder with main.go and app.yaml

1
gcloud app deploy

Wait until the deployment is done

Open Postman.

Instead of using ‘http://localhost:8080’ change it to your service url (without port) For example, ‘http://around-179500.appspot.com/post’ and ‘http://around-179500.appspot.com/search?lat=37.5&lon=-120.5&range=200’

The others are the same

Cors

cross-origin sharing standard

Pre-flight requests: You may have a content type like JSON, or some other custom header that’s triggering a pre-flight request, which your server may not be handling.

Try adding this one, if you’re using the ever-common AJAX in your front-end: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Requested-With
Gorilla’s handlers.CORS() will set sane defaults to get the basics of CORS working for you;

however, you can (and maybe should) take control in a more functional manner.
https://stackoverflow.com/questions/40985920/making-golang-gorilla-cors-handler-work

(Optional) Vim go plugin:

https://github.com/fatih/vim-go

Spam and Abuse Detection

Any user-facing IT company has to address the Spam and Abuse problem in production.

Question: Why our project needs to think about spam problem?

Answer:
Categories of Spam and Abuse in Industry
Racy/Nudity (Child Porn especially)

  • Harassment or Hate Speech (n* word, etc.)
  • Fake News (Facebook)
  • Keyword Spam (Keyword stuffing)
  • Drug Abuse
  • Violence or Bully
  • Spam
  • Copycat (IP Infringement)
  • Phishing (CEO Phishing for SSN)
  • Privacy Leak

    Solutions

  • Keyword based (rule based)
    • n words, s words, f* words, etc.
  • Machine Learning Model (images and videos)
    • Open NSFW model
  • Geo location based
    • Strip clubs, etc.
  • User based
    • User ban
    • User warning
    • Aggregated user signals
  • User feedback
    • report
    • Machine Learning
    • Moderators
  • External channels
    • Internal users like employees
    • Media coverage
  • Others

Question:
In our current design, we kind of spam filters we may enforce?

Answer: For example, keyword based spam filter

GAE Basic

Posted on 2019-04-29 | In CS
Symbols count in article: 9.8k | Reading time ≈ 9 mins.

Google Cloud Platform

GAE (Google App Engine)

Google App Engine is a cloud computing platform for developing and hosting web applications in Google-managed data centers.

GCE (Google Compute Engine)

Google Compute Engine delivers virtual machines running in Google’s innovative data centers and worldwide fiber network.

GKE (Google Container Engine)

https://cloud.google.com/containers/

Dataflow

Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation.

BigTable

Cloud Bigtable is Google’s NoSQL Big Data database service. It’s the same database that powers many core Google services, including Search, Analytics, Maps, and Gmail.

BigQuery

BigQuery is Google’s fully managed, petabyte scale, low cost enterprise data warehouse for analytics.

Others

  • PubSub
  • Datastore
  • Storage
  • Monitoring

What is a typical BigData project (web+analysis)

3. GAE Basic-2019-4-28-21-11-47.png

  • Service (GAE/GKE) to receive user generated contents
  • Datastore (BigTable/Spanner) to save contents
  • Index (ElasticSearch) to query contents in runtime
  • Service (GAE/GKE) to serve contents in runtime
  • Dataflow (MapReduce) to dump data into BigQuery
  • BigQuery (SQL) to perform further analysis

GKE is more like a simple GCE

Google App Engine Flex

App Engine flexible environment automatically scales your app up and down while balancing the load. Microservices, authorization, SQL and NoSQL databases, traffic splitting, logging.

Step 3.1.1 In your Intellij, create a new file called app.yaml

The content has just two lines.

GAE 通过flex实现负载均衡

1
2
runtime: go
env: flex

Recommended: upload your two files (main.go and app.yaml) into github.

Worst cast: just copy and paste in terminals.

More readings

  • Overview https://cloud.google.com/docs/overview/
  • Flex in GAE https://cloud.google.com/appengine/docs/flexible/
  • GAE on Golang https://cloud.google.com/appengine/docs/go/
  • GAE in go: https://cloud.google.com/appengine/docs/flexible/go/quickstart

Setup cloud terminal.

Question: Why use cloud terminal?
Answer: Such that we do not rely on the local environment (Windows vs. Mac).

Click ‘Activate Google Cloud Shell’. Wait until the init is done

Then you will see a vm available for you to play. This is similar to the EC2 vm.

Install go libraries in gcloud terminal.

1
2
go get gopkg.in/olivere/elastic.v3
go get github.com/pborman/uuid

(Recommended) Checkout your codes from github

git clone https://github.com/Luorinz/Around.git

Question: what’s the purpose of app.yaml?

Answer: to tell google that your program is a go program and use go compiler to run it.

Then enter this line of code to make sure you copy past is done.

1
cat main.go | head -n 5

You’re expected to see this

1
2
3
4
5
package main

import (
elastic "gopkg.in/olivere/elastic.v3"
"fmt"

Enter gcloud app deploy

When asked about site, enter 1, which is us-west2

When asked Y/n, enter Y. The deployment may take more than 10 minutes to finish (next deployment will be faster).

Error: If you have error like this, you may have a wrong ES_URL. Update the value and deploy again.

1
2
3
4
5
Updating service [default]...failed.                                                                                         
ERROR: (gcloud.app.deploy) Error Response: [9]
Application startup error:
+ exec app
panic: no Elasticsearch node available

Open Postman, find the post query from history.

(POST, url=http://xxx.appspot.com/post)

Replace xxx with your real project id

1
2
3
4
5
6
7
8
{
"user":"1111",
"message":"一生必去的100个地方",
"location":{
"lat":37.5,
"lon":-120.1
}
}

And then find the get query from history. (GET, url=http://around-xxxx.appspot.com/search?lat=37.5&lon=-120)
You should get the results back

1
2
3
4
5
6
7
8
9
10
[
{
"user": "1111",
"message": "一生必去的100个地方",
"location": {
"lat": 37.5,
"lon": -120.1
}
}
]

Play with more examples. If you do not have any results back, try to change your lat or lon to make sure it’s within 200km.

Check the logging to see if there is any error message

Choose SATCKDRIVER -> Logging -> Logs

(Optional) Stop Instance to save credit.

(Optional) Start the instance again. In the same page, choose one instance and click START. If the START is disabled, refresh it or wait a minute.

(Optional) How to filter word

1
2
3
4
5
6
7
8
9
10
11
12
func containsFilteredWords(s *string) bool {
filteredWords := []string{
"fuck",
"100",
}
for _, word := range filteredWords {
if strings.Contains(*s, word) {
return true
}
}
return false
}

Update relevant part from your code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
func handlerSearch(w http.ResponseWriter, r *http.Request) {

...

for _, item := range searchResult.Each(reflect.TypeOf(typ)) {
p := item.(Post)
fmt.Printf("Post by %s: %s at lat %v and lon %v\n", p.User, p.Message, p.Location.Lat, p.Location.Lon)
// TODO(vincent): Perform filtering based on keywords such as web spam etc.
if !containsFilteredWords(&p.Message) {
ps = append(ps, p)
}

}

...
}

Related string operations
https://golang.org/pkg/strings/

(Optional) Useful Go information

Error vs Panic

  • Error: normal way to handle error status.
  • Panic: normally used when some “impossible” situation happens with no intend recover.

    • Use “recover” to handle when exiting function.
    • recover is like catch

“defer” clause

  • Preferred way to recycle resources
    • E.g. file handles, network connections, etc.
    • Defer executes after return
    • First come last executes
    • Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
func Foo() {
var f File
if f, err = os.Open(“Somefile”); err != nil {
panic(“Error open file”)
}
defer f.Close()

...

// some other error happens
if err != nil {
panic(“Unexpected error”)
}
}
  • Combined with recover to handle panic
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
func Foo() result string, err error {
type openFileError struct{}
type unexpectedError struct{}

defer func() {
switch p := recover(); p {
case nil:
// no panic
case openFileError{}:
err = fmt.Errorf(“Open file error”)
case unexpectedError{}:
err = fmt.Errorf(“Unexpected error”)
default:
panic(p)
}
}()

var f File
if f, err = os.Open(“Somefile”); err != nil {
panic(openFileError{})
}
defer f.Close()

// Some work...

// Some other error happens
if err != nil {
panic(unexpectedError{})
}

return result, nil
}
  • Go’s multiprocessing facility: Communication Sequential Processes
    • https://en.wikipedia.org/wiki/Communicating_sequential_processes
    • Lightweight concurrency, synchronize using message passing
  • Go routine
    • Go’s concurrency mechanism
    • Lightweight (corresponding to fiber)
      • Growable stacks
      • Scheduling
        • m:n scheduling - m goroutine on n OS threads
        • Scheduler invoked implicitly by language construct
          • E.g. time.Sleep, I/O, etc.
  • Have no identity - nothing like thread id
    • Avoid thread-local storage (TLS, tend to be misused, similar to global variable)
    • Easy go routine migration
  • Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// clock.go
package main

import (
"io"
"log"
"net"
"time"
)

func handleConn(c net.Conn) {
defer c.Close()
for {
_, err := io.WriteString(c, time.Now().Format("15:04:05\n"))
if err != nil {
return // e.g., client disconnected
}
time.Sleep(1 * time.Second)
}
}

func main() {
listener, err := net.Listen("tcp", "localhost:8000")
if err != nil {
log.Fatal(err)
}
for {
conn, err := listener.Accept()
if err != nil {
log.Print(err) // e.g., connection aborted
continue
}
handleConn(conn)
//go handleConn(conn) // handle connections concurrently
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// netcat.go
package main

import (
"io"
"log"
"net"
"os"
)

func main() {
conn, err := net.Dial("tcp", "localhost:8000")
if err != nil {
log.Fatal(err)
}
defer conn.Close()
mustCopy(os.Stdout, conn)
}

func mustCopy(dst io.Writer, src io.Reader) {
if _, err := io.Copy(dst, src); err != nil {
log.Fatal(err)
}
}
  • Go channel
    • Communication mechanism for Go routine
    • Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// channel_example.go
package main

import "fmt"

func main() {
messages := make(chan string)

go func() {
for i := 0; i < 5; i++ {
fmt.Println(i)
}
messages <- "done"
}()

msg := <-messages

fmt.Println(msg)
}
  • More reading
    • https://golang.org/doc/
    • The Go Programming Language - http://www.gopl.io/
    • https://blog.golang.org/

ElasticSearch and GCE

Posted on 2019-04-29 | Edited on 2019-04-28 | In CS
Symbols count in article: 15k | Reading time ≈ 14 mins.

Introduction to Around

Demo: http://recordit.co/awrQb1zn2I

Around: a Geo-index based social network

• Built a scalable web service in Go to handle posts and deployed to Google Cloud (GAE flex) for better scaling

• Utilized ElasticSearch (GCE) to provide geo-location based search functions such that users can search nearby posts within a distance (e.g. 200km)

• Used Google Dataflow to implement a daily dump of posts to BigQuery table for offline analysis

• Aggregated the data at the post level and user level to improve the keyword based spam detection (BigQuery)

Create a new cloud project.

Step 2.1.1 Open https://console.cloud.google.com/ and sign in with your gmail account.

Choose Sign up for free trial, enter a credit card (Must do). GCP has $300 discount for new users, which is enough for project purpose.

Step 2.1.2 Click ‘My Project’. Create a new Project called ‘Around’. Google Cloud will automatically generate a new project id for you.

Geo-Index and ElasticSearch

how to perform 1-d range search? For example, given a 1-d binary search tree, find all the values between [3, 18].
ElasticSearch and GCE-2019-4-28-14-47-57.png

What about 2d data?

  • Geo index: lat between [10.2, 10.3] and lon between [120.0, 120.5]
  • Price between [$10, $100], date between [Jan-01-2017, Jun-01-2017]
  • Weight between [120pb,140pb], height between [150cm, 180cm]
  • …

K-D tree is one implementation to solve such k-dimensional search problem.

ElasticSearch and GCE-2019-4-28-14-48-11.png

Then we choose one median point and split the space into two parts horizontally.

ElasticSearch and GCE-2019-4-28-14-48-18.png

Continue to split them into more parts (vertically)

ElasticSearch and GCE-2019-4-28-14-48-25.png

Question: how to find all the points within a Range (R)?

ElasticSearch and GCE-2019-4-28-14-48-32.png

Range Search Algorithm:

  • If query box doesn’t overlap bounding box, stop recursion
  • If bounding box is a subset of query box, report all the points in current subtree
  • If bounding box overlaps query box, recurse left and right

ElasticSearch and GCE-2019-4-28-14-48-40.png

ELK (ElasticSearch, Logstash and Kibana)

Elasticsearch is an open source, distributed, RESTful search engine. As the heart of the Elastic Stack, it centrally stores your data so you can query the data quickly.

SQL is slower. Since ELK uses a pair to search directly.

1
SELECT * FROM LOCATIONS WHERE lat <= 12 AND lat >= 10 AND lon >= 120 AND lon <= 130

Google Compute Engine (GCE)

Cloud Model

  • IaaS (Infrastructure as a service)

Offers virtual machines (Xen, KVM, VirtualBox, VMware ESX, etc.). Amazon EC2 and Google Compute Engine belong to IaaS as well.

  • PaaS (Platform as a service)

computing platform including programming language execution environment, database and web server. Develop and run their software solutions on a cloud platform without the cost and complexity of buying and managing hardware and software layers Microsoft Azure, Google App Engine

  • SaaS (Software as a service)

users are provided access to application software and databases.
Google Apps, GoToMeeting

The overall structure (tech stack) of our project:

1. Go and Web Service-2019-4-27-13-55-46.png

Question: why we need such a complicated infrastructure

Answer: because in industry, we need to handle very different requirements compared to in school.

Configure

Find NETWORKING -> VPC network -> Firewall rules

Click CREATE FIREWALL RULE

In the next page, give it a name like ‘elasticsearch’. Set the Target tags to be ‘es’, source IP ranges to be ‘0.0.0.0/0’, and the specified protocols and ports to be ‘tcp:9200’.

Wait until the firewall rules is created.

Find Compute Engine->VM instances

Choose ‘Change’ and switch to Ubuntu 16. Keep the size of 10GB is fine.

Then in the Networking -> Network tags, set it to be ‘es’ (the firewall rule that we created).

After one minute, you will see the instance is started. Choose ‘SSH’ and then ‘Open in browser window’.

In the terminal, enter

1
2
sudo apt-get update
sudo apt-get install default-jre

It will install java to your vm. To verify, enter ‘which java’ and ‘java -version’ to check

Install ElasticSearch as below

1
2
3
wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.3.1/elasticsearch-2.3.1.deb
sudo dpkg -i elasticsearch-2.3.1.deb
sudo systemctl enable elasticsearch.service

Edit the configuration file

1
sudo vim /etc/elasticsearch/elasticsearch.yml

Add two lines to the config, to allow all traffic and listen on port 9200.

1
2
network.host: 0.0.0.0
http.port: 9200

Save this and Start ElasticSearch

1
sudo service elasticsearch start

Check the status of ElasticSearch

1
sudo service elasticsearch status

You should be able to see ‘active’ in the status

Back to your console.cloud.google.com, find the public IP address(external)

Put the IP address and port together (:9200) in a new tab and see whether the server is on. The name or version might be different.

NOTE: Don’t use https! Should use http.

You should get something like this:

1
2
3
4
5
6
7
8
9
10
11
12
{
"name" : "Honey Lemon",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.3.1",
"build_hash" : "bd980929010aef404e7cb0843e61d0665269fc39",
"build_timestamp" : "2016-04-04T12:25:05Z",
"build_snapshot" : false,
"lucene_version" : "5.5.0"
},
"tagline" : "You Know, for Search"
}

Which means your ES server is running as expected.

Update Go Code

Update handlerSearch

Original source of codes (https://olivere.github.io/elastic/)

In here we have to import source code of elastic.(v3)

1
2
3
4
5
6
7
8
9
To get the package, execute:

go get gopkg.in/olivere/elastic.v3

To import this package, add the following line to your code:

import "gopkg.in/olivere/elastic.v3"

Refer to it as elastic.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
func handlerSearch(w http.ResponseWriter, r *http.Request) {
fmt.Println("Received one request for search")
lat, _ := strconv.ParseFloat(r.URL.Query().Get("lat"), 64)
lon, _ := strconv.ParseFloat(r.URL.Query().Get("lon"), 64)

// range is optional
ran := DISTANCE
if val := r.URL.Query().Get("range"); val != "" {
ran = val + "km"
}

fmt.Printf( "Search received: %f %f %s\n", lat, lon, ran)

// Create a client
client, err := elastic.NewClient(elastic.SetURL(ES_URL), elastic.SetSniff(false))
if err != nil {
panic(err)
return
}

// Define geo distance query as specified in
// https://www.elastic.co/guide/en/elasticsearch/reference/5.2/query-dsl-geo-distance-query.html
q := elastic.NewGeoDistanceQuery("location")
q = q.Distance(ran).Lat(lat).Lon(lon)

// Some delay may range from seconds to minutes. So if you don't get enough results. Try it later.
searchResult, err := client.Search().
Index(INDEX).
Query(q).
Pretty(true).
Do()
if err != nil {
// Handle error
panic(err)
}

// searchResult is of type SearchResult and returns hits, suggestions,
// and all kinds of other information from Elasticsearch.
fmt.Printf("Query took %d milliseconds\n", searchResult.TookInMillis)
// TotalHits is another convenience function that works even when something goes wrong.
fmt.Printf("Found a total of %d post\n", searchResult.TotalHits())

// Each is a convenience function that iterates over hits in a search result.
// It makes sure you don't need to check for nil values in the response.
// However, it ignores errors in serialization.
var typ Post
var ps []Post
for _, item := range searchResult.Each(reflect.TypeOf(typ)) { // instance of
p := item.(Post) // p = (Post) item
fmt.Printf("Post by %s: %s at lat %v and lon %v\n", p.User, p.Message, p.Location.Lat, p.Location.Lon)
// TODO(student homework): Perform filtering based on keywords such as web spam etc.
ps = append(ps, p)

}
js, err := json.Marshal(ps)
if err != nil {
panic(err)
return
}

w.Header().Set("Content-Type", "application/json")
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Write(js)
}

Code explanation

Let’s explain these codes line by line

1
client, err := elastic.NewClient(elastic.SetURL(ES_URL), elastic.SetSniff(false))

It means we create a connection to ES. If there is err, return.

1
q := elastic.NewGeoDistanceQuery("location")

Prepare a geo based query to find posts within a geo box.

1
2
3
4
5
searchResult, err := client.Search().
Index(INDEX).
Query(q).
Pretty(true).
Do()

Get the results based on Index (similar to dataset) and query (q that we just prepared). Pretty means to format the output.

1
for _, item := range searchResult.Each(reflect.TypeOf(typ)) {

Iterate the result in results which are type of Post (typ)

1
p := item.(Post)

Cast an item to Post, equals to p = (Post) item in java

1
ps = append(ps, p)

Add the p to an array, equals ps.add(p) in java

1
js, err := json.Marshal(ps)

Convert the go object(array) to a string

1
w.Header().Set("Access-Control-Allow-Origin", "*")

Allow cross domain visit for javascript.

We also need to put ES library in import

1
2
3
4
5
6
7
8
9
import (
elastic "gopkg.in/olivere/elastic.v3"
"fmt"
"net/http"
"encoding/json"
"log"
"strconv"
"reflect"
)

Add const before main function

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
const (
INDEX = "around"
TYPE = "post"
DISTANCE = "200km"
// Needs to update
//PROJECT_ID = "around-xxx"
//BT_INSTANCE = "around-post"
// Needs to update this URL if you deploy it to cloud.
ES_URL = "http://YOUR_ES_IP_ADDRESS:9200"
)

func main() {
fmt.Println("started-service")
http.HandleFunc("/post", handlerPost)
http.HandleFunc("/search", handlerSearch)
log.Fatal(http.ListenAndServe(":8080", nil))
}

Replace YOUR_ES_IP_ADDRESS with your ES address (public IP).

http://externalIp:9200

Step 2.2.5 Open your terminal (Mac and Windows). Enter

1
go get  gopkg.in/olivere/elastic.v3

Once it is done, you won’t be able to see any more compile errors in Intellij. If it does not work (elastic is still red), try to restart it.

More readings

  • Reflect type https://golang.org/pkg/reflect/
  • ES in Go https://github.com/olivere/elastic

Step 2.2.6 In order to create index in ES, we need to update main

What’s mapping in ES? It tells you what’s the type of a variable if not default.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
func main() {
// Create a client
client, err := elastic.NewClient(elastic.SetURL(ES_URL), elastic.SetSniff(false))
if err != nil {
panic(err)
return
}

// Use the IndexExists service to check if a specified index exists.
exists, err := client.IndexExists(INDEX).Do()
if err != nil {
panic(err)
}
if !exists {
// Create a new index.
mapping := `{
"mappings":{
"post":{
"properties":{
"location":{
"type":"geo_point"
}
}
}
}
}`
_, err := client.CreateIndex(INDEX).Body(mapping).Do()
if err != nil {
// Handle error
panic(err)
}
}

fmt.Println("started-service")
http.HandleFunc("/post", handlerPost)
http.HandleFunc("/search", handlerSearch)
log.Fatal(http.ListenAndServe(":8080", nil))
}
1
exists, err := client.IndexExists(INDEX).Do()

Check if the index exists.

1
2
3
4
5
6
7
8
9
10
mapping := `{
"mappings":{
"post":{
"properties":{
"location":{
"type":"geo_point"
}
}
}
}

If not, create a new mapping. For other fields (user, message, etc.) no need to have mapping as they are default. For geo location (lat, lon), we need to tell ES that they are geo points instead of two float points such that ES will use Geo-indexing for them (K-D tree)

1
_, err := client.CreateIndex(INDEX).Body(mapping).Do()

Create this index

More readings

  • ES in Go https://github.com/olivere/elastic
  • Mapping in ES https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Step 2.2.7 Update handlerPost

Add external import of uuid

1
2
3
4
5
6
7
8
9
10
import (
elastic "gopkg.in/olivere/elastic.v3"
"fmt"
"net/http"
"encoding/json"
"log"
"strconv"
"reflect"
"github.com/pborman/uuid"
)

How to add a new document to the index?

https://github.com/olivere/elastic has an example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// Add a document to the index
// Refresh(true) means if duplicate appears, overwrite it
tweet := Tweet{User: "olivere", Message: "Take Five"}
_, err = client.Index().
Index("twitter").
Type("tweet").
Id("1").
BodyJson(tweet).
Refresh(true).
Do(ctx)
if err != nil {
// Handle error
panic(err)
}

Student Question: how to complete this one?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
func handlerPost(w http.ResponseWriter, r *http.Request) {
// Parse from body of request to get a json object.
fmt.Println("Received one post request")
decoder := json.NewDecoder(r.Body)
var p Post
if err := decoder.Decode(&p); err != nil {
panic(err)
return
}
id := uuid.New()
// Save to ES.
saveToES(&p, id)

}

// Save a post to ElasticSearch
func saveToES(p *Post, id string) {
// Create a client
es_client, err := elastic.NewClient(elastic.SetURL(ES_URL), elastic.SetSniff(false))
if err != nil {
panic(err)
return
}

// Save it to index
_, err = es_client.Index().
Index(INDEX).
Type(TYPE).
Id(id).
BodyJson(p).
Refresh(true).
Do()
if err != nil {
panic(err)
return
}

fmt.Printf("Post is saved to Index: %s\n", p.Message)
}

Step 2.2.8 Open terminal, enter

1
go get github.com/pborman/uuid

Make sure your ES_URL is updated from your aws instance in step 1.

Step 2.2.9 Run your main.go again

Errors

If you have such an error, your ES_URL is incorrect.

1
panic: no Elasticsearch node available

If you have such error, you have started two main.go, stop the other one.

1
2
2017/06/30 07:07:01 listen tcp :8080: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
exit status 1

If you have such error, you forgot to comment one line fmt.Fprintf(w, "Search received: %s %s %s", lat, lon, ran).

1
Search received: %!s(float64=37.4) %...

Step 2.2.10 Open Postman, find the post query from history. (POST, http://localhost:8080/post)

1
2
3
4
5
6
7
8
{
"user":"1111",
"message":"一生必去的100个地方",
"location":{
"lat":37.5,
"lon":-120.1
}
}

And then find the get query from history. (GET, url=http://localhost:8080/search?lat=37.5&lon=-120)
You should get the results back

1
2
3
4
5
6
7
8
9
10
[
{
"user": "1111",
"message": "一生必去的100个地方",
"location": {
"lat": 37.5,
"lon": -120.1
}
}
]

Post more examples with different locations and then get with different lat/lon to see how many results you may get. If you have zero result back and no error, probably your geo location is wrong.

Go Basic & Around project

Posted on 2019-04-28 | In CS
Symbols count in article: 8.6k | Reading time ≈ 8 mins.

Project Introduction

Around: a Geo-index based social network

• Built a scalable web service in Go to handle posts and deployed to Google Cloud (GAE flex) for better scaling

• Utilized ElasticSearch (GCE) to provide geo-location based search functions such that users can search nearby posts within a distance (e.g. 200km)

• Used Google Dataflow to implement a daily dump of posts to BigQuery table for offline analysis

• Aggregated the data at the post level and user level to improve the keyword based spam detection (BigQuery)

constants.js

1
2
3
export const API_ROOT = 'https://around-75015.appspot.com';
export const TOKEN_KEY = 'TOKEN';
export const AUTH_PREFIX = 'Bearer';

1. Go and Web Service-2019-4-27-13-55-46.png

Why we need to learn about Go?

Answer: server language for next generation (both computer friendly and developer friendly)

Who is using Golang

  • Google (Creator), Uber, AirBnb
  • Dropbox, Facebook, eBay, Heroku, Douban, Jingdong, Meituan

As opposed to Java, Go is compiled to machine code and is executed directly. Much like C.

Go example

1
2
3
4
f, err := os.Open("filename.ext")
if err != nil {
log.Fatal(err)
}

Other benefits

  • Multiple return values
  • Multi-threading
    • Go Routine
    • Channels
  • Has borrowed many good ideas from python

Python is really easy to break.

  • Does not compile. You will never know it breaks until it breaks.
  • Language is confusing.
  • Good for testing and evaluation with many excellent machine learning libraries.
1
2
3
4
5
6
>>> a = set(['hippo'])
>>> a
set(['hippo'])
>>> a = set('hippo')
>>> a
set(['h', 'i', 'p', 'o'])

Setup Local Environment (Go)

Make sure you have installed Go based on Prerequisite.
Mac Users (Windows User wait a minute)

Step 1.1.1 Open Terminal, enter

1
go version

It should show something like go version go1.10.2 darwin/amd64.

Step 1.1.2(Optional: GOPATH default to ~/go) In the same Terminal, enter

1
2
3
4
mkdir ~/go
echo export GOPATH=~/go >> ~/.bash_profile
source ~/.bash_profile
echo $GOPATH

In here we must not overwrite the bin of go, add export PATH=$PATH:/usr/local/go/bin to .bash_profile and update zsh profile

Step 1.1.3 In the same Terminal, enter

1
2
3
mkdir -p  ~/Documents/Around/service
cd ~/Documents/Around/service
vim main.go

Step 1.1.4 Type ‘a’ or ‘i’ and then copy a hello world program into main.go

1
2
3
4
5
6
7
package main

import "fmt"

func main() {
fmt.Println("Hello, world")
}

Step 1.1.5 Exit with :wq , in the terminal, type

1
go run main.go

You should be able to see the output of “Hello, world”

First Go program

First, we need to define some objects in a Go program to represent the data we store.

Step 1.3.1 let’s define the struct for post and location.

Encode json object (https://golang.org/pkg/encoding/json/)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
package main

import "fmt"

type Location struct {
Lat float64 `json:"lat"`
Lon float64 `json:"lon"`
}

type Post struct {
// `json:"user"` is for the json parsing of this User field. Otherwise, by default it's 'User'.
User string `json:"user"`
Message string `json:"message"`
Location Location `json:"location"`
}

func main() {
fmt.Println("Hello, world")
}

Step 1.3.2 Add one method handlerPost() after main() to handle Post.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
func main() {
fmt.Println("Hello, world")
}

func handlerPost(w http.ResponseWriter, r *http.Request) {
// Parse from body of request to get a json object.
fmt.Println("Received one post request")
decoder := json.NewDecoder(r.Body)
var p Post
if err := decoder.Decode(&p); err != nil {
panic(err)
return
}

fmt.Fprintf(w, "Post received: %s\n", p.Message)
}

In here if you add rawString json:"lat" after variable, it will parse it automatically

What does this method do? If user sends a http request with a body as

1
2
3
4
{
“user”: “jack”
“message”: “this is a message”
}

Then it will automatically construct a Post object named p and update its values to be p.User = “jack” and p.Message = “this is a message”

Fmt %s means string

Just one line of code to decode json object into go object. In comparison, if you do it in java.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
JSONObject venue = getVenue(event);
if (venue != null) {
if (!venue.isNull("address")) {
JSONObject address = venue.getJSONObject("address");
StringBuilder sb = new StringBuilder();
if (!address.isNull("line1")) {

sb.append(address.getString("line1"));
}
if (!address.isNull("line2")) {
sb.append(address.getString("line2"));
}
if (!address.isNull("line3")) {
sb.append(address.getString("line3"));
}
builder.setAddress(sb.toString());
}
if (!venue.isNull("city")) {
JSONObject city = venue.getJSONObject("city");
builder.setCity(getStringFieldOrNull(city, "name"));
}
if (!venue.isNull("country")) {
JSONObject country = venue.getJSONObject("country");
builder.setCountry(getStringFieldOrNull(country, "name"));
}
if (!venue.isNull("state")) {
JSONObject state = venue.getJSONObject("state");
builder.setState(getStringFieldOrNull(state, "name"));
}
builder.setZipcode(getStringFieldOrNull(venue, "postalCode"));
if (!venue.isNull("location")) {
JSONObject location = venue.getJSONObject("location");
builder.setLatitude(getNumericFieldOrNull(location, "latitude"));
builder.setLongitude(getNumericFieldOrNull(location, "longitude"));
}
}

Step 1.3.3 Replace main function to call handlerPost when started. Final version:

1
2
3
4
5
6
7
8
9
10
11
12
import (
"fmt"
"net/http"
"encoding/json"
"log"
)

func main() {
fmt.Println("started-service")
http.HandleFunc("/post", handlerPost)
log.Fatal(http.ListenAndServe(":8080", nil))
}

Step 1.3.5 Open your postman,
Choose ‘POST’, the url is http://localhost:8080/post, in the request body, enter

1
2
3
4
5
6
7
8
{
"user":"1111",
"message":"一生必去的100个地方",
"location":{
"lat":37.5,
"lon":-120.1
}
}

Click ‘Send’, you should be able to see a response with

1
Post received: 一生必去的100个地方

Add another handler for search (called it handlerSearch), the request has a url pattern like
http://localhost:8080/search?lat=10.0&lon=20.0. Parse it and then print out the lat and lon.

Note: to get request parameters from url

1
lat := r.URL.Query().Get("lat")

Step 1.3.6 Answer

1
2
3
4
5
6
7
8
9
10
11
12
13
14
func main() {
fmt.Println("started-service")
http.HandleFunc("/post", handlerPost)
http.HandleFunc("/search", handlerSearch)
log.Fatal(http.ListenAndServe(":8080", nil))
}

func handlerSearch(w http.ResponseWriter, r *http.Request) {
fmt.Println("Received one request for search")
lat := r.URL.Query().Get("lat")
lon := r.URL.Query().Get("lon")

fmt.Fprintf(w, "Search received: %s %s", lat, lon)
}

Step 1.3.7 return a JSON object. Change handlerSearch to be

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
Import (
…
"strconv"
)

const (
DISTANCE = "200km"
)

func handlerSearch(w http.ResponseWriter, r *http.Request) {
fmt.Println("Received one request for search")
lat, _ := strconv.ParseFloat(r.URL.Query().Get("lat"), 64)
lon, _ := strconv.ParseFloat(r.URL.Query().Get("lon"), 64)
// range is optional
ran := DISTANCE
if val := r.URL.Query().Get("range"); val != "" {
ran = val + "km"
}

fmt.Println("range is ", ran)

// Return a fake post
p := &Post{
User:"1111",
Message:"一生必去的100个地方",
Location: Location{
Lat:lat,
Lon:lon,
},
}

js, err := json.Marshal(p)
if err != nil {
panic(err)
return
}

w.Header().Set("Content-Type", "application/json")
w.Write(js)
}

在Go里面,变量必须得用到,不然会报错,除非用下划线做变量名

Step 1.3.8 Open Postman, url is http://localhost:8080/search?lat=10.0&lon=20.0&range=300 click send again. You should see something like this

1
2
3
4
5
6
7
8
{
"user": "1111",
"message": "一生必去的100个地方",
"location": {
"lat": 10,
"lon": 20
}
}
123

Anda Luo

Notes of study, work and life
29 posts
2 categories
28 tags
GitHub E-Mail
© 2019 Anda Luo | 277k | 4:12
Powered by Hexo v3.8.0
|
Theme – NexT.Muse v7.0.1