Posts

Design Web Crawler

Functional Requirements: download all webpages addressed by the URLs; generate a reverse index of words to pages for the search engine; generate a title and snippet; ignore pages with duplicate content; prioritise URLs.
Non-Functional Requirements: high availability; scalability using parallelisation; robustness (handle unresponsive servers, crashes, malicious links, and bad HTML); politeness (the crawler should not make too many requests to a website within a short period of time).
Details: The HTML Parser and Content Parser services are workers that run a thread for each URL. The Content Parser passes a Redis key through the queue, so the next service can fetch the content from Redis and process it further. The Duplicate Content service checks whether the same content ID is already present in the content storage; it may compare hash values or use SimHash, as used by Google. The Reverse Index and Document services are additional consumers of the content. The URL Extractor extracts the URLs from the page. The URL Filter excludes certain type, file extensio...
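
The duplicate-content check mentioned above could be sketched roughly as follows. This is a minimal illustration, assuming an exact SHA-256 fingerprint over whitespace-normalised text; the class name `DuplicateContentChecker` and the in-memory set standing in for the content storage are hypothetical. Catching near-duplicates would need something like SimHash instead.

```python
import hashlib


class DuplicateContentChecker:
    """Hypothetical helper: remembers fingerprints of pages already stored."""

    def __init__(self):
        # In-memory stand-in for a lookup against the content storage.
        self._seen = set()

    @staticmethod
    def fingerprint(content: str) -> str:
        # Collapse whitespace and lowercase so trivially reformatted
        # copies of the same page produce the same hash.
        normalised = " ".join(content.split()).lower()
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def is_duplicate(self, content: str) -> bool:
        # Returns True if this content was seen before; otherwise records it.
        digest = self.fingerprint(content)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False
```

A worker would call `is_duplicate(page_text)` before handing the page to the downstream services and drop the page when it returns `True`.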

Design Chat System

Functional Requirements: login, authorisation, and user profile; one-to-one chat; group chat; online presence; message delivery notification; push notification when the user is offline.
Non-Functional Requirements: high availability; scalability and high throughput; chat history kept forever.
1. Login, Authorisation and User Profile
2. One-to-One Chat
3. Group Chat

Design Notification System

Functional Requirements: push notifications should be sent to subscribers; the system should be able to prioritise notifications (for example, an OTP is the highest-priority notification); the same notification should never be sent twice; retry mechanism; single or bulk notifications; log notification status (dispatched, delivered, seen).
Non-Functional Requirements: the notification system should be highly available and reliable; no duplicate messages; no message should be lost.
High Level Architecture Diagram:
Component-Wise Diagram (as the above diagram is too small to read):
1. Load Balancer and API Gateway
2. Databases
3. Prioritiser and Router
4. Router, Queue, Workers and Third-Party Modules
App registration flow:
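
The prioritisation and no-duplicates requirements can be illustrated with a small in-memory sketch. The priority levels, the `NotificationQueue` class, and its method names are all assumptions made for illustration; a real system would use a durable queue and a shared deduplication store.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is dispatched first.
PRIORITY = {"otp": 0, "transactional": 1, "promotional": 2}


class NotificationQueue:
    """Illustrative priority queue that rejects repeated notification IDs."""

    def __init__(self):
        self._heap = []
        self._seen_ids = set()
        # Monotonic counter keeps FIFO order within one priority level.
        self._counter = itertools.count()

    def enqueue(self, notification_id, kind, payload):
        # Reject a notification whose ID has already been accepted.
        if notification_id in self._seen_ids:
            return False
        self._seen_ids.add(notification_id)
        entry = (PRIORITY[kind], next(self._counter), notification_id, payload)
        heapq.heappush(self._heap, entry)
        return True

    def dequeue(self):
        # Return the highest-priority notification, or None when empty.
        if not self._heap:
            return None
        _, _, notification_id, payload = heapq.heappop(self._heap)
        return notification_id, payload
```

With this sketch, an OTP enqueued after a promotional message is still dequeued first, and enqueuing the same ID twice has no effect.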

Three-phase Commit (3PC)

Prerequisite: read about Two-Phase Commit (2PC) first, then come back here for a better understanding.
Two-phase commit (2PC) is called a blocking atomic commit protocol because it can become stuck waiting for the coordinator to recover.
3PC assumes a network with bounded delay and nodes with bounded response times.
In general, non-blocking atomic commit requires a perfect failure detector, i.e. a reliable mechanism for telling whether a node has crashed or not.
In a network with unbounded delay, a timeout is not a reliable failure detector, because a request may time out due to a network problem even if no node has crashed.
For this reason, 2PC continues to be used, despite the known problem with coordinator failure.
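
To make the point about timeouts concrete, here is a tiny sketch of a timeout-based failure detector. The function name `suspects_crash` and its parameters are hypothetical; the point is only that the check cannot distinguish a crashed node from a slow network.

```python
def suspects_crash(last_heard_at: float, now: float, timeout: float) -> bool:
    # A timeout-based detector only knows how long it has been silent;
    # it cannot tell whether the node crashed or the reply is delayed.
    return (now - last_heard_at) > timeout


# A healthy node whose reply is delayed by 5 seconds is wrongly
# suspected when the timeout is 2 seconds.
delayed_but_alive = suspects_crash(last_heard_at=0.0, now=5.0, timeout=2.0)
```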

Two-Phase Commit (2PC)

Two-phase commit (2PC) is an algorithm for achieving atomic transaction commit across multiple nodes. 2PC is used internally in some databases and is also made available to applications in the form of XA (eXtended Architecture) transactions. XA transactions are supported by the Java Transaction API (JTA) and, for SOAP web services, via WS-AtomicTransaction.
The commit/abort process in 2PC is split into two phases.
Coordinator or Transaction Manager: the coordinator is often implemented as a library within the same application process that is requesting the transaction, but it can also be a separate process or service.
Phase 1: when the application is ready to commit, the coordinator begins phase 1. It sends a prepare request to each node, asking whether it is able to commit. The coordinator then tracks the responses from the participants.
Phase 2: if all participants reply “yes”, indicating they are ready to commit, th...
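
The two phases can be sketched as a toy, in-memory coordinator. The `Participant` class and `two_phase_commit` function are hypothetical names, and the abort branch follows the textbook description of 2PC rather than anything shown in the excerpt above. The sketch ignores durable logging, timeouts, and coordinator recovery, which a real implementation needs.

```python
class Participant:
    """Hypothetical in-memory participant; a real one would persist its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Vote yes only if this node is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: send a prepare request to every participant and collect votes.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote was "yes"; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

Calling `two_phase_commit([Participant("a"), Participant("b", can_commit=False)])` aborts on both nodes, since a single "no" vote is enough to abort the transaction.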