The document discusses LinkedIn's communication architecture and network updates service. It describes how LinkedIn built scalable communication platforms to support its large professional network. The system evolved from handling 0 to 22 million members. It uses Java, databases like Oracle and MySQL, application servers like Tomcat and Jetty, and technologies like ActiveMQ, Lucene and Spring. The communication service handles messages and email delivery while the network updates service distributes short-lived notifications across LinkedIn's various clients and services.
2. Learn how we at LinkedIn built and evolved a scalable
communication platform for the world’s largest
professional network
2008 JavaOneSM Conference | java.sun.com/javaone | 2
3. Agenda
Why are we doing this talk
LinkedIn Communication Platform at a glance
• Evolution of LinkedIn Communication System
• Evolution of the Network Updates System
Scaling the system: from 0 to 22M members
Q&A
4. Why are we doing this talk?
Share our experience in building the world’s largest
professional network in Java™
Describe the evolution of the communication platform
Share lessons we learned so you could benefit from our
successes, mistakes and experience
8. LinkedIn Communication Platform
The Numbers
22M members
130M connections
2M email messages per day
250K invitations per day
9. LinkedIn Communication Platform
The Setup
Sun™ x86 platform and Sparc production hardware
running Solaris™ Operating System
100% Java programming language
Tomcat and Jetty as application servers
Oracle and MySQL as DBs
ActiveMQ for JMS
Lucene as a foundation for search
Spring as glue
Mac for development
10. LinkedIn Communication Platform
The Communication Service
• Permanent message storage
• InBox messages
• Emails
• Batching, delayed delivery
• Bounce, cancellation
• Actionable content
• Rich email content
The network updates service
• Short-lived notifications (events)
• Distribution across various affiliations and groups
• Time decay
• Events grouping and prioritization
11. The Communication Service
How is it different:
• Workflow oriented
• Messages reference other objects in the system
• Incorporates email delivery
• Batching of messages
• Message cancellation
• Delayed delivery, customer service review queues, abuse controls
• Supports reminders and bounce notifications to users
Has undergone continuous improvements throughout the life
of LinkedIn
14. The Communication Service
Message Creation
• Clients post messages via asynchronous Java Communications API using JMS
• Messages then are routed via routing service to the appropriate mailbox or directly for email processing
• Multiple member or guest databases are supported
Message Delivery
• Message delivery is triggered by clients or by scheduled processes
• Delivery actions are asynchronous
• Messages can be batched for delivery into a single email message
• Message content is processed through the JavaServer Page™ (JSP™) technology for pretty formatting
• The scheduler can take into account the time, delivery preferences, system load
• Bounced messages are processed and redelivered if needed
• Reminder system works the same way as message delivery system
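The batching step above ("messages can be batched for delivery into a single email message") can be illustrated with a minimal sketch. It is not LinkedIn's code: `PendingMessage`, `MessageBatcher`, and the plain-text digest format are hypothetical names chosen for illustration, and the slide's JSP rendering is replaced by simple string building.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pending message; not a LinkedIn class. */
final class PendingMessage {
    final String recipientEmail;
    final String subject;

    PendingMessage(String recipientEmail, String subject) {
        this.recipientEmail = recipientEmail;
        this.subject = subject;
    }
}

/** Sketch of batching: one outgoing email per recipient instead of one per message. */
final class MessageBatcher {
    /** Groups pending messages by recipient, preserving arrival order. */
    static Map<String, List<PendingMessage>> groupByRecipient(List<PendingMessage> pending) {
        Map<String, List<PendingMessage>> batches = new LinkedHashMap<>();
        for (PendingMessage m : pending) {
            batches.computeIfAbsent(m.recipientEmail, k -> new ArrayList<>()).add(m);
        }
        return batches;
    }

    /** Renders one plain-text digest body for a recipient's batch (format is assumed). */
    static String renderDigest(List<PendingMessage> batch) {
        StringBuilder body = new StringBuilder();
        body.append("You have ").append(batch.size()).append(" new message(s):\n");
        for (PendingMessage m : batch) {
            body.append("- ").append(m.subject).append('\n');
        }
        return body.toString();
    }
}
```

A scheduled process could call `groupByRecipient` on whatever is pending and send one digest per map entry; how the real scheduler weighs time, preferences, and load is not shown on the slide.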
15. The Communication Service
SOA architecture
Wireable components built around LinkedIn Spring
extensions
Spring HTTP-RPC
Heavy use of JMS and asynchronous flows
16. The Communication Service
Failure Recovery
Possible failures:
• Messages can bounce
• Messages can get lost:
• Database problems
• Bugs in the code
• Bugs in the content processing of emails
• Various services may become unavailable
Avoiding the downtime
17. The Communication Service
How do we scale it?
Functional partitioning:
• sent, received, archived, etc.
Class partitioning:
• Member mailboxes, guest mailboxes, corporate mailboxes
Range partitioning:
• Member ID range
• Email lexicographical range
Asynchronous flows
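Range partitioning by member ID, as listed above, amounts to a lookup from an ID to the database that owns its range. The sketch below shows one way this might look; `MemberRangeRouter`, the database names, and the range boundaries are all assumptions, not details from the talk.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of range partitioning: each database owns a contiguous range of member IDs. */
final class MemberRangeRouter {
    // Maps the lowest member ID of each range to a (hypothetical) database name.
    private final TreeMap<Long, String> lowerBoundToDb = new TreeMap<>();

    void addRange(long lowestMemberId, String dbName) {
        lowerBoundToDb.put(lowestMemberId, dbName);
    }

    /** Returns the database whose range contains the given member ID. */
    String dbFor(long memberId) {
        // floorEntry finds the greatest lower bound that is <= memberId.
        Map.Entry<Long, String> entry = lowerBoundToDb.floorEntry(memberId);
        if (entry == null) {
            throw new IllegalArgumentException("No partition covers member " + memberId);
        }
        return entry.getValue();
    }
}
```

Adding a new range only requires one more `addRange` call, which is one reason a sorted map is a convenient fit; the same shape would work for the lexicographical email ranges with `String` keys.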
18. Network Updates Service
What is your network up to?
The goal is to have a flexible service for distributing many
types of short-lived updates
Availability across a number of clients (web apps, RSS,
API, LinkedIn Mobile, third-party…)
19. Network Updates Service
Motivation
Homepage circa 2007
Poor UI
• Cluttered
• Where does new content go?
Poor Backend Integration
• Many different service calls
• Takes a long time to gather all
of the data
20. Network Updates Service
Motivation
Homepage circa 2008
Clean UI
• Eliminates contention for
homepage real estate
Clean Backend
• Single call to fetch updates
• Consistent update format
21. Network Updates Service
Iteration 1
Move existing homepage logic into a remote service,
refactor homepage to use the new service
Advantages
• No user impact while API is being finalized
• Improve performance by fetching updates in parallel
• Reduce complexity of the web app
• Updates become easily accessible to other clients
23. Network Updates Service
Iteration 1 - API
Pull-based architecture
Collectors
• Responsible for gathering data
• Parallel collection to improve performance
Resolvers
• Fetch state, batch lookup queries, etc…
• Use EHCache to cache global data (e.g., member info)
Rendering
• Transform each object into its XML representation
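The "parallel collection" idea for collectors can be sketched with the standard `java.util.concurrent` executor API. This is an illustration under assumptions: each collector is modeled as a `Callable` returning a list of update strings, and `ParallelCollectors` is a hypothetical name; the talk does not show the real collector interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch of pull-based collection where all collectors run concurrently. */
final class ParallelCollectors {
    /** Runs every collector in parallel and concatenates results in collector order. */
    static List<String> collectAll(List<Callable<List<String>>> collectors) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, collectors.size()));
        try {
            List<String> updates = new ArrayList<>();
            // invokeAll blocks until every collector has completed.
            for (Future<List<String>> result : pool.invokeAll(collectors)) {
                updates.addAll(result.get());
            }
            return updates;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while collecting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A collector failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the total latency is roughly that of the slowest collector rather than the sum of all of them. A production version would likely reuse a shared pool and apply timeouts, which the sketch omits.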
26. Network Updates Service
Iteration 1
Lessons learned:
• Centralizing updates into a single service leaves a single point of
failure
• Be prepared to spend time tuning the HttpConnectionManager
(timeouts, max connections)
• While the system was stabilizing, it was affecting all users; should
have rolled the new service out to a small subset!
• Don’t use “Least Frequently Used” (LFU) in a large EHCache—very
bad performance!
27. Network Updates Service
Iteration 2
Hollywood Principle: “Don’t call me, I’ll call you”
Push update when an event occurs
Reading is much quicker since we don’t have to search for
the data!
Tradeoffs
• Distributed updates may never be read
• More storage space needed
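The push model and its tradeoffs can be made concrete with a small in-memory sketch. `PushUpdateStore` and its maps are hypothetical stand-ins for the connection graph and per-member storage; nothing here reflects the actual schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of push-on-write: an update is copied to each connection when the event occurs. */
final class PushUpdateStore {
    // Hypothetical in-memory stand-ins for the connection graph and per-member storage.
    private final Map<Long, List<Long>> connections = new HashMap<>();
    private final Map<Long, List<String>> inboxes = new HashMap<>();

    void connect(long a, long b) {
        connections.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        connections.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Write path: one stored copy per connection, whether or not it is ever read. */
    void publish(long actorId, String update) {
        for (long target : connections.getOrDefault(actorId, Collections.emptyList())) {
            inboxes.computeIfAbsent(target, k -> new ArrayList<>()).add(update);
        }
    }

    /** Read path: a single lookup, with no searching across the network. */
    List<String> read(long memberId) {
        return inboxes.getOrDefault(memberId, Collections.emptyList());
    }
}
```

The sketch shows both tradeoffs from the slide: `publish` stores as many copies as the actor has connections, and any copy whose owner never calls `read` is wasted work.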
30. Network Updates Service
Iteration 2
Pushing Updates
• Updates are delivered via JMS
• Aggregate data stored in 1 CLOB column for each target user
• Incoming updates are merged into the aggregate structure using
optimistic locking to avoid lock contention
Reading Updates
• Add a new collector that reads from the Update Database
• Use Digesters to perform arbitrary transformations on the stream
of updates (e.g., collapse 10 updates from a user into 1)
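The optimistic-locking merge mentioned under "Pushing Updates" can be sketched without a database. The example below assumes a version counter per row and a retry loop; `VersionedRow`, `Snapshot`, and `OptimisticMerger` are hypothetical names, and the in-memory row stands in for a conditional SQL update of the form "update only if the version is unchanged".

```java
import java.util.function.UnaryOperator;

/** Immutable view of a row: the aggregate data plus the version it was read at. */
final class Snapshot {
    final String aggregate;
    final long version;

    Snapshot(String aggregate, long version) {
        this.aggregate = aggregate;
        this.version = version;
    }
}

/** Hypothetical stand-in for one database row holding a member's aggregate updates. */
final class VersionedRow {
    private Snapshot current = new Snapshot("", 0);

    synchronized Snapshot read() {
        return current;
    }

    /** Succeeds only if nobody else has written since expectedVersion was read. */
    synchronized boolean writeIfUnchanged(long expectedVersion, String newAggregate) {
        if (current.version != expectedVersion) {
            return false; // another writer got there first
        }
        current = new Snapshot(newAggregate, expectedVersion + 1);
        return true;
    }
}

final class OptimisticMerger {
    /** Merges without holding a lock across read and write; returns the attempts needed. */
    static int merge(VersionedRow row, UnaryOperator<String> mergeFn) {
        int attempts = 0;
        while (true) {
            attempts++;
            Snapshot seen = row.read();
            String merged = mergeFn.apply(seen.aggregate);
            if (row.writeIfUnchanged(seen.version, merged)) {
                return attempts;
            }
            // Conflict: re-read the newer aggregate and merge again.
        }
    }
}
```

No lock is held while the merge function runs, so writers never block each other; the cost is an occasional retry when two updates for the same member arrive together.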
31. Network Updates Service
Iteration 2
Lessons learned:
• Underestimated the volume of updates to be processed
• CLOB block size was set to 8k, leading to a lot of wasted space
(which isn’t reclaimed!)
• Real-time monitoring/configuration with Java Management
Extension (JMX™) specification was extremely helpful
32. Network Updates Service
Iteration 3
Updating a CLOB is expensive
Goal: Minimize the number of CLOB updates
• Use an overflow buffer
• Reduce the size of the updates
33. Network Updates Service
Iteration 3 - Overflow Buffer
Add VARCHAR(4000) column
that acts as a buffer
When the buffer is full, dump it
to the CLOB and reset
Avoids over 90% of CLOB
updates (depending on type),
while still retaining the
flexibility for more storage
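The overflow buffer can be simulated in memory to see why it cuts CLOB writes. This is a sketch under assumptions: `UpdateRowSketch` is a hypothetical class, capacity is counted in characters rather than bytes, and the overflowing update is written to the CLOB together with the buffered ones, a detail the slide does not specify.

```java
/** Sketch of a row with a small cheap buffer column in front of an expensive CLOB column. */
final class UpdateRowSketch {
    static final int BUFFER_CAPACITY = 4000; // mirrors the VARCHAR(4000) column

    private final StringBuilder buffer = new StringBuilder(); // cheap to update
    private final StringBuilder clob = new StringBuilder();   // expensive to update
    private int clobWrites = 0;

    void append(String update) {
        if (buffer.length() + update.length() > BUFFER_CAPACITY) {
            // Buffer would overflow: move everything to the CLOB in one write, then reset.
            clob.append(buffer).append(update);
            buffer.setLength(0);
            clobWrites++;
        } else {
            buffer.append(update);
        }
    }

    int clobWrites() { return clobWrites; }
    int bufferLength() { return buffer.length(); }
    int clobLength() { return clob.length(); }
}
```

With small updates, most calls to `append` touch only the buffer, and the CLOB is written once per buffer-full of data. The actual savings depend on update size, consistent with the slide's "depending on type" caveat.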
34. Scaling the system
What you learn as you scale:
• A single database does not work
• Referential integrity will not be possible
• Cost becomes a factor: databases, hardware, licenses, storage, power
• Any data loss is a problem
• Data warehousing and analytics becomes a problem
• Your system becomes a target for spamming exploits, data scraping, etc.
What to do:
• Partition everything:
• by user groups
• by domain
• by function
• Caching is good even when it’s only modestly effective
• Give up on 100% data integrity
• Build for asynchronous flows
• Build with reporting in mind
• Expect your system to fail at any point
• Never underestimate growth trajectory
35. LinkedIn Communication Architecture
Build with scalability in mind: you never know when your
business will take off
Expect to do a lot of architecture and code refactoring as
you learn and appreciate growth challenges
37. The Communication Service
LinkedIn Spring Extensions
Automatic context instantiation from multiple spring files
LinkedIn Spring Components
Property expansion
Automatic termination handling
Support for Builder Pattern
Custom property editors:
• Timespan (30s, 4h34m, etc.)
• Memory Size, etc.
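A custom property editor for timespans such as "30s" or "4h34m" might look like the sketch below. It builds on the JDK's `java.beans.PropertyEditorSupport`, the base class Spring property editors typically extend. `TimespanEditor`, the choice of `java.time.Duration` as the target type, and the accepted units (h, m, s) are assumptions; the actual LinkedIn editor is not shown in the slides.

```java
import java.beans.PropertyEditorSupport;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical editor that converts text such as "30s" or "4h34m" into a Duration. */
final class TimespanEditor extends PropertyEditorSupport {
    // One number followed by one unit letter; units other than h, m, s are not handled.
    private static final Pattern PART = Pattern.compile("(\\d+)([hms])");

    @Override
    public void setAsText(String text) {
        setValue(parse(text));
    }

    static Duration parse(String text) {
        Matcher m = PART.matcher(text);
        Duration total = Duration.ZERO;
        int consumed = 0;
        // Accept only parts that follow each other with no gaps.
        while (m.find() && m.start() == consumed) {
            long amount = Long.parseLong(m.group(1));
            switch (m.group(2).charAt(0)) {
                case 'h': total = total.plusHours(amount); break;
                case 'm': total = total.plusMinutes(amount); break;
                default:  total = total.plusSeconds(amount); break;
            }
            consumed = m.end();
        }
        if (consumed == 0 || consumed != text.length()) {
            throw new IllegalArgumentException("Not a timespan: " + text);
        }
        return total;
    }
}
```

Once such an editor is registered with the container, a configuration value like `4h34m` can be written as plain text in a properties or bean file and arrive in the bean as a typed value.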
38. The Communication Service
LinkedIn Spring Extensions
Comm-server/
cmpt/
components/
ccsServiceExporter.spring
comm.spring
jmx.spring
comm-server.properties
corpMboxServiceExporter.spring
main.spring
comm-server.spring
memberMboxServiceExporter.spring
comm.properties
guestMboxServiceExporter.spring
build.xml
impl/
…
39. The Communication Service
LinkedIn Spring Extensions
…
<bean id="resolver"
    class="com.linkedin.comm.pub.impl.MessageAddressResolver">
  <lin:config>
    <property name="resolverDB" ref="resolverDB"/>
    <property name="eos" ref="eos"/>
    <property name="els" ref="eos"/>
    <property name="memberAccessor"
        ref="coreMemberAccessor"/>
  </lin:config>
</bean>
…
40. The Communication Service
LinkedIn Spring Extensions (Builder)
private final MessageAddressResolverDB _resolverDB;
…
MessageAddressResolver(Config config) { … }
…
public static class Config {
    private MessageAddressResolverDB _resolverDB;

    public MessageAddressResolverDB getResolverDB() {
        return ConfigHelper.getRequired(_resolverDB);
    }

    public void setResolverDB(MessageAddressResolverDB resolverDB) {
        _resolverDB = resolverDB;
    }
}/*Config*/
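The fragment above relies on a `ConfigHelper.getRequired` call whose body is not shown. The self-contained sketch below illustrates how the pattern might fit together, assuming `getRequired` simply rejects unset values. `ConfigHelperSketch` and `ResolverSketch` are hypothetical names, and a `String` stands in for `MessageAddressResolverDB`.

```java
/** Hypothetical stand-in for the ConfigHelper referenced on the slide. */
final class ConfigHelperSketch {
    /** Returns the value, or fails fast if the property was never wired. */
    static <T> T getRequired(T value) {
        if (value == null) {
            throw new IllegalStateException("Required property was not set");
        }
        return value;
    }
}

/** Sketch of a component built from a mutable Config into an immutable object. */
final class ResolverSketch {
    private final String resolverDb; // final: cannot change after construction

    ResolverSketch(Config config) {
        // Incomplete wiring surfaces here, at construction time.
        this.resolverDb = config.getResolverDb();
    }

    String resolverDb() {
        return resolverDb;
    }

    /** Mutable holder that a container can populate through setters. */
    static final class Config {
        private String resolverDb;

        String getResolverDb() {
            return ConfigHelperSketch.getRequired(resolverDb);
        }

        void setResolverDb(String resolverDb) {
            this.resolverDb = resolverDb;
        }
    }
}
```

The appeal of this shape is that setter-based injection is confined to `Config`, while the component itself keeps `final` fields and never exists in a half-configured state.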
41. The Communication Service
LinkedIn Spring Extensions (Components)
…
<lin:component
    id="remoteContentCommunicationService"
    location="comm-server-client-cmpt">
  <lin:wire property-name="activemq.producer.brokerURL"
      property-value="${activemq.producer.brokerURL}"/>
  <lin:wire property-name="comm.server.httpRpc.url"
      property-value="${leo.comm.server.httpRpc.url}"/>
</lin:component>
…