Sapien Blogs

Know the Unknown

Thursday Sep 26, 2013

Java socket properties for tuning and resiliency

Socket Connection Timeout

See Socket Api
public void connect(SocketAddress endpoint,int timeout) throws IOException

Socket Read Timeout / SoTimeout

The timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets.
See Socket API
public void setSoTimeout(int timeout) throws SocketException
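
The two timeouts above can be tried out together. The following is a minimal sketch, not a production client: the class name SocketTimeoutDemo and the timeout values are made up for illustration, and the "server" is just a local ServerSocket that never writes, so the read has nothing to return.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SocketTimeoutDemo {

    /**
     * Connects to a local server socket that never sends data, then tries to read.
     * Returns true if the read gave up with a SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket()) {
            // Connection timeout: limits how long connect() may block.
            socket.connect(
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort()),
                    connectTimeoutMs);
            // Read timeout: limits how long a single read() may block.
            socket.setSoTimeout(readTimeoutMs);
            InputStream in = socket.getInputStream();
            try {
                in.read();
                return false;
            } catch (SocketTimeoutException e) {
                // The socket is still usable after a read timeout; only this read failed.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```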

TCPNoDelay
OOBInline
Send BufferSize
Receive BufferSize

Socket Keep Alive

You can't send a keep-alive on demand, and on most operating systems the TCP keep-alive timeout is only configurable system-wide and is set far too high to be generally useful for applications. In Java you can turn on keep-alive for a socket, but you can't configure the keep-alive timing. SO_KEEPALIVE is implemented in the OS network protocol stack without sending any "real" data. The keep-alive interval is operating system dependent, and may be tunable via a kernel parameter.

See Socket Api
public void setKeepAlive(boolean on) throws SocketException
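
As a rough illustration, keep-alive and the other options listed earlier (TCPNoDelay, OOBInline, send and receive buffer sizes) can be set on a socket before it is connected. The class name and the 64 KB buffer sizes below are assumptions chosen for the example; buffer sizes are only hints to the OS.

```java
import java.io.IOException;
import java.net.Socket;

class SocketOptionsDemo {

    /** Creates an unconnected socket with the tuning options applied. */
    static Socket newTunedSocket() throws IOException {
        Socket socket = new Socket();            // not connected yet
        socket.setKeepAlive(true);               // let the OS probe idle connections
        socket.setTcpNoDelay(true);              // disable Nagle's algorithm
        socket.setOOBInline(false);              // discard TCP urgent data (the default)
        socket.setSendBufferSize(64 * 1024);     // hint only; the OS may adjust it
        socket.setReceiveBufferSize(64 * 1024);  // hint only; the OS may adjust it
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newTunedSocket()) {
            System.out.println("keepAlive=" + socket.getKeepAlive()
                    + " tcpNoDelay=" + socket.getTcpNoDelay());
        }
    }
}
```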

Java 7 Sockets Direct Protocol (SDP)

SDP lets Java write directly to InfiniBand. By configuring the Java VM's join point to the InfiniBand OS device driver and libraries (InfiniBand's VERBs layer API), application code that uses java.net.* and java.io.*, which is Java's API for transport-layer OS resources (OSI layer 4), can bypass the traditional network protocol stack (OSI layers 3 and 2) and go directly to InfiniBand (OSI layer 1). The performance implications and payoffs are very significant.
http://www.infoq.com/articles/Java-7-Sockets-Direct-Protocol

HttpURLConnection

HTTP keep-alive is used for HTTP connection reuse, also called persistent connections. If keep-alive is enabled, then after an HTTP request-response is completed, Java will reuse the underlying socket/TCP connection for another HTTP request-response. Keep-alive is requested with the "Connection: keep-alive" header (in Java it is controlled by the http.keepAlive system property, which defaults to true) and terminated by the "Connection: close" header. To determine the end of a response, each response should have a Content-Length header set or, in the case of chunked transfer encoding, each chunk should start with its size. Java will clean up the connection and reuse the socket after the application has finished reading the response body or has called close() on the InputStream returned by the URLConnection. See more details - http://docs.oracle.com/javase/6/docs/technotes/guides/net/http-keepalive.html
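
A sketch of a keep-alive friendly client follows. The idea is simply to read the response body to the end and close the stream, so the JDK can return the connection to its cache. The class and method names and the timeout values are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class KeepAliveFriendlyClient {

    /** Reads the stream to the end and closes it; returns the number of bytes read. */
    static long drainAndClose(InputStream in) throws IOException {
        long total = 0;
        byte[] buffer = new byte[4096];
        try (InputStream body = in) {
            int n;
            while ((n = body.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Issues a GET and fully consumes the body (or the error body). */
    static long fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(5000);
        int status = conn.getResponseCode();
        // Error responses carry their body on the error stream; drain that too.
        InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        return body == null ? 0 : drainAndClose(body);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: KeepAliveFriendlyClient <url>");
            return;
        }
        System.out.println(fetch(URI.create(args[0]).toURL()) + " bytes read");
    }
}
```

Calling disconnect() on the HttpURLConnection instead would discourage reuse, so the sketch deliberately avoids it.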

JDBC Connection

MySQL JDBC driver - see the connection timeout and read timeout connection properties (connectTimeout and socketTimeout) in http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html
Oracle - oracle.jdbc.ReadTimeout property and others- see driver documentation
SQLServer - see jTDS driver or Microsoft driver documentation.
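
As a hedged sketch, driver timeouts are usually passed as connection properties. The property names below (connectTimeout and socketTimeout for MySQL Connector/J, oracle.jdbc.ReadTimeout for Oracle) come from the driver documentation; the helper class, method names and values are made up for illustration, and no driver or database is needed to run it.

```java
import java.util.Properties;

class JdbcTimeoutProperties {

    /** MySQL Connector/J: both values are in milliseconds. */
    static Properties mysqlTimeouts(int connectTimeoutMs, int socketTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("connectTimeout", String.valueOf(connectTimeoutMs));
        props.setProperty("socketTimeout", String.valueOf(socketTimeoutMs));
        return props;
    }

    /** Oracle thin driver: read timeout on the underlying socket, in milliseconds. */
    static Properties oracleTimeouts(int readTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("oracle.jdbc.ReadTimeout", String.valueOf(readTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // With a driver on the classpath these would be passed, together with
        // user and password entries, to DriverManager.getConnection(url, props).
        System.out.println(mysqlTimeouts(3000, 10000));
        System.out.println(oracleTimeouts(10000));
    }
}
```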

Half Open, Half-Closed Connections

http://www.codeproject.com/Articles/37490/Detection-of-Half-Open-Dropped-TCP-IP-Socket-Conne
http://www.pcvr.nl/tcpip/tcp_conn.htm

See the socket API below. shutdownOutput() causes a FIN to be sent to the other side, as a one-way close.
public void shutdownOutput() throws IOException

See also shutdownInput(), which places the socket's InputStream at end of stream. Any data sent to the input side of the socket after that is acknowledged and then silently discarded.
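
A half-close can be demonstrated on the loopback interface. In this sketch (class and method names are illustrative), the client sends a request, calls shutdownOutput() to signal "finished sending", and can still read the reply because only one direction was closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class HalfCloseDemo {

    /** Reads until end of stream, i.e. until the peer's FIN arrives. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Sends the request, half-closes, and returns the upper-cased reply. */
    static String roundTrip(String request) throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // readAll returns once the client has called shutdownOutput().
                    String text = new String(readAll(peer.getInputStream()), StandardCharsets.UTF_8);
                    peer.getOutputStream().write(
                            text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            echo.start();
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput();   // sends FIN; the input side stays open
                String reply = new String(readAll(client.getInputStream()), StandardCharsets.UTF_8);
                echo.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```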

Linger Option - Orderly vs. Abortive connection close in Java

Orderly close - calling socket.close() without a 0 linger timeout shuts the socket down in an orderly way. This sends a FIN to the other side of the TCP connection, which means this side has finished sending. However, the other side may continue sending data, because only one direction of the stream is closed. Since the socket is closed on this side, any data the other side sends after the close is lost. One option is to set a linger timeout of 0 before close(), which abandons the socket by stopping both reads and writes (any data still in the write buffer may not be sent). The cleaner option is to wait for reads to finish before calling close(). Otherwise, consider calling shutdownOutput().
Abortive close - call setSoLinger(true, 0) before calling close(). This sends an RST instead of a FIN to the other side of the TCP connection.
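
The two styles of close might look like this in code. This is a sketch with made-up helper names; closeOrderly assumes a connected socket whose peer eventually closes its side.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

class LingerDemo {

    /** Orderly close: send FIN, drain whatever the peer still sends, then close. */
    static void closeOrderly(Socket socket) throws IOException {
        socket.shutdownOutput();
        InputStream in = socket.getInputStream();
        byte[] scratch = new byte[1024];
        while (in.read(scratch) != -1) {
            // discard remaining data until the peer closes its side
        }
        socket.close();
    }

    /** Abortive close: a linger time of 0 makes close() reset the connection. */
    static void closeAbortively(Socket socket) throws IOException {
        socket.setSoLinger(true, 0);
        socket.close();
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) {
            socket.setSoLinger(true, 0);
            System.out.println("linger seconds: " + socket.getSoLinger());
        }
    }
}
```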

See
http://docs.oracle.com/javase/7/docs/technotes/guides/net/articles/connection_release.html
http://ia600609.us.archive.org/22/items/TheUltimateSo_lingerPageOrWhyIsMyTcpNotReliable/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.html

TCP Lifecycle

States on the remote (client)* and local (server)* sides, in the order they occur
*or vice-versa, though requests typically originate at clients.

Server, 2 Listening: Awaiting connection request.
Client, 3 Syn-Sent: Sent connection-request.1. Awaiting acknowledgement.1. Awaiting connection-request.2. Received acknowledgement.1. Received connection-request.2. Sent acknowledgement.2.
Server, 4 Syn-Received: Received connection-request.1. Sent acknowledgement.1. Sent connection-request.2. Awaiting acknowledgement.2.
Client, 5 Established: The connection is open. Data moves both directions.
Server, 5 Established: Received acknowledgement.2. The connection is open. Data moves both directions.
Client, 6 Fin-Wait.1: Sent close-request.a. Awaiting acknowledgement.a. Awaiting close-request.b.
Server, 8 Close-Wait: Received close-request.a. Sent acknowledgement.a. When finished sending data, will send close-request.b.
Client, 7 Fin-Wait.2: Received acknowledgement.a. Still awaiting close-request.b.
or
Client, 10 Closing: Received close-request.b. Sent acknowledgement.b. Still awaiting acknowledgement.a.
Server, 9 Last-Ack: Sent close-request.b. Awaiting acknowledgement.b.
Client, 11 Time-Wait: Received acknowledgement.a. Received close-request.b. Sent acknowledgement.b. Allowing time for delivery of acknowledgement.b.
1 Closed: A "fictional" state; there is no connection.
Server, 2 Listening: Awaiting connection request.

References

  1. http://en.wikipedia.org/wiki/Transmission_Control_Protocol
  2. http://en.wikipedia.org/wiki/Embryonic_connection
  3. http://docs.oracle.com/javase/7/docs/technotes/guides/net/articles/connection_release.html
  4. http://ia600609.us.archive.org/22/items/TheUltimateSo_lingerPageOrWhyIsMyTcpNotReliable/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable.html
  5. http://docs.oracle.com/javase/6/docs/technotes/guides/net/http-keepalive.html
  6. http://www.sdsusa.com/connections
  7. https://forums.oracle.com/thread/2231493
  8. http://www.infoq.com/articles/Java-7-Sockets-Direct-Protocol
  9. http://www.codeproject.com/Articles/37490/Detection-of-Half-Open-Dropped-TCP-IP-Socket-Conne
  10. http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html

Tuesday Sep 24, 2013

How atomic variables work in Java

Locking is expensive in JVMs. If the thread that acquired a lock is running long logic, or is held up by a page fault or some other scheduling delay, the waiting threads cannot run. When many threads try to acquire the same lock, performance suffers from heavy contention. Locks have another, bigger issue: deadlock caused by bad logic or programming. Deadlocks are often not discovered in the initial stages of development, but surface once the code base has grown to a certain size or the number of users has grown to a certain extent. At that point, fixing a complicated deadlock can be a tough nut to crack.
Volatile variables are more efficient than locking. However, they are not suitable for common read-modify-write operations.

Hardware Primitives

Most modern processors include support for multiprocessing, like the ability for multiple processors to share peripherals and main memory, and also the instructions for updating shared variables in a way that can either detect or prevent concurrent access from other processors.

Compare and swap (CAS) and Lock-Free algorithms

In modern microprocessors, a CAS operation includes three operands -- a memory location (V), the expected old value (A), and a new value (B). The processor will atomically update the location to the new value if the value that is there matches the expected old value, otherwise it will do nothing. In either case, it returns the value that was at that location prior to the CAS instruction. Instructions like CAS allow an algorithm to execute a read-modify-write sequence without fear of another thread modifying the variable in the meantime, because if another thread did modify the variable, the CAS would detect it (and fail) and the algorithm could retry the operation.

Concurrent algorithms based on CAS are called lock-free, because threads do not ever have to wait for a lock. Either the CAS operation succeeds or it doesn't, but in either case, it completes in a predictable amount of time. If the CAS fails, the caller can retry the CAS operation or take other action as it sees fit.
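
The retry pattern described above can be sketched with AtomicInteger.compareAndSet. The method addCapped and its cap rule are invented for the example; the point is the loop of read, compute, CAS, and retry on failure.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasRetryDemo {

    /**
     * Atomically adds delta to value without letting it exceed cap.
     * Lock-free: if another thread changed the value between the read
     * and the CAS, the CAS fails and the loop simply tries again.
     */
    static int addCapped(AtomicInteger value, int delta, int cap) {
        while (true) {
            int expected = value.get();                     // read
            int updated = Math.min(expected + delta, cap);  // modify
            if (value.compareAndSet(expected, updated)) {   // write only if unchanged
                return updated;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(8);
        System.out.println(addCapped(value, 5, 10));
    }
}
```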

Atomic Classes

The atomic variable classes in the java.util.concurrent.atomic package all expose a compare-and-set primitive (similar to compare-and-swap), which is implemented using the fastest native construct available on the platform (compare-and-swap, load linked/store conditional, or, in the worst case, spin locks). They offer better performance without the drawbacks of synchronization or locking.
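
For illustration, here is a small counter shared by several threads using AtomicLong, with no synchronized block or lock. The class name and thread counts are arbitrary choices for the sketch.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounterDemo {

    /** Starts the given number of threads, each incrementing a shared counter. */
    static long countWithThreads(int threads, int incrementsPerThread) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet();   // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithThreads(4, 10000));
    }
}
```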

References

  1. http://www.ibm.com/developerworks/library/j-jtp11234/

Monday Sep 23, 2013

Volatile, Synchronized, Final, Static and Singletons in Multi-Threading

Visibility, Ordering and Cache Coherency

One of the key concepts in the JMM (Java Memory Model, JSR 133) is that of visibility -- how do you know that if thread A executes someVariable = 9, other threads will see the value written there by thread A? A number of reasons exist for why another thread might not immediately see the value 9 for someVariable: it could be because the compiler has reordered instructions in order to execute more efficiently, or that someVariable was cached in a register, or that its value was written to the cache on the writing processor but not yet flushed to main memory, or that there is an old (or stale) value in the reading processor's cache. It is the memory model that determines when a thread can reliably "see" writes to variables made by other threads. In particular, the memory model defines semantics for volatile, synchronized, and final that make guarantees of visibility of memory operations across threads.

Cache Coherency is a hardware semantic, not JVM. Not all multiprocessor systems exhibit cache coherency; if one processor has an updated value of a variable in its cache, but one which has not yet been flushed to main memory, other processors may not see that update. In the absence of cache coherency, two different processors may see two different values for the same location in memory.

Volatile

Under the new memory model, when thread A writes to a volatile variable V, and thread B reads from V, any variable values that were visible to A at the time that V was written are guaranteed now to be visible to B. The result is a more useful semantics of volatile, at the cost of a somewhat higher performance penalty for accessing volatile fields.
This means that after thread A writes to volatile V, all of its earlier writes are flushed to main memory so that other threads can see them. Thread B, which subsequently reads V, reloads its variables from main memory because V is volatile, and so can see all the changes thread A flushed.
  • You can use a volatile variable if you want to read and write long and double variables atomically. Both long and double are 64-bit data types, and by default writing a long or double is not guaranteed to be atomic; the behavior is platform dependent.
  • A volatile variable can be used as an alternative to synchronization in some cases, namely when only visibility is needed. With a volatile variable it is guaranteed that all reader threads will see the updated value once the write operation has completed.
  • Declaring a field volatile tells the compiler that the field is accessed by multiple threads, which prevents the compiler from reordering accesses around it or keeping its value cached in a register.
  • The volatile keyword in Java guarantees that the value of a volatile variable will always be read from main memory and not from a thread's local cache.
  • Volatile is different from synchronized because it doesn't provide mutual exclusion; it only gives visibility and ordering guarantees.
See http://cs.oswego.edu/pipermail/concurrency-interest/2012-May/009440.html for an issue with syncing non-volatile variables on a volatile read
See http://javarevisited.blogspot.in/2011/06/volatile-keyword-java-example-tutorial.html for more programming aspects of volatile
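
A minimal sketch of the visibility guarantee: a plain field is written before a volatile flag, and a reader that sees the flag set is guaranteed to also see the plain field. The class and field names are illustrative only.

```java
class VolatilePublishDemo {
    private int payload;             // plain field, written before the flag
    private volatile boolean ready;  // volatile flag that publishes the payload

    void publish(int value) {
        payload = value;  // ordinary write
        ready = true;     // volatile write: makes the earlier write visible too
    }

    int awaitPayload() {
        while (!ready) {  // volatile read: never served from a stale cached value
            Thread.yield();
        }
        return payload;   // guaranteed to see the value written before ready = true
    }

    /** Runs one reader thread and one writer and returns what the reader saw. */
    static int demo(int value) throws InterruptedException {
        VolatilePublishDemo box = new VolatilePublishDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(value);
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(42));
    }
}
```

Without volatile on the flag, the reader loop would be allowed to spin forever or to see a stale payload.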

Synchronized

Synchronization enforces certain memory visibility rules as specified by the JMM. It ensures that caches are flushed when exiting a synchronized block and invalidated when entering one, so that a value written by one thread during a synchronized block protected by a given monitor is visible to any other thread executing a synchronized block protected by that same monitor. It also ensures that the compiler does not move instructions from inside a synchronized block to outside (although it can in some cases move instructions from outside a synchronized block inside). The JMM does not make this guarantee in the absence of synchronization
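
As a sketch of these rules, the counter below guards both the write and the read with the same monitor, so every thread that enters the block sees the latest count. Names and thread counts are arbitrary.

```java
class SynchronizedCounterDemo {
    private final Object lock = new Object();
    private long count;   // only accessed while holding lock

    void increment() {
        synchronized (lock) {   // mutual exclusion plus visibility on entry and exit
            count++;
        }
    }

    long get() {
        synchronized (lock) {   // same monitor, so the latest value is visible
            return count;
        }
    }

    /** Increments one shared counter from several threads and returns the total. */
    static long run(int threads, int incrementsPerThread) throws InterruptedException {
        SynchronizedCounterDemo counter = new SynchronizedCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 10000));
    }
}
```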

Immutable objects, like String, are supposed to require no synchronization. However, because of potential delays in propagating changes in memory writes from one thread to another, there is a possible race condition that would allow a thread to first see one value for an immutable object, and then at some later time see a different value. - See Final

Final

Final fields are a means of what is sometimes called safe publication. Here, "publication" of an object means creating it in one thread and then having that newly created object be referred to by another thread at some point in the future. When the JVM executes the constructor of your object, the pointer to the object could be stored to main memory and accessed before the fields within the object have been committed to main memory. This can lead to another thread seeing the object in a partially constructed state, and it can happen because of compiler reordering. With the final keyword, the JVM ensures that once the object pointer is available to other threads, so are the correct values of that object's final fields (only final fields!).
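
A small sketch of an immutable class whose fields are all final follows. The class ImmutablePoint and the static field latest are made up for the example; the guarantee relies on the constructor not leaking a reference to the object being built.

```java
final class ImmutablePoint {
    private final int x;   // final: correctly visible once the reference is visible
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
        // 'this' is not handed to any other thread from inside the constructor.
    }

    int x() { return x; }
    int y() { return y; }

    // A plain (non-volatile) field used to hand the object to other threads.
    // A reader may see null or an older point, but never a point with
    // default (zero) values in place of its final fields.
    static ImmutablePoint latest;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> latest = new ImmutablePoint(3, 4));
        writer.start();
        writer.join();
        System.out.println(latest.x() + "," + latest.y());
    }
}
```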

Static 

Static fields are class-level fields, so there is a single copy per class, but each thread/CPU may be working with its own cached copy of the value in a thread-local or CPU cache. Use the synchronized or volatile keywords if changes to a static field must be visible across threads.

Singleton objects

A singleton is not a special object in the JMM. Threads may work with their own cached copies of the singleton's fields. Use synchronized or volatile if field changes need to be visible across threads.
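
To make this concrete, here is a sketch of a singleton whose mutable fields are protected the same way as any other shared state. The class AppSettings and its fields are hypothetical; the instance itself is created in a static initializer, which the JVM runs in a thread-safe way.

```java
final class AppSettings {
    // Created during class initialization, which the JVM performs safely.
    private static final AppSettings INSTANCE = new AppSettings();

    private volatile int maxConnections = 10;  // single writes/reads: volatile is enough
    private long hits;                         // read-modify-write: needs synchronized

    private AppSettings() { }

    static AppSettings getInstance() { return INSTANCE; }

    int getMaxConnections() { return maxConnections; }

    void setMaxConnections(int value) { maxConnections = value; }

    synchronized long recordHit() { return ++hits; }

    synchronized long getHits() { return hits; }

    public static void main(String[] args) {
        AppSettings settings = AppSettings.getInstance();
        settings.setMaxConnections(25);
        settings.recordHit();
        System.out.println(settings.getMaxConnections() + " " + settings.getHits());
    }
}
```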

References

  1. http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
  2. http://www.ibm.com/developerworks/library/j-jtp02244/index.html
  3. http://www.ibm.com/developerworks/library/j-jtp03304/
  4. http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
  5. http://javarevisited.blogspot.in/2011/06/volatile-keyword-java-example-tutorial.html
  6. http://www.infoq.com/presentations/click-crash-course-modern-hardware
  7. https://community.jboss.org/wiki/ThreadSafeCode?_sscc=t
  8. http://www.javamex.com/tutorials/synchronization_final.shtml


Saturday Sep 21, 2013

JVM on Multi-Core CPUs

Today one of my teammates brought up the question of how the JVM utilizes multiple cores in modern CPUs, and what application engineers need to understand and do to take full advantage of them. At a high level this is a question of parallelization in applications. The following points touch upon this topic.
  • JVM makes use of multiple cores
    • Configure the JVM to use one or more CPUs - there appears to be no such configuration. If the application uses threads, multiple CPUs are available, and the OS supports multiple CPUs or cores, then JVM threads will run on multiple CPUs. You can configure the OS to use only one CPU (the "/onecpu" option in Windows)
    • Configure the compiler to generate code that makes use of multiple CPUs - there appears to be no such configuration
    • Green threads vs. native threads - green threads are threads simulated by the JVM (discontinued after JDK 1.1). Native threads are OS threads; in this case the JVM spawns OS threads.
    • The G1 Garbage Collector introduced in newer releases also makes use of multi-core hardware. http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
  • Parallelization performance on multiple CPUs
  • Application design
    • Design for scalability - divide and conquer by splitting the job into multiple tasks (may require a change of design mindset)
    • Low parallel speedup if the application is never CPU-bound (meaning the CPU is not the bottleneck)
    • Locking and unlocking objects can require a large number of IPIs (inter-processor interrupts), and the processors may spend more time discarding their L1 and L2 caches and re-fetching data from other CPUs than they actually spend making progress on the problem at hand
    • Use the java.util.concurrent package (for application locking, barriers, executors etc.)
      • java.util.concurrent
        • Executors, Callable, Futures
        • Synchronizers (Semaphore, CountDownLatch, CyclicBarrier, Exchanger)
        • Queues (BlockingQueue, ConcurrentLinkedQueue)
        • Concurrent collections (ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet)
    • Java 7 Fork/Join Framework
    • ParallelArray (from the JSR 166 extra166y package; it did not ship in the Java 7 JDK)
    • Java Lock Monitor - http://perfinsp.sourceforge.net/jlm.html
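
The divide-and-conquer idea from the list above can be sketched with the Fork/Join framework. The task below sums an array by splitting it in half until the pieces are small; the class names and the threshold of 10,000 elements are arbitrary choices for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSumDemo {

    /** Sums data[from, to) by recursively splitting the range. */
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;
        private final long[] data;
        private final int from;
        private final int to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                      // run the left half asynchronously
            long rightSum = right.compute();  // work on the right half in this thread
            return left.join() + rightSum;
        }
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();   // defaults to one worker per processor
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```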

Check the number of processors available to the JVM

int n = Runtime.getRuntime().availableProcessors(); See CR 5048379.
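
One common use of this value is sizing a thread pool. The sketch below splits a CPU-bound computation (a sum of squares, chosen only as a stand-in workload) into one chunk per available processor; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorSplitDemo {

    /** Computes 1^2 + 2^2 + ... + n^2 using one task per chunk. */
    static long sumOfSquares(int n) throws InterruptedException, ExecutionException {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers;   // ceiling division
            for (int start = 1; start <= n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk - 1);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = lo; i <= hi; i++) {
                        sum += i * i;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();   // waits for that chunk to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(1000));
    }
}
```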

Further Thoughts

Behavior of static, volatile, singletons etc. in a multi-CPU environment
SMT, CMT, CMP etc. - http://en.wikipedia.org/wiki/Simultaneous_multithreading

References

https://wikis.oracle.com/display/HotSpotInternals/Synchronization
http://mailinator.blogspot.in/2010/02/how-i-sped-up-my-server-by-factor-of-6.html
http://www.cs.hut.fi/u/tlilja/multicore/slides/java_multicore.pdf
http://www.oracle.com/technetwork/articles/javase/index-140767.html
http://embarcaderos.net/2011/01/23/parallel-processing-and-multi-core-utilization-with-java/
http://docs.oracle.com/cd/E26576_01/doc.312/e24936/tuning-java.htm
http://www.slideshare.net/indicthreads/optimizing-your-java-applications-for-multi-core-hardware
http://stackoverflow.com/questions/1898374/does-the-jvm-create-a-mutex-for-every-object-in-order-to-implement-the-synchron





Friday Sep 20, 2013

Apache Axis- Thread Safety Issues

Axis has been a very popular Java library for implementing SOAP clients. Unfortunately, Axis 1.3 and Axis 1.4 have serious thread safety issues, which can even mix data from one client with another. These issues may not arise under low load, but I have experienced some of them in a production environment and had to find workarounds. In this blog, I will list the issues I have seen, along with their causes and solutions.

1) Data of one SOAP response getting mixed with another response

In my case this was due to sharing a single instance of the Axis stub across all threads or requests. This is a big no-no: the Axis stub is not thread safe. However, the documentation says the Axis service locator (factory) is thread safe, which means you can use a single instance of the service locator and create a new stub instance for each request. Creating a new stub instance may have a small overhead, although in our load testing it was not very significant. If you are concerned about this, you can use a stub pool (Apache generic object pool, http://commons.apache.org/proper/commons-pool/) to reuse instances.

Most of the thread safety issues are either in sharing the stub in application code (bad practice!) or in the stub creation code in Axis. The good thing about using a pool is that it takes care of both issues.

2) Another scenario was the CPU going to 100% because of the famous HashMap resize race (see http://mailinator.blogspot.in/2009/06/beautiful-race-condition.html). In this case, we were creating new stubs for each request. Unlike the first scenario, we could not reproduce it. The unsynchronized HashMap was called from the stub creation code in Axis. We switched to an object pool instead of creating a stub on each request, to be safe against this issue.
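
The pooling idea can be sketched without any library. The class below is a deliberately simple stand-in built on a BlockingQueue, not the Apache commons-pool API mentioned above; SimpleStubPool, withStub and the generic type standing in for an Axis stub are all hypothetical names. Each stub is created once up front and is used by at most one thread at a time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

class SimpleStubPool<T> {
    private final BlockingQueue<T> idle;

    /** Creates all stubs up front, from a single thread. */
    SimpleStubPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Borrows a stub, runs the call with it, and always returns the stub. */
    <R> R withStub(Function<T, R> call) throws InterruptedException {
        T stub = idle.take();   // blocks while every stub is in use
        try {
            return call.apply(stub);
        } finally {
            idle.add(stub);
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // StringBuilder stands in for a non-thread-safe stub.
        SimpleStubPool<StringBuilder> pool = new SimpleStubPool<>(2, StringBuilder::new);
        int length = pool.withStub(sb -> sb.append("x").length());
        System.out.println(length + " " + pool.idleCount());
    }
}
```

Because the stubs are created in the constructor, the stub creation code also runs single-threaded, which matches the second workaround described above.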

See bugs
https://issues.apache.org/jira/browse/AXIS-2498
https://issues.apache.org/jira/browse/AXIS-2800

Thursday Aug 22, 2013

Migrating RestEasy+Spring application to JBoss-AS7 /WildFly


If you are trying to migrate a RestEasy+Spring web application from JBoss-6 to JBoss-7 or the new WildFly server, you may come across some roadblocks. One of the major changes in AS-7 is the new modular classloading, which replaces the traditional hierarchical classloading.
So if you are migrating a webapp war file that has resteasy and spring jars in its WEB-INF/lib folder, you may see many classloading errors during server startup and your application may not deploy. AS-7 and WildFly come packaged with resteasy jars, so there is no need to package them again in your application. You can upgrade the resteasy modules in AS-7 if they are not the version you are looking for. Spring jars are not packaged with JBoss, so you can include them in your application.
  • Spring jars in WEB-INF/lib
RestEasy servlet registration in web.xml also changes. If you are not using Spring, you can use the following snippet in web.xml to get RestEasy going in AS-7.
<servlet-mapping>
<servlet-name>javax.ws.rs.core.Application</servlet-name>
<url-pattern>/rest/*</url-pattern>
</servlet-mapping>
That is all it takes. There is no need to register the Bootstrap listener, Dispatcher servlet etc.
If you are using RestEasy on top of Spring beans, you need to include the SpringContextLoaderListener to register annotated Spring beans as resources. Here is the configuration:
<listener>
<listener-class>org.jboss.resteasy.plugins.server.servlet.ResteasyBootstrap</listener-class>
</listener>
<listener>
<listener-class>org.jboss.resteasy.plugins.spring.SpringContextLoaderListener</listener-class>
</listener>
<servlet-mapping>
<servlet-name>javax.ws.rs.core.Application</servlet-name>
<url-pattern>/rest/*</url-pattern>
</servlet-mapping>
Hope this helps you get going quickly.

Wednesday Oct 17, 2012

HashMap vs ConcurrentHashMap vs Synchronized HashMap vs Hashtable

Discuss the differences between HashMap, ConcurrentHashMap, Synchronized HashMap, and Hashtable


Java Performance: Collections

Java collections offer various general-purpose  data structures. However, it is important to understand which collection can be used under what circumstances and their differences, considering the speed, concurrency, memory footprint, data size, throughput, etc. This article discusses various List, Map and Set collections, their concurrent versions, and libraries for general, scientific and big data sets.

Tuesday Oct 16, 2012

Linked Data, Social Graph and RDF

This digest aggregates various readings on Linked Data and Semantic Web topics, along with my own writings on the same.

Monday Oct 15, 2012

Java Performance: Synchronization and Parallel Processing

This article discusses various aspects of synchronization and its effect on scalability and parallel processing. This also suggests some strategies to deal with synchronization in general.

Sunday Oct 07, 2012

Cloud Scalability & Performance- Blog Digest

Digest on blogs about performance and scalability and the personalities driving the thoughts


Saturday Oct 06, 2012

No servlet class has been specified for servlet template & metadata-complete

Discuss the metadata-complete=true attribute in servlet 2.5 and its implication on server startup time. See the web.xml configuration where this attribute can cause "No servlet class has been specified for servlet template" error in Tomcat.
