H2O is the world’s leading open source machine learning platform. H2O is used by over 60,000 data scientists and more than 7,000 organizations around the world.
H2O.ai emerged in 2011 from a grassroots culture of data transformation. H2O's goal was to democratize data science by making scalable machine learning open source and accessible to everyone. With H2O, the main product, a wide range of machine learning models (from linear models to tree-based ensemble methods to deep learning) can be trained from R, Python, Java, Scala, JSON, H2O's Flow GUI, or the REST API; on laptops or servers running Windows, macOS, or Linux; in the cloud or on premises; on clusters of up to hundreds of nodes; and on top of Hadoop or with the Sparkling Water API for Apache Spark. H2O also features fast distributed data ingest and data-munging capabilities, and a multitude of enterprise features such as security, authentication, model comparison, and rapid deployment.
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
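The MapReduce model behind that last module can be sketched in plain Python. This is a conceptual toy (a single-process word count), not Hadoop code: the `map_phase` and `reduce_phase` names are illustrative, and a real Hadoop job distributes these steps across cluster nodes with a shuffle in between.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Shuffle + reduce step: group pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog"]
print(reduce_phase(map_phase(docs)))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

Because the map step emits independent key-value pairs and the reduce step only combines values sharing a key, both phases parallelize naturally, which is what lets YARN spread the work over a cluster.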
TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
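The nodes-and-edges idea can be illustrated with a tiny pure-Python sketch. This is a conceptual toy, not the TensorFlow API: the `Node` class and `constant` helper are invented for illustration, and real TensorFlow tensors are multidimensional arrays rather than scalars.

```python
class Node:
    """A node in a toy dataflow graph: an operation plus its input edges."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def eval(self):
        # Evaluate upstream nodes first (the values flowing along the
        # incoming edges), then apply this node's operation to them.
        return self.op(*(node.eval() for node in self.inputs))

def constant(value):
    """A leaf node that produces a fixed value."""
    return Node(lambda: value)

# Graph for (2 + 3) * 4: three constant nodes feeding two operation nodes.
a, b, c = constant(2.0), constant(3.0), constant(4.0)
add = Node(lambda x, y: x + y, a, b)
mul = Node(lambda x, y: x * y, add, c)
print(mul.eval())  # 20.0
```

Because the graph describes the computation separately from executing it, a runtime like TensorFlow can place different nodes on different devices (CPUs, GPUs) before evaluation.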
Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable writing clear programs on both small and large scales. Python supports multiple programming paradigms, including object-oriented, imperative, functional, and procedural styles. It features a dynamic type system and automatic memory management, and has a large and comprehensive standard library. Python interpreters are available for many operating systems, allowing Python code to run on a wide variety of systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation. Source: Wikipedia/Python
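The conciseness and multi-paradigm points above can be shown side by side in a few lines (a minimal illustration; the task, squaring even numbers, is chosen arbitrarily):

```python
# Imperative style: accumulate squares of even numbers in an explicit loop.
squares = []
for n in range(10):
    if n % 2 == 0:
        squares.append(n * n)

# The same computation in one line, using a functional-style comprehension.
squares_fn = [n * n for n in range(10) if n % 2 == 0]

print(squares_fn)  # [0, 4, 16, 36, 64]
assert squares == squares_fn
```

Both versions are idiomatic Python; the comprehension simply states *what* is computed rather than *how* to loop, which is the readability trade-off the paragraph describes.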