Guide - Data Science with F#
Data science is the application of statistical analysis, machine learning, data visualization and programming to real-world data sources to bring understanding and insight to data-oriented problem domains. F# is an excellent solution for programmatic data science as it combines efficient execution, REPL-scripting, powerful libraries and scalable data integration.
These resources are for educational purposes. The F# Software Foundation does not endorse any commercial products, processes, or services.
- Jupyter Notebooks
- Integrated Packages
- Interactive Charting
- Individual Packages
- Commercial packages
.NET Interactive provides data scientists and developers a way to explore data, experiment with code, and try new ideas effortlessly using .NET Core. Use .NET Interactive to build .NET Jupyter notebooks or custom interactive coding experiences.
IfSharp implements F# for Jupyter notebooks. On Linux, Mono is used as the .NET implementation.
SciSharp Stack - A .NET based Open Source Ecosystem for Data Science, Machine Learning and AI.
SciSharp provides ports and bindings for cutting-edge machine learning frameworks such as TensorFlow, Keras, PyTorch, NumPy and many more in .NET Core. Because the APIs of the ported libraries closely follow the originals, existing resources, documentation and community solutions to common problems can be reused from C# or F# with little effort.
License: Various, mostly Apache 2.0 or MIT
Math.NET Numerics - provides a large collection of algorithms needed in science and engineering, including linear algebra, special functions, statistics, probability models, interpolation and FFTs.
In addition to the core .NET package, Numerics specifically supports F# 4.0 with idiomatic extension modules and maintains mathematical data structures like BigRational that originated in the F# PowerPack. If a performance boost is needed, the managed-code provider backing its linear algebra routines and decompositions can be exchanged with wrappers for optimized native implementations such as Intel MKL. Supports Mono and .NET 4.0 on Linux, Mac and Windows. The portable version also supports Silverlight 5 and .NET for Windows Store apps.
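As a rough illustration of the F# extension modules mentioned above, the following script solves a small linear system with the default managed provider. It is a minimal sketch that assumes the MathNet.Numerics.FSharp NuGet package and a recent F# Interactive (`dotnet fsi`); the matrix and vector values are made up for the example.

```fsharp
// Sketch only: assumes the MathNet.Numerics.FSharp NuGet package is reachable
// and that the script is run with F# Interactive (dotnet fsi).
#r "nuget: MathNet.Numerics.FSharp"

open MathNet.Numerics.LinearAlgebra

// Coefficient matrix and right-hand side of the system A x = b (made-up values).
let a =
    matrix [ [ 2.0; 1.0 ]
             [ 1.0; 3.0 ] ]

let b = vector [ 3.0; 5.0 ]

// Solve A x = b using the default managed linear algebra provider.
let x = a.Solve(b)

// The residual norm should be close to zero if the solve succeeded.
let residual = (a * x - b).L2Norm()

printfn "x = %A" (x.ToArray())
printfn "residual norm = %g" residual
```

The `matrix` and `vector` helpers come from the F# extension package; the same `Solve` call is available from the core package when the matrix is built through its builder API instead.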
ML.NET - an open source and cross-platform machine learning framework sponsored by Microsoft.
Encog Machine Learning Framework - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation and can also make use of a GPU to further reduce processing time. A GUI-based workbench is also provided to help model and train neural networks. See, for example, ENCOG Neural Network XOR example in F#.
FsLab is an integrated, cross-platform collection of open source data science packages for F#, including FSharp.Data, Deedle, RProvider, Math.NET Numerics and more.
Accord.MachineLearning - Contains Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework. See also First steps with Accord.NET SVM in F#
FSharp.Plotly - a powerful and free charting library. FSharp.Plotly provides Plotly’s awesome graphing support with strongly typed style options for F#.
FSharp.Charting - an interactive charting library frequently used on Windows.
If a resource specific to F# can’t be found, then search for C# instead and adjust the technique appropriately.
TensorFlow.NET - .NET Standard bindings for Google’s TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
NumSharp - High-performance computation for N-D tensors in .NET, with an API similar to NumPy.
Torch.NET - .NET bindings for PyTorch. Machine Learning with C# / F# with Multi-GPU/CPU support
SharpCV - A computer vision library that combines OpenCV and NDArray in .NET Standard.
MxNet.Sharp - .NET Standard bindings for Apache MxNet with Imperative, Symbolic and Gluon Interface for developing, training and deploying Machine Learning models in C# and F#.
DiffSharp - An automatic differentiation (AD) library for incorporating derivative calculations with minimal changes into existing code, providing exact and efficient gradients, Jacobians and Hessians for machine learning and optimization applications.
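To make the idea of automatic differentiation concrete, here is a minimal sketch of forward-mode AD using dual numbers in plain F#. It is not DiffSharp's API: the `Dual` type and the `constant`, `sinD` and `diff` helpers are hypothetical names introduced only for this illustration, and DiffSharp itself covers far more (reverse mode, gradients, Jacobians, Hessians).

```fsharp
// Forward-mode automatic differentiation with dual numbers.
// Illustrative only: none of these names belong to DiffSharp.

/// A value paired with the derivative of that value with respect to the input.
type Dual =
    { Value: float
      Deriv: float }
    // Sum rule: (u + v)' = u' + v'
    static member (+) (a: Dual, b: Dual) =
        { Value = a.Value + b.Value; Deriv = a.Deriv + b.Deriv }
    // Product rule: (u * v)' = u' * v + u * v'
    static member (*) (a: Dual, b: Dual) =
        { Value = a.Value * b.Value
          Deriv = a.Deriv * b.Value + a.Value * b.Deriv }

/// Lift a constant; its derivative is zero.
let constant (c: float) = { Value = c; Deriv = 0.0 }

/// Sine with the chain rule applied to the derivative part.
let sinD (a: Dual) =
    { Value = sin a.Value; Deriv = cos a.Value * a.Deriv }

/// Derivative of f at x, obtained by seeding the input derivative with 1.
let diff (f: Dual -> Dual) (x: float) =
    (f { Value = x; Deriv = 1.0 }).Deriv

// f(x) = 3 * x * x + sin x, so analytically f'(x) = 6 x + cos x.
let f (x: Dual) = constant 3.0 * x * x + sinD x

let atOne = diff f 1.0
printfn "f'(1) = %g (analytic: %g)" atOne (6.0 + cos 1.0)
```

The function `f` is written as ordinary arithmetic, and the derivative falls out of the overloaded operators without symbolic manipulation or finite differences, which is the sense in which the gradients are exact.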
FsAlg - A lightweight linear algebra library that supports generic types.
The library provides generic Vector and Matrix types that support most of the commonly used linear algebra operations, including matrix–vector operations, matrix inverse, determinants, eigenvalues, LU and QR decompositions. Its intended use is to enable writing generic linear algebra code with custom numeric types. It can also be used as a lightweight library for prototyping and scripting with primitive floating point types.
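The notion of generic linear algebra code over custom numeric types can be sketched in plain F# with inline functions, whose arithmetic constraints are resolved at each call site. The example below does not use FsAlg's own Vector and Matrix types; `dot` and `matVec` are hypothetical helpers over arrays, shown only to illustrate the approach.

```fsharp
// Generic linear algebra helpers written once and reused for any numeric type
// that provides (+), (*) and a Zero member. Illustrative only: not FsAlg's API.

/// Dot product of two equally sized arrays.
let inline dot xs ys =
    Array.map2 (*) xs ys |> Array.sum

/// Matrix-vector product, with the matrix stored as an array of rows.
let inline matVec rows v =
    rows |> Array.map (fun row -> dot row v)

// The same code works with primitive floating point values...
let floatResult = dot [| 1.0; 2.0; 3.0 |] [| 4.0; 5.0; 6.0 |]

// ...and with arbitrary-precision integers (bigint literals use the I suffix).
let bigResult = matVec [| [| 1I; 2I |]; [| 3I; 4I |] |] [| 5I; 6I |]

printfn "float dot product: %g" floatResult
printfn "bigint matrix-vector product: %A" bigResult
```

Because the functions are `inline`, each call site is specialized for its numeric type, so prototyping with `float` carries no abstraction penalty while exact types such as `bigint` remain usable with the same code.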
Ariadne - Library for fitting Gaussian process regression models.
Numl - A machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
Synapses - A lightweight Neural Network library, for js, jvm and .net.
Alea GPU - a framework for developing GPU-accelerated algorithms in F# on .NET and Mono.
Using F# quotations and the LLVM compiler, it can compile GPU kernels on the fly and schedule them on one or more NVIDIA GPUs. Advanced GPU features such as textures and shared memory are supported. Available from Quantalea.
ILNumerics - an open- or closed-source library offering high-performance numerical algorithms as well as charting and plotting capabilities.
The library is based on efficient, general-purpose array classes implementing vectors, matrices, and n-dimensional arrays. Provided algorithms include standard linear algebra transforms, a high-performance Fast Fourier Transform (FFT) library, and a collection of sorting and machine learning algorithms. Plotting is based on OpenGL and supports both 2D and 3D plots. ILNumerics supports .NET 4.0 as well as Mono (2.10 or above recommended).
License: GPLv3 or commercial (paid) license.
Extreme Optimization Numerical Libraries for .NET - a set of three libraries focused on vector and matrix processing, linear algebra methods, and statistics functions.
The library includes a large selection of standard algorithms, including matrix factorization, function optimization, numerical integration, K-means clustering, and PCA (principal component analysis). Options are provided to run using pure managed code for portability or to use highly tuned native code for additional performance. Extreme Optimization supports .NET 3.5 and 4.0 (2.0 version available) and execution on Mono.
NMath, NMath Stats - a suite providing core math and statistics functions.
NMath provides sparse- and dense-matrix manipulations, FFT algorithms, and numeric algorithms such as curve-fitting, integration, and differentiation. NMath Stats is built on NMath and provides statistics functions such as multiple linear regression, hypothesis testing, and nonnegative matrix factorization. NMath and NMath Stats support .NET 4.5 and are available from CenterSpace Software.
F# for Numerics - a collection of numeric algorithms including matrix operations, optimization and interpolation functions, 1D and 2D FFTs, and pseudo-random number generation.
The library uses the standard F# PowerPack Matrix for compatibility. F# for Numerics supports .NET. The library is available from Flying Frog Consultancy.
F# for Visualization - a 2D and 3D vector graphics library with a native F# interface.
The package provides interactive plotting from within Visual Studio and support for generating animations. F# for Visualization supports .NET. The library is available from Flying Frog Consultancy.
Deedle is an easy-to-use, high-quality package for data and time series manipulation and for scientific programming. It uses a design similar to the Pandas library from Python and the ‘tseries’ or ‘zoo’ packages in R, though with stronger typing. Deedle supports working with structured data frames, ordered and unordered data, as well as time series. Deedle is designed to work well for exploratory programming in the F# and C# interactive consoles, but can also be used in efficient compiled .NET code.
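A short script gives a feel for the Pandas-like design. This is a minimal sketch that assumes the Deedle NuGet package and F# Interactive (`dotnet fsi`); the dates, column names and values are invented for the example.

```fsharp
// Sketch only: assumes the Deedle NuGet package is reachable and that the
// script is run with F# Interactive (dotnet fsi). All data is made up.
#r "nuget: Deedle"

open System
open Deedle

// Two small daily series; the second has no observation for 3 January.
let temps =
    series [ DateTime(2024, 1, 1) => 3.5
             DateTime(2024, 1, 2) => 4.0
             DateTime(2024, 1, 3) => 2.5 ]

let rain =
    series [ DateTime(2024, 1, 1) => 0.0
             DateTime(2024, 1, 2) => 1.2 ]

// Building a frame aligns the columns on their row keys; the missing
// rain observation becomes a missing value instead of raising an error.
let frame = Frame.ofColumns [ "Temp" => temps; "Rain" => rain ]

// Statistics skip missing values.
let meanTemp = Stats.mean temps
let meanRain = Stats.mean (frame.GetColumn<float>("Rain"))

printfn "rows: %d" frame.RowCount
printfn "mean temperature: %g, mean rain: %g" meanTemp meanRain
```

The row keys are typed (`DateTime` here) and the column is retrieved as `float`, which is where the stronger typing compared with Pandas shows up in everyday use.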
Tutorials and Introductions
Introductions to different machine learning algorithms with F#:
- Understanding the world with F# (article)
- Understanding the world with F# (video)
- FSML - A machine learning project in F#
- Gaussian process regression in F#
- K-Means clustering in F#
- Simplify data with SVD and Math.NET in F#
- Recommendation Engine using Math.NET, SVD and F#
- Setting up F# Interactive for Machine Learning with Large Datasets
- Random Forests in F# - first cut
- Nearest Neighbor Classification, Part 1
- Nearest Neighbor Classification, Part 2
- Decision Tree Classification in F#
- Naïve Bayes Classification
- Logistic Regression in F#
- Support Vector Machine in F#: getting there
- AdaBoost in F#
- Support Vector Machines in F#
- Kaggle/StackOverflow contest field notes
- F# Data Mining
- Parallel Programming in F#: Aggregating Data
- Particle Swarm Optimization in F#
- Hype - An experimental deep learning library, where you can perform optimization on compositional machine learning systems of many components, even when such components themselves internally perform optimization. Underlying computations are run by a BLAS/LAPACK backend (OpenBLAS by default).
F# and Excel
Excel-DNA is an independent project to integrate .NET into Excel. With Excel-DNA you can make native (.xll) add-ins for Excel using C#, Visual Basic.NET or F#, providing high-performance user-defined functions (UDFs), custom ribbon interfaces and more. Your entire add-in can be packed into a single .xll file requiring no installation or registration:
- Excel-DNA home pages
- Combining F# and Excel using Excel DNA
- Async and event-streaming Excel UDFs with F#
- Machine Learning with Excel: Combine the power of Excel, F# and R
NPOI is the .NET version of the POI Java project at http://poi.apache.org/. POI is an open source project that can help you read and write xls, doc and ppt files.
EPPlus is a .NET library that reads and writes Excel 2007/2010/2013 files using the Office Open XML format (xlsx).
ExcelPackageF is a simple F# wrapper over the EPPlus library.
Both NPOI and EPPlus manipulate the Office Open XML format directly, so they do not require Excel to be installed and do not use Interop. You can read, create, and edit Excel documents using this approach.
The F# Excel Type Provider is a prototypical F# type provider for Excel that allows you to read Excel files using typed data provided by the type provider.
There are also some F# versions of Excel functions, useful when migrating code:
F# and R
- R Type Provider for F# - An F# type provider for high fidelity integration between F# and R
- R.NET - Core interoperability component used by the R Type Provider with some F# extensions.
- Using F#, R and GGPlot2
F# and MATLAB
- MATLAB Type Provider for F# - An F# type provider for higher-fidelity integration between F# and MATLAB
F# and Python
- Python for .NET - Allows Python to be integrated into F# and C# programs
- Tutorial: Charting with Gnuplot from F#
F# and Mathematica/Wolfram Language