Guide - Data Science with F#
Data science is the application of statistical analysis, machine learning, data visualization and programming to real-world data sources to bring understanding and insight to data-oriented problem domains. F# is an excellent solution for programmatic data science as it combines efficient execution, REPL-scripting, powerful libraries and scalable data integration.
These resources are for educational purposes.
- Jupyter Notebooks
- Integrated Packages
- Interactive Charting
- Individual Packages
- Commercial packages
- Interoperability
Jupyter Notebooks
- .NET Interactive provides data scientists and developers a way to explore data, experiment with code, and try new ideas effortlessly using .NET Core. Use .NET Interactive to build .NET Jupyter notebooks or custom interactive coding experiences.
Integrated Packages
- FsLab is the F# Community Project Incubation Space For Data Science.
- ML.NET - ML.NET is an open source and cross-platform machine learning framework sponsored by Microsoft. With ML.NET, you can create custom ML models using C# or F# without having to leave the .NET ecosystem. ML.NET lets you re-use all the knowledge, skills, code, and libraries you already have as a .NET developer so that you can easily integrate machine learning into your web, mobile, desktop, games, and IoT apps.
- SciSharp Stack - A .NET based Open Source Ecosystem for Data Science, Machine Learning and AI. SciSharp provides ports and bindings to cutting-edge Machine Learning frameworks like TensorFlow, Keras, PyTorch, NumPy and many more in .NET Core. Since the APIs of the ported libraries are so similar to the originals, you can re-use existing resources, documentation and community solutions to common problems in C# or F# without much effort. License: Various, mostly Apache 2.0 or MIT
Interactive Charting
- XPlot - XPlot is a data visualization package for the F# programming language powered by popular JavaScript charting libraries. It uses Google and Plotly’s powerful and free data visualization libraries based on HTML5/SVG technology. You can access the HTML for the charts programmatically and use the library from F# Interactive by displaying browser windows.
- Plotly.NET - a powerful and free charting library. Plotly.NET provides Plotly’s awesome graphing support with strongly typed style options for F#.
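As a rough sketch of what charting from an F# script can look like, the snippet below builds a line chart with Plotly.NET and renders it to an HTML string. It assumes the `Plotly.NET` NuGet package can be resolved by `dotnet fsi`; the data is made up for illustration, and the calls used (`Chart.Line`, `Chart.withTitle`, `GenericChart.toEmbeddedHTML`, `Chart.show`) follow the library's 4.x API as an assumption, so other versions may differ.

```fsharp
#r "nuget: Plotly.NET"

open Plotly.NET

// Illustrative data only: sample sin(x) on [0, 6]
let xs = [ 0.0 .. 0.5 .. 6.0 ]
let ys = xs |> List.map sin

// Build a line chart and attach a title
let chart =
    Chart.Line(x = xs, y = ys)
    |> Chart.withTitle "sin(x)"

// Render the chart to a standalone HTML string
let html = chart |> GenericChart.toEmbeddedHTML

// From F# Interactive, the chart can instead be opened in a browser window:
// chart |> Chart.show
```

Keeping the chart as a value and rendering it separately makes it possible to reuse the same chart in a notebook, a browser window, or a saved HTML file.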
Individual Packages
If a resource specific to F# can’t be found, search for a C# resource instead and adapt the technique to F#.
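A minimal sketch of adapting a C# technique, using only the .NET base class library: the C# line in the comment is a hypothetical snippet of the kind found in C#-only tutorials, and `readings` is made-up data. The first translation calls the same .NET methods directly from F#; the second rewrites it with F# core functions.

```fsharp
open System.Linq

// Hypothetical C# original:
//   var mean = readings.Where(x => x > 0.0).Average();

// Illustrative data only
let readings = [| 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 |]

// Direct translation: the same LINQ methods, with an F# lambda
// in place of the C# lambda
let meanLinq = readings.Where(fun x -> x > 0.0).Average()

// Idiomatic translation: F# core functions composed in a pipeline
let meanFs =
    readings
    |> Array.filter (fun x -> x > 0.0)
    |> Array.average

printfn "%.1f %.1f" meanLinq meanFs // prints "5.0 5.0"
```

The direct translation is usually the quickest way to get a C# example working; the pipeline form tends to read better once the code is part of a larger F# script.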
- Math.NET Numerics - provides a large collection of algorithms needed in science and engineering, including linear algebra, special functions, statistics, probability models, interpolation and FFTs. In addition to the core .NET package, Numerics specifically supports F# 4.0 with idiomatic extension modules and maintains mathematical data structures like BigRational that originated in the F# PowerPack. If a performance boost is needed, the managed-code provider backing its linear algebra routines and decompositions can be exchanged with wrappers for optimized native implementations such as Intel MKL. License: MIT/X11
- TensorFlow.NET - .NET Standard bindings for Google’s TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
- TorchSharp - .NET bindings for PyTorch. Machine Learning with C# / F# with Multi-GPU/CPU support.
- DiffSharp - An automatic differentiation (AD) library for incorporating derivative calculations with minimal changes into existing code, providing exact and efficient gradients, Jacobians and Hessians for machine learning and optimization applications.
- SharpCV - A computer vision library that combines OpenCV and NDArray in .NET Standard.
- MxNet.Sharp - .NET Standard bindings for Apache MXNet with Imperative, Symbolic and Gluon interfaces for developing, training and deploying Machine Learning models in C# and F#.
- FsAlg - A lightweight linear algebra library that supports generic types. The library provides generic Vector and Matrix types that support most of the commonly used linear algebra operations, including matrix–vector operations, matrix inverse, determinants, eigenvalues, LU and QR decompositions. Its intended use is to enable writing generic linear algebra code with custom numeric types. It can also be used as a lightweight library for prototyping and scripting with primitive floating point types.
- Ariadne - Library for fitting Gaussian process regression models.
- Numl - A machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
- Deedle is an easy-to-use, high-quality package for data and time series manipulation and for scientific programming. It uses a design similar to the Pandas library from Python and the ‘tseries’ or ‘zoo’ packages in R, though with stronger typing. Deedle supports working with structured data frames, ordered and unordered data, as well as time series. Deedle is designed to work well for exploratory programming using the F# and C# interactive consoles, but can also be used in efficient compiled .NET code.
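To give a feel for the Pandas-like design of Deedle mentioned above, here is a small sketch of a typed time series and data frame in an F# script. It assumes the `Deedle` NuGet package can be resolved by `dotnet fsi`; the prices are made up for illustration, and the names `prices`, `returns` and `df` are hypothetical.

```fsharp
#r "nuget: Deedle"

open System
open Deedle

// Illustrative data only: a time series keyed by date
let prices =
    series [ DateTime(2024, 1, 1) => 10.0
             DateTime(2024, 1, 2) => 11.0
             DateTime(2024, 1, 3) => 12.5 ]

// Day-over-day returns: divide each value by the previous day's value.
// Keys are aligned automatically; the first day has no return.
let returns = prices / (prices |> Series.shift 1) - 1.0

// Combine both series into a data frame with named columns
let df = frame [ "Price" => prices; "Return" => returns ]

// Summary statistics work directly on a series
let meanPrice = Stats.mean prices
```

Because the series is keyed by `DateTime`, arithmetic between `prices` and its shifted copy is aligned on dates rather than on position, which is the main difference from working with plain arrays.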
Commercial packages
- ILNumerics - an open- or closed-source library offering high-performance numerical algorithms as well as charting and plotting capabilities. The library is based on efficient, general-purpose array classes implementing vectors, matrices, and n-dimensional arrays. Provided algorithms include standard linear algebra transforms, a high-performance Fast Fourier Transform (FFT) library, and a collection of sorting and machine learning algorithms. Plotting is based on OpenGL and supports both 2D and 3D plots. License: GPLv3 or commercial (paid) license.
- Extreme Optimization Numerical Libraries for .NET - a set of three libraries focused on vector and matrix processing, linear algebra methods, and statistics functions. The library includes a large selection of standard algorithms from matrix factorization, function optimization, numerical integration, K-means clustering, and PCA (principal component analysis). Options are provided to run using pure managed code for portability or to utilize highly tuned native code for additional performance.
- NMath, NMath Stats - a suite providing core math and statistics functions. NMath provides sparse- and dense-matrix manipulations, FFT algorithms, and numeric algorithms such as curve-fitting, integration, and differentiation. NMath Stats is built on NMath and provides statistics functions such as multiple linear regression, hypothesis testing, and nonnegative matrix factorization. NMath and NMath Stats support .NET 4.5 and are available from CenterSpace Software.
- F# for Numerics - a collection of numeric algorithms including matrix operations, optimization and interpolation functions, 1D and 2D FFTs, and pseudo-random number generation. The library uses the standard F# PowerPack Matrix for compatibility. F# for Numerics supports .NET. The library is available from Flying Frog Consultancy.
- F# for Visualization - a 2D and 3D vector graphics library with a native F# interface. The package provides interactive plotting from within Visual Studio and support for generating animations. F# for Visualization supports .NET. The library is available from Flying Frog Consultancy.
Interoperability
F# and Excel
Excel-DNA is an independent project to integrate .NET into Excel. With Excel-DNA you can make native (.xll) add-ins for Excel using C#, Visual Basic.NET or F#, providing high-performance user-defined functions (UDFs), custom ribbon interfaces and more. Your entire add-in can be packed into a single .xll file requiring no installation or registration:
- Excel-DNA home pages
- Async and event-streaming Excel UDFs with F#
- Machine Learning with Excel: Combine the power of Excel, F# and R
Sharp Cells is another independent project which integrates F# scripting with Excel. It exposes the scripts as either user-defined functions (UDFs) using Excel’s XLL API or commands using Excel’s COM API. Compilation takes place at runtime allowing rapid iteration of your code and the scripts are embedded with the workbook maintaining single-file portability similar to VBA.
- Getting started
- Working with asynchronous calculations
- Integration with AngouriMath to perform symbolic algebra
NPOI is the .NET version of the POI Java project at http://poi.apache.org/. POI is an open source project that can help you read and write xls, doc, and ppt files.
NPOI manipulates the Office Open XML format directly, so it does not require Excel to be installed and does not use Interop. You can read, create, and edit Excel documents using this approach.
There are also some F# versions of Excel functions, useful when migrating code:
F# and R
Resources:
- R Type Provider for F# - An F# type provider for high fidelity integration between F# and R
- R.NET - Core interoperability component used by the R Type Provider with some F# extensions.
- Using F#, R and GGPlot2
F# and MATLAB
Resources:
- MATLAB Type Provider for F# - An F# type provider for higher-fidelity integration between F# and MATLAB
F# and Python
Resources:
- Python for .NET - Allows Python to be integrated into F# and C# programs
- Tutorial: Charting with Gnuplot from F#
F# and Mathematica/Wolfram Language
Resources:
- Calling Mathematica from F# - techniques to call Mathematica from C#, F# and other .NET languages
- Calling Wolfram Language from F# - techniques to call Mathematica from .NET