Raw DARWIN Data User Guide

User guide to the raw DARWIN data offered through FTP.

Table of contents

  1. Main FTP Directory Structure
  2. Structure of the Quotes folder
  3. Charts
    1. AVG_LEVERAGE
    2. BADGES
    3. BEHAVIOR
    4. CLOSE_STRATEGY
    5. DAILY_FIXED_DIVERGENCE
    6. DAILY_REAL_DIVERGENCE
    7. DURATION_CONSISTENCY
    8. EXPERIENCE
    9. LOSING_CONSISTENCY
    10. LOSS_AVERSION
    11. LOSS_AVERSION_UNADJUSTED_VAR
    12. MARKET_CORRELATION
    13. MONTHLY_DIVERGENCE
    14. OPEN_STRATEGY
    15. ORDER_DIVERGENCE
    16. PERFORMANCE
    17. POSITIONS
    18. RETURN
    19. RETURN_DIVERGENCE
    20. RISK_ADJUSTMENT
    21. RISK_STABILITY
    22. ROTATION
    23. SCALABILITY
    24. TRADE_CONSISTENCY
    25. TRADE_LOSS_AVERSION
    26. TRADES
    27. TRADE_UNADJUSTED_LOSS_AVERSION
    28. WINNING_CONSISTENCY
    29. INVESTMENT_CHART
    30. INVESTORS_CHART

 

1. Main FTP Directory Structure

Upon entering the FTP root directory, you will find a folder for each DARWIN ever listed on the DARWIN Exchange, with the folder names corresponding to DARWIN ticker symbols (e.g. NTI, PLF, etc.).

Each of these folders contains two subfolders. One, named "quotes", holds the asset's most up-to-date quote time series in tick precision. The other, named "{DARWIN_TICKER}_former_var10", holds the data for the former version of the DARWIN.

You will also find up to 30 more time series datasets in a DARWIN’s folder, containing data used to construct graphs and charts on its DARWIN listing page, e.g. Open Trades, D-Leverage, DARWIN volatility vs EURUSD volatility, etc.

The folder with the data for the former DARWIN version follows exactly the same structure: a folder with the quote time series and up to 30 time series datasets.

There is one exception to this layout. If a DARWIN was closed before the most recent version of the DARWINs was created, the ticker folder contains only the former-version folder and nothing for the current version.
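The layout described above can be captured in a small helper. This is only a sketch: the function name `darwin_paths` is hypothetical, paths are written relative to the FTP root with forward slashes, and it assumes that the former-version folder also keeps its quotes in a subfolder named "quotes".

```python
from posixpath import join


def darwin_paths(ticker: str) -> dict:
    """Return the expected folder paths for one DARWIN, relative to the FTP root."""
    former_root = join(ticker, f"{ticker}_former_var10")
    return {
        "current_quotes": join(ticker, "quotes"),
        "former_root": former_root,
        # Assumption: the former version mirrors the current layout.
        "former_quotes": join(former_root, "quotes"),
    }


paths = darwin_paths("NTI")
print(paths["current_quotes"])
print(paths["former_quotes"])
```

For a DARWIN closed before the current version existed, only the "former" entries would be expected to exist on the server.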

2. Structure of the Quotes folder

This folder contains a further set of folders organized by year and month, following the naming convention “YYYY-MM”.

These contain a collection of gzip-compressed CSV files. These files have the following naming convention:

{DARWIN_TICKER}.{PRODUCT_RISK}.{COLOUR}_{PRODUCTID}_YYYY-MM-DD.HH.csv.gz

For example:

AGD.4.9_6112_2018-01-01.01.csv.gz
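The naming convention can be split into its parts with a regular expression. The sketch below is illustrative only: the names `QuoteFileName` and `parse_quote_filename` are hypothetical, and it assumes that the ticker and product risk contain no dots and that the colour and product ID contain no underscores.

```python
import re
from typing import NamedTuple


class QuoteFileName(NamedTuple):
    ticker: str
    product_risk: str
    colour: str
    product_id: str
    date: str  # YYYY-MM-DD
    hour: int  # hour of the day taken from the HH part


# Assumption: no dots inside ticker/risk, no underscores inside colour/product ID.
_PATTERN = re.compile(
    r"^(?P<ticker>[^.]+)\.(?P<risk>[^.]+)\.(?P<colour>[^_]+)_(?P<pid>[^_]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})\.(?P<hour>\d{2})\.csv\.gz$"
)


def parse_quote_filename(name: str) -> QuoteFileName:
    """Split a quote file name into the fields of the naming convention."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected quote file name: {name!r}")
    return QuoteFileName(
        match.group("ticker"),
        match.group("risk"),
        match.group("colour"),
        match.group("pid"),
        match.group("date"),
        int(match.group("hour")),
    )


print(parse_quote_filename("AGD.4.9_6112_2018-01-01.01.csv.gz"))
```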

Each file contains several lines with the quote values recorded at different moments during that hour. The CSV has just 2 fields:

  1. Timestamp in milliseconds from epoch (UTC time zone)
  2. Quote value (rounded to the 4th decimal place)
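As a rough illustration, one quote file could be decoded as follows. The sketch assumes comma-separated values with no header row, which the guide does not state explicitly, and the function name `read_quotes` is hypothetical. The sample bytes are synthetic, not real DARWIN data.

```python
import csv
import gzip
import io
from datetime import datetime, timezone


def read_quotes(raw_gz: bytes):
    """Decode one gzip-compressed quote file into (UTC datetime, quote) pairs."""
    text = gzip.decompress(raw_gz).decode("utf-8")
    rows = []
    # Assumption: comma delimiter, no header line.
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        timestamp_ms = int(record[0])
        quote = float(record[1])
        moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        rows.append((moment, quote))
    return rows


# Synthetic two-line file, compressed in memory for the example.
sample = gzip.compress(b"1514768400000,101.2345\n1514768460500,101.2350\n")
for moment, quote in read_quotes(sample):
    print(moment.isoformat(), quote)
```

To read a downloaded file, pass its bytes in, for example `read_quotes(open(path, "rb").read())`.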

3. Charts

There are 30 different charts available. Some DARWINs may have no information for some of the charts; in that case, there will be no file for that chart. Each file is named after the chart it contains.

The different charts and their structure are listed below.

a) AVG_LEVERAGE

Contains information about the DARWIN’s D-Leverage, i.e. the volatility of the DARWIN compared with that of the EURUSD. The last field is an array of up to 24 values (one per hour). In practice, taking the last one is enough. The included fields are:

  1. Timestamp in milliseconds from epoch (UTC time zone)
  2. Number of periods at the end of the day
  3. An array with a variable number of values for D-Leverage
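Since taking the last value is usually enough, the end-of-day figure can be picked as sketched below. The guide does not describe the on-disk encoding of chart files, so the example assumes that a line has already been parsed into a Python tuple of the three fields; `end_of_day_leverage` is a hypothetical name and the sample values are made up.

```python
from typing import Optional, Sequence, Tuple


def end_of_day_leverage(record: Tuple[int, int, Sequence[float]]) -> Optional[float]:
    """Return the last D-Leverage value of the day, or None if the array is empty."""
    timestamp_ms, periods, values = record
    return values[-1] if values else None


# Made-up record: (timestamp in ms, number of periods, hourly D-Leverage values).
sample = (1514851200000, 12, [1.8, 2.1, 2.4])
print(end_of_day_leverage(sample))
```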

b) BADGES

Contains the history of the DARWIN’s attribute scores. Each line has the following structure:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. An array with the attribute scores at that moment. The attributes are ordered following the table below.
  4. The time when the calculation of the first attribute was made. It is a timestamp in milliseconds from epoch (UTC time zone).
  5. The time when the calculation of the last attribute was made. It is a timestamp in milliseconds from epoch (UTC time zone).

Attribute order

Web symbol   Description
EX           Experience
MC           Market Correlation
RS           Risk Stability
RA           Risk Aversion
OS           Open Strategy
CS           Close Strategy
R+           Winning Consistency
R-           Losing Consistency
DC           Duration Consistency
LA           Loss Aversion
PF           Performance
CP           Scalability
D-Score      Darwinex Score
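Because the scores array carries no labels, the attribute order above has to be applied by position. A minimal sketch, assuming the array has already been parsed into a Python list of 13 numbers; `label_scores` is a hypothetical helper and the sample scores are invented.

```python
ATTRIBUTE_ORDER = (
    "EX", "MC", "RS", "RA", "OS", "CS", "R+",
    "R-", "DC", "LA", "PF", "CP", "D-Score",
)


def label_scores(scores):
    """Pair each score with its web symbol, following the attribute order table."""
    if len(scores) != len(ATTRIBUTE_ORDER):
        raise ValueError(f"expected {len(ATTRIBUTE_ORDER)} scores, got {len(scores)}")
    return dict(zip(ATTRIBUTE_ORDER, scores))


# Invented scores, for illustration only.
sample_scores = [7.5, 9.0, 10.0, 8.2, 6.1, 5.5, 4.0, 7.7, 6.6, 9.9, 8.8, 3.3, 62.4]
labelled = label_scores(sample_scores)
print(labelled["LA"], labelled["D-Score"])
```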

c) BEHAVIOR

This chart shows the number of orders per hour. The fields are:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. [Optional] If the trader has traded that day, there will be an array with up to 24 elements containing the number of orders. The first element contains the orders placed from 00:00 to 01:00, the second one from 01:00 to 02:00, and so on.
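The hourly buckets can be mapped back to the hour of the day by index. The sketch below assumes the optional array has already been parsed into a Python list; the function names are hypothetical and the counts are made up.

```python
from typing import Dict, Optional, Sequence


def orders_by_hour(counts: Sequence[int]) -> Dict[int, int]:
    """Map each starting hour (0 = 00:00 to 01:00) to its number of orders."""
    if len(counts) > 24:
        raise ValueError("a day has at most 24 hourly buckets")
    return {hour: n for hour, n in enumerate(counts)}


def busiest_hour(counts: Sequence[int]) -> Optional[int]:
    """Return the starting hour with the most orders, or None for an empty array."""
    by_hour = orders_by_hour(counts)
    return max(by_hour, key=by_hour.get) if by_hour else None


sample_counts = [0, 3, 1, 5, 2]  # made-up data covering 00:00 to 05:00
print(orders_by_hour(sample_counts))
print(busiest_hour(sample_counts))
```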

d) CLOSE_STRATEGY

A series of values showing the effect of closing a trade with a certain time difference compared to the actual closing time (expressed as a percentage). This one is analogous to the OPEN_STRATEGY chart. The line structure is as follows:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. [Optional] If the trader traded that day, an array with 12 elements. It compares the actual closing time with closing a certain amount of time earlier or later. The ranks obtained by each variation are summed over the day's trades. For example, if closing 20% later ranks 3rd, 7th and 5th on three trades, the sum of ranks is 15, so that variation's average position is 15/3 = 5th.

Position in array   Variation   Description
1                   --          Number of closed trades during the day.
2                   0%          Sum of ranks on the real strategy.
3                   -50%        Sum of ranks closing 50% before.
4                   -40%        Sum of ranks closing 40% before.
5                   -30%        Sum of ranks closing 30% before.
6                   -20%        Sum of ranks closing 20% before.
7                   -10%        Sum of ranks closing 10% before.
8                   +10%        Sum of ranks closing 10% after.
9                   +20%        Sum of ranks closing 20% after.
10                  +30%        Sum of ranks closing 30% after.
11                  +40%        Sum of ranks closing 40% after.
12                  +50%        Sum of ranks closing 50% after.
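The 15/3 = 5th arithmetic from the example can be applied to the whole array: dividing each sum of ranks by the number of trades gives the average rank per variation. This is a sketch under the assumption that the 12-element array has already been parsed into a Python list; `average_ranks` is a hypothetical name and the sample numbers are invented. The same layout applies to the OPEN_STRATEGY chart.

```python
# Variation (in percent) for array positions 2 to 12, in the order of the table.
VARIATIONS = (0, -50, -40, -30, -20, -10, 10, 20, 30, 40, 50)


def average_ranks(values):
    """Turn the 12-element array into {variation %: average rank}."""
    if len(values) != 12:
        raise ValueError(f"expected 12 elements, got {len(values)}")
    n_trades = values[0]
    if n_trades == 0:
        return {}  # nothing to average
    return {variation: total / n_trades
            for variation, total in zip(VARIATIONS, values[1:])}


# Invented day with 3 closed trades; closing 20% later sums to 15 (3rd + 7th + 5th).
sample = [3, 12, 18, 17, 16, 15, 14, 13, 15, 16, 17, 18]
ranks = average_ranks(sample)
print(ranks[20])  # average rank when closing 20% later
print(ranks[0])   # average rank of the real strategy
```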

e) DAILY_FIXED_DIVERGENCE

Analyzes the effect of applying a fixed divergence (10^-5) on the profit. Each line is composed of 2 fields:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Profit difference.

f) DAILY_REAL_DIVERGENCE

Analyzes the effect of applying the investors' divergence on the profit. Again, each line is composed of 2 fields:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Profit difference.

g) DURATION_CONSISTENCY

A series of data for every single position. The structure goes as follows:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. An array with data of the last 100 positions in that day. It could be an empty array if there are no trades. Each position contains:
    1. The start time of the position expressed in milliseconds.
    2. The lifespan of the position in milliseconds.
    3. Gross profit of the position.
    4. The VaR at the beginning of the day of that position.
    5. The leverage at the beginning of the position.

h) EXPERIENCE

Contains information about D-periods for each day. The structure is:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of D-periods at the end of the day.
  3. An array with
    1. Number of days in that period.
    2. Number of decisions in that period.

i) LOSING_CONSISTENCY

Analogous to the DURATION_CONSISTENCY chart, this time with losing trades and limited to 50 trades. Contents are:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. An array with data about the last 50 losing positions in that day. It could be an empty array if there are no trades. Each position has:
    1. Start time of the position expressed in milliseconds from epoch.
    2. Position lifespan in milliseconds.
    3. Gross profit of the position.
    4. VaR at the start of the day on that position.
    5. Leverage at the start of the position. 

j) LOSS_AVERSION

Contains information about trades grouped by day, with three profit figures per trade: maximum, minimum and closed. There is a twin chart for traders (LOSS_AVERSION_UNADJUSTED_VAR) whose only difference is that the size of trades is not adjusted by VaR. The content structure is:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. An array with data for each trade in the day:
    1. Instrument Id.
    2. Trade start date in milliseconds.
    3. Max profit of trade.
    4. Min profit of trade.
    5. Real profit of trade when it was closed.
    6. Trade still open (1 means “yes”, 0 means “no”)

k) LOSS_AVERSION_UNADJUSTED_VAR

Just like the previous one, only this time the size of the trades is not adjusted by VaR. The fields are the same as in the LOSS_AVERSION chart:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. An array with data for each trade in the day:
    1. Instrument Id.
    2. Trade start date in milliseconds.
    3. Max profit of trade.
    4. Min profit of trade.
    5. Real profit of trade when it was closed.
    6. Trade still open (1 means “yes”, 0 means “no”)

l) MARKET_CORRELATION

Analyzes the correlation between the trader’s strategy and the market's evolution over different durations, namely 3, 6 and 12 D-periods. It has the following format:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. An array with different data for each period:
    1. 3 Periods.
    2. 6 Periods.
    3. 12 Periods.
      Each of those positions has another array with information regarding that time span:
      1. Instrument Id.
      2. Correlation between market and trader for that instrument.
      3. Instrument’s weight towards total position.
      4. Number of consecutive long positions.
      5. Number of consecutive short positions.
      6. Ratio between time with long positions and time of the whole day.
      7. Ratio between time with short positions and time of the whole day.
      8. Time percentage, weighted with leverage of long positions.
      9. Time percentage, weighted with leverage of short positions.
      10. Time percentage, weighted with leverage of closed positions.

m) MONTHLY_DIVERGENCE

Contains data regarding the average and monthly divergence. Does not contain information about periods:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Average divergence.
  3. Monthly divergence.

n) OPEN_STRATEGY

Similar to the CLOSE_STRATEGY chart, this is a series of values showing the effect of opening a trade with a certain time difference (expressed in percentage). Each line goes as follows:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. [Optional] If the trader traded that day, an array with 12 elements. It compares the actual opening time with opening a certain amount of time earlier or later. The ranks obtained by each variation are summed over the day's trades. For example, if opening 20% later ranks 3rd, 7th and 5th on three trades, the sum of ranks is 15, so that variation's average position is 15/3 = 5th.

Position in array   Variation   Description
1                   --          Number of trades opened during the day.
2                   0%          Sum of ranks on the real strategy.
3                   -50%        Sum of ranks opening 50% before.
4                   -40%        Sum of ranks opening 40% before.
5                   -30%        Sum of ranks opening 30% before.
6                   -20%        Sum of ranks opening 20% before.
7                   -10%        Sum of ranks opening 10% before.
8                   +10%        Sum of ranks opening 10% after.
9                   +20%        Sum of ranks opening 20% after.
10                  +30%        Sum of ranks opening 30% after.
11                  +40%        Sum of ranks opening 40% after.
12                  +50%        Sum of ranks opening 50% after.

o) ORDER_DIVERGENCE

Offers detailed divergence information per order. It is a set of arrays with the following structure:

  1. Timestamp in milliseconds from epoch. It is the time when the DARWIN places an order for investors.
  2. Instrument ID.
  3. Investors' volume in USD.
  4. Latency.
  5. Divergence experienced by investors on this order.

p) PERFORMANCE

This one contains data about the DARWIN’s profit compared to that of several random strategies, simulated during several time ranges: 3 periods, 6 periods and 12 periods. The data contains:

  1. Timestamp in milliseconds from epoch. It is the time when the DARWIN places an order for investors.
  2. Number of periods.
  3. An array with the following data for the day.
    1. Number of positions in the day.
    2. An array with data for 3 periods.
    3. An array with data for 6 periods.
    4. An array with data for 12 periods.
      All of the arrays above hold the following information:
      1. The percentile where the current strategy belongs.
      2. Profit in the current strategy time window.
      3. Simulation profits. This is also an array with elements for each percentile.
        1. Profit in percentile 99
        2. Profit in percentile 95
        3. Profit in percentile 90
        4. Profit in percentile 80
        5. Profit in percentile 70
        6. Profit in percentile 60
        7. Profit in percentile 50
        8. Profit in percentile 40
        9. Profit in percentile 30
        10. Profit in percentile 20
        11. Profit in percentile 10
        12. Profit in percentile 5
        13. Profit in percentile 1
      4. Activity (This field is usually null): The number of days with activity and without anomalies.
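The 13 simulation profits are positional, so a lookup by percentile needs the order listed above. The following sketch assumes the simulation array has already been parsed into a Python list; the function names are hypothetical and the profit figures are invented.

```python
# Percentiles in the order in which the simulation profits are listed.
PERCENTILES = (99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, 1)


def simulation_by_percentile(profits):
    """Label the 13 simulated profits with their percentile."""
    if len(profits) != len(PERCENTILES):
        raise ValueError(f"expected {len(PERCENTILES)} profits, got {len(profits)}")
    return dict(zip(PERCENTILES, profits))


def percentiles_matched(strategy_profit, profits):
    """List the percentiles whose simulated profit the strategy matches or exceeds."""
    labelled = simulation_by_percentile(profits)
    return [p for p, simulated in labelled.items() if strategy_profit >= simulated]


# Invented simulation profits, highest percentile first.
simulated = [0.30, 0.22, 0.18, 0.12, 0.08, 0.05, 0.02,
             0.00, -0.03, -0.06, -0.10, -0.14, -0.20]
print(simulation_by_percentile(simulated)[50])
print(percentiles_matched(0.09, simulated))
```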

q) POSITIONS

Offers information about positions every day:

  1. Timestamp in milliseconds from epoch. It is the time when the DARWIN places an order for investors.
  2. Number of periods
  3. An array with the day's data, aggregated per set of instruments open at that moment. For example, it could hold one array for “EURUSD+GBPJPY+XAUUSD” and another one only for “EURUSD”.
    1. Array with the following:
      1. Instrument ID
      2. Number of positions
      3. Number of winning positions
      4. Number of losing positions
      5. Performance of winning positions
      6. Performance of losing positions
      7. Duration of winning positions
      8. Duration of losing positions
    2. Total position number
    3. Max number of open trades

r) RETURN

This chart contains information regarding profitability. The structure has the following pattern:


  1. Timestamp in milliseconds from epoch. It is the time when the DARWIN places an order for investors.
  2. Number of periods
  3. An array with profitability info in key moments:
    1. First return of the day
    2. From position 2 up to N-1 → [Optional] A series of points used to calculate drawdown.
    3. [Optional] In the last position of the array we will find the last return of the day.
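One way the intermediate points might be used is a peak-to-trough drawdown calculation. This sketch rests on a strong assumption that the guide does not confirm: that each point is a positive cumulative level (for example 1.10 for +10%). The function name `max_drawdown` and the sample points are illustrative only.

```python
def max_drawdown(points):
    """Largest relative fall from a running peak, as a fraction of that peak."""
    peak = None
    worst = 0.0
    for value in points:
        if peak is None or value > peak:
            peak = value  # new running maximum
        elif peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


# Made-up points: first return, intermediate points, last return of the day.
sample_points = [1.00, 1.10, 0.99, 1.05]
print(round(max_drawdown(sample_points), 4))
```

If the stored returns turn out to be expressed differently (for example as percentage changes), they would need converting to levels before applying this calculation.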

s) RETURN_DIVERGENCE

Here you will find information regarding the DARWIN’s divergence. It contains:

  1. Timestamp in milliseconds from epoch. It is the time when the DARWIN places an order for investors.
  2. DARWIN’s quote at this moment.
  3. DARWIN's quote after applying real divergence.
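The two quotes on each line can be compared directly. As a sketch, the relative gap between them could be computed as below; `relative_divergence` is a hypothetical helper, the inputs are assumed to be already parsed into floats, and the sample quotes are invented.

```python
def relative_divergence(quote: float, diverged_quote: float) -> float:
    """Relative difference of the diverged quote with respect to the DARWIN's quote."""
    if quote == 0:
        raise ValueError("quote must be non-zero")
    return diverged_quote / quote - 1.0


# Invented values: quote of 100.0 versus 99.5 after real divergence.
print(round(relative_divergence(100.0, 99.5), 6))
```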

t) RISK_ADJUSTMENT

Contains information about the amount of risk adjustment applied to the strategy. The structure:

  1. Timestamp in milliseconds from epoch. It is the time when the DARWIN places an order for investors.
  2. Number of periods
  3. An array with risk adjustment information. Each element contains
    1. Default Leverage Target. It can be null if it is not available.
    2. Normalized daily VaR.
    3. A set of data for each position:
      1. Time of the adjustment in milliseconds
      2. Duration
      3. D-Leverage
      4. VaR adjustment
      5. Number of simultaneous trades
      6. Position profit
      7. Leverage on the day it was opened

u) RISK_STABILITY

Information about the VaR variability or stability within the strategy. The chart includes:

  1. Timestamp in milliseconds from epoch. It is the time of the DARWIN actuation.
  2. Number of periods.
  3. An array with VaR information. Each position contains:
    1. Current VaR.
    2. Max VaR.
    3. Min VaR.
    4. Normalized current VaR.
    5. Normalized max VaR.
    6. Normalized min VaR.
    7. Return.

v) ROTATION

Contains information about daily rotation, with one value for each day and the following data:

  1. Timestamp in milliseconds from epoch. It is the time of the strategy actuation.
  2. Number of periods
  3. Daily rotation

w) SCALABILITY

This chart offers data for several capacity graphs shown on the web, structured as follows:

  1. Timestamp in milliseconds from epoch. It is the time of the strategy actuation.
  2. Number of periods
  3. An array with scalability information. Each position has:
    1. An array containing what the quotes would look like with some hypothetical spread increases:
      1. Original quote value
      2. 0.00002 ( +0.2 spread pips)
      3. 0.00005 ( +0.5 spread pips)
      4. 0.0001 ( +1.0 spread pips)
      5. 0.0002 ( +2.0 spread pips)
    2. Max allowed divergence, i.e. the added spread that makes the investor’s profit lower than 98.5% of the DARWIN’s profit.
    3. Average leverage per trade.
    4. Volume without leverage.

x) TRADE_CONSISTENCY

Shows information regarding the results of a group of trades per day.

  1. Timestamp in milliseconds from epoch. It is the time when the underlying strategy trades.
  2. Number of periods
  3. An array showing information about the last 100 closed trades during the day, with the following data per trade:
    1. Open time.
    2. Duration.
    3. Return.
    4. VaR.
    5. Leverage.

y) TRADE_LOSS_AVERSION

Shows how the profits or losses would have changed in the best and worst case scenario for a DARWIN. There is a twin chart TRADE_UNADJUSTED_LOSS_AVERSION that does not take into account the VaR and leverage adjustments.

  1. Timestamp in milliseconds from epoch. It is the time when the underlying strategy trades.
  2. Number of periods
  3. An array showing information about the last 100 closed trades during the day, with the following data per trade:
    1. Instrument ID.
    2. Start time in milliseconds.
    3. Max benefit the trader could have gotten if the trade had been closed before.
    4. Min benefit the trader could have gotten if the trade had been closed before.
    5. The actual benefit the trade got if it was closed or the benefit it would get right now if it is still open.
    6. “true” if the trade is still open, “null” otherwise.

z) TRADES

Gives information about the result of closed trades, aggregated by instrument:

  1. Timestamp in milliseconds from epoch. It is the time of the strategy’s trades.
  2. Number of periods.
  3. An array with aggregated information about trades grouped by instrument.
    1. Instrument ID.
    2. Number of trades.
    3. Number of winning trades.
    4. Number of losing trades.
    5. Profitability of winning trades (will be 1 if there are no trades).
    6. Profitability of losing trades (will be 1 if there are no trades).
    7. Duration of winning trades (will be 0 if there are no trades).
    8. Duration of losing trades (will be 0 if there are no trades)
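From the per-instrument counts, a win rate follows directly. The sketch assumes that each instrument entry has already been parsed into an eight-element list in the field order above; `win_rates` is a hypothetical helper, and the instrument IDs and figures are invented.

```python
def win_rates(instruments):
    """Return {instrument ID: winning trades / total trades}, None when no trades."""
    rates = {}
    for row in instruments:
        instrument_id, n_trades, n_wins = row[0], row[1], row[2]
        rates[instrument_id] = n_wins / n_trades if n_trades else None
    return rates


# Invented rows: ID, trades, wins, losses, win profit, loss profit, win/loss duration.
sample_rows = [
    [101, 10, 6, 4, 1.02, 0.99, 3600000, 1800000],
    [202, 0, 0, 0, 1, 1, 0, 0],  # no trades: profitability 1, duration 0
]
print(win_rates(sample_rows))
```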

aa) TRADE_UNADJUSTED_LOSS_AVERSION

Similar to TRADE_LOSS_AVERSION but applied to the underlying strategy instead of the DARWIN. So this chart uses trades without VaR or leverage adjustments. It shows how the profits or losses of certain trades would have changed in the best and worst moments to close. It contains information for up to 100 trades per day:

  1. Timestamp in milliseconds from epoch. It is the time of the strategy’s trades.
  2. Number of periods
  3. An array showing information about the last 100 closed trades during the day, with the following data per trade:
    1. Instrument ID.
    2. Start time in milliseconds.
    3. Max benefit the trader could have gotten if the trade had been closed before.
    4. Min benefit the trader could have gotten if the trade had been closed before.
    5. The actual benefit the trade got if it was closed or the benefit it would get right now if it is still open.
    6. “true” if the trade is still open, “null” otherwise.

ab) WINNING_CONSISTENCY

Analogous to the LOSING_CONSISTENCY chart, but this time using just winning trades and limited to 50 trades. The contents are the same as in that chart:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of periods at the end of the day.
  3. An array with data about the last 50 winning positions in that day. It could be an empty array if there are no operations. Each position has:
    1. Start time of the position expressed in milliseconds from epoch.
    2. Position lifespan in milliseconds.
    3. Gross profit of the position.
    4. VaR at the start of the day on that position.
    5. Leverage at the start of the position. 

ac) INVESTMENT_CHART

This chart shows information about how much money has been invested into the DARWIN. The contents are very simple:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Amount of money invested.

ad) INVESTORS_CHART

This one is similar to the INVESTMENT_CHART. Instead of focusing on how much money has been invested, it shows how many investors have invested money in the DARWIN. Again, the contents are very simple:

  1. Timestamp in milliseconds from epoch (UTC time zone).
  2. Number of investors with investment in the DARWIN.