Writing a Foreign Data Wrapper

行是知之始,知是行之成。这篇文章主要讲述Writing a Foreign Data Wrapper相关的知识,希望能为你提供帮助。
转自:greenplum 官方的一片文档  https://gpdb.docs.pivotal.io/6-0/admin_guide/external/g-devel-fdw.html  pg 是类似的This chapter outlines how to write a new foreign-data wrapper.
All operations on a foreign table are handled through its foreign-data wrapper (FDW), a library that consists of a set of functions that the core Greenplum Database server calls. The foreign-data wrapper is responsible for fetching data from the remote data store and returning it to the Greenplum Database executor. If updating foreign-data is supported, the wrapper must handle that, too.
The foreign-data wrappers included in the Greenplum Database open source github repository are good references when trying to write your own. You may want to examine the  file_fdw  and  postgres_fdw  modules in the  contrib/  directory. The  CREATE FOREIGN DATA WRAPPER  reference page also provides some useful details.
Note:  The SQL standard specifies an interface for writing foreign-data wrappers. Greenplum Database does not implement that API, however, because the effort to accommodate it into Greenplum would be large, and the standard API hasn‘t yet gained wide adoption.This topic includes the following sections:

  • Requirements
  • Known Issues and Limitations
  • Header Files
  • Foreign Data Wrapper Functions
  • Foreign Data Wrapper Callback Functions
  • Foreign Data Wrapper Helper Functions
  • Greenplum Database Considerations
  • Building a Foreign Data Wrapper Extension with PGXS
  • Deployment Considerations
Parent topic:  Accessing External Data with Foreign TablesRequirementsWhen you develop with the Greenplum Database foreign-data wrapper API:
  • You must develop your code on a system with the same hardware and software architecture as that of your Greenplum Database hosts.
  • Your code must be written in a compiled language such as C, using the version-1 interface. For details on C language calling conventions and dynamic loading, refer to  C Language Functions  in the PostgreSQL documentation.
  • Symbol names in your object files must not conflict with each other nor with symbols defined in the Greenplum Database server. You must rename your functions or variables if you get error messages to this effect.
  • Review the foreign table introduction described in  Accessing External Data with Foreign Tables.
Known Issues and LimitationsThe Greenplum Database 6 foreign-data wrapper implementation has the following known issues and limitations:
  • The Greenplum Database 6 distribution does not install any foreign data wrappers.
  • Greenplum Database uses the  mpp_execute  option value for foreign table scans only. Greenplum does not honor the  mpp_execute  setting when you write to, or update, a foreign table; all write operations are initiated through the master.
Header FilesThe Greenplum Database header files that you may use when you develop a foreign-data wrapper are located in the  greenplum-db/src/include/  directory (when developing against the Greenplum Database open source github repository), or installed in the  $GPHOME/include/postgresql/server/  directory (when developing against a Greenplum installation):
  • foreign/fdwapi.h - FDW API structures and callback function signatures
  • foreign/foreign.h - foreign-data wrapper helper structs and functions
  • catalog/pg_foreign_table.h - foreign table definition
  • catalog/pg_foreign_server.h - foreign server definition
Your FDW code may also be dependent on header files and libraries required to access the remote data store.
Foreign Data Wrapper FunctionsThe developer of a foreign-data wrapper must implement an SQL-invokable  handler  function, and optionally an SQL-invokable  validator  function. Both functions must be written in a compiled language such as C, using the version-1 interface.
The  handler  function simply returns a struct of function pointers to callback functions that will be called by the Greenplum Database planner, executor, and various maintenance commands. The  handler  function must be registered with Greenplum Database as taking no arguments and returning the special pseudo-type  fdw_handler. For example:
CREATE FUNCTION NEW_fdw_handler() RETURNS fdw_handler AS ‘MODULE_PATHNAME‘ LANGUAGE C STRICT;

Most of the effort in writing a foreign-data wrapper is in implementing the callback functions. The FDW API callback functions, plain C functions that are not visible or callable at the SQL level, are described in  Foreign Data Wrapper Callback Functions.
The  validator  function is responsible for validating options provided in  CREATE  and  ALTER  commands for its foreign-data wrapper, as well as foreign servers, user mappings, and foreign tables using the wrapper. The  validator  function must be registered as taking two arguments, a text array containing the options to be validated, and an OID representing the type of object with which the options are associated. For example:
CREATE FUNCTION NEW_fdw_validator( text[], oid ) RETURNS void AS ‘MODULE_PATHNAME‘ LANGUAGE C STRICT;

【Writing a Foreign Data Wrapper】The OID argument reflects the type of the system catalog that the object would be stored in, one of  ForeignDataWrapperRelationId,  ForeignServerRelationId,  UserMappingRelationId, or  ForeignTableRelationId. If no  validator  function is supplied by a foreign data wrapper, Greenplum Database does not check option validity at object creation time or object alteration time.
Foreign Data Wrapper Callback FunctionsThe foreign-data wrapper API defines callback functions that Greenplum Database invokes when scanning and updating foreign tables. The API also includes callbacks for performing explain and analyze operations on a foreign table.
The  handler  function of a foreign-data wrapper returns a  palloc‘d  FdwRoutine  struct containing pointers to callback functions described below. The  FdwRoutine  struct is located in the  foreign/fdwapi.h  header file, and is defined as follows:
/* * FdwRoutine is the struct returned by a foreign-data wrapper‘s handler * function.It provides pointers to the callback functions needed by the * planner and executor. * * More function pointers are likely to be added in the future.Therefore * it‘s recommended that the handler initialize the struct with * makeNode(FdwRoutine) so that all fields are set to NULL.This will * ensure that no fields are accidentally left undefined. */ typedef struct FdwRoutine { NodeTagtype; /* Functions for scanning foreign tables */ GetForeignRelSize_function GetForeignRelSize; GetForeignPaths_function GetForeignPaths; GetForeignPlan_function GetForeignPlan; BeginForeignScan_function BeginForeignScan; IterateForeignScan_function IterateForeignScan; ReScanForeignScan_function ReScanForeignScan; EndForeignScan_function EndForeignScan; /* * Remaining functions are optional.Set the pointer to NULL for any that * are not provided. */ /* Functions for updating foreign tables */ AddForeignUpdateTargets_function AddForeignUpdateTargets; PlanForeignModify_function PlanForeignModify; BeginForeignModify_function BeginForeignModify; ExecForeignInsert_function ExecForeignInsert; ExecForeignUpdate_function ExecForeignUpdate; ExecForeignDelete_function ExecForeignDelete; EndForeignModify_function EndForeignModify; IsForeignRelUpdatable_function IsForeignRelUpdatable; /* Support functions for EXPLAIN */ ExplainForeignScan_function ExplainForeignScan; ExplainForeignModify_function ExplainForeignModify; /* Support functions for ANALYZE */ AnalyzeForeignTable_function AnalyzeForeignTable; } FdwRoutine;

You must implement the scan-related functions in your foreign-data wrapper; implementing the other callback functions is optional.
Scan-related callback functions include:
Callback SignatureDescription
void GetForeignRelSize (PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)

Obtain relation size estimates for a foreign table. Called at the beginning of planning for a query on a foreign table.
void GetForeignPaths (PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)

Create possible access paths for a scan on a foreign table. Called during query planning. Note:  A Greenplum Database-compatible FDW must call  create_foreignscan_path()  in its  GetForeignPaths()  callback function.
ForeignScan * GetForeignPlan (PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid, ForeignPath *best_path, List *tlist, List *scan_clauses)

Create a  ForeignScan  plan node from the selected foreign access path. Called at the end of query planning.
void BeginForeignScan (ForeignScanState *node, int eflags)

Begin executing a foreign scan. Called during executor startup.
TupleTableSlot * IterateForeignScan (ForeignScanState *node)

Fetch one row from the foreign source, returning it in a tuple table slot; return NULL if no more rows are available.
void ReScanForeignScan (ForeignScanState *node)

Restart the scan from the beginning.
void EndForeignScan (ForeignScanState *node)

End the scan and release resources.
Refer to  Foreign Data Wrapper Callback Routines  in the PostgreSQL documentation for detailed information about the inputs and outputs of the FDW callback functions.
Foreign Data Wrapper Helper FunctionsThe FDW API exports several helper functions from the Greenplum Database core server so that authors of foreign-data wrappers have easy access to attributes of FDW-related objects, such as options provided when the user creates or alters the foreign-data wrapper, server, or foreign table. To use these helper functions, you must include  foreign.h  header file in your source file:
#include "foreign/foreign.h"

The FDW API includes the helper functions listed in the table below. Refer to  Foreign Data Wrapper Helper Functions  in the PostgreSQL documentation for more information about these functions.
Helper SignatureDescription
ForeignDataWrapper * GetForeignDataWrapper(Oid fdwid);

Returns the  ForeignDataWrapper  object for the foreign-data wrapper with the given OID.
ForeignDataWrapper * GetForeignDataWrapperByName(const char *name, bool missing_ok);

Returns the  ForeignDataWrapper  object for the foreign-data wrapper with the given name.
ForeignServer * GetForeignServer(Oid serverid);

Returns the  ForeignServer  object for the foreign server with the given OID.
ForeignServer * GetForeignServerByName(const char *name, bool missing_ok);

Returns the  ForeignServer  object for the foreign server with the given name.
UserMapping * GetUserMapping(Oid userid, Oid serverid);

Returns the  UserMapping  object for the user mapping of the given role on the given server.
ForeignTable * GetForeignTable(Oid relid);

Returns the  ForeignTable  object for the foreign table with the given OID.
List * GetForeignColumnOptions(Oid relid, AttrNumber attnum);

Returns the per-column FDW options for the column with the given foreign table OID and attribute number.
Greenplum Database ConsiderationsA Greenplum Database user can specify the  mpp_execute  option when they create or alter a foreign table, foreign server, or foreign data wrapper. A Greenplum Database-compatible foreign-data wrapper examines the  mpp_execute  option value on a scan and uses it to determine where to request data - from the  master  (the default),  any  (master or any one segment), or  all  segments.
Note:  Write/update operations using a foreign data wrapper are always executed on the Greenplum Database master, regardless of the  mpp_execute  setting.The following scan code snippet probes the  mpp_execute  value associated with a foreign table:
ForeignTable *table = GetForeignTable(foreigntableid); if (table-> exec_location == FTEXECLOCATION_ALL_SEGMENTS) { ... } else if (table-> exec_location == FTEXECLOCATION_ANY) { ... } else if (table-> exec_location == FTEXECLOCATION_MASTER) { ... }

If the foreign table was not created with an  mpp_execute  option setting, the  mpp_execute  setting of the foreign server, and then the foreign data wrapper, is probed and used. If none of the foreign-data-related objects has an  mpp_execute  setting, the default setting is  master.
If a foreign-data wrapper supports  mpp_execute ‘all‘, it will implement a policy that matches Greenplum segments to data. So as not to duplicate data retrieved from the remote, the FDW on each segment must be able to establish which portion of the data is their responsibility. An FDW may use the segment identifier and the number of segments to help make this determination. The following code snippet demonstrates how a foreign-data wrapper may retrieve the segment number and total number of segments:
int segmentNumber = GpIdentity.segindex; int totalNumberOfSegments = getgpsegmentCount();

Building a Foreign Data Wrapper Extension with PGXSYou compile the foreign-data wrapper functions that you write with the FDW API into one or more shared libraries that the Greenplum Database server loads on demand.
You can use the PostgreSQL build extension infrastructure (PGXS) to build the source code for your foreign-data wrapper against a Greenplum Database installation. This framework automates common build rules for simple modules. If you have a more complicated use case, you will need to write your own build system.
To use the PGXS infrastructure to generate a shared library for your FDW, create a simple  Makefile  that sets PGXS-specific variables.
Note:  Refer to  Extension Building Infrastructure  in the PostgreSQL documentation for information about the  Makefile  variables supported by PGXS.For example, the following  Makefile  generates a shared library in the current working directory named  base_fdw.so  from two C source files, base_fdw_1.c and base_fdw_2.c:
MODULE_big = base_fdw OBJS = base_fdw_1.o base_fdw_2.oPG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs)PG_CPPFLAGS = -I$(shell $(PG_CONFIG) --includedir) SHLIB_LINK = -L$(shell $(PG_CONFIG) --libdir) include $(PGXS)

A description of the directives used in this  Makefile  follows:
  • MODULE_big  - identifes the base name of the shared library generated by the  Makefile
  • PG_CPPFLAGS  - adds the Greenplum Database installation  include/  directory to the compiler header file search path
  • SHLIB_LINK  adds the Greenplum Database installation library directory ($GPHOME/lib/) to the linker search path
  • The  PG_CONFIG  and  PGXS  variable settings and the  include  statement are required and typically reside in the last three lines of the  Makefile.
To package the foreign-data wrapper as a Greenplum Database extension, you create script (newfdw--version.sql) and control (newfdw.control) files that register the FDW  handler  and  validator  functions, create the foreign data wrapper, and identify the characteristics of the FDW shared library file.
Note:  Packaging Related Objects into an Extension  in the PostgreSQL documentation describes how to package an extension.Example foreign-data wrapper extension script file named  base_fdw--1.0.sql:
CREATE FUNCTION base_fdw_handler() RETURNS fdw_handler AS ‘MODULE_PATHNAME‘ LANGUAGE C STRICT; CREATE FUNCTION base_fdw_validator(text[], oid) RETURNS void AS ‘MODULE_PATHNAME‘ LANGUAGE C STRICT; CREATE FOREIGN DATA WRAPPER base_fdw HANDLER base_fdw_handler VALIDATOR base_fdw_validator;

Example FDW control file named  base_fdw.control:
# base_fdw FDW extension comment = ‘base foreign-data wrapper implementation; does not do much‘ default_version = ‘1.0‘ module_pathname = ‘$libdir/base_fdw‘ relocatable = true

When you add the following directives to the  Makefile, you identify the FDW extension control file base name (EXTENSION) and SQL script (DATA):
EXTENSION = base_fdw DATA = https://www.songbingjia.com/android/base_fdw--1.0.sql

Running  make install  with these directives in the  Makefile  copies the shared library and FDW SQL and control files into the specified or default locations in your Greenplum Database installation ($GPHOME).
Deployment ConsiderationsYou must package the FDW shared library and extension files in a form suitable for deployment in a Greenplum Database cluster. When you construct and deploy the package, take into consideration the following:
  • The FDW shared library must be installed to the same file system location on the master host and on every segment host in the Greenplum Database cluster. You specify this location in the  .control  file. This location is typically the  $GPHOME/lib/postgresql/  directory.
  • The FDW  .sql  and  .control  files must be installed to the  $GPHOME/share/postgresql/extension/  directory on the master host and on every segment host in the Greenplum Database cluster.
  • The  gpadmin  user must have permission to traverse the complete file system path to the FDW shared library file and extension files.

    推荐阅读