Linux ABI

Motivation:

When I read ./linux/types.h, there is a comment saying that:

aligned_u64 should be used in defining kernel<->userspace ABIs to avoid common 32/64-bit compat problems.

What is an ABI:

It stands for Application Binary Interface.

The ABI is important when it comes to applications that use external libraries. If a program is built to use a particular library and that library is later updated, you don't want to have to re-compile that application (and from the end user's standpoint, you may not have the source).

If the updated library uses the same ABI, then your program will not need to change.

The interface to the library (which is all your program really cares about) is the same even though the internal workings may have changed. Two versions of a library that have the same ABI are sometimes called "binary-compatible" since they have the same low-level interface (you should be able to replace the old version with the new one and not have any major problems).

Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source-compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.

For this reason, library writers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type; number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If you expand, say, a 16-bit data structure field into a 32-bit field, then already-compiled code that uses that data structure will break: references to structure members get converted into memory addresses and offsets during compilation, and if the data structure changes, those offsets no longer point to what the code expects them to point to, and the results are unpredictable at best.
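
To make the offset problem concrete, here is a small sketch using Python's ctypes; the struct and its field names are made up for illustration, not taken from any real library:

import ctypes

# Version 1 of a hypothetical library struct: two 16-bit fields.
class RecordV1(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_uint16),
                ("count", ctypes.c_uint16)]

# Version 2 widens 'flags' to 32 bits, which silently moves 'count'.
class RecordV2(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_uint32),
                ("count", ctypes.c_uint16)]

print(RecordV1.count.offset)  # 2 -- code compiled against v1 reads 'count' here
print(RecordV2.count.offset)  # 4 -- in v2, 'count' actually lives here

Code built against version 1 keeps reading offset 2, which in version 2 is the upper half of 'flags'.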

That's why the comment is there:

aligned_u64 should be used in defining kernel<->userspace ABIs to avoid common 32/64-bit compat problems.

Let's keep going.

The ABI is not necessarily something you will explicitly provide unless you are expecting people to interface with your code using assembly. It is not language-specific either, since (for example) a C application and a Pascal application will use the same ABI after they are compiled.

Furthermore:

Discussion of the ABI also involves the ELF file format. The reason is that the ELF format defines the interface between the operating system and the application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and, for example, expects the first section of the binary to be an ELF header containing certain information at specific memory offsets.

This is how the application communicates important information about itself to the operating system. If you build a program in a different binary format, the OS will not be able to interpret the binary file or run the application. This is one big reason why an application built for one platform will not run on another: it must either be re-compiled or run inside some type of emulation layer that can translate from one binary format to another.

The full comment again:

aligned_u64 should be used in defining kernel<->userspace ABIs to avoid common 32/64-bit compat problems.

64-bit values align to 4-byte boundaries on x86_32 (and possibly other architectures) and to 8-byte boundaries on 64-bit architectures. The aligned_u64 type enforces 8-byte alignment, so structs containing aligned_u64 values have the same layout on 32-bit and 64-bit architectures.

No conversions are necessary between 32-bit user space and a 64-bit kernel.
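
To see why, here is a tiny sketch (plain Python, for a hypothetical struct { __u32 a; __u64 b; }) that computes the layout a compiler would produce under each architecture's alignment rules:

def layout(fields):
    """fields: list of (name, size, alignment); returns ({name: offset}, struct size)."""
    offsets, offset, max_align = {}, 0, 1
    for name, size, align in fields:
        offset = (offset + align - 1) // align * align   # pad up to the field's alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    size = (offset + max_align - 1) // max_align * max_align   # tail padding
    return offsets, size

# struct { __u32 a; __u64 b; }
print(layout([("a", 4, 4), ("b", 8, 4)]))   # i386, plain __u64 (4-byte aligned): ({'a': 0, 'b': 4}, 12)
print(layout([("a", 4, 4), ("b", 8, 8)]))   # x86_64, __u64 (8-byte aligned):     ({'a': 0, 'b': 8}, 16)
print(layout([("a", 4, 4), ("b", 8, 8)]))   # i386 with aligned_u64:              same 16-byte layout

With a plain __u64 the struct is 12 bytes with b at offset 4 for 32-bit user space, but 16 bytes with b at offset 8 for the 64-bit kernel; with aligned_u64 both sides produce the same layout, so no conversion is needed.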

 

 

Deep Dive into Neutron

Neutron ML2

The Modular Layer 2 (ML2) plugin is a framework allowing OpenStack Networking to simultaneously utilize the variety of layer 2 networking technologies found in complex real-world data centres.

It currently works with the existing L2 agents:

  • openvswitch
  • linuxbridge
  • hyperv

like:

     [ Neutron Server {ML2 Plugin} ]

      |_ [ Host A {linuxbridge agent} ] 

      |_ [ Host B {hyperv agent} ]

      |_ [ Host C {openvswitch agent} ]

      |_ [ Host D {openvswitch agent} ]

       * Existing ML2 plugin works with existing agents

       * Separate agents for linuxbridge, openvswitch, and hyperv.

ML2 is intended to:

  • replace
  • deprecate

the monolithic plugins associated with those L2 agents, and eventually to work with a single modular agent, like:

     [ Neutron Server {ML2 Plugin} ]

      |_ [ Host A {modular agent} ]

      |_ [ Host B {modular agent} ]

      |_ [ Host C {modular agent} ]

      |_ [ Host D {modular agent} ]

       * Combine open source agents

       * Have a single agent which can support linuxbridge and openvswitch

       * Pluggable drivers for additional vswitches, infiniband, sr-iov, etc

The ML2 framework is also intended to greatly simplify adding support for new L2 networking technologies, requiring much less initial and ongoing effort than would be required to add a new monolithic core plugin. A modular agent may be developed as a follow-on effort.

Architecture

 ML2 Plugin {

     TypeDriver {

        VLAN, GRE, VXLAN, Flat

    },

    MechanismDriver {

        OpenvSwitch, Hyper-V, OpenDaylight, Arista, Cisco Nexus

    }

    ExtensionDriver

}

TypeDriver

Each available network type is managed by an ml2 TypeDriver. TypeDrivers maintain any needed type-specific network state, and perform provider network validation and tenant network allocation.

The ml2 plugin currently includes type drivers for the flat, VLAN, GRE, and VXLAN network types. The TypeDriver interface:

// ./neutron/plugins/ml2/driver_api.py

  38 class TypeDriver(object):

    ....

  56     def get_type(self):

Get driver's network type.

    ....

  64     def initialize(self):

Perform driver initialization

    ....

  74     def is_partial_segment(self, segment):

Return True if segment is a partially specified segment

    ....

  82     def validate_provider_segment(self, segment):

Validate attributes of a provider network segment

     ....

 102     def reserve_provider_segment(self, session, segment):

Reserve resource associated with a provider network segment

    ....

 117     def allocate_tenant_segment(self, session):

Allocate resource for a new tenant network segment

    ....

 133     def release_segment(self, session, segment):

Release network segment

    ....

 147     def get_mtu(self, physical):

Get driver's network MTU
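
Putting the interface above together, a minimal custom type driver might look roughly like this (a sketch only; the base-class import path, the NETWORK_TYPE constant, and the "dummy" network type are assumptions for illustration):

from neutron.plugins.ml2 import driver_api as api

class DummyTypeDriver(api.TypeDriver):
    """Toy driver for a hypothetical 'dummy' network type."""

    def get_type(self):
        return "dummy"                         # the network type this driver manages

    def initialize(self):
        pass                                   # no type-specific state to set up

    def is_partial_segment(self, segment):
        return False                           # a dummy segment is always fully specified

    def validate_provider_segment(self, segment):
        pass                                   # accept any provider segment as-is

    def reserve_provider_segment(self, session, segment):
        return segment                         # nothing to record in the database

    def allocate_tenant_segment(self, session):
        return {api.NETWORK_TYPE: "dummy"}     # hand out a segment for a new tenant network

    def release_segment(self, session, segment):
        pass                                   # nothing was reserved, so nothing to release

    def get_mtu(self, physical):
        return 0                               # no type-specific MTU restriction for this toy type

A real type driver (VLAN, GRE, VXLAN) keeps allocation tables in the database and hands out per-tenant segmentation IDs from allocate_tenant_segment.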

Mechanism Drivers

Each networking mechanism is managed by an ml2 MechanismDriver. The MechanismDriver is responsible for taking the information established by the TypeDriver and ensuring that it is properly applied given the specific networking mechanisms that have been enabled.

The MechanismDriver interface currently supports the:

  • creation
  • update
  • deletion

of network, subnet, and port resources. For every action that can be taken on a resource, the mechanism driver exposes two methods:

  • ACTION_RESOURCE_precommit
  • ACTION_RESOURCE_postcommit

The precommit method:

 Used by mechanism drivers to validate the action being taken and make any required changes to the mechanism driver's private database.

 Should not block, and therefore cannot communicate with anything outside of Neutron.

The postcommit method:

 Is responsible for appropriately pushing the change to the resource to the entity responsible for applying that change.
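
As a sketch of that split, a hypothetical mechanism driver handling ports might look like this (the base class and method names follow the driver_api.py listing below; the driver itself and its behavior are made up):

from neutron.plugins.ml2 import driver_api as api

class LoggingMechanismDriver(api.MechanismDriver):
    """Toy driver that records ports and 'pushes' them to an imaginary backend."""

    def initialize(self):
        self.known_ports = {}                  # driver-private state

    def create_port_precommit(self, context):
        # Runs inside the plugin's database transaction: validate the request and
        # update driver-private state, but never call out to external systems here.
        port = context.current
        self.known_ports[port["id"]] = port

    def create_port_postcommit(self, context):
        # Runs after the transaction commits: this is where the change is pushed
        # to whatever applies it (an agent, a controller, a switch, ...).
        print("pushing port %s to the backend" % context.current["id"])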

// ./neutron/plugins/ml2/driver_api.py

  548 class MechanismDriver(object):

    ....

  570     def initialize(self):

Perform driver initialization

    ....

  579     def create_network_precommit(self, context):

Allocate resources for a new network

    .... 

  592     def create_network_postcommit(self, context):

Create a network

    ....

  605     def update_network_precommit(self, context):

Update resources of a network

    ....

  623     def update_network_postcommit(self, context):

Update a network

    ....

  641     def delete_network_precommit(self, context):

Delete resources for a network

    ....

  655     def delete_network_postcommit(self, context):

Delete a network

    ....

  669     def create_subnet_precommit(self, context):

Allocate resources for a new subnet

    ....

  682     def create_subnet_postcommit(self, context):

Create a subnet

    ....

  695     def update_subnet_precommit(self, context):

Update resources of a subnet

    ....

  713     def update_subnet_postcommit(self, context):

Update a subnet

    ....

  731     def delete_subnet_precommit(self, context):

Delete resources for a subnet

    ....

  745     def delete_subnet_postcommit(self, context):

Delete a subnet

    ....

  759     def create_port_precommit(self, context):

Allocate resources for a new port

    ....

  771     def create_port_postcommit(self, context):

Create a port

    ....

  783     def update_port_precommit(self, context):

Update resources of a port

    ....

  800     def update_port_postcommit(self, context):

Update a port

    ....

  818     def delete_port_precommit(self, context):

Delete resources of a port

    ....

  830     def delete_port_postcommit(self, context):

Delete a port

    ....

  844     def bind_port(self, context):

Attempt to bind a port

    ....

  882     def check_vlan_transparency(self, context):

 Check if the network supports vlan transparency

ExtensionDriver

An extension driver extends the core resources implemented by the ML2 plugin with additional attributes.
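
For example, a driver that adds one extra attribute to ports might look roughly like this (a sketch; the "color" attribute and its alias are invented, while the method names come from the listing below):

from neutron.plugins.ml2 import driver_api as api

class ColorExtensionDriver(api.ExtensionDriver):
    """Toy driver adding a hypothetical 'color' attribute to ports."""

    def initialize(self):
        self._colors = {}                      # port id -> extended attribute value

    def extension_alias(self):
        return "port-color"                    # alias of the API extension implemented here

    def process_create_port(self, plugin_context, data, result):
        # Pull the extended attribute out of the request and remember it.
        self._colors[result["id"]] = data.get("color")

    def extend_port_dict(self, session, base_model, result):
        # Report the extended attribute back when the port is shown to the user.
        result["color"] = self._colors.get(result["id"])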

// ./neutron/plugins/ml2/driver_api.py

  893 class ExtensionDriver(object):

    ....

  905     def initialize(self):

Perform driver initialization

    ....

  915     def extension_alias(self):

Supported extension alias

    ....

  923     def process_create_network(self, plugin_context, data, result):

Process extended attributes for create network

    ....

  937     def process_create_subnet(self, plugin_context, data, result):

Process extended attributes for create subnet

    ....

  951     def process_create_port(self, plugin_context, data, result):

Process extended attributes for create port

    ....

  965     def process_update_network(self, plugin_context, data, result):

Process extended attributes for update network

    ....

  979     def process_update_subnet(self, plugin_context, data, result):

Process extended attributes for update subnet

    ....

  993     def process_update_port(self, plugin_context, data, result):

Process extended attributes for update port

    ....

 1007     def extend_network_dict(self, session, base_model, result):

Add extended attributes to network dictionary

    ....

 1021     def extend_subnet_dict(self, session, base_model, result):

Add extended attributes to subnet dictionary

    ....

 1035     def extend_port_dict(self, session, base_model, result):

Add extended attributes to port dictionary

How Is the Plugin Loaded?

// ./neutron/plugins/ml2/managers.py

In the TypeManager class, each function from driver_api.py is called:

 39 class TypeManager(stevedore.named.NamedExtensionManager):

 53     self._check_tenant_network_types(cfg.CONF.ml2.tenant_network_types)
 54     self._check_external_network_type(cfg.CONF.ml2.external_network_type)

 58     network_type = ext.obj.get_type()

170    driver.obj.initialize()

204    return driver.obj.is_partial_segment(segment)

213    driver.obj.validate_provider_segment(segment)

221    return driver.obj.reserve_provider_segment(session, segment)

225    return driver.obj.allocate_tenant_segment(session)

249    driver.obj.release_segment(session, segment)

In the MechanismManager class, each function from driver_api.py is called:

284 class MechanismManager(stevedore.named.NamedExtensionManager):

318    driver.obj.initialize()

364     def create_network_precommit(self, context):

378     def create_network_postcommit(self, context):

392     def update_network_precommit(self, context):

405     def update_network_postcommit(self, context):

419     def delete_network_precommit(self, context):

490     def update_subnet_postcommit(self, context):

504     def delete_subnet_precommit(self, context):

517     def delete_subnet_postcommit(self, context):

535     def create_port_precommit(self, context):

562     def update_port_precommit(self, context):

589     def delete_port_precommit(self, context):

602     def delete_port_postcommit(self, context):

620     def bind_port(self, context):

Let's focus on the TypeManager class.
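
TypeManager subclasses stevedore's NamedExtensionManager, which loads each configured driver from an entry-point namespace and then calls the TypeDriver methods shown above. A stripped-down sketch of the idea (not the real class; the 'neutron.ml2.type_drivers' namespace and the config handling are simplified assumptions):

import stevedore.named

class MiniTypeManager(stevedore.named.NamedExtensionManager):
    """Load the configured type drivers and index them by network type."""

    def __init__(self, type_driver_names):
        # type_driver_names would come from cfg.CONF.ml2.type_drivers.
        super(MiniTypeManager, self).__init__(
            'neutron.ml2.type_drivers',    # entry-point namespace
            type_driver_names,             # e.g. ['flat', 'vlan', 'gre', 'vxlan']
            invoke_on_load=True)           # instantiate each driver class on load
        self.drivers = {}
        for ext in self:                   # each extension wraps one loaded driver
            self.drivers[ext.obj.get_type()] = ext

    def initialize(self):
        for ext in self.drivers.values():
            ext.obj.initialize()

MechanismManager and the extension manager follow the same pattern with their own entry-point namespaces.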

                                                                                                                             To be continued...

How Does a Ceph OSD Handle a Request from a Client

This is a good starting point for thoroughly understanding how Ceph works.

This page will be updated soon with a more complete explanation based on further analysis.

/*
 * Operation started.
 */
#./src/osd/OSD.cc
5676 bool OSD::dispatch_op_fast(OpRequestRef& op, OSDMapRef& osdmap)
5677 {
           ...
5700   // client ops
5701   case CEPH_MSG_OSD_OP:
5702     handle_op(op, osdmap);
           ...
/*
 * OSD started to handle request from client:
 */
7948 void OSD::handle_op(OpRequestRef& op, OSDMapRef& osdmap)
7949 {
           ...
8093   if (pg) {
8094     op->send_map_update = share_map.should_send;
8095     op->sent_epoch = m->get_map_epoch();
8096     enqueue_op(pg, op);
           ...
8160 void OSD::enqueue_op(PG *pg, OpRequestRef& op)
8161 {
           ...
8167   pg->queue_op(op);

#./src/osd/PG.cc
1835 void PG::queue_op(OpRequestRef& op)
1836 {
...
/*
 * The op is queued on the OSD's op work queue (op_wq), paired with its PG, for a worker thread to pick up.
 */
1847   osd->op_wq.queue(make_pair(PGRef(this), op));

/*
 * A worker thread dequeues the op and dispatches it to the PG.
 */
#./src/osd/OSD.cc
8303 /*
8304  * NOTE: dequeue called in worker thread, with pg lock
8305  */
8306 void OSD::dequeue_op(
8307   PGRef pg, OpRequestRef op,
8308   ThreadPool::TPHandle &handle)
8309 {
...
8343   pg->do_request(op, handle);

/*
 * A transaction will be created on the basis of the op context.
 */
#./src/osd/ReplicatedPG.cc
 1238 void ReplicatedPG::do_request(
 1239   OpRequestRef& op,
 1240   ThreadPool::TPHandle &handle)
 1241 {
  ...
 1293     do_op(op); // do it now
 1294     break;
  ...
 1361 /** do_op - do an op
 1362  * pg lock will be held (if multithreaded)
 1363  * osd_lock NOT held.
 1364  */
 1365 void ReplicatedPG::do_op(OpRequestRef& op)
 1366 {
  ...

 1697   OpContext *ctx = new OpContext(op, m->get_reqid(), m->ops, obc, this);
  ...
 1760   execute_ctx(ctx);
  ...
 2218 void ReplicatedPG::execute_ctx(OpContext *ctx)
 2219 {
  ...
 2289   int result = prepare_transaction(ctx);
  ...
 5668 int ReplicatedPG::prepare_transaction(OpContext *ctx)
 5669 {
  ...
 5680   // prepare the actual mutation
 5681   int result = do_osd_ops(ctx, ctx->ops);
  ...
 3348 int ReplicatedPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops)
 3349 {
  ...
 3463         } else if (pool.info.require_rollback()) {
 3464           ctx->pending_async_reads.push_back(
 3465             make_pair(
 3466               boost::make_tuple(op.extent.offset, op.extent.length, op.flags),
 3467               make_pair(&osd_op.outdata, new FillInExtent(&op.extent.length))));
 3468           dout(10) << " async_read noted for " << soid << dendl;
  ...
 2094   ctx->reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0, false);
  ...
 2345   // read or error?
 2346   if (ctx->op_t->empty() || result < 0) {
  ...
 2348     if (result == 0)
  ...
 2351     if (ctx->pending_async_reads.empty()) {
  ...
 2353     } else {
  ...

/*
 * Read operation starts.
 */
 2355       ctx->start_async_reads(this);
  ...
 107 // OpContext
 108 void ReplicatedPG::OpContext::start_async_reads(ReplicatedPG *pg)
 109 {
 110   inflightreads = 1;
 111   pg->pgbackend->objects_read_async(

#./src/osd/ReplicatedBackend.cc
 277 void ReplicatedBackend::objects_read_async(
 278   const hobject_t &hoid,
 279   const list<pair<boost::tuple<uint64_t, uint64_t, uint32_t>,
 280                   pair<bufferlist*, Context*> > > &to_read,
 281   Context *on_complete)
 282 {
  ...
 302       new AsyncReadCallback(r, on_complete)));
  ...
 114     new OnReadComplete(pg, this));
  ...
 93 struct OnReadComplete : public Context {
 94   ReplicatedPG *pg;
 95   ReplicatedPG::OpContext *opcontext;
 96   OnReadComplete(
 97     ReplicatedPG *pg,
 98     ReplicatedPG::OpContext *ctx) : pg(pg), opcontext(ctx) {}
 99   void finish(int r) {
 100     if (r < 0)
 101       opcontext->async_read_result = r;
 102     opcontext->finish_read(pg);
 103   }
 104   ~OnReadComplete() {}
 105 };
  ...
 117 void ReplicatedPG::OpContext::finish_read(ReplicatedPG *pg)
 118 {
 119   assert(inflightreads > 0);
 120   --inflightreads;
 121   if (async_reads_complete()) {
 122     assert(pg->in_progress_async_reads.size());
 123     assert(pg->in_progress_async_reads.front().second == this);
 124     pg->in_progress_async_reads.pop_front();
 125     pg->complete_read_ctx(async_read_result, this);
 126   }
 127 }
  ...
 5911 void ReplicatedPG::complete_read_ctx(int result, OpContext *ctx)
 5912 {
 5913   MOSDOp *m = static_cast<MOSDOp*>(ctx->op->get_req());
 5914   assert(ctx->async_reads_complete());
  ...
 5942   reply->set_result(result);
 5943   reply->add_flags(CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK);
 5944   osd->send_message_osd_client(reply, m->get_connection());
 5945   close_op_ctx(ctx, 0);
 5946 }

OpenStack Manual Installation - 4th Topic: To Build Cinder Component

Notice: This series assumes kernel 3.10.0-229.

1. What Cinder provides.

 It provides block storage which allows block devices to be exposed and connected to compute instances for expanded storage, better performance and integration with enterprise storage platforms.

 It provides persistent block level storage devices for use with the OpenStack compute instances which can be exposed to applications as well.

The block storage system manages the creation, attaching and detaching of the block devices to servers. Block storage volumes are fully integrated into the OpenStack compute and the dashboard allowing for cloud users to manage their own storage requirements.

2. The capabilities.

 The block storage system manages the creation, attaching and detaching of the block devices to servers.

 Block storage is appropriate for performance sensitive scenarios such as database storage, expandable file systems, or providing a server with access to raw block level storage.

 Snapshot management provides powerful functionality for backing up data stored on block storage volumes.

3. Architecture.

1. cinder-api

 To authenticate and route requests throughout the block storage system.

2. cinder-scheduler

 To schedule and route volume create requests to the appropriate volume service.

3. cinder-volume

 To manage block storage devices, specifically the back-end devices themselves.

Cinder deployments will make use of a messaging queue to route information between the cinder processes as well as a database to store volume state.

4. Installation.

4.1. To install package

# yum -y install openstack-cinder

4.2. To do initial setting up.

# cp /etc/cinder/cinder.conf /etc/cinder/cinder.conf.org

# cp /usr/share/cinder/cinder-dist.conf /etc/cinder/cinder.conf

# source /root/keystonerc_admin

# openstack-db --init --service cinder --password cinder --rootpw cinder

# keystone user-create --name cinder --pass cinder

# keystone user-role-add --user cinder --role admin --tenant services

# keystone service-create --name=cinder --type=volume --description="OpenStack Block Storage Service"

# export CINDER_SERVICE_ID=`keystone service-list | grep cinder | awk '{print $2}'`

 /* Ip address should be cinder's */

# keystone endpoint-create \

 --service-id ${CINDER_SERVICE_ID} \

--publicurl 'http://<ip address>:8776/v1/%(tenant_id)s' \

--adminurl 'http://<ip address>:8776/v1/%(tenant_id)s' \

--internalurl 'http://<ip address>:8776/v1/%(tenant_id)s'

# keystone service-create --name=cinderv2 --type=volumev2 --description="Cinder Volume  Service V2"

# export CINDER_SERVICE_ID=`keystone service-list | grep cinderv2 | awk '{print $2}'`

/* Ip address should be cinder's */

# keystone endpoint-create \

 --service-id ${CINDER_SERVICE_ID} \

--publicurl 'http://<ip address>:8776/v2/%(tenant_id)s' \

--adminurl 'http://<ip address>:8776/v2/%(tenant_id)s' \

--internalurl 'http://<ip address>:8776/v2/%(tenant_id)s'

4.3. To configure Cinder

# crudini --set /etc/cinder/cinder.conf keystone_authtoken admin_tenant_name services

# crudini --set /etc/cinder/cinder.conf keystone_authtoken admin_user cinder

# crudini --set /etc/cinder/cinder.conf keystone_authtoken admin_password cinder

# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_userid rabbitadmin

# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_password rabbitpass

/* Ip address should be message broker's */

# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_host <ip address>

# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_use_ssl True

# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_port 5671

4.4. To reflect configurations

# systemctl enable openstack-cinder-api

# systemctl enable openstack-cinder-scheduler

# systemctl enable openstack-cinder-volume

# openstack-service start cinder

# openstack-status