Creating 8086 binary larger than 64 KiB using NASM or any other assembler

Question

For fun I'm developing an IBM PC emulator. I would like to test all instructions to see if I implemented them correctly. For most instructions this is straightforward, but for far CALL/JMP it is problematic.

In my test-framework I create .bin-files that I load directly in emulated RAM (flat binaries) and then I execute them. Now as86 and NASM allow you to create .bin-files, but it seems only of 64 KiB and smaller?

My question now is: how can I create flat binary files for 16-bit x86 that are larger than 64 KiB and contain far CALL/JMP?

Example code:

    org $800

    xor ax,ax
    mov si,ax

    mov ss,ax
    mov ax,800h
    mov sp,ax

; JMP NEAR
test_001:
    mov si,0001h
    jmp test_001a_ok
    hlt
test_001a_ok:

; CALL NEAR
test_002:
    mov si,0002h
    call test_002_sub
    jmp test_002_ok
    hlt
test_002_sub:
    ret
    hlt
test_002_ok:

; JMP FAR
test_003:
    mov si,0003h
    jmp far test_003_ok
    hlt
    resb 70000
    hlt
    hlt
    hlt
test_003_ok_seg:
test_003_ok:

; CALL FAR
test_004:
    mov si,0004h
    call far test_004_sub
    jmp far test_004_ok
    hlt
    resb 70000
test_004_sub_seg:
test_004_sub:
    retf
    hlt
    resb 70000
    hlt
    hlt
    hlt
test_004_ok:

finish:
    mov ax,0xa5ee
    mov si,ax
    hlt

errors returned by nasm:

$ nasm -f bin jmp_call_ret.asm 
jmp_call_ret.asm:31: error: binary output format does not support segment base references
jmp_call_ret.asm:33: warning: uninitialized space declared in .text section: zeroing [-w+zeroing]
jmp_call_ret.asm:43: error: binary output format does not support segment base references
jmp_call_ret.asm:44: error: binary output format does not support segment base references
jmp_call_ret.asm:46: warning: uninitialized space declared in .text section: zeroing [-w+zeroing]
jmp_call_ret.asm:51: warning: uninitialized space declared in .text section: zeroing [-w+zeroing]

I'm using NASM version 2.16.01 from Ubuntu.

If the question is about how to make far calls in NASM, it's about NASM, or generating large flat binaries, not retrocomputing. You don't need files larger than 64KB to have far calls. — Justme, Commented Jul 2, 2023 at 9:12
You can create an MZ .EXE file using either -f obj to NASM and then a linker, or create the header and relocations manually with NASM's -f bin. It is possible to set up everything yourself, including segment relocations, and still create a flat format (no header) .COM file. As to the actual file size, flat format .COM files larger than 64 KiB are likely not supported by the DOS you use. — ecm, Commented Jul 2, 2023 at 10:31
@ecm OP is not using DOS, flat binary is directly loaded into memory by the emulator. EXE files could be used but loading them outside DOS needs to load it properly by applying the relocation table. — Justme, Commented Jul 2, 2023 at 11:03
Standard, selfish comment: if you’re building a test suite, please strongly consider releasing it as a test suite. They’re perpetually thin on the ground. — Tommy, Commented Jul 2, 2023 at 11:33
times 1024*1024 db 0x90 assembles just fine into a 1MiB flat binary with 2.16.01, presumably also with earlier NASM versions, so there isn't a hard 64K limit on output file size. Relative near jumps can't go more than +- 32KiB, so you might run into something that's not encodeable depending on your source code. — Peter Cordes, Commented Jul 3, 2023 at 13:51

Toby Speight · Accepted Answer · 2023-07-02 13:56:15Z

[As @Justme mentions, there is no need to make files larger than 64 KiB to use far calls. He's also right that this question is more about tool usage, or rather about picking the right format: .EXE vs. .COM. It may still be seen as borderline on topic, because it touches on some basics that are rarely mentioned today.]

What you call a 'flat binary' is built to the specs of what is otherwise called a .COM program, that is, a program occupying a single combined segment for code, data and stack. The important point is that in such an environment no segment operations are needed, because the 8086 operates much like a classic 8- or 16-bit CPU with a flat 16-bit address space (*1). All references within such single-segment code are 16-bit constants with no relation to the 20-bit address space. They are essentially addresses relative to the segment base (*2), making the code freely movable, with no relocation needed, no matter (*3) at what 20-bit address the code is loaded.

The situation is different when using far addressing. Far addressing includes a dedicated reference to the 20-bit address space - the notorious Seg*10h+Ofs issue (*4). Far addresses are not location independent: they contain a segment part, which holds an absolute address (*4). When such code is loaded, those parts need to be adjusted (aka relocated) according to the load address. Because of this, the assembler output of a program containing multiple segments can never be a 'flat binary'. It will always be a relocatable binary, containing information about where a loader has to adjust (segment) addresses. Under DOS that's known as an .EXE file.

There is one exception, and that's ROM code. While ROM code rarely contains multiple segments, some linkers can produce 'flat' output when load addresses are given. I'm not sure whether the NASM tool chain supports this.

What's left is to manually calculate all needed segment addresses and have them assembled accordingly. For a "how to", you may want to consult your tools' manuals.

At the most simple level, manual generation might look like this:

(Assuming it is loaded at a known absolute address)

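         ORG      0800h
;        CALL     FAR PTR FARCALL
         DB       09Ah                ; OPC for CALL FAR
         DW       OFFSET FARCALL-100h ; Offset in Segment *1 (100h bytes lower ...)
         DW       0+10h               ; Segment           *2 (... because segment is 10h higher)

;        JMP      FAR PTR FARJMP
         DB       0EAh                ; OPC for JUMP FAR
         DW       OFFSET FARJMP-200h  ; Offset in Segment (200h bytes lower ...)
         DW       0+20h               ; Segment           (... because segment is 20h higher)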

         HLT

FARCALL:
         RETF

FARJMP:
         HLT

FAR CALL/JMP are straightforward on x86; there is only a single opcode each. The subtraction at *1 and addition at *2 are meant to force CS to be changed, simplifying debugging :)
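The same manual calculation can also be left to NASM by using its multi-section -f bin output. The following is only a minimal sketch under stated assumptions: the load address (physical 00800h, entered at 0080h:0000h), the section name second and all labels are illustrative and not taken from the question. Each section gets its own offsets starting at 0, and the segment values are plain numbers computed from label differences, so no "segment base reference" is needed.

```nasm
; Sketch only. ASSUMPTIONS (not from the question): the emulator copies this
; file to physical address 00800h and starts at CS:IP = 0080h:0000h.
; Build with: nasm -f bin far.asm -o far.bin
        cpu 8086
        bits 16
        org 0

LOADSEG   equ 0080h                     ; assumed load segment
FIRSTLEN  equ first_end - first_start   ; bytes in the first section
SECONDSEG equ LOADSEG + (FIRSTLEN >> 4) ; segment where "second" begins

section .text                           ; first chunk, offsets start at 0
first_start:
        xor ax, ax
        mov ss, ax
        mov sp, 0800h                   ; stack just below the image
        mov si, 0003h
        jmp SECONDSEG:far_target        ; far JMP: EAh, offset word, segment word
        hlt                             ; only reached if the far JMP fails
back_in_first:
        mov si, 0A5EEh                  ; success marker, as in the question
        hlt
        times 70000-($-$$) db 0F4h      ; pad with HLT; 70000 is a multiple of 16
first_end:

section second vstart=0 align=16 follows=.text
far_target:
        mov si, 0004h
        call SECONDSEG:far_sub          ; far CALL: 9Ah, offset word, segment word
        jmp LOADSEG:back_in_first       ; far JMP back into the first chunk
far_sub:
        retf
```

The resulting file is a little over 70000 bytes. If the image is loaded somewhere else, only LOADSEG has to change.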

*1 - The main reason for the 8086 segmented model was that its 20-bit address space could be used by programs made for a 16-bit address space - after all, the 8086 was meant as an interim offering to keep existing 8085 customers until the great all-embracing i432 was ready.

*2 - This is what makes base or segment register addressing so great for system design.

*3 - Well, in case of the 8086 in steps of 10h, marking one difference between segment and base registers.

*4 - In real mode, which is what we talk about.

"I'm not sure whether the NASM tool chain supports this." -- Yes, NASM's binary output is intended to be useful for producing ROM images without the need for an external link step (or linkers supporting either OMF, COFF, or other common object formats could also be used) — occipita, Commented Jul 10, 2023 at 19:28
@occipita The point is not about producing ROM code, as I'm quite sure it can do so, but if it can do code with multiple segment base addresses without a relocation/fixup during load — Raffzahn, Commented Jul 10, 2023 at 19:32

ecm · Accepted Answer · 2023-07-02 16:55:26Z

I wanted to make a few examples of what I mentioned in a comment. That is, to create an MZ .EXE header or to manually implement relocations in a flat format binary, all using only NASM's multi-section -f bin format.

To save on work, I actually created a single example with multiple build options to select what layout to use exactly.

Some macros used in this example are from my 8086 macro collection found at https://hg.pushbx.org/ecm/lmacros/ and all other files are in https://pushbx.org/ecm/test/20230702/

This is the main source file, test.asm:


; Public Domain

%include "lmacros3.mac"

numdef DOSEXIT, 0
numdef DOSENTRY, 0
numdef LARGEFILL, 0
numdef RELOC, 1
numdef BASESEGMENT, 0
numdef RELOCEXE, 0


        cpu 8086
%if _RELOCEXE
%assign ORIGIN 0
        org ORIGIN
        addsection HEADER, start=0
header_start:
        db "MZ"         ; exeSignature
FILESIZE equ fromparas(header_size_p + BEHINDSEGMENT)
        dw FILESIZE % 512       ; exeExtraBytes
        dw (FILESIZE + 511) / 512       ; exePages
        dw relocationtable.amount       ; exeRelocItems
        dw header_size_p        ; exeHeaderSize
        dw paras(512)           ; exeMinAlloc
        dw paras(512)           ; exeMaxAlloc
        dw +BEHINDSEGMENT       ; exeInitSS
        dw 512                  ; exeInitSP
        dw 0            ; exeChecksum
        dw 0, +0        ; exeInitCSIP
        dw relocationtable      ; exeRelocTable

        addsection FIRST, vstart=0 align=16 follows=HEADER
%elif _DOSENTRY
%assign ORIGIN 256
        org ORIGIN
        addsection FIRST, start=ORIGIN
%else
%assign ORIGIN 0
        org ORIGIN
        addsection FIRST, start=ORIGIN
%endif
first_start:
        addsection SECOND, vstart=0 align=16 follows=FIRST
second_start:
        addsection THIRD, vstart=0 align=16 follows=SECOND
third_start:

first_size equ first_end - first_start
endarea first, 1
second_size equ second_end - second_start
endarea second, 1
third_size equ third_end - third_start
endarea third, 1
%if _RELOCEXE
header_size equ header_end - header_start
endarea header, 1
%endif

FIRSTSEGMENT equ _BASESEGMENT + 0
SECONDSEGMENT equ _BASESEGMENT + paras(ORIGIN) + first_size_p
THIRDSEGMENT equ _BASESEGMENT + paras(ORIGIN) + first_size_p + second_size_p
BEHINDSEGMENT equ _BASESEGMENT + paras(ORIGIN) + first_size_p + second_size_p + third_size_p

%define RELOCATIONFROMFIRST ""
%define RELOCATIONFROMSECOND ""
%define RELOCATIONFROMTHIRD ""
        %imacro relocation 0-1.nolist -2
%%reloc equ $ + %1
%ifidn _CURRENT_SECTION, FIRST
 %xdefine RELOCATIONFROMFIRST RELOCATIONFROMFIRST, %%reloc, FIRSTSEGMENT
%elifidn _CURRENT_SECTION, SECOND
 %xdefine RELOCATIONFROMSECOND RELOCATIONFROMSECOND, %%reloc, SECONDSEGMENT
%elifidn _CURRENT_SECTION, THIRD
 %xdefine RELOCATIONFROMTHIRD RELOCATIONFROMTHIRD, %%reloc, THIRDSEGMENT
%else
 %error Unknown section for relocation
%endif
        %endmacro

        usesection FIRST
start:
        mov dx, cs
        mov ds, dx
%if !_RELOCEXE && _RELOC
        mov si, relocationtable
        mov cx, relocationtable.amount
        jcxz .noreloc
@@:
        lodsw
        xchg bx, ax
        lodsw
        add ax, dx
        mov es, ax
        add word [es:bx], dx
        loop @B
.noreloc:
%endif

displayfirst:
        mov si, firstmsg
        mov bx, 7
        mov ah, 0Eh
        db __TEST_IMM16         ; (skip int 10h)
@@:
        int 10h
        lodsb
        test al, al
        jnz @B

        call SECONDSEGMENT:secondentry
relocation

exit:
%if _DOSEXIT
        mov ax, 4C00h
        int 21h
%else
        xor ax, ax
        int 16h
        int 19h
%endif  


bouncetothird:
        jmp THIRDSEGMENT:bounced
relocation


firstmsg:       asciz "Hello from first!",13,10


        usesection SECOND

secondmsg:      asciz "Hello from second!",13,10

secondentry:
displaysecond:
        mov si, secondmsg
        mov bx, 7
        mov ah, 0Eh
        db __TEST_IMM16         ; (skip int 10h)
@@:
        int 10h
        cs lodsb
        test al, al
        jnz @B

        jmp THIRDSEGMENT:thirdentry
relocation


        usesection THIRD

        align 2, db 0
indirect_to_bounce:
                dw bouncetothird
relocation 0
                dw FIRSTSEGMENT

thirdmsg:       asciz "Hello from third!",13,10

thirdentry:
        jmp far [cs:indirect_to_bounce]

bounced:
displaythird:
        mov si, thirdmsg
        mov bx, 7
        mov ah, 0Eh
        db __TEST_IMM16         ; (skip int 10h)
@@:
        int 10h
        cs lodsb
        test al, al
        jnz @B

        retf


%ifn _RELOCEXE
        usesection FIRST
%else
        usesection HEADER
%endif
%if _RELOCEXE || _RELOC
        align 4, db 0
relocationtable:
.:
        dw RELOCATIONFROMFIRST
        dw RELOCATIONFROMSECOND
        dw RELOCATIONFROMTHIRD
.end:
.amount: equ (.end - .) / 4
%endif
%if _RELOCEXE
        align 16
header_end:
%endif

%if _LARGEFILL
        usesection SECOND
        _fill fromkib(64), 0CCh, second_start
%endif

        usesection FIRST
        align 16
first_end:
        usesection SECOND
        align 16
second_end:
        usesection THIRD
        align 16
third_end:

How does it work?

NASM allows creating far jumps and calls with a hardcoded immediate address, in the format jmp SEGMENT:OFFSET (no size keyword, two colon-separated immediate numbers).
The relocation macro defaults to creating an equate for $ - 2 (current output address minus 2), which points into the segment immediate of a preceding instruction. This reference is used to make Self-Modifying Code.
The macro can be used as relocation 0 to create a relocation table entry for the next word.
The relocation entries are emitted into a table. For simplicity this table is here always in the same format as used by the MZ .EXE header.
The segment values (FIRSTSEGMENT and so on) are defined using a base segment plus a displacement, in paragraphs, to the particular segment in our program image. The displacements are calculated from the label deltas that give the length of each segment in the program image.
The base segment can be nonzero to support loading at a fixed address.
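For readers who do not want to pull in the macro collection, the run-time relocation idea can be sketched without any macros. This is a reduced illustration, not the build shown above: it assumes the loader enters at CS:IP = load segment:0000h with a usable stack, and every name in it (SECONDREL, reloc_table, after_call and so on) is made up for the example. The table uses the same offset-then-segment entry layout as the MZ .EXE header.

```nasm
; Sketch only. ASSUMPTION: the image is loaded at any paragraph boundary and
; entered at CS:IP = <load segment>:0000h with SS:SP already valid.
        cpu 8086
        bits 16
        org 0

FIRSTREL  equ 0                                 ; image-relative segment of .text
SECONDREL equ (first_end - first_start) >> 4    ; image-relative segment of "second"

section .text
first_start:
        cld
        mov dx, cs                      ; dx = actual load segment
        mov ds, dx
        mov si, reloc_table
        mov cx, RELOC_COUNT
        jcxz run
fixup:
        lodsw                           ; offset of the word to patch
        xchg bx, ax
        lodsw                           ; image-relative segment containing it
        add ax, dx
        mov es, ax
        add [es:bx], dx                 ; relative segment becomes a real segment
        loop fixup
run:
        call SECONDREL:second_entry     ; segment word was patched by the loop
after_call:
        hlt

        align 4, db 0
reloc_table:
        dw after_call - 2, FIRSTREL     ; entry: offset, then segment
reloc_end:
RELOC_COUNT equ (reloc_end - reloc_table) / 4

        align 16, db 0
first_end:

section second vstart=0 align=16 follows=.text
second_entry:
        retf
```

Because the segment word of the far call sits in the last two bytes of the instruction, its address is simply the label after the instruction minus 2, which is what the default of the relocation macro expresses.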

Here are the commands I used to create three different builds of this program:

nasm -I ~/proj/lmacros/ test.asm -o test.com -l testcom.lst -D_DOSENTRY -D_DOSEXIT && dosemu -K "$PWD" -E "ldebug test.com" -dumb -td -kt

Build a flat format .COM file with DOS entry (origin = 256) and DOS termination call. (All I/O is done using the ROM-BIOS interfaces, unconditionally.)

nasm -I ~/proj/lmacros/ test.asm -o test.exe -l testexe.lst -D_DOSENTRY -D_DOSEXIT -D_RELOCEXE -D_LARGEFILL && dosemu -K "$PWD" -E "ldebug test.exe" -dumb -td -kt

Build an MZ .EXE file placing and using the relocation table within the MZ .EXE header. Still use DOS termination call. For fun, fill the SECOND segment to 64 KiB. This requires an MZ .EXE file under DOS, or alternatively what I call a .BIG file. (.BIG files are flat format files similar to .COM files but may exceed 64 KiB and the initial stack is set up in another segment. They're used internally by my debugger's build process.)

nasm -I ~/proj/lmacros/ test.asm -o test.bin -l testbin.lst -D_DOSENTRY=0 -D_RELOC=0 -D_BASESEGMENT=3000h -D_DOSEXIT=0 -D_RELOCEXE=0 -D_LARGEFILL && dosemu -K "$PWD" -E "ldebug /crcsip=3000_0000;lcs:ip /f test.bin" -dumb -td -kt

Build a flat format binary file, no DOS entry, use ROM-BIOS termination calls, and do not relocate at run time. The segment base is instead fixed to a particular address. The debugger command loads the file to that address. (Must be free!) For fun, do also fill the SECOND segment to 64 KiB.

As an addition, I tested loading the test.bin file as a kernel from my bootable debugger. These are the commands I used:

nasm -l boot12.lst ~/proj/ldosboot/boot.asm -I ~/proj/lmacros/ -D_LOAD_NAME="'LDEBUG'" -o boot12.bin; nasm ~/proj/bootimg/bootimg.asm -I ~/proj/lmacros/ -o disk12.img -D_BOOTPATCHFILE=boot12.bin -D_PAYLOADFILE=../../../proj/ldebug/bin/ldebug.com,ldebug.sld,test.bin

qemu-system-i386 -fda disk12.img -boot order=a -display curses

This is the ldebug.sld file:

:bootstartup
boot protocol freedos segment=3000 entry=0:0 test.bin

This boot command modifies the FreeDOS load protocol (which says the whole file must be loaded) to load at the same segment we specified as base address, and sets the entry parameter as well for good measure. The filename points to our test binary.

The bootimg and ldosboot repos are found at https://hg.pushbx.org/ecm/ as well. The debugger is hosted at https://pushbx.org/ecm/web/#projects-ldebug

Nice write up. Like it. You're aware that the question is about testing a bare bone emulator - no OS, no loader? No need for complex constructs just needing more code to watch (and test) before reaching the stuff it intends to test? — Raffzahn, Commented Jul 2, 2023 at 14:59
@Raffzahn Yes, I am aware. That is why I added the options to use a fixed segment base and no run time relocation. A flat format .BIN file like that should be just what you want for testing an emulator without a DOS. (In the example I used ROM-BIOS interrupt calls but if you don't have even those yet you're free to replace those of course.) — ecm, Commented Jul 2, 2023 at 15:01

Justme · Accepted Answer · 2023-07-02 11:27:52Z

Making far calls/jumps to far absolute addresses or via pointer to far absolute addresses is not that hard if you know what the address will be beforehand or you can adjust the addresses based on where the program is currently loaded.

You are loading the flat binaries to a certain physical memory address anyway, which can be seen as some segment with zero offset.

You have many options.

You could make two standard flat binaries, e.g. 64 KiB each if you like, and copy/merge them into a single 128 KiB file which you load. Or load two standard binaries to known addresses if your emulator supports it. As long as the two different "programs" know where they are in memory, you can freely make far calls between them and far returns.

You can manually do that in a single NASM program as well. You know that if this program is at any CS base segment, then the second part of the program will be at some higher segment base, which you can calculate.
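As a rough illustration of the single-program variant, the second part's segment can be computed from CS at run time and used through a far pointer in memory. This is a sketch under assumptions: the image is entered at CS:IP = load segment:0000h with a stack already set up, and the names PART2_PARAS, farptr, part1 and part2 are hypothetical.

```nasm
; Sketch only. ASSUMPTION: entered at CS:IP = <load segment>:0000h with a
; valid stack. Works at any load segment because nothing is hardcoded.
        cpu 8086
        bits 16
        org 0

PART2_PARAS equ 1000h                   ; part 2 begins 64 KiB (1000h paragraphs) higher

part1:
        mov ax, cs
        mov ds, ax                      ; DS = CS so [farptr] is addressable
        add ax, PART2_PARAS             ; run-time segment of part 2
        mov [farptr+2], ax              ; fill in the segment half of the pointer
        call far [farptr]               ; indirect far CALL through memory
        hlt
farptr: dw 0                            ; offset of the entry point inside part 2
        dw 0                            ; segment, written at run time above
        times 10000h-($-$$) db 0F4h     ; pad part 1 to exactly 64 KiB with HLT
part2:                                  ; image offset 10000h = (CS+1000h):0000h
        retf
```

This exercises the indirect form of far CALL. The direct form with an immediate segment needs either a known load address or a patch at run time.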

supercat · Accepted Answer · 2023-07-02 16:00:13Z

If one can divide the program into a number of sections of 64K or less, and can assign global sequential indices to every cross-section entry point, one could produce a table in the data segment with a 4-byte entry for each. If each section started with the pattern:

    dw offsetOfFunction
    dw offset entryTable + (entryIndex*4)

followed by a zero word, and one had a zero-terminated table listing the full addresses of all such tables, one could start the program by pushing the segment and offset of that table and invoking something like:

    push bp
    mov bp,sp      ; Frame pointer, so [bp+4]/[bp+6] reach the pushed pointer
    mov si,[bp+4]  ; Offset part of pushed pointer
    jmp nextSection
processEntry:
    mov di,es:[bx+2]
    mov [di],cx
    mov [di+2],es
    add bx,4
getEntry:
    mov cx,es:[bx]
    or  cx,cx
    jnz processEntry
nextSection:
    mov es,[bp+6]  ; Segment part of pushed pointer
    les bx,[es:si]
    add si,4
    mov ax,es ; Test if seg is zero
    or  ax,ax
    jnz getEntry
    pop bp
    ret

Once one did that, a cross-module call to a function with a particular index could be coded as:

    call far [offset entryTable + (entryIndex*4)]

This would be slower than using a normal far call instruction, but more compact. Any function invoked via such means would need to expect far-call invocation, but intra-segment calls to such functions can be performed via:

    push cs
    call near sameSegmentFunction

without need for any segment fix-ups anywhere in the code (which can all be read-only). Note that this approach will work even if code doesn't know what segments it will occupy in address space until it's loaded, if the program can construct a master list of entry-point tables on the stack and push the address of that list.

or cx,cx is not a useful idiom, please don't teach bad habits. Use test cx,cx to set FLAGS according to a register value; it's at least as efficient on all CPUs (except some corner cases on P6-family with register-read stalls which won't apply here since you're writing the register right before testing), and more efficient in many cases (macro-fusion with jcc). See Test whether a register is zero with CMP reg,0 vs OR reg,reg?. On actual 8086 itself, they run identically, zero benefit to the old or reg,reg idiom that probably comes from 8080.
– Peter Cordes
Commented Jul 3, 2023 at 13:57
@PeterCordes: No objection to favoring TEST as an "at least as good in essentially all cases" alternative to OR; the "OR" idiom is one that works--and is named consistently--on almost all kinds of CPUs. My main point was to illustrate the concept of building a vector table in RAM. Although intra-segment calls to a "far" function will be a byte bigger than calls to a "near" function, far calls performed through the vector table will IIRC be a byte smaller than direct "far call" instructions would be, and will work in scenarios where code might be put in ROM without knowing...
– supercat
Commented Jul 3, 2023 at 15:37
...where it will be mapped (e.g. if one had a system with multiple slots for 64KB ROM cartridges, and one needed to be able to insert ROMs for any combination of applications that would fit, along with a startup/program-select ROM that would identify and allow users to select and run any installed program).
– supercat
Commented Jul 3, 2023 at 15:39
I wasn't commenting about your main point at all, just that one detail. or reg,reg is a worse idiom that a new generation of coders shouldn't learn by example.
– Peter Cordes
Commented Jul 3, 2023 at 16:05
@PeterCordes: Most of my 8086 coding was in an era where the instructions were equivalent, and it's hard to teach an old dog new tricks, but I'll try to keep the TEST instruction in mind for future x86 examples. BTW, I think it's a shame more processors don't have both "load while setting flags" and "load without setting flags" instructions, since both kinds of instructions are very useful.
– supercat
Commented Jul 3, 2023 at 16:13

Davislor · Accepted Answer · 2023-07-04 03:29:39Z

Update

Section 8.1 of the NASM manual, and particularly 8.1.3, describe how to declare multiple sections of a NASM .bin file. You then use the seg operator to obtain the segment value of each section.

So, for example, if your two modules are init_code and main_code, you would define these as two different SECTION declarations and declare all your routines within one or the other.

The manual gives the following three different ways to call code defined in a different section:

call    (seg procedure):procedure 
call    weird_seg:(procedure wrt weird_seg)
call far procedure

Where the wrt syntax calls the procedure relative to a base segment other than its default. So, if you had three sections named start, middle and end, where start and middle both fit in a segment, and middle and end both fit into an overlapping segment, you might begin with CS set to start, allowing you to make near calls to any function in start and to functions in middle wrt start. Later, you would make a far call to change CS to middle, allowing you to make near calls to functions in middle and end wrt middle.

In your case you probably want to declare your code in different progbits sections with different names, making far calls between modules and near calls within them.

NASM is capable of assembling .bin files larger than 64K, with multiple SECTION directives. You might also load two .bin files at different absolute addresses and have both source files declare the other’s absolute base address as a constant. You might also try setting up each module’s interface as an interrupt handler.
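The interrupt-handler idea could look roughly like the sketch below. It is only an illustration under assumptions: the vector number 80h is arbitrary, the image is assumed to be entered at CS:IP = load segment:0000h with a valid stack, and the image must not overlap the interrupt vector table at 0000h:0000h. The names VECTOR, HANDLER_PARAS and handler are hypothetical.

```nasm
; Sketch only. ASSUMPTIONS: entered at CS:IP = <load segment>:0000h with a
; valid stack; vector 80h is free; the image does not cover 0000h:0200h.
        cpu 8086
        bits 16
        org 0

VECTOR        equ 80h                   ; hypothetical interrupt number
HANDLER_PARAS equ 1000h                 ; handler lives 64 KiB above CS

start:
        xor ax, ax
        mov es, ax                      ; ES = 0000h, interrupt vector table
        mov word [es:VECTOR*4], 0       ; handler offset within its segment
        mov ax, cs
        add ax, HANDLER_PARAS
        mov [es:VECTOR*4+2], ax         ; handler segment
        int VECTOR                      ; pushes FLAGS, CS, IP and loads CS:IP from the table
        hlt
        times 10000h-($-$$) db 0F4h     ; pad to exactly 64 KiB with HLT
handler:                                ; image offset 10000h = (CS+1000h):0000h
        iret
```

Note that this exercises INT and IRET rather than far CALL and RETF, so it complements the far-transfer tests instead of replacing them.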

(Incorrect paragraph removed. While you can set a 16-bit register to seg foo, you can only call far to an immediate operand or an indirect address in memory.)

Original Answer

You want to create an .exe file in MZ format, using either the Medium memory model (if all the data can fit in a single segment) or Large (if you also could have more than 64K of data and stack).

The most common way of doing this is to assemble two object files in .obj (OMF) format that both have their own code segments, and link them together. In a real program, you’d give each code segment its own name, such as init_code, and split the routines into many source files that each specify which public code segment they belong to. Then, functions within the same module can make near calls and jumps to each other, and only calls across a module boundary would be far. NASM alternatively has a group directive to pack segments together (but if you try this with code and data as the manual suggests, note that modern OSes might not let the same page of memory be both writable and executable).

Not really what OP needs, because there is no DOS to load and relocate EXE files. — Justme, Commented Jul 3, 2023 at 21:01
call es:foo isn't encodeable. Far jumps / calls are either call far ptr16:16 absolute direct (with 32 bits of immediate seg:off), or call far [m16:16] absolute indirect. There's no form that sets CS from a different segment register and takes an offset as either immediate or indirect. felixcloutier.com/x86/call — Peter Cordes, Commented Jul 4, 2023 at 2:59
Re: outputting a large .bin: yes, this is trivially testable with times 1024*1024 db 0x90 which assembles just fine into a 1MiB flat binary with 2.16.01, presumably also with earlier NASM versions. Source code filled with relative references of course can fail to assemble when they're farther than +-32KiB, but that's not directly a file-size problem. — Peter Cordes, Commented Jul 4, 2023 at 3:03
@PeterCordes I did check that and thought the segment override prefix worked, but I must have misread. Will remove. — Davislor, Commented Jul 4, 2023 at 3:17
