Saturday, August 20, 2005

std::ios::binary?

§27.4.2.1.4 Type ios_base::openmode
Says this about the std::ios::binary openmode flag:
*binary*: perform input and output in binary mode (as opposed to text mode)

And that is basically _all_ it says about it. What the heck does the binary
flag mean?
- Steven T. Hatton

28 Comments:

At 10:34 PM, August 20, 2005, Anonymous Anonymous said...

Basically it means that the i/o functions should not do any translation
to/from external representation of the data: they should behave as they
really should have behaved by default, but unfortunately do not.

For example, a Windows text file has each line terminated by carriage return
+ linefeed, as opposed to just linefeed in Unix, and in text mode '\n' is
translated accordingly for output, and these sequences are translated to
'\n' on input -- in practice '\r' is carriage return and '\n' is linefeed.
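The output half of that translation can be observed with a minimal sketch like the following. The helper name `stored_size` and the file name in the usage note are assumptions for illustration only; the result depends on the platform's text-file convention.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Write `text` to `path` in text or binary mode, then return the number of
// bytes actually stored, measured by re-reading the file in binary mode.
std::size_t stored_size(const std::string& path, const std::string& text,
                        bool binary)
{
    std::ios_base::openmode mode = std::ios_base::out | std::ios_base::trunc;
    if (binary)
        mode |= std::ios_base::binary;
    {
        std::ofstream out(path.c_str(), mode);
        out << text;
    }   // leaving the scope closes (and flushes) the file

    std::ifstream in(path.c_str(), std::ios_base::in | std::ios_base::binary);
    return static_cast<std::size_t>(std::distance(
        std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>()));
}
```

Calling `stored_size("demo.tmp", "a\nb\n", true)` should report 4 bytes everywhere. With `false`, a Unix-like system should also report 4, while a Windows runtime would be expected to report 6, because each '\n' is written as carriage return + linefeed.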

Also, for example, in a Windows text file Ctrl Z denotes end-of-file.
That's useful for including a short descriptive text snippet at the start of a binary file, but I suspect it was originally a misunderstanding of the
Unix shell command to send the current line immediately (which for an empty line means zero bytes, which in Unix indicates end-of-file). So in text
mode in Windows, a Ctrl Z might be translated to end-of-file on input.

Interestingly the C++ iostream utilities are so extremely badly designed that you can't make a simple copy-standard-input-to-standard-output-exactly
program using only the standard C++ library, on systems where this is
meaningful but text translation occurs in text mode.

Of course, the religious C++'ers maintain that that shouldn't be possible anyway because you can't do it on, say, a mobile phone, where C++ could be used for something, but then they forget that i/o is there for a reason.

 
At 10:35 PM, August 20, 2005, Anonymous Anonymous said...

On some platforms "\n" translates to "\r\n" when written (and the
reverse when read) to a file while on others it does not. binary mode
will make no such translation.

 
At 10:36 PM, August 20, 2005, Anonymous Anonymous said...

Is that per the Standard, or "implementation imposed"? I know that what you describe is what I typically think of when I think of binary I/O. For example, in ancient times we used to have to explicitly tell ftp to use
binary mode transfer.

 
At 10:41 PM, August 20, 2005, Anonymous Anonymous said...

Yes. I am aware of DOS spiders. ^M is what it looks like in Emacs, and it
is never a nice thing to have in a tarball. dos2unix is a great tool.

> Also, for example, in a Windows text file Ctrl Z denotes end-of-file.
> That's useful for including a short descriptive text snippet at the start
> of a binary file, but I suspect it was originally a misunderstanding of
> the Unix shell command to send the current line immediately (which for an
> empty
> line means zero bytes, which in Unix indicates end-of-file). So in text
> mode in Windows, a Ctrl Z might be translated to end-of-file on input.

> Interestingly the C++ iostream utilities are so extremely badly designed
> that you can't make a simple
> copy-standard-input-to-standard-output-exactly program using only the
> standard C++ library, on systems where this is meaningful but text
> translation occurs in text mode.

I'm now wondering if I really understood. If I read "characters" from a
std::istream, it goes into an error state when it hits an EOF. That's why
stuff like this works (when it works)

std::vector<float_pair> positions
(istream_iterator<float_pair> (file),
(istream_iterator<float_pair> ()));

I don't believe binary files are terminated by a special character, but I
could be wrong (again).

Take this example:
####################################
Thu Jul 28 06:03:39:> cat main.cpp
#include <fstream>
#include <vector>
#include <iterator>
#include <iostream>
#include <sstream>
#include <algorithm>

using namespace std;

int main(int argc, char* argv[]){
  if(argc<2) { cerr << "give me a file name" << endl; return -1; }

  ifstream file (argv[1],ios::binary);

  if(!file) { cerr << "couldn't open the file:"<< argv[1] << endl; return -1; }

  std::vector<unsigned char> data;

  // istream_iterator extracts with operator>>, i.e. formatted input
  copy(istream_iterator<unsigned char>(file)
     , istream_iterator<unsigned char>()
     , back_inserter(data));

  cout<<"read "<< data.size()<<"bytes of data"<< endl;

  // clear eofbit/failbit, rewind, and copy the raw stream buffer instead
  file.clear();
  file.seekg(0,ios::beg);
  ostringstream oss;
  oss << file.rdbuf();
  cout<<"read "<< oss.str().size()<<"bytes of data"<< endl;

}

Thu Jul 28 06:10:21:> g++ -obinio main.cpp

Thu Jul 28 06:13:04:> ./binio binio
read 22075bytes of data
read 22470bytes of data

##########################

Notice the second output is larger than the first.
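The most likely explanation for the difference is that istream_iterator reads through operator>>, which skips whitespace bytes by default, and ios::binary does not change that. A minimal sketch of the effect, using an in-memory stream instead of a file (the function names `count_formatted` and `count_raw` are made up for this illustration):

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Count elements the way the program above does: istream_iterator uses
// operator>>, which discards whitespace bytes unless skipws is turned off.
std::size_t count_formatted(std::istream& in)
{
    return static_cast<std::size_t>(std::distance(
        std::istream_iterator<unsigned char>(in),
        std::istream_iterator<unsigned char>()));
}

// Count every byte: istreambuf_iterator reads straight from the stream
// buffer and performs no formatting at all.
std::size_t count_raw(std::istream& in)
{
    return static_cast<std::size_t>(std::distance(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>()));
}
```

For the eight bytes "a b\n\tc\r\n", `count_formatted` sees only the three non-whitespace bytes, while `count_raw` sees all eight. Calling `file.unsetf(std::ios::skipws)` before the copy, or switching to istreambuf_iterator<char>, should make the first count match the second.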

> Of course, the religious C++'ers maintain that that shouldn't be possible
> anyway because you can't do it on, say, a mobile phone, where C++ could be
> used for something, but then they forget that i/o is there for a reason.

I suspect there are "political" reasons things turned out the way they did.
I really don't know how much of a performance hit it would be if certain
platforms had to do some extra endian shuffling. I do believe the lack of
real binary I/O in the Standard Library is an unexpected inconvenience.
Stroustrup bluntly states that binary I/O is beyond the scope of C++ Standard, and beyond the scope of TC++PL(SE). §21.2.1

Here's an interesting observation:
compiled with gcc 3.3.5
-rwxr-xr-x 1 hattons users 40830 2005-07-28 06:22 binio-3.3.5
compiled with gcc 4.0.1
-rwxr-xr-x 1 hattons users 22470 2005-07-28 06:23 binio-4.0.1

And 4.0.1 produces (much) faster code as well.

 
At 10:43 PM, August 20, 2005, Anonymous Anonymous said...

[snip]
> > For example, a Windows text file has each line terminated by carriage
> > return + linefeed, as opposed to just linefeed in Unix, and in text mode
> > '\n' is translated accordingly for output, and these sequences are
> > translated to
> > '\n' on input -- in practice '\r' is carriage return and '\n' is
> > linefeed.

> Yes. I am aware of DOS spiders. ^M is what it looks like in Emacs,

That's because a carriage return is ASCII 13, Ctrl M.

It would be different using EBCDIC, I imagine.

;-)

[snip]

> > Interestingly the C++ iostream utilities are so extremely badly designed
> > that you can't make a simple
> > copy-standard-input-to-standard-output-exactly program using only the
> > standard C++ library, on systems where this is meaningful but text
> > translation occurs in text mode.

> I'm now wondering if I really understood. If I read "characters" from a
> std::istream, it goes into an error state when it hits an EOF. That's why
> stuff like this works (when it works)

> std::vector<float_pair> positions
> (istream_iterator<float_pair> (file),
> (istream_iterator<float_pair> ()));

> I don't believe binary files are terminated by a special character, but I
> could be wrong (again).

Not in Unix, and not in Windows. However a binary file can contain any byte
values, and those that look like end-of-line markers will be translated in
text mode, and the first that looks like an end-of-file marker may be
translated. All depending on the implementation.

[snip]

> ##########################

> Notice the second output is larger than the first.

That's probably because the ostringstream does some text mode shenanigans;
although I haven't checked.

> > Of course, the religious C++'ers maintain that that shouldn't be possible
> > anyway because you can't do it on, say, a mobile phone, where C++ could be
> > used for something, but then they forget that i/o is there for a reason.

> I suspect there are "political" reasons things turned out the way they did.
> I really don't know how much of a performance hit it would be if certain
> platforms had to do some extra endian shuffling.

? The _default_ is translation. That is, the default is the overhead &
performance hit (+ other much more evil effects) you're mentioning.

> I do believe the lack of
> real binary I/O in the Standard Library is an unexpected inconvenience.
> Stroustrup bluntly states that binary I/O is beyond the scope of C++
> Standard, and beyond the scope of TC++PL(SE). §21.2.1

There is no §21.2.1.

 
At 10:45 PM, August 20, 2005, Anonymous Anonymous said...

>> I don't believe binary files are terminated by a special character, but I
>> could be wrong (again).

> Not in Unix, and not in Windows. However a binary file can contain any
> byte values, and those that look like end-of-line markers will be
> translated in text mode, and the first that looks like an end-of-file
> marker may be
> translated. All depending on the implementation.

And they call it a "standard"? :/

> [snip]
>> ifstream file (argv[1],ios::binary);

>> if(!file) { cerr << "couldn't open the file:"<< argv[1] << endl; return
>> -1; }

>> std::vector<unsigned char> data;

>> copy(istream_iterator<unsigned char>(file)
>> , istream_iterator<unsigned char>()
>> , back_inserter(data));

>> cout<<"read "<< data.size()<<"bytes of data"<< endl;

>> file.clear();
>> file.seekg(0,ios::beg);
>> ostringstream oss;
> [snip]

>> ##########################

>> Notice the second output is larger than the first.

> That's probably because the ostringstream does some text mode shenanigans;
> although I haven't checked.

Sorry, I forgot one important point.
$ls -l binio-3.3.5
-rwxr-xr-x 1 hattons users 40830 2005-07-28 06:22 binio-3.3.5

###### note the file size above, and the second output value below:

$./binio-3.3.5 binio-3.3.5
read 40034bytes of data
read 40830bytes of data

As I understand things, when I do `oss << file.rdbuf()' I am getting a raw
stream. That, too, may be implementation defined for all I know.

>> I suspect there are "political" reasons things turned out the way they
>> did. I really don't know how much of a performance hit it would be if
>> certain platforms had to do some extra endian shuffling.

> ? The _default_ is translation. That is, the default is the overhead &
> performance hit (+ other much more evil effects) you're mentioning.

So is it the case that ios::binary may still not produce real binary
streams? That is, the stream could still act in some ways like a text
stream, e.g., eof?

>> I do believe the lack of
>> real binary I/O in the Standard Library is an unexpected inconvenience.
>> Stroustrup bluntly states that binary I/O is beyond the scope of C++
>> Standard, and beyond the scope of TC++PL(SE). §21.2.1

> There is no §21.2.1.

TC++PL(SE).

 
At 10:46 PM, August 20, 2005, Anonymous Anonymous said...

"Binary" and "text" are both terms of art from the C Standard, which
is included by reference in the C++ Standard. Binary I/O is byte
transparent, with the possible exception of padding NUL bytes.
Text I/O endeavors to translate between internal newline-delimited
text lines and however the system commonly represents text outside
the program.

 
At 10:47 PM, August 20, 2005, Anonymous Anonymous said...

> Also, for example, in a Windows text file Ctrl Z denotes end-of-file.
> That's useful for including a short descriptive text snippet at the start
> of
> a binary file, but I suspect it was originally a misunderstanding of the
> Unix shell command to send the current line immediately (which for an
> empty
> line means zero bytes, which in Unix indicates end-of-file).

No, the usage originated in early systems that couldn't describe the
length of a file to the nearest byte. A CTL-Z delimited the logical
end of text, so your program didn't read trailing garbage.

> So in text
> mode in Windows, a Ctrl Z might be translated to end-of-file on input.
> Interestingly the C++ iostream utilities are so extremely badly designed
> that you can't make a simple
> copy-standard-input-to-standard-output-exactly
> program using only the standard C++ library, on systems where this is
> meaningful but text translation occurs in text mode.

Well, yes you can. Open files in binary mode and use read/write.

> Of course, the religious C++'ers maintain that that shouldn't be possible
> anyway because you can't do it on, say, a mobile phone, where C++ could be
> used for something, but then they forget that i/o is there for a reason.

Nonsense.

 
At 10:48 PM, August 20, 2005, Anonymous Anonymous said...

> Is that per the Standard, or "implementation imposed"?

The C and C++ Standards both require that text mode I/O do whatever
is necessary to convert between the universal internal form of text
streams and whatever the execution environment requires instead.

> I know that what you
> describe is what I typically think of when I think of binary I/O. For
> example, in ancient times we used to have to explicitly tell ftp to use
> binary mode transfer.

And you still do, sometimes, if your FTP utility can't make an
intelligent guess.

 
At 10:49 PM, August 20, 2005, Anonymous Anonymous said...

> I don't believe binary files are terminated by a special character, but I
> could be wrong (again).

Correct. But the I/O subsystem on every OS has *some* way to tell
you when you run out of input characters.

> ...
>> Of course, the religious C++'ers maintain that that shouldn't be possible
>> anyway because you can't do it on, say, a mobile phone, where C++ could
>> be
>> used for something, but then they forget that i/o is there for a reason.

> I suspect there are "political" reasons things turned out the way they
> did.

Nonsense. In the early 1970s, Unix pioneered the notion of a universal
format for text streams, by pushing any needed mappings out to the
device drivers. That text stream model became an integral part of C,
which first evolved under Unix. In the late 1970s and early 1980s,
Whitesmiths, Ltd. ported C to several dozen operating systems. We
elaborated the text/binary I/O model as a way of preserving both the
universal text stream format and the transparent text stream, as
needed. All that technology was captured in the C Standard in the
mid 1980s. It was then incorporated by reference in the C++ Standard
in the mid 1990s. It's there because it works.

> I really don't know how much of a performance hit it would be if certain
> platforms had to do some extra endian shuffling. I do believe the lack of
> real binary I/O in the Standard Library is an unexpected inconvenience.

Might be, if there were such a lack.

> Stroustrup bluntly states that binary I/O is beyond the scope of C++
> Standard, and beyond the scope of TC++PL(SE). §21.2.1

Stroustrup is not nearly as familiar with the Standard C++ library
as he is with the language he invented.

 
At 10:49 PM, August 20, 2005, Anonymous Anonymous said...

>> Not in Unix, and not in Windows. However a binary file can contain any
>> byte values, and those that look like end-of-line markers will be
>> translated in text mode, and the first that looks like an end-of-file
>> marker may be
>> translated. All depending on the implementation.

> And they call it a "standard"? :/

Yes they do. The C Standard describes how to impose order over a
diverse range of operating systems. You can describe practically
every car made today as having a steering wheel, an accelerator,
and a brake -- if you want to emphasize what's standard about
them. Or you can discuss at great length the different kinds of
linkages and braking systems -- if you want to emphasize how
they differ. Depends on your "political" goal, I suppose.

 
At 10:50 PM, August 20, 2005, Anonymous Anonymous said...

> Well, yes you can. Open files in binary mode and use read/write.

That should be only a few lines; could you please present the code?

> > Of course, the religious C++'ers maintain that that shouldn't be possible
> > anyway because you can't do it on, say, a mobile phone, where C++ could be
> > used for something, but then they forget that i/o is there for a reason.

> Nonsense.

See above. ;-)

 
At 10:51 PM, August 20, 2005, Anonymous Anonymous said...

>> Well, yes you can. Open files in binary mode and use read/write.

> That should be only a few lines; could you please present the code?

#include <fstream>

int main(int argc, char **argv)
{   // copy a file
    if (2 < argc)
    {   // copy argv[1] to argv[2] transparently
        std::ifstream ifs(argv[1],
            std::ios_base::in | std::ios_base::binary);
        std::ofstream ofs(argv[2],
            std::ios_base::out | std::ios_base::binary);

        ofs << ifs.rdbuf();
    }
    return (0);
}

(I was wrong about needing read and write.)

 
At 10:52 PM, August 20, 2005, Anonymous Anonymous said...

Thanks for the code.

Since I (naturally) don't use iostreams much -- and in fact it's been some time since I did C++ development -- I learned a new idiom.

And you weren't wrong about read and write: it can be done that way.

What you were wrong about:

The above doesn't copy standard input to standard output. ;-)
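For completeness, the read/write variant mentioned above might look roughly like this. It is a sketch only: the function name `copy_binary` and the 4096-byte buffer size are arbitrary choices, and like the rdbuf version it copies named files, not the standard streams.

```cpp
#include <fstream>
#include <string>

// Copy `from` to `to` byte for byte using unformatted read()/write().
// Returns false if either file cannot be opened or a write fails.
bool copy_binary(const std::string& from, const std::string& to)
{
    std::ifstream in(from.c_str(), std::ios_base::in | std::ios_base::binary);
    std::ofstream out(to.c_str(), std::ios_base::out | std::ios_base::binary
                                      | std::ios_base::trunc);
    if (!in || !out)
        return false;

    char buffer[4096];
    // read() sets failbit on a short final read, so gcount() decides
    // whether there is still something left to write.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        out.write(buffer, in.gcount());
        if (!out)
            return false;
    }
    return true;
}
```

Because both streams are opened with ios_base::binary, bytes such as CR, LF, Ctrl-Z and NUL should pass through unchanged.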

 
At 10:53 PM, August 20, 2005, Anonymous Anonymous said...

Josuttis provides a similar example.

What I would like to know is how to get an iterator over a buffer such as std::basic_filebuf, so that I can do something like:

copy(istream_iterator<unsigned char>(file_buf.eback())
, istream_iterator<unsigned char>(file_buf.egptr())
, back_inserter(data));

Which I can't do without inheriting from the buffer. That makes binary I/O seem inconsistent with the rest of the library.

 
At 10:54 PM, August 20, 2005, Anonymous Anonymous said...

Does this potentially modify the stream data? If so, where?

/* The following code example is taken from the book
* "The C++ Standard Library - A Tutorial and Reference"
* by Nicolai M. Josuttis, Addison-Wesley, 1999
*
* (C) Copyright Nicolai M. Josuttis 1999.
* Permission to copy, use, modify, sell and distribute this software
* is granted provided this copyright notice appears in all copies.
* This software is provided "as is" without express or implied
* warranty, and with no claim as to its suitability for any purpose.
*/
#include <iostream>

int main ()
{
// copy all standard input to standard output
std::cout << std::cin.rdbuf();

}

 
At 10:54 PM, August 20, 2005, Anonymous Anonymous said...

> Which I can't do without inheriting from the buffer. That makes binary
> I/O seem inconsistent with the rest of the library.

Actually I believe that should be more like:

copy(file_buf.eback(), file_buf.egptr(), back_inserter(data));

 
At 10:56 PM, August 20, 2005, Anonymous Anonymous said...

> Does this potentially modify the stream data?

Yes.

> If so, where?

Well, I don't really care exactly where -- when you know the car has
square wheels it doesn't really matter exactly what prevents the motor from starting and the door from opening, so I've never been interested in that.

> int main ()
> {
> // copy all standard input to standard output
> std::cout << std::cin.rdbuf();
> }

With MSVC 7.1 under Windows XP Professional:

P:\> dir | find "exe"
28.07.2005 15:05 233 472 vc_project.exe

P:\> vc_project < vc_project.exe >x

P:\> dir | find "x"
28.07.2005 15:05 233 472 vc_project.exe
28.07.2005 15:09 4 484 x

P:\> _

 
At 10:57 PM, August 20, 2005, Anonymous Anonymous said...

Well, I _am_ interested because I am (was) under the impression that
std::cout.rdbuf() would give me raw data. But now, it seems as if it might
kill spiders, or something like that. The standard streams may actually be
bad examples since they are very OS specific.

 
At 10:59 PM, August 20, 2005, Anonymous Anonymous said...

> Well, I _am_ interested because I am (was) under the impression that
> std::cout.rdbuf() would give me raw data. But now, it seems as if it
> might
> kill spiders, or something like that. The standard streams may actually
> be bad examples since they are very OS specific.

How 'bout this?

#include <fstream>
#include <iostream>

using namespace std;

int main(int argc, char* argv[]) {
  if(argc < 3) { cerr<<"in_file_name out_file_name"<< endl; return -1; }

  ifstream ifs(argv[1],ios::binary);
  if(!ifs) { cerr<<" failed to open input file: "<< argv[1]<< endl; return -1; }

  ofstream ofs(argv[2],ios::binary);
  if(!ofs) { cerr<<" failed to open output file: "<< argv[2]<< endl; return -1; }

  // copy the input's stream buffer to the output, byte for byte
  ofs << ifs.rdbuf();

}

 
At 11:00 PM, August 20, 2005, Anonymous Anonymous said...

> Well, I _am_ interested because I am (was) under the impression that
> std::cout.rdbuf() would give me raw data. But now, it seems as if it might
> kill spiders [control characters], or something like that.

It does.

> The standard streams may actually be bad examples since they are very
> OS specific.

The OS specificity is common to all streams; the standard streams are not special in this regard: they do no special processing and have no special features. The one exception is that there's no standard C++ way, AFAIK, to turn off their standard C++ stream objects' trashing of the data.

 
At 11:00 PM, August 20, 2005, Anonymous Anonymous said...

> Also, for example, in a Windows text file Ctrl Z denotes end-of-file.
> That's useful for including a short descriptive text snippet at the start
> of a binary file, but I suspect it was originally a misunderstanding of
> the Unix shell command to send the current line immediately (which for an
> empty line means zero bytes, which in Unix indicates end-of-file).
> So in text mode in Windows, a Ctrl Z might be translated to end-of-file
> on input.

The Ctrl-Z convention in MS-DOS is inherited from the CP/M operating systems.
They used this convention because they did not store the exact length of a
file, only the number of sectors used. Marking the end with a special
character was the easiest solution.

But the Ctrl-Z was not always required. If the text file size was an exact multiple of the sector size, the EOF mark could be omitted, to avoid wasting a whole sector on it (disks of dozens of GiB were not very popular on small systems in those days ;-) ).

 
At 11:02 PM, August 20, 2005, Anonymous Anonymous said...

> And you weren't wrong about read and write: it can be done that way.

I was wrong that read and write are necessary, that's all.

> What you were wrong about:

> The above doesn't copy standard input to standard output. ;-)

Sorry, I missed that bit. It is true in C90/C95 that you can't
freopen a standard stream to change its mode; it *might* work
properly in C99 but it's not guaranteed. So it has indeed been
a longstanding limitation of Standard C that you can't idly
switch between text and binary I/O, any more than you can idly
switch between byte and wide-character I/O. But that's generally a minor nuisance. The existence of portable software tools, such as the MKS Toolkit, shows that you can still implement much of the Unix idiom on arbitrary operating systems.

 
At 11:02 PM, August 20, 2005, Anonymous Anonymous said...

> Which I can't do without inheriting from the buffer. That makes binary
> I/O
> seem inconsistent with the rest of the library.

You're presuming that you have to use unsigned char to transmit
binary data. While in principle you can make a perverse implementation
of C that corrupts char data and still conforms, in the real world
that doesn't happen. Just do the obvious with an ifstream opened in
binary mode and it'll work fine.
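One reading of "the obvious" is to iterate over the stream buffer of a binary-mode ifstream with istreambuf_iterator<char> and let the vector constructor do the work. A minimal sketch, where the function name `load_bytes` is invented for this example:

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read an entire file, opened in binary mode, into a vector of bytes.
// An unopenable file simply yields an empty vector in this sketch.
std::vector<unsigned char> load_bytes(const std::string& path)
{
    std::ifstream file(path.c_str(),
                       std::ios_base::in | std::ios_base::binary);
    // The extra parentheses around the first argument keep this from being
    // parsed as a function declaration.
    std::vector<unsigned char> data((std::istreambuf_iterator<char>(file)),
                                    std::istreambuf_iterator<char>());
    return data;
}
```

No inheritance from the buffer is needed, and since istreambuf_iterator does unformatted reads, whitespace bytes are kept. Each char is converted to unsigned char as it is stored.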

 
At 11:03 PM, August 20, 2005, Anonymous Anonymous said...

> int main ()
> {
> // copy all standard input to standard output
> std::cout << std::cin.rdbuf();
> }

It can, on all but Unix systems. As frequently described earlier
in this thread, reading a text stream under Windows converts CR/LF
to LF and stops reading at CTL-Z.
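The input side of this can be checked with a small experiment along the following lines. The helper names `write_raw` and `read_all` are assumptions made for the sketch, and what the text-mode read returns depends entirely on the platform.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Store `bytes` in `path` exactly as given, with no translation.
void write_raw(const std::string& path, const std::string& bytes)
{
    std::ofstream out(path.c_str(), std::ios_base::out | std::ios_base::binary
                                        | std::ios_base::trunc);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Read the whole file back in binary or text mode and return what the
// program actually gets to see.
std::string read_all(const std::string& path, bool binary)
{
    std::ios_base::openmode mode = std::ios_base::in;
    if (binary)
        mode |= std::ios_base::binary;
    std::ifstream in(path.c_str(), mode);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}
```

After `write_raw("demo.tmp", std::string("A\r\nB\x1a" "C", 6))`, a binary-mode `read_all` should return all six bytes on any system. A text-mode read should return the same six bytes on a Unix-like system; on Windows it would be expected to come back shorter, with CR/LF collapsed to '\n' and nothing delivered from the Ctrl-Z onward.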

 
At 11:03 PM, August 20, 2005, Anonymous Anonymous said...

> The OS specificity is common to all streams; the standard streams are not
> special in this regard: they do no special processing and have no special
> features. The one exception is that there's no standard C++ way, AFAIK,
> to turn off their standard C++ stream objects' trashing of the data.

What's special about the standard streams is that they're opened
in text mode prior to program startup. At least that's how command
interpreters (shells) almost always work. Nothing prevents you from
starting a program with its standard streams opened in binary
mode, however.
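When the program cannot control how it is started, a platform-specific workaround is sometimes used instead. The sketch below is NOT standard C++: it assumes the Microsoft C runtime's `_setmode` extension on Windows and does nothing elsewhere. The function names are made up, and whether std::cin and std::cout honor the change depends on the library implementation.

```cpp
#include <iostream>

#ifdef _WIN32
#include <fcntl.h>   // _O_BINARY
#include <io.h>      // _setmode
#include <stdio.h>   // _fileno, stdin, stdout
#endif

// Ask the C runtime to treat stdin and stdout as binary streams.
// Non-standard: relies on a Microsoft CRT extension when built for Windows.
bool make_standard_streams_binary()
{
#ifdef _WIN32
    return _setmode(_fileno(stdin), _O_BINARY) != -1
        && _setmode(_fileno(stdout), _O_BINARY) != -1;
#else
    return true;   // POSIX systems make no text/binary distinction
#endif
}

// Copy everything from `in` to `out` through the stream buffers.
void copy_stream(std::istream& in, std::ostream& out)
{
    out << in.rdbuf();
}
```

A main() that calls `make_standard_streams_binary()` before any other I/O and then `copy_stream(std::cin, std::cout)` would be the non-portable counterpart of the one-line copy program quoted earlier in the thread.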

 
At 11:04 PM, August 20, 2005, Anonymous Anonymous said...

I have no doubt whatsoever that C++ apps _can_ be ported to Windows and act like they run on GNU/Linux and other Unix-oriented systems. You can run KDE on Windows! But I am fairly well convinced that C#, C#++ (aka C++/CLI) and Java will be more portable for the average developer than C++ is. Get
yourself a copy of Java I/O, and read it. Consider what that is like for the average 18 to 24-year-old trying a CS major. Forget that you've been doing this stuff for so long that you can write hello world in assembler without looking anything up.

http://www.cafeaulait.org/books/javaio/

How can C++ I/O be more like that without breaking it? I want a clear and easy way to create a std::vector< unsigned char > v(begin, end);
where /begin/ and /end/ are the start and end+1 of a file opened in binary mode.

And, yes, the last I looked, Java is implemented in C and C++ with a whole
bunch of asm stuff mixed in.

 

 
